Challenge 3: Choropleth maps II

Workshop on “Creating maps with Stata and thematic applications” by Ben Jann, Hermosillo, October 24–25, 2023

In this exercise we want to look at crime data by municipality. Information on crimes by municipality can be found at https://www.gob.mx/sesnsp/acciones-y-programas/datos-abiertos-de-incidencia-delictiva. Download "Cifras de Incidencia Delictiva Municipal, 2015 - septiembre 2023" (100mb zipfile) and extract the file. We will work with file "Municipal-Delitos-2015-2023_sep2023.csv".

Furthermore, we need shape files for municipalities. For example, go to https://www.gits.igg.unam.mx/idea/descarga and download the SHP variants of "Censo 2010 (Estatal)" and "Censo 2010 (Municipal)" under "Datos básicos". The shape files are high definition (many data points) and they also contain various statistics from the 2010 census such as population size (for details see the "Metadato PDF" documentation, i.e. file "inegi_refcenesta_2010.pdf"). An alternative would be to obtain the shape files from https://data.humdata.org/dataset/cod-ab-mex (file "mex_admbnda_govmex_20210618_SHP.zip"). These shape files are very similar, but they do not contain the extra statistical information.

1. Convert the downloaded shape files for states and municipalities to Stata format.

. geoframe convert Estatal using "data/Shapefile - Censo 2010 (Estatal)", replace
(translating inegi_refcenesta_2010 from directory data/Shapefile - Censo 2010 (Estatal))

  (importing .shp file)
  (importing .dbf file)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)

  file Estatal_shp.dta created
  file Estatal.dta     created

  (type geoframe create Estatal to load the data)

. geoframe convert Municipal using "data/Shapefile - Censo 2010 (Municipal)", replace
(translating inegi_refcenmuni_2010 from directory data/Shapefile - Censo 2010 (Municipal))

  (importing .shp file)
  (importing .dbf file)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)

  file Municipal_shp.dta created
  file Municipal.dta     created

  (type geoframe create Municipal to load the data)

2. Compute homicide counts by municipality for 2015–2022 from file "Municipal-Delitos-2015-2023_sep2023.csv".

The procedure is about as follows

  1. Import the CSV file using import delimited
  2. Drop all rows for year 2023 and drop all rows about delicts other than homicides. That is, select "Tipo de delito" equal to "Homicidio" or "Feminicidio",
  3. Compute the sum in each row across months to obtain the yearly total (e.g. using function rowtotal() of command egen).
  4. Aggregate the data by municipality using command collapse.
  5. Possibly save the data for later use.
. // Step 1
. import delimited data/Municipal-Delitos-2015-2023_sep2023.csv, clear
(encoding automatically selected: ISO-8859-1)
(21 vars, 2,075,738 obs)

. // Step 2
. drop if año==2023
(243,334 observations deleted)

. keep if inlist(tipodedelito,"Homicidio","Feminicidio")
(1,589,330 observations deleted)

. // Step 3
. egen Homicides = rowtotal(enero - diciembre)

. compress Homicides
  variable Homicides was float now int
  (486,148 bytes saved)

. // Step 4
. collapse (sum) Homicides, by(clave_ent entidad cvemunicipio municipio)

. isid cvemunicipio // just to be sure, confirm that municipalities are unique

. label variable Homicides "Number of homicides and femicides in 2015-2022"

. // Step 5
. saveold Homicides, replace
(saving in Stata 13 format)
(FYI, saveold has options version(12) and version(11) that write files in older Stata formats)
file Homicides.dta saved

3. Load the shape files for states and municipalities. Add the homicide count to the municipalities data using geoframe copy.

You will need to make sure that the municipality IDs have the same format before applying geoframe copy.

. geoframe create Estatal, replace
(reading shapes from Estatal_shp.dta)
(all observations in frame Estatal_shp matched)
(link to frame Estatal_shp added)
(current frame now Estatal)

            Frame name: Estatal
            Frame type: unit
          Feature type: <none>
         Number of obs: 32
               Unit ID: _ID
           Coordinates: _CX _CY
                  Area: <none>
    Linked shape frame: Estatal_shp

. geoframe create Municipal, replace
(reading shapes from Municipal_shp.dta)
(all observations in frame Municipal_shp matched)
(link to frame Municipal_shp added)
(current frame now Municipal)

            Frame name: Municipal
            Frame type: unit
          Feature type: <none>
         Number of obs: 2,456
               Unit ID: _ID
           Coordinates: _CX _CY
                  Area: <none>
    Linked shape frame: Municipal_shp

. destring cve_umun, replace
cve_umun: all characters numeric; replaced as int

. geoframe copy default Homicides, id(cve_umun cvemunicipio)
(all units matched)
(1 variable copied from frame default)

The shape data is very high-definition (with 659,531 observations for states and 3,283,138 observations for municipalities). This will put a lot of burden on the computer's graphics system and will greatly inflate the size of graph files (depending on export format). I thus apply geoframe simplify to reduce the complexity of the data (i.e. remove data points from the polygons). The default degree of simplification applied by geoframe simplify is chosen such that there should not be much effect on the visual appearance of the map, as long as the whole map is plotted.

. frame Estatal: geoframe simplify
(threshold = .0000762)
(simplifying shapes of 32 units)
(0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%)
(dropped 639,070 observations in frame Estatal_shp)
(dropped 0 observations in frame Estatal)

. frame Municipal: geoframe simplify
(threshold = .0000793)
(simplifying shapes of 2456 units)
(0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%)
(dropped 3148483 observations in frame Municipal_shp)
(dropped 0 observations in frame Municipal)

Note how many points have been removed: 639,070 of 659,531 points for states; 3,148,483 of 3,283,138 points for municipalities!

Disclaimer: Applying geoframe simplify in the above way does not ensure that shared borders among municipalities or shared borders between municipalities and states remain fully consistent (i.e., different points might be dropped for each state or municipality). geoframe simplify has an option to apply a more refined procedure to fix this issue, but the corresponding algorithm is slow on such large data. Since in the examples below the difference would not be noticeable, I stick to the less precise default method, which is much faster. However, if you want to simplify shape files for general purposes, then you should apply option jointly) to select the more refined algorithm (and let the computer run over night; to simplify the shapes of states and municipalities jointly you would, in addition, have to stack the data from the two shape files).

4. Now generate a map in which you color municipalities by the number of homicides. Also include state borders as an extra layer.

. geoplot (area Municipal Homicides) (area Estatal)
Challenge-3-solution_5.png

5. Try to improve the map by using an appropriate color scheme (option color()) and by increasing the number of color levels (option levels()). In addition, use suboption quantile in levels() to categorize homicide rates by quantiles rather then by a regular grid. Maybe also add a title to the graph.

. geoplot (area Municipal Homicides, levels(15, quantile) color(scico lajolla)) ///
>     (area Estatal), subtitle("Number of homicides 2015-2022")
Challenge-3-solution_6.png

6. Compute homicides rates per 100'000 inhabitants and redraw the graph.

The number of inhabitants from the census 2010 can be found in variable p_total. Using this variable to compute rates is, of course, not ideal since population counts have changed since 2010. It would be better to use more recent population statistics.

. generate double hrate = Homicides/8 / (p_total/100000) 

. format %9.0f hrate

. geoplot (area Municipal hrate, levels(15, quantile) color(scico lajolla)) ///
>     (area Estatal), subtitle("Avg. homicide rate 2015-2022 (per 100'000)")
Challenge-3-solution_7.png

7. Now compute the number of homicides and the homicide rate at the level of the state and draw a corresponding map.

Use command collapse to aggregate the homicide data by state.

. frame change default

. collapse (sum) Homicides, by(clave_ent entidad)

. isid clave_ent // just to be sure, confirm that municipalities are unique

. label variable Homicides "Number of homicides and femicides in 2022"

. frame change Estatal

. destring cve_ent, replace
cve_ent: all characters numeric; replaced as byte

. geoframe copy default Homicides, id(cve_ent clave_ent)
(all units matched)
(1 variable copied from frame default)

. geoplot (area Estatal Homicides, levels(15, quantile) color(scico lajolla)) ///
>     (area Estatal), clegend zlabel(#10) subtitle("Number of homicides 2015-2022")
Challenge-3-solution_8.png
. generate double hrate = Homicides / p_total * 100000

. format %9.0f hrate

. geoplot (area Estatal hrate, levels(15, quantile) color(scico lajolla)) ///
>     (area Estatal), clegend zlabel(#10) ///
>     subtitle("Avg. homicide rate 2015-2022 (per 100'000)") 
Challenge-3-solution_9.png