Workshop on “Creating maps with Stata and thematic applications” by Ben Jann, Hermosillo, October 24–25, 2023
In this exercise we want to look at crime data by municipality. Information on crimes by municipality can be found at https://www.gob.mx/sesnsp/acciones-y-programas/datos-abiertos-de-incidencia-delictiva. Download "Cifras de Incidencia Delictiva Municipal, 2015 - septiembre 2023" (100mb zipfile) and extract the file. We will work with file "Municipal-Delitos-2015-2023_sep2023.csv".
Furthermore, we need shape files for municipalities. For example, go to https://www.gits.igg.unam.mx/idea/descarga and download the SHP variants of "Censo 2010 (Estatal)" and "Censo 2010 (Municipal)" under "Datos básicos". The shape files are high definition (many data points) and they also contain various statistics from the 2010 census such as population size (for details see the "Metadato PDF" documentation, i.e. file "inegi_refcenesta_2010.pdf"). An alternative would be to obtain the shape files from https://data.humdata.org/dataset/cod-ab-mex (file "mex_admbnda_govmex_20210618_SHP.zip"). These shape files are very similar, but they do not contain the extra statistical information.
1. Convert the downloaded shape files for states and municipalities to Stata format.
. geoframe convert Estatal using "data/Shapefile - Censo 2010 (Estatal)", replace (translating inegi_refcenesta_2010 from directory data/Shapefile - Censo 2010 (Estatal)) (importing .shp file) (importing .dbf file) (creating _ID spatial-unit id) (creating _CX coordinate) (creating _CY coordinate) file Estatal_shp.dta created file Estatal.dta created (type geoframe create Estatal to load the data) . geoframe convert Municipal using "data/Shapefile - Censo 2010 (Municipal)", replace (translating inegi_refcenmuni_2010 from directory data/Shapefile - Censo 2010 (Municipal)) (importing .shp file) (importing .dbf file) (creating _ID spatial-unit id) (creating _CX coordinate) (creating _CY coordinate) file Municipal_shp.dta created file Municipal.dta created (type geoframe create Municipal to load the data)
2. Compute homicide counts by municipality for 2015–2022 from file "Municipal-Delitos-2015-2023_sep2023.csv".
The procedure is about as follows
import delimited
rowtotal()
of command egen
).
collapse
.
. // Step 1 . import delimited data/Municipal-Delitos-2015-2023_sep2023.csv, clear (encoding automatically selected: ISO-8859-1) (21 vars, 2,075,738 obs) . // Step 2 . drop if año==2023 (243,334 observations deleted) . keep if inlist(tipodedelito,"Homicidio","Feminicidio") (1,589,330 observations deleted) . // Step 3 . egen Homicides = rowtotal(enero - diciembre) . compress Homicides variable Homicides was float now int (486,148 bytes saved) . // Step 4 . collapse (sum) Homicides, by(clave_ent entidad cvemunicipio municipio) . isid cvemunicipio // just to be sure, confirm that municipalities are unique . label variable Homicides "Number of homicides and femicides in 2015-2022" . // Step 5 . saveold Homicides, replace (saving in Stata 13 format) (FYI, saveold has options version(12) and version(11) that write files in older Stata formats) file Homicides.dta saved
3. Load the shape files for states and municipalities. Add the homicide count
to the municipalities data using geoframe copy
.
You will need to make sure that the municipality IDs have the same format before
applying geoframe copy
.
. geoframe create Estatal, replace (reading shapes from Estatal_shp.dta) (all observations in frame Estatal_shp matched) (link to frame Estatal_shp added) (current frame now Estatal) Frame name: Estatal Frame type: unit Feature type: <none> Number of obs: 32 Unit ID: _ID Coordinates: _CX _CY Area: <none> Linked shape frame: Estatal_shp . geoframe create Municipal, replace (reading shapes from Municipal_shp.dta) (all observations in frame Municipal_shp matched) (link to frame Municipal_shp added) (current frame now Municipal) Frame name: Municipal Frame type: unit Feature type: <none> Number of obs: 2,456 Unit ID: _ID Coordinates: _CX _CY Area: <none> Linked shape frame: Municipal_shp . destring cve_umun, replace cve_umun: all characters numeric; replaced as int . geoframe copy default Homicides, id(cve_umun cvemunicipio) (all units matched) (1 variable copied from frame default)
The shape data is very high-definition (with 659,531 observations for states and
3,283,138 observations for municipalities). This will put a lot of burden
on the computer's graphics system and will greatly inflate the
size of graph files (depending on export format). I thus apply
geoframe simplify
to reduce the complexity of the data (i.e.
remove data points from the polygons). The default degree of simplification applied
by geoframe simplify
is chosen such that there should not be much effect
on the visual appearance of the map, as long as the whole map is plotted.
. frame Estatal: geoframe simplify (threshold = .0000762) (simplifying shapes of 32 units) (0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%) (dropped 639,070 observations in frame Estatal_shp) (dropped 0 observations in frame Estatal) . frame Municipal: geoframe simplify (threshold = .0000793) (simplifying shapes of 2456 units) (0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%) (dropped 3148483 observations in frame Municipal_shp) (dropped 0 observations in frame Municipal)
Note how many points have been removed: 639,070 of 659,531 points for states; 3,148,483 of 3,283,138 points for municipalities!
Disclaimer: Applying geoframe simplify
in the above way does not ensure
that shared borders among municipalities or shared borders between municipalities
and states remain fully consistent (i.e., different points might be dropped for each
state or municipality). geoframe simplify
has an option to apply
a more refined procedure to fix this issue, but the corresponding
algorithm is slow on such large data. Since in the examples below the difference
would not be noticeable, I stick to the less precise default method, which is
much faster. However, if you
want to simplify shape files for general purposes, then you should apply
option jointly
) to select the more refined algorithm (and let
the computer run over night; to simplify the shapes of states and municipalities jointly
you would, in addition, have to stack the data from the two shape files).
4. Now generate a map in which you color municipalities by the number of homicides. Also include state borders as an extra layer.
. geoplot (area Municipal Homicides) (area Estatal)
5. Try to improve the map by using an appropriate color scheme (option
color()
) and by increasing the number of color levels
(option levels()
). In addition, use suboption quantile
in levels()
to categorize homicide rates by quantiles rather
then by a regular grid. Maybe also add a title to the graph.
. geoplot (area Municipal Homicides, levels(15, quantile) color(scico lajolla)) ///
> (area Estatal), subtitle("Number of homicides 2015-2022")
6. Compute homicides rates per 100'000 inhabitants and redraw the graph.
The number of inhabitants from the census 2010 can be found in variable
p_total
. Using this variable to compute rates is, of course, not
ideal since population counts have changed since 2010. It would be better
to use more recent population statistics.
. generate double hrate = Homicides/8 / (p_total/100000) . format %9.0f hrate . geoplot (area Municipal hrate, levels(15, quantile) color(scico lajolla)) /// > (area Estatal), subtitle("Avg. homicide rate 2015-2022 (per 100'000)")
7. Now compute the number of homicides and the homicide rate at the level of the state and draw a corresponding map.
Use command collapse
to aggregate the homicide data by state.
. frame change default . collapse (sum) Homicides, by(clave_ent entidad) . isid clave_ent // just to be sure, confirm that municipalities are unique . label variable Homicides "Number of homicides and femicides in 2022" . frame change Estatal . destring cve_ent, replace cve_ent: all characters numeric; replaced as byte . geoframe copy default Homicides, id(cve_ent clave_ent) (all units matched) (1 variable copied from frame default) . geoplot (area Estatal Homicides, levels(15, quantile) color(scico lajolla)) /// > (area Estatal), clegend zlabel(#10) subtitle("Number of homicides 2015-2022")
. generate double hrate = Homicides / p_total * 100000 . format %9.0f hrate . geoplot (area Estatal hrate, levels(15, quantile) color(scico lajolla)) /// > (area Estatal), clegend zlabel(#10) /// > subtitle("Avg. homicide rate 2015-2022 (per 100'000)")