Workshop on “Creating maps with Stata and thematic applications” by Ben Jann, Hermosillo, October 24–25, 2023
In this challenge we will plot some data from the Encuesta Nacional de Salud
y Nutrición (ENSANUT) 2018 (see https://ensanut.insp.mx/). I prepared
a small state-level aggregate dataset (ensanut18.dta) containing information on
alcohol consumption, smoking, and diabetes prevalence.
0. Load the data on Mexican states, rivers, and lakes from Challenge 1.
. geoframe create ../Challenge-1/States, replace
(reading shapes from ../Challenge-1/States_shp.dta)
(all observations in frame States_shp matched)
(link to frame States_shp added)
(current frame now States)

            Frame name: States
            Frame type: unit
          Feature type: <none>
         Number of obs: 32
               Unit ID: _ID
           Coordinates: _CX _CY
                  Area: <none>
    Linked shape frame: States_shp

. geoframe create ../Challenge-1/Rivers, replace feature(water)
(reading shapes from ../Challenge-1/Rivers_shp.dta)
(all observations in frame Rivers_shp matched)
(link to frame Rivers_shp added)
(current frame now Rivers)

            Frame name: Rivers
            Frame type: unit
          Feature type: water
         Number of obs: 30
               Unit ID: _ID
           Coordinates: _CX _CY
                  Area: <none>
    Linked shape frame: Rivers_shp

. geoframe create ../Challenge-1/Lakes, replace feature(water)
(reading shapes from ../Challenge-1/Lakes_shp.dta)
(all observations in frame Lakes_shp matched)
(link to frame Lakes_shp added)
(current frame now Lakes)

            Frame name: Lakes
            Frame type: unit
          Feature type: water
         Number of obs: 3
               Unit ID: _ID
           Coordinates: _CX _CY
                  Area: <none>
    Linked shape frame: Lakes_shp
1. Now load the ENSANUT data into an additional frame and copy the variables into the states frame.
Hint: You will need to make sure that the state codes have the same format in the two datasets. You will also need to fix the problem that the state codes in the shape data from www.efrainmaps.es are inconsistent because they use a wrong sort order ("ch" should be treated as a single letter placed after "c" in the sort order; in particular, you need to apply the following changes: code MX05 rather than MX07 for "Coahuila", code MX06 rather than MX08 for "Colima", code MX07 rather than MX05 for "Chiapas", code MX08 rather than MX06 for "Chihuahua").
Fix the state codes in frame States.
. frame change States
. replace CODE = "MX05" if NAME=="Coahuila"
(1 real change made)
. replace CODE = "MX06" if NAME=="Colima"
(1 real change made)
. replace CODE = "MX07" if NAME=="Chiapas"
(1 real change made)
. replace CODE = "MX08" if NAME=="Chihuahua"
(1 real change made)
Load ENSANUT data into the default frame and add prefix "MX" to the state codes.
. frame change default
. use ensanut18, clear
(Aggregat statistics from ENSANUT 2018, adult population only (20 or older))
. replace ENT = "MX" + ENT
variable ENT was str2 now str4
(32 real changes made)
Copy the ENSANUT variables to frame States using a 1:1 merge; also
adjust the display format of the variables.
. frame change States
. geoframe copy default *, id(CODE ENT)
(all units matched)
(4 variables copied from frame default)
. format %9.1f alcohol tabaco* diabetes
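Before plotting, it may be worth confirming that the recoding and the copy did what was intended. The following is only a sketch of such a check, written as do-file commands (without the log's ". " prompt). It assumes that no state has a genuinely missing diabetes value in ensanut18.dta; if some do, the assert will fail for a legitimate reason.

```stata
* Sanity checks after copying the ENSANUT variables (sketch, not part of the original solution)
frame change States

* state codes should uniquely identify the 32 units
isid CODE

* assumption: every state has a non-missing diabetes value after the copy
assert !missing(diabetes)

* inspect the four states whose codes were corrected by hand
list CODE NAME diabetes if inlist(CODE, "MX05", "MX06", "MX07", "MX08")
```

If the listing shows Coahuila, Colima, Chiapas, and Chihuahua under codes MX05 to MX08 in that order, the recoding matches the hint in step 1.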
2. Create a choropleth map illustrating prevalence of diabetes by state (using default settings). Also include lakes and rivers in the map.
. geoplot (area States diabetes) (area Lakes) (line Rivers)
3. Play around with options to tune the graph a bit.
For example, use color() to select a more appropriate color scheme (see
help colorpalette for available palettes; maybe a "heat" scheme would be
good), use levels() to set the number of color levels or use cuts() to set
custom cut points, use labels() to determine how the levels will be labeled
in the legend, and use global option legend() to determine the position and
appearance of the legend.
Here is a variant with some improvements.
. geoplot ///
> (area States diabetes, levels(10) color(hcl heat, reverse) label(@lab)) ///
> (area Lakes) (line Rivers) ///
> , legend(position(sw) horizontal outside bmargin(t=2) region(color(gs15)))
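As an alternative to levels(), custom cut points can be set with cuts(). The sketch below is written as do-file commands; the cut points 6(2)16 are hypothetical and were not taken from the data, so it is a good idea to inspect the range of the variable first and adjust the numlist so that it covers all states.

```stata
* Sketch: custom class breaks with cuts() instead of levels()
frame change States

* check the observed range before choosing cut points
summarize diabetes

* hypothetical cut points (assumption): classes of width 2 from 6 to 16
geoplot ///
    (area States diabetes, cuts(6(2)16) color(hcl heat, reverse)) ///
    (area Lakes) (line Rivers) ///
    , legend(position(sw))
```

Fixed cut points make maps of different variables or years easier to compare, whereas levels() adapts the classes to the observed range of each variable.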
Note that state outlines will be omitted as soon as you add fill color to
areas. To be precise, color() sets the color of both the fill and the
outline, but the width of the outline is also set to zero by default so that
lines of different color are not printed on top of each other at shared
borders. You can turn the outlines back on by specifying option lcolor().
. geoplot ///
> (area States diabetes, lcolor(gray) ///
> levels(10) color(hcl heat, reverse) label(@lab)) ///
> (area Lakes) (line Rivers) ///
> , legend(position(sw) horizontal outside bmargin(t=2) region(color(gs15)))
This also adds outlines to the legend keys. If you do not want that, add the outlines as an additional layer.
. geoplot ///
> (area States diabetes, levels(10) color(hcl heat, reverse) label(@lab)) ///
> (area States) (area Lakes) (line Rivers) ///
> , legend(position(sw) horizontal outside bmargin(t=2) region(color(gs15)))
4. Use suboption quantile in levels() to categorize diabetes prevalence
based on quantiles across states and explore global option clegend() to
generate a "continuous" legend (requires Stata 18).
. geoplot (area States diabetes, levels(15, quantile) color(hcl heat, reverse) ///
> lcolor(gray)) (area Lakes) (line Rivers) ///
> , clegend zlabel(#10)
5. Now also generate plots for alcohol and tobacco consumption and
use graph combine to combine the plots into a single graph.
. geoplot (area States diabetes, levels(15, quantile) color(hcl heat, reverse) ///
> lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
> subtitle("Diabetes prevalence") name(a, replace) nodraw
. geoplot (area States alcohol, levels(15, quantile) color(hcl heat, reverse) ///
> lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
> subtitle("Alcohol (daily or weekly)") name(b, replace) nodraw
. geoplot (area States tabaco_daily, levels(15, quantile) color(hcl heat, reverse) ///
> lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
> subtitle("Tobacco (daily)") name(c, replace) nodraw
. geoplot (area States tabaco_any, levels(15, quantile) color(hcl heat, reverse) ///
> lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
> subtitle("Tobacco (any)") name(d, replace) nodraw
. graph combine a b c d, colfirst imargin(zero) graphregion(margin(zero))
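Since the four geoplot calls differ only in the variable and the subtitle, the same result could also be produced with a loop. This is a sketch in do-file form, not part of the original solution; the graph names g1 to g4 and the local macros vars, titles, v, and t are introduced here for illustration only.

```stata
* Sketch: generate the four maps in a loop and combine them
frame change States

local vars   diabetes alcohol tabaco_daily tabaco_any
local titles `" "Diabetes prevalence" "Alcohol (daily or weekly)" "Tobacco (daily)" "Tobacco (any)" "'

forvalues i = 1/4 {
    local v : word `i' of `vars'      // i-th variable name
    local t : word `i' of `titles'    // i-th subtitle
    geoplot (area States `v', levels(15, quantile) color(hcl heat, reverse) ///
        lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
        subtitle("`t'") name(g`i', replace) nodraw
}

graph combine g1 g2 g3 g4, colfirst imargin(zero) graphregion(margin(zero))
```

The loop keeps the layer and legend settings in one place, so a change to the palette or the number of levels applies to all four panels at once.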