Challenge 2: Choropleth maps I

Workshop on “Creating maps with Stata and thematic applications” by Ben Jann, Hermosillo, October 24–25, 2023

In this challenge we will plot some data from the Encuesta Nacional de Salud y Nutrición (ENSANUT) 2018 (see https://ensanut.insp.mx/). I prepared a small dataset, aggregated by state, that contains information on alcohol consumption, smoking, and diabetes prevalence (ensanut18.dta).

0. Load the data on Mexican states, rivers, and lakes from Challenge 1.

. geoframe create ../Challenge-1/States, replace
(reading shapes from ../Challenge-1/States_shp.dta)
(all observations in frame States_shp matched)
(link to frame States_shp added)
(current frame now States)

            Frame name: States
            Frame type: unit
          Feature type: <none>
         Number of obs: 32
               Unit ID: _ID
           Coordinates: _CX _CY
                  Area: <none>
    Linked shape frame: States_shp

. geoframe create ../Challenge-1/Rivers, replace feature(water)
(reading shapes from ../Challenge-1/Rivers_shp.dta)
(all observations in frame Rivers_shp matched)
(link to frame Rivers_shp added)
(current frame now Rivers)

            Frame name: Rivers
            Frame type: unit
          Feature type: water
         Number of obs: 30
               Unit ID: _ID
           Coordinates: _CX _CY
                  Area: <none>
    Linked shape frame: Rivers_shp

. geoframe create ../Challenge-1/Lakes, replace feature(water)
(reading shapes from ../Challenge-1/Lakes_shp.dta)
(all observations in frame Lakes_shp matched)
(link to frame Lakes_shp added)
(current frame now Lakes)

            Frame name: Lakes
            Frame type: unit
          Feature type: water
         Number of obs: 3
               Unit ID: _ID
           Coordinates: _CX _CY
                  Area: <none>
    Linked shape frame: Lakes_shp

1. Now load the ENSANUT data into an additional frame and copy the variables into the states frame.

Hint: You will need to make sure that the state codes have the same format in the two datasets. You will also need to fix the state codes in the shape data from www.efrainmaps.es. They are inconsistent because they were assigned using the wrong sort order: "ch" should be treated as a single letter placed after "c", so that "Coahuila" and "Colima" come before "Chiapas" and "Chihuahua". In particular, you need to apply the following changes: code MX05 rather than MX07 for "Coahuila", code MX06 rather than MX08 for "Colima", code MX07 rather than MX05 for "Chiapas", and code MX08 rather than MX06 for "Chihuahua".

Fix the state codes in frame States.

. frame change States

. replace CODE = "MX05" if NAME=="Coahuila"
(1 real change made)

. replace CODE = "MX06" if NAME=="Colima"
(1 real change made)

. replace CODE = "MX07" if NAME=="Chiapas"
(1 real change made)

. replace CODE = "MX08" if NAME=="Chihuahua"
(1 real change made)
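A quick sanity check after recoding may be useful. The sketch below uses only standard Stata commands and assumes, as above, that frame States contains the string variables CODE and NAME. The block form of the frame prefix is meant to be run from a do-file. It checks that the codes still uniquely identify the states and lists the four recoded states.

```stata
* Verify the recoded state codes in frame States
frame States {
    isid CODE    // stops with an error if two states now share a code
    list CODE NAME if inlist(NAME, "Coahuila", "Colima", "Chiapas", "Chihuahua"), noobs
}
```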

Load the ENSANUT data into the default frame and add the prefix "MX" to the state codes.

. frame change default

. use ensanut18, clear
(Aggregat statistics from ENSANUT 2018, adult population only (20 or older))

. replace ENT = "MX" + ENT
variable ENT was str2 now str4
(32 real changes made)
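Before matching, it may be worth confirming that ENT now has the same form as CODE in frame States. This is a minimal check. It assumes that ENT held two-digit, zero-padded codes such as "01" to "32" before the prefix was added, which the str2-to-str4 message above suggests.

```stata
* In the default frame: ENT should now look like "MX01", ..., "MX32"
assert strlen(ENT) == 4 & substr(ENT, 1, 2) == "MX"
isid ENT    // one row per state, as required for a 1:1 match
```

If the state code had been stored as a number instead, something like `generate str4 ENT2 = "MX" + string(ENT, "%02.0f")` would produce the same format. Here, ENT2 is a hypothetical variable name.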

Copy the ENSANUT variables to frame States using a 1:1 match of CODE (frame States) on ENT (frame default). Also adjust the display format of the copied variables.

. frame change States

. geoframe copy default *, id(CODE ENT)
(all units matched)
(4 variables copied from frame default)

. format %9.1f alcohol tabaco* diabetes
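For comparison, here is a sketch of the same 1:1 match using Stata's built-in frlink and frget (Stata 16 or newer). Run it instead of the geoframe copy call, not after it, because the variables would then already exist in frame States. The link variable name ensanut is an arbitrary, hypothetical choice. The four variable names are taken from the commands used later in this solution.

```stata
* Alternative to geoframe copy: link States to the ENSANUT data by state code
frame change States
frlink 1:1 CODE, frame(default ENT) generate(ensanut)
* Copy the four ENSANUT variables through the link and set their display format
frget alcohol tabaco_daily tabaco_any diabetes, from(ensanut)
format %9.1f alcohol tabaco_daily tabaco_any diabetes
```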

2. Create a choropleth map illustrating the prevalence of diabetes by state (using default settings). Also include lakes and rivers in the map.

. geoplot (area States diabetes) (area Lakes) (line Rivers)
Challenge-2-solution_5.png

3. Play around with options to tune the graph a bit.

For example, use color() to select a more appropriate color scheme (type help colorpalette for the available palettes; a "heat" scheme might work well here). Use levels() to set the number of color levels, or cuts() to set custom cut points. Use label() to determine how the levels are labeled in the legend, and use the global option legend() to control the position and appearance of the legend.

Here is a variant with some improvements.

. geoplot ///
>     (area States diabetes, levels(10) color(hcl heat, reverse) label(@lab)) ///
>     (area Lakes) (line Rivers) ///
>     , legend(position(sw) horizontal outside bmargin(t=2) region(color(gs15)))
Challenge-2-solution_6.png
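To set the class boundaries yourself, cuts() can be used in place of levels(). This sketch rests on three assumptions. First, it assumes that cuts() accepts a Stata numlist. Second, it assumes that diabetes is stored in percent, which is consistent with the %9.1f format above. Third, it assumes that classes of one percentage point are wanted. The outer bounds are derived from the data so that every state falls between the lowest and highest cut points.

```stata
* Integer bounds that enclose the observed range of diabetes
summarize diabetes, meanonly
local lo = floor(r(min))
local hi = ceil(r(max))
* One class per percentage point between the bounds (run from a do-file because of ///)
geoplot ///
    (area States diabetes, cuts(`lo'(1)`hi') color(hcl heat, reverse) label(@lab)) ///
    (area Lakes) (line Rivers) ///
    , legend(position(sw) horizontal outside bmargin(t=2) region(color(gs15)))
```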

Note that state outlines disappear as soon as you add fill colors to the areas. To be precise, color() sets the color of both the fill and the outline, but by default the outline width is also set to zero. This prevents lines of different colors from being drawn on top of each other along shared borders. You can turn the outlines back on by specifying option lcolor().

. geoplot ///
>     (area States diabetes, lcolor(gray) ///
>         levels(10) color(hcl heat, reverse) label(@lab)) ///
>     (area Lakes) (line Rivers) ///
>     , legend(position(sw) horizontal outside bmargin(t=2) region(color(gs15)))
Challenge-2-solution_7.png

This also adds outlines to the legend keys. If you do not want that, draw the outlines in a separate layer on top of the choropleth layer.

. geoplot ///
>     (area States diabetes, levels(10) color(hcl heat, reverse) label(@lab)) ///
>     (area States) (area Lakes) (line Rivers) ///
>     , legend(position(sw) horizontal outside bmargin(t=2) region(color(gs15)))
Challenge-2-solution_8.png

4. Use suboption quantile in levels() to categorize diabetes prevalence based on quantiles across states, and explore the global option clegend() to generate a "continuous" legend (requires Stata 18).

. geoplot (area States diabetes, levels(15, quantile) color(hcl heat, reverse) ///
>     lcolor(gray)) (area Lakes) (line Rivers) ///
>     , clegend zlabel(#10)
Challenge-2-solution_9.png
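To see what quantile-based levels do, you can compute quantile groups and their cut points directly with Stata's _pctile and xtile. The sketch uses 5 groups for readability, and diab_q5 is a hypothetical temporary variable. geoplot's own computation of quantile cut points may differ in detail (for example, in how ties are handled). Treat this as an illustration of the idea rather than as a reproduction of levels(15, quantile).

```stata
frame change States
* Inner cut points that split the 32 states into 5 groups of roughly equal size
_pctile diabetes, nquantiles(5)
return list    // r(r1) ... r(r4) hold the cut points
* Group membership per state and the resulting group sizes
xtile diab_q5 = diabetes, nquantiles(5)
tabulate diab_q5
drop diab_q5
```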

5. Now also generate plots for alcohol and tobacco consumption and use graph combine to combine the plots into a single graph.

. geoplot (area States diabetes, levels(15, quantile) color(hcl heat, reverse) ///
>     lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
>     subtitle("Diabetes prevalence") name(a, replace) nodraw

. geoplot (area States alcohol, levels(15, quantile) color(hcl heat, reverse) ///
>     lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
>     subtitle("Alcohol (daily or weekly)") name(b, replace) nodraw

. geoplot (area States tabaco_daily, levels(15, quantile) color(hcl heat, reverse) ///
>     lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
>     subtitle("Tobacco (daily)") name(c, replace) nodraw

. geoplot (area States tabaco_any, levels(15, quantile) color(hcl heat, reverse) ///
>     lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
>     subtitle("Tobacco (any)") name(d, replace) nodraw

. graph combine a b c d, colfirst imargin(zero) graphregion(margin(zero))
Challenge-2-solution_10.png
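The four geoplot calls above differ only in the variable, the subtitle, and the graph name. A sketch of the same step as a loop follows. The local macro names vars, titles, and names are arbitrary choices, and the code needs to run from a do-file because of the /// continuations.

```stata
* Variables, subtitles, and graph names, in matching order
local vars   diabetes alcohol tabaco_daily tabaco_any
local titles `""Diabetes prevalence" "Alcohol (daily or weekly)" "Tobacco (daily)" "Tobacco (any)""'
local names  a b c d
forvalues i = 1/4 {
    * Pick the i-th element of each list (quoted titles count as one word each)
    local v : word `i' of `vars'
    local t : word `i' of `titles'
    local n : word `i' of `names'
    geoplot (area States `v', levels(15, quantile) color(hcl heat, reverse) ///
        lcolor(gray)) (area Lakes) (line Rivers), clegend(height(30)) zlabel(#8) ///
        subtitle("`t'") name(`n', replace) nodraw
}
graph combine a b c d, colfirst imargin(zero) graphregion(margin(zero))
```

Keeping the settings in one place makes it easier to change, for example, the number of levels or the palette for all four maps at once.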