Stata绘图：绘制二维地图

我是张金康呀 2022-06-11 发布于江苏

展开全文

作者：陈卓然 (中山大学)
邮箱：chenzhr25@mail2.sysu.edu.cn

编者按：本文主要整理自下文，特此致谢！
Source：Asjad Naqvi, 2021, Stata graphs: Bi-variate maps. -Link1- -Link2- -Code-

温馨提示： 文中链接在微信中无法生效。请点击底部「阅读原文」。或直接长按/扫描如下二维码，直达原文：

1. 简介
2. 准备工作
3. 地图
4. 颜色
5. 图例
6. 中国地图
7. 相关推文

1. 简介

二维地图很复杂，但是并不在于他们有多难画，而在于我们往往难以预料到画完的结果。很多时候我们很难绘制想要的图，但是一个像下面这样非常漂亮的图，无疑可以为我们的论文增添不少光彩。下面就让我们看看如何绘制二维地图吧！

2. 准备工作

首先安装如下命令，并进行相关设置。

ssc install geo2xy,     replace  // 用来调整地图的映射
ssc install schemepack, replace  // Stata绘图模板仓库
set scheme white_tableau                  // 设置绘图模板
graph set window fontface 'Arial Narrow'  // 设置绘图字体

然后下载数据「CC-EST2020-ALLDATA6.csv」，并做一些数据清理的工作。

. import delim CC-EST2020-ALLDATA6.csv, clear
. tab year 
. keep if year==13   // 2020
. keep if agegrp==0  // TOTAL
. drop year agegrp
. keep sumlev state county stname ctyname tot_pop ba_male ba_female h_male h_female aa_male aa_female
. destring _all, replace
. ren state  STATEFP
. ren county COUNTYFP
. gen tot_afam  = ba_male  + ba_female  // african-american
. gen tot_hisp  = h_male   + h_female   // hispanic
. gen tot_asian = aa_male  + aa_female  // asian
. gen share_afam  = (tot_afam  / tot_pop) * 100
. gen share_hisp  = (tot_hisp  / tot_pop) * 100
. gen share_asian = (tot_asian / tot_pop) * 100
. compress
. save county_race.dta, replace

接着，下载州和县的 ShapeFiles，即「cb_2018_us_state_500k.zip」。现在，我们从边界开始画起。

. spshape2dta cb_2018_us_state_500k.shp, replace saving(usa_state)

以上命令语句是将 .shp 格式的文件转化为 Stata 认识的 .dta 格式的文件。

. use usa_state, clear
. spmap using usa_state_shp, id(_ID)

很不巧，阿拉斯加实在是太大的，为了后续作图的美观，我们将阿拉斯加和夏威夷群岛去掉，只留下美国本土。为了快速找出这几个不在美国本土的州，我们用下面这个巧妙的方法来得到这些岛屿的识别码。

. twoway (scatter _CY _CX, mlabel(STATEFP))

这样我们就找到了需要删掉掉的岛屿的识别码了，他们是 STATEFP = 02, 15, 60, 66, 69, 72, 78

. * clean up the boundaries 
. use usa_state_shp, clear
. merge m:1 _ID using usa_state  // to get the identifiers
. destring STATEFP, replace 
. drop if inlist(STATEFP,2,15,60,66,69,72,78)
. twoway (scatter _Y _X)

好像和我们平时看到的美国地图不太一样，我们可以采用 geo2xy 这个命令来将经纬度转换到笛卡尔坐标系当中。

. geo2xy _Y _X, proj(albers) replace
. twoway (scatter _Y _X)

看上去好多了！下面我们简单清理一下数据，然后保存起来。

. drop _CX- _merge 
. sort _ID shape_order
. save usa_state_shp_clean.dta, replace

让我们看下此时的地图长啥样。

. use usa_state, clear
. spmap using usa_state_shp_clean, id(_ID)

看上去很不错，下面我们再对不同的县做同样的处理，这里需要下载文件「cb_2018_us_county_500k.zip」。

. * get the counties in order 
. spshape2dta 'cb_2018_us_county_500k.shp', replace saving(usa_county)

. * clean up the boundaries 
. use usa_county_shp, clear
. merge m:1 _ID using usa_county
. destring STATEFP, replace 
. drop if inlist(STATEFP,2,15,60,66,69,72,78)
. drop _CX- _merge 
. geo2xy _Y _X, proj(albers) replace
. sort _ID shape_order
. save usa_county_shp_clean.dta, replace
 
. * 绘制地图
. use usa_county, clear
. destring _all, replace
. spmap using usa_county_shp_clean, id(_ID)

3. 地图

首先让我们把需要的数据全部合并到一份文件当中。

. use usa_county, clear
. destring _all, replace
. merge 1:1 STATEFP COUNTYFP using county_race
. keep if _m==3
. drop _m
. spmap share_afam using usa_county_shp_clean, id(_ID) clm(custom) clb(0(10)100) fcolor(Heat)  // 非裔美国人的比例

. spmap share_hisp using usa_county_shp_clean, id(_ID) clm(custom) clb(0(10)100)  fcolor(Heat)  // 拉美裔美国人比例

可以看到拉美裔和非裔美国人的分布存在显著差异。我们也可以画一个散点图来直观体现非裔美国人和拉美裔美国人之间的交叠部分。

. scatter share_afam share_hisp

可以发现二者几乎没有什么重叠区域。下面我们将这个散点图按种类分开。对于双变量的地图，我们需要分配 9 种或者 16 种颜色到不同的种类组合。但是再具体分配之前，我们需要做一件重要的事情就是弄清楚颜色分配的顺序。作者给出这样的颜色分配顺序：

接下来要做的是将数据按照顺序分为 9 组，分组的方式有很多，我们可以采用等分的方式或分位数分组的方式。首先让我们看一下如何采用等分的方式分组，作者给出了一种非常巧妙的方法。

. * 将数据进行切分
. cap drop cut*
. egen cut_afam = cut(share_afam), at(0,33,66,100) icodes
. egen cut_hisp = cut(share_hisp), at(0,33,66,100) icodes
. egen cut_hisp2 = cut(share_hisp), group(3) icodes // 这个就是分位数

. * 分组
. sort cut_afam cut_hisp
. egen grp_cut = group(cut_afam cut_hisp)

分位数分组数命令如下：

. * 分位数
. cap drop xtile*
. xtile xtile_afam = share_afam, n(3)
. xtile xtile_hisp = share_hisp, n(3)

. * 分组
. sort xtile_afam xtile_hisp
. egen grp_xtile = group(xtile_afam xtile_hisp)

下面我们来比较一下这两种分组的结果：

. twoway (scatter share_hisp share_afam, msize(small) mc(%10)), yline(0 33 66 100) ///
>         xline(0 33 66 100) xlabel(0(20)100) ylabel(0(20)100) aspect(1) 

. * get the cut-off points from tabstat
. tabstat share_hisp, by(xtile_hisp) c(stat) stat(max) 
. tabstat share_afam, by(xtile_afam) c(stat) stat(max)

. * based on percentiles
. twoway (scatter share_hisp share_afam, msize(small) mc(%10)), yline(0 3 7.4 100) ///
>         xline(0 1.2 6.5 100) xlabel(0(20)100) ylabel(0(20)100) aspect(1)

很明显，在等分法画出的图大约 80% 的点都落在左下角的那个方格当中，而在右上角的方格中却没有点，因此等分法的结果并不理想。相比之下以分位数法分组的结果就很理想，分位数的设置使得每一个方格中都有大约相等的点数。所以相比于等分法而言，分位数分组的结果可能会更加理想一些。下面我们检查一下两种分组的结果如何改变地图的格式。

首先是等分的结果：

. spmap grp_cut using usa_county_shp_clean, id(_ID) clm(unique) fcolor(Heat)

可以看出在等分法分组的情形下并没有什么特别之处，我们划分的 9 类中只有 6 类在图中体现出来了。而且在不同种类中划分地并不均匀。

其次是分位数的结果：

. spmap grp_xtile using usa_county_shp_clean, id(_ID) clm(unique) fcolor(Heat)

可以看到我们有 1-9 组，而且每一组都大致相等的点。然而有一个小缺陷就是颜色的深浅并不代表任何顺序。

4. 颜色

二维地图的颜色设计需要仔细考虑，它们需要很容易区分并且有着足够的对比度以呈现更为有意义的信息。首先我们挑选一个紫色的模板：

. colorpalette #e8e8e8 #dfb0d6 #be64ac #ace4e4 #a5add3 #8c62aa #5ac8c8 #5698b9 #3b4994

下面让我们把颜色放进地图：

. * 颜色放进地图
. colorpalette #e8e8e8 #dfb0d6 #be64ac #ace4e4 #a5add3 #8c62aa #5ac8c8 #5698b9 #3b4994, nograph 
. local colors `r(p)'
. spmap grp_xtile using usa_county_shp_clean, ///
>     id(_ID) clm(unique)  fcolor('`colors'') ///
>     ocolor(white ..) osize(0.02 ..)         /// 
>     ndfcolor(gs14) ndocolor(gs6 ..) ndsize(0.03 ..) ndlabel('No data') ///
>     polygon(data('usa_state_shp_clean') ocolor(white) osize(0.15) )    ///
>     legend(pos(5) size(2.5))  legstyle(2) ///
>     title('Comparison of African American versus Hispanic shares', size(5)) ///
>     note('Source: United States Census Bureau', size(vsmall))

下面让我们去掉一些额外的信息：

. colorpalette #e8e8e8 #dfb0d6 #be64ac #ace4e4 #a5add3 #8c62aa #5ac8c8 #5698b9 #3b4994, nograph 
. local colors `r(p)'
. spmap grp_xtile using usa_county_shp_clean, ///
>     id(_ID) clm(unique)  fcolor('`colors'') ///
>     ocolor(white ..) osize(none ..)         /// 
>     ndfcolor(gs14) ndocolor(gs6 ..) ndsize(0.03 ..) ndlabel('No data') ///
>     polygon(data('usa_state_shp_clean') ocolor(white) osize(0.12) )    ///
>     legend(off) name(bivar_map, replace)

5. 图例

首先我们先在一个干净的画布上生成九个格点：

. clear
. set obs 9
. egen y = seq(), b(3)  
. egen x = seq(), t(3)
. twoway (scatter y x)

现在将这些点转化为方格，然后增加坐标轴的大小：

. twoway (scatter y x, msymbol(square) msize(18)), xlabel(0 4) ylabel(0 4) aspect(1)

接下来把颜色放进去了，我们先看如何把对角的颜色填充进去：

. colorpalette #e8e8e8 #dfb0d6 #be64ac #ace4e4 #a5add3 #8c62aa #5ac8c8 #5698b9 #3b4994, nograph 
. return list
. local color11 `r(p1)'
. local color12 `r(p2)'
. local color13 `r(p3)'
. local color21 `r(p4)'
. local color22 `r(p5)'
. local color23 `r(p6)'
. local color31 `r(p7)'
. local color32 `r(p8)'
. local color33 `r(p9)'
. twoway (scatter y x if x==1 & y==1, msymbol(square) msize(18) mc('`color11'')) ///
>     (scatter y x if x==2 & y==2, msymbol(square) msize(18) mc('`color22''))    ///
>     (scatter y x if x==3 & y==3, msymbol(square) msize(18) mc('`color33'')),   ///
>     xlabel(0 4) ylabel(0 4) aspect(1)  legend(off)

然后我们用循环的方法来快速填充其余的方格：

. * 填充颜色
. levelsof x, local(xlvl) 
. levelsof y, local(ylvl)
. local boxes
. foreach x of local xlvl {
  2.     foreach y of local ylvl {
  3.         local boxes `boxes' (scatter y x if x==`x' & y==`y', msymbol(square) ///
>            msize(19) mc('`color`x'`y''')) 
  4.     }
  5. }
. cap drop spike*
. gen spike1_x1  = 0.2 in 1
. gen spike1_x2  = 3.6 in 1 
. gen spike1_y1  = 0.2 in 1 
. gen spike1_y2  = 0.2 in 1 
. gen spike1_m   = 'More African Americans'   
. gen spike2_y1  = 0.2 in 1
. gen spike2_y2  = 3.2 in 1 
. gen spike2_x1  = 0.2 in 1  
. gen spike2_x2  = 0.2 in 1  
. gen spike2_m   = 'More Hispanics'
. twoway `boxes' (pcarrow spike1_y1 spike1_x1 spike1_y2 spike1_x2, lcolor(gs12) lw(thin)   ///
>     mcolor(gs12) mlabel(spike1_m) mlabpos(7) headlabel) (pcarrow spike2_y1 spike2_x1     ///
>     spike2_y2 spike2_x2, lcolor(gs12) lw(thin) mcolor(gs12) mlabel(spike2_m) mlabpos(10) ///
>     headlabel mlabangle(90) mlabgap(1.8)), xlabel(0 4) ylabel(0 4) aspect(1) legend(off)

接下来将这张图的尺寸缩小以匹配我们之前绘制的地图，同时删掉原来的坐标轴和格点：

. colorpalette #e8e8e8 #dfb0d6 #be64ac #ace4e4 #a5add3 #8c62aa #5ac8c8 #5698b9 #3b4994, nograph 
. return list
. local color11 `r(p1)'
. local color12 `r(p2)'
. local color13 `r(p3)'
. local color21 `r(p4)'
. local color22 `r(p5)'
. local color23 `r(p6)'
. local color31 `r(p7)'
. local color32 `r(p8)'
. local color33 `r(p9)'
. levelsof x, local(xlvl) 
. levelsof y, local(ylvl)
. local boxes
. foreach x of local xlvl {
  2.     foreach y of local ylvl {
  3.         local boxes `boxes' (scatter y x if x==`x' & y==`y', msymbol(square) msize(5) ///
>             mc('`color`x'`y'''))
  4.     }
  5. }
. twoway `boxes' (pcarrow spike1_y1 spike1_x1 spike1_y2 spike1_x2, lw(thin) lcolor(gs12)     ///
>     mcolor(gs12) mlabel(spike1_m) mlabpos(7 ) msize(0.8) headlabel mlabsize(2))            ///
>     (pcarrow spike2_y1 spike2_x1 spike2_y2 spike2_x2, lw(thin) lcolor(gs12) mcolor(gs12)   ///
>     mlabel(spike2_m) mlabpos(10) msize(0.8) headlabel mlabangle(90) mlabgap(1.8)           ///
>     mlabsize(2)), xlabel(0 4, nogrid) ylabel(0 4, nogrid) aspectratio(1) xsize(1) ysize(1) ///
>     fxsize(20) fysize(100) legend(off) ytitle('')  xtitle('') xscale(off) yscale(off)      ///
>     name(bivar_legend, replace)

最后我们将地图和这张小图例合在一起：

. * 合并两张图
. graph combine bivar_map bivar_legend, imargin(zero) ///
>     title('{fontface Arial Bold:Share of African Americans and Hispanics in the USA}', size(4.5)) ///
>     note('Source of data, and county and state layers: United States Census Bureau.', size(2))

6. 中国地图

set scheme white_tableau                  // 设置绘图模板
cap spshape2dta 省级数据
/*
这条命令可以将 dbf 和 shp 格式的地图文件转为两个 dta 格式的文件，分别以省级数据.dta 
和省级数据_shp.dta命名，其中后者省级数据_shp.dta为坐标数据，前者省级数据.dta为标签数据，
在绘制地图时，我们主要是通过标签数据中的 ID 变量连接 (合并) 地图数据和作图数据
*/

* 绘制一维地图
import excel 北京大学数字普惠金融指数.xlsx,firstrow sheet(Provinces) clear

// 使用总的数字金融普惠指数
collapse (mean) index_aggregate, by(prov_name)
drop in 1
save 普惠金融指数.dta,replace
ren prov_name name
merge 1:1 name using '省级数据.dta'
// drop in 32 // 由于北大普惠指数没有考虑香港、澳门、台湾，因此这些地区的数据缺失
drop _merge
spmap index_aggregate using 省级数据_shp, id(_ID) fcolor(Blues)  mocolor(black) mosize(0.1)

* 二维地图
import excel 北京大学数字普惠金融指数.xlsx, firstrow sheet(Provinces) clear
collapse (mean) coverage_breadth usage_depth, by(prov_name)
drop in 1
save 数字金融广度和深度.dta,replace
ren prov_name name
merge 1:1 name using '省级数据.dta'

// drop in 32
drop _merge
xtile xtile_coverage = coverage_breadth, n(3) 
xtile xtile_usage = usage_depth, n(3)
sort xtile_coverage xtile_usage
cap drop xtile2
egen xtile2 = group(xtile_coverage xtile_usage)

// map with new color scheme
colorpalette #c8b35a #b0d5df #64acbe #e4acac #ad9ea5 #627f8c #c85a5a #985356 #574249, nograph 
local colors `r(p)'
spmap xtile2 using 省级数据_shp, id(_ID) clm(unique)  fcolor('`colors'') ///
    ocolor(black ..) osize(0.02 ..) /// 
    ndfcolor(gs14) ndocolor(gs6 ..) ndsize(0.03 ..) ndlabel('No data')   ///
    polygon(data('省级数据_shp') ocolor(black) osize(0.12) )             ///
    legend(off) name(bivar_map2, replace) mosize(vvvthick)

// legend with new color scheme
clear
set obs 9
egen y = seq(), b(3)  
egen x = seq(), t(3)
cap drop spike*
gen spike1_x1  = 0.2 in 1
gen spike1_x2  = 3.6 in 1 
gen spike1_y1  = 0.2 in 1 
gen spike1_y2  = 0.2 in 1 
gen spike1_m   = '数字普惠广度更强'   
gen spike2_y1  = 0.2 in 1
gen spike2_y2  = 3.2 in 1 
gen spike2_x1  = 0.2 in 1  
gen spike2_x2  = 0.2 in 1  
gen spike2_m   = '数字普惠深度更强'
colorpalette #c8b35a #b0d5df #64acbe #e4acac #ad9ea5 #627f8c #c85a5a #985356 #574249, nograph
 
local color11 `r(p1)'
local color12 `r(p2)'
local color13 `r(p3)'
local color21 `r(p4)'
local color22 `r(p5)'
local color23 `r(p6)'
local color31 `r(p7)'
local color32 `r(p8)'
local color33 `r(p9)' 
  
levelsof x, local(xlvl) 
levelsof y, local(ylvl)
local boxes
foreach x of local xlvl {
    foreach y of local ylvl {
        local boxes `boxes' (scatter y x if x==`x' & y==`y', msymbol(square)  ///
            msize(5) mc('`color`x'`y'''))
    }
}
 
twoway`boxes' (pcarrow spike1_y1 spike1_x1 spike1_y2 spike1_x2, lw(thin)        ///
    lcolor(gs12) mcolor(gs12) mlabel(spike1_m) mlabpos(7) msize(0.8)            ///
    headlabel mlabsize(2)) (pcarrow spike2_y1 spike2_x1 spike2_y2 spike2_x2,    ///
    lw(thin) lcolor(gs12) mcolor(gs12) mlabel(spike2_m) mlabpos(10) msize(0.8)  ///
    headlabel mlabangle(90) mlabgap(1.8) mlabsize(2)), xlabel(0 4, nogrid)      ///
    ylabel(0 4, nogrid) aspectratio(1) xsize(1) ysize(1) fxsize(20) fysize(100) ///
    legend(off) ytitle('') xtitle('') xscale(off) yscale(off)                   ///
    name(bivar_legend2, replace)
    
// combine the two graphs
graph combine bivar_map2 bivar_legend2, imargin(zero) ///
    title('{fontface Arial Bold:中国数字普惠金融的广度和深度}', size(4.5)) ///
    subtitle('Colors defined by tercile cut-offs of digital finance extent in pronvinces of China', size(2.8)) ///
    note('郭峰等. 测度中国数字普惠金融发展: 指数编制与空间特征[J]. 经济学 (季刊), 2020, 19(4): 1401-1418.',    ///
    size(2))