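Reposted from: 数据小魔方

I have recently been surveying the Python packages that can draw data maps, and tried geopandas, folium, and Basemap in turn. In that comparison, Basemap turned out to be the most mature option for static maps. It is an extension library in the mpl_toolkits package built specifically for visualizing geographic data.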
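Basemap handles reading and writing geographic data, coordinate mapping, and coordinate transformation and projection more maturely than geopandas. It can overlay plots on ordinary map source data (shp files) used as the base map, it makes appearance and precision easy to control, and its chart quality is comparable to R's ggplot2 package (geom_polygon). Its one drawback is that it is a low-level tool: every polygon mapping has to be built with a hand-written loop (I have not yet found a good extension built on top of basemap), so it cannot match ggplot2 in plotting efficiency and speed, because it lacks a complete high-level grammar.

Over the next 3 to 5 posts I will share application scenarios for the basemap package, including scatter plots (bubble charts), line charts (path maps and other line types), and the most commonly used filled heat maps.

This installment covers filled maps and scatter plots. The example data is my own WeChat friends' information, fetched through the itchat interface.

    import re  # needed by match_str below
    import itchat
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib
    from matplotlib.patches import Polygon
    from mpl_toolkits.basemap import Basemap
    from matplotlib.collections import PatchCollection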
1. Log in to WeChat Web:

    itchat.login()  # Scan the QR code that pops up with the WeChat mobile app to log in.
    # Getting uuid of QR code.
    # Downloading QR code.
    # Please scan the QR code to log in.
    # Please press confirm on your phone.
    # Loading the contact, this may take a little while.
    # Login successfully as 杜雨
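    # Extract the WeChat friends' information:
    friends = itchat.get_friends(update=True)
    df_friends = pd.DataFrame(friends)
    df_friends.to_csv('wechat_friends.csv', encoding='utf_8_sig')
    # Alternatively, reload a previously saved copy:
    # df_friends = pd.read_csv('D:/Python/File/wechat_friends.csv')
    # Select from the DataFrame (friends itself is a plain list and has no .loc).
    mydata = df_friends.loc[:, ['NickName', 'Province', 'Signature']]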
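2. Aggregate the friends by region:

    # Count friends per province; the count column is named '人数' (number of people).
    # size() replaces agg({'人数': np.size}), whose dict-renaming form raises an error in pandas 1.0 and later.
    aggResult = mydata.groupby('Province')['NickName'].size().reset_index(name='人数')
    aggResult.sort_values(by=['人数'], ascending=False, inplace=True)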
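To see what the aggregation step produces, here is a minimal sketch on a made-up table. The nicknames and provinces are invented for illustration; only the column names follow the itchat fields used above.

```python
import pandas as pd

# Toy stand-in for the friends table (all values are made up).
toy = pd.DataFrame({
    "NickName": ["a", "b", "c", "d", "e"],
    "Province": ["北京", "北京", "山东", "", "北京"],
})

# Count rows per province and name the count column '人数' (number of people).
counts = (
    toy.groupby("Province")["NickName"]
       .size()
       .reset_index(name="人数")
       .sort_values(by="人数", ascending=False)
       .reset_index(drop=True)
)
print(counts)
```

Friends who left the province blank end up in an empty-string group of their own, so it is worth checking for that group before treating every row as a real region.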
    # Split domestic and foreign regions:
    def match_str(item):
        # Keep the leading run of Chinese characters of each name; skip names that have none.
        result = []
        for i in item:
            m = re.search('^[\u4e00-\u9fa5]{1,}', str(i))
            if m:
                result.append(m.group())
        return result

    Domestic = match_str(aggResult['Province'].tolist())
    # .copy() avoids pandas' SettingWithCopyWarning when columns are added below.
    Domestic = aggResult.loc[aggResult.Province.isin(Domestic), :].copy()
    Foreign = aggResult.loc[~aggResult.Province.isin(Domestic.Province), :]
    # Min-max scale the friend counts to the range [0, 1].
    Domestic['scala'] = (Domestic.人数 - Domestic.人数.min()) / (Domestic.人数.max() - Domestic.人数.min())
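The split and the scaling can be tried without any WeChat data. The sketch below uses invented counts; `is_chinese_name` and `min_max` are hypothetical helper names, and the guard for the case where every count is equal is an addition that is not in the code above (the formula there would divide by zero in that case).

```python
import re

# Same character range as the pattern used in match_str.
CHINESE_PREFIX = re.compile(r"^[\u4e00-\u9fa5]+")

def is_chinese_name(name):
    """True if the name starts with at least one Chinese character."""
    return bool(CHINESE_PREFIX.search(name))

def min_max(values):
    """Scale numbers to [0, 1]; return zeros if all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Invented province counts for illustration.
counts = {"北京": 30, "山东": 10, "河南": 20, "California": 5, "": 8}

domestic = {k: v for k, v in counts.items() if is_chinese_name(k)}
foreign = {k: v for k, v in counts.items() if not is_chinese_name(k)}
scaled = dict(zip(domestic, min_max(list(domestic.values()))))
print(scaled)
```

Note that the province with the fewest friends is scaled to 0, so in the plots later in this post it gets a fully transparent fill and a zero-size bubble.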
Clean and correct the province (region) names:

    def correct(name_list):
        # Append the full administrative suffix to each short province name.
        name = []
        for i in name_list:
            if i in ['内蒙古', '西藏']:
                i += '自治区'
            elif i == '宁夏':
                i += '回族自治区'
            elif i == '新疆':
                i += '维吾尔自治区'
            elif i == '广西':
                i += '壮族自治区'
            elif i in ['香港', '澳门', '台湾']:
                i += '特别行政区'
            elif i in ['北京', '天津', '重庆', '上海']:
                i += '市'
            else:
                i += '省'
            name.append(i)
        return name

    Domestic['Province'] = correct(Domestic['Province'])
3. Merge the local longitude/latitude data:

    # Data source for the scatter plot (jd = longitude, wd = latitude):
    point_data = pd.read_csv('D:/R/rstudy/Province/chinaprovincecity.csv', encoding='gbk')
    Domestic = Domestic.merge(point_data.loc[:, ['province', 'jd', 'wd']], how='left', left_on='Province', right_on='province')
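A left merge silently leaves jd and wd empty for any province whose name does not match the lookup table. One way to check for this is the `indicator` argument of `merge`. The two small tables below are made up for illustration (including the deliberately fictional province), and the coordinates are approximate.

```python
import pandas as pd

# Made-up stand-ins for Domestic and the coordinate lookup table.
domestic = pd.DataFrame({"Province": ["北京市", "山东省", "火星省"], "人数": [3, 2, 1]})
points = pd.DataFrame({
    "province": ["北京市", "山东省"],
    "jd": [116.4, 117.0],   # longitude (approximate)
    "wd": [39.9, 36.7],     # latitude (approximate)
})

merged = domestic.merge(
    points, how="left", left_on="Province", right_on="province", indicator=True
)

# Rows marked 'left_only' found no coordinates in the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "Province"].tolist()
print(unmatched)
```

Any name that shows up in `unmatched` usually points at a suffix that the name-correction step did not produce in the form the lookup table expects.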
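Instantiate the map object and load the local shp map of China:

    # ax is the Axes created in step 4; create fig and ax before running this line.
    basemap = Basemap(llcrnrlon=75, llcrnrlat=10, urcrnrlon=150, urcrnrlat=55, projection='poly', lon_0=116.65, lat_0=40.02, ax=ax)
    basemap.readshapefile(shapefile='D:/R/rstudy/CHN_adm/bou2_4p', name='china')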
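Many of the administrative-division names in the imported shp map are garbled, so their encoding needs to be corrected:

    mapData = pd.DataFrame(basemap.china_info)
    # Decode the non-empty NAME values, which are assumed to be GBK-encoded bytes.
    mapData['NAME'] = mapData['NAME'].map(lambda x: x.decode('gbk') if len(x) != 0 else x)
    # mapData['NAME'] = [i.decode('gbk') if len(i) != 0 else i for i in mapData['NAME'].tolist()]
    mapData = mapData.merge(Domestic, how='left', left_on='NAME', right_on='Province')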
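The decoding step can be reproduced with the standard library alone. In the sketch below, `decode_name` is a hypothetical helper, and the check for values that are already strings is an assumption about what a given basemap/pyshp version may return, not something the code above does.

```python
def decode_name(value, encoding="gbk"):
    """Decode a shapefile attribute if it is non-empty bytes; otherwise return it unchanged."""
    if isinstance(value, bytes) and len(value) != 0:
        return value.decode(encoding)
    return value

raw = "北京市".encode("gbk")   # a GBK-encoded NAME value as raw bytes
print(decode_name(raw))        # decoded back to readable text
print(decode_name(b""))        # empty values pass through untouched
print(decode_name("山东省"))   # already-decoded strings pass through untouched
```

With a helper like this, the `map` call becomes `mapData['NAME'].map(decode_name)` and no longer fails on values that are already `str`.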
4. Data visualization:

    font = {'family': 'SimHei'}
    matplotlib.rc('font', **font)
    fig = plt.figure(figsize=(16, 12))
    ax = fig.add_subplot(111)
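    # Build the province fill function (shaded by each province's share of friends):
    def plotProvince(row):
        # RGBA color: fixed blue, with the alpha channel taken from the scaled friend count.
        mainColor = (42/256, 87/256, 141/256, row['scala'])
        patches = []
        # A province can consist of several polygons; collect every shape whose name matches.
        for info, shape in zip(mapData['NAME'].tolist(), basemap.china):
            if info == row['Province']:
                patches.append(Polygon(xy=np.array(shape), closed=True))
        ax.add_collection(PatchCollection(patches, facecolor=mainColor, edgecolor=mainColor, linewidths=1., zorder=2))

    Domestic.apply(lambda row: plotProvince(row), axis=1)

    # Build the scatter plot (based on the number of friends in each province)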
    def create_great_points(df):
        lon = np.array(df['jd'])
        lat = np.array(df['wd'])
        pop = np.array(df['scala'], dtype=float)
        x, y = basemap(lon, lat)  # project longitude/latitude to map coordinates
        # Distinct loop names avoid shadowing lon, lat, and pop above.
        for px, py, size in zip(x, y, pop*50):
            basemap.scatter(px, py, color='#c72e29', marker='o', s=size*25)

    create_great_points(Domestic)
    plt.axis('off')  # turn off the axes
    plt.savefig('D:/Python/Image/杜雨/itwechat.png')  # save the chart locally
    plt.show()  # display the chart
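The fill-and-bubble pattern above does not depend on Basemap itself. As a minimal sketch, the block below draws two invented square "provinces" with plain matplotlib, using the same RGBA fill (alpha taken from the scaled value) and the same bubble sizing. The shapes, values, and output file name are made up, and the Agg backend is chosen only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

# Invented regions: name -> (outline vertices, scaled value in [0, 1]).
regions = {
    "A": (np.array([[0, 0], [1, 0], [1, 1], [0, 1]]), 0.4),
    "B": (np.array([[1, 0], [2, 0], [2, 1], [1, 1]]), 1.0),
}

fig, ax = plt.subplots(figsize=(6, 3))
for name, (outline, value) in regions.items():
    color = (42 / 256, 87 / 256, 141 / 256, value)  # alpha follows the value
    patch = Polygon(xy=outline, closed=True)
    ax.add_collection(
        PatchCollection([patch], facecolor=color, edgecolor=color,
                        linewidths=1.0, zorder=2)
    )
    # Bubble at the mean of the outline vertices, sized as value * 50 * 25.
    cx, cy = outline.mean(axis=0)
    ax.scatter(cx, cy, color="#c72e29", marker="o", s=value * 50 * 25, zorder=3)

ax.autoscale_view()
ax.set_aspect("equal")
ax.axis("off")
fig.savefig("toy_fill_map.png")
```

With real data, the only Basemap-specific parts are the projection call that turns longitude/latitude into x/y and the polygon list that `readshapefile` provides.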
The bou2_4p.shp and chinaprovincecity.csv files used throughout this post are the same data sources used in the earlier R ggplot2 series.