10 行 Python 代码创建可视化地图-36大数据
作者:renwofei423
import vincent world_countries = r'world-countries.json' world = vincent.Map(width= 1200 , height= 1000 ) world.geo_data(projection= 'winkel3' , scale= 200 , world=world_countries) world.to_json(path)
当我开始建造Vincent时, 我的一个目的就是使得地图的建造尽可能合理化. 有一些很棒的python地图库-参见Basemap 和 Kartograph能让地图更有意思. 我强烈推荐这两个工具, 因为他们都很好用而且很强大. 我想有更简单一些的工具,能依靠Vega的力量并且允许简单的语法点到geoJSON文件,详细描述一个投影和大小/比列,最后输出地图.
例如, 将地图数据分层来建立更复杂的地图:
vis = vincent.Map(width= 1000 , height= 800 ) #Add the US county data and a new line color vis.geo_data(projection= 'albersUsa' , scale= 1000 , counties=county_geo) vis + ( '2B4ECF' , 'marks' , 0 , 'properties' , 'enter' , 'stroke' , 'value' ) #Add the state data, remove the fill, write Vega spec output to JSON vis.geo_data(states=state_geo) vis - ( 'fill' , 'marks' , 1 , 'properties' , 'enter' ) vis.to_json(path)
加之,等值线地图需绑定Pandas数据,需要数据列直接映射到地图要素.假设有一个从geoJSON到列数据的1:1映射,它的语法是非常简单的:
#'merged' is the Pandas DataFrame vis = vincent.Map(width= 1000 , height= 800 ) vis.tabular_data(merged, columns=[ 'FIPS_Code' , 'Unemployment_rate_2011' ]) vis.geo_data(projection= 'albersUsa' , scale= 1000 , bind_data= 'data.id' , counties=county_geo) vis + ([ "#f5f5f5" , "#000045" ], 'scales' , 0 , 'range' ) vis.to_json(path)
我们的数据并非没有争议无需改造——用户需要确保 geoJSON 键与熊猫数据框架之间具有1:1的映射。下面就是之前实例所需的简明的数据框架映射:我们的国家信息是一个列有FIPS 码、国家名称、以及经济信息(列名省略)的 CSV 文件:
00000 ,US,United States, 154505871 , 140674478 , 13831393 , 9 , 50502 , 100 01000 ,AL,Alabama, 2190519 , 1993977 , 196542 , 9 , 41427 , 100 01001 ,AL,Autauga County, 25930 , 23854 , 2076 , 8 , 48863 , 117.9 01003 ,AL,Baldwin County, 85407 , 78491 , 6916 , 8.1 , 50144 , 121 01005 ,AL,Barbour County, 9761 , 8651 , 1110 , 11.4 , 30117 , 72.7
在 geoJSON 中,我们的国家形状是以 FIPS 码为id 的(感谢 fork 自 Trifacta 的相关信息)。为了简便,实际形状已经做了简略,在示例数据可以找到完整的数据集:
{ "type" : "FeatureCollection" , "features" :[ { "type" : "Feature" , "id" : "1001" , "properties" :{ "name" : "Autauga" } { "type" : "Feature" , "id" : "1003" , "properties" :{ "name" : "Baldwin" } { "type" : "Feature" , "id" : "1005" , "properties" :{ "name" : "Barbour" } { "type" : "Feature" , "id" : "1007" , "properties" :{ "name" : "Bibb" } { "type" : "Feature" , "id" : "1009" , "properties" :{ "name" : "Blount" } { "type" : "Feature" , "id" : "1011" , "properties" :{ "name" : "Bullock" } { "type" : "Feature" , "id" : "1013" , "properties" :{ "name" : "Butler" } { "type" : "Feature" , "id" : "1015" , "properties" :{ "name" : "Calhoun" } { "type" : "Feature" , "id" : "1017" , "properties" :{ "name" : "Chambers" } { "type" : "Feature" , "id" : "1019" , "properties" :{ "name" : "Cherokee" }
我们需要匹配 FIPS 码,确保匹配正确,否则 Vega 无法正确的压缩数据:
import json import pandas as pd #Map the county codes we have in our geometry to those in the #county_data file, which contains additional rows we don't need with open(county_geo, 'r' ) as f: get_id = json.load(f) #Grab the FIPS codes and load them into a dataframe county_codes = [x[ 'id' ] for x in get_id[ 'features' ]] county_df = pd.DataFrame({ 'FIPS_Code' : county_codes}, dtype=str) #Read into Dataframe, cast to string for consistency df = pd.read_csv(county_data, na_values=[ ' ' ]) df[ 'FIPS_Code' ] = df[ 'FIPS_Code' ].astype(str) #Perform an inner join, pad NA's with data from nearest county merged = pd.merge(df, county_df, on= 'FIPS_Code' , how= 'inner' ) merged = merged.fillna(method= 'pad' ) >>>merged.head() FIPS_Code State Area_name Civilian_labor_force_2011 Employed_2011 \ 0 1001 AL Autauga County 25930 23854 1 1003 AL Baldwin County 85407 78491 2 1005 AL Barbour County 9761 8651 3 1007 AL Bibb County 9216 8303 4 1009 AL Blount County 26347 24156 Unemployed_2011 Unemployment_rate_2011 Median_Household_Income_2011 \ 0 2076 8.0 48863 1 6916 8.1 50144 2 1110 11.4 30117 3 913 9.9 37347 4 2191 8.3 41940 Med_HH_Income_Percent_of_StateTotal_2011 0 117.9 1 121.0 2 72.7 3 90.2 4 101.2
现在,我们可以快速生成不同的等值线:
vis.tabular_data(merged, columns=[ 'FIPS_Code' , 'Civilian_labor_force_2011' ]) vis.to_json(path)
这只能告诉我们 LA 和 King 面积非常大,人口非常稠密。让我们再看看中等家庭收入:
vis.tabular_data(merged, columns=[ 'FIPS_Code' , 'Median_Household_Income_2011' ]) vis.to_json(path)
明显很多高收入区域在东海岸或是其他高密度区域。我敢打赌,在城市层级这将更加有趣,但这需要等以后发布的版本。让我们快速重置地图,再看看国家失业率:
#Swap county data for state data, reset map state_data = pd.read_csv(state_unemployment) vis.tabular_data(state_data, columns=[ 'State' , 'Unemployment' ]) vis.geo_data(bind_data= 'data.id' , reset= True , states=state_geo) vis.update_map(scale= 1000 , projection= 'albersUsa' ) vis + ([ '#c9cedb' , '#0b0d11' ], 'scales' , 0 , 'range' ) vis.to_json(path)
地图即是我的激情所在——我希望 Vincent 能够更强,包含轻松的添加点、标记及其它的能力。如果各位读者对于映射方面有什么功能上的需求,可以在Github上给我发问题。
End.
转载请注明来自36大数据(36dsj.com): 36大数据 » 10 行 Python 代码创建可视化地图