Convert a pandas DataFrame to a GeoDataFrame

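This seems like a simple enough question, but I can't figure out how to convert a pandas DataFrame to a GeoDataFrame for a spatial join.

Here is a sample of the data, as shown by df.head():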

    Date/Time           Lat       Lon       ID
0   4/1/2014 0:11:00    40.7690   -73.9549  140
1   4/1/2014 0:17:00    40.7267   -74.0345  NaN

This DataFrame was actually created from a CSV, so if it is easier to read the CSV in directly as a GeoDataFrame, that works too.

Using GeoPandas

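Answer #1

First convert the DataFrame's coordinates (here the Lat and Lon columns) into appropriate Shapely geometries, then use them together with the original DataFrame to create a GeoDataFrame.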

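from geopandas import GeoDataFrame
from shapely.geometry import Point

# Build one Point per row; Shapely expects (x, y), i.e. (longitude, latitude)
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
# Drop the raw coordinate columns now that they are held in the geometry
df = df.drop(['Lon', 'Lat'], axis=1)
gdf = GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)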

Result:

    Date/Time           ID      geometry
0   4/1/2014 0:11:00    140     POINT (-73.95489999999999 40.769)
1   4/1/2014 0:17:00    NaN     POINT (-74.03449999999999 40.7267)
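Since the question is about spatial joins, here is a minimal sketch of what the next step might look like. The points GeoDataFrame is built the same way as above; the `zones` polygon layer, its `zone` column, the bounding box, and the second ID value are all made up for illustration and are not part of the original data.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, box

# Sample rows modelled on the question; the second ID (141) is hypothetical.
df = pd.DataFrame({
    "ID": [140, 141],
    "Lat": [40.7690, 40.7267],
    "Lon": [-73.9549, -74.0345],
})

# Same conversion as in the answer above.
points = gpd.GeoDataFrame(
    df.drop(["Lon", "Lat"], axis=1),
    crs="EPSG:4326",
    geometry=[Point(xy) for xy in zip(df.Lon, df.Lat)],
)

# Hypothetical polygon layer: a single rectangle that covers only the first point.
zones = gpd.GeoDataFrame(
    {"zone": ["east"]},
    geometry=[box(-74.0, 40.7, -73.9, 40.8)],
    crs="EPSG:4326",
)

# Left join keeps every point; points outside all polygons get NaN for "zone".
joined = gpd.sjoin(points, zones, how="left")
print(joined[["ID", "zone"]])
```

Both layers need to share the same CRS for the join to be meaningful, which is why both are created with EPSG:4326 here.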

Since geometries often come in WKT format, I thought I'd include an example for that case as well:

import geopandas as gpd
import shapely.wkt

geometry = df['wktcolumn'].map(shapely.wkt.loads)
df = df.drop('wktcolumn', axis=1)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
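For anyone who wants to try the WKT variant without a dataset at hand, here is a small self-contained sketch. The sample rows are invented for illustration; only the column name `wktcolumn` comes from the snippet above.

```python
import pandas as pd
import geopandas as gpd
import shapely.wkt

# Hypothetical sample data; "wktcolumn" holds geometries as WKT strings.
df = pd.DataFrame({
    "ID": [140, 141],
    "wktcolumn": ["POINT (-73.9549 40.769)", "POINT (-74.0345 40.7267)"],
})

# Parse each WKT string into a Shapely geometry.
geometry = df["wktcolumn"].map(shapely.wkt.loads)

# Drop the text column and attach the parsed geometries.
gdf = gpd.GeoDataFrame(
    df.drop(columns="wktcolumn"),
    crs="EPSG:4326",
    geometry=geometry,
)
print(gdf)
```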

Thanks again! This is much simpler and runs very fast, far better than iterating over every row of the df at n = 500,000 :)

– atkat12
Dec 16 '15 at 22:42

Gosh, thanks! I check this answer every 2 days :)

– Owen
Dec 21 '16 at 16:25

You would think this would be the first entry in the documentation!

– Dominic
May 14 '17 at 16:53

+1 for shapely.wkt. It took me a while to figure that out!

– StefanK
Dec 12 '17 at 15:14

To avoid dropping the lat/lon columns from the pandas df (in case you need them later), I suggest dropping them only when creating the gdf, e.g. gdf = GeoDataFrame(df.drop(['Lon', 'Lat'], axis=1), crs=crs, geometry=geometry)

– Gene Burinsky
May 27 at 19:43

Answer #2

Update 2019-12: The official documentation at https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html does this concisely using geopandas.points_from_xy, e.g.:

import geopandas

gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude)
)

You can also set a crs or a z (e.g. elevation) value if needed.
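As a rough sketch of how this might look for the CSV case from the question, the example below reads the sample rows from an in-memory string (standing in for the real file, whose path is not given) and sets the CRS when constructing the GeoDataFrame. The column names Lat and Lon follow the question rather than the Longitude/Latitude names in the documentation snippet.

```python
import io

import pandas as pd
import geopandas as gpd

# Stand-in for the real CSV file; rows are taken from the question's df.head().
csv_text = """Date/Time,Lat,Lon,ID
4/1/2014 0:11:00,40.7690,-73.9549,140
4/1/2014 0:17:00,40.7267,-74.0345,
"""

df = pd.read_csv(io.StringIO(csv_text))

# points_from_xy takes x (longitude) first, then y (latitude).
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(x=df["Lon"], y=df["Lat"]),
    crs="EPSG:4326",  # assumes the coordinates are WGS 84 longitude/latitude
)
print(gdf.head())
```

Here the Lat and Lon columns are kept alongside the geometry, which can be handy if they are needed later.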

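Old method: using shapely

One-liners! Plus some performance pointers for the big-data people.

Given a pandas.DataFrame with x (longitude) and y (latitude) columns, like so: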

df.head()
x   y
0   229.617902  -73.133816
1   229.611157  -73.141299
2   229.609825  -73.142795
3   229.607159  -73.145782
4   229.605825  -73.147274

Let's convert the pandas.DataFrame into a geopandas.GeoDataFrame as follows:

Library imports and shapely speedups:

import geopandas as gpd
import shapely
shapely.speedups.enable() # enabled by default from version 1.6.0

Code and benchmark times on a test dataset I had lying around:

# Martin's original version:
# %timeit 1.87 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",
                       geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)])



# Pandas apply method
# %timeit 8.59 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",
                       geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1))

Using pandas.apply is surprisingly slower, but it may be a better fit for some other workflows (e.g. on bigger datasets using the dask library).
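To reproduce this kind of comparison yourself, something along these lines could work. It is only a sketch: the data is random (10,000 rows, seeded), the helper function names are made up, and the timings will differ from the numbers quoted above depending on machine and library versions. It also includes geopandas.points_from_xy from the update at the top of this answer.

```python
import timeit

import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Synthetic test data; the size and value ranges are arbitrary choices.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.uniform(-180, 180, 10_000),
    "y": rng.uniform(-90, 90, 10_000),
})

def with_zip():
    # List comprehension over zipped columns.
    return [Point(xy) for xy in zip(df.x, df.y)]

def with_apply():
    # Row-wise pandas apply.
    return df.apply(lambda row: Point(row.x, row.y), axis=1)

def with_points_from_xy():
    # Helper from geopandas that builds all points in one call.
    return gpd.points_from_xy(df.x, df.y)

for fn in (with_zip, with_apply, with_points_from_xy):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per run")
```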

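Credits:

Making shapefile from Pandas dataframe? (for the pandas apply method)

Speed up row-wise point in polygon with Geopandas (for the speedup hint)

Some work-in-progress references (as of 2017) for handling large dask datasets: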

http://matthewrocklin.com/blog/work/2017/09/21/accelerating-geopandas-1
https://github.com/geopandas/geopandas/issues/461
https://github.com/mrocklin/dask-geopandas

Thanks for the comparison, the zip version is indeed much faster.

– MCMZL
Mar 27 '19 at 10:58
