Slimming down the LION

New York City’s Department of City Planning provides a wonderful resource in the LION dataset, which contains lines for streets, paths, railroads, and administrative boundaries. There is one problem with the most recent version: it is only available in ESRI’s proprietary File Geodatabase (GDB) format. While the format has some advantages (more on these below), it is not easily accessible in many free/open source tools. I recently had occasion to convert LION to a more accessible format for display and light analysis, and this post outlines that methodology and provides the results.

My go-to tool for converting between geodata formats is ogr2ogr in the GDAL package. It supports a stunning number of formats for conversion. One of these is GDB via the OpenFileGDB driver, available out of the box in newer releases of GDAL.

Using the default driver with the LION GDB, however, will not work (more on this in a bit). The solution that I found was to ignore the built-in GDB driver and, in true Linux fashion, re-compile GDAL with the ESRI FileGDB library. Follow the directions.
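Once GDAL is rebuilt, it is worth confirming which GDB drivers the build actually contains. The snippet below is a minimal sketch of that check; the function name list_gdb_drivers is my own, and it assumes ogrinfo is on your PATH.

```shell
#!/bin/sh
# Print the GDB-related drivers compiled into the local GDAL build.
# "OpenFileGDB" is the built-in reader; "FileGDB" should only appear when
# GDAL was built against ESRI's FileGDB library.
# (list_gdb_drivers is a hypothetical helper name, not part of GDAL.)
list_gdb_drivers() {
    if command -v ogrinfo >/dev/null 2>&1; then
        ogrinfo --formats | grep -i "gdb" || echo "no GDB drivers in this build"
    else
        echo "ogrinfo not found: install or build GDAL first"
    fi
}

list_gdb_drivers
```

If only OpenFileGDB shows up in the list, the re-compile did not pick up the ESRI library.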

The actual conversion is pretty simple, but it took several tries to get the output down to a manageable size. I’ve included links to either the SHP or the PGDump of the results, built from LION version 14BAV (as of July 2014). The metadata does not appear to include distribution restrictions, so feel free to mirror.

Try 1 (all data into a shapefile):
ogr2ogr lion.shp lion.gdb/
Result: a 1.5 GB+ shapefile/DBF. This is the result of losing GDB’s #1 advantage: SDC, or Smart Data Compression. In QGIS on a machine with 8 GB of RAM, it is pretty much unmanageable. The PGDump copy came in at a more manageable 150 MB, so I’m including it here.
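For reference, a PGDump conversion can be sketched roughly as follows. This is not necessarily the exact command behind the file linked above; the helper name convert_to_pgdump and the output name lion.sql are assumptions for illustration, while -f PGDump is ogr2ogr's standard way to select that output driver.

```shell
#!/bin/sh
# Convert a geodatabase to a PostgreSQL dump file instead of a shapefile.
# convert_to_pgdump is a hypothetical helper; $1 is the .gdb directory and
# $2 is the .sql file to write.
convert_to_pgdump() {
    src="$1"
    out="$2"
    if ! command -v ogr2ogr >/dev/null 2>&1; then
        echo "skipping: ogr2ogr not found"
        return 1
    fi
    if [ ! -d "$src" ]; then
        echo "skipping: $src is not a directory"
        return 1
    fi
    ogr2ogr -f PGDump "$out" "$src"
}

# Assumed paths, matching the Try 1 command above.
convert_to_pgdump "lion.gdb" "lion.sql" || true
```

The resulting file is plain SQL, so it can be loaded into a PostGIS-enabled database with psql.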

Try 2 made use of ogr2ogr’s SQL dialect to select specific columns, which greatly cut down on the size of the attribute table:
ogr2ogr lion.shp lion/lion.gdb/ -sql "select street, featuretyp, segmenttyp, rw_type, nonped, bikelane from lion"
Result: a 120 MB SHP. This is manageable in QGIS. After looking at it, however, there were still a lot of non-road lines included.
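To see which feature types are present before filtering, one option is to ask ogrinfo for the distinct values of the column. This is a sketch under the assumption that the layer is named lion and the column featuretyp, as in the commands above; list_feature_types is a made-up helper name.

```shell
#!/bin/sh
# List the distinct featuretyp values in the lion layer of a geodatabase.
# list_feature_types is a hypothetical helper; $1 is the .gdb directory.
list_feature_types() {
    gdb="$1"
    if ! command -v ogrinfo >/dev/null 2>&1; then
        echo "skipping: ogrinfo not found"
        return 1
    fi
    if [ ! -d "$gdb" ]; then
        echo "skipping: $gdb is not a directory"
        return 1
    fi
    # -q suppresses the driver and layer banner; -sql runs the query
    ogrinfo -q "$gdb" -sql "select distinct featuretyp from lion"
}

list_feature_types "lion/lion.gdb/" || true
```

The codes that come back can then be looked up in the LION metadata to decide which ones to keep.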

Try 3 took the SQL approach further, restricting to only “Street other than vehicle only streets,” whatever that means:
ogr2ogr lion_smaller.shp lion/lion.gdb/ -sql "select street, featuretyp, segmenttyp, rw_type, nonped, bikelane from lion where featuretyp='0'"
Result: 100 MB. It’s available here.
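A quick way to compare the tries is to print a layer summary and the on-disk size of each output. The sketch below assumes the shapefile sits in the current directory; summarize_shp is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize a shapefile: field list and feature count, then size on disk.
# summarize_shp is a hypothetical helper; $1 is the path without extension.
summarize_shp() {
    base="$1"
    if [ ! -f "$base.shp" ]; then
        echo "skipping: $base.shp not found"
        return 1
    fi
    if command -v ogrinfo >/dev/null 2>&1; then
        # -so prints the summary only; the layer name is the file's basename
        ogrinfo -so "$base.shp" "$(basename "$base")"
    fi
    # Total size of the main shapefile components
    du -ch "$base.shp" "$base.dbf" "$base.shx"
}

summarize_shp "lion_smaller" || true
```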