About compression: is it normal for it to be so low? #143

Open
aborruso opened this issue Feb 7, 2024 · 1 comment

aborruso commented Feb 7, 2024

Hi,
I'm testing gpq on the official administrative boundaries of Italy. The source file is this zip file:
https://www.istat.it/storage/cartografia/confini_amministrativi/non_generalizzati/2023/Limiti01012023.zip

The archive contains folders of shapefiles. I am running the tests on the Limiti01012023/Com01012023/Com01012023_WGS84.shp file:

  • I convert it to GeoJSON using ogr2ogr;
  • from this GeoJSON I create a gzip-compressed GeoParquet file, which is 70 MB;
  • from the same GeoJSON I create an uncompressed GeoParquet file, which is 76 MB.

The two files are almost the same size. For comparison:

  • if I gzip the uncompressed Parquet file, I get a 57 MB file;
  • if I create a SOZip shapefile version of the source file, I get a 59 MB file.

I know these outputs are not directly comparable, but the compression in the gpq output seems very limited. Is this normal?
Am I doing something wrong?

Below are the commands I used for all the tests.

Thank you

wget -O file.zip "https://www.istat.it/storage/cartografia/confini_amministrativi/non_generalizzati/2023/Limiti01012023.zip"

unzip -o file.zip -d .

ogr2ogr -f GeoJSON -t_srs EPSG:4326 comuni.geojson Limiti01012023/Com01012023/Com01012023_WGS84.shp -lco "RFC7946=YES"

gpq convert --compression="gzip" --max 1000 --from="geojson" comuni.geojson comuni_compressed.parquet

gpq convert --compression="uncompressed" --max 1000 --from="geojson" comuni.geojson comuni_uncompressed.parquet

ogr2ogr -t_srs EPSG:4326 Com01012023_WGS84.shp.zip Limiti01012023/Com01012023/Com01012023_WGS84.shp

aborruso commented Feb 7, 2024

I also tested Parquet gzip compression using GDAL, and the output is 49 MB.
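
To put the reported sizes side by side, here is a rough sketch that computes the saving of each output relative to the 76 MB uncompressed GeoParquet file. The sizes are the rounded MB figures quoted in this issue, and the helper name saving_pct is made up for illustration; it is not part of gpq or GDAL.

```shell
#!/bin/sh
# saving_pct BASE SIZE: percentage saved by SIZE relative to BASE (both in MB).
# Hypothetical helper for illustration only; LC_ALL=C keeps "." as the decimal separator.
saving_pct() {
  LC_ALL=C awk -v base="$1" -v size="$2" \
    'BEGIN { printf "%.1f", (base - size) / base * 100 }'
}

# Sizes below are the rounded values reported in this issue.
echo "gpq, gzip compression:       $(saving_pct 76 70)%"
echo "gzip of the whole file:      $(saving_pct 76 57)%"
echo "GDAL Parquet, gzip:          $(saving_pct 76 49)%"
```

With these rounded figures, the gpq gzip output saves under a tenth of the uncompressed size, while the GDAL output saves roughly a third, which is the gap this issue is asking about.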
