Commit
Fix issue with no decompressed data in ORC reader (#13609)
Currently, the ORC reader assumes that if data in the stripes of the current level are not empty, the decompressed data will also not be empty. However, there is a corner case where the stripe is empty, but data blocks are still compressed so they contain the compression header. In this case, decompressed data is empty even with non-empty compressed blocks.

This PR removes the assertion in the reader to allow for this corner case. Also adds a short-circuit to `decompress_stripe_data` to return early if the decompressed data is empty.

Issue #13608

Authors:
- Vukasin Milovanovic (https://github.com/vuule)
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Bradley Dice (https://github.com/bdice)
- Yunsong Wang (https://github.com/PointKernel)

URL: #13609
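The corner case in the commit message can be illustrated with a small host-side sketch. This is not the cuDF implementation, which runs on the GPU in C++/CUDA. It only assumes the ORC chunk layout, where each compressed chunk starts with a 3-byte little-endian header holding `(length << 1) | is_original`, and it assumes ZLIB chunks are raw deflate streams. The names `decode_orc_stream` and `read_stripe_column` are hypothetical and exist only for this example.

```python
import zlib


def decode_orc_stream(buf: bytes) -> bytes:
    """Decode a sequence of ORC-style compression chunks (illustrative only).

    Assumption: each chunk is a 3-byte little-endian header followed by
    `length` payload bytes; the low header bit marks an uncompressed
    ("original") payload, otherwise the payload is raw deflate.
    """
    out = bytearray()
    pos = 0
    while pos < len(buf):
        if pos + 3 > len(buf):
            raise ValueError("truncated chunk header")
        header = int.from_bytes(buf[pos:pos + 3], "little")
        is_original = header & 1
        length = header >> 1
        pos += 3
        chunk = buf[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("truncated chunk payload")
        pos += length
        if is_original:
            out += chunk
        else:
            # wbits=-15 selects raw deflate (no zlib header or trailer)
            out += zlib.decompress(chunk, wbits=-15)
    return bytes(out)


def read_stripe_column(compressed: bytes) -> list:
    """Hypothetical reader step mirroring the early return described above."""
    decompressed = decode_orc_stream(compressed)
    if not decompressed:
        # Non-empty compressed input can still decode to nothing:
        # return early instead of asserting that data is present.
        return []
    return list(decompressed)


# A block that contains only a header for a zero-length original chunk.
header_only = ((0 << 1) | 1).to_bytes(3, "little")
empty_result = read_stripe_column(header_only)
```

Here `header_only` is three bytes long, so a check on the compressed size alone would treat the stripe as non-empty, while the decoded payload has length zero. That mismatch is what the removed assertion did not allow for.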
Showing 3 changed files with 17 additions and 5 deletions.
Binary file added (+373 Bytes, not shown): python/cudf/cudf/tests/data/orc/TestOrcFile.Spark.EmptyDecompData.orc