Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question and suggestion - behavior with small buffers and errors #4183

Open
Eugeny1 opened this issue Oct 31, 2024 · 3 comments
Open

Question and suggestion - behavior with small buffers and errors #4183

Eugeny1 opened this issue Oct 31, 2024 · 3 comments
Assignees
Labels

Comments

@Eugeny1
Copy link

Eugeny1 commented Oct 31, 2024

I am a starter with zstd (using simple ZSTD_compress and ZSTD_decompress) and have three things to say after some research and having specific task.

Most probably I can find the answers looking into the code, but I'd like to state the issues I have at least as a first step.

  1. I need to compress the data and see if compressed size is less than uncompressed (to decide what data to go with). I'd allocate new buffer of the same size of source data and try to compress there. I would expect if compressed data is bigger, I'd get an out of buffer error. However, the following things make me wondering:
  • I read that "bigger buffer makes compression faster". Because logically extrapolating this statement the smaller buffer makes compression slower, and what happens when buffer size is so small it is about to get out of memory (compression stalls)? While my question may sound funny and stupid, I believe it must be addressed.
  • I can't find clear definition of errors returned. Apart from all the errors, I'd like to check for exactly this out of memory error, but struggling to figure out how to do it.
  1. In general, looking at the documentation here herel I was unable to quickly find how to handle the errors except using

    unsigned ZSTD_isError(size_t code); /!< tells if a size_t function result is an error code /
    const char
    ZSTD_getErrorName(size_t code); /
    !< provides readable string from an error code */

with first function saying that there's an error, and second returning the string. But what is exact error? This lists errors, am I expected to compare against those identifiers using ZSTD_getErrorCode()? The documentation I linked to earlier does not mention this function! Also what is the error enum for out of destination buffer issues - ZSTD_error_noForwardProgress_destFull? What does this name mean?

  1. Small, but I found it kind of bothering. When looking at the error enum identifiers, I see some of them label destination as "dest" and some as "dst". That would be very convenient if uniform approach would be selected, so that searching for the keyword shows all the instances related to destination.

Thanks.

@Cyan4973 Cyan4973 self-assigned this Oct 31, 2024
@Cyan4973
Copy link
Contributor

Cyan4973 commented Oct 31, 2024

I'd allocate new buffer of the same size of source data and try to compress there. I would expect if compressed data is bigger, I'd get an out of buffer error.

Yes, this design works, and is in active usage.
A prime example would be compressed filesystems, like BtrFS or ZFS.

Because logically extrapolating this statement the smaller buffer makes compression slower,

Yes, but don't overthink it. The difference is small, and only perceptible at fastest compression levels.

what happens when buffer size is so small it is about to get out of memory (compression stalls)?

We probably assign different meaning to these words.
When destination buffer is too small, compression simply aborts, with an error code.

I can't find clear definition of errors returned.

They are all published in zstd_errors.h, which you seem to have found later in the issue.

I expected to compare against those identifiers using ZSTD_getErrorCode()?

Yes

The documentation I linked to earlier does not mention this function!

The API manual is automatically generated from zstd.h,
but this specific method is defined in zstd_errors.h.

Ideally, we would duplicate or expose its definition in zstd.h, so that the manual generator picks it up.
But this method requires the ZSTD_ErrorCode type, which is in zs2_errors.h.

Possibly including zs2_errors.h from zstd.h could solve that.

edit : oh, but this would require moving zs2_errors.h from "static linking only" mode to "stable" status.
Quite an upgrade.
But I think we are ready for that now.

what is the error enum for out of destination buffer issues

ZSTD_error_dstSize_tooSmall.
You could also find it by creating a local example and looking at the outcome.

I see some of them label destination as "dest" and some as "dst".

The naming convention has mostly converged overtime to use dst.
Though there might be some left over using dest here and there.

The only error code using dst is ZSTD_error_noForwardProgress_destFull.
All other ones use dst.

While we could consider updating this specific error name to use dst instead,
we have a responsibility to keep this interface working for existing applications,
meaning the current dest variant must remain available anyway.

@Eugeny1
Copy link
Author

Eugeny1 commented Oct 31, 2024

Thank you so much for your help and support, much appreciated!

May I have some more suggestions please

The API manual is automatically generated from zstd.h

At the very top of the manual add statement that it is auto-generated using sources. Then people (like me) will be discounting the quality as it was not written by the professional manual writers. To ensure reader's expectations from the very start.

ZSTD_error_dstSize_tooSmall

Add a section with error codes into the manual, with corresponding enum values (names) and descriptions what are they about. I hope ZSTD_getErrorName() are descriptive enough and you can just automatically create the table with corresponding names and description strings. That may help and be a quick reference for newbies.

@codesparrow2
Copy link

Behavior with Small Buffers and Errors**

  1. Buffer Overflow/Underflow: Small buffers can lead to overflows if the input exceeds the allocated size, causing undefined behavior, crashes, or security vulnerabilities.

  2. Error Handling: When a buffer is too small, robust error handling is critical to avoid issues. Common practices include:

    • Checking input size before writing.
    • Using safe buffer-handling functions (e.g., snprintf instead of sprintf in C).
    • Allocating dynamic buffers when needed.
  3. Performance Considerations: Small buffers might be faster but can require frequent resizing, impacting performance if poorly managed.

  4. Debugging: Errors related to small buffers often manifest as segmentation faults, corrupted data, or subtle logic bugs. Using tools like Valgrind or AddressSanitizer can help identify such issues.


Suggested Answer for #4183 (Generalized)

If issue #4183 refers to small buffer behavior and errors:

  • Clarify Requirements: Ensure you understand the expected buffer size for the data being processed. Adjust buffer allocation or dynamically resize as needed.
  • Implement Boundary Checks: Use boundary checks to ensure no out-of-bounds access.
  • Use Error Codes: Return meaningful error codes or messages when a buffer is insufficient.
  • Use Modern APIs: Opt for high-level APIs or languages that handle memory management for you (e.g., std::vector in C++, or Python lists).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants