M3DB Additional (optional) compression #15

Open
richardartoul opened this issue Jul 16, 2019 · 0 comments

Even though M3DB achieves great compression (roughly 10x compression of timestamp/float64 data in the cluster described below), some of our clusters that retain data for 40 days are still bottlenecked by disk more than anything else due to existing hardware choices.

I used some production data to run some basic analysis and benchmarking and found that by combining two additional forms of compression we could reduce the disk usage of M3DB by 20%.

Optional Compression #1 (compressing M3TSZ series data): 15% savings

| Compression Technique | Size (bytes) |
| --- | --- |
| Raw (8 bytes per timestamp and 8 bytes per float64) | 160,140,176 |
| M3TSZ (what M3DB currently stores on disk) | 16,578,169 (roughly 10x) |
| Snappy (compressing each series' M3TSZ stream individually) | 14,350,499 |
| GZIP (compressing each series' M3TSZ stream individually) | 13,577,783 |
| ZSTD (compressing each series' M3TSZ stream individually) | 13,663,157 |

I also benchmarked the compression level when batching multiple series together into one compressed block (simulating compression applied at the file/block level instead of per series) and found that it was only marginally better, so it is probably not worth the complexity.

This benchmark shows that we can easily achieve at least 15% additional compression of our data files for this specific workload, bringing the overall compression over raw data from 9.6x to 11.7x, which is quite substantial. While not massive, this would reduce the overall disk usage of this cluster (which is completely bottlenecked by disk space) by 10%; it is less than 15% because the data portion only represents 80% of the total disk usage of the host. That would still be a substantial hardware savings considering the size of the cluster. This is also appealing because an end-to-end implementation could be achieved in 3-4 days in a completely backwards compatible way (the feature would be optional and turned off by default). The performance impact would also likely be quite small since decompression only needs to occur once per time series lookup, and data would be stored uncompressed in M3DB's LRU cache, meaning frequently retrieved series would not need to be decompressed more than once.
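
To make the first technique concrete, here is a minimal sketch of per-series compression, assuming the pure Go github.com/klauspost/compress/zstd package mentioned further down; the compressSeries and decompressSeries helpers are illustrative names, not actual M3DB APIs:

```go
package main

import (
	"fmt"

	"github.com/klauspost/compress/zstd"
)

// compressSeries compresses a single series' M3TSZ byte stream before it is
// written to the data fileset (illustrative helper, not an actual M3DB API).
func compressSeries(enc *zstd.Encoder, m3tszStream []byte) []byte {
	return enc.EncodeAll(m3tszStream, nil)
}

// decompressSeries reverses compressSeries when a series is read from disk;
// the uncompressed stream would then sit in the LRU cache so frequently
// retrieved series are only decompressed once.
func decompressSeries(dec *zstd.Decoder, compressed []byte) ([]byte, error) {
	return dec.DecodeAll(compressed, nil)
}

func main() {
	enc, _ := zstd.NewWriter(nil)
	dec, _ := zstd.NewReader(nil)

	// Stand-in for a real M3TSZ-encoded stream of timestamps and float64s.
	m3tszStream := []byte("...m3tsz encoded series data...")

	compressed := compressSeries(enc, m3tszStream)
	restored, err := decompressSeries(dec, compressed)
	if err != nil {
		panic(err)
	}

	fmt.Printf("m3tsz: %d bytes, zstd(m3tsz): %d bytes, round trip ok: %v\n",
		len(m3tszStream), len(compressed), string(restored) == string(m3tszStream))
}
```

Snappy or gzip could be swapped in the same way, since each series' stream is compressed independently.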

Optional Compression #2

Custom-trained ZSTD dictionaries for compressing time series IDs in the index files. I benchmarked the performance of this on a production dataset using (basically) the following code:

```go
package main

import (
	"fmt"
	"github.com/valyala/gozstd"
)

func main() {
	// ids holds the raw time series IDs from a production fileset
	// (loading them is omitted here).
	var ids [][]byte

	// Train a 2KiB zstd dictionary on the IDs themselves.
	dict := gozstd.BuildDict(ids, 2*1024)
	cdict, err := gozstd.NewCDict(dict)
	if err != nil {
		panic(err)
	}

	var (
		uncompressed int
		compressed   int
	)
	for _, id := range ids {
		uncompressed += len(id)
		compressed += len(gozstd.CompressDict(nil, id, cdict))
	}

	fmt.Println("uncompressed: ", uncompressed)
	fmt.Println("compressed:   ", compressed)
}
```

The result of this was a massive 2.8x improvement in compression using only a 2KiB dictionary:

| Compression Technique | Size (bytes) |
| --- | --- |
| Uncompressed | 3,129,912 |
| ZSTD with custom 2KiB dictionary | 945,849 |

In addition, I was able to achieve a 4x+ compression ratio by simply increasing the dictionary size to 40KiB, although it's likely we would want to keep the dictionary small to avoid storing a large object in memory.
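
As a small sketch of that comparison, the target dictionary size is just the second argument to gozstd.BuildDict, so the 2KiB and 40KiB cases only differ by that parameter (measureDictSizes is an illustrative helper and ids is the same slice of series IDs used above):

```go
// measureDictSizes reruns the measurement above for several target dictionary
// sizes; larger dictionaries compress better but cost more memory to hold.
func measureDictSizes(ids [][]byte) {
	for _, dictLen := range []int{2 * 1024, 40 * 1024} {
		dict := gozstd.BuildDict(ids, dictLen)
		cdict, err := gozstd.NewCDict(dict)
		if err != nil {
			panic(err)
		}

		var compressed int
		for _, id := range ids {
			compressed += len(gozstd.CompressDict(nil, id, cdict))
		}
		cdict.Release()

		fmt.Printf("dict size %d: compressed IDs to %d bytes\n", dictLen, compressed)
	}
}
```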

For this particular workload this would reduce the overall disk usage of the nodes by 10%. It would also be easy to implement, because M3DB already holds all of the time series IDs for a given fileset in memory while flushing. That means we could train a custom zstd dictionary while performing a flush, store the dictionary in the fileset file, and then use it in the seekers to decompress the IDs while seeking through them.
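
A hedged sketch of what the read side could look like, using gozstd's DDict as the decompression counterpart of the CDict above; the function names are illustrative and not actual M3DB seeker APIs:

```go
// Uses github.com/valyala/gozstd, as in the benchmark above.

// newIDDecompressor loads the dictionary that was trained and stored during
// the flush so the seeker can reuse it for every ID it scans past.
func newIDDecompressor(dictFromFileset []byte) (*gozstd.DDict, error) {
	return gozstd.NewDDict(dictFromFileset)
}

// decompressID is what a seeker would call while linearly scanning a small
// portion of the index file looking for a particular series ID.
func decompressID(ddict *gozstd.DDict, compressedID []byte) ([]byte, error) {
	return gozstd.DecompressDict(nil, compressedID, ddict)
}
```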

This technique would likely have more of a performance impact than the previous one, primarily because M3DB does need to perform linear scans through small portions of the series index files, which means that retrieving a single time series may require decompressing more than one time series ID. However, the impact would likely be tolerable in many cases given that zstd is known for its very fast decompression speeds.

The only hang-up to adding support for this feature is that M3DB currently avoids any and all cgo dependencies, and we wouldn't want to add a cgo dependency for a small optional feature like this. However, there is now a [pure Go zstd implementation](https://github.com/klauspost/compress/tree/master/zstd), although it does not currently support custom dictionaries. Once support for that is added, we should be able to implement this feature quite quickly and keep M3DB completely written in pure Go.

Summary

By adding two additional optional compression features, M3DB could reduce overall disk usage by 20%. This could lead to substantial savings for workloads that are bottlenecked by disk space more than anything else, and it would also make it easier for users to retain data for longer periods of time.
