- Update Rust code style to conform to new `rustc` requirements (preventing builds on `rustc 1.79.0` and later) [@jaseemabid, #321].
- Pull out the `arm64` platform from the Docker image, since it does not build in acceptable time via GitHub Actions due to using QEMU emulation (will wait until GitHub Actions provides a native `arm64` runner) [@valeriansaliou].
- Fixed non-working `arm64` builds due to hardcoded `x86_64-unknown-linux-gnu` Rust target in the `Dockerfile` [@valeriansaliou].
- The Docker image is now also available for the `arm64` platform, in addition to `amd64` [@PovilasID, #310].
- Fixed an issue where system clock can move back to the past on a virtualized system, resulting in client threads entering a crash loop due to mutex poisoning [@valeriansaliou].
- Fixed `rocksdb` not building due to a `rust-bindgen` version which was not compatible with `clang` version 16 [@anthonyroussel, #316].
- Dependencies have been bumped to latest versions (namely: `rocksdb`, `toml`, `regex-syntax`, `hashbrown`, `lindera-core`, `lindera-dictionary`, `lindera-tokenizer`) [@valeriansaliou].
- Publish `.deb` packages for Debian 12 on `x86_64` architecture [@valeriansaliou].
- Produce `glibc` builds from GitHub Actions whenever a new Sonic version gets released [@valeriansaliou].
- Pull out `tokenizer-japanese` from the default features, as it increases the final binary size tenfold [@valeriansaliou].
- Added support for Japanese word segmentation in tokenizer (note that as this adds quite some size overhead to the final binary size, the feature `tokenizer-japanese` can be disabled when building Sonic) [@nmkj-io, #311].
- Fixed typo in README abstract [@remram44, #295].
- Fixed typos in code and documentation [@kianmeng, #294].
- Rolled back `rocksdb` version, as the latest version does not link properly in `--release` mode [@valeriansaliou].
- Dependencies have been bumped to latest versions (namely: `rocksdb`, `clap`, `regex`) [@valeriansaliou].
- Dependencies have been bumped to latest versions (namely: `hashbrown`, `whatlang`, `regex`) [@valeriansaliou].
- Moved the release pipeline to GitHub Actions [@valeriansaliou].
- The language detection system is now about 2x faster (due to the upgrade of `whatlang` past `v0.14.0`) [@valeriansaliou].
- Added Armenian stopwords [@valeriansaliou].
- Added Georgian stopwords [@valeriansaliou].
- Added Gujarati stopwords [@valeriansaliou].
- Added Tagalog stopwords [@valeriansaliou].
- Fixed Norwegian stopwords [@valeriansaliou, #239].
- Code has been formatted according to `clippy` recommendations. This does not change the way Sonic behaves [@pleshevskiy, #233].
- Added support for Chinese word segmentation in tokenizer (note that as this adds quite some size overhead to the final binary size, the feature `tokenizer-chinese` can be disabled when building Sonic) [@vincascm, #209].
- Apple Silicon is now supported [@valeriansaliou].
- Added Norwegian stopwords [@mikalv, #236].
- Added Catalan stopwords [@coopanio, #227].
- Dependencies have been bumped to latest versions (namely: `rocksdb`, `fst-levenshtein`, `fst-regex`, `hashbrown`, `whatlang`, `byteorder`, `rand`) [@valeriansaliou].
- A few rarely-used languages have been removed, following `whatlang` `v0.12.0` release, see the notes here [@valeriansaliou, 940d3c3].
- Added support for Slovak, which is now auto-detected from terms [@valeriansaliou, 19412ce].
- Added Slovak stopwords [@valeriansaliou, 19412ce].
- Dependencies have been bumped to latest versions (namely: `whatlang`) [@valeriansaliou, 19412ce].
- Fixed multiple deadlocks, which were not noticed in practice by running Sonic at scale, but that are still theoretically possible [@BurtonQin, #213, #211].
- Added support for Latin, which is now auto-detected from terms [@valeriansaliou, e6c5621].
- Added Latin stopwords [@valeriansaliou, e6c5621].
- Dependencies have been bumped to latest versions (namely: `rocksdb`, `radix`, `hashbrown`, `whatlang`) [@valeriansaliou].
- Added a release script, with cross-compilation capabilities (currently for the `x86_64` architecture, dynamically linked against GNU libraries) [@valeriansaliou, 961bab9].
- RocksDB compression algorithm has been changed from LZ4 to Zstandard, for a slightly better compression ratio, and much better read/write performance; this will be used for new SST files only [@valeriansaliou, cd4cdfb].
- Dependencies have been bumped to latest versions (namely: `rocksdb`) [@valeriansaliou, cd4cdfb].
- Fixed a regression on optional configuration values not working anymore, due to an issue in the environment variable reading system introduced in `v1.2.1` [@valeriansaliou, #155].
- Optimized some aspects of FST consolidation and pending operations management [@valeriansaliou, #156].
- FST graph consolidation is now able to ignore new words when the graph is over configured limits, which are set with the new `store.fst.graph.max_size` and `store.fst.graph.max_words` configuration variables [@valeriansaliou, 53db9c1].
- An integration testing infrastructure has been added to the Sonic automated test suite [@vilunov, #154].
- Configuration values can now be sourced from environment variables, using the `${env.VARIABLE}` syntax in `config.cfg` [@perzanko, #148].
- Dependencies have been bumped to latest versions (namely: `rand`, `radix` and `hashbrown`) [@valeriansaliou, c1b1f54].
- Fixed a rare deadlock occurring when 3 concurrent operations get executed on different threads for the same collection, in the following chronological order: `PUSH` then `FLUSHB` then `PUSH` [@valeriansaliou, d96546b].
- Reworked the KV store manager to perform periodic memory flushes to disk, thus reducing startup time [@valeriansaliou, 6713488].
- Stop accepting Sonic Channel commands when shutting down Sonic [@valeriansaliou, #131].
- Introduced a server statistics `INFO` command to Sonic Channel [@valeriansaliou, #70].
- Added the ability to disable the lexer for a command with the command modifier `LANG(none)` [@valeriansaliou, #108].
- Added a backup and restore system for both KV and FST stores, which can be triggered over Sonic Channel with `TRIGGER backup` and `TRIGGER restore` [@valeriansaliou, #5].
- Added the ability to disable KV store WAL (Write-Ahead Log) with the `write_ahead_log` option, which helps limit write wear on heavily loaded SSD-backed servers [@valeriansaliou, #130].
- RocksDB has been bumped to `v5.18.3`, which fixes a dead-lock occurring in RocksDB at scale when a compaction task is run under heavy disk writes (i.e. disk flushes). This dead-lock was causing Sonic to stop responding to any command issued for the frozen collection. This dead-lock was due to a bug in RocksDB internals (not originating from Sonic itself) [@baptistejamin, 19c4a10].
- Reworked the `FLUSHB` command internals, which now use the atomic `delete_range()` operation provided by RocksDB `v5.18` [@valeriansaliou, 660f8b7].
- Added the `LANG(<locale>)` command modifier for `QUERY` and `PUSH`, that lets a Sonic Channel client force a text locale (instead of letting the lexer system guess the text language) [@valeriansaliou, #75].
- The FST word lookup system, used by the `SUGGEST` command, now supports all scripts via a restricted Unicode range forward scan [@valeriansaliou, #64].
- A store acquire lock has been added to prevent 2 concurrent threads from opening the same collection at the same time [@valeriansaliou, 2628077].
- A superfluous mutex was removed from KV and FST store managers, in an attempt to solve a rare dead-lock occurring on high-traffic Sonic setups in the KV store [@valeriansaliou, 60566d2].
- Reverted changes made in `v1.1.5` regarding the open files `rlimit`, as this can be set from outside Sonic [@valeriansaliou, f6400c6].
- Added Chinese Traditional stopwords [@dsewnr, #87].
- Improved the way database locking is handled when calling a pool janitor; this prevents potential dead-locks under high load [@valeriansaliou, fa78372].
- Added the `server.limit_open_files` configuration variable to allow configuring `rlimit` [@valeriansaliou].
- Added Kannada stopwords [@dileepbapat].
- The Docker image is now much lighter [@codeflows].
- Automatically adjust `rlimit` for the process to the hard limit allowed by the system (allows opening more FSTs in parallel) [@valeriansaliou].
- Limit the size of words that can hit against the FST graph, as the FST gets slower for long words [@valeriansaliou, #81].
- Rework Sonic Channel buffer management using a VecDeque (Sonic should now work better in harsh network environments) [@valeriansaliou, 1c2b9c8].
- FST graph consolidation locking strategy has been improved even further, based on issues with the previous rework we have noticed at scale in production (now, consolidation locking is done at a lower-priority relative to actual queries and pushes to the index) [@valeriansaliou, #68].
- FST graph consolidation locking strategy has been reworked so as to allow queries to be executed lock-free when the FST consolidate task takes a lot of time (previously, queries were being deferred due to an ongoing FST consolidate task) [@valeriansaliou, #68].
- Removed special license clause introduced in `v1.0.2`, Sonic is full `MPL 2.0` now [@valeriansaliou].
- Change how buckets are stored in a KV-based collection (nest them in the same RocksDB database; this is much more efficient on setups with a large number of buckets - `v1.1.0` is incompatible with the `v1.0.0` KV database format) [@valeriansaliou].
- Bump `jemallocator` to version `0.3` [@valeriansaliou].
- Re-license from `MPL 2.0` to `SOSSL 1.0` (Sonic has a special license clause) [@valeriansaliou].
- Added automated benchmarks (can be run via `cargo bench --features benchmark`) [@valeriansaliou].
- Reduced the time to query the search index by 50% via optimizations (in multiple methods, e.g. the lexer) [@valeriansaliou].
- Initial Sonic release [@valeriansaliou].
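
As an illustration of the `${env.VARIABLE}` configuration syntax mentioned in the entries above, here is a minimal sketch of how such placeholders could be expanded in a configuration string. This is not Sonic's actual implementation: the function names `interpolate` and `interpolate_env`, and the choice to leave unknown or unterminated placeholders untouched, are assumptions made for this example.

```rust
use std::env;

/// Replaces every `${env.NAME}` placeholder in `input` with the value
/// returned by `lookup(NAME)`. Placeholders whose lookup fails, and
/// unterminated placeholders, are copied through unchanged.
/// (Hypothetical helper, not part of Sonic.)
fn interpolate<F>(input: &str, lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    const OPEN: &str = "${env.";

    let mut output = String::with_capacity(input.len());
    let mut rest = input;

    while let Some(start) = rest.find(OPEN) {
        // Copy everything before the placeholder verbatim.
        output.push_str(&rest[..start]);

        let after_open = &rest[start + OPEN.len()..];

        match after_open.find('}') {
            Some(end) => {
                let name = &after_open[..end];

                match lookup(name) {
                    Some(value) => output.push_str(&value),
                    // Unknown variable: keep the placeholder as written.
                    None => output.push_str(&rest[start..start + OPEN.len() + end + 1]),
                }

                rest = &after_open[end + 1..];
            }
            None => {
                // No closing brace: copy the remainder and stop scanning.
                output.push_str(&rest[start..]);
                rest = "";
            }
        }
    }

    output.push_str(rest);
    output
}

/// Convenience wrapper that resolves placeholders from the process environment.
fn interpolate_env(input: &str) -> String {
    interpolate(input, |name| env::var(name).ok())
}
```

Passing the lookup as a closure keeps the expansion logic independent of the process environment, which makes it easy to exercise with fixed values. For example, calling `interpolate` on `auth_password = "${env.SECRET}"` with a lookup that maps `SECRET` to `hunter2` yields `auth_password = "hunter2"` (the key and variable names here are illustrative only).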
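
The entry above about the system clock moving backwards and mutex poisoning describes a general Rust failure mode: `SystemTime::duration_since` returns an error when the earlier timestamp is in fact later, and unwrapping that error while holding a lock poisons the mutex for every other thread. The sketch below shows two defensive patterns from the standard library. It is not the fix that was applied in Sonic, and the helper names `elapsed_since` and `lock_ignoring_poison` are hypothetical.

```rust
use std::sync::{Mutex, MutexGuard};
use std::time::{Duration, SystemTime};

/// Elapsed time between `earlier` and `now`, clamped to zero when the wall
/// clock has moved backwards (`duration_since` returns `Err` in that case).
/// (Hypothetical helper, not part of Sonic.)
fn elapsed_since(earlier: SystemTime, now: SystemTime) -> Duration {
    now.duration_since(earlier).unwrap_or(Duration::ZERO)
}

/// Locks a mutex and recovers the guard even if a previous holder panicked,
/// instead of propagating the poison error as a new panic.
/// (Hypothetical helper, not part of Sonic.)
fn lock_ignoring_poison<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    mutex
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Recovering from poisoning is only appropriate when the protected data stays consistent after a panic. Where monotonic timing is sufficient, `std::time::Instant` avoids the backwards-clock case altogether.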