A Kotlin Multiplatform implementation of the SymSpell algorithm.

Wavesonics/SymSpellKt

SymSpell Spell Check Kotlin

Supported targets: JVM, Android, Wasm, Wasm (WASI), JS (IR), Linux, Windows, macOS (x86), macOS (ARM), iOS, iOS Simulator

This is a Kotlin Multiplatform implementation of the SymSpell fuzzy search algorithm. It was ported from a Java implementation of SymSpell.

Dependency

implementation("com.darkrockstudios:symspellkt:3.1.0")
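In a Kotlin Multiplatform project the dependency would typically go into the common source set so that every target picks it up. A minimal build.gradle.kts fragment, assuming Kotlin Gradle plugin 1.9.20 or newer and that your targets are declared elsewhere in the same kotlin block:

```kotlin
// build.gradle.kts (fragment; target declarations such as jvm() are assumed to exist)
kotlin {
    sourceSets {
        commonMain.dependencies {
            implementation("com.darkrockstudios:symspellkt:3.1.0")
        }
    }
}
```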

Sample

Try out the sample desktop application:

gradlew sampleCompose:run


SymSpell v6.6 (Bigrams)

  • An optional bigram dictionary can be supplied so that sentence-level context is used when selecting the best spelling correction.
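To illustrate what sentence-level context can buy, here is a small sketch that breaks ties between correction candidates using bigram counts. The function name pickWithContext and both count maps are hypothetical, and this is not how SymSpellKt weighs bigrams internally; it only shows the general idea.

```kotlin
// Hypothetical helper, not part of the SymSpellKt API.
// Prefers the candidate seen most often after `previousWord`;
// falls back to plain word frequency when no bigram is known.
fun pickWithContext(
    previousWord: String,
    candidates: List<String>,
    unigramCounts: Map<String, Long>,
    bigramCounts: Map<Pair<String, String>, Long>,
): String? = candidates.maxWithOrNull(
    compareBy<String> { bigramCounts[previousWord to it] ?: 0L }
        .thenBy { unigramCounts[it] ?: 0L }
)
```

With unigram counts alone the more frequent word always wins; the bigram count lets a less frequent word win when it fits the preceding word better.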

SymSpell

  • The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance.
  • It is six orders of magnitude faster than the standard approach (deletes + transposes + replaces + inserts) and is language independent.
  • Unlike other algorithms, only deletes are required; no transposes, replaces, or inserts. Transposes, replaces, and inserts of the input term are transformed into deletes of the dictionary term.
  • The speed comes from the inexpensive delete-only edit candidate generation and from pre-calculating the deletes of the dictionary terms.
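The delete-only idea described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the library's implementation: the function names are made up for the example, and the final edit-distance verification that a real SymSpell lookup performs is omitted, so the result is a candidate set that may contain false positives.

```kotlin
// Every string reachable from `word` by deleting up to `maxDistance` characters.
fun deletes(word: String, maxDistance: Int): Set<String> {
    val result = mutableSetOf(word)
    var frontier = setOf(word)
    repeat(maxDistance) {
        val next = mutableSetOf<String>()
        for (w in frontier) {
            for (i in w.indices) next.add(w.removeRange(i, i + 1))
        }
        result.addAll(next)
        frontier = next
    }
    return result
}

// Pre-calculation: map each delete variant back to the dictionary terms that produce it.
fun buildIndex(dictionary: Collection<String>, maxDistance: Int): Map<String, Set<String>> {
    val index = mutableMapOf<String, MutableSet<String>>()
    for (term in dictionary) {
        for (variant in deletes(term, maxDistance)) {
            index.getOrPut(variant) { mutableSetOf() }.add(term)
        }
    }
    return index
}

// Lookup: delete variants of the input are matched against the pre-calculated index.
fun candidates(input: String, index: Map<String, Set<String>>, maxDistance: Int): Set<String> =
    deletes(input, maxDistance).flatMap { index[it] ?: emptySet() }.toSet()
```

For example, with a dictionary containing "world", the transposed input "wrold" shares the delete variants "wold" and "wrld" with the dictionary term, so it is found without ever generating transpositions.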

Fdic: Binary Frequency Dictionary file format

To optimize for size on disk and for loading and parsing speed, I made a small file format that encodes the plain text frequency dictionaries commonly used with SymSpell-style spell checkers.

fdic is both smaller on disk and faster to load than either plain text or gzipped dictionaries. In some cases it is 70% faster to load and parse, and more than 40% smaller on disk.

There is a CLI program for producing .fdic files from a standard plain text frequency dictionary, and an add-on library provides extension functions for loading them into a SymSpellKt SpellChecker object.