I want to be able to add a user-defined dictionary #66

Open
lisongxi opened this issue Nov 14, 2023 · 1 comment
Comments

@lisongxi

It would be useful to be able to add a user-defined dictionary that indicates which words should be considered the same and which words should be considered very different.

@mjpieters
Contributor

mjpieters commented Jan 20, 2024

You can already do this, with either a custom scorer or a custom processor.

You could use a wrapping technique to apply your similar-words dictionary lookups with either a scorer or a processor.

For example, a wrapper function lets you reuse any existing scorer:

import typing as t

def similar_words_scorer(similar_words: t.Mapping[str, str], scorer: t.Callable[[str, str], float]) -> t.Callable[[str, str], float]:
    def wrapper(s1, s2, *args, **kwargs):
        s1 = similar_words.get(s1, s1)
        s2 = similar_words.get(s2, s2)
        return scorer(s1, s2, *args, **kwargs)
    return wrapper
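
To see what the wrapper does in isolation, here is a minimal sketch that runs without thefuzz installed. `stand_in_scorer` is a hypothetical helper built on the standard library's `difflib`, not a thefuzz function, and the `"automobile"` / `"car"` entry is made-up sample data.

```python
import typing as t
from difflib import SequenceMatcher

Scorer = t.Callable[..., float]


def similar_words_scorer(similar_words: t.Mapping[str, str], scorer: Scorer) -> Scorer:
    def wrapper(s1, s2, *args, **kwargs):
        # Replace each input with its canonical form, if the mapping has one.
        s1 = similar_words.get(s1, s1)
        s2 = similar_words.get(s2, s2)
        return scorer(s1, s2, *args, **kwargs)
    return wrapper


def stand_in_scorer(s1: str, s2: str) -> float:
    """Hypothetical stand-in (NOT part of thefuzz): 0-100 similarity via difflib."""
    return 100 * SequenceMatcher(None, s1, s2).ratio()


similar_words = {"automobile": "car"}  # made-up sample mapping
wrapped = similar_words_scorer(similar_words, stand_in_scorer)

plain_score = stand_in_scorer("automobile", "car")  # scored as written
mapped_score = wrapped("automobile", "car")         # scored as "car" vs "car"
```

Note that the lookup is on the whole string passed to the scorer, not on individual words inside it, so a multi-word query would need its own entry or a per-word variant of the wrapper.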

To use it with the default scorer:

from thefuzz import process

similar_words = {"foo": "fooz", ...}
result = process.extractOne(some_query, choices, scorer=similar_words_scorer(similar_words, process.default_scorer))
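
The original question also asks about words that should be considered very different. The same wrapping technique could cover that case; the sketch below is one possible approach, not something thefuzz provides. `dissimilar_words_scorer`, `stand_in_scorer`, and the sample pair are all hypothetical names and data introduced for illustration.

```python
import typing as t
from difflib import SequenceMatcher

Scorer = t.Callable[..., float]


def dissimilar_words_scorer(
    dissimilar_pairs: t.Collection[t.FrozenSet[str]], scorer: Scorer
) -> Scorer:
    """Hypothetical wrapper: force a score of 0 for pairs listed as very different."""
    def wrapper(s1, s2, *args, **kwargs):
        # frozenset makes the check order-independent: (a, b) matches (b, a).
        if frozenset((s1, s2)) in dissimilar_pairs:
            return 0
        return scorer(s1, s2, *args, **kwargs)
    return wrapper


def stand_in_scorer(s1: str, s2: str) -> float:
    """Hypothetical stand-in (NOT part of thefuzz): 0-100 similarity via difflib."""
    return 100 * SequenceMatcher(None, s1, s2).ratio()


dissimilar_pairs = {frozenset(("cat", "cart"))}  # made-up sample data
scorer = dissimilar_words_scorer(dissimilar_pairs, stand_in_scorer)

forced_zero = scorer("cart", "cat")  # listed pair, in either order
untouched = scorer("cat", "cats")    # unlisted pair, scored normally
```

The entries have to match whatever strings the scorer actually receives. If a processor runs first, that may be the processed form rather than the raw input.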

Or, you could use a processor to do the same; here is an example processor that uses the same wrapping technique to first process the input and then map the processed result through a similar word mapping:

import typing as t

def similar_word_processor(similar_words: t.Mapping[str, str], processor: t.Callable[[str], str]) -> t.Callable[[str], str]:
    def wrapper(value):
        processed = processor(value)
        return similar_words.get(processed, processed)
    return wrapper

and then use that with, say, the default processor:

from thefuzz import process

similar_words = {"foo": "fooz", ...}
result = process.extractOne(some_query, choices, processor=similar_word_processor(similar_words, process.default_processor))
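
As a quick check of the processor variant, here is a minimal sketch that runs without thefuzz. `stand_in_processor` is a hypothetical replacement for the real default processor (it only strips and lowercases), and the mapping is made-up sample data.

```python
import typing as t

Processor = t.Callable[[str], str]


def similar_word_processor(similar_words: t.Mapping[str, str], processor: Processor) -> Processor:
    def wrapper(value):
        # Process first, then look up the processed result in the mapping.
        processed = processor(value)
        return similar_words.get(processed, processed)
    return wrapper


def stand_in_processor(value: str) -> str:
    """Hypothetical stand-in (NOT thefuzz's default processor): strip and lowercase."""
    return value.strip().lower()


similar_words = {"automobile": "car"}  # keys are in processed (lowercase) form
process_word = similar_word_processor(similar_words, stand_in_processor)

normalized = [process_word(w) for w in ["  Automobile ", "CAR", "Truck"]]
```

Because the lookup happens after processing, the mapping's keys need to be written in the processed form; a key such as `"Automobile"` would never match here.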
