Support Stemming and Stopwords in Search #21

Open
3 tasks
iwillspeak opened this issue Aug 28, 2022 · 0 comments

Comments

iwillspeak (Owner) commented Aug 28, 2022

We should add support for stemming and stopword removal. We should also consider switching from raw term frequency to some variant of TF-IDF for search. This would normalise each term's frequency against the number of documents it appears in, which should help filter out common words. We could also consider some kind of cutoff to prevent words common to all texts from being included in the index.
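A minimal sketch of the stemming and stopword-removal step, written in Python because the issue does not show the project's own code. The stopword list, the suffix rules, and the `stem`/`analyse` names are all hypothetical. The stemmer is a deliberately crude suffix stripper standing in for an established algorithm such as Porter or Snowball. The same analysis would have to run over both the indexed text and incoming queries so that their terms line up.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real index would use a
# fuller list for the document language.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "the", "to", "was", "with",
})

# Hypothetical suffix rules for a toy stemmer (not Porter/Snowball).
# Checked in order; the first matching suffix is stripped.
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Naive suffix-stripping stemmer used only for illustration."""
    for suffix in SUFFIXES:
        # Only strip if a stem of at least three characters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyse(text: str) -> list[str]:
    """Lower-case, tokenise, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in STOPWORDS]
```

With this, `analyse("Searching the indexed pages")` and `analyse("search index page")` both produce `["search", "index", "page"]`, so the query matches the document even though the surface forms differ. The toy rules also show why a proper stemming algorithm is worth using: a word like "indexes" would be reduced to "indexe" rather than "index".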

Originally posted by @iwillspeak in #20 (comment)

  • Stemming and stopword removal
  • TF-IDF
  • Drop insignificant terms?
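A sketch of the TF-IDF and "drop insignificant terms" items, assuming each document has already been tokenised (and, ideally, stemmed with stopwords removed). The names `build_index`, `search`, and the `max_df` threshold are hypothetical. The weighting uses tf = count / document length and idf = log(N / df), which is only one of several common TF-IDF variants.

```python
import math
from collections import Counter


def build_index(docs: dict[str, list[str]], max_df: float = 0.9):
    """Return {term: {doc_id: tf-idf weight}}, skipping over-common terms.

    `docs` maps a document id to its already-analysed tokens. Terms that
    appear in more than `max_df` (a hypothetical cutoff) of all documents
    are dropped from the index entirely.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    index: dict[str, dict[str, float]] = {}
    for doc_id, tokens in docs.items():
        for term, count in Counter(tokens).items():
            if df[term] / n_docs > max_df:
                continue  # cutoff: term is too common to be useful
            tf = count / len(tokens)          # normalised term frequency
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            index.setdefault(term, {})[doc_id] = tf * idf
    return index


def search(index, query_terms):
    """Rank documents by summing the TF-IDF weights of matching query terms."""
    scores = Counter()
    for term in query_terms:
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return scores.most_common()
```

For example, with `docs = {"a": ["rust", "search", "index"], "b": ["rust", "stem"], "c": ["rust", "search"]}`, the term "rust" appears in every document, exceeds the cutoff, and is never indexed; "search" is kept, and document "c" ranks above "a" for it because "search" makes up a larger share of its tokens. Plain log(N / df) already gives a term found in every document zero weight, so the cutoff mainly helps by also pruning terms found in nearly every document and keeping the index smaller.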
iwillspeak changed the title from "We should add in support for stemming and stopword removal here." to "Support Stemming and Stopwords in Search" on Aug 28, 2022
Labels: None yet
Projects: None yet
Development: No branches or pull requests
1 participant