A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection
Updated Nov 16, 2024 - HTML
PySpark phonetic and string matching algorithms
Implements Rocchio query expansion, similar to the "related searches" feature of popular search engines but based on relevant documents selected by the end user
Natural language processing on SMS data to predict whether a message is spam or ham, comparing the accuracy of ML algorithms such as multinomial naive Bayes, logistic regression, SVM, and decision trees, and using data cleaning and processing techniques such as PorterStemmer, CountVectorizer, TF-IDF Vectorizer, and WordNetLemmatizer. It is implemented usi…
Implementation of a Vector Space Retrieval Model using TF-IDF and cosine similarity on the Cranfield document corpus
Implementation of BM25+, an extension of BM25, compared against the baseline BM25 model
Performs tokenization, stemming, lemmatization, index creation, index compression and ranked retrieval of Cranfield documents
Snowball version of the Porter stemmer for the Lithuanian language.
The MOOC Recommender System utilizes NLP techniques for course recommendations in Massive Open Online Courses (MOOCs). It processes raw data, leveraging Tokenization, Porter Stemming, Cosine Similarity, etc., to extract tags from course descriptions, summaries, syllabuses, instructors, and subjects.
Hate speech detection model built with Count Vectorizer and an XGBoost classifier, with an accuracy of up to 0.9471, which can be used to predict whether tweets are hate or non-hate.
An efficient implementation of the German porter-stemming algorithm in Golang.
Small code snippets written in Python covering fundamental concepts in NLP used in all major NLP projects.
A search engine based on TF-IDF that compares documents in a vector space using cosine similarity
Web search engine that retrieves the most relevant web pages for a user's search query from pages crawled on the UIC domain
🔝 HW1 of Intelligent Information Retrieval MSc Course ECE@UT
Collection of stemming algorithms in Rust
MacOS desktop application for processing Google Takeout export files
Python implementation of the famous Porter stemmer algorithm, used in morphological analysis of English text corpora.