The Stroke Risk Predictor is a machine learning-based web application designed to assess an individual's risk of stroke based on various health and lifestyle factors. This tool aims to aid healthcare professionals and individuals in identifying potential stroke risks early, enabling timely intervention and preventive measures.
Key features:
- Comprehensive Analysis: In-depth analysis of a stroke prediction dataset.
- Advanced Feature Engineering: Derivation of new predictors from the raw health and lifestyle factors.
- Model Evaluation: Comparison of three machine learning models (Logistic Regression, XGBoost, and CatBoost).
- Optimization: Tuning for high recall to minimize missed stroke cases (false negatives).
- User-Friendly Interface: Web interface for easy risk assessment.
After testing and optimization, CatBoost was selected as the final model because it offered the best balance of recall and precision on this imbalanced dataset.
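To make the recall and precision trade-off concrete, here is a minimal sketch in plain Python. The labels and probabilities are made-up toy values, not results from this project, and `precision_recall` is a hypothetical helper rather than code from the repository. It illustrates how lowering the decision threshold on an imbalanced dataset can raise recall at the cost of precision.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall) when probabilities >= threshold are flagged as stroke."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and label == 1:
            tp += 1  # stroke case correctly flagged
        elif predicted and label == 0:
            fp += 1  # healthy person flagged by mistake
        elif not predicted and label == 1:
            fn += 1  # missed stroke case
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up toy data: 2 stroke cases among 10 people (imbalanced).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.10, 0.20, 0.20, 0.30, 0.35, 0.60, 0.40, 0.80]

for threshold in (0.5, 0.3):
    p, r = precision_recall(y_true, y_prob, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.50, recall=0.50
# threshold=0.3: precision=0.40, recall=1.00
```

In this toy example the lower threshold catches both stroke cases but flags more healthy people, which is the kind of trade-off a recall-focused model has to balance.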
The project includes a Flask-based web application that provides an intuitive interface for users to input their health information and receive a stroke risk prediction.
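As a rough illustration of how such an interface might be wired up, the sketch below validates submitted form values and exposes a prediction route. It is not the project's actual `src/app.py`: the field names, the `/predict` route, `parse_inputs`, `create_app`, and the `predict_risk` callable are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, Mapping

# Hypothetical field names; the project's actual form fields may differ.
REQUIRED_FIELDS = ("age", "avg_glucose_level", "bmi")


def parse_inputs(form: Mapping[str, str]) -> Dict[str, float]:
    """Convert raw form strings to floats, rejecting missing, non-numeric, or negative values."""
    values = {}
    for name in REQUIRED_FIELDS:
        raw = form.get(name, "").strip()
        try:
            value = float(raw)
        except ValueError:
            raise ValueError(f"'{name}' must be a number") from None
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"'{name}' must be a finite, non-negative number")
        values[name] = value
    return values


def create_app(predict_risk: Callable[[Dict[str, float]], float]):
    """Build a Flask app; predict_risk maps the parsed inputs to a probability."""
    # Imported here so parse_inputs can be used and tested without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        try:
            features = parse_inputs(request.form)
        except ValueError as err:
            return jsonify(error=str(err)), 400
        return jsonify(stroke_risk=predict_risk(features))

    return app
```

Keeping validation in a plain function separate from the route makes it easy to test without starting a server. In a real application `predict_risk` would wrap the trained model.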
The Stroke Risk Predictor is deployed on Google Cloud Platform App Engine for high availability and scalability, and the live application is publicly accessible there.
To set up this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/vytautas-bunevicius/stroke-risk-predictor.git
  cd stroke-risk-predictor
- Create a virtual environment:
  python -m venv venv
- Activate the virtual environment:
  - On Windows:
    venv\Scripts\activate
  - On macOS and Linux:
    source venv/bin/activate
- Install the required dependencies:
  pip install -r requirements.txt
- Run the Flask application:
  python src/app.py
- Open a web browser and navigate to http://localhost:5000 to view the application.
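Once the steps above are done, a quick way to confirm that the local server is responding is a small check like the one below. This is a minimal sketch using only the Python standard library. `app_is_up` is a hypothetical helper, not part of the repository, and it assumes the application answers plain GET requests on its root URL.

```python
import urllib.error
import urllib.request


def app_is_up(url: str = "http://localhost:5000", timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False
```

Calling `app_is_up()` from a Python shell in a second terminal should return `True` while the Flask application is running, and `False` once it has been stopped.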
To run the test suite:
python -m pytest tests/
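For orientation, a test file that pytest would pick up from `tests/` could look roughly like the sketch below. The file name and the `risk_label` helper are hypothetical and defined inline so the example runs on its own. The project's real tests and functions may look quite different.

```python
# tests/test_risk_label.py (hypothetical file name)


def risk_label(probability: float, threshold: float = 0.5) -> str:
    """Hypothetical helper, defined inline so this file runs on its own."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return "high" if probability >= threshold else "low"


def test_probability_at_threshold_is_high():
    assert risk_label(0.5) == "high"


def test_probability_below_threshold_is_low():
    assert risk_label(0.2, threshold=0.3) == "low"


def test_out_of_range_probability_is_rejected():
    try:
        risk_label(1.5)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

pytest collects functions whose names start with `test_` from files named `test_*.py`, so no extra registration is needed.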
To deploy the application to Google Cloud Platform App Engine:
- Ensure you have the Google Cloud SDK installed and configured.
- Run the following command in the project root directory:
gcloud app deploy
Planned future improvements:
- Continuous model updates with new data.
- Expansion of the feature set to include more relevant health indicators.