This project aims to predict passenger transportation outcomes on the Spaceship Titanic using machine learning techniques. It includes exploratory data analysis (EDA), statistical inference, machine learning model development, and deployment of a prediction service.
- Setup
- Project Structure
- Exploratory Data Analysis
- Statistical Inference
- Machine Learning Models
- Model Deployment
- UI Guide
- UI Screenshot
- Improvements and Future Work
- Contributors
- License
- Python 3.12+
- Docker
- Clone the repository:

  ```bash
  git clone https://github.com/vytautas-bunevicius/kaggle-spaceship-titanic.git
  cd kaggle-spaceship-titanic
  ```
- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  pip install -r requirements.txt
  ```
- Build the Docker image:

  ```bash
  docker build -t spaceship-titanic-predictor .
  ```
- Run the Docker container:

  ```bash
  docker run -p 8080:8080 spaceship-titanic-predictor
  ```
- Access the application at `http://localhost:8080`.
- `notebooks/`: Jupyter notebook containing EDA, statistical analysis, and model development
- `src/`: Source code for the prediction service
- `data/`: Dataset files
- `models/`: Saved machine learning models
- `templates/`: HTML templates for the web interface
- `Dockerfile`: Instructions for building the Docker image
- `requirements.txt`: Python dependencies
Our EDA process included:
- Statistical summaries of passenger data
- Visualization of key features and their relationships
- Anomaly detection in numerical features
- Correlation analysis between variables
Key findings and visualizations can be found in the `spaceship_titanic_analysis.ipynb` notebook.
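A minimal sketch of these EDA steps, assuming the standard Kaggle column names (`Age`, `RoomService`, `FoodCourt`, `ShoppingMall`, `Spa`, `VRDeck`, `Transported`) and that the training data lives at `data/train.csv`:

```python
import pandas as pd

# Load the training data (path assumed; adjust to your local layout).
df = pd.read_csv("data/train.csv")

# Statistical summaries of the numerical passenger features.
print(df.describe())

# Simple IQR-based anomaly check on the spending features.
spend_cols = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]
q1, q3 = df[spend_cols].quantile(0.25), df[spend_cols].quantile(0.75)
iqr = q3 - q1
outliers = (df[spend_cols] < q1 - 1.5 * iqr) | (df[spend_cols] > q3 + 1.5 * iqr)
print(outliers.sum())

# Correlation between the numerical variables and the target.
numeric = df[["Age"] + spend_cols].assign(Transported=df["Transported"].astype(int))
print(numeric.corr()["Transported"].sort_values(ascending=False))
```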
We conducted statistical inference to understand the relationships between various features and the likelihood of transportation. This included:
- Defining the target population (all passengers on the Spaceship Titanic)
- Formulating hypotheses about factors influencing transportation
- Constructing confidence intervals
- Conducting t-tests and chi-square tests
Detailed analysis and results are available in the `spaceship_titanic_analysis.ipynb` notebook.
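As an illustration of the kinds of tests used (not the exact hypotheses from the notebook), a sketch with `scipy.stats`, again assuming the Kaggle column names and file layout:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("data/train.csv")  # path assumed

# Chi-square test of independence: CryoSleep status vs. transportation outcome.
contingency = pd.crosstab(df["CryoSleep"], df["Transported"])
chi2, p_chi, dof, _ = stats.chi2_contingency(contingency)
print(f"chi2={chi2:.2f}, p={p_chi:.4g}")

# Welch's t-test: does mean Age differ between transported and non-transported passengers?
age_t = df.loc[df["Transported"], "Age"].dropna()
age_nt = df.loc[~df["Transported"], "Age"].dropna()
t_stat, p_t = stats.ttest_ind(age_t, age_nt, equal_var=False)
print(f"t={t_stat:.2f}, p={p_t:.4g}")

# 95% confidence interval for the overall transportation rate (normal approximation).
rate = df["Transported"].mean()
se = (rate * (1 - rate) / len(df)) ** 0.5
print(f"95% CI: [{rate - 1.96 * se:.3f}, {rate + 1.96 * se:.3f}]")
```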
We experimented with several machine learning models, including:
- Logistic Regression
- Random Forest
- XGBoost
- Stacked Ensemble (using H2O AutoML)
Hyperparameter tuning was performed using Optuna, and model ensembling was done using H2O's AutoML capabilities. The final model achieved a Kaggle score above 0.79.
Model development and evaluation can be found in the `spaceship_titanic_analysis.ipynb` notebook.
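A condensed sketch of the modeling loop, using a deliberately simplified numeric feature set for illustration; the notebook's actual feature engineering and the H2O AutoML ensembling step are omitted:

```python
import optuna
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

df = pd.read_csv("data/train.csv")  # path assumed
spend_cols = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]
X = df[["Age"] + spend_cols].fillna(0)  # simplified features for illustration
y = df["Transported"].astype(int)

# Baseline models: logistic regression and random forest.
for name, model in [
    ("LogisticRegression", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("RandomForest", RandomForestClassifier(n_estimators=200, random_state=42)),
]:
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(name, round(score, 4))

# Optuna hyperparameter search for XGBoost.
def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "max_depth": trial.suggest_int("max_depth", 3, 8),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    }
    model = XGBClassifier(**params, eval_metric="logloss")
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```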
The best-performing model (Stacked Ensemble) was deployed as a Flask web application, containerized using Docker for easy deployment and scalability.
The web interface allows users to input passenger information and receive a prediction on whether the passenger will be transported.
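A minimal sketch of what such a Flask service can look like. The real app serves the H2O stacked ensemble behind an HTML form; here a pickled scikit-learn-style model and a JSON endpoint stand in, and the file name and route are assumptions:

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("models/model.pkl")  # hypothetical file name

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload keyed by the feature names used during training.
    features = request.get_json()
    frame = pd.DataFrame([features])
    proba = float(model.predict_proba(frame)[0, 1])
    return jsonify({"transported": proba >= 0.5, "probability": proba})

if __name__ == "__main__":
    # Same port the Docker container exposes.
    app.run(host="0.0.0.0", port=8080)
```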
- Fill in the passenger details in the form:
  - Home Planet
  - CryoSleep status
  - Destination
  - Age
  - VIP status
  - Expenses (Room Service, Food Court, Shopping Mall, Spa, VR Deck)
  - Cabin
- Click the "Predict" button.
- The prediction result will be displayed, including:
  - Transportation outcome (Transported or Not Transported)
  - Probability of transportation
  - Interpretation of the probability (e.g., "There is a high chance that you will be transported.")
- A visual probability bar indicates the likelihood of transportation.
- The interface also displays a feature importance chart to help understand which factors most influence the prediction.
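For scripted checks outside the browser, a prediction can presumably also be requested programmatically. The route, field names, and response shape below match the hypothetical sketch above, not a documented API:

```python
import requests

# Example payload using the Kaggle feature names; values are arbitrary.
payload = {
    "HomePlanet": "Europa",
    "CryoSleep": False,
    "Destination": "TRAPPIST-1e",
    "Age": 34,
    "VIP": False,
    "RoomService": 0,
    "FoodCourt": 120,
    "ShoppingMall": 0,
    "Spa": 5,
    "VRDeck": 0,
    "Cabin": "B/0/P",
}
response = requests.post("http://localhost:8080/predict", json=payload)
print(response.json())  # expected keys per the sketch: "transported", "probability"
```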
Caption: The Spaceship Titanic Predictor web interface, showing the input form and prediction results
- Collect more data to improve model accuracy
- Experiment with deep learning models
- Implement real-time model updating
- Optimize the model for faster prediction times
- Implement user feedback mechanism to continually improve the model
- Explore additional feature engineering techniques
- Conduct more in-depth analysis of feature interactions
- Implement A/B testing for different model versions