Credit Risk Prediction Model

Project Overview
Live Application
Dataset Description
Approach
- Initial Model Development
- Manual Fine-Tuning
Results
Key Findings
Feature Importance
Installation Guide
- Prerequisites
- Steps
Running the Application
Jupyter Book
Note on Dependencies
Pydantic Deprecation Warnings
Deployment
API Endpoints
User Interface
Future Improvements

Project Overview

This project focuses on developing, deploying, and serving a machine learning model for credit risk prediction using the Home Credit dataset. The goal is to create an interpretable, deployable, and financially sound model that effectively identifies potential loan defaulters while maintaining a balance between precision and recall.

Live Application

The application is deployed and accessible at: https://retail-bank-risk-app-562777194669.us-central1.run.app/

Dataset Description

The Home Credit dataset contains information about loan applications, including:

- Applicant demographics
- Financial history
- Loan specifics
- External data sources

The main data tables used are:

- application_train.csv
- application_test.csv

Approach

Initial Model Development

Data Preprocessing:
- Loaded and cleaned raw data
- Performed memory optimization
- Handled missing values and outliers
- Created derived features
Feature Engineering:
- Binned continuous variables (age, income, credit amount)
- Created financial ratios (debt-to-income, credit-to-goods, annuity-to-income)
- Engineered time-based features
Model Development:
- Used XGBoost algorithm
- Optimized hyperparameters using Optuna (200 trials)
- Selected 40 key features for interpretability and relevance
Evaluation Metrics:
- Focused on recall and F2-score
- Analyzed precision-recall trade-offs
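
The feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names mirror Home Credit columns (`AMT_INCOME_TOTAL`, `AMT_CREDIT`, `AMT_ANNUITY`, `AMT_GOODS_PRICE`, `DAYS_BIRTH`), but the exact ratio definitions, the `Application` class, the helper names, and the age cut points are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical container for a few Home Credit application columns."""
    amt_income_total: float  # AMT_INCOME_TOTAL
    amt_credit: float        # AMT_CREDIT (loan amount)
    amt_annuity: float       # AMT_ANNUITY (loan annuity)
    amt_goods_price: float   # AMT_GOODS_PRICE
    days_birth: int          # DAYS_BIRTH (negative: days before application)


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def age_group(age_years: float) -> str:
    """Bin age into coarse groups; the cut points are illustrative only."""
    if age_years < 30:
        return "under_30"
    if age_years < 45:
        return "30_to_44"
    if age_years < 60:
        return "45_to_59"
    return "60_plus"


def engineer_features(app: Application) -> dict:
    """Derive ratio and age features (assumed definitions, see lead-in)."""
    age_years = -app.days_birth / 365.25
    return {
        "debt_to_income": safe_ratio(app.amt_credit, app.amt_income_total),
        "credit_to_goods": safe_ratio(app.amt_credit, app.amt_goods_price),
        "annuity_to_income": safe_ratio(app.amt_annuity, app.amt_income_total),
        "age_years": age_years,
        "age_group": age_group(age_years),
    }
```

Returning NaN for a zero denominator keeps the missing-value handling in one place, since XGBoost can treat NaN as missing.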

Manual Fine-Tuning

Enhanced Financial Analysis:
- Incorporated existing mortgage and loan payments
- Implemented a comprehensive debt-to-income ratio calculation
- Set a 40% threshold for total debt-to-income ratio
Improved Risk Assessment:
- Adjusted default probability based on debt-to-income ratio
- Implemented a more nuanced risk level determination
Realistic Financial Assumptions:
- Used a 5% annual interest rate for loan calculations
- Improved monthly payment calculations
Expanded Anomaly Detection:
- Set specific bounds for key financial variables
- Flagged and reported anomalies in model output
Enhanced Error Handling and Logging:
- Improved input validation and error messaging
- Added detailed logging of financial ratios and decision points
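
As a rough sketch of the financial checks listed above, the snippet below computes a fixed monthly payment at the 5% annual rate and compares the total debt-to-income ratio with the 40% threshold. The standard amortization formula is assumed here; the function names, the use of monthly figures, and the strict "greater than" comparison are illustrative choices and may differ from the application's actual logic.

```python
ANNUAL_RATE = 0.05  # 5% annual interest rate, as stated above
MAX_DTI = 0.40      # 40% total debt-to-income threshold, as stated above


def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Fixed monthly payment for a fully amortizing loan (assumed formula)."""
    if months <= 0:
        raise ValueError("months must be positive")
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        return principal / months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


def total_debt_to_income(
    new_payment: float, existing_monthly_debt: float, monthly_income: float
) -> float:
    """Share of monthly income consumed by existing and new debt payments."""
    if monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return (new_payment + existing_monthly_debt) / monthly_income


def exceeds_dti_limit(dti: float, limit: float = MAX_DTI) -> bool:
    """True when the ratio is above the limit (strict comparison is assumed)."""
    return dti > limit


# Example: a 10,000 loan over 36 months on top of 1,000 of existing payments.
payment = monthly_payment(10_000, ANNUAL_RATE, 36)
dti = total_debt_to_income(payment, existing_monthly_debt=1_000, monthly_income=3_000)
print(f"payment={payment:.2f} dti={dti:.1%} over_limit={exceeds_dti_limit(dti)}")
```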

Results

Kaggle Competition Score: 67%
Test Set Performance:
- Recall: 74.42%
- Precision: 11.23%
- F1-Score: 19.52%
- F2-Score: 35.02%
- AUC-ROC: 0.6754
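
The F1 and F2 values above follow from the reported precision and recall through the F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The short check below recomputes them; the function name is ours, and because the inputs are already rounded, the results agree with the reported scores only to within rounding.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


precision, recall = 0.1123, 0.7442  # reported test-set values
print(f"F1 = {f_beta(precision, recall, 1.0):.4f}")
print(f"F2 = {f_beta(precision, recall, 2.0):.4f}")
```

With beta = 2, recall counts four times as much as precision in the denominator, which is why the F2-score sits well above the F1-score for this model.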

These metrics show that the final model is not the strongest in terms of raw performance. The manual fine-tuning step targets edge cases instead: by incorporating domain knowledge and financial best practices, it is intended to produce more nuanced decisions in complex scenarios that are poorly represented in the general test set. This improvement has not yet been measured; quantifying it is listed under Future Improvements.

Key Findings

- The model achieves high recall (74.42%) for detecting defaults, which is crucial in credit risk management.
- This high recall comes at the cost of low precision (11.23%): the model tends to overpredict defaults.
- The model errs on the side of caution, which may be acceptable if the cost of missing a default significantly outweighs the cost of a false alarm.
- The precision-recall curve suggests moderate performance, constrained by the class imbalance in the data.
- Manual fine-tuning improved the model's alignment with real-world financial decision-making processes, particularly for edge cases and complex scenarios.
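
To make the precision-recall trade-off concrete, the sketch below converts the reported precision into the number of false alarms raised per correctly flagged default, and uses that as a break-even cost ratio. The cost model is a deliberate simplification assumed for illustration (one fixed cost per missed default, one fixed cost per false alarm); it is not part of the project's evaluation.

```python
def false_alarms_per_detected_default(precision: float) -> float:
    """Number of false positives for every true positive at a given precision."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    return (1 - precision) / precision


def flagging_is_worthwhile(
    precision: float, cost_missed_default: float, cost_false_alarm: float
) -> bool:
    """True when catching one default outweighs the false alarms it comes with.

    Assumes fixed per-case costs, which is a simplification.
    """
    ratio = false_alarms_per_detected_default(precision)
    return cost_missed_default > ratio * cost_false_alarm


ratio = false_alarms_per_detected_default(0.1123)  # reported precision
print(f"False alarms per detected default: {ratio:.2f}")
```

Under this simplified model, the cautious behaviour pays off only when a missed default costs more than roughly eight times as much as wrongly flagging a good applicant.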

Feature Importance

Top features influencing the model's predictions include:

- External source scores
- Age
- Income-related features
- Loan amount and goods price
- Various derived financial ratios

Installation Guide

Prerequisites

- Python 3.10+
- pip
- virtualenv (optional but recommended)

Steps

Clone the repository:

git clone https://github.com/vytautas-bunevicius/retail-bank-risk-evaluation.git
cd retail-bank-risk-evaluation

(Optional) Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install the project and its dependencies:
```
pip install -e .
```

Running the Application

To run the application, use the following command:

uvicorn app.main:app

This command:

Uses uvicorn to run the FastAPI application
Specifies app.main:app as the application import string, where:
- app.main is the Python module path
- app is the FastAPI application instance within that module

By default, this will run the server on http://127.0.0.1:8000. If you need to specify a different host or port, you can use the --host and --port options:

uvicorn app.main:app --host 0.0.0.0 --port 8080

Jupyter Book

To view all notebooks in one place, you can run the Jupyter Book in the root directory. Follow these steps:

Ensure Jupyter Book is installed (it should be if you installed the dependencies from requirements.txt):
```
pip install jupyter-book
```
Build the book:
```
jupyter-book build .
```
Open the generated _build/html/index.html file in your web browser to view the compiled book.

This Jupyter Book provides a comprehensive view of all project notebooks, making it easier to navigate and understand the entire workflow.

Note on Dependencies

The setup.py file in this project is configured to read and install dependencies from requirements.txt. When you run pip install -e ., it installs both the project and all dependencies listed in requirements.txt.

If you make changes to requirements.txt, you may need to run pip install -e . again to update the installed dependencies.
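
For readers unfamiliar with this pattern, here is a minimal sketch of how a setup.py can read requirements.txt. It is not a copy of this project's setup.py: the `parse_requirements` helper and the package name in the commented `setup()` call are hypothetical, and the parser handles only plain requirement lines and comments.

```python
from pathlib import Path


def parse_requirements(text: str) -> list[str]:
    """Extract requirement specifiers from the contents of a requirements file.

    Skips blank lines, comments, and pip options such as "-r other.txt".
    """
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue
        requirements.append(line)
    return requirements


def read_requirements(path: str = "requirements.txt") -> list[str]:
    """Read and parse a requirements file from disk."""
    return parse_requirements(Path(path).read_text(encoding="utf-8"))


# In setup.py, the result would be passed to setuptools (sketch only):
#
#     from setuptools import find_packages, setup
#
#     setup(
#         name="example-package",  # hypothetical name
#         packages=find_packages(where="src"),
#         package_dir={"": "src"},
#         install_requires=read_requirements(),
#     )
```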

Pydantic Deprecation Warnings

You may see deprecation warnings related to Pydantic validators. These are not errors, but suggestions to update to the newer Pydantic V2 style validators. Consider updating these in future maintenance of the project.

Deployment

The application is deployed on Google Cloud Platform using Cloud Run. To deploy your own instance:

Install and set up the Google Cloud SDK
Authenticate with Google Cloud:
```
gcloud auth login
```

Set your project ID:

gcloud config set project YOUR_PROJECT_ID

Build and deploy the application using Cloud Build:
```
gcloud builds submit --config cloudbuild.yaml .
```

This command uses the cloudbuild.yaml configuration file to build and deploy the application, ensuring consistency and reproducibility in the deployment process.

API Endpoints

- /: Serves the loan application form (GET)
- /predict: Makes a loan risk prediction (POST)
- /health: Health check endpoint (GET)
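
The endpoints above could be exercised from Python along these lines. This is a sketch under assumptions: it targets a local server on the default uvicorn address, it assumes /predict accepts a JSON body (it may expect form data instead), and the payload fields shown in the usage note are hypothetical because the request schema is not documented here.

```python
import json
from urllib import request

BASE_URL = "http://127.0.0.1:8000"  # default local uvicorn address


def health_request(base_url: str = BASE_URL) -> request.Request:
    """Build a GET request for the health check endpoint."""
    return request.Request(f"{base_url}/health", method="GET")


def predict_request(payload: dict, base_url: str = BASE_URL) -> request.Request:
    """Build a POST request for /predict (a JSON body is an assumption)."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: request.Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, response text)."""
    with request.urlopen(req, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the server running, `send(health_request())` checks availability, and `send(predict_request({"amt_credit": 400000, "amt_income_total": 200000}))` submits an application; replace the hypothetical field names with those the form actually sends.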

User Interface

The application features a user-friendly interface for loan applications.

[Screenshots of the UI]

Future Improvements

- Incorporate additional data sources beyond the application data
- Explore advanced ensemble techniques
- Consider adding features like AMT_CREDIT_SUM_DEBT from the "bureau.csv" file
- Further refine the financial analysis based on industry feedback
- Continuously monitor and update anomaly detection thresholds
- Conduct more extensive testing on edge cases to quantify the improvements from manual fine-tuning
