This repository houses the code for the project of the "Systems Programming" course, taught at the Harokopio University of Athens (HUA), Department of Informatics and Telematics (DIT).
A list of clients (in this case, just one) sends the user, coupon and event data to the system through a Kafka producer (or multiple producers, depending on the configuration). Each class of data is produced to its own topic. The default configuration of the system creates one Kafka consumer per topic. The consumers send the data to the corresponding endpoint of the API, which stores it in a datastore; in this case, an SQLite database. The recommendation generator produces recommendations based on how frequently each user bets on each sport. The generation process is executed every time an event is uploaded to the system and can also be triggered manually through the corresponding endpoint. End-users of the system can access their recommendations by providing their id to the /recommendations/{user_id} endpoint.
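For orientation, publishing a record to its topic with aiokafka looks roughly like the sketch below; the topic name, payload fields and bootstrap address are illustrative assumptions, not taken from the repository.

```python
# Sketch: publish a JSON-encoded user record to a "users" topic with aiokafka.
# The topic name, payload shape and bootstrap address are assumptions.
import asyncio
import json

from aiokafka import AIOKafkaProducer


async def produce_user(user: dict, bootstrap: str = "localhost:9094") -> None:
    producer = AIOKafkaProducer(bootstrap_servers=bootstrap)
    await producer.start()
    try:
        # Each class of data is produced to its own topic.
        await producer.send_and_wait("users", json.dumps(user).encode("utf-8"))
    finally:
        await producer.stop()


if __name__ == "__main__":
    asyncio.run(produce_user({"id": 1, "name": "Alice"}))
```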
The data processing pipeline is handled in the following fashion:
- A Kafka setup with 1 producer per client and 1 consumer per topic
- Each Kafka producer is a FastAPI server that supports 3 endpoints: `POST /coupons`, `POST /events` and `POST /users` (producers/main.py). No validation is done at this level.
- Each Kafka consumer uses the `httpx` library (consumers/main.py) to send the produced data to the corresponding endpoint of the main API (api/main.py); a sketch of this step follows the list. No validation is done at this level.
- Each POST request made to the main API is validated against the models found in `api/models/`. If the validation succeeds, the data is added to the database and committed.
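A minimal sketch of the consumer step, assuming an "events" topic, JSON-encoded messages and the default API port; none of these details are confirmed by the repository.

```python
# Sketch: consume messages from a topic and forward them to the main API.
# Topic name, message encoding and API address are assumptions.
import asyncio
import json

import httpx
from aiokafka import AIOKafkaConsumer


async def consume(topic: str = "events", bootstrap: str = "localhost:9094") -> None:
    consumer = AIOKafkaConsumer(topic, bootstrap_servers=bootstrap)
    await consumer.start()
    try:
        async with httpx.AsyncClient(base_url="http://0.0.0.0:8098") as client:
            async for msg in consumer:
                payload = json.loads(msg.value)
                # Forward each message to the endpoint that matches its topic.
                await client.post(f"/{topic}", json=payload)
    finally:
        await consumer.stop()


if __name__ == "__main__":
    asyncio.run(consume())
```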
Retrieval of data other than recommendations is also supported through the main API; however, the use cases of the project do not require this functionality.
The recommendations are generated using a simple frequency-based algorithm. The modularity of the system, however, allows for the use of any type of algorithm.
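A rough sketch of what such a frequency-based approach can look like is shown below; this is not the repository's implementation, and the coupon and event field names are assumptions.

```python
# Sketch: recommend upcoming events in the sport the user has bet on most often.
# The data shapes (coupons with "selections", events with "sport") are assumptions.
from collections import Counter


def recommend(user_coupons: list[dict], upcoming_events: list[dict], k: int = 5) -> list[dict]:
    # Count how often each sport appears in the user's past coupons.
    sport_counts = Counter(
        selection["sport"]
        for coupon in user_coupons
        for selection in coupon["selections"]
    )
    if not sport_counts:
        # No betting history: fall back to any upcoming events.
        return upcoming_events[:k]
    favourite_sport, _ = sport_counts.most_common(1)[0]
    return [event for event in upcoming_events if event["sport"] == favourite_sport][:k]
```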
- There is a single database, indicated by the gray color. Different "database icons" have been used to indicate different tables of that database.
- Producers and consumers are developed using the AIOKafka Python library.
- The SQL database can easily be changed in /api/dependencies/database.py by modifying the URL passed to `create_engine`. sqlmodel is used for the model files and as the ORM of the whole project, and it works great with FastAPI -- the framework used to develop the system's API.
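For example, switching the datastore would look roughly like the sketch below; the PostgreSQL URL and the file contents are illustrative assumptions, not the repository's actual database.py.

```python
# Sketch of api/dependencies/database.py: only the URL passed to create_engine
# needs to change to target a different database. The commented PostgreSQL URL
# is an illustrative assumption.
from sqlmodel import SQLModel, create_engine

sqlite_url = "sqlite:///database.db"
# postgres_url = "postgresql://user:password@localhost:5432/sysprog"

engine = create_engine(sqlite_url, echo=False)


def create_db_and_tables() -> None:
    # Create all tables declared on the SQLModel metadata.
    SQLModel.metadata.create_all(engine)
```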
Tip
Take a look at the main API's endpoints in api/openapi.json. Additionally, if you run the docker-compose setup of the system, you can visit `0.0.0.0:8098/docs` for a more interactive look at the endpoints (model schemas, example requests, etc.). Change the IP or the port of the URL in case the default parameters have been changed.
- `GET /users/{user_id}`
- `POST /users`
- `GET /events/{event_id}`
- `POST /events`
- `GET /coupons/{user_id}`
- `GET /coupons/user/{user_id}`
- `POST /coupons`
- `GET /recommendation/{user_id}`
- `POST /recommendation/generate`
- `POST /users`
- `POST /events`
- `POST /coupons`
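As an example, the endpoints above can be exercised with httpx; the payload fields in the sketch below are assumptions and should be checked against the schemas in api/openapi.json.

```python
# Illustrative requests against a locally running instance on the default port.
# The request bodies are placeholders; consult api/openapi.json for real schemas.
import httpx

base = "http://0.0.0.0:8098"

# Create a user (payload shape assumed).
httpx.post(f"{base}/users", json={"id": 1, "name": "Alice"})

# Trigger recommendation generation manually.
httpx.post(f"{base}/recommendation/generate")

# Fetch the recommendations for that user.
print(httpx.get(f"{base}/recommendation/1").json())
```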
CPU Used: Apple M3 Pro, 12-core
Configuration: docker-compose
Uploading a list of dummy data for each of the entities (users, events and coupons) concurrently, using an external httpx client (the script is found in create_dummy.py), takes the following time; a sketch of such a concurrent upload is shown after the table:
| Entity  | Amount | Time (seconds) |
|---------|--------|----------------|
| Users   | 100    | 2.6263         |
| Events  | 100    | 2.4140         |
| Coupons | 100    | 2.4488         |
Note: Docker on macOS is notorious for its slow performance. No optimizations were made to improve the run times.
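A minimal sketch of this kind of concurrent upload, assuming asyncio, httpx and placeholder payloads; it is not the exact content of create_dummy.py.

```python
# Sketch: POST a batch of dummy records to an endpoint concurrently.
# The endpoint path comes from the README; the payload fields are placeholders.
import asyncio

import httpx


async def upload(entity: str, records: list[dict], base: str = "http://0.0.0.0:8098") -> None:
    async with httpx.AsyncClient(base_url=base) as client:
        # Fire all requests for this entity concurrently and wait for completion.
        await asyncio.gather(*(client.post(f"/{entity}", json=record) for record in records))


async def main() -> None:
    dummy_users = [{"id": i, "name": f"user_{i}"} for i in range(100)]
    await upload("users", dummy_users)


if __name__ == "__main__":
    asyncio.run(main())
```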
To initialize the appropriate Python environment, a conda installation is recommended. Run the following lines to install and activate the environment, then start the API:
conda env create -f environment.yml
conda activate sys_prog
python -m api.main
To upload some dummy data to the database, run the following lines while the API is running:
python -m api.data.create_dummy
python -m api.data.upload_dummy
The SQLite database now contains some dummy data examples.
The provided compose.yaml file incorporates all the containers listed below, in addition to the Kafka container and its configuration, so you can install and run the whole system using the following command:
docker compose up
Important
Keep in mind that if you do not want to use the docker-compose solution, you will have to manually start a Kafka server and point the producer and consumer containers to the correct IP and port using the `BOOTSTRAP_SERVER` environment variable available in both containers.
- `PORT`: Indicates the port that the container will expose. Should be a string. Defaults to `"8098"`. Required.
Build the container without a tag:
docker build . -f api/api.Dockerfile
or, alternatively, provide a tag:
docker build -t hua-sysprog-project-api . -f api/api.Dockerfile
Run the container as follows:
docker run \
-e PORT="8098" \
hua-sysprog-project-api:latest
- `BOOTSTRAP_SERVER`: The Kafka server that the consumer should connect to. Defaults to `localhost:9094`. Required.
- `TOPICS`: Either a single string or a comma-separated list of topics. No default value. Required.
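A comma-separated `TOPICS` value can be parsed along these lines; whether the consumer code does exactly this is an assumption.

```python
# Sketch: read TOPICS from the environment and split it into topic names.
import os

topics = [t.strip() for t in os.getenv("TOPICS", "").split(",") if t.strip()]
```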
Build the container without a tag:
docker build . -f consumers/consumer.Dockerfile
or, alternatively, provide a tag:
docker build -t hua-sysprog-project-consumer . -f consumers/consumer.Dockerfile
Run the container as follows:
docker run \
-e TOPICS="test_topic" \
-e BOOTSTRAP_SERVER="0.0.0.0:2994" \
hua-sysprog-project-consumer:latest
- `BOOTSTRAP_SERVER`: The Kafka server that the producer should connect to. Defaults to `localhost:9094`. Required.
Build the container without a tag:
docker build . -f producers/producer.Dockerfile
or, alternatively, provide a tag:
docker build -t hua-sysprog-project-producer . -f producers/producer.Dockerfile
Run the container as follows:
docker run \
-e BOOTSTRAP_SERVER="0.0.0.0:2994" \
hua-sysprog-project-producer:latest
Development of each module is done locally first.
Take note of the following caveats for the local environment to work correctly:
- To generate a local `sqlite` database when running the API, you'll need to run the API using this command: `ENV=dev python -m api.main`
Warning
This environment variable needs to be set for every local execution of the modules and submodules. For example, if you want to run the main function of api/recommendations/frequency.py, you will need to include it; otherwise the program will not know the correct path to the database.
- To upload the dummy data created by create_dummy.py (via upload_dummy.py), specify the port of the API using the following environment variable: `PORT=8098 python -m api.data.upload_dummy`
- The default behaviour of upload_dummy is to point to the docker-compose instance of the API. By changing this environment variable you can point it to whichever port the API runs on.
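For reference, these variables would typically be read along these lines; the defaults and behaviour shown are assumptions, not the repository's actual code.

```python
# Sketch: how ENV and PORT might be consumed by the modules.
# The default values and the dev/prod database paths are assumptions.
import os

ENV = os.getenv("ENV", "prod")         # "dev" selects a local SQLite file
PORT = int(os.getenv("PORT", "8098"))  # port of the API to upload to

database_url = "sqlite:///database.db" if ENV == "dev" else "sqlite:////data/database.db"
api_base_url = f"http://0.0.0.0:{PORT}"
```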