This repository contains the code used to train a reinforcement learning agent to play the game of Pong. The agent is trained with a hybrid approach that combines supervised learning and reinforcement learning: a convolutional neural network is first pre-trained with supervised learning and then finetuned using the Proximal Policy Optimization (PPO) algorithm. The code uses the PyTorch Lightning Fabric, Stable-Baselines3, Gymnasium, and Arcade Learning Environment (ALE) libraries.
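The supervised pre-training stage can be sketched roughly as follows. This is a minimal illustration in plain PyTorch, assuming a standard Atari-style CNN over stacked 84×84 grayscale frames and the 6 discrete actions of ALE Pong; the layer sizes, hyperparameters, and `PolicyCNN` name are illustrative and not the repository's exact architecture (see `supervised.py` and `dataset.py` for the real implementation).

```python
import torch
import torch.nn as nn

class PolicyCNN(nn.Module):
    """Small Atari-style CNN mapping stacked frames to action logits."""
    def __init__(self, n_actions: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

model = PolicyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: in the actual project, frames and target actions
# come from the dataset built by dataset.py.
frames = torch.rand(8, 4, 84, 84)
actions = torch.randint(0, 6, (8,))

# One supervised step: imitate the target actions via cross-entropy.
logits = model(frames)
loss = loss_fn(logits, actions)
loss.backward()
optimizer.step()
```

After this stage, the pre-trained weights initialize the PPO policy network for finetuning; note that the value head has no supervised target here, which is why it starts out uninformative.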
The code is structured as follows:
assets/
: Contains the images and gifs used in this README.

logs/
: Contains the TensorBoard logs for all experiments.

models/
: Contains the weights for the trained models.

experiments.ipynb
: The main notebook containing the experiments.

extras.ipynb
: The code for some additional visualizations.

dataset.py
: The code for creating the dataset.

supervised.py
: The code for training the supervised model.

utils.py
: The code for some utility functions.

requirements.txt
: Contains the required libraries for this project.
This gif shows the supervised pre-trained model playing Pong. The agent is able to score points but is far less consistent than the finetuned agent. Since the value network is not pre-trained, the predicted value is essentially random noise and provides no useful information.
The final gif shows a policy that selects actions uniformly at random. As expected, this random policy fails to score any points. Since the paddle is not controlled by a neural network here, the displayed actions and values are computed by the finetuned model for reference. The state values fluctuate dramatically without showing the dominant cyclical pattern observed in the finetuned agent.
This project was developed as part of the course "Reinforcement Learning" at the International University of Applied Sciences (IU) and is intended for educational purposes only.