
GazeTransformer: Egocentric Gaze Forecasting with Transformers

This repository contains the source code for my Master's thesis. The checkpoints used for the comparison in the evaluation can be found in the ./model folder.

Citation

Tim Rolff, H. Matthias Harms, Frank Steinicke, Simone Frintrop: GazeTransformer: Gaze Forecasting for Virtual Reality Using Transformer Networks. In: Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol. 13485. Springer, 2022.

Abstract

During the last decade, convolutional neural networks have become the state-of-the-art approach for many computer vision problems. Recent publications in natural language processing boost the state-of-the-art performance for sequence-to-sequence models significantly by applying a novel Transformer architecture based on self-attention. Recently, researchers applied Transformers to computer vision tasks, such as object detection, image completion, and saliency prediction, competing with the state-of-the-art.

Human gaze information in virtual reality is essential for many applications, such as gaze-contingent rendering or eye movement-based interactions. By defining gaze forecasting as a time-series prediction problem, we propose a novel Transformer-based architecture, called GazeTransformer, forecasting users' gaze in dynamic virtual reality environments. Based on provided raw data, we generated an unfiltered dataset containing all gaze behavior and compared GazeTransformer to two state-of-the-art methods for gaze forecasting. Further, we evaluated different image encodings, enabling us to combine data from different sources in virtual reality, building a time-dependent sequence. As a result, GazeTransformer improved the baseline, using the current gaze for the prediction, by 8.2% (from a mean error of 3.67° to 3.37°). Further, GazeTransformer beat the prior state-of-the-art significantly (3.37° vs. 7.04° mean error), tested on the generated dataset containing all gaze behavior.
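For a concrete picture of the core idea, the sketch below frames gaze forecasting as time-series prediction with a Transformer encoder, as described above. It is a minimal, hypothetical illustration: the feature dimension, sequence length, and regression head are assumptions for demonstration only and do not reproduce the thesis architecture (see the ./model folder for the actual implementation).

```python
# Hypothetical sketch: gaze forecasting as time-series prediction with a
# Transformer encoder. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GazeForecaster(nn.Module):
    def __init__(self, feature_dim=64, num_layers=4, num_heads=4):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feature_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(feature_dim, 2)  # regress a 2D gaze angle

    def forward(self, sequence):
        # sequence: (batch, time, feature_dim), e.g. past gaze combined
        # with image encodings to form a time-dependent sequence
        encoded = self.encoder(sequence)
        return self.head(encoded[:, -1])  # forecast from the last time step

model = GazeForecaster()
dummy = torch.randn(8, 16, 64)  # 8 sequences, 16 time steps each
print(model(dummy).shape)  # torch.Size([8, 2])
```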

Usage

Requirements

The required Python packages are listed in requirements.txt.
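They can be installed with pip; the short Python sketch below simply wraps the equivalent command-line call and is not part of the repository.

```python
# Sketch: install the dependencies listed in requirements.txt via pip.
# Equivalent to running `pip install -r requirements.txt` in a shell.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    check=True,
)
```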

Dataset

Step 1: Download the dataset from the FixationNet project homepage: https://cranehzm.github.io/FixationNet

Step 2: Place the dataset in the ./dataset folder, e.g. ./dataset/rawData and ./dataset/dataset.

Step 3: Generate our unfiltered dataset by either running ./dataloader/generate.py or the individual scripts in ./dataloader/generation/.
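As a quick sanity check before generation, the sketch below verifies the folder layout from Step 2 and then points to the generation script from Step 3. The check itself is hypothetical convenience code, not part of the repository.

```python
# Hypothetical sanity check: confirm the ./dataset layout from Step 2
# before generating the unfiltered dataset (Step 3).
from pathlib import Path

expected = [Path("dataset/rawData"), Path("dataset/dataset")]
missing = [p for p in expected if not p.is_dir()]
if missing:
    raise SystemExit(f"Missing folders: {', '.join(str(p) for p in missing)}")

print("Dataset layout found; now run: python dataloader/generate.py")
```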

Training and Evaluation

Run the train.*.py and test.*.py scripts to train and evaluate the models. The model definitions are located in the ./model folder, and the checkpoints are stored in the same folder. The ./scripts folder contains all scripts used for the evaluation in the thesis, and ./eval_video.py generates videos for qualitative analysis.
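The script names above are wildcards and depend on the checkout, so the sketch below only illustrates one possible way to discover and run them in sequence; it is not part of the repository.

```python
# Hypothetical runner: discover the train.*.py and test.*.py scripts in the
# repository root and execute them one after another with the current Python.
import glob
import subprocess
import sys

scripts = sorted(glob.glob("train.*.py")) + sorted(glob.glob("test.*.py"))
for script in scripts:
    print(f"Running {script} ...")
    subprocess.run([sys.executable, script], check=True)
```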
