medialab/twitter-incremental-clustering

Workshop 2024 on Event Detection

This repository contains the Python code to reproduce the experiments presented in our paper:

An Incremental Clustering Baseline for Event Detection on Twitter.

Installation

We recommend creating a virtual environment with Python 3.8.2. Below are two examples, one with conda and one with pyenv-virtualenv.

With conda

git clone https://github.com/medialab/twitter-incremental-clustering.git
cd twitter-incremental-clustering
conda create -n workshop python=3.8.2
conda activate workshop
pip install -U pip setuptools
pip install -r requirements.txt

With pyenv-virtualenv

git clone https://github.com/medialab/twitter-incremental-clustering.git
cd twitter-incremental-clustering
pyenv virtualenv 3.8.2 workshop
pyenv activate workshop
pip install -U pip setuptools
pip install -r requirements.txt

Download data

We test our method on two datasets, Event2012 [McMinn et al., 2013] and Event2018 [Mazoyer et al., 2020]. Follow the instructions provided by [Cao et al., 2024] to download the data, then place the entire ./raw_data folder in the root folder of this repository.
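Before preprocessing, it can help to confirm that the data folder is where the scripts are likely to look for it. The following is a minimal sketch, not part of the repository: the helper name `find_raw_data` is hypothetical, and it only assumes that the folder is called `raw_data` and sits directly under the repository root, as described above.

```python
from pathlib import Path


def find_raw_data(root="."):
    """Return the raw_data directory under root, or raise if it is missing.

    Hypothetical helper: the repository itself does not provide it.
    """
    raw_dir = Path(root) / "raw_data"
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"Expected a raw_data folder at {raw_dir}")
    return raw_dir


if __name__ == "__main__":
    try:
        folder = find_raw_data(".")
        # List the top-level entries so a misplaced download is easy to spot.
        for entry in sorted(folder.iterdir()):
            print(entry.name)
    except FileNotFoundError as err:
        print(err)
```

Run it from the repository root; if it reports a missing folder, the download was probably extracted one level too deep or too shallow.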

Preprocess data

python preprocess.py

Run event detection

  1. Run event detection on the Event2018 dataset with Sentence-CamemBERT Large (GPU required):
     python run_detection.py --model sbert --sub-model "dangvantuan/sentence-camembert-large" --lang fr --dataset event2018.tsv
  2. Run event detection on the Event2012 dataset with all-mpnet-base-v2 (GPU required):
     python run_detection.py --model sbert --sub-model "sentence-transformers/all-mpnet-base-v2" --lang en --dataset event2012.tsv

Generate LaTeX table

python generate_table.py

The table is saved in ami_ari_metrics.tex

Plot execution time

After running event detection several times with different --batch-size values, plot the effect of this parameter on AMI and execution time with the command:

python plot_time.py

The figure is saved in timeplot.pdf
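The repeated runs with different --batch-size values could be scripted rather than typed by hand. The sketch below is an illustration under stated assumptions, not a script shipped with the repository: the function names and the batch sizes are hypothetical, and the command-line flags are the ones shown in the detection commands above plus --batch-size. By default it only prints the commands (dry run), since each real run requires a GPU.

```python
import shlex
import subprocess
import sys

# Hypothetical batch sizes, chosen for illustration only.
BATCH_SIZES = [8, 16, 32, 64]


def build_commands(batch_sizes,
                   dataset="event2012.tsv",
                   sub_model="sentence-transformers/all-mpnet-base-v2",
                   lang="en"):
    """Return one run_detection.py argument list per batch size."""
    commands = []
    for size in batch_sizes:
        commands.append([
            sys.executable, "run_detection.py",
            "--model", "sbert",
            "--sub-model", sub_model,
            "--lang", lang,
            "--dataset", dataset,
            "--batch-size", str(size),
        ])
    return commands


def run_sweep(batch_sizes, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands(batch_sizes):
        print(shlex.join(cmd))
        if not dry_run:
            # Stop at the first failing run instead of continuing silently.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_sweep(BATCH_SIZES, dry_run=True)
```

Once the printed commands look right, call `run_sweep(BATCH_SIZES, dry_run=False)` from the repository root, then run plot_time.py as described above.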

About

Embeddings for event detection and tracking in social media data
