This repository contains code and data associated with the CLiC-it 2024 paper:
Giovanni Valer, Nicolò Penzo and Jacopo Staiano. 2024. Nesciun Lengaz Lascià Endò: Machine Translation for Fassa Ladin. In Proceedings of the Tenth Italian Conference on Computational Linguistics, Pisa, Italy.
- Introduction
- Data
- Experiments
  - Preliminary Experiments
  - Finetuning
  - Evaluation (with transfer learning across domains and forgetting of previous knowledge)
We built the first Fassa Ladin-Italian-English parallel corpus and trained a machine translation model on it.
You can try translating text from English or Italian into Fassa Ladin with the model on Hugging Face Spaces 🦀.
The dataset draws on multiple resources spanning five domains: literature, news, games, laws, and brochures. It is available in the data/ directory, either as a single file or split into training, validation, in-domain test, and out-of-domain test sets.
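A minimal loading sketch follows. It assumes, hypothetically, that each split is a tab-separated file with a header row and columns named lld, it, en, and domain; the actual file names and column layout in data/ may differ, so check them before relying on this.

```python
import csv
from collections import Counter

# Hypothetical column names: the real files in data/ may use a different layout.
COLUMNS = ("lld", "it", "en", "domain")


def load_split(path):
    """Read one tab-separated split into a list of dicts, one per sentence triple."""
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        missing = set(COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path}: missing columns {sorted(missing)}")
        return [{col: row[col] for col in COLUMNS} for row in reader]


def domain_counts(rows):
    """Count sentence triples per domain (e.g. literature, news, games, laws, brochures)."""
    return Counter(row["domain"] for row in rows)
```

For example, load_split("data/train.tsv") would return the training triples under these assumptions (the file name is also hypothetical), and domain_counts on the result gives a quick check of how the five domains are distributed across a split.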
- Preliminary Experiments: evaluate the performance of the pre-trained models.
- Finetuning: fine-tune the pre-trained models on the Fassa Ladin-Italian-English parallel corpus using two approaches, multilingual translation and zero-shot pivot-based transfer learning.
- Evaluation: evaluate the models' performance and investigate transfer learning across domains and forgetting of previous knowledge.
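The metrics used in the paper are not listed here, so the sketch below is only an illustration under assumptions: a simplified, stdlib-only sentence-level chrF (character n-grams for n = 1..6, whitespace removed, beta = 2), a test-set average, and a hypothetical forgetting helper that measures the score drop on a direction the model already handled (for example, English to Italian) after fine-tuning. It may not match sacreBLEU's chrF exactly; use a standard implementation for any reported numbers.

```python
from collections import Counter


def char_ngrams(text, n):
    """Multiset of character n-grams, with all whitespace removed first."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF in [0, 100].

    Precision and recall are averaged over the n-gram orders that both
    strings are long enough to have, then combined into an F-beta score.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


def mean_chrf(hypotheses, references):
    """Average sentence-level chrF over a test set (a simplification of corpus chrF)."""
    if len(hypotheses) != len(references) or not hypotheses:
        raise ValueError("need equally many, non-zero hypotheses and references")
    scores = [chrf(h, r) for h, r in zip(hypotheses, references)]
    return sum(scores) / len(scores)


def forgetting(score_before, score_after):
    """Hypothetical helper: score drop on a pre-existing direction after fine-tuning.

    Positive values mean the model lost some previous knowledge.
    """
    return score_before - score_after
```

The same scoring function can serve the cross-domain analysis as well: score each domain's test sentences separately and compare models fine-tuned on different domains.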
If you use or build on top of this work, please cite our paper as follows:
@inproceedings{valer-etal-2024-nesciun,
title={Nesciun Lengaz Lascià Endò: {M}achine Translation for {F}assa {L}adin},
author={Valer, Giovanni and Penzo, Nicolò and Staiano, Jacopo},
booktitle={Proceedings of the 10th Italian Conference on Computational Linguistics},
publisher={CEUR-ws.org},
year={2024},
month={december},
address={Pisa, Italy}
}