Please note that this news item has been archived and may contain outdated information or links.
11 October 2022, Computational Linguistics Seminar, Prof. Dr. Albert Gatt
Abstract:
The success of large-scale neural language models has brought about new challenges for low-resource languages. For such languages, training data is not as easily available as it is for languages such as English. To take an example, widely-used multilingual models such as mBERT exclude languages with a small Wikipedia footprint. By the same token, in massively multilingual resources harvested from the web, the data for these languages also tends to be of very low quality. In this seminar, I will discuss work in progress which addresses low-resource scenarios for multilingual NLP.
First, I describe some efforts towards making existing multilingual models transferable to new languages, using adversarial techniques. It turns out that the effectiveness of such techniques is strongly influenced by the fine-tuning we perform to adapt models to downstream tasks, as well as by the nature of the tasks themselves.
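As a rough illustration of what an adversarial adaptation setup of this kind can look like (a generic sketch, not the speaker's implementation; the module names, dimensions and the lambda weight are illustrative assumptions), the Python snippet below pairs a downstream task head with a language discriminator placed behind a gradient-reversal layer, so that the encoder is pushed towards language-invariant representations:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientReversal(torch.autograd.Function):
    # Identity on the forward pass; flips (and scales) the gradient on the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AdversarialHead(nn.Module):
    def __init__(self, hidden_size=768, num_task_labels=3, num_languages=2, lambd=0.1):
        super().__init__()
        self.lambd = lambd
        self.task_head = nn.Linear(hidden_size, num_task_labels)   # downstream task classifier
        self.lang_head = nn.Sequential(                            # language discriminator
            nn.Linear(hidden_size, 256), nn.ReLU(), nn.Linear(256, num_languages))

    def forward(self, pooled, task_labels, lang_labels):
        # pooled: sentence representations from a multilingual encoder such as mBERT
        task_loss = F.cross_entropy(self.task_head(pooled), task_labels)
        reversed_pooled = GradientReversal.apply(pooled, self.lambd)
        lang_loss = F.cross_entropy(self.lang_head(reversed_pooled), lang_labels)
        # The reversed gradient pushes the encoder towards language-invariant features,
        # while the discriminator itself is trained to tell the languages apart.
        return task_loss + lang_loss

# Toy usage with random features standing in for encoder outputs.
head = AdversarialHead()
pooled = torch.randn(8, 768, requires_grad=True)
loss = head(pooled, torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,)))
loss.backward()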
I will then consider some more recent work on training Transformer models from scratch in a low-resource setting. Here, our research shows that in the absence of very large pretraining datasets, excellent results can be achieved if we trade off dataset size in favour of quality and diversity.
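To make the second part more concrete, the sketch below shows what from-scratch pretraining on a small, curated corpus can look like with the HuggingFace libraries (the file names, vocabulary size and model dimensions are illustrative assumptions, not the configuration used in the work itself):

from tokenizers import BertWordPieceTokenizer
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# 1. Train a WordPiece vocabulary on the curated corpus (path is illustrative).
wp = BertWordPieceTokenizer(lowercase=True)
wp.train(files=["corpus.txt"], vocab_size=16_000)
wp.save_model(".")  # writes vocab.txt
tokenizer = BertTokenizerFast(vocab_file="vocab.txt")

# 2. A deliberately compact configuration: corpus and model size are kept small,
#    with the emphasis placed on data quality and diversity instead.
config = BertConfig(vocab_size=16_000, hidden_size=256, num_hidden_layers=6,
                    num_attention_heads=4, intermediate_size=1024)
model = BertForMaskedLM(config)

# 3. Tokenise the corpus and pretrain with the standard masked language modelling objective.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="small-lm", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()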