Please note that this news item has been archived and may contain outdated information or links.
11 October 2022, Computational Linguistics Seminar, Prof. Dr. Albert Gatt
Abstract:
The success of large-scale neural language models has brought about new challenges for low-resource languages. For such languages, training data is not as easily available as it is for languages such as English. To take an example, widely-used multilingual models such as mBERT exclude languages with a small Wikipedia footprint. By the same token, in massively multilingual resources harvested from the web, the data for these languages also tends to be of very low quality. In this seminar, I will discuss work in progress which addresses low-resource scenarios for multilingual NLP.
First, I describe some efforts towards making existing multilingual models transferable to new languages, using adversarial techniques. It turns out that the effectiveness of such techniques is strongly influenced by the fine-tuning we perform to adapt models to downstream tasks, as well as by the nature of the tasks themselves.
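As a rough illustration of what an adversarial adaptation setup of this kind can look like (a generic sketch, not the speaker's implementation; the module names, dimensions and the lambda weight are illustrative assumptions), the Python snippet below pairs a downstream task head with a language discriminator placed behind a gradient-reversal layer, so that the encoder is pushed towards language-invariant representations:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientReversal(torch.autograd.Function):
    # Identity on the forward pass; flips (and scales) the gradient on the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AdversarialHead(nn.Module):
    def __init__(self, hidden_size=768, num_task_labels=3, num_languages=2, lambd=0.1):
        super().__init__()
        self.lambd = lambd
        self.task_head = nn.Linear(hidden_size, num_task_labels)   # downstream task classifier
        self.lang_head = nn.Sequential(                            # language discriminator
            nn.Linear(hidden_size, 256), nn.ReLU(), nn.Linear(256, num_languages))

    def forward(self, pooled, task_labels, lang_labels):
        # pooled: sentence representations from a multilingual encoder such as mBERT
        task_loss = F.cross_entropy(self.task_head(pooled), task_labels)
        reversed_pooled = GradientReversal.apply(pooled, self.lambd)
        lang_loss = F.cross_entropy(self.lang_head(reversed_pooled), lang_labels)
        # The reversed gradient pushes the encoder towards language-invariant features,
        # while the discriminator itself is trained to tell the languages apart.
        return task_loss + lang_loss

# Toy usage with random features standing in for encoder outputs.
head = AdversarialHead()
pooled = torch.randn(8, 768, requires_grad=True)
loss = head(pooled, torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,)))
loss.backward()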
I will then consider some more recent work on training Transformer models from scratch in a low-resource setting. Here, our research shows that in the absence of very large pretraining datasets, excellent results can be achieved if we trade off dataset size in favour of quality and diversity.
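To make the second part more concrete, the sketch below shows what from-scratch pretraining on a small, curated corpus can look like with the HuggingFace libraries (the file names, vocabulary size and model dimensions are illustrative assumptions, not the configuration used in the work itself):

from tokenizers import BertWordPieceTokenizer
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# 1. Train a WordPiece vocabulary on the curated corpus (path is illustrative).
wp = BertWordPieceTokenizer(lowercase=True)
wp.train(files=["corpus.txt"], vocab_size=16_000)
wp.save_model(".")  # writes vocab.txt
tokenizer = BertTokenizerFast(vocab_file="vocab.txt")

# 2. A deliberately compact configuration: corpus and model size are kept small,
#    with the emphasis placed on data quality and diversity instead.
config = BertConfig(vocab_size=16_000, hidden_size=256, num_hidden_layers=6,
                    num_attention_heads=4, intermediate_size=1024)
model = BertForMaskedLM(config)

# 3. Tokenise the corpus and pretrain with the standard masked language modelling objective.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="small-lm", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()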