BEGIN:VCALENDAR
VERSION:2.0
PRODID:ILLC Website
X-WR-TIMEZONE:Europe/Amsterdam
BEGIN:VTIMEZONE
TZID:Europe/Amsterdam
X-LIC-LOCATION:Europe/Amsterdam
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:/NewsandEvents/Archives/2022/newsitem/13901/11
 -October-2022-Computational-Linguistics-Seminar-Pr
 of-Dr-Albert-Gatt
DTSTAMP:20221003T134310Z
SUMMARY:Computational Linguistics Seminar\, Prof. D
 r. Albert Gatt
ATTENDEE;ROLE=X-SPEAKER;CN="Prof. Dr. Albert Gatt (
 Utrecht University)":mailto:a.gatt@uu.nl
DTSTART;TZID=Europe/Amsterdam:20221011T160000
LOCATION:Room L3.36\, Lab42\, Science Park 900\, Am
 sterdam / online (Zoom)
DESCRIPTION:Abstract:  The success of large-scale 
 neural language models has brought about new chall
 enges for low-resource languages. For such languag
 es, training data is not as easily available as it
  is for languages such as English. To take an exam
 ple, widely-used multilingual models such as mBERT
  exclude languages with a small Wikipedia footprin
 t. By the same token, in massively multilingual re
 sources harvested from the web, the data for these
  languages also tends to be of very low quality. I
 n this seminar, I will discuss work in progress wh
 ich addresses low-resource scenarios for multiling
 ual NLP.  First, I describe some efforts towards m
 aking existing multilingual models transferable t
 o new languages, using adversarial techniques. It 
 turns out that the effectiveness of such technique
 s is strongly influenced by the fine-tuning we per
 form to adapt models to downstream tasks, as well 
 as by the nature of the tasks themselves.  I will 
 then consider some more recent work on training Tr
 ansformer models from scratch in a low-resource se
 tting. Here, our research shows that in the absenc
 e of very large pretraining datasets, excellent re
 sults can be achieved if we trade off limited size
  in favour of quality and diversity.
X-ALT-DESC;FMTTYPE=text/html:\n  <p>Abstract:<br>\
 n  The success of large-scale neural language mode
 ls has brought about new challenges for low-resour
 ce languages. For such languages, training data is
  not as easily available as it is for languages su
 ch as English. To take an example, widely-used mul
 tilingual models such as mBERT exclude languages w
 ith a small Wikipedia footprint. By the same token
 , in massively multilingual resources harvested fr
 om the web, the data for these languages also tend
 s to be of very low quality. In this seminar, I wi
 ll discuss work in progress which addresses low-re
 source scenarios for multilingual NLP.</p>\n  <p>F
 irst, I describe some efforts towards making exist
 ing multilingual models transferable to new langu
 ages, using adversarial techniques. It turns out t
 hat the effectiveness of such techniques is strong
 ly influenced by the fine-tuning we perform to ada
 pt models to downstream tasks, as well as by the n
 ature of the tasks themselves.</p>\n  <p>I will th
 en consider some more recent work on training Tran
 sformer models from scratch in a low-resource sett
 ing. Here, our research shows that in the absence 
 of very large pretraining datasets, excellent resu
 lts can be achieved if we trade off limited size i
 n favour of quality and diversity.</p>\n
URL:https://projects.illc.uva.nl/LaCo/CLS/
CONTACT:Alina Leidinger at a.gatt at uu.nl
END:VEVENT
END:VCALENDAR
