Please note that this newsitem has been archived, and may contain outdated information or links.
28 June 2022, Computational Linguistics Seminar, Hila Chefer
Transformers have revolutionized deep learning research across many disciplines, starting from NLP and expanding to vision, speech, and more. In my talk, I will explore several milestones toward interpreting all families of Transformers, including unimodal, bi-modal, and encoder-decoder Transformers. I will present working examples and results covering some of the most prominent models, including CLIP, BERT, LXMERT, and ViT. I will then present our recent explainability-driven fine-tuning technique, which significantly improves the robustness of Vision Transformers (ViTs). The loss we employ ensures that the model bases its prediction on the relevant parts of the input, rather than on supportive cues (e.g., the background). This can be done with very little added supervision in the form of foreground masks, or without any such supervision.
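For readers curious what such an explainability-driven loss might look like in code, below is a minimal sketch, not the speaker's exact formulation: it assumes a per-token relevance map is already available (e.g., from attention rollout or a gradient-based attribution) along with a binary foreground mask, and it rewards relevance mass on the foreground while penalizing mass on the background. The function name relevance_guidance_loss, the lambda_bg weight, and the token-level mask layout are illustrative assumptions.

```python
import torch

def relevance_guidance_loss(relevance: torch.Tensor,
                            fg_mask: torch.Tensor,
                            lambda_bg: float = 2.0) -> torch.Tensor:
    """Hypothetical sketch of an explainability-guided auxiliary loss.

    relevance: [B, N] non-negative per-token relevance scores for the
               model's prediction (e.g., from attention rollout).
    fg_mask:   [B, N] binary mask, 1 where a token overlaps the foreground.
    """
    # Normalize so each sample's relevance sums to 1.
    rel = relevance / (relevance.sum(dim=-1, keepdim=True) + 1e-8)

    # Fraction of relevance falling on foreground vs. background tokens.
    fg = (rel * fg_mask).sum(dim=-1)
    bg = (rel * (1.0 - fg_mask)).sum(dim=-1)

    # Encourage relevance on the object, discourage it on the background.
    return (-torch.log(fg + 1e-8) + lambda_bg * bg).mean()


if __name__ == "__main__":
    # Toy usage: 2 samples, 16 tokens each.
    relevance = torch.rand(2, 16)
    fg_mask = (torch.rand(2, 16) > 0.5).float()
    print(relevance_guidance_loss(relevance, fg_mask).item())
```

In practice, a term like this would be added to the standard classification loss during fine-tuning; the mask-free variant mentioned in the abstract would replace fg_mask with a signal derived from the model itself.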