Please note that this news item has been archived and may contain outdated information or links.

14 March 2002, Co-training in Text Classification, Stan Matwin

Speaker: Stan Matwin
(University of Ottawa and Universite de Paris XI)
Date: Thursday 14 March 2002
Time: 16:00-17:00
Location: Room E.020, Roetersstraat 11 (Faculty of Economics), Amsterdam.

Text classification is a building block for many intelligent systems, particularly in Information Extraction and Text Mining. Classifier induction is often the technique of choice for building text classifiers. Past experience in text mining shows that, by and large, existing classifier induction systems differ little in performance. There is therefore interest in researching other aspects of text classification, in particular the representation of the text given to the classifier and ways of limiting the user's effort in labeling examples. In this presentation, we will give an account of an experiment that follows this line of research. In the experiment, we used the co-training idea, in which two classifiers obtained from mutually redundant representations are used to train each other. This approach to learning can be viewed as a multi-agent learning task with a natural cognitive justification. On an email classification task, using a Support Vector Machine as the classifier induction system, we obtained a significant performance improvement over the use of a single classifier. We will conclude by outlining how we plan to use some of these techniques in the Information Extraction system Caderige at Universite Paris XI.
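
The co-training loop mentioned in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the setup used in the talk: a tiny nearest-centroid classifier stands in for the Support Vector Machine, and the two "views" of each message, the toy feature vectors, and all function names (train_centroids, predict, co_train) are assumptions made up for this example.

```python
# Minimal co-training sketch. The nearest-centroid classifier and the toy
# data are illustrative assumptions, not the experiment described in the talk.

def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns {label: centroid}."""
    sums, counts = {}, {}
    for x, y in examples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return (label, confidence): the nearest centroid's label and the
    gap in squared distance between the two nearest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), y) for y, c in centroids.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b).
    In each round, the classifier for each view labels the pool examples it is
    most confident about and adds them to the shared labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = train_centroids([(views[view], y) for views, y in labeled])
            scored = sorted(
                ((predict(model, views[view]), idx) for idx, views in enumerate(pool)),
                key=lambda item: item[0][1],
                reverse=True,
            )
            chosen = scored[:per_round]
            for (label, _), idx in chosen:
                labeled.append((pool[idx], label))
            taken = {idx for _, idx in chosen}
            pool = [views for i, views in enumerate(pool) if i not in taken]
    model_a = train_centroids([(views[0], y) for views, y in labeled])
    model_b = train_centroids([(views[1], y) for views, y in labeled])
    return model_a, model_b, labeled


# Toy data: each message has two hypothetical feature views.
seed = [
    (([1.0, 0.0], [1.0, 0.0]), "work"),
    (([0.0, 1.0], [0.0, 1.0]), "spam"),
]
pool = [
    ([0.9, 0.1], [0.6, 0.4]),
    ([0.2, 0.8], [0.3, 0.7]),
    ([0.6, 0.4], [0.8, 0.2]),
    ([0.4, 0.6], [0.15, 0.85]),
]
model_a, model_b, grown = co_train(seed, pool, rounds=2)
new_labels = [label for _, label in grown[len(seed):]]
```

In this toy run, the last pool message is close to the decision boundary in the first view but clear-cut in the second, so it is the second-view classifier that labels it; this is the sense in which the two classifiers train each other.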

Stan Matwin is a Professor of Information Technology and Engineering, Director of Graduate Studies in Computer Science, and Director of the Graduate Certificate on Electronic Commerce at the University of Ottawa. He is the former President of the Canadian Society for Computational Studies of Intelligence and former Head of IFIP WG 12.2 (Machine Learning). His research interests are in Data and Text Mining and Knowledge-based Systems. He has authored and co-authored some 100 research papers in refereed conferences and journals. Currently on sabbatical, he is a Visiting Professor at the Laboratoire de Recherche en Informatique, Universite Paris XI et CNRS. He is Programme Chair of the 12th International Conference on Inductive Logic Programming in Sydney, Australia, in July 2002.

