Feb 1, 2012. Eli Cortez
Title: Information Extraction over Textual Sources
Abstract: The growing use of text files for information exchange, such as HTML pages, XML documents, e-mail, blog posts, tweets, RSS feeds and SMS messages, raises numerous problems concerning how to properly exploit the information they contain. In particular, problems related to Information Extraction (IE) from text have motivated work in several scientific communities, in areas such as Databases, Data Mining, Information Retrieval and Artificial Intelligence. This talk will present an overview of the IE problem and of the methods recently proposed in the literature to address it. The IE problem consists of extracting values of interest that are implicitly present in unstructured texts, such as postal addresses, bibliographic citations and classified ads, drawn from textual sources in a variety of domains. The talk will discuss the main and most recent approaches proposed in the literature, with particular emphasis on probabilistic methods.