Title: Exploration of Contextual Information for Semantic Enrichment in Text Representations
Abstract: The Text Mining process is essential to knowledge discovery in textual databases. To extract patterns from a text collection, the data must be given meaning through an efficient representation that captures its characteristics and preserves the original relationships in the collection. One of the most widely used representation techniques is Bag of Words (BoW), which relates documents by the frequencies of their terms under the Vector Space Model. A limitation of this technique is that semantic aspects are lost when the document-term matrix is built, because only lexical features of the text are considered. To mitigate this problem, which reduces the reliability of the extracted patterns, a new method is proposed to semantically enrich text representations. External knowledge sources are used to identify contexts (groups of concepts) in documents, and these contexts are then represented in the Vector Space Model. The method is being evaluated on a classification task, using accuracy as the evaluation measure, on English datasets: 20Ng (e-mail messages), BBC (news) and SemEval (reviews); and on Portuguese datasets: BestSports (news), Manchetômetro (news) and Buscapé (reviews).
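The limitation of BoW described above can be illustrated with a small sketch. It builds term-frequency vectors in the Vector Space Model and compares documents by cosine similarity. Two documents with the same meaning but no shared words receive a similarity of zero. The `CONCEPTS` dictionary is a hypothetical stand-in for an external knowledge source, and appending concept tokens is only a simple illustration of semantic enrichment. It is an assumption for demonstration purposes, not the method proposed in this work.

```python
from collections import Counter
import math

# Hypothetical concept map standing in for an external knowledge source
# (e.g. a thesaurus); NOT the knowledge source used by the proposed method.
CONCEPTS = {
    "car": "concept:vehicle",
    "automobile": "concept:vehicle",
    "fast": "concept:speed",
    "quick": "concept:speed",
}

def tokenize(text):
    # Naive whitespace tokenizer with lowercasing.
    return text.lower().split()

def bag_of_words(docs):
    # Build a document-term matrix of raw term frequencies over a sorted vocabulary.
    vocab = sorted({t for d in docs for t in tokenize(d)})
    matrix = []
    for d in docs:
        counts = Counter(tokenize(d))
        matrix.append([counts[t] for t in vocab])
    return vocab, matrix

def cosine(u, v):
    # Cosine similarity between two term vectors; 0.0 if either vector is empty.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def enrich(doc):
    # Append a concept token for every word found in the hypothetical concept map.
    tokens = tokenize(doc)
    return " ".join(tokens + [CONCEPTS[t] for t in tokens if t in CONCEPTS])

docs = ["fast car", "quick automobile"]

_, plain = bag_of_words(docs)
plain_sim = cosine(plain[0], plain[1])

_, enriched = bag_of_words([enrich(d) for d in docs])
enriched_sim = cosine(enriched[0], enriched[1])

print(plain_sim)     # 0.0: no shared terms, so purely lexical BoW sees no relation
print(enriched_sim)  # 0.5: shared concept tokens make the relation visible
```

With plain BoW the two documents are orthogonal. After the shared concept tokens are added, their similarity rises to 0.5, which is the kind of relationship a semantically enriched representation aims to expose.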