Automatic Extraction of Document Topics - Technological Innovation for Sustainability
Conference paper, Year: 2011

Automatic Extraction of Document Topics

Gabriel Lopes (Author)

Abstract

A keyword or topic for a document is a word or multi-word (a sequence of two or more words) that by itself summarizes part of the document's content. In this paper we compare several statistics-based, language-independent methodologies for extracting keywords automatically. We rank words, multi-words, and word prefixes (of fixed length: 5 characters) using several similarity measures (some widely known and some newly coined), and we evaluate both the results obtained and the agreement between evaluators. The experiments covered Portuguese, English, and Czech.
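The ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the measures used, so this example swaps in plain TF-IDF as a stand-in for one "widely known" measure; it is not the authors' method. The function names (`tokenize`, `candidates`, `rank_terms`), the restriction of multi-words to two-word sequences, and the choice to skip prefixes for words shorter than 5 characters are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

PREFIX_LEN = 5  # fixed prefix length stated in the abstract


def tokenize(text):
    """Lowercase and split on non-letter characters (Unicode-aware)."""
    return [t for t in re.split(r"[\W\d_]+", text.lower()) if t]


def candidates(tokens):
    """Yield (kind, term) pairs: words, 5-character prefixes, two-word sequences."""
    for tok in tokens:
        yield ("word", tok)
        # Assumption: words shorter than the prefix length yield no prefix.
        if len(tok) >= PREFIX_LEN:
            yield ("prefix", tok[:PREFIX_LEN])
    # Assumption: multi-words are limited to adjacent word pairs here.
    for a, b in zip(tokens, tokens[1:]):
        yield ("multiword", a + " " + b)


def rank_terms(docs, doc_index, top_n=5):
    """Rank the candidates of docs[doc_index] by TF-IDF over the corpus."""
    counts = [Counter(candidates(tokenize(d))) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts once per term
    target = counts[doc_index]
    total = sum(target.values())
    scores = {
        term: (freq / total) * math.log(len(docs) / doc_freq[term])
        for term, freq in target.items()
    }
    # Highest score first; ties broken by term for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Because tokenization relies only on Unicode letter classes and the scoring relies only on counts, nothing in the sketch depends on a particular language, which matches the language-independent framing of the abstract. Calling `rank_terms(docs, i)` on a list of document strings returns the top-scoring `(kind, term)` pairs for document `i`.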
Main file: 978-3-642-19170-1_11_Chapter.pdf (68.3 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01566554, version 1 (21-07-2017)

Cite

Luís Teixeira, Gabriel Lopes, Rita A. Ribeiro. Automatic Extraction of Document Topics. 2nd Doctoral Conference on Computing, Electrical and Industrial Systems (DoCEIS), Feb 2011, Costa de Caparica, Portugal. pp.101-108, ⟨10.1007/978-3-642-19170-1_11⟩. ⟨hal-01566554⟩