Automatic Extraction of Document Topics

Luís Teixeira; Gabriel Lopes; Rita A. Ribeiro

doi:10.1007/978-3-642-19170-1_11

Conference paper, 2011

Luís Teixeira

NOVA - Universidade Nova de Lisboa = NOVA University Lisbon

Gabriel Lopes

DI - Departamento de Informática

Rita A. Ribeiro

NOVA - Universidade Nova de Lisboa = NOVA University Lisbon

Abstract

A keyword or topic for a document is a word or multi-word (a sequence of 2 or more words) that by itself summarizes part of that document's content. In this paper we compare several statistics-based, language-independent methodologies for automatically extracting keywords. We rank words, multi-words, and word prefixes (of fixed length: 5 characters) using several similarity measures (some widely known and some newly coined), and we evaluate the results obtained as well as the agreement between evaluators. The experiments covered Portuguese, English and Czech.
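
As an illustration of the ranking idea described in the abstract, the following is a minimal sketch, not the authors' method. The abstract does not name the measures used, so TF-IDF is assumed here as a stand-in for a "widely known" statistic. The function names (`tokenize`, `candidates`, `rank_tfidf`), the toy corpus, the restriction of multi-words to two-word sequences, and the choice to skip words shorter than 5 characters when building prefixes are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters only (accented letters included),
    # so no language-specific resources are needed.
    return re.findall(r"[^\W\d_]+", text.lower())


def candidates(tokens, prefix_len=5):
    """Build the three kinds of candidate units for one document."""
    words = list(tokens)
    # Assumption: only 2-word sequences stand in for multi-words here.
    multiwords = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    # Assumption: words shorter than prefix_len yield no prefix.
    prefixes = [t[:prefix_len] for t in tokens if len(t) >= prefix_len]
    return {"word": words, "multiword": multiwords, "prefix": prefixes}


def rank_tfidf(docs_units, doc_index, top_n=5):
    """Rank the units of one document by TF-IDF against the whole corpus."""
    n_docs = len(docs_units)
    doc_freq = Counter()
    for units in docs_units:
        doc_freq.update(set(units))  # count each unit once per document
    counts = Counter(docs_units[doc_index])
    total = sum(counts.values())
    scores = {
        unit: (count / total) * math.log(n_docs / doc_freq[unit])
        for unit, count in counts.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "keyword extraction ranks keyword candidates in a document",
    "the weather in a city changes every day",
    "a city council met every day last week",
]
per_doc = [candidates(tokenize(text)) for text in corpus]
for kind in ("word", "multiword", "prefix"):
    units = [doc[kind] for doc in per_doc]
    print(kind, rank_tfidf(units, 0, top_n=3))
```

Each kind of unit is ranked separately, which mirrors the abstract's comparison of words, multi-words and prefixes. Any other statistical measure could be substituted for the scoring expression inside `rank_tfidf`.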

Domains

Computer Science [cs]

Main file

978-3-642-19170-1_11_Chapter.pdf (68.3 KB)

Origin : Files produced by the author(s)
Licence : CC BY 4.0 - Attribution

https://inria.hal.science/hal-01566554

Submitted on : Friday, July 21, 2017, 11:25:18 AM

Last modification on : Wednesday, November 10, 2021, 5:18:05 PM

Dates and versions

hal-01566554, version 1 (21-07-2017)

Identifiers

HAL Id : hal-01566554, version 1
DOI : 10.1007/978-3-642-19170-1_11

Cite

Luís Teixeira, Gabriel Lopes, Rita A. Ribeiro. Automatic Extraction of Document Topics. 2nd Doctoral Conference on Computing, Electrical and Industrial Systems (DoCEIS), Feb 2011, Costa de Caparica, Portugal. pp.101-108, ⟨10.1007/978-3-642-19170-1_11⟩. ⟨hal-01566554⟩
