Interpretable Topic Extraction and Word Embedding Learning Using Row-Stochastic DEDICOM

The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We apply a new row-stochastic variation of DEDICOM to the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method for efficiently training the constrained DEDICOM algorithm and evaluate its topic modeling and word embedding performance qualitatively.
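
The abstract describes the approach only at a high level, so the following is a minimal sketch rather than the authors' implementation. It assumes the standard DEDICOM form S ≈ A R Aᵀ, where S is an n×n pointwise mutual information matrix, A is an n×k loading matrix, and R is a k×k affinity matrix. The row-stochastic constraint is imposed here by a row-wise softmax over a free parameter matrix M, and training uses plain gradient descent with step halving. Both choices are assumptions made for illustration, as are all function names (`pmi_matrix`, `row_softmax`, `fit_dedicom`) and the handling of zero counts.

```python
import numpy as np

def pmi_matrix(cooc):
    """Pointwise mutual information from a square co-occurrence count matrix."""
    p_ij = cooc / cooc.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    # Assumption: pairs that never co-occur get PMI 0 instead of -inf.
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi

def row_softmax(M):
    """Map each row of M to a probability vector (non-negative, sums to 1)."""
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(S, M, R):
    """Squared Frobenius error of S ~ A R A^T with A = row_softmax(M)."""
    A = row_softmax(M)
    E = A @ R @ A.T - S
    loss = float((E ** 2).sum())
    grad_R = 2.0 * A.T @ E @ A
    grad_A = 2.0 * (E @ A @ R.T + E.T @ A @ R)
    # Backpropagate through the row-wise softmax.
    grad_M = A * (grad_A - (grad_A * A).sum(axis=1, keepdims=True))
    return loss, grad_M, grad_R

def fit_dedicom(S, k, steps=300, lr=0.05, seed=0):
    """Gradient descent; a step is kept only if it lowers the loss."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    M = 0.1 * rng.standard_normal((n, k))
    R = 0.1 * rng.standard_normal((k, k))
    loss, g_M, g_R = loss_and_grads(S, M, R)
    history = [loss]
    for _ in range(steps):
        M_new, R_new = M - lr * g_M, R - lr * g_R
        new_loss, g_M_new, g_R_new = loss_and_grads(S, M_new, R_new)
        if new_loss < loss:
            M, R, loss, g_M, g_R = M_new, R_new, new_loss, g_M_new, g_R_new
            history.append(loss)
        else:
            lr *= 0.5  # step too large: shrink it and retry
    return row_softmax(M), R, history
```

Under these assumptions, each row of the returned `A` can be read as a k-dimensional word embedding whose entries are non-negative and sum to one, so `A.argmax(axis=1)` gives a rough topic assignment per vocabulary word, while `R` describes how strongly the topics relate to each other.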

Main file

497121_1_En_22_Chapter.pdf (3.5 MB)

Origin : Files produced by the author(s)
Licence : CC BY 4.0 - Attribution

https://inria.hal.science/hal-03414746

Submitted on : Thursday, November 4, 2021, 3:58:20 PM

Last modification on : Friday, June 23, 2023, 4:24:04 PM

Long-term archiving on : Saturday, February 5, 2022, 7:10:30 PM

Dates and versions

hal-03414746, version 1 (04-11-2021)

Identifiers

HAL Id : hal-03414746, version 1
DOI : 10.1007/978-3-030-57321-8_22

Cite

Lars Hillebrand, David Biesner, Christian Bauckhage, Rafet Sifa. Interpretable Topic Extraction and Word Embedding Learning Using Row-Stochastic DEDICOM. 4th International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), Aug 2020, Dublin, Ireland. pp.401-422, ⟨10.1007/978-3-030-57321-8_22⟩. ⟨hal-03414746⟩