Short Text Feature Extension Based on Improved Frequent Term Sets - Intelligent Information Processing VIII
Conference Papers, Year: 2016

Abstract

This paper proposes a short text feature extension algorithm based on improved frequent term sets. Support and confidence are computed to extract frequent term sets whose terms share the same category tendency. Correlation-based frequent term sets are then defined to further extend the term set. Meanwhile, information gain is incorporated into traditional TF-IDF so that the weights better express category distribution information and the weight of each word for each category is enhanced. All term pairs with external relations are extracted, and the frequent term set is expanded accordingly. Finally, a word similarity matrix is constructed from the frequent term set, and symmetric non-negative matrix factorization is applied to extend the feature space. Experiments show that the resulting short text model improves the performance of short text clustering.
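To make the support and confidence step concrete, the following is a minimal Python sketch on a toy corpus. It uses the standard association-rule definitions (support as the fraction of documents containing a term pair, confidence as the conditional co-occurrence rate); the abstract does not give the paper's exact formulas or thresholds, so the function names `frequent_term_pairs` and `confidence`, the `min_support` parameter, and the sample documents are all illustrative assumptions and not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_term_pairs(docs, min_support=0.5):
    """Return {pair: support} for term pairs whose document-level
    support is at least min_support (hypothetical threshold)."""
    n = len(docs)
    pair_counts = Counter()
    for doc in docs:
        # Count each pair once per document, in a canonical (sorted) order.
        terms = sorted(set(doc))
        pair_counts.update(combinations(terms, 2))
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


def confidence(docs, antecedent, consequent):
    """conf(antecedent -> consequent): among documents that contain the
    antecedent term, the fraction that also contain the consequent."""
    with_antecedent = [d for d in docs if antecedent in d]
    if not with_antecedent:
        return 0.0
    return sum(consequent in d for d in with_antecedent) / len(with_antecedent)


# Toy short texts, already tokenized (assumed data for illustration only).
docs = [
    ["apple", "iphone", "price"],
    ["apple", "iphone", "release"],
    ["apple", "fruit", "price"],
    ["iphone", "release"],
]

pairs = frequent_term_pairs(docs, min_support=0.5)
conf_release_iphone = confidence(docs, "release", "iphone")
conf_apple_iphone = confidence(docs, "apple", "iphone")
```

On this toy data, three pairs reach the 0.5 support threshold, and "release" implies "iphone" with higher confidence than "apple" does. A filter of this kind is one plausible way to keep only strongly associated term pairs before building a word similarity matrix.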
Origin: Files produced by the author(s)

Dates and versions

hal-01614992 , version 1 (11-10-2017)

Cite

Huifang Ma, Lei Di, Xiantao Zeng, Li Yan, Yuyi Ma. Short Text Feature Extension Based on Improved Frequent Term Sets. 9th International Conference on Intelligent Information Processing (IIP), Nov 2016, Melbourne, VIC, Australia. pp.169-178, ⟨10.1007/978-3-319-48390-0_18⟩. ⟨hal-01614992⟩