Building a Knowledge Based Summarization System for Text Data Mining - Machine Learning and Knowledge Extraction
Conference Papers, Year: 2018

Building a Knowledge Based Summarization System for Text Data Mining

Andrey Timofeyev
  • Function : Author
Ben Choi
  • Function : Author

Abstract

This paper details the construction of a knowledge based automatic summarization system for mining text data. The system mines text in documents and webpages to create abstractive summaries by generalizing new concepts, deriving main topics, and creating new sentences. It draws on the domain knowledge provided by the Cyc development platform, which consists of the world’s largest knowledge base and one of the most powerful inference engines. The system extracts syntactic structures and semantic features by employing natural language processing techniques together with the Cyc knowledge base and reasoning engine. It creates a summary of the given documents in three stages: knowledge acquisition, knowledge discovery, and knowledge representation for human readers. Knowledge acquisition derives the syntactic structure of each sentence in the documents and maps the words and their syntactic relationships into the Cyc knowledge base. Knowledge discovery abstracts novel concepts and derives the main topics of the documents by exploring the ontology of the mapped concepts and by clustering the concepts. Knowledge representation creates new English sentences to summarize the documents. The system has been implemented and integrated with the Cyc knowledge based system. The implementation encodes a process consisting of seven stages: syntactic analysis, mapping words to Cyc, concept propagation, concept weights and relations accumulation, topic derivation, subject identification, and new sentence generation. The implementation has been tested on various documents and webpages. The test performance data suggest that such a system could benefit from running on parallel and distributed computing platforms. The test results showed that the system is capable of creating new sentences that include abstracted concepts not explicitly mentioned in the original documents and that contain information synthesized from different parts of the documents to compose a summary.
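
The middle stages named in the abstract (mapping words to concepts, concept propagation, weight accumulation, and topic derivation) can be illustrated with a minimal sketch. It is not the authors' implementation and does not call Cyc: the ontology, the lexicon, the function names, and the coverage threshold are all hypothetical stand-ins chosen for illustration. Syntactic analysis, subject identification, and sentence generation are omitted.

```python
from collections import Counter

# Hypothetical toy ontology (child concept -> parent concept).
# It stands in for the Cyc ontology; it is NOT Cyc data or the Cyc API.
TOY_ONTOLOGY = {
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Mammal": "Animal",
    "Sparrow": "Bird",
    "Bird": "Animal",
}

# Hypothetical word -> concept lexicon (the "mapping words" stage).
TOY_LEXICON = {
    "dog": "Dog", "dogs": "Dog",
    "cat": "Cat", "cats": "Cat",
    "sparrow": "Sparrow", "sparrows": "Sparrow",
}


def map_words(sentence):
    """Map each known word of a sentence to a concept in the toy lexicon."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]


def ancestors(concept):
    """Return the chain of ancestors of a concept, nearest first."""
    chain = []
    while concept in TOY_ONTOLOGY:
        concept = TOY_ONTOLOGY[concept]
        chain.append(concept)
    return chain


def accumulate_weights(sentences):
    """Count concept mentions and propagate each count to all ancestors."""
    weights = Counter()
    mentions = 0
    for sentence in sentences:
        for concept in map_words(sentence):
            mentions += 1
            weights[concept] += 1
            for parent in ancestors(concept):
                weights[parent] += 1
    return weights, mentions


def derive_topic(sentences, coverage=0.6):
    """Pick the most specific concept covering at least `coverage` of mentions.

    The coverage rule is an assumption made for this sketch.
    """
    weights, mentions = accumulate_weights(sentences)
    if mentions == 0:
        return None
    candidates = [c for c, w in weights.items() if w / mentions >= coverage]
    # More ancestors means deeper in the ontology, hence more specific.
    return max(candidates, key=lambda c: len(ancestors(c)))


if __name__ == "__main__":
    text = ["Dogs bark.", "Cats purr.", "A dog chased a cat."]
    print(derive_topic(text))
```

In this toy run the derived topic is a concept that never appears in the input sentences, which mirrors the abstract's point about abstracted concepts not explicitly mentioned in the original documents.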
Origin: Files produced by the author(s)

Dates and versions

hal-02060037, version 1 (07-03-2019)

Cite

Andrey Timofeyev, Ben Choi. Building a Knowledge Based Summarization System for Text Data Mining. 2nd International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), Aug 2018, Hamburg, Germany. pp.118-133, ⟨10.1007/978-3-319-99740-7_8⟩. ⟨hal-02060037⟩