%0 Conference Proceedings
%T Building a Knowledge Based Summarization System for Text Data Mining
%+ Computer Science [Louisiana]
%A Timofeyev, Andrey
%A Choi, Ben
%Z Part 1: MAKE-Main Track
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 2nd International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE)
%C Hamburg, Germany
%Y Andreas Holzinger
%Y Peter Kieseberg
%Y A Min Tjoa
%Y Edgar Weippl
%I Springer International Publishing
%3 Machine Learning and Knowledge Extraction
%V LNCS-11015
%P 118-133
%8 2018-08-27
%D 2018
%R 10.1007/978-3-319-99740-7_8
%K Data mining
%K Text summarization
%K Artificial intelligence
%K Knowledge extraction
%K Knowledge-based systems
%Z Computer Science [cs]
%Z Humanities and Social Sciences/Library and information sciences
%Z Conference papers
%X This paper provides details on building a knowledge based automatic summarization system for mining text data. The knowledge based system mines text data from documents and webpages to create abstractive summaries by generalizing new concepts, deriving main topics, and creating new sentences. The system makes use of the domain knowledge provided by the Cyc development platform, which consists of the world’s largest knowledge base and one of the most powerful inference engines. The system extracts syntactic structures and semantic features by employing natural language processing techniques together with the Cyc knowledge base and reasoning engine. The system creates a summary of the given documents in three stages: knowledge acquisition, knowledge discovery, and knowledge representation for human readers. The knowledge acquisition stage derives the syntactic structure of each sentence in the documents and maps the words and their syntactic relationships into the Cyc knowledge base. The knowledge discovery stage abstracts novel concepts and derives the main topics of the documents by exploring the ontology of the mapped concepts and by clustering the concepts. The knowledge representation stage creates new English sentences to summarize the documents. This system has been implemented and integrated with the Cyc knowledge based system. The implementation encodes a process consisting of seven stages: syntactic analysis, mapping words to Cyc, concept propagation, concept weights and relations accumulation, topic derivation, subject identification, and new sentence generation. The implementation has been tested on various documents and webpages. The test performance data suggests that such a system could benefit from running on parallel and distributed computing platforms. The test results showed that the system is capable of creating new sentences that include abstracted concepts not explicitly mentioned in the original documents and that contain information synthesized from different parts of the documents to compose a summary.
%G English
%Z TC 5
%Z TC 8
%Z TC 12
%Z WG 8.4
%Z WG 8.9
%Z WG 12.9
%2 https://inria.hal.science/hal-02060037/document
%2 https://inria.hal.science/hal-02060037/file/472936_1_En_8_Chapter.pdf
%L hal-02060037
%U https://inria.hal.science/hal-02060037
%~ SHS
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-TC5
%~ IFIP-WG
%~ IFIP-TC12
%~ IFIP-TC8
%~ IFIP-WG8-4
%~ IFIP-WG8-9
%~ IFIP-CD-MAKE
%~ IFIP-WG12-9
%~ IFIP-LNCS-11015