Techniques of Czech Language Lossless Text Compression - Computer Information Systems and Industrial Management (CISIM 2016)
Conference Papers, Year: 2016

Techniques of Czech Language Lossless Text Compression

Abstract

Lossless compression of natural-language text can achieve a better compression ratio by exploiting linguistic and grammatical properties extracted through analysis of the text. This work deals with the use of word order, word categories, and grammatical rules in sentences and sentence units in the Czech language. It exploits grammatical properties of Czech that differ from those of, for example, English. Further, an algorithm is designed that searches for similarities in the analyzed sentence structures and processes them further into the final compressed file. The sentence units are analyzed with a special tool that allows parsing on multiple levels.
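To make the idea of reusing sentence structure concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) in place of a real Czech parser, and the function names `encode` and `decode` are illustrative only. Each sentence is stored as the id of its word-category pattern plus, for every word, an index into a per-category dictionary, so a pattern shared by several sentences is stored only once.

```python
# Hypothetical toy lexicon: word form -> word category. A real system would
# obtain categories from a parser for Czech; this table is an assumption.
TOY_LEXICON = {
    "pes": "NOUN", "psa": "NOUN", "kočku": "NOUN", "muž": "NOUN", "muže": "NOUN",
    "honí": "VERB", "vidí": "VERB",
    "malý": "ADJ", "velký": "ADJ",
}


def encode(sentences):
    """Split sentences into shared category patterns and per-category word indices."""
    patterns = {}  # category pattern (tuple of tags) -> pattern id
    vocab = {}     # category -> {word: index within that category}
    encoded = []
    for words in sentences:
        # Unknown words fall into the catch-all category "X".
        tags = tuple(TOY_LEXICON.get(w, "X") for w in words)
        pid = patterns.setdefault(tags, len(patterns))
        indices = []
        for word, tag in zip(words, tags):
            cat_vocab = vocab.setdefault(tag, {})
            indices.append(cat_vocab.setdefault(word, len(cat_vocab)))
        encoded.append((pid, indices))
    return patterns, vocab, encoded


def decode(patterns, vocab, encoded):
    """Rebuild the original sentences exactly from the three tables."""
    pattern_by_id = {pid: tags for tags, pid in patterns.items()}
    word_by_index = {
        cat: {i: w for w, i in words.items()} for cat, words in vocab.items()
    }
    return [
        [word_by_index[tag][i] for tag, i in zip(pattern_by_id[pid], indices)]
        for pid, indices in encoded
    ]


if __name__ == "__main__":
    sentences = [
        ["malý", "pes", "honí", "kočku"],
        ["velký", "muž", "vidí", "psa"],
        ["pes", "vidí", "muže"],
    ]
    patterns, vocab, encoded = encode(sentences)
    print(len(patterns), "distinct patterns for", len(sentences), "sentences")
    print(decode(patterns, vocab, encoded) == sentences)
```

In this sketch the first two sentences share one pattern (adjective, noun, verb, noun), and each word index only has to distinguish words within its own category. A practical compressor would still need an entropy coder for the resulting streams, which is outside the scope of the sketch.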
Origin: Files produced by the author(s)

Dates and versions

hal-01637512, version 1 (17-11-2017)

Cite

Jiří Ševčík, Jiří Dvorský. Techniques of Czech Language Lossless Text Compression. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. pp.265-276, ⟨10.1007/978-3-319-45378-1_24⟩. ⟨hal-01637512⟩