Techniques of Czech Language Lossless Text Compression

Jiří Ševčík; Jiří Dvorský

doi:10.1007/978-3-319-45378-1_24

Conference Papers Year : 2016

Jiří Ševčík

Function : Author
PersonId : 1023072

IT4Innovations - National Supercomputing Center [Ostrava]

Jiří Dvorský

Function : Author
PersonId : 994861

IT4Innovations - National Supercomputing Center [Ostrava]

Abstract

Linguistic and grammatical properties extracted by text analysis can improve the compression ratio of lossless compression of natural-language text. This work uses word order, word categories, and grammatical rules within sentences and sentence units of the Czech language, exploiting grammatical properties that distinguish Czech from, for example, English. It also presents an algorithm that searches for similarities among the analyzed sentence structures and processes them further into the final compressed file. Sentence units are analyzed with a special tool that supports parsing on several levels.
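
The idea of reusing similar sentence structures can be illustrated with a minimal sketch. It is not the authors' algorithm or parsing tool: the lexicon `TOY_LEXICON`, the coarse category names, and the functions `tag`, `encode`, and `decode` are all hypothetical, chosen only for illustration. Each sentence is reduced to a template of word categories, identical templates are stored once, and each word is coded as an index into the dictionary of its own category. The round trip is exact only for lowercased, single-space-separated input.

```python
# Hypothetical toy lexicon mapping Czech word forms to coarse categories.
# A real system would use a morphological analyser; this is an assumption.
TOY_LEXICON = {
    "pes": "NOUN", "psa": "NOUN", "kočka": "NOUN", "kočku": "NOUN", "muž": "NOUN",
    "vidí": "VERB", "honí": "VERB",
    "malý": "ADJ", "velký": "ADJ",
}


def tag(sentence):
    """Return (words, template); template is the tuple of word categories."""
    words = sentence.split()
    template = tuple(TOY_LEXICON.get(w, "UNK") for w in words)
    return words, template


def encode(sentences):
    """Store each distinct template once and code words per category."""
    templates, template_index = [], {}
    vocab = {}        # category -> words in first-seen order
    vocab_index = {}  # (category, word) -> position in vocab[category]
    coded = []        # per sentence: (template id, word indices)
    for sentence in sentences:
        words, template = tag(sentence)
        if template not in template_index:
            template_index[template] = len(templates)
            templates.append(template)
        indices = []
        for category, word in zip(template, words):
            key = (category, word)
            if key not in vocab_index:
                vocab_index[key] = len(vocab.setdefault(category, []))
                vocab[category].append(word)
            indices.append(vocab_index[key])
        coded.append((template_index[template], indices))
    return templates, vocab, coded


def decode(templates, vocab, coded):
    """Rebuild sentences: the template selects the dictionary for each word."""
    return [
        " ".join(vocab[cat][i] for cat, i in zip(templates[tid], indices))
        for tid, indices in coded
    ]


sentences = ["malý pes honí kočku", "velký muž vidí psa", "kočka vidí psa"]
templates, vocab, coded = encode(sentences)
restored = decode(templates, vocab, coded)
```

In this sketch the first two sentences share one template, so only two templates are stored for three sentences. The template ids and per-category indices would then be passed to an entropy coder, which is omitted here.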

Main file

419526_1_En_24_Chapter.pdf (178.46 KB)

Origin	Files produced by the author(s)
licence	CC BY 4.0 - Attribution

https://inria.hal.science/hal-01637512

Submitted on : Friday, November 17, 2017, 3:45:51 PM

Last modification on : Saturday, June 1, 2019, 11:34:02 AM

Long-term archiving on : Sunday, February 18, 2018, 2:36:28 PM

Dates and versions

hal-01637512 , version 1 (17-11-2017)

Identifiers

HAL Id : hal-01637512 , version 1
DOI : 10.1007/978-3-319-45378-1_24

Cite

Jiří Ševčík, Jiří Dvorský. Techniques of Czech Language Lossless Text Compression. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. pp.265-276, ⟨10.1007/978-3-319-45378-1_24⟩. ⟨hal-01637512⟩
