Prayat Le-Wan, Chouvalit Khancome
This research article introduces the TCEVP and TCEVPaA data compression algorithms. Both employ a novel dictionary design that uses word-construction patterns, based on vowel and article formats, to reference words during compression and decompression. The compressed data is stored in a bit format before being written to the compressed file. In theoretical experiments on vowel patterns of 1 to 12 characters, the per-word bit compression percentage ranges from 37.5% to 90.38% with ASCII encoding, from 68.80% to 95.20% with Unicode, and from 84.40% to 97.60% with Long Unicode. For mixed-format text ranging from 1 kiloword to 1 teraword, encoded with the ASCII, Unicode, big-endian Unicode, and Long Unicode schemes, compression rates range from 82.42% to 95.61%, with compression ratios of 5.69 to 22.80.
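The reported compression rates and compression ratios can be related by a short calculation. The sketch below assumes the conventional definitions, which the abstract does not state explicitly: ratio = original size / compressed size, and rate = (1 - compressed size / original size) x 100. The function names are hypothetical and are not part of TCEVP or TCEVPaA.

```python
def rate_from_ratio(ratio: float) -> float:
    """Convert a compression ratio (original / compressed) to a
    compression rate in percent, assuming rate = (1 - 1/ratio) * 100."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return (1.0 - 1.0 / ratio) * 100.0


def ratio_from_rate(rate_percent: float) -> float:
    """Inverse conversion: compression rate in percent to ratio."""
    if not 0.0 <= rate_percent < 100.0:
        raise ValueError("rate must be in [0, 100)")
    return 1.0 / (1.0 - rate_percent / 100.0)


if __name__ == "__main__":
    # Ratios quoted in the abstract; under the assumed definitions they
    # land close to the quoted rates of 82.42% and 95.61%.
    for ratio in (5.69, 22.80):
        print(f"ratio {ratio:.2f} -> rate {rate_from_ratio(ratio):.2f}%")
```

Under these assumed definitions, the two ends of the reported ratio range map to roughly the two ends of the reported rate range, which suggests the figures describe the same measurements in two forms.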
Maxime Crochemore, Thierry Lecroq
Mikhail J. Atallah, Marina Blanton