Compression algorithm principle
The compression algorithm works as follows: Data storage in solidDB® disk-based tables is based on a data structure called a B+ tree. In compression, consecutive nodes generated in a node split that takes place during the solidDB® merge operation are compressed to fit into a single compressed node and disk block. Compressing the node helps to avoid splitting the block. When a compressed node is retrieved from disk, it is stored in the solidDB® cache in compressed format; it is not decompressed in the cache. This leads to either a lower cache requirement or a higher cache hit rate than with uncompressed data. solidDB® performs all compression and decompression based only on the contents of the disk block. Compression of a block is fully self-contained and does not refer to any external dictionary. Because of this, there are no compression-related administrative tasks such as compression dictionary refresh operations.
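The sketch below only illustrates the decision described above; the names are hypothetical and Python's zlib merely stands in for the server's internal block compressor, so this is not solidDB® code. If the two nodes produced by a split compress into one disk block, a single compressed block can be written instead of two uncompressed ones.

import zlib

BLOCK_SIZE = 8192  # assumed disk block size; the real value is configurable

def try_compress_split(node_a: bytes, node_b: bytes) -> list[bytes]:
    # Attempt to fit two consecutive nodes from a split into one block.
    # zlib is only a stand-in for the server's internal block compressor.
    compressed = zlib.compress(node_a + node_b)
    if len(compressed) <= BLOCK_SIZE:
        return [compressed]        # one compressed node and disk block
    return [node_a, node_b]        # the split proceeds with two blocks

blocks = try_compress_split(b"aaaa" * 1500, b"aaaa" * 1500)
print(len(blocks))                 # 1 when the repetitive payload compresses well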
solidDB® compression takes place inside the merge task of the database. The table-specific compression setting defines whether the merge compresses the nodes or not. The merge processes only nodes that have been touched by recent write operations; other nodes in the tree are not touched. Hence, only data that has been touched after the compression property has been set is in compressed format.
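As a rough model of that behavior, the hypothetical sketch below marks only recently written nodes according to the table setting; nodes that the merge does not visit keep their existing format. It is an assumption-laden illustration, not the actual merge implementation.

from dataclasses import dataclass

@dataclass
class Node:
    data: bytes
    compressed: bool = False

def merge_pass(touched_nodes, table_compression_on):
    # The merge visits only nodes touched by recent writes; nodes that are
    # not visited keep whatever format they already have on disk.
    for node in touched_nodes:
        node.compressed = table_compression_on

old = Node(b"row written before compression was enabled for the table")
new = Node(b"row written after compression was enabled for the table")
merge_pass([new], table_compression_on=True)
print(old.compressed, new.compressed)   # False True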
The impact of compression depends on the nature of the data. If a significant amount of the data consists of recurring patterns (partly or fully similar column values), it can be expected to compress significantly. If the data compresses, the disk footprint of the database file is smaller. The file I/O can also be expected to decrease for two reasons: because of compression, each individual disk block holds more rows, and the collection of disk blocks stored in the solidDB® cache holds more rows. If file I/O has been a bottleneck in the system, compression can be expected to improve system performance.
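The back-of-the-envelope estimate below uses assumed values (cache size, rows per block, a 2:1 compression ratio) rather than solidDB® measurements; it only illustrates why a fixed-size cache holds more rows when blocks stay compressed.

cache_blocks = 10_000                 # assumed number of blocks in the cache
rows_per_uncompressed_block = 50      # assumed rows that fit in one block
compression_ratio = 2.0               # assumed 2:1 compression of the data

rows_cached_without_compression = cache_blocks * rows_per_uncompressed_block
rows_cached_with_compression = int(rows_cached_without_compression * compression_ratio)

print(rows_cached_without_compression)  # 500000 rows fit in the cache
print(rows_cached_with_compression)     # 1000000 rows fit in the same cache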
See also
Data compression