Compression algorithm principle
Data storage in solidDB disk-based tables is based on the B+ tree algorithm: when a page fills up with data, the page is split into two pages, each containing half the data. Compressing the data in a page allows more data to be stored in a single page and thus reduces the frequency of page splits.
Compression takes place only when a page reaches capacity. After compression, more (uncompressed) data can be added to the page until it is full again, at which point that uncompressed data is compressed as well. When a single page can no longer hold all the compressed data, the page is split into two pages, each containing half of the compressed data.
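The following Python sketch is a simplified model of this fill-compress-split cycle, not solidDB's actual implementation: the LeafPage class, the 8 KB PAGE_SIZE constant, and the use of zlib are illustrative assumptions that stand in for solidDB's internal page format and compression algorithm.

  import zlib

  PAGE_SIZE = 8 * 1024   # illustrative only; in solidDB the page size comes from IndexFile.BlockSize

  class LeafPage:
      """Toy model of a B+ tree leaf page that is compressed only when it fills up."""

      def __init__(self, rows=()):
          self.rows = list(rows)   # logical row images, kept so the page can be split
          self.packed = b""        # compressed image of self.rows (empty until first compression)
          self.tail = []           # rows added since the last compression, stored as-is

      def _used(self):
          return len(self.packed) + sum(len(r) for r in self.tail)

      def insert(self, row):
          """Return the page(s) that replace this one: one page, or two after a split."""
          if self._used() + len(row) <= PAGE_SIZE:
              self.rows.append(row)
              self.tail.append(row)                        # room left: store the row uncompressed
              return [self]
          self.rows.append(row)
          packed = zlib.compress(b"".join(self.rows))      # page full: compress everything on it
          if len(packed) <= PAGE_SIZE:
              self.packed, self.tail = packed, []          # compression freed enough space
              return [self]
          mid = len(self.rows) // 2                        # even compressed data no longer fits:
          left = LeafPage(self.rows[:mid])                 # split into two pages, each holding
          right = LeafPage(self.rows[mid:])                # roughly half of the compressed data
          left.packed = zlib.compress(b"".join(left.rows))
          right.packed = zlib.compress(b"".join(right.rows))
          return [left, right]

  # Append-only load into the rightmost page: repetitive rows compress well,
  # so far fewer pages (and page splits) are needed than without compression.
  pages = [LeafPage()]
  for i in range(5000):
      pages[-1:] = pages[-1].insert(f"order-{i:06d},WAREHOUSE-{i % 4},SHIPPED".encode())
  print(len(pages), "pages in use")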
When compressed data is retrieved from disk, it is stored in the solidDB cache in compressed format, which results in either a lower cache requirement or higher cache hit rates than with uncompressed data.
solidDB performs all compression and decompression at the page level. The compression of a page is fully self-contained and does not refer to any external dictionary. Because of this, there are no compression-related administrative tasks such as compression dictionary refresh operations.
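As a rough illustration of what self-contained means here, the sketch below compresses two page images independently with zlib (used only as a stand-in, since this section does not specify solidDB's page-level algorithm): each compressed page can be restored from its own bytes alone, so there is no shared dictionary that could ever need rebuilding or refreshing.

  import zlib

  def compress_page(page_bytes):
      # Each page is compressed on its own; the output carries everything
      # needed to restore the page, with no reference to other pages.
      return zlib.compress(page_bytes)

  page_a = b"alpha,1;alpha,2;alpha,3;" * 100
  page_b = b"beta,7;beta,8;beta,9;" * 100
  stored_a, stored_b = compress_page(page_a), compress_page(page_b)

  # Either page can be read back alone -- for example straight from the page
  # cache -- without consulting the other page or any external dictionary.
  assert zlib.decompress(stored_b) == page_b
  assert zlib.decompress(stored_a) == page_a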
solidDB compression takes place inside the merge task of the database. The table-specific compression setting defines whether the merge task compresses pages. The merge task processes only pages that were updated by recent write operations, so only pages that are updated after the compression property is set become targets for compression.
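The selection rule can be pictured with the following conceptual sketch; DirtyPage, merge_pass, and compress_enabled are hypothetical names invented for the illustration and do not correspond to solidDB internals or SQL syntax.

  import zlib
  from dataclasses import dataclass

  @dataclass
  class DirtyPage:
      table_name: str          # table the page belongs to
      data: bytes              # page image produced by a recent write
      packed: bytes = b""      # compressed image, filled in by the merge pass

  def merge_pass(dirty_pages, compress_enabled):
      """Compress only recently written pages whose table currently has the
      compression property switched on; all other pages are left untouched."""
      for page in dirty_pages:
          if compress_enabled(page.table_name):
              page.packed = zlib.compress(page.data)

  # ORDERS has compression enabled, CUSTOMERS does not, so only the ORDERS
  # page written after the property was set is compressed by the merge pass.
  merge_pass(
      [DirtyPage("ORDERS", b"row data " * 500), DirtyPage("CUSTOMERS", b"row data " * 500)],
      compress_enabled=lambda table: table == "ORDERS",
  )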
The impact of compression depends on the nature of the data. If a significant amount of the data consists of recurring patterns (for example, the same column values), the data should compress well and the disk footprint of the database file should be noticeably smaller.
Compression should also reduce file I/O, because both the pages on disk and the pages stored in the solidDB cache have capacity for more rows. If file I/O is a bottleneck in the system, compression should improve system performance.
The physical disk block size (as defined with the IndexFile.BlockSize parameter) defines the page size of the B+ tree. Because the page is the unit of compression, this setting can have a significant impact on the compression ratio of the database; see the IndexFile section.
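For example, the block size could be set in the [IndexFile] section of the solid.ini configuration file; the value 32768 below is only an example, and the block size is normally fixed when the database file is created, so check the IndexFile section of the configuration reference for the valid range, the default, and the procedure before changing it.

  [IndexFile]
  BlockSize=32768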