Varlen Encoding

From Free Pascal wiki
Revision as of 03:05, 23 March 2019 by CuriousKit (talk | contribs) (Information on the design of Varlen Encoding)

Varlen Encoding (short for "variable length") is a way of storing a QWord so that small values take as few bytes as possible.

Inspiration

The inspiration and design of Varlen Encoding came from two sources:

Some data fields, such as sequential indices and string lengths, are statistically more likely to contain small values even though their upper limit may be that of a LongWord or a QWord. If many hundreds of these values are stored at full width, most of the bytes that make up each integer are zero, and the wasted space can be problematic on media with limited capacity. The design of Varlen Encoding is an attempt to reduce this waste with minimal performance impact.

Format

A Varlen-encoded integer (henceforth just called a Varlen) takes between 1 and 9 bytes to store. The first byte, known as the Lead Byte, encodes both a byte count and the most significant bits of the number; the remaining bits follow in the Data Bytes in big-endian order. The bit pattern of the Lead Byte dictates the number of Data Bytes that follow:

Lead Byte | Value Range | Description
0####### | $00..$7F | No data bytes; the #'s encode a 7-bit value between $00 and $7F (0 to 127)
10###### | $0080..$407F | 1 data byte follows; the #'s encode a 14-bit value offset by $80 (add $80 to the bits to obtain the stored integer)
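
The two patterns listed above can be sketched in Free Pascal as follows. This is a minimal illustration rather than a reference implementation: the names TVarlenBuf, EncodeVarlen and DecodeVarlen are hypothetical, only the 1-byte and 2-byte forms are handled, and the sketch assumes that the Lead Byte carries the top 6 bits of the 14-bit field while the single Data Byte carries the low 8 bits, in line with the big-endian ordering described above.

```pascal
program VarlenSketch;

{$mode objfpc}

type
  { Hypothetical buffer type; a Varlen occupies at most 9 bytes. }
  TVarlenBuf = array[0..8] of Byte;

{ Writes Value into Buf and returns the number of bytes used,
  or 0 if Value needs a longer form than this sketch covers. }
function EncodeVarlen(Value: QWord; out Buf: TVarlenBuf): Integer;
var
  Bits: QWord;
begin
  FillChar(Buf, SizeOf(Buf), 0);
  if Value <= $7F then
  begin
    { 0#######: the value itself is the Lead Byte }
    Buf[0] := Byte(Value);
    Result := 1;
  end
  else if Value - $80 <= $3FFF then
  begin
    { 10######: remove the $80 offset, then split the 14 bits as 6 + 8 }
    Bits := Value - $80;
    Buf[0] := $80 or Byte(Bits shr 8);  { top 6 bits in the Lead Byte }
    Buf[1] := Byte(Bits and $FF);       { low 8 bits in the Data Byte }
    Result := 2;
  end
  else
    Result := 0;
end;

{ Reads a Varlen from Buf into Value and returns the number of bytes
  consumed, or 0 if the Lead Byte uses a pattern not covered here. }
function DecodeVarlen(const Buf: TVarlenBuf; out Value: QWord): Integer;
begin
  Value := 0;
  if (Buf[0] and $80) = 0 then
  begin
    Value := Buf[0];
    Result := 1;
  end
  else if (Buf[0] and $C0) = $80 then
  begin
    { rebuild the 14-bit field, then add the $80 offset back }
    Value := ((QWord(Buf[0] and $3F) shl 8) or Buf[1]) + $80;
    Result := 2;
  end
  else
    Result := 0;
end;

var
  Buf: TVarlenBuf;
  V: QWord;
  Len: Integer;
begin
  { 300 - $80 = 172 = $00AC, so the encoded bytes are $80 $AC }
  Len := EncodeVarlen(300, Buf);
  WriteLn(Len, ' bytes: $', HexStr(Buf[0], 2), ' $', HexStr(Buf[1], 2));
  DecodeVarlen(Buf, V);
  WriteLn('decoded: ', V);
end.
```

Because of the $80 offset, every value has exactly one valid encoding in this scheme: a value that fits in the Lead Byte alone cannot also be written in the 2-byte form.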