pykafka.utils.compression

Author: Keith Bourgoin

pykafka.utils.compression.encode_gzip(buff)

Encode a buffer using gzip

pykafka.utils.compression.decode_gzip(buff)

Decode a buffer using gzip

pykafka.utils.compression.encode_snappy(buff, xerial_compatible=False, xerial_blocksize=32768)

Encode a buffer using snappy

If xerial_compatible is set, the buffer is encoded in a fashion compatible with the xerial snappy library.

The block size (xerial_blocksize) controls how frequently the blocking occurs. 32k is the default in the xerial library.

The format is as follows: +————-+————+————–+————+————–+ | Header | Block1 len | Block1 data | Blockn len | Blockn data | |-------------+------------+--------------+------------+--------------| | 16 bytes | BE int32 | snappy bytes | BE int32 | snappy bytes | +————-+————+————–+————+————–+

It is important to note that blocksize is the amount of uncompressed data presented to snappy at each block, whereas blocklen is the number of bytes that will be present in the stream.

Adapted from kafka-python https://github.com/mumrah/kafka-python/pull/127/files

pykafka.utils.compression.decode_snappy(buff)

Decode a buffer using Snappy

If xerial is found to be in use, the buffer is decoded in a fashion compatible with the xerial snappy library.

Adapted from kafka-python https://github.com/mumrah/kafka-python/pull/127/files

pykafka.utils.compression.encode_lz4_old_kafka(buff)

Encode buff for 0.8/0.9 brokers – requires an incorrect header checksum.

Reference impl: https://github.com/dpkp/kafka-python/blob/a00f9ead161e8b05ac953b460950e42fa0e0b7d6/kafka/codec.py#L227

pykafka.utils.compression.decode_lz4_old_kafka(buff)

Decode buff for 0.8/0.9 brokers

Reference impl: https://github.com/dpkp/kafka-python/blob/a00f9ead161e8b05ac953b460950e42fa0e0b7d6/kafka/codec.py#L258