1

I am trying to work out how to estimate the cost in bits for encoding a bitmask using arithmetic encoding.

For example, I have an array of 4096 bits. Only 30 of which can be set at any given time, so the probability table looks like the following...

flag states probability
0 4066 0.993
1 30 0.007

Assuming a non adaptive model how does one estimate the cost in bits of encoding the above array?

I have searched for the last couple of days and haven't found a way to do this.

  • 1
    The entropy using $$\mathrm{H}(X)=-\sum_{i=1}^{n} \mathrm{P}\left(x_{i}\right) \log \mathrm{P}\left(x_{i}\right)$$ gives 0.0417084 nats (0.0601724 bits if log base 2 is used) per symbol, so 177.387 nats (246.466 bits). Of course you can do much better than this if you know there's a pattern to the data – flinty Sep 01 '21 at 09:49
  • Thanks. I was thinking it would be closer to the average distance which is around 213 bits. However would you mind explaining the equation above? (btw. the bits are essentially random) – DeveloperChris Sep 02 '21 at 02:59
  • It's from information theory - you'll need to read that. Also any good text on data compression would cover this. The $-\log(\text{P}(x_i))$ is the surprisal of $x_i$ and is the information needed to encode it (measured in nats, bits, trits or whatever unit you want depending on the base of the logarithm) - the sum and probability weighted log is really just an expectation of the surprisal, so you can think of it as the average number of bits needed to represent an outcome of the $X_1, X_2,...,X_n$ random variables. – flinty Sep 02 '21 at 09:23
  • Thank you. I found this site which gave me the code https://rosettacode.org/wiki/Entropy#C.2B.2B to calculate the bits required it gave me ~255.19 bits with a bits per symbol of ~0.0625. If you would like to submit your response as an answer I can accept it? – DeveloperChris Sep 03 '21 at 08:29

0 Answers0