
Here is a well-known formula you should recognize from Wikipedia's page on entropy.

$$H(X) = -\sum_{i=1}^n {\mathrm{P}(x_i) \log_b \mathrm{P}(x_i)}$$

The article makes these definitions:

  • $H$ is entropy
  • $X$ is your discrete random variable
  • $P$ is the probability mass function
  • $b$ is the base of the logarithm (usually 2)

The index $i$ is defined by the lower limit of the summation.
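A minimal Python sketch of the formula as written, assuming the pmf is supplied as a list `probs` holding $\mathrm{P}(x_1), \ldots, \mathrm{P}(x_n)$. In this reading $n$ is simply the length of that list. The function name `entropy_indexed` is hypothetical. Zero-probability terms are skipped, following the usual convention $0 \log 0 = 0$.

```python
import math

def entropy_indexed(probs, b=2):
    """Entropy H(X) = -sum_{i=1}^{n} P(x_i) log_b P(x_i).

    Assumes `probs` holds P(x_1), ..., P(x_n); n is implicit as len(probs).
    """
    n = len(probs)  # the otherwise undefined upper limit of summation
    total = 0.0
    for i in range(n):  # Python's 0..n-1 corresponds to i = 1..n in the formula
        p = probs[i]
        if p > 0:  # convention: 0 * log 0 = 0
            total -= p * math.log(p, b)
    return total

# A fair coin carries about 1 bit of entropy; a fair four-sided die about 2 bits.
print(entropy_indexed([0.5, 0.5]))
print(entropy_indexed([0.25] * 4))
```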


What bothers me is that $n$, the upper limit of summation, is never defined. You need to accept the implication

$$X \in \{x_1, x_2, \ldots, x_n\}$$

to properly parse this equation.

In the interest of improving math writing, would it be acceptable to instead write the above as:

$$H(X) = -\sum_{x \in X} {\mathrm{P}(x) \log_b \mathrm{P}(x)}$$
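The proposed form indexes the sum by the outcomes themselves rather than by a position $i$. A small Python sketch of that reading, assuming the pmf is stored as a dictionary from each possible value $x$ to $\mathrm{P}(x)$: the dictionary's keys then play the role of the set being summed over, and no separate $n$ is needed. The name `entropy_over_values` is hypothetical.

```python
import math

def entropy_over_values(pmf, b=2):
    """Entropy as -sum over possible values x of P(x) log_b P(x).

    Assumes `pmf` maps each possible value x to P(x); its keys are the
    set of values summed over, so no count n appears anywhere.
    """
    return -sum(p * math.log(p, b) for p in pmf.values() if p > 0)

# A loaded die: outcomes are labelled by their values, not by an index.
loaded_die = {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1}
print(entropy_over_values(loaded_die))
```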

And is this considered any less formal than the above? Are there contexts where a journal would prefer one style over the other?

  • Not my field, but I would like your notation better, if only $X$ were a set. I always thought a r.v. was a function on a probability space, though. – Lubin Mar 12 '17 at 04:22

1 Answer


Here $X$ is a random variable that takes values in a finite set $\{x_1,\ldots,x_n\}.$ I agree that that should be spelled out explicitly and that it's not the most general case.
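One way to spell that assumption out, while also writing $\mathrm{P}(X = x_i)$ in place of $\mathrm{P}(x_i)$ so that the dependence on $X$ is visible, would be:

$$H(X) = -\sum_{i=1}^{n} \mathrm{P}(X = x_i) \log_b \mathrm{P}(X = x_i), \qquad \text{where } X \text{ takes values in } \{x_1, \ldots, x_n\}.$$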

However, $x \in X$ would be bad notation in my opinion. $X$ usually denotes the random variable, not its set of possible values.
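To illustrate the distinction, here is a small Python sketch that treats $X$ as a function on a finite probability space, in the spirit of Lubin's comment. The sample space (two fair coin flips), the variable "number of heads", and all function names are hypothetical choices for illustration. The set summed over is the image of $X$, which texts often name with a separate symbol such as $\mathcal{X}$ rather than with $X$ itself.

```python
import math
from collections import defaultdict

# Hypothetical finite probability space: two fair coin flips.
omega = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def X(outcome):
    """The random variable: a function from outcomes to values (number of heads)."""
    return outcome.count("H")

def pushforward_pmf(space, rv):
    """Return P(rv = x) for each x in the image of rv."""
    pmf = defaultdict(float)
    for outcome, prob in space.items():
        pmf[rv(outcome)] += prob
    return dict(pmf)

def entropy(space, rv, b=2):
    """H = -sum over x in the image of rv of P(rv = x) log_b P(rv = x)."""
    pmf = pushforward_pmf(space, rv)
    values = set(pmf)  # the set of possible values {0, 1, 2}, distinct from X itself
    return -sum(pmf[x] * math.log(pmf[x], b) for x in values if pmf[x] > 0)

print(pushforward_pmf(omega, X))  # {2: 0.25, 1: 0.5, 0: 0.25}
print(entropy(omega, X))          # approximately 1.5
```

Here `X` is a function and `values` is a set; the sum ranges over the latter, which is why writing $x \in X$ conflates the two.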