
I want to write the following sentence formally:

The sequence $S$ contains elements of the set $A$. The probability value $P(a)$ of an element $a$ is defined as the number of its occurrences in the sequence $S$, divided by the total number of elements in $S$.

I can write it in the following manner:

$$ S = (s_1, s_2, \ldots, s_n) : s_i \in A.$$ $$ P(a) := {{ \left| \lbrace i \in \lbrace 1, 2, \ldots, n \rbrace : s_{i} = a \rbrace \right| } \over {n}}, \text{ given } n > 0\text{ and }a \in A. $$

It is, however, quite long and not very elegant. Is there a simpler way to write this?


Edit:

There is always the option of breaking the formula into smaller parts:

$$ \text{Let } C(x) = \left| \lbrace i \in \lbrace 1, 2, \ldots, n \rbrace : s_i = x \rbrace \right|.$$ $$ P(a) := {C(a) \over n}. $$

It's more readable, but it's still not what I'm searching for...
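As an illustration, the definition of $C(x)$ and $P(a)$ above can be translated almost literally into Python, with $S$ stored as a list and indices running from 1 to $n$ as in the formula. The function names `count_occurrences` and `probability` are hypothetical and do not come from the question:

```python
from typing import Hashable, Sequence


def count_occurrences(s: Sequence[Hashable], x: Hashable) -> int:
    """C(x) = |{ i in {1, ..., n} : s_i = x }|, using 1-based indices as in the formula."""
    n = len(s)
    indices = {i for i in range(1, n + 1) if s[i - 1] == x}
    return len(indices)


def probability(s: Sequence[Hashable], a: Hashable) -> float:
    """P(a) = C(a) / n, which is only defined for n > 0."""
    n = len(s)
    if n == 0:
        raise ValueError("P(a) is undefined for an empty sequence (n = 0)")
    return count_occurrences(s, a) / n


S = ["x", "y", "x", "z"]
print(probability(S, "x"))  # 0.5
```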

Asked by Spook; edited by ViHdzP

4 Answers

  1. If you are willing to use the "Iverson bracket notation", popularized by Knuth and others, you can say $$C(x) = \sum_{i=1}^n [s_i = x]$$

    Here $[\ldots]$ are the Iverson brackets. $[P]$ is defined to be 1 if $P$ is true, and 0 if it is false.

  2. People do sometimes use the Kronecker delta for this: $\delta_{ij}$ is defined to be 1 if $i=j$ and 0 if $i\ne j$, so you would have:

    $$C(x) = \sum_{i=1}^n \delta_{xs_i}$$ or $$C(x) = \sum_{i=1}^n \delta(x, s_i)$$

    but I think the Iverson bracket is more straightforward.

  3. Most straightforward would be to write

    Let $C(x)$ be the number of elements of $s_1,\ldots,s_n$ that are equal to $x$. Then…

    The idea that this is somehow less "formal" than something involving a bunch of funny symbols is a common misapprehension.
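Both sum forms in items 1 and 2 can be checked mechanically. The following minimal sketch implements $[P]$ and $\delta(x, s_i)$ as tiny helper functions and sums them over the sequence. The helper names are hypothetical and chosen only to mirror the notation:

```python
from typing import Hashable, Sequence


def iverson(p: bool) -> int:
    """Iverson bracket [P]: 1 if P is true, 0 if it is false."""
    return 1 if p else 0


def kronecker_delta(i: Hashable, j: Hashable) -> int:
    """Kronecker delta: 1 if i == j, 0 otherwise."""
    return 1 if i == j else 0


def count_iverson(s: Sequence[Hashable], x: Hashable) -> int:
    # C(x) = sum_{i=1}^{n} [s_i = x]
    return sum(iverson(s_i == x) for s_i in s)


def count_delta(s: Sequence[Hashable], x: Hashable) -> int:
    # C(x) = sum_{i=1}^{n} delta(x, s_i)
    return sum(kronecker_delta(x, s_i) for s_i in s)


S = ["x", "y", "x", "z"]
print(count_iverson(S, "x"), count_delta(S, "x"))  # 2 2
```

The two functions always agree, because $[s_i = x]$ and $\delta_{x s_i}$ are the same 0/1 quantity written in two notations.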

MJD
  • You know, I guess your third solution will suit my needs. After all, my priority is that readers understand my notation; writing the sentences as formally as possible is not as important. – Spook Mar 11 '13 at 07:46
  • Agreed. I think the third solution provided is optimal for readability. A very classical approach. – Adam Erickson Mar 09 '16 at 08:07
  • I just learned about Iverson brackets, nice – osolmaz Oct 01 '19 at 11:32

Using the number symbol "#",

$$ \DeclareMathOperator*{\countif}{\#} P(a) := \dfrac{\countif\limits_{i=1}^{n} (s_i=a)}{n}, \text{ given } n > 0\text{ and }a \in A. $$

See John Fox, Applied Regression Analysis and Generalized Linear Models (3rd edition), Section 21.2.3
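In programming terms, the "#" operator behaves much like Python's built-in `list.count`, which returns the number of items equal to its argument. Here is a minimal sketch of $P(a)$ written both ways. `countif` is a hypothetical helper that mirrors the operator above, taking an arbitrary predicate on $s_i$:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def countif(s: Sequence[T], predicate: Callable[[T], bool]) -> int:
    """#_{i=1}^{n} (predicate(s_i)): number of indices whose element satisfies the predicate."""
    return sum(1 for s_i in s if predicate(s_i))


S = ["x", "y", "x", "z"]
n = len(S)
a = "x"
p_countif = countif(S, lambda s_i: s_i == a) / n
p_builtin = S.count(a) / n  # list.count yields the same number of occurrences
print(p_countif, p_builtin)  # 0.5 0.5
```

The predicate form is more general than `list.count`, because it can count elements satisfying any condition, not only equality with $a$.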


It's also not quite correct. How about $|\{i\in\{1,\ldots,n\}\colon s_i=a\}|$ in the numerator? If you view the sequence $S$ as a function $S\colon\{1,\ldots,n\}\to A$, $i\mapsto s_i$, then you might even write $|S^{-1}(a)|$ for the numerator. And the denominator should simply be $n$ (which you implicitly defined for $S$). As $S$ is not (primarily) a set, $|S|$ looks strange.
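The idea of viewing $S$ as a function $\{1,\ldots,n\}\to A$ can be made concrete. The following sketch computes $S^{-1}(a)$ as a set of 1-based indices, and the size of that set is the numerator of $P(a)$. The function name `preimage` is hypothetical:

```python
from typing import Hashable, Sequence, Set


def preimage(s: Sequence[Hashable], a: Hashable) -> Set[int]:
    """S^{-1}(a) = { i in {1, ..., n} : S(i) = a }, treating S as the map i -> s_i."""
    return {i for i, s_i in enumerate(s, start=1) if s_i == a}


S = ["x", "y", "x", "z"]
print(preimage(S, "x"))                # {1, 3}
print(len(preimage(S, "x")) / len(S))  # 0.5
```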

  • You're right about my notation; I corrected it. However, I don't much like the solution with $S^{-1}$, as it looks like redefining the sequence to simplify the notation. I was thinking rather of a mathematical construct corresponding to a 'count of' function. Isn't there one? – Spook Mar 10 '13 at 15:55
  • @Spook Actually, that normally isn't really redefining the notion of a sequence. But nevertheless, isn't $|{\ldots}|$ what corresponds to a count function after all? All that matters is that you can define your frequency notion clearly and unambiguously, not necessarily with fewer than five symbols... – Hagen von Eitzen Mar 11 '13 at 07:41

Two ideas:

1) Using the hashtag or number symbol "#", e.g.:

$$ \#(a \in S) $$

It is fairly common in the literature; see Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, 2nd ed. (2008), Section 9.2.2.

2) Using the indicator function (which returns 1 if the condition is true, else 0) in a sum, e.g.: $$ \sum\limits_{i=1}^{n}{\mathbf{1}_{s_i = a}} $$

or

$$ \sum\limits_{i=1}^{n}{I({s_i = a})} $$

It is probably the most common and 'mathematical' option; see Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, 2nd ed. (2008), Section 9.2.3.


Usage example for the original post:

$$ \mathbb{P}(a) = \frac{\sum\limits_{i=1}^{n}{\mathbf{1}_{s_i = a}}}{n} \quad \forall\, a \in A $$ where

$n$ = number of elements in $S$

$\mathbf{1}_{s_i = a}$ = indicator function that returns 1 if $s_i = a$ and 0 otherwise
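To compute $\mathbb{P}(a)$ for every $a \in A$ at once, one approach is to sum the indicator over the sequence for each symbol. The sketch below does that and cross-checks the result against `collections.Counter`, which tallies all occurrences in a single pass. The helper names and the example alphabet are assumptions made for illustration:

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def indicator(condition: bool) -> int:
    """1 if the condition holds, else 0."""
    return int(condition)


def empirical_distribution(
    s: Sequence[Hashable], alphabet: Iterable[Hashable]
) -> Dict[Hashable, float]:
    """P(a) = (1/n) * sum_{i=1}^{n} 1_{s_i = a} for each a in the alphabet A (requires n > 0)."""
    n = len(s)
    if n == 0:
        raise ValueError("undefined for an empty sequence")
    return {a: sum(indicator(s_i == a) for s_i in s) / n for a in alphabet}


S = ["x", "y", "x", "z"]
A = ["x", "y", "z", "w"]  # hypothetical alphabet; "w" never occurs in S
P = empirical_distribution(S, A)
counts = Counter(S)  # a Counter returns 0 for keys it has never seen
assert all(P[a] == counts[a] / len(S) for a in A)
print(P)  # {'x': 0.5, 'y': 0.25, 'z': 0.25, 'w': 0.0}
```

The indicator version follows the formula term by term and costs $O(n \cdot |A|)$. `Counter` gives the same numbers after one $O(n)$ pass, which matters when $A$ is large.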

PaulG
  • I suppose #(...) is the clearest notation for me so far. The Iverson bracket [...] seems relatively clear too, but I think the hash notation is more obvious. It would be beneficial if mathematicians agreed on a common notation at some point, because it is used very often, especially in computer science. – Spook Sep 28 '20 at 07:37