4

Let $X = \sum_{i=1}^n{X_i}$, for Bernoulli random variables $X_i$ which are not necessarily independent. However, assume that conditioned on any possible values for the other variables, the probability that $X_i = 1$ is at most $p$.

I would like to say that if $Y = Y_i$ is the sum of $n$ independent Bernoulli variables with $\Pr(Y_i = 1)=p$, then for any $q>p$, $$ \Pr(X>qn) \leq \Pr(Y>qn). $$ However, I don't know how to justify this. Is it true? What's the proof?

Zur Luria
  • 1,514

2 Answers2

3

A guy at my office solved this one for me, and I thought I'd post the answer, since it seems like a useful technique. The idea is to couple $X$ and $Y$ by presenting both as functions of the same random process.

First, note that the values of $q_i(x_1, ... ,x_{i-1}):=\Pr(X_i=1 | X_1 = x_1, ... ,X_{i-1} = x_{i-1})$ for $1\leq i \leq n$ and $x_j \in \{0,1\}$ for $j < i$ describe the distribution of $X$ uniquely, since for any $x_1, ... ,x_n \in \{0,1\}$ we have $$ \Pr(X_1 = x_1, ... ,X_n = x_n) = \prod_{i=1}^n{\Pr(X_i=x_i|X_1 = x_1, ... ,X_{i-1}=x_{i-1})}. $$ Now let $U_1, ... ,U_n$ be i.i.d. continuous random variables that are uniformly distributed on the interval $[0,1]$. Set $Y_i$ to be the indicator variable of the event that $U_i\leq p$, and define $X$ as follows. Set $X_1$ to be the indicator variable that $U_1 \leq q_1$, and inductively set $X_i$ to be the indicator variable of the event that $U_i \leq q_i(X_1, ... ,X_{i-1})$. It is easy to see that $X$ and $Y$ thus defined have the distribution we wanted. It is also immediate that $Y_i \geq X_i$ for every $i$, since $q_i$ is always at most $p$. Therefore $\Pr(Y\geq qn)\geq\Pr(X \geq qn)$ for any $q \in (0,1)$.

Zur Luria
  • 1,514
0

Here is a bound of this flavor. Hope this helps.

For each $k\ge 0$, the probability of an event of the form $\{X_1 =x_1,X_2=x_2,\dots,X_n=x_n\}$ where $x_i \in \{0,1\}$ and $\sum_{i=1}^n x_i = k$ has probability bounded above by $p^k$. This follows from the assumption by induction.

Let $S^X_n = \sum_{i\le n} X_i$. Then by the union bound, $$P(S^X_n\ge k) \le \sum_{k\le j \le n} \binom{n}{j} p^j.$$ Now let $(Y_i:i\ge 1)$ be IID Bernoulli with mean $r=\frac{p}{1+p}\in (0,\frac 12)$, and let $S^Y_n =\sum_{i\le n} Y_i$. Observe that $p=\frac{r}{1-r}$. Then
\begin{align*} P(S^X_n \ge k) &\le \sum_{k\le j \le n} \binom{n}{j}p^j \\ & = \sum_{k\le j \le n} \binom{n}{j} (\frac{r}{1-r})^j \\ & =(1-r)^{-n} \sum_{k\le j \le n} \binom{n}{j} (1-r)^{n-j} r^j \\ & = (1-r)^{-n} P(S^Y_n \ge k)\quad (*) \end{align*}

This is true for any $k$, and you would normally choose $k=k(n)$. As can be seen from standard large deviations, the bound $(*)$ is not trivial (i.e. strictly less than $1$), provided $k=k(n)\ge cn$ where $c\in (0,1)$ depends on $r$ (note we excluded the trivial case $c=1$ for which the bound is trivially nontrivial...)

To find the smallest $c$, recall from large deviations that the rate function for Bernoulli with mean $r$ is $I(c) = c\ln \frac{r}{c}+(1-c)\ln \frac{1-r}{1-c}$, and so

$$ P(S^Y_n \ge \lceil cn \rceil)\le e^{-n I(c)},$$

and so the bound $(*)$ is non-trivial if $I(c)>\ln \frac{1}{1-r}$ (there exists such $c\in (0,1)$ because $I$ is continuous on this interval and $\lim_{c\nearrow 1} I(c) = \ln \frac {1}{r} > \ln \frac{1}{1-r}$.

Fnacool
  • 7,519
  • This bound does have the right flavor, but it leaves open the question of whether there is a distribution X as above and a number $q \in (0,1)$ such that $ \Pr(Y>qn)<\Pr(X>qn) $. I think that this is impossible, but I can't find a proof or a counterexample. – Zur Luria Jun 09 '16 at 13:28