What are some applications of entropy maximization and minimum volume covering ellipsoid?

Question

I'm reading Boyd and Vandenberghe's Convex Optimization textbook, and I'm currently focusing on Chapter 5 (Duality). Two examples recur frequently:

1. Minimum volume covering ellipsoid: \begin{align*} \text{minimize} & \quad \log \det X^{-1} \\ \text{subject to} & \quad a_i^T X a_i \leq 1, \quad i = 1, \dots, m \end{align*}
2. Entropy maximization: \begin{align*} \text{minimize} & \quad \sum_{i=1}^n x_i \log x_i \\ \text{subject to} & \quad Ax \leq b \\ & \quad \mathbf{1}^T x = 1 \end{align*}

I understand the interpretation of (1) as the minimum volume covering ellipsoid, but when would you ever want to solve it? In machine learning, for example, you might fit such an ellipsoid to training data for the purpose of outlier detection, but such a model would surely be overfit, and it would seem better to incorporate a more graceful probabilistic decay from the boundary. You might consider setting the discovered ellipsoid equal to, say, the 95th percentile probability contour of a multivariate Gaussian model, but it would seem wiser to just maximize a multivariate Gaussian likelihood directly. So when might one want to solve this problem?

As for (2), I can vaguely imagine situations in which one might want to find a maximum entropy probability distribution (as suggested by the cost function and the second constraint) that satisfies some constraints, but what is a realistic example where one might want to impose the linear inequality constraint $Ax \leq b$?

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

Finding a minimum volume covering ellipsoid is helpful in building probability distributions in the presence of outlying data points. For instance, suppose we have a data set $X \subset \mathbb{R}^N$ that is sampled from a normal distribution, $N(\mu, \Sigma)$. As you said, a reasonable way to estimate this normal distribution is to use the maximum likelihood parameters, which in this case are the sample mean and sample covariance, $\hat{\mu}$ and $\hat{\Sigma}$, respectively. However, if we add an outlier $y \in \mathbb{R}^N$ to our data set $X$, then the estimated distribution can be badly skewed. In particular, $\|\mu - \hat{\mu} \|$ and $\|\Sigma - \hat{\Sigma} \|$ can become arbitrarily large as $y$ moves away from the rest of the data, so the maximum likelihood estimate is no good. Many robust density estimation methods rely on finding a minimum volume covering ellipsoid to estimate the density in the presence of outliers. It has been a while since I looked at it, but I believe that one algorithm for computing the Minimum Covariance Determinant makes use of minimum covering ellipsoids.
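To make problem (1) concrete, here is a minimal NumPy sketch that approximately solves it as stated in the question (an ellipsoid centered at the origin, $\{z : z^T X z \leq 1\}$). It is not the algorithm referred to above. It uses a multiplicative weight update from the optimal-design literature on the dual weights $u_i$, relying on the optimality condition $X^{-1} = n \sum_i u_i a_i a_i^T$ with $\sum_i u_i = 1$. The function name `origin_centered_mvce` and the iteration count are my own choices, and the sketch assumes the points span $\mathbb{R}^n$.

```python
import numpy as np

def origin_centered_mvce(points, iters=500):
    """Approximately solve: minimize log det X^{-1} s.t. a_i^T X a_i <= 1.

    points: array of shape (m, n) whose rows are the a_i (assumed to span R^n).
    Returns a feasible X; it approaches the optimum as iters grows.
    """
    A = np.asarray(points, dtype=float)
    m, n = A.shape
    u = np.full(m, 1.0 / m)                    # dual weights, sum to 1
    for _ in range(iters):
        M = A.T @ (u[:, None] * A)             # sum_i u_i a_i a_i^T
        # d_i = a_i^T M^{-1} a_i for every point
        d = np.einsum("ij,jk,ik->i", A, np.linalg.inv(M), A)
        u *= d / n                             # update preserves sum(u) = 1
    M = A.T @ (u[:, None] * A)
    X = np.linalg.inv(M) / n
    # Rescale so that every point satisfies the constraint exactly.
    X /= np.max(np.einsum("ij,jk,ik->i", A, X, A))
    return X

# Four points on the unit circle: the smallest covering ellipse is the circle.
X = origin_centered_mvce([[1, 0], [-1, 0], [0, 1], [0, -1]])
```

For the four points in the usage line, the uniform weights are already optimal and the result is the identity matrix. The update converges slowly in general, so the final rescaling is what guarantees feasibility; a dedicated solver would be the better choice for anything beyond a toy problem.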

Unfortunately, I haven't had any experience with maximum entropy models so I can't give a lot of examples showing why you would want to have the $Ax \leq b$ constraint. However, I suspect that requiring $Ax \leq b$ imposes some prior on the distribution. If you're still curious, I think that the folks at Cross Validated Stack Exchange could give better answers.
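As a purely hypothetical illustration (it is not taken from the textbook or from any real data), an inequality constraint can arise when a moment of the distribution is known only up to bounds. The sketch below finds the maximum entropy distribution over the faces of a die whose mean is assumed to lie between 2.0 and 2.5, written as two rows of $Ax \leq b$. It runs projected gradient ascent on the dual, where $x_i \propto \exp(-a_i^T \lambda)$, $a_i$ is the $i$-th column of $A$, and $\lambda \geq 0$. The function name `max_entropy`, the step-size rule, and the iteration count are my own assumptions.

```python
import numpy as np

def max_entropy(A, b, iters=5000):
    """Minimize sum_i x_i log x_i s.t. A x <= b, 1^T x = 1, via the dual.

    Dual: maximize -b^T lam - log(sum_i exp(-a_i^T lam)) over lam >= 0,
    where a_i is the i-th column of A. Assumes the problem is strictly
    feasible and A has at least one nonzero column.
    """
    A = np.asarray(A, dtype=float)             # shape (m, n)
    b = np.asarray(b, dtype=float)
    # 1 / (largest squared column norm) bounds the dual curvature.
    step = 1.0 / np.max(np.sum(A ** 2, axis=0))
    lam = np.zeros(A.shape[0])

    def primal(lam):
        z = -A.T @ lam
        x = np.exp(z - z.max())                # shift for numerical stability
        return x / x.sum()

    for _ in range(iters):
        x = primal(lam)
        # The dual gradient is A x - b; project back onto lam >= 0.
        lam = np.maximum(0.0, lam + step * (A @ x - b))
    return primal(lam), lam

# Hypothetical die: 2.0 <= E[face] <= 2.5, i.e. faces^T x <= 2.5, -faces^T x <= -2.0
faces = np.arange(1, 7)
A = np.vstack([faces, -faces])
b = np.array([2.5, -2.0])
x, lam = max_entropy(A, b)
```

Because the unconstrained maximum entropy distribution is uniform (mean 3.5), only the upper bound ends up active here: the multiplier for the lower bound stays at zero and the resulting probabilities decay geometrically across the faces.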
