4

I have been examining the literature on optimal transport, and typically it is introduced as an arbitrary number of probability measures $\mu_0, \mu_1, \cdots$ on an n-dimensional manifold $N$.

The situation I wish to examine is a lot simpler - optimal transport between two point sets $\vec{p}_i, \vec{q}_j$ in $\mathbb{R}^2$, where the probability measures for both of these point sets is taken to be a uniform histogram. That is to say, the distribution of the points in $\mathbb{R}^2$ can be arbitrary, but the weights of the points are uniform.

The number of points in each set are given by $N_1, N_2$. The papers I have read indicate that optimal transport algorithms between point sets can be executed even when $N_1 \neq N_2$.

This fact is really not making sense to me - if optimal transport takes you continuously from one point distribution to another, and the number of points varies between distributions, doesn't this mean that at some point during the transport some of the points must vanish/reappear? This seems like a discontinuous operation that doesn't fit within the overall context of optimal transport.

Daniel Hogg
  • 153
  • 1
  • 8
  • 4
    I believe that in the case that $N_1\not= N_2$ you have to accept the possibility that the masses of points can split and merge together. So what you are looking for is an $N_1\times N_2$ matrix for which the row sums and column sums match the source and end distributions, respectively, all entries are positive, and the $ij$ entry is interpreted as the amount of mass moving from $p_i$ to $q_j$. This is the discrete notion of Kantorovich's formulation of the optimal transport problem, so it fits quite naturally into the overall theory. – felipeh Aug 27 '17 at 18:58

1 Answers1

1

I have been examining the literature on optimal transport, and typically it is introduced as an arbitrary number of probability measures $μ_0,μ_1,\ldots$ on an $n$-dimensional manifold $N$. The situation I wish to examine is a lot simpler - optimal transport between two point sets $p_i,q_j$ in $\mathbb{R}^2$, where the probability measures for both of these point sets is taken to be a uniform histogram. That is to say, the distribution of the points in $\mathbb{R}^2$ can be arbitrary, but the weights of the points are uniform.

So, your manifold is $(M,g)=(\mathbb{R}^2,I)$ the Euclidean plane, with point sets $Q$ and $P$. Note that our discrete distributions can then be written as $$ \mu_P = \frac{1}{|P|} \sum_i \delta(p_i)\hspace{0.7in}\& \hspace{0.7in} \mu_Q = \frac{1}{|Q|} \sum_i \delta(q_i) $$ where $\delta(x)$ denotes a Dirac distribution density centered at $x$. Since each $\delta$ integrates to $1$, each $\mu_P$ and $\mu_Q$ is a valid density function, and optimal transport makes sense.

Note that if $|P|\ne |Q|$, then essentially we are "reweighting" the points. I.e., the "mass" at each $p_i$ is $1/|P|$, and the equivalent for $q_i$. For instance, if $|P| >> |Q|$, then then the mass on each $q_i$ is large (or equivalently, on $p_i$ it is small), so we use multiple $p_k$'s to satisfy each $q_j$.

So my answer is (1) yes, density can split and merge, since we are still just in a special continuous case, and (2) it may be better to think of the masses at each point being different for each set (so multiple light points from a large set are needed to match a single heavy point from a small set). There is no vanishing or reappearing, but there is certainly splitting and merging.


See also this question.

user3658307
  • 10,433