Distance between probability measures
Let $(X,d)$ be a compact metric space, and let $\mu$ and $\nu$ be two probability measures on $X$. We can define the Wasserstein distance between $\mu$ and $\nu$ in the following way.
Let $\mathcal{P}_{\mu, \nu}$ be the set of probability measures on $X \times X$ whose first marginal is $\mu$ and whose second marginal is $\nu$. Then:
$$W_1 (\mu, \nu) = \min_{\pi \in \mathcal{P}_{\mu, \nu}} \int_{X \times X} d(x,y) \pi (dx,dy).$$
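To make the definition concrete, here is a minimal sketch for a very special case: $\mu$ and $\nu$ are uniform measures on $n$ points each. In that case the couplings are ($1/n$ times) the doubly stochastic matrices, and a linear cost attains its minimum at a permutation matrix (Birkhoff-von Neumann), so a brute-force search over permutations is enough. The function name `w1_uniform` and its arguments are illustrative choices, not part of the question, and the search is only feasible for small $n$.

```python
from itertools import permutations


def w1_uniform(xs, ys, d):
    """W_1 between the uniform measures on the points xs and on the points ys.

    Assumption: len(xs) == len(ys) == n, each point carrying mass 1/n.
    The minimum over couplings is then attained at a permutation coupling.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("this sketch only handles supports of equal size")
    # Cost of the coupling that sends xs[i] to ys[sigma[i]] with mass 1/n.
    return min(
        sum(d(xs[i], ys[sigma[i]]) for i in range(n)) / n
        for sigma in permutations(range(n))
    )


def dist_on_line(x, y):
    """Usual distance on the real line (a hypothetical choice of (X, d))."""
    return abs(x - y)


w1_example = w1_uniform([0.0, 1.0], [0.0, 3.0], dist_on_line)
```

In the example, the identity coupling costs $(0 + 2)/2 = 1$ and the swap costs $(3 + 1)/2 = 2$, so the minimum is $1$. For general finite measures one would solve the corresponding linear program instead.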
Distance between compact spaces
Let $(X,d)$ and $(X',d')$ be two compact metric spaces. We can define the Gromov-Hausdorff distance between $(X,d)$ and $(X',d')$ in the following way.
Let $\mathcal{D}_{d, d'}$ be the set of distances on $X \coprod X'$ whose restriction to $X \times X$ is $d$ and whose restriction to $X' \times X'$ is $d'$. Then:
$$d_{GH} ((X,d), (X',d')) = \inf_{D \in \mathcal{D}_{d, d'}} D_H (X,X'),$$
where $D_H$ is the Hausdorff distance in $X \coprod X'$ for the distance $D$.
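For finite spaces, this infimum can be computed by brute force. The sketch below does not enumerate distances on $X \coprod X'$ directly; it assumes the standard equivalent reformulation $d_{GH} = \frac{1}{2} \min_R \operatorname{dis}(R)$, where $R \subset X \times X'$ ranges over correspondences (relations whose two projections are surjective) and $\operatorname{dis}(R) = \max |d(x_1,x_2) - d'(y_1,y_2)|$ over pairs $(x_1,y_1), (x_2,y_2) \in R$. The name `gh_distance` and the distance-matrix encoding are illustrative choices, and the cost is exponential, so this is only usable for spaces with a handful of points.

```python
from itertools import combinations, product


def gh_distance(dX, dY):
    """Gromov-Hausdorff distance between two finite metric spaces.

    dX and dY are square distance matrices (lists of lists).
    Assumption: uses the correspondence formula d_GH = min_R dis(R) / 2.
    """
    n, m = len(dX), len(dY)
    all_pairs = list(product(range(n), range(m)))
    best = float("inf")
    # A correspondence needs at least max(n, m) pairs to cover both spaces.
    for size in range(max(n, m), len(all_pairs) + 1):
        for R in combinations(all_pairs, size):
            covers_X = {i for i, _ in R} == set(range(n))
            covers_Y = {j for _, j in R} == set(range(m))
            if not (covers_X and covers_Y):
                continue
            # Distortion: worst mismatch between distances of related points.
            dis = max(abs(dX[i][k] - dY[j][l]) for (i, j) in R for (k, l) in R)
            best = min(best, dis)
    return best / 2


one_point = [[0.0]]
two_points = [[0.0, 2.0], [2.0, 0.0]]
gh_example = gh_distance(one_point, two_points)
```

Here the only correspondence relates the single point to both points of the second space, with distortion $2$, so the distance is $1$ (half the diameter of the two-point space).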
Generalizations?
These two distances are defined in a similar way. To create a distance between two objects $X$ and $Y$, we realize them simultaneously into a larger object, and quantify their distance in the larger object. Then, we take the infimum.
They also have the same kind of "universal" definitions. The Wasserstein distance between two probability measures can be defined as a minimum over all couplings between these probability measures (without specifying the base space). The Gromov-Hausdorff distance can be defined as an infimum over all isometric embeddings of the two compact metric spaces into a common metric space.
In addition, these definitions are sort of compatible, in that it is possible to define a Gromov-Hausdorff-Wasserstein distance between probabilized compact metric spaces (see e.g. here).
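Spelled out in symbols, and assuming the usual textbook formulations (the names $U$, $V$, $Z$, $f$, $g$ are my notation), the two "universal" definitions read:
$$W_1 (\mu, \nu) = \inf \left\{ \mathbb{E} [d(U,V)] : \ U \sim \mu, \ V \sim \nu \right\},$$
where the infimum is over pairs of $X$-valued random variables $(U,V)$ defined on a common probability space, and
$$d_{GH} ((X,d), (X',d')) = \inf_{Z, f, g} d_H^Z (f(X), g(X')),$$
where the infimum is over metric spaces $Z$ and isometric embeddings $f : X \to Z$, $g : X' \to Z$, and $d_H^Z$ is the Hausdorff distance in $Z$.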
From here, I have two questions:
1) Why does the definition of the Wasserstein distance use a product, while the definition of the Gromov-Hausdorff distance uses a coproduct? I suspect it has to do with the fact that measures are naturally pushed forward and distances are naturally pulled back (this is also what the .pdf I linked to suggests), but that's little more than a hunch.
2) Are there other examples of this kind of construction, where we define a distance between two structures by realizing them simultaneously, taking the distance in some way in this realization, and then taking the infimum over all such realizations? I was thinking in particular of algebraic structures, but anything goes.