0

In MacKay's "Information theory, inference, and learning algorithms" book, in chapter 26 he covers the sum-product algorithm for calculating marginal probabilities, partition functions, etc. The same info can be found in this wikipedia article.

Assume we have an unnormalized probability distribution $P^\ast(x)$ which can can be expressed as a product of factors over different subsets of the components of $x$, for example:

enter image description here

is the graph for:

$P^\ast(x) = f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_1, x_2) f_5(x_2, x_3)$

and for a general factored $P^\ast(x)$:

$Z = \sum\limits_{\pmb x} f_1(x_1) f_2(x_2) \dots f_M(x_M)$

where $\pmb x$ is the full random vector of all components, and each of the $x_i$ are the subset of components that factor $f_i$ depends on. In general, summing over all combinations of all components of $\pmb x$ would be intractably huge, but the sum-product algo avoids lots of unnecessary calculation because $P^\ast$ factors like this.

It seems like because of the factored nature of $P^\ast$, in a tree-like factor graph, we'll always be able to express $Z$ in a way that lets us sum over as few components of $\pmb x$ at once. For example, in the concrete example above,

$Z = \sum\limits_{\pmb x} P^\ast(\pmb x) = \sum\limits_{x_1, x_2, x_3} f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_1, x_2) f_5(x_2, x_3) = \sum\limits_{x_2} f_2(x_2) \bigg[ \sum\limits_{x_1} f_1(x_1) f_4(x_1, x_2) \bigg] \bigg[ \sum\limits_{x_3} f_3(x_3) f_5(x_2, x_3) \bigg]$

This is a simple example, but if each of the $x_i$ components had $N$ discrete values, this would only need $O(N^2)$ operations compared to the naive $O(N^3)$.

My question is the following: is this actually all the sum-product algorithm is doing, in the tree-like graph case? Is it just so we can avoid having to reorder the sum like above, or does it provide something else?

skymonkey
  • 125
  • When I learnt this, I found that the best way to think about efficiency of this algorithm vs naive marginalisation is that after you have decided an order in which you wish to marginalise/eliminate the variables (which for tree graphs is simple), then by grouping factors which have the same eliminand, and then grouping intermediate factors with other factors with same eliminand, you are effectively caching computations. – microhaus Mar 02 '24 at 12:05
  • @microhaus thanks for the comment, that makes sense to me in terms of caching computations as well. But am I correct in thinking that the sum-product MP algo really is nothing but what I wrote above and you mentioned with grouping by factors? and the presentation with the "messages" exactly equivalent? – skymonkey Mar 03 '24 at 16:38
  • Yes you are correct - the sum-product MP algo allows you to efficiently compute the normalisation constant (generally in undirected graphical models where this is an issue), and probabilities of say one variable $X_1$. But I think you may have understated its usefulness. There are many situations, such as one I'm working on with linear-chain conditional random fields/hidden Markov models, where it would be completely intractable for me to compute the normalisation constant without this algorithm. – microhaus Mar 04 '24 at 20:42

0 Answers0