I was going through the original GAN paper: Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014. Link: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

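To prove the optimal D (eq. 2), they rewrite the objective function in equation 3 of the paper:

$$V(G, D) = \int_x p_\text{data}(x) \log(D(x)) \, dx + \int_z p_z(z) \log(1 - D(g(z))) \, dz = \int_x p_\text{data}(x) \log(D(x)) + p_g(x) \log(1 - D(x)) \, dx$$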

So, essentially they have changed p(z) to p(x) and g(z) to x. My question is how can this be done?

P.S.: Is this the correct place to ask such a question? Is there a dedicated place where I can ask questions related to specific subtopics of ML?

user1953366
  • Regarding your last question, there is Cross Validated. But personally I think your question is okay here as it is just a question about the math. – Jair Taylor Apr 22 '20 at 23:15
  • Sorry, I read this too quickly and missed that. Thanks. – Jair Taylor Apr 22 '20 at 23:23
  • @JairTaylor Hmmm, I am not trying to validate. I am merely trying to understand it. – user1953366 Apr 22 '20 at 23:35
  • Cross Validated is just the name. It's for 'Q&A for people interested in statistics, machine learning, data analysis, data mining, and data visualization' – Jair Taylor Apr 22 '20 at 23:40
  • Cool, I will ask it there (I can only ask a question every 40 min) – user1953366 Apr 22 '20 at 23:44
  • Make sure to cross-reference between the two posts. If someone posts an answer only to find it has been resolved elsewhere they may get irate. – Jair Taylor Apr 23 '20 at 00:03
  • If I'm not mistaken, on page 4 they say that the figures show how setting $x = g(z)$ gives the desired distribution $p_g$ on the transformed samples. In addition, it should be fairly clear that a maximum is attained when $p_\text{data} = p_g$, since in this case the discriminator cannot tell the difference between the genuine data and the generated data. So basically, since they are working in the optimal case they can make those substitutions. – SescoMath Apr 23 '20 at 00:38

1 Answer

The change of variables in the proof of Proposition 1 of Goodfellow et al.'s 2014 GAN paper is valid. However, one needs to pay particular attention to the dimensions of the latent variable $z$ and the data variable $x$ in the transformation $x = G(z)$. (Aside: we really should use $\hat{x}$ for the generator output rather than $x$.) Everything in the paper is written as if the variables were scalars, which is confusing.

It turns out that when $\dim(z) \geq \dim(x)$ everything is fine, but when $\dim(z) < \dim(x)$, which is the case in practical image synthesis, the PDF of the generator output, $p_g(x)$, is degenerate: it contains delta functions and is also non-unique. The change-of-variables formula still works, but (and this is the clincher) the next part of the proof of Proposition 1 does not hold. That step uses variational calculus, which requires the integrand to be continuously differentiable with respect to $x$ and $D(\cdot)$, and it clearly is not when delta functions are present. So equation (3) holds, but the optimal discriminator does not exist when $\dim(z) < \dim(x)$.

This assertion has recently been demonstrated in a paper from Google researchers [1], who applied ODEs to implement GANs (rather than straight stochastic gradient descent). They obtained the expected convergence for an example with $\dim(x) = 2 < \dim(z) = 32$, but not for an example with $\dim(x) = 3072 > \dim(z) = 128$ (CIFAR-10).
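
To spell out why the substitution itself is legitimate, here is a short sketch. The notation is my own rather than the paper's: $f$ is an arbitrary test function, and $p_g$ denotes the distribution of $x = G(z)$ when $z \sim p_z$. By the definition of the distribution of a transformed random variable,

$$\mathbb{E}_{z \sim p_z}\big[f(G(z))\big] = \int_z p_z(z)\, f(G(z))\, dz = \int_x p_g(x)\, f(x)\, dx = \mathbb{E}_{x \sim p_g}\big[f(x)\big].$$

Taking $f(x) = \log(1 - D(x))$ turns the integral over $z$ in the GAN objective into an integral over $x$. In the special case where $\dim(z) = \dim(x)$ and $G$ is invertible and differentiable, the density has the familiar explicit form

$$p_g(x) = p_z\big(G^{-1}(x)\big)\, \big|\det J_{G^{-1}}(x)\big|,$$

but the identity above does not rely on that form. When $p_g$ has no ordinary density, $p_g(x)\,dx$ should be read as integration against the distribution of $G(z)$.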

A clear explanation with low-dimensional examples is given in Section 2 of the paper "Convergence and Optimality Analysis of Low-Dimensional Generative Adversarial Networks using Error Function Integrals": https://www.researchgate.net/publication/356815736_Convergence_and_Optimality_Analysis_of_Low-Dimensional_Generative_Adversarial_Networks_using_Error_Function_Integrals
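
As a numerical sanity check of the change of variables in the non-degenerate scalar case $\dim(z) = \dim(x) = 1$, the sketch below compares the integral over $z$ with the integral over $x$. Everything in it is an illustrative assumption, not taken from either paper: the generator is the affine map `g(z) = 2z + 1` with $z \sim N(0, 1)$, so that $p_g$ is the $N(1, 2^2)$ density, and the discriminator is a logistic sigmoid.

```python
import numpy as np


def g(z):
    # Hypothetical generator: affine map, so g(z) ~ N(1, 2^2) when z ~ N(0, 1).
    return 2.0 * z + 1.0


def log_one_minus_D(x):
    # Hypothetical discriminator D(x) = sigmoid(x).
    # log(1 - sigmoid(x)) = -log(1 + exp(x)), computed stably with logaddexp.
    return -np.logaddexp(0.0, x)


def normal_pdf(x, mean, std):
    # Density of N(mean, std^2).
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def z_side_monte_carlo(n_samples=1_000_000, seed=0):
    # Estimates  integral of p_z(z) * log(1 - D(g(z))) dz  by sampling z ~ N(0, 1).
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    return float(np.mean(log_one_minus_D(g(z))))


def x_side_quadrature(n_points=200_001):
    # Approximates  integral of p_g(x) * log(1 - D(x)) dx  with a Riemann sum
    # over mean +/- 10 standard deviations of p_g = N(1, 2^2).
    x = np.linspace(-19.0, 21.0, n_points)
    dx = x[1] - x[0]
    integrand = normal_pdf(x, 1.0, 2.0) * log_one_minus_D(x)
    return float(np.sum(integrand) * dx)


if __name__ == "__main__":
    print("integral over z (Monte Carlo):", z_side_monte_carlo())
    print("integral over x (quadrature): ", x_side_quadrature())
```

The two printed values should agree up to Monte Carlo error. This only illustrates the case where $p_g$ is an ordinary density; it says nothing about the degenerate case $\dim(z) < \dim(x)$ discussed above.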