In Ian Goodfellow's Deep Learning textbook, there is a description of using a Gaussian prior on the weights $w$ of a linear regression model:
$$p(w) = \mathcal{N}(w; \mu_0, \Lambda_0) \propto \exp\left(-\frac{1}{2}(w-\mu_0)^T \Lambda_0^{-1}(w-\mu_0)\right)$$ where $\mu_0$ and $\Lambda_0$ are the mean vector and covariance matrix of the prior distribution, respectively.
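As a small sanity check of this prior, here is a NumPy sketch that evaluates the full (normalized) density $\mathcal{N}(w; \mu_0, \Lambda_0)$ and confirms that, for a diagonal $\Lambda_0$, it factorizes into a product of univariate normals. The function name `gaussian_pdf` and all numeric values are hypothetical, chosen only for illustration.

```python
import numpy as np

def gaussian_pdf(w, mu0, Lambda0):
    """Density N(w; mu0, Lambda0) of a multivariate Gaussian (hypothetical helper)."""
    d = w.shape[0]
    diff = w - mu0
    # Solve Lambda0 z = diff instead of forming Lambda0^{-1} explicitly
    quad = diff @ np.linalg.solve(Lambda0, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Lambda0))
    return np.exp(-0.5 * quad) / norm

# Hypothetical example: 3-dimensional weights with a diagonal prior covariance
mu0 = np.array([0.0, 1.0, -0.5])
lam = np.array([1.0, 2.0, 0.5])  # diagonal entries of Lambda0
Lambda0 = np.diag(lam)
w = np.array([0.3, 0.7, 0.1])

# With a diagonal covariance, the density is a product of univariate normals
univariate = np.exp(-0.5 * (w - mu0) ** 2 / lam) / np.sqrt(2 * np.pi * lam)
print(gaussian_pdf(w, mu0, Lambda0), np.prod(univariate))
```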
The authors write (Chapter 5.6.0, page 134):
With the prior model thus specified, we can now proceed in determining the posterior distribution over the model parameters: $$p(w \mid X,y) \propto p(y \mid X, w)\, p(w)$$ $$\propto \exp\left(-\frac{1}{2}(y-Xw)^T (y-Xw)\right) \exp\left(-\frac{1}{2}(w-\mu_0)^T\Lambda_0^{-1}(w-\mu_0)\right)$$ $$\propto \exp\left(-\frac{1}{2}\left(-2y^TXw + w^TX^TXw + w^T \Lambda_0^{-1}w - 2\mu_0^T \Lambda_0^{-1} w\right)\right)$$
(**) We now define $\Lambda_m = (X^TX+\Lambda_0^{-1})^{-1}$ and $\mu_m= \Lambda_m(X^Ty+ \Lambda_{0}^{-1}\mu_0)$. Using these new variables, we find that the posterior may be rewritten as a Gaussian distribution: $$p(w \mid X, y) \propto \exp\left(-\frac{1}{2} (w-\mu_m)^T \Lambda_{m}^{-1} (w-\mu_m) + \frac{1}{2} \mu_m^T\Lambda_{m}^{-1} \mu_m\right)$$ $$\propto \exp\left(-\frac{1}{2} (w-\mu_m)^T \Lambda_m^{-1} (w-\mu_m)\right)$$
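A quick numerical check of these steps, as a sketch: it assumes unit noise variance in the likelihood (matching the $\exp(-\frac{1}{2}(y-Xw)^T(y-Xw))$ factor) and uses randomly generated, hypothetical $X$, $y$, $\mu_0$ and a diagonal $\Lambda_0$. It checks that (a) the expanded exponent differs from log(likelihood × prior) only by a constant independent of $w$, and (b) the completed-square form with $\Lambda_m$, $\mu_m$ equals the expanded exponent exactly. The function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 20, 3
X = rng.normal(size=(m, d))                  # hypothetical design matrix
y = rng.normal(size=m)                       # hypothetical targets
mu0 = np.array([0.5, -1.0, 0.2])             # hypothetical prior mean
Lambda0_inv = np.linalg.inv(np.diag([1.0, 2.0, 0.5]))  # diagonal prior covariance

# Posterior parameters as defined in (**)
Lambda_m_inv = X.T @ X + Lambda0_inv         # Lambda_m^{-1}
Lambda_m = np.linalg.inv(Lambda_m_inv)
mu_m = Lambda_m @ (X.T @ y + Lambda0_inv @ mu0)

def log_likelihood_times_prior(w):
    """log p(y|X,w) + log p(w), dropping normalizers (unit noise variance assumed)."""
    r = y - X @ w
    dw = w - mu0
    return -0.5 * r @ r - 0.5 * dw @ Lambda0_inv @ dw

def bracket_form(w):
    """-1/2 (-2 y^T X w + w^T X^T X w + w^T Lambda0^{-1} w - 2 mu0^T Lambda0^{-1} w)"""
    return -0.5 * (-2 * y @ X @ w + w @ X.T @ X @ w
                   + w @ Lambda0_inv @ w - 2 * mu0 @ Lambda0_inv @ w)

def completed_square(w):
    """-1/2 (w - mu_m)^T Lambda_m^{-1} (w - mu_m) + 1/2 mu_m^T Lambda_m^{-1} mu_m"""
    dw = w - mu_m
    return -0.5 * dw @ Lambda_m_inv @ dw + 0.5 * mu_m @ Lambda_m_inv @ mu_m

ws = rng.normal(size=(5, d))
offsets = [log_likelihood_times_prior(w) - bracket_form(w) for w in ws]
gaps = [bracket_form(w) - completed_square(w) for w in ws]
print(max(offsets) - min(offsets))  # ~0: the offset does not depend on w
print(max(abs(g) for g in gaps))    # ~0: the two forms agree exactly
```

Since the term $\frac{1}{2}\mu_m^T\Lambda_m^{-1}\mu_m$ does not depend on $w$, it can be absorbed into the proportionality constant, which is what the final line does.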
Starting from (**), I'm a bit unsure how the second-to-last equation is obtained. The prior covariance $\Lambda_0$ is assumed to be diagonal.
Expanding the exponent, I get $$-\frac{1}{2}\left(w^T \Lambda_m^{-1}w - w^T \Lambda_{m}^{-1} \mu_m - \mu_{m}^T \Lambda_{m}^{-1} w + \mu_m^T \Lambda_{m}^{-1} \mu_m\right) + \frac{1}{2} \mu_m^T \Lambda_{m}^{-1} \mu_m$$
Expanding further (sorry for the ugliness; I just don't see a cleaner way to get at these results):
$$\begin{aligned} &-\frac{1}{2}\Big(w^T \Lambda_m^{-1}w - w^T \Lambda_{m}^{-1} \Lambda_m(X^Ty+ \Lambda_{0}^{-1}\mu_0) - \big(\Lambda_m(X^Ty+ \Lambda_{0}^{-1}\mu_0)\big)^T \Lambda_{m}^{-1} w \\ &\qquad + \big(\Lambda_m(X^Ty+ \Lambda_{0}^{-1}\mu_0)\big)^T \Lambda_{m}^{-1} \Lambda_m(X^Ty+ \Lambda_{0}^{-1}\mu_0)\Big) \\ &+ \frac{1}{2} \big(\Lambda_m(X^Ty+ \Lambda_{0}^{-1}\mu_0)\big)^T \Lambda_{m}^{-1} \Lambda_m(X^Ty+ \Lambda_{0}^{-1}\mu_0) \end{aligned}$$
From here, I'm not sure how to simplify. Any help is appreciated.
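One possible route, sketched using only the definitions in (**) and the symmetry of $\Lambda_0$ and $\Lambda_m$: the definition of $\mu_m$ gives $\Lambda_m^{-1}\mu_m = X^Ty + \Lambda_0^{-1}\mu_0$, and expanding the quadratic form cancels the $\mu_m^T\Lambda_m^{-1}\mu_m$ terms:

$$\begin{aligned} &-\frac{1}{2}(w-\mu_m)^T \Lambda_m^{-1}(w-\mu_m) + \frac{1}{2}\mu_m^T\Lambda_m^{-1}\mu_m \\ &= -\frac{1}{2}w^T\Lambda_m^{-1}w + \mu_m^T\Lambda_m^{-1}w \\ &= -\frac{1}{2}w^T\left(X^TX + \Lambda_0^{-1}\right)w + \left(X^Ty + \Lambda_0^{-1}\mu_0\right)^T w \\ &= -\frac{1}{2}\left(-2y^TXw + w^TX^TXw + w^T\Lambda_0^{-1}w - 2\mu_0^T\Lambda_0^{-1}w\right) \end{aligned}$$

The last expression is the exponent from the third line of the quoted derivation, and $\frac{1}{2}\mu_m^T\Lambda_m^{-1}\mu_m$ does not depend on $w$, so it can be dropped into the proportionality constant.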