Suppose I have $N$ independent observations from a Gaussian data-generating process with known variance $\sigma^2$. I would assume that the likelihood function can be written as: $$ p(y | \mu, \sigma) = \prod_{i = 1}^N p(y_i| \mu, \sigma) = \prod_{i = 1}^N N(y_i| \mu, \sigma^2) $$
I'm reading a book where, in this situation, the likelihood is instead stated as $$ p(y | \mu, \sigma) \propto N(\overline{y}| \mu, \sigma^2/N) $$ where $\overline{y}$ is the sample mean and the proportionality is understood as a function of $\mu$. It is stated that this is possible because $\overline{y}$ is a sufficient statistic.
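For what it's worth, here is a rough numerical sanity check of the two expressions for $N = 2$. It is only my own sketch, not something from the book: the data values, the value of $\sigma$, the grid of $\mu$ values, and the helper names (`normal_pdf`, `product_likelihood`, `mean_likelihood`) are all made up for illustration. If the two expressions are proportional as functions of $\mu$, their ratio should be the same for every $\mu$.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def product_likelihood(y, mu, sigma):
    """First form: product of N(y_i | mu, sigma^2) over all observations."""
    return math.prod(normal_pdf(yi, mu, sigma ** 2) for yi in y)


def mean_likelihood(y, mu, sigma):
    """Second form: N(ybar | mu, sigma^2 / N), using only the sample mean."""
    n = len(y)
    ybar = sum(y) / n
    return normal_pdf(ybar, mu, sigma ** 2 / n)


# Made-up values for illustration only (N = 2, sigma assumed known).
y = [1.3, 2.9]
sigma = 1.5
mus = [-1.0, 0.0, 0.5, 2.0, 4.0]

# Ratio of the two forms at each candidate value of mu.
ratios = [product_likelihood(y, mu, sigma) / mean_likelihood(y, mu, sigma)
          for mu in mus]

for mu, r in zip(mus, ratios):
    print(f"mu = {mu:5.2f}   ratio = {r:.12f}")
```

The printed ratio comes out the same for each $\mu$ (up to floating-point error), and swapping the two entries of `y` does not change it. So numerically the proportionality seems to hold; what I am missing is the algebraic reason.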
Can somebody explain to me how this works? To me they do not look the same, since the first one still has some ordering on the $y_i$. Admittedly I'm not sure that this ordering matters since the $y_i$ are assumed to be exchangeable. Perhaps this could be illustrated for the case where $N = 2$? Thanks in advance.