I know this is very elementary but I cannot remember how to show $\hat\alpha$ as below.

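You minimize the squared error $$\epsilon^T\epsilon=(y-X\alpha)^T(y-X\alpha)=y^Ty-2\alpha^TX^Ty+\alpha^TX^TX\alpha,$$ where the two cross terms combine because $y^TX\alpha$ is a scalar and therefore equals its transpose $\alpha^TX^Ty$. This expression is minimized by setting its derivative w.r.t. $\alpha$ equal to zero:
$$-2X^Ty+2X^TX\alpha=0\Rightarrow \hat\alpha=(X^TX)^{-1}X^Ty,$$ provided $X^TX$ is invertible, i.e. $X$ has full column rank. Since the second derivative $2X^TX$ is positive semidefinite, this stationary point is indeed a minimum.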
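If it helps to see the result numerically, here is a minimal sketch in plain Python that solves the normal equations $X^TX\alpha=X^Ty$ for a design matrix with two columns and checks that the gradient $-2X^Ty+2X^TX\alpha$ vanishes at the solution. The data and the helper names `normal_equations_2d` and `gradient` are made up for illustration; they are not from any library.

```python
# Minimal sketch: least squares via the normal equations, in pure Python.
# The data and helper names are hypothetical, chosen only for illustration.

def normal_equations_2d(X, y):
    """Solve (X^T X) alpha = X^T y for a design matrix with two columns."""
    # Entries of the symmetric 2x2 matrix X^T X = [[a, b], [b, d]]
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    # Entries of the vector X^T y = [p, q]
    p = sum(r[0] * yi for r, yi in zip(X, y))
    q = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("X^T X is singular: columns of X are linearly dependent")
    # Closed-form inverse of a 2x2 matrix applied to X^T y
    return [(d * p - b * q) / det, (a * q - b * p) / det]

def gradient(X, y, alpha):
    """Gradient of the squared error: -2 X^T y + 2 X^T X alpha = -2 X^T (y - X alpha)."""
    resid = [yi - (r[0] * alpha[0] + r[1] * alpha[1]) for r, yi in zip(X, y)]
    return [-2 * sum(r[j] * e for r, e in zip(X, resid)) for j in range(2)]

# First column is the intercept, second column is the regressor x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]  # generated exactly from y = 1 + 2x

alpha_hat = normal_equations_2d(X, y)
grad_at_solution = gradient(X, y, alpha_hat)
print(alpha_hat)
print(grad_at_solution)
```

Because the toy data lie exactly on a line, the estimate recovers the intercept and slope used to generate them, and the gradient at $\hat\alpha$ is zero. For real problems one would normally use a numerical linear algebra routine rather than forming the inverse by hand.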