For a matrix $A \in \mathbb{R}^{m\times n}$, a vector $b \in \mathbb{R}^m$, and vectors $\hat{x},\epsilon \in \mathbb{R}^n$ with $\epsilon$ small, we have
$$\|A(\hat{x}+\epsilon)-b\|_2^2=(A(\hat{x}+\epsilon)-b)^T(A(\hat{x}+\epsilon)-b)=\|A\hat{x}-b\|_2^2+2\epsilon^T(A^TA\hat{x}-A^Tb)+\epsilon^TA^TA\epsilon.$$
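A quick numerical sanity check of this expansion may help. The sketch below uses random data and plain Python lists to avoid dependencies. The helper names (`matvec`, `check_expansion`, etc.) are hypothetical. It uses $A^TA\hat{x}-A^Tb = A^T(A\hat{x}-b)$ and $\epsilon^TA^TA\epsilon = \|A\epsilon\|_2^2$. Because the expression is quadratic in $\epsilon$, the identity holds exactly for any $\epsilon$, not just small ones.

```python
import random

def matvec(A, v):
    # Matrix-vector product for A stored as a list of rows.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def check_expansion(m=5, n=3, seed=0):
    """Return (lhs, rhs) of the expansion for random A, b, x_hat, eps."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
    b = [rng.uniform(-1, 1) for _ in range(m)]
    x_hat = [rng.uniform(-1, 1) for _ in range(n)]
    eps = [1e-3 * rng.uniform(-1, 1) for _ in range(n)]

    # Left-hand side: ||A(x_hat + eps) - b||^2
    res_eps = sub(matvec(A, add(x_hat, eps)), b)
    lhs = dot(res_eps, res_eps)

    # Right-hand side: ||A x_hat - b||^2 + 2 eps^T (A^T A x_hat - A^T b) + eps^T A^T A eps
    r = sub(matvec(A, x_hat), b)          # A x_hat - b
    g = matvec(transpose(A), r)           # A^T (A x_hat - b)
    A_eps = matvec(A, eps)                # A eps, so eps^T A^T A eps = ||A eps||^2
    rhs = dot(r, r) + 2 * dot(eps, g) + dot(A_eps, A_eps)
    return lhs, rhs

lhs, rhs = check_expansion()
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding.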
Now the book states that $A^TA\hat{x} - A^Tb$ can be treated as the derivative of something, so we can set it equal to $0$ to get the normal equations. How is this possible? What is it the derivative of, and why do we set it to $0$?
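One way to probe the book's claim numerically is to compare $2(A^TAx - A^Tb)$, which is twice the vector in the linear term of the expansion, against a finite-difference gradient of $f(x)=\|Ax-b\|_2^2$. This assumes $f$ is the "something" the book means. The sketch then solves $A^TAx = A^Tb$ and checks that this vector is (numerically) zero at the solution. It is plain Python with random data. All helper names are hypothetical, and the small Gaussian-elimination solver assumes $A^TA$ is invertible, which holds almost surely for the random tall $A$ used here.

```python
import random

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def f(A, b, x):
    """Objective ||Ax - b||_2^2."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return sum(ri * ri for ri in r)

def analytic_grad(A, b, x):
    """2 (A^T A x - A^T b), computed as 2 A^T (A x - b)."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return [2 * g for g in matvec(transpose(A), r)]

def numeric_grad(A, b, x, h=1e-5):
    """Central finite differences of f along each coordinate direction."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grad.append((f(A, b, xp) - f(A, b, xm)) / (2 * h))
    return grad

def solve(M, y):
    """Gaussian elimination with partial pivoting; assumes M is invertible."""
    n = len(y)
    aug = [list(row) + [yi] for row, yi in zip(M, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def normal_equation_solution(A, b):
    """Solve A^T A x = A^T b."""
    cols = transpose(A)  # rows of A^T are the columns of A
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    Atb = matvec(cols, b)
    return solve(AtA, Atb)

rng = random.Random(1)
m, n = 6, 3
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
b = [rng.uniform(-1, 1) for _ in range(m)]
x0 = [rng.uniform(-1, 1) for _ in range(n)]

print(analytic_grad(A, b, x0))   # 2(A^T A x0 - A^T b)
print(numeric_grad(A, b, x0))    # finite-difference gradient of f at x0
x_star = normal_equation_solution(A, b)
print(analytic_grad(A, b, x_star))
```

The first two printed vectors should match closely, and the last one should be near zero. The factor of $2$ does not change where the vector vanishes, so dropping it leaves the same equation $A^TAx = A^Tb$.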