The Hessian is an essential part of the multidimensional Taylor expansion of a sufficiently smooth function. Total differentiability of a function $f:U\to\mathbb R$ at a point $x_0\in U$, for an open subset $U\subseteq \mathbb R^n$, means that there is a linear map $L:\mathbb R^n\to \mathbb R$ such that
$$\lim_{x\to x_0}\frac{f(x)-[f(x_0)+L(x-x_0)]}{\Vert x-x_0\Vert}=0.$$
That's the definition of total differentiability. The term in brackets is then the first-order Taylor approximation of $f$ around $x_0$, and we call $L$ the total differential of $f$ at $x_0$. The equation essentially tells us that as $x$ goes to $x_0$, the difference between $f$ and its Taylor approximation goes to zero faster than $\Vert x-x_0\Vert$ does. We could also derive that $L$ is represented by the gradient, that is, $L(v)=\nabla f(x_0)\cdot v$, but I'll skip this.
Now if $f$ is twice totally differentiable at $x_0$, then additionally there is a bilinear form $B:\mathbb R^n\times\mathbb R^n\to\mathbb R$ such that
$$\lim_{x\to x_0}\frac{f(x)-[f(x_0)+L(x-x_0)+\frac{1}{2}B(x-x_0,x-x_0)]}{\Vert x-x_0\Vert^2}=0.$$
This is not a definition, but the statement of one of the several versions of Taylor's theorem. The term in brackets is now the second-order Taylor approximation, and we call $B$ (or rather its matrix representation) the Hessian of $f$ at $x_0$, so that $B(v,w)=w^T \mathrm Hf(x_0) v$. It also happens to be the total differential of the function $x\mapsto \nabla f(x)$ at $x_0$, which would allow us to derive its components, but again, I'll skip that.
With this, the Taylor approximation of a twice totally differentiable function becomes
$$f(x)\approx f(x_0)+\nabla f(x_0)\cdot(x-x_0)+\frac{1}{2}(x-x_0)^T \cdot\mathrm Hf(x_0)\cdot(x-x_0).$$
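To see the limit statement at work, here is a minimal numerical sketch. The test function $f(x,y)=e^x\cos y$, the point $(0.3, 0.5)$, the direction $(1,-2)$ and all function names are my own choices for illustration, not part of the argument above. The gradient and Hessian are written out by hand, and the script checks that the error of the second-order approximation divided by $\Vert x-x_0\Vert^2$ shrinks as $x\to x_0$.

```python
import math

# Hypothetical test function chosen for illustration: f(x, y) = e^x * cos(y).
def f(x, y):
    return math.exp(x) * math.cos(y)

def grad_f(x, y):
    # Partial derivatives of f, computed by hand.
    return (math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y))

def hess_f(x, y):
    # Second partial derivatives of f, computed by hand (symmetric matrix).
    e_cos = math.exp(x) * math.cos(y)
    e_sin = math.exp(x) * math.sin(y)
    return ((e_cos, -e_sin), (-e_sin, -e_cos))

def taylor2(x0, y0, hx, hy):
    # f(x0) + grad f(x0) . h + 1/2 * h^T Hf(x0) h, with h = (hx, hy).
    gx, gy = grad_f(x0, y0)
    (hxx, hxy), (hyx, hyy) = hess_f(x0, y0)
    linear = gx * hx + gy * hy
    quadratic = hxx * hx * hx + (hxy + hyx) * hx * hy + hyy * hy * hy
    return f(x0, y0) + linear + 0.5 * quadratic

def error_ratio(x0, y0, hx, hy):
    # |f(x) - Taylor approximation| / ||x - x0||^2
    err = abs(f(x0 + hx, y0 + hy) - taylor2(x0, y0, hx, hy))
    return err / (hx * hx + hy * hy)

x0, y0 = 0.3, 0.5          # expansion point (arbitrary choice)
direction = (1.0, -2.0)    # direction of approach (arbitrary choice)
steps = (1e-1, 1e-2, 1e-3)
ratios = [error_ratio(x0, y0, t * direction[0], t * direction[1]) for t in steps]

for t, r in zip(steps, ratios):
    print(f"t = {t:g}: error / ||h||^2 = {r:.3e}")
```

For a function this smooth the printed ratios should fall roughly in proportion to $t$, since the leading error term is of third order in $x-x_0$.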
From here it might be intuitively clear why the Hessian tells us about the type of critical point. If $\nabla f(x_0)=0$, then the Taylor approximation is just a constant plus the Hessian term. If the Hessian is positive definite, this term is strictly positive for every $x\neq x_0$ and grows as $x$ moves away from $x_0$ in any direction; if it is negative definite, the term is strictly negative and decreases instead. So we have to be at a local minimum or a local maximum, respectively. If the Hessian is indefinite, however, that means that as $x$ goes away from $x_0$ in some direction, the Hessian term increases, while in another direction it decreases. So we have to be at a saddle point. (If the Hessian is only semidefinite, the term vanishes along some direction, and this argument alone decides nothing.)
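The case distinction above can be sketched in code for $n=2$. This is only an illustration under my own assumptions: the function names, the tolerance `tol` and the example Hessians are made up for this sketch, and definiteness is decided via the two eigenvalues of the symmetric matrix $\begin{pmatrix}a&b\\b&c\end{pmatrix}$, which have a closed form.

```python
import math

def hessian_eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smaller one first."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def classify_critical_point(a, b, c, tol=1e-12):
    """Classify a critical point from its Hessian [[a, b], [b, c]].

    Assumes the gradient already vanishes at the point in question.
    tol is an arbitrary threshold below which an eigenvalue counts as zero.
    """
    lo, hi = hessian_eigenvalues_2x2(a, b, c)
    if lo > tol:
        return "local minimum"    # positive definite
    if hi < -tol:
        return "local maximum"    # negative definite
    if lo < -tol and hi > tol:
        return "saddle point"     # indefinite
    return "inconclusive"         # semidefinite: the second-order term is not enough

# Hessians at the origin of a few example functions (all have zero gradient there).
print(classify_critical_point(2.0, 0.0, 2.0))    # f = x^2 + y^2
print(classify_critical_point(-2.0, 0.0, -2.0))  # f = -x^2 - y^2
print(classify_critical_point(0.0, 1.0, 0.0))    # f = x*y
print(classify_critical_point(2.0, 0.0, 0.0))    # f = x^2 + y^4
```

The last example shows why the semidefinite case needs a separate branch: $x^2+y^4$ and $x^2-y^4$ have the same Hessian at the origin, yet the first has a minimum there and the second does not.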