
Suppose $f$ is a differentiable map from the inner product space $V$ to $\mathbb{R}$. How is the gradient of $f$ (at some point $v$) defined?

For a map $f:\mathbb{R}^n \to \mathbb{R}$, the typical treatment of multivariable calculus defines the gradient as the transpose of the Jacobian of $f$, and the interpretation is that $\nabla f$ is the direction of (local) steepest increase.

Is there anything wrong with just using $$\nabla f(v) := \text{argmax}_{||h|| = 1} Df(v)h$$ as a definition of the gradient in the general case?
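
To see what the proposed argmax returns, here is a minimal numerical sketch in Python. It assumes a hypothetical example function $f(x, y) = x^2 + 3y$ on $\mathbb{R}^2$ with the standard dot product. It approximates $Df(v)h$ by a central finite difference and brute-forces the maximum over sampled unit vectors. The helper names are illustrative only.

```python
import math

def f(x, y):
    # Hypothetical example function: f(x, y) = x^2 + 3y
    return x * x + 3.0 * y

def directional_derivative(func, v, h, eps=1e-6):
    # Central finite-difference approximation of Df(v)h
    (vx, vy), (hx, hy) = v, h
    return (func(vx + eps * hx, vy + eps * hy) - func(vx - eps * hx, vy - eps * hy)) / (2 * eps)

def argmax_unit_direction(func, v, samples=3600):
    # Brute-force search over unit vectors h = (cos t, sin t)
    best_h, best_val = None, -math.inf
    for k in range(samples):
        t = 2 * math.pi * k / samples
        h = (math.cos(t), math.sin(t))
        val = directional_derivative(func, v, h)
        if val > best_val:
            best_h, best_val = h, val
    return best_h, best_val

v = (2.0, 0.0)
h_star, max_rate = argmax_unit_direction(f, v)
grad = (2 * v[0], 3.0)  # exact gradient of the example f at v, namely (4, 3)
norm = math.hypot(*grad)
print(h_star, (grad[0] / norm, grad[1] / norm))  # nearly equal unit vectors
print(max_rate, norm)  # max_rate is close to |grad f(v)| = 5
```

At $v = (2, 0)$ the search lands near $(0.8, 0.6)$. So the argmax pins down only the unit vector $\nabla f(v)/\|\nabla f(v)\|$. To recover $\nabla f(v)$ itself you also need the maximal value $\|\nabla f(v)\|$.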

  • Is $V$ really an arbitrary inner product space? If $V$ can be infinite-dimensional and incomplete, the situation can become complicated. See examples like this, in which case there is actually no direction of steepest increase (but there will be one if you pass to the completion). – Izaak van Dongen Nov 11 '23 at 00:45
  • @IzaakvanDongen I meant finite dimensional. Honestly I just want to understand how you formally define the gradient of a real-valued function of a matrix. – TheProofIsTrivium Nov 11 '23 at 00:47
  • OK. GReyes' answer below is a good one. The point is that any linear function $V \to \Bbb R$ is given by taking the inner product with some (unique) fixed vector in $V$. This is (a special case of) the Riesz representation theorem. Your argmax construction has a small problem: the argmax isn't always unique (what happens when $f = 0$?). In your case there really isn't anything stopping you from using the usual definition of gradient. You can forget about the fact that they're matrices and pretend they are vectors in $\Bbb R^{mn}$ instead, with the usual standard basis! – Izaak van Dongen Nov 11 '23 at 00:55
  • @IzaakvanDongen Thank you. If I wanted to choose an appropriate inner product on $\text{Mat}(m, n)$, so that GReyes' definition of the gradient coincides with what I'd get if I "flattened" all of the matrices like you say, which would I choose? Would I choose $\langle A, B \rangle := \text{Tr}(AB^T)$? – TheProofIsTrivium Nov 11 '23 at 00:59
    Yes - the standard basis of "matrices with a single entry which is a $1$ and all other entries are $0$" is orthonormal with respect to the inner product $\mathrm{Tr}(AB^T)$. So therefore if you flatten the matrices, the inner product structure is just the usual dot product on $\Bbb R^{mn}$. In other words, the flattening is an isometric isomorphism of inner product spaces. If you write out $\mathrm{Tr}(AB^T)$ you should actually see that it's just the sum over products of corresponding entries of $A$ and $B$. – Izaak van Dongen Nov 11 '23 at 01:05
  • Thank you very much, that makes sense. – TheProofIsTrivium Nov 11 '23 at 01:08
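
The identity discussed in these comments, $\mathrm{Tr}(AB^T) = \sum_{i,j} A_{ij}B_{ij}$, can be checked numerically. This is a minimal pure-Python sketch. Matrices are nested lists, flattening is row-major, and the helper names are hypothetical. It compares the trace with the dot product of the flattened matrices and confirms that the single-entry matrices $E_{ij}$ are orthonormal for this inner product.

```python
import random

def trace_of_product_with_transpose(A, B):
    # Tr(A B^T), computed literally: (A B^T)_{ij} = sum_k A[i][k] * B[j][k]
    m, n = len(A), len(A[0])
    ABt = [[sum(A[i][k] * B[j][k] for k in range(n)) for j in range(m)] for i in range(m)]
    return sum(ABt[i][i] for i in range(m))

def flatten(M):
    # Row-major flattening Mat(m, n) -> R^{mn}
    return [x for row in M for x in row]

def dot(x, y):
    # Standard dot product on R^{mn}
    return sum(a * b for a, b in zip(x, y))

def unit_matrix(i, j, m, n):
    # Matrix with a single 1 in position (i, j) and 0 elsewhere
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(n)] for r in range(m)]

random.seed(0)
m, n = 2, 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
print(trace_of_product_with_transpose(A, B), dot(flatten(A), flatten(B)))  # agree up to rounding

# Gram matrix of the basis {E_ij} under Tr(A B^T): the 6x6 identity
basis = [unit_matrix(i, j, m, n) for i in range(m) for j in range(n)]
gram = [[trace_of_product_with_transpose(E, F) for F in basis] for E in basis]
```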

1 Answer


If your map is differentiable, say at $a\in V$, it means that $$ f(a+h)=f(a)+\partial f(a)(h)+o(|h|)\qquad \textrm{as } h\to 0, $$ where $\partial f(a)$ is an element of the dual space $V^*$, acting on $h$. Since your space has a Euclidean structure, you can identify $V^*$ with $V$ in such a way that $$ \partial f(a)(h)=\langle u(a),h\rangle $$ for some vector $u(a)\in V$. It is this vector that we call the gradient of $f$ at $a$.
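
Here is a minimal sketch of this definition in coordinates. It assumes $V = \mathbb{R}^3$ with the standard dot product and a hypothetical example $f(x) = x_0 x_1 + \sin x_2$. Because the standard basis $e_i$ is orthonormal, $\langle u(a), e_i\rangle$ is the $i$-th component of $u(a)$, so those components are just $\partial f(a)(e_i)$. The code approximates $\partial f(a)(h)$ by a central finite difference rather than computing it exactly.

```python
import math

def f(x):
    # Hypothetical example function: f(x) = x0 * x1 + sin(x2)
    return x[0] * x[1] + math.sin(x[2])

def differential(func, a, h, eps=1e-6):
    # Central-difference approximation of the linear map ∂f(a) applied to h
    plus = [ai + eps * hi for ai, hi in zip(a, h)]
    minus = [ai - eps * hi for ai, hi in zip(a, h)]
    return (func(plus) - func(minus)) / (2 * eps)

def gradient(func, a):
    # In the orthonormal standard basis, u(a) = sum_i ∂f(a)(e_i) e_i
    n = len(a)
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    return [differential(func, a, e) for e in basis]

a = [1.0, 2.0, 0.5]
u = gradient(f, a)
print(u)  # close to [2.0, 1.0, cos(0.5)]

# Check the defining property ∂f(a)(h) = <u(a), h> for some direction h
h = [0.3, -1.2, 2.0]
lhs = differential(f, a, h)
rhs = sum(ui * hi for ui, hi in zip(u, h))
print(lhs, rhs)  # agree up to finite-difference error
```

In a non-orthonormal basis the components of $u(a)$ would not simply be $\partial f(a)(e_i)$. That is where the inner product enters.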

GReyes
  • Thank you. Can I ask two questions?
    1. How can you prove that your definition of the gradient gives the direction of steepest ascent?
    2. Can you suggest a reference for this? Do I need to learn more linear algebra? Is this differential geometry?
    – TheProofIsTrivium Nov 11 '23 at 00:33
  • If $h$ is any unit vector, you have $\langle u(a),h\rangle\le |u(a)|$ by Cauchy-Schwarz. Equality is achieved when $h=u(a)/|u(a)|$ (assuming $u(a)\neq 0$), which is the direction of the gradient. – GReyes Nov 11 '23 at 00:37
  • This is linear algebra. In differential geometry you have a tangent space at each point and the same idea applies locally. – GReyes Nov 11 '23 at 00:39
  • Could you please suggest a book? Linear Algebra Done Right has Dual Spaces but no mention of gradients. – TheProofIsTrivium Nov 11 '23 at 00:41
  • Also, what do you mean by "identify" $V^*$ with $V$? Does this mean there is an isomorphism between the two? – TheProofIsTrivium Nov 11 '23 at 00:43
    Yes, $\Phi:V^*\to V$. Any element $l\in V^*$ can be identified with a vector $u=\Phi(l)$ in such a way that $l(x)=\langle u,x\rangle$ for all $x\in V$. In the infinite-dimensional case this is called the Riesz representation theorem, but it is clearly true in finite dimensions. The construction of $\Phi$ goes like this: given $l\neq 0$, its kernel is an $(n-1)$-dimensional subspace. Then $u=\Phi(l)$ is a vector in the orthogonal complement of the kernel (which is one-dimensional). Your linear function is just a scalar multiple of the (scalar) projection onto the orthogonal complement of the kernel. (For $l=0$, take $u=0$.) – GReyes Nov 11 '23 at 00:49
  • I don't know the book you mention. I recommend you read chapter 7 of Arnold's book "Mathematical Methods of Classical Mechanics". There you can learn in a few pages what gradients, curls and divergences are and their relations to (antisymmetric) linear forms. – GReyes Nov 11 '23 at 00:56
  • From the same book you can learn more Mathematics (and, more importantly, Mechanics) than from dozens of other standard books. – GReyes Nov 11 '23 at 00:57
  • Thank you for the detailed answer and follow-ups, and for the recommendation. – TheProofIsTrivium Nov 11 '23 at 01:00
  • Izaak van Dongen's comment on the need for completeness is an important one in infinite dimensions. It is irrelevant in the finite-dimensional case. – GReyes Nov 11 '23 at 01:01
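
The identification $\Phi: V^* \to V$ depends on the inner product. Here is a sketch on $V = \mathbb{R}^2$ that illustrates this. It uses the inner product $\langle x, y\rangle_G = x^T G y$ for a hypothetical symmetric positive-definite Gram matrix $G$. Requiring $l(x) = \langle u, x\rangle_G$ for all $x$ is equivalent to $Gu = c$ with $c_i = l(e_i)$. The code solves this by Cramer's rule, which is fine for $2\times 2$ but is not a general method. The names `inner`, `riesz_vector` and `ell` are illustrative.

```python
def inner(x, y, G):
    # <x, y>_G = x^T G y, with G assumed symmetric positive definite
    return sum(x[i] * G[i][j] * y[j] for i in range(2) for j in range(2))

def riesz_vector(l, G):
    # Phi(l): the u with l(x) = <u, x>_G for all x, i.e. the solution of G u = c
    c = [l([1.0, 0.0]), l([0.0, 1.0])]  # c_i = l(e_i)
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    # Cramer's rule for the 2x2 system G u = c
    return [(c[0] * G[1][1] - G[0][1] * c[1]) / det,
            (G[0][0] * c[1] - G[1][0] * c[0]) / det]

def ell(x):
    # Hypothetical linear functional: l(x) = 3 x0 - x1
    return 3.0 * x[0] - x[1]

I = [[1.0, 0.0], [0.0, 1.0]]  # standard dot product
G = [[2.0, 1.0], [1.0, 3.0]]  # another inner product on R^2

u_I = riesz_vector(ell, I)
u_G = riesz_vector(ell, G)
print(u_I, u_G)

x = [0.7, -0.4]
print(ell(x), inner(u_G, x, G))  # equal up to rounding

# As in the kernel construction: u is G-orthogonal to ker l, which is spanned by (1, 3)
print(inner(u_G, [1.0, 3.0], G))
```

With $G = I$ the representing vector is the coefficient vector $(3, -1)$ of $l(x) = 3x_0 - x_1$. With $G = \begin{pmatrix}2&1\\1&3\end{pmatrix}$ it is $(2, -1)$. So the same differential yields different gradients under different inner products, which is why the choice $\langle A, B\rangle = \mathrm{Tr}(AB^T)$ on matrices matters.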