3

I have a question regarding the differential $d_{\textbf a} f$.

Suppose we have the function $f(x,y)= xy$, and the vectors $\textbf a = (1,1)$ and $\textbf u = (2,1)$. Then, if I understand this correctly, $$d_{\textbf a} f(\textbf u) = \nabla f(\textbf a) \cdot \textbf u = (1,1)\cdot (2,1) = 2+1 = 3,$$ where $\nabla f(\textbf a) = (\partial f/\partial x, \partial f/\partial y)$. But what if my assignment is to calculate $d_{\textbf a} f$? I don't know what it means. Do they want me to calculate $d_{\textbf a} f(x,y) = (1,1)\cdot (x,y) = x+y$, or something else?

Edit: Note that it is not the directional derivative that I'm asking about.

Eivind
  • 1,800
  • What is $v$ supposed to be in the above equation? Since $d_a$ would normally refer to the directional derivative, please also give more information about the context. – Alexander Thumm May 30 '11 at 18:28
  • I think the OP means $\mathbf{u}$ rather than $\mathbf{v}$ in the displayed equation. – Jesse Madnick May 30 '11 at 18:28
  • There's a minor point that I'm curious about: does your book refer to $\mathbf{a} = (1,1)$ as a "point" or a "vector"? The distinction doesn't really matter mathematically, but sometimes it helps psychologically. – Jesse Madnick May 30 '11 at 18:30
  • What I mean is that we usually think of taking the gradient $\nabla f$ at a point $a = (1,1)$, and then taking the dot product of that with the vector $\mathbf{u} = (2,1)$. Similarly, the differential $d_af$ is regarded as being at the point $a$. – Jesse Madnick May 30 '11 at 18:31
  • As you mentioned in a previous post, the notation is $$D_\mathbf{v}f(a) = \nabla f(a)\cdot \mathbf{v} = d_af(\mathbf{v}),$$ where $D_\mathbf{v}f(a)$ is the directional derivative in the direction of the vector $\mathbf{v}$ evaluated at the point $a$. – Jesse Madnick May 30 '11 at 18:33
  • @Alexander: Jesse is right, I meant $\textbf u$. With $d_{\textbf a}$, I mean the differential. I'm not sure that I am able to give more information about the context. Isn't the differential clearly defined? – Eivind May 30 '11 at 20:08
  • Eivind: I think the question is clear. I'm writing an answer in a moment. – t.b. May 30 '11 at 20:10
  • @Jesse: The book refers to $\textbf a$ as a point, but it is still a vector, or isn't it? – Eivind May 30 '11 at 20:12
  • By the way: your calculations are correct and you're almost done, the calculations need only be interpreted and I'll explain it in a moment. Concerning your question to Jesse: Yes, $\mathbf{a}$ is a point in $\mathbb{R}^2$, that is to say a vector. – t.b. May 30 '11 at 20:14

2 Answers

4

Essentially, you have worked out everything already, but there seems to be a bit of confusion about the definitions, so let me try to set this straight.

The differential of $f$ at the point $\mathbf{a} \in \mathbb{R}^2$ is the row matrix $$ d_{\mathbf{a}}f = \begin{pmatrix} \frac{\partial}{\partial x} f(\mathbf{a}) & \frac{\partial}{\partial y}f (\mathbf{a}) \end{pmatrix}.$$

Now if you write $d_{\mathbf{a}}f (\mathbf{u})$ for $\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \in \mathbb{R}^2$ you mean the matrix product $$d_{\mathbf{a}}f (\mathbf{u}) = \begin{pmatrix} \frac{\partial}{\partial x} f(\mathbf{a}) & \frac{\partial}{\partial y}f (\mathbf{a}) \end{pmatrix} \cdot \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{\partial}{\partial x} f(\mathbf{a}) \cdot u_1 + \frac{\partial}{\partial y}f (\mathbf{a}) \cdot u_2 .$$

On the other hand, $\nabla f (\mathbf{a})$ is the column vector $$ \nabla f (\mathbf{a}) = \begin{pmatrix} \frac{\partial}{\partial x} f(\mathbf{a}) \\ \frac{\partial}{\partial y}f (\mathbf{a}) \end{pmatrix}$$ and when you write $\nabla f (\mathbf{a}) \cdot \mathbf{u}$ you mean the scalar product $$\nabla f( \mathbf{a}) \cdot \mathbf{u} = \begin{pmatrix} \frac{\partial}{\partial x} f(\mathbf{a}) \\ \frac{\partial}{\partial y}f (\mathbf{a}) \end{pmatrix} \cdot \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{\partial}{\partial x} f(\mathbf{a}) \cdot u_1 + \frac{\partial}{\partial y}f (\mathbf{a}) \cdot u_2 . $$

So we see that for $f(x,y) = xy$ and a point $\mathbf{a} = (x,y)$ $$d_{\mathbf{a}}f = \begin{pmatrix} y & x \end{pmatrix} \qquad \text{while} \qquad \nabla f (\mathbf{a}) = \begin{pmatrix} y \\ x \end{pmatrix}.$$
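As a small sanity check, here is a minimal Python sketch of the two computations for $f(x,y) = xy$, using the values $\mathbf{a} = (1,1)$ and $\mathbf{u} = (2,1)$ from the question. The helper names (`differential_at`, `gradient_at`, `matmul`, `dot`) are made up for this illustration and are not taken from any library.

```python
def differential_at(a):
    """Differential of f(x, y) = x*y at a, as a 1 x 2 row matrix."""
    x, y = a
    return [[y, x]]  # df/dx = y, df/dy = x

def gradient_at(a):
    """Gradient of f(x, y) = x*y at a, as a 2 x 1 column vector."""
    x, y = a
    return [[y], [x]]

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def dot(v, w):
    """Standard scalar product of two column vectors."""
    return sum(vi[0] * wi[0] for vi, wi in zip(v, w))

a = (1, 1)
u = [[2], [1]]  # u written as a column

row_times_column = matmul(differential_at(a), u)[0][0]  # matrix product
scalar_product = dot(gradient_at(a), u)                 # scalar product

print(row_times_column, scalar_product)  # prints: 3 3
```

Both routes give the same number, but the first treats $d_{\mathbf{a}}f$ as a map applied to $\mathbf{u}$, while the second pairs two vectors.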

Now the confused reaction was due to the fact that the notation used here for the derivative of $f$ at the point $\mathbf{a}$ is often used for the directional derivative, and as you rightly pointed out in a comment, we have the relations $$ D_{\mathbf{u}} f (\mathbf{a}) : = d_{\mathbf{a}} f (\mathbf{u}) = \nabla f(\mathbf{a}) \cdot \mathbf{u},$$ and everything should be fine now, no?
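To see the relation numerically, one can compare against the difference quotient $\bigl(f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a})\bigr)/t$ for small $t$. The sketch below is only an illustration under the assumption that $f(x,y) = xy$, $\mathbf{a} = (1,1)$ and $\mathbf{u} = (2,1)$ as in the question; the function name `difference_quotient` is hypothetical.

```python
def f(x, y):
    return x * y

def difference_quotient(f, a, u, t):
    """Return (f(a + t*u) - f(a)) / t for points and vectors in R^2."""
    ax, ay = a
    ux, uy = u
    return (f(ax + t * ux, ay + t * uy) - f(ax, ay)) / t

a = (1.0, 1.0)
u = (2.0, 1.0)

# As t shrinks, the quotient should approach d_a f(u) = 3.
for t in (1e-1, 1e-3, 1e-5):
    print(t, difference_quotient(f, a, u, t))
```

For this particular $f$ the quotient equals $3 + 2t$ exactly (up to rounding), so the printed values tend to $3$ as $t \to 0$.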

Since you made the computations yourself already, I'll not repeat them here.

t.b.
  • 78,116
  • I'm having a little trouble understanding the difference between using matrices and the matrix product vs. vectors and the scalar product. Since the results are the same, I guess it must be because of some underlying theoretical difference between $\nabla f(\textbf a)$ and $d_{\textbf a} f$? Also, I came across this definition: $df=\frac{\partial f}{\partial x}(\textbf a) \cdot dx + \frac{\partial f}{\partial y}(\textbf a) \cdot dy$. How does this fit in with the rest of it? – Eivind May 30 '11 at 21:37
  • @Eivind: Okay, I see. As I mentioned in my comment to Jesse's answer, $d_{\mathbf{a}}f$ is a linear map $d_{\mathbf{a}}f : \mathbb{R}^2 \to \mathbb{R}$. Now from linear algebra you might know that every linear map $\phi: \mathbb{R}^2 \to \mathbb{R}$ is of the form $\phi(v) = \langle x_\phi, v \rangle$ for a unique vector $x_\phi$. The vector corresponding to $d_{\mathbf{a}}f$ is $\nabla f(\mathbf{a})$, that is $d_{\mathbf{a}}f(\mathbf{u}) = \langle \nabla f (\mathbf{a}), \mathbf{u} \rangle$. Note that the formulae for the matrix product and the scalar product are similar, but they mean rather ... – t.b. May 30 '11 at 21:43
  • ... different things! – t.b. May 30 '11 at 21:43
  • Let $\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}$. The map $dx$ is a linear form and $dx (\mathbf{u}) = u_1$ and analogously $dy(\mathbf{u}) = u_2$, thus writing $d_{\mathbf{a}}f = \frac{\partial}{\partial x}f(\mathbf{a}) dx + \frac{\partial}{\partial y}f(\mathbf{a}) dy$ means when evaluating at $\mathbf{u}$ that $d_{\mathbf{a}}f (\mathbf{u}) = \frac{\partial}{\partial x}f(\mathbf{a}) dx (\mathbf{u}) + \frac{\partial}{\partial y}f(\mathbf{a}) dy (\mathbf{u}) = \frac{\partial}{\partial x}f(\mathbf{a}) u_1 + \frac{\partial}{\partial y}f(\mathbf{a}) u_2$ again, so it's the same thing. – t.b. May 30 '11 at 21:49
  • @Theo I like your explanation here but have a follow-up. Without invoking, say, the Jacobian and its role in characterizing the derivative, is there an even more elementary way to show that the differential is a row vector and the gradient is a column vector? Said differently, from first principles, how can one know that the differential is, in fact, a row vector and the gradient is a column vector? – ItsNotObvious May 30 '11 at 23:44
  • @3Sphere: If $f: U \subset \mathbb{R}^{n} \to \mathbb{R}$ then $d_{a}f$ is the unique linear map $d_af:\mathbb{R}^n \to \mathbb{R}$ satisfying $f(a + h) - f(a) = (d_af)(h) + o(|h|)$ (if it exists). So, choosing bases we get a $1 \times n$-matrix. Choosing $h = e_{i}$ we see that necessarily $d_{a}f(e_i) = \dfrac{\partial f(a)}{\partial x_i}$. On the other hand, to define the gradient, we need a scalar product: given a scalar product we get $\nabla f(a)$ as the unique vector such that $d_a f (u) = \langle \nabla f(a), u \rangle$. Having chosen a basis and the associated SP ... – t.b. May 31 '11 at 00:02
  • ...we get that $\nabla f(a)$ is the usual gradient. However, we need not choose that scalar product, we could choose another one, and the gradient would no longer have the familiar form. Does this answer your question? Ah, and thanks by the way! :) – t.b. May 31 '11 at 00:06
  • @Theo Makes perfect sense now, especially in view of the manner in which you defined the gradient. I was unaware that the gradient could be defined in this particular way. Thanks for the clarification. – ItsNotObvious May 31 '11 at 00:48
  • @3Sphere: Ok, great! There were some typos in my last comments: the formula I used for defining $d_a f$ should read "$f(a + h) - f(a) = (d_af)(h) + o(|h|)$ as $|h| \to 0$". Then I should choose $h = t e_i$, divide by $t$ and let $t \to 0$ to get the formula for the Jacobian. This definition seems to depend on the choice of a norm, but it doesn't, as all norms on $\mathbb{R}^n$ are equivalent. I reiterate: the differential is always defined and involves no choices, while the gradient only makes sense in the presence of a scalar product. The familiar formula uses the standard SP. – t.b. May 31 '11 at 06:03
  • @3Sphere and Theo: Thanks to both of you for making this clear. I think I get the idea now. Theo, let's see if I understand you correctly. If this was an exam, would $d_{\textbf a} f = (y \quad x)$ be the correct answer? Also, I don't think that the book I use says anything about a difference between row/column matrices and vectors, so thank you for explaining that to me. – Eivind May 31 '11 at 07:47
  • @Eivind: Yes, that would be the perfectly correct answer. Does the book you're using really write calculations the way you write them in your question, $d_{\mathbf{a}}f(\mathbf{u}) = \cdots = 3$? That must be quite confusing... I'm curious: what book is it? – t.b. May 31 '11 at 07:51
  • @Theo: When it comes to the differential, the book uses $df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy$. The formula $d_a f(u) = \nabla f(a) \cdot u$ was given in a lecture. I can't find it in my book (Vector Calculus by Colley). But it introduces the gradient vector as $\nabla f=(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n})$ (in that notation), and this is the notation I am used to, when it comes to vectors. – Eivind May 31 '11 at 08:17
  • Then it says: "Alternatively, we can use matrix notation and define the derivative of $f$ at $a$, denoted $Df(a) = [f_{x_1}(a)\quad \cdots \quad f_{x_n}(a)]$." It seems to me that $Df(a)$ is the Jacobian matrix. Then $Df(a)$ (row matrix) is multiplied by $\textbf h$, which is suddenly in column matrix form (but it doesn't say why, other than that it's convenient). This makes me a little confused. Of course, it could be that I'm missing something from the text. – Eivind May 31 '11 at 08:17
  • @Eivind: I see. So the book looks at the differential as a $1$-form (does it use that word?), that is, a linear map $\mathbb{R}^2 \to \mathbb{R}$ and writes it "invariantly" that is independently of coordinates. Now once you choose coordinates (that is, a basis), $1$-forms become identified with row matrices and what you write $Df(a)$ is indeed what is usually called the Jacobian. Probably, the point that the book is trying to make is: If $V = \mathbb{R}^2$, vectors are abstract entities that happen to have a concrete incarnation as column matrices once you choose a basis... – t.b. May 31 '11 at 08:26
  • ...similarly, linear maps $g: V \to W$ are maps satisfying $g(v+w) = g(v) + g(w)$ and $g(\lambda v) = \lambda g(v)$ and can only be interpreted as matrices after you choose bases of both $V$ and $W$. If you want to multiply matrices (= compose linear maps), they have to be in the correct form, that is $AB$ is only defined if the number of columns of $A$ is equal to the number of rows of $B$. – t.b. May 31 '11 at 08:29
  • @Theo: Actually, the book uses that word (1-form), but that part is not on the syllabus in the course I'm taking at the moment (Calculus 2), so I have not read it. The same book is also used in Calculus 3. But I think I understand what you mean. – Eivind May 31 '11 at 11:49
  • @Eivind: In fact, it took me quite some time to see why these rather subtle distinctions are made and why they are so important. Don't worry too much about them now, they will become much clearer soon. Try to get used to calculating the various forms of derivatives, try to get some feeling for them and as soon as you've mastered the formal business, the theoretical distinctions will become much easier than they may seem now. That's the best piece of advice I can give you right now (and many people might disagree). – t.b. May 31 '11 at 12:09
-1

In the case of functions $f\colon \mathbb{R}^n \to \mathbb{R}$, like $f(x,y) = xy$ as you have, the differential $d_af$ is the same thing as the gradient $\nabla f(a)$.

Jesse Madnick
  • 31,524
  • 4
    No, it's not the same thing. One is a linear map $d_af: \mathbb{R}^n \to \mathbb{R}$ (hence a row matrix once you choose a basis), the other is a vector $\nabla f(a) \in \mathbb{R}^n$. They're related by $(d_af)(v) = \langle \nabla f(a), v \rangle$. But: The linear map is invariantly defined (independently of a basis), the gradient is only defined when a scalar product is around (the standard one once you've chosen a basis). – t.b. May 30 '11 at 18:29
  • @Theo: Yes, you're right of course, but I was trying to keep it simple. – Jesse Madnick May 30 '11 at 18:42