
So we have the function $f(\textbf{B}) = a_0 + \textbf{a}^T\textbf{B}$

and we want to apply the operator $\frac{\partial}{\partial \textbf{B}}$ to it.

My intuition tells me this should result in $\textbf{a}^T \textbf{1}$. But according to my worksheet the answer is simply $\textbf{a}$.

This doesn't make sense to me: the function outputs a scalar, so why would taking the partial derivative output a vector?

  • Always when dealing with derivatives, think linear approximations. Since $f$ is affine here, the derivative is the linear part of $f$. To see this, look at $f(B+H) - f(B) = a^TH$. So $Df(B) = a^T$. The gradient is $a$. – copper.hat Mar 03 '20 at 05:21
  • Is $B$ a vector or a matrix? – Exodd Mar 03 '20 at 05:30
  • Roughly speaking, each component of the output vector represents the change in the function if only the corresponding component of the input vector changes. Look at the definition of the gradient on Wikipedia. – RozaTh Mar 03 '20 at 06:28
  • @copper.hat Hmm, okay, so how would I show that? Right now I just want to "pass" the derivative into my $\textbf{B}$, which would make a vector of 1s. How should I think about it so that I get the gradient out of this? – financial_physician Mar 03 '20 at 14:58
  • I'm not sure what you are asking. The derivative at $B$ is a function $Df(B)$, and in this case, since $f$ is affine, $f(B+H)=f(B)+a^TH$, so $Df(B)(H) = a^TH$. The derivative at $B$ is the function that maps the 'perturbation' $H$ to a scalar value, that is, the map $H \mapsto Df(B)(H)$. However, people generally identify the function $H \mapsto a^TH$ with the vector $a^T$, so they often refer to $a^T$ as the derivative when in fact it is just a representation of the function. – copper.hat Mar 03 '20 at 15:30
  • The gradient is a way of representing the derivative of a scalar valued function in an inner product space. Here it essentially amounts to taking the transpose of $a^T$ which is $a$. – copper.hat Mar 03 '20 at 15:31
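The two viewpoints in the comments (copper.hat's linear approximation and RozaTh's component-by-component reading) can be checked numerically. Below is a minimal NumPy sketch, assuming $\textbf{B}$ is a column vector (Exodd's question is not settled in the thread, but $\textbf{a}^T\textbf{B}$ only gives a scalar when $\textbf{B}$ is a vector). The values of `a0`, `a`, `B`, `H` and the helper `numerical_gradient` are hypothetical illustrations, not taken from the worksheet.

```python
import numpy as np

# Hypothetical example values; the question does not specify a_0 or a.
a0 = 2.0
a = np.array([1.0, -3.0, 0.5])

def f(B):
    """f(B) = a0 + a^T B for a vector B of the same length as a."""
    return a0 + a @ B

def numerical_gradient(func, B, h=1e-6):
    """Central-difference estimate of the gradient: component i measures
    how func changes when only B[i] is perturbed."""
    grad = np.zeros_like(B)
    for i in range(B.size):
        e = np.zeros_like(B)
        e[i] = h
        grad[i] = (func(B + e) - func(B - e)) / (2 * h)
    return grad

B = np.array([0.3, 1.7, -4.0])
g = numerical_gradient(f, B)
print(g)        # approximately the vector a = [1, -3, 0.5]
print(g.shape)  # (3,): one partial derivative per component of B

# Linear approximation: because f is affine, f(B + H) - f(B) equals a^T H
# for any perturbation H (up to floating-point rounding).
H = np.array([0.1, 0.2, -0.3])
print(np.isclose(f(B + H) - f(B), a @ H))  # True

# The guess a^T 1 is a single number (the sum of a's entries), not a vector.
print(a @ np.ones_like(a))  # -1.5
```

Each entry of the numerical gradient is one partial derivative, so the result has as many entries as $\textbf{B}$ even though $f$ itself is scalar-valued.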

1 Answer


Source: https://en.wikipedia.org/wiki/Gradient#Gradient_and_the_derivative_or_differential

In simple words, the gradient must have the same dimensions as the matrix/vector with respect to which the derivative is taken.
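Writing the product out componentwise makes the shape argument concrete. A short derivation, assuming $\textbf{B}$ and $\textbf{a}$ are $n$-dimensional column vectors (so that $f$ is scalar-valued):

$$
f(\textbf{B}) = a_0 + \textbf{a}^T\textbf{B} = a_0 + \sum_{i=1}^{n} a_i B_i,
\qquad
\frac{\partial f}{\partial B_j} = a_j \quad (j = 1, \dots, n),
$$

$$
\nabla_{\textbf{B}} f =
\begin{pmatrix} \partial f/\partial B_1 \\ \vdots \\ \partial f/\partial B_n \end{pmatrix}
= \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \textbf{a},
\qquad
Df(\textbf{B})(\textbf{H}) = \textbf{a}^T\textbf{H} = \langle \nabla_{\textbf{B}} f, \textbf{H} \rangle .
$$

There is one partial derivative per component of $\textbf{B}$, which is why the result is an $n$-vector. By contrast, $\textbf{a}^T\textbf{1} = \sum_{i=1}^{n} a_i = Df(\textbf{B})(\textbf{1})$ is a single number: the change in $f$ when every component of $\textbf{B}$ is moved by the same unit amount.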