Derivative using chain rule

Question

I have the following function $$f=\sum_{i=1}^{n} \sum_{j=1}^{p} \bigg \lbrace y_{ij}(\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j}) - \frac{1}{2} (\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j}) - \frac{1}{4} \lbrace (\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j})^2+\boldsymbol{\lambda_j^{T}V_i\lambda_j} \rbrace \bigg \rbrace$$ where $y_{ij}$ and $\beta_{0j}$ are real numbers, $\boldsymbol{\lambda_j}$ and $\boldsymbol{m_i}$ are $q\times 1$ vectors, and $\boldsymbol{V_i}$ is a $q\times q$ matrix. I want to calculate $\frac{\partial f} {\partial{\lambda_{jk}}}$, where $\lambda_{jk}$ is the $k$-th element of the vector $\boldsymbol{\lambda_j}$. I started with the chain rule: $\frac{\partial f} {\partial{\lambda_{jk}}}=\frac{\partial f} {\partial{\boldsymbol{\lambda_j}}} \frac{\partial{\boldsymbol{\lambda_j}}} {\partial{\lambda_{jk}}}$. However, the first term gives a $q\times 1$ vector and the second a $1\times q$ vector, while I want the final result to be a real number. Is the chain rule wrong?

This is a mathematics question – not a statistics question. I suggest this be migrated to math.stackexchange. — The Pointer, May 10 '21 at 10:39

greg · Answer 1 · 2021-05-10T17:41:17.777

$\def\e{\epsilon}\def\v{\varepsilon}\def\R#1{{\mathbb R}^{#1}}\def\o{{\tt1}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Let $\{e_i,\v_j,\e_k\}$ denote vectors from the standard basis for $\{\R{n},\R{p},\R{q}\}$ and define the all-ones vector/matrix variables $$\eqalign{ \o_n = \sum_{i=1}^n e_i \quad \o_p = \sum_{j=1}^p \varepsilon_j \quad \o_q = \sum_{k=1}^q \epsilon_k \qquad J_{np} = \o_n\o_p^T \quad J_{pp} = \o_p\o_p^T \\ }$$ and the double-dot product (of identically dimensioned matrices) $$A:B = \sum_{i=1}^n\sum_{j=1}^p A_{ij}B_{ij}$$

Then define the following vector/matrix variables and map them to the indexed quantities appearing in the problem statement $$\eqalign{ Y &\implies y_{ij} &= Y:e_i\varepsilon_j^T = e_i^TY\varepsilon_j \\ M &\implies m_i &= Me_i\\ L &\implies \lambda_j &= L\v_j \\ b &\implies \beta_{0j} &= b^T\v_j \\ W &\implies W &= \sum_{i=1}^n V_i \\ }$$ In other words, $\{M,L\}$ are matrices whose columns are the $\{m_i,\lambda_j\}$ vectors, while the individual components of $\{Y,b\}$ are the $\{y_{ij},\beta_{0j}\}$ scalars.

The following auxiliary matrix variables will be very convenient $$\eqalign{ A &= M^TL + \o_nb^T \quad&\implies\quad dA = M^TdL \\ S &= \tfrac 12\left(W+W^T\right) \quad&\implies\quad S = {\rm Sym}(W) \\ }$$ Write the objective function in a pure matrix form using these new variables.
Then calculate its differential and gradient. The quadratic term only pairs each column of $L$ with itself, so $\sum_{j}\lambda_j^TW\lambda_j = {\rm tr}(L^TWL) = I_p:L^TWL$, where $I_p$ is the $p\times p$ identity matrix. $$\eqalign{ f &= Y:A - \tfrac 12 J_{np}:A - \tfrac 14 A:A - \tfrac 14 I_p:L^TWL \\ df &= Y:dA - \tfrac 12 J_{np}:dA - \tfrac 12 A:dA - \tfrac 14 I_p:(L^TW\,dL+dL^TWL) \\ &= \left(Y-\tfrac 12J_{np}-\tfrac 12A\right):M^TdL - \tfrac 14 \left(W+W^T\right)L:dL \\ &= \left(MY-\tfrac 12MJ_{np}-\tfrac 12MA - \tfrac 12SL\right):dL \\ \p{f}{L} &= MY-\tfrac 12MJ_{np}-\tfrac 12MA - \tfrac 12SL \;\;\doteq\;\; G\quad\{{\rm the\,gradient}\} \\ }$$ This gradient is a $(q\times p)$ matrix. To obtain individual components, simply contract it with the standard basis vectors $$\eqalign{ G_{kj} = \e_k^TG\v_j = G:\e_k\v_j^T }$$
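A gradient of this kind can be checked numerically. The following is a minimal sketch, assuming NumPy and random test data; the function names `objective`, `gradient` and `finite_difference` are illustrative and not part of the answer. It evaluates $f$ directly from the double sum in the question and compares a central finite difference in $\lambda_{jk}$ against the closed form in which column $j$ of the gradient is $\sum_i (y_{ij}-\tfrac12-\tfrac12 A_{ij})\,m_i - \tfrac12 S\lambda_j$.

```python
import numpy as np

def objective(Y, M, L, b, V):
    """Evaluate f directly from the double sum in the question."""
    n, p = Y.shape
    total = 0.0
    for i in range(n):
        for j in range(p):
            a = L[:, j] @ M[:, i] + b[j]        # lambda_j^T m_i + beta_0j
            quad = L[:, j] @ V[i] @ L[:, j]     # lambda_j^T V_i lambda_j
            total += Y[i, j] * a - 0.5 * a - 0.25 * (a**2 + quad)
    return total

def gradient(Y, M, L, b, V):
    """Closed-form q x p gradient; column j is df/d(lambda_j)."""
    n = Y.shape[0]
    A = M.T @ L + np.outer(np.ones(n), b)       # A_ij = lambda_j^T m_i + beta_0j
    W = V.sum(axis=0)                           # W = sum_i V_i
    S = 0.5 * (W + W.T)                         # symmetric part of W
    return M @ (Y - 0.5 - 0.5 * A) - 0.5 * S @ L

def finite_difference(Y, M, L, b, V, k, j, h=1e-6):
    """Central difference of f with respect to the single entry lambda_jk."""
    E = np.zeros_like(L)
    E[k, j] = h
    return (objective(Y, M, L + E, b, V) - objective(Y, M, L - E, b, V)) / (2 * h)

# Arbitrary small test problem (sizes and seed are assumptions for illustration).
rng = np.random.default_rng(0)
n, p, q = 5, 3, 4
Y = rng.normal(size=(n, p))
M = rng.normal(size=(q, n))      # columns are the m_i
L = rng.normal(size=(q, p))      # columns are the lambda_j
b = rng.normal(size=p)
V = rng.normal(size=(n, q, q))   # deliberately non-symmetric, to exercise S

G = gradient(Y, M, L, b, V)
max_err = max(abs(G[k, j] - finite_difference(Y, M, L, b, V, k, j))
              for k in range(q) for j in range(p))
print(max_err)                   # expected to be tiny, since f is quadratic in L
```

Because $f$ is quadratic in $L$, the central difference has no truncation error, so any remaining discrepancy is floating-point rounding.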

score 0 · Answer 2 · answered May 10 '21 at 09:04


You're using denominator layout. For consistent differentiation, you'll need to expand to the left, because you're taking the transpose of an expression written in numerator layout ($(AB)^T=B^TA^T$): $$\frac{\partial f}{\partial \lambda _{jk}}=\underbrace{\frac{\partial \lambda_j}{\partial \lambda_{jk}}}_{1\times q}\underbrace{\frac{\partial f}{\partial \lambda_j}}_{q\times 1}$$

Or, you could directly differentiate the expression wrt $\lambda_{jk}$.
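
A sketch of that direct route, using only the sum, product and power rules. The component symbols $m_{ik}$ and $V_{i,kr}$ for the entries of $\boldsymbol{m_i}$ and $\boldsymbol{V_i}$ are notation assumed here for illustration. Since $\frac{\partial}{\partial\lambda_{jk}}(\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j}) = m_{ik}$ and $\frac{\partial}{\partial\lambda_{jk}}\boldsymbol{\lambda_j^{T}V_i\lambda_j} = \sum_r (V_{i,kr}+V_{i,rk})\lambda_{jr}$, only the terms of the outer sum with the matching index $j$ survive:

$$\frac{\partial f}{\partial\lambda_{jk}} = \sum_{i=1}^{n} \bigg\lbrace \Big(y_{ij}-\frac{1}{2}\Big)m_{ik} - \frac{1}{2}\big(\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j}\big)m_{ik} - \frac{1}{4}\sum_{r=1}^{q} (V_{i,kr}+V_{i,rk})\lambda_{jr} \bigg\rbrace$$

If each $\boldsymbol{V_i}$ is symmetric, the last term reduces to $-\frac{1}{2}(\boldsymbol{V_i\lambda_j})_k$. The result is a real number, as required.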


gunes


If I understand correctly, the chain rule that I wrote is in numerator layout? So I should use the transpose of that quantity? – tata May 10 '21 at 09:37
Your terms are in denominator layout, but you try to go right as in numerator layout chain rule. So, you should either expand left as above, or use numerator layout (i.e. numerator dimension x denominator dimension) – gunes May 10 '21 at 09:40
Thank you, I think I understand. Concerning direct differentiation wrt $\lambda_{jk}$, should I write the multiplication terms using sum expressions? I don't know how to write the last product – tata May 10 '21 at 09:43
It is $$\sum_m\sum_r \lambda_{jm}\lambda_{jr}V_{i,mr}$$ – gunes May 10 '21 at 09:44
Thank you. Using the chain rule, I get $\frac{\partial \boldsymbol{\lambda_j}}{\partial \lambda_{jk}} \sum_{i=1}^{n} \bigg ( y_{ij}\boldsymbol{m_i} - \frac{1}{2} \boldsymbol{m_i}- \frac{1}{2} \boldsymbol{m_im_i^{T}\lambda_j} - \frac{1}{2} \beta_{0j}\boldsymbol{m_i} - \frac{1}{2} \boldsymbol{V_i\lambda_j} \bigg ) $. Setting this to $0$, how can I solve with respect to $\lambda_{jk}$, since I have the term $\frac{\partial \boldsymbol{\lambda_j}}{\partial \lambda_{jk}}$, which is a 0/1 vector with a 1 only at position $k$? – tata May 10 '21 at 10:56
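
One possible way to read that last expression, offered as a sketch under stated assumptions: each $\boldsymbol{V_i}$ is symmetric (which the $-\frac{1}{2}\boldsymbol{V_i\lambda_j}$ term already presumes), $\beta_{0j}$ is held fixed, and the matrix on the left-hand side below is invertible. The 0/1 row vector only selects the $k$-th component of the bracketed sum, so requiring the derivative to vanish for every $k=1,\dots,q$ is the same as setting the whole $q\times 1$ vector to zero. Collecting the terms that contain $\boldsymbol{\lambda_j}$ then gives a linear system:

$$\bigg(\sum_{i=1}^{n} \big(\boldsymbol{m_im_i^{T}} + \boldsymbol{V_i}\big)\bigg)\boldsymbol{\lambda_j} = \sum_{i=1}^{n} \big(2y_{ij} - 1 - \beta_{0j}\big)\boldsymbol{m_i} \quad\implies\quad \boldsymbol{\lambda_j} = \bigg(\sum_{i=1}^{n} \big(\boldsymbol{m_im_i^{T}} + \boldsymbol{V_i}\big)\bigg)^{-1} \sum_{i=1}^{n} \big(2y_{ij} - 1 - \beta_{0j}\big)\boldsymbol{m_i}$$

The individual $\lambda_{jk}$ are then the components of this solution, so all $q$ of them are obtained jointly rather than one at a time.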
