The matrix calculus rules I can find all assume column vectors. Are the rules different for row vectors? (I am having a hard time finding any that use row vectors.)
I ran into this while deriving the backpropagation formulas for a layer written with row vectors: \begin{gather*} Y = XW + B\\ X=\begin{bmatrix} x_{0} & x_{1} & x_{2} \end{bmatrix} ,\ Y=\begin{bmatrix} y_{0} & y_{1} \end{bmatrix} ,\ W=\begin{bmatrix} w_{00} & w_{01}\\ w_{10} & w_{11}\\ w_{20} & w_{21} \end{bmatrix} ,\ B=\begin{bmatrix} b_{0} & b_{1} \end{bmatrix} \end{gather*}
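For concreteness, here is a minimal sketch of the forward pass $Y = XW + B$ with row vectors stored as one-row matrices. It uses plain Python lists so it has no dependencies; the helper names (`matmul`, `add`, `forward`) and the sample numbers are purely illustrative and not part of the question.

```python
# Sketch of the row-vector layer Y = XW + B.
# Shapes: X is 1x3, W is 3x2, B is 1x2, so Y is 1x2.
# Hypothetical helpers; matrices are lists of rows.

def matmul(A, C):
    """Multiply an m x n matrix A by an n x p matrix C."""
    n = len(C)
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * C[k][j] for k in range(n)) for j in range(len(C[0]))]
            for i in range(len(A))]

def add(A, C):
    """Elementwise sum of two matrices with the same shape."""
    return [[a + c for a, c in zip(ra, rc)] for ra, rc in zip(A, C)]

def forward(X, W, B):
    """Y = XW + B, with X, B and Y as 1-row matrices (row vectors)."""
    return add(matmul(X, W), B)

X = [[1.0, 2.0, 3.0]]   # [x0, x1, x2]
W = [[0.1, 0.2],        # [w00, w01]
     [0.3, 0.4],        # [w10, w11]
     [0.5, 0.6]]        # [w20, w21]
B = [[0.01, 0.02]]      # [b0, b1]

Y = forward(X, W, B)
# y0 = w00*x0 + w10*x1 + w20*x2 + b0
# y1 = w01*x0 + w11*x1 + w21*x2 + b1
print(Y)
```

Each output $y_j$ is a sum over the column $j$ of $W$, which is exactly the per-component expansion used in the derivation.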
\begin{gather*}
\left(\frac{\partial L}{\partial W}\right)^{T} =\begin{bmatrix} \frac{\partial L}{\partial w_{00}} & \frac{\partial L}{\partial w_{01}}\\ \frac{\partial L}{\partial w_{10}} & \frac{\partial L}{\partial w_{11}}\\ \frac{\partial L}{\partial w_{20}} & \frac{\partial L}{\partial w_{21}} \end{bmatrix} =\begin{bmatrix} \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{00}} & \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{01}}\\ \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{10}} & \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{11}}\\ \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{20}} & \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{21}} \end{bmatrix}\\
\\
\text{Focus on one term:}\\
y_{0} = w_{00} x_{0} + w_{10} x_{1} + w_{20} x_{2} + b_{0}\\
y_{1} = w_{01} x_{0} + w_{11} x_{1} + w_{21} x_{2} + b_{1}\\
\\
\frac{\partial Y}{\partial w_{00}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{00}}\\ \frac{\partial y_{1}}{\partial w_{00}} \end{bmatrix} = \begin{bmatrix} x_{0}\\ 0 \end{bmatrix}\\
\color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{00}} = \begin{bmatrix} \color{red}{\frac{\partial L}{\partial y_{0}}} & \color{red}{\frac{\partial L}{\partial y_{1}}} \end{bmatrix}\begin{bmatrix} x_{0}\\ 0 \end{bmatrix} = \color{red}{\frac{\partial L}{\partial y_{0}}}\, x_{0} + \color{red}{\frac{\partial L}{\partial y_{1}}} \cdot 0 = \color{red}{\frac{\partial L}{\partial y_{0}}}\, x_{0}\\
\\
\text{Similarly:}\\
\frac{\partial Y}{\partial w_{10}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{10}}\\ \frac{\partial y_{1}}{\partial w_{10}} \end{bmatrix} = \begin{bmatrix} x_{1}\\ 0 \end{bmatrix},\ \frac{\partial Y}{\partial w_{01}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{01}}\\ \frac{\partial y_{1}}{\partial w_{01}} \end{bmatrix} = \begin{bmatrix} 0\\ x_{0} \end{bmatrix},\ \frac{\partial Y}{\partial w_{11}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{11}}\\ \frac{\partial y_{1}}{\partial w_{11}} \end{bmatrix} = \begin{bmatrix} 0\\ x_{1} \end{bmatrix},\\
\frac{\partial Y}{\partial w_{20}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{20}}\\ \frac{\partial y_{1}}{\partial w_{20}} \end{bmatrix} = \begin{bmatrix} x_{2}\\ 0 \end{bmatrix},\ \frac{\partial Y}{\partial w_{21}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{21}}\\ \frac{\partial y_{1}}{\partial w_{21}} \end{bmatrix} = \begin{bmatrix} 0\\ x_{2} \end{bmatrix}\\
\\
\text{Finally:}\\
\left(\frac{\partial L}{\partial W}\right)^{T} = \begin{bmatrix} \frac{\partial L}{\partial y_{0}}\, x_{0} & \frac{\partial L}{\partial y_{1}}\, x_{0}\\ \frac{\partial L}{\partial y_{0}}\, x_{1} & \frac{\partial L}{\partial y_{1}}\, x_{1}\\ \frac{\partial L}{\partial y_{0}}\, x_{2} & \frac{\partial L}{\partial y_{1}}\, x_{2} \end{bmatrix} = \begin{bmatrix} x_{0}\\ x_{1}\\ x_{2} \end{bmatrix}\begin{bmatrix} \frac{\partial L}{\partial y_{0}} & \frac{\partial L}{\partial y_{1}} \end{bmatrix} = X^{T}\color{red}{\frac{\partial L}{\partial Y}}
\end{gather*}
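One way to sanity-check both the values and the shape of $X^{T}\frac{\partial L}{\partial Y}$ is a finite-difference comparison. The sketch below assumes a hypothetical loss $L = \tfrac{1}{2}\sum_j (y_j - t_j)^2$ with a made-up target $T$ (neither is in the question), so that $\frac{\partial L}{\partial y_j} = y_j - t_j$. It builds the outer product $x_i \frac{\partial L}{\partial y_j}$ and compares it entry by entry with central differences in each $w_{ij}$.

```python
# Finite-difference check that the matrix of partials dL/dw_ij (laid out
# like W, 3x2) equals the outer product X^T (dL/dY).
# Assumption: L = 0.5 * sum_j (y_j - t_j)^2 with a hypothetical target T.

def forward(X, W, B):
    """Y = XW + B for a row vector X (1x3), W (3x2), B (1x2)."""
    return [[sum(X[0][i] * W[i][j] for i in range(len(W))) + B[0][j]
             for j in range(len(W[0]))]]

def loss(Y, T):
    return 0.5 * sum((y - t) ** 2 for y, t in zip(Y[0], T[0]))

def analytic_grad_W(X, W, B, T):
    """Outer product X^T (dL/dY): entry (i, j) is x_i * dL/dy_j."""
    Y = forward(X, W, B)
    dL_dY = [Y[0][j] - T[0][j] for j in range(len(Y[0]))]  # 1x2 row vector
    return [[X[0][i] * dL_dY[j] for j in range(len(dL_dY))]
            for i in range(len(X[0]))]

def numeric_grad_W(X, W, B, T, eps=1e-6):
    """Central differences, one weight w_ij at a time, same layout as W."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[i][j] += eps
            Wm[i][j] -= eps
            grad[i][j] = (loss(forward(X, Wp, B), T)
                          - loss(forward(X, Wm, B), T)) / (2 * eps)
    return grad

X = [[1.0, 2.0, 3.0]]
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
B = [[0.01, 0.02]]
T = [[1.0, -1.0]]  # hypothetical target

ga = analytic_grad_W(X, W, B, T)
gn = numeric_grad_W(X, W, B, T)
max_err = max(abs(a - n) for ra, rn in zip(ga, gn) for a, n in zip(ra, rn))
# ga has 3 rows of 2 entries (same shape as W); max_err should be tiny.
print(len(ga), len(ga[0]), max_err)
```

Because the finite differences are stored in the same $3\times 2$ layout as $W$, agreement here means $X^{T}\frac{\partial L}{\partial Y}$ is the gradient arranged like $W$, which is what the transpose on the left-hand side expresses.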
However, I am not sure whether the final result has the correct shape.
\begin{gather*}
\frac{\partial L}{\partial X} = \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial X}\\
\frac{\partial Y}{\partial X} = \begin{bmatrix} \frac{\partial y_{0}}{\partial x_{0}} & \frac{\partial y_{0}}{\partial x_{1}} & \frac{\partial y_{0}}{\partial x_{2}}\\ \frac{\partial y_{1}}{\partial x_{0}} & \frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} \end{bmatrix} = \begin{bmatrix} w_{00} & w_{10} & w_{20}\\ w_{01} & w_{11} & w_{21} \end{bmatrix} = W^{T}\\
\frac{\partial L}{\partial X} = \color{red}{\frac{\partial L}{\partial Y}} W^{T},\quad \color{red}{\frac{\partial L}{\partial Y} = \begin{bmatrix} \frac{\partial L}{\partial y_{0}} & \frac{\partial L}{\partial y_{1}} \end{bmatrix}}\\
\\
\text{If the rules were simply reversed:}\\
\color{red}{\frac{\partial L}{\partial Y} = \begin{bmatrix} \frac{\partial L}{\partial y_{0}}\\ \frac{\partial L}{\partial y_{1}} \end{bmatrix}},\quad \frac{\partial Y}{\partial X} = \begin{bmatrix} \frac{\partial y_{0}}{\partial x_{0}} & \frac{\partial y_{1}}{\partial x_{0}}\\ \frac{\partial y_{0}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{1}}\\ \frac{\partial y_{0}}{\partial x_{2}} & \frac{\partial y_{1}}{\partial x_{2}} \end{bmatrix}\\
\text{then the dimensions of } \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial X} \text{ would not match } (2\times 1 \text{ times } 3\times 2).
\end{gather*}
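The same kind of check can be applied to $\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} W^{T}$ and to the dimension mismatch under the "reversed" rules. This is a sketch under the same assumptions as before (hypothetical squared-error loss and target $T$, illustrative helper names). It computes the $1\times 3$ row $\frac{\partial L}{\partial Y} W^{T}$, compares it with central differences in each $x_i$, and then tries the reversed layout, where $\frac{\partial L}{\partial Y}$ is a $2\times 1$ column and the Jacobian with entries $\frac{\partial y_j}{\partial x_i} = w_{ij}$ is $W$ itself ($3\times 2$).

```python
# Check dL/dX = (dL/dY) W^T for the row-vector layer Y = XW + B, and confirm
# that the "reversed" layout (2x1 column times 3x2 matrix) does not multiply.
# Assumption: L = 0.5 * sum_j (y_j - t_j)^2 with a hypothetical target T.

def forward(X, W, B):
    """Y = XW + B for a row vector X (1x3), W (3x2), B (1x2)."""
    return [[sum(X[0][i] * W[i][j] for i in range(len(W))) + B[0][j]
             for j in range(len(W[0]))]]

def loss(Y, T):
    return 0.5 * sum((y - t) ** 2 for y, t in zip(Y[0], T[0]))

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, C):
    """Plain matrix product; raises ValueError if inner dimensions differ."""
    if len(A[0]) != len(C):
        raise ValueError(f"cannot multiply {len(A)}x{len(A[0])} "
                         f"by {len(C)}x{len(C[0])}")
    return [[sum(A[i][k] * C[k][j] for k in range(len(C)))
             for j in range(len(C[0]))] for i in range(len(A))]

X = [[1.0, 2.0, 3.0]]
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
B = [[0.01, 0.02]]
T = [[1.0, -1.0]]  # hypothetical target

Y = forward(X, W, B)
dL_dY = [[Y[0][j] - T[0][j] for j in range(len(Y[0]))]]  # 1x2 row vector
analytic = matmul(dL_dY, transpose(W))                  # (1x2)(2x3) -> 1x3

# Central differences, one input x_i at a time.
eps = 1e-6
numeric = [[0.0] * len(X[0])]
for i in range(len(X[0])):
    Xp, Xm = [X[0][:]], [X[0][:]]
    Xp[0][i] += eps
    Xm[0][i] -= eps
    numeric[0][i] = (loss(forward(Xp, W, B), T)
                     - loss(forward(Xm, W, B), T)) / (2 * eps)

# Reversed layout: dL/dY as a 2x1 column, Jacobian entry (i, j) = w_ij, i.e. W.
col_dL_dY = transpose(dL_dY)  # 2x1
try:
    matmul(col_dL_dY, W)      # (2x1)(3x2): inner dimensions 1 and 3
    compatible = True
except ValueError:
    compatible = False

# Multiplying in the other order does line up and gives a 3x1 column.
rev = matmul(W, col_dL_dY)

print(analytic, numeric, compatible, rev)
```

In this run `compatible` is `False`, matching the mismatch noted above, while `rev` holds the same three numbers as `analytic`, arranged as a column instead of a row.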