I'm currently taking an online course where the professor has a habit of writing norms like this:

$||a^{[l](C)} - a^{[l](G)} ||^2$

Since I don't have a great amount of experience in math or the concepts of deep learning, I was often confused whether the 2 simply meant, in conjunction with the double bars, "apply the L2 norm to the terms within, i.e. square each of them and then sum the result" or if the 2 was, itself, a squaring of whatever the double brackets meant on their own.

So I googled for confirmation of the notation and it seems that the 2 is normally written as a subscript, not a superscript. For example you can see the notation in Wikipedia is written this way: https://en.wikipedia.org/wiki/Norm_(mathematics)#Notation

So is the superscript notation wrong? Or is this just one of those unfortunate cases where there is no standard?

Stephen
    So if the $2$ appears in the exponent on a quantity, it's meant as a square. As a subscript, it indicates that it is the $L^2$ norm most likely. However you will see both $L^2$ and $L_2$ in use. I prefer the former, but some prefer the latter. – Cameron Williams Mar 14 '18 at 01:32

2 Answers

Since I don't have a great amount of experience in math or the concepts of deep learning, I was often confused whether the 2 simply meant, in conjunction with the double bars, "apply the L2 norm to the terms within, i.e. square each of them and then sum the result" or if the 2 was, itself, a squaring of whatever the double brackets meant on their own.

I have never seen an author disambiguate the norm delimiters $\lVert\quad\rVert$ through the use of a superscript. In analysis, such notation would be incredibly confusing, since we frequently need to establish inequalities among norms of vectors raised to some power.
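As one illustration of such an inequality (a standard fact, added here as an example rather than taken from the course in question), for any norm and any vectors $x$ and $y$: $$\lVert x+y\rVert^2\le\bigl(\lVert x\rVert+\lVert y\rVert\bigr)^2\le 2\lVert x\rVert^2+2\lVert y\rVert^2\text{.}$$ Every superscript $2$ here is an ordinary square of the number the norm returns. If a superscript could also select which norm is meant, a line like this would be ambiguous.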

Also, an $L^2$ norm of a vector is the square root of the sum of the absolute squares of its components: $$\lVert x\rVert_2=\sqrt{\sum_{i=1}^n\lvert x_i\rvert^2}\text{;}$$ consequently, $$\lVert x\rVert^2_2=\sum_{i=1}^n\lvert x_i\rvert^2\text{.}$$
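To make the difference between $\lVert x\rVert_2$ and $\lVert x\rVert_2^2$ concrete, here is a minimal Python sketch of the two formulas above. The function names `l2_norm` and `squared_l2_norm` and the sample vector are made up for illustration.

```python
import math

def l2_norm(x):
    """L2 norm: square root of the sum of the squared absolute values."""
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))

def squared_l2_norm(x):
    """Squared L2 norm: the same sum, without the square root."""
    return sum(abs(xi) ** 2 for xi in x)

# Hypothetical sample vector, chosen so the arithmetic is easy to check by hand.
x = [3.0, -4.0]
norm = l2_norm(x)             # sqrt(9 + 16) = 5.0
norm_sq = squared_l2_norm(x)  # 9 + 16 = 25.0
```

The subscript picks the norm; the superscript squares the resulting number, so `norm ** 2` and `norm_sq` agree.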

K B Dave

Just to add to this: since OP mentioned "concepts of deep learning", I'm guessing that the expressions that contain these norms appear in loss functions. Usually, regression loss functions have a square because they have nicer derivatives for gradient descent.

For example, with a regularisation term added, you would see something like $$ L = \frac{1}{2}\sum_{i=1}^N (y_i - f(\vec x_i))^2 + \frac{\lambda}{2}||\vec w||^2 $$ with $f(\vec x) = \vec w \cdot \vec x$. Lots of squares here, but differentiating with respect to any $w_j$, we have $$ \frac{\partial L}{\partial w_j} = \sum_{i=1}^N (f(\vec x_i) - y_i)\, x_{i,j} + \lambda w_j $$ where $x_{i,j}$ is the $j$-th component of $\vec x_i$. This would have been a lot less nice without the squares. Other norms are sometimes also used (e.g. $||\vec w||_1$ for L1-regularisation), but the usual convention in machine learning is that $||\vec w||$ means $||\vec w||_2$ by default.
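As a sanity check on the derivative, here is a small pure-Python sketch that evaluates the loss with the squared norm $||\vec w||^2 = \sum_j w_j^2$ and compares the closed-form gradient (using the $j$-th component of each $\vec x_i$) against a central finite difference. All names (`predict`, `loss`, `gradient`, `numerical_gradient`) and the toy data are hypothetical, chosen only for this illustration.

```python
def predict(w, x):
    """Linear model f(x) = w . x."""
    return sum(wj * xj for wj, xj in zip(w, x))

def loss(w, X, y, lam):
    """0.5 * sum of squared residuals + 0.5 * lam * ||w||^2 (squared L2 norm)."""
    data_term = 0.5 * sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(X, y))
    reg_term = 0.5 * lam * sum(wj ** 2 for wj in w)
    return data_term + reg_term

def gradient(w, X, y, lam):
    """Closed form: dL/dw_j = sum_i (f(x_i) - y_i) * x_i[j] + lam * w_j."""
    grad = []
    for j in range(len(w)):
        data_part = sum((predict(w, xi) - yi) * xi[j] for xi, yi in zip(X, y))
        grad.append(data_part + lam * w[j])
    return grad

def numerical_gradient(w, X, y, lam, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        diff = loss(w_plus, X, y, lam) - loss(w_minus, X, y, lam)
        grad.append(diff / (2 * eps))
    return grad

# Toy data, assumed purely for the check: three samples with two features each.
X = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
y = [1.0, 2.0, 0.0]
w = [0.3, -0.2]
lam = 0.1

analytic = gradient(w, X, y, lam)
numeric = numerical_gradient(w, X, y, lam)
```

The two gradients should agree up to floating-point error, which is what you would expect if $||\vec w||^2$ is read as the square of the L2 norm.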

Mew