Does this notation produce a scalar?

Question

In Blum, Hopcroft and Kannan's "Foundations of Data Science" they discuss the Perceptron Algorithm (p. 111). In the proof it says:

We will keep track of two quantities, w^Tw^* and |w|². Each update increases w^Tw^* by at least 1.

(w + x_il_i)^Tw^* = w^Tw^* + x_i^Tl_iw^* ≥ w^Tw^* + 1

The implication seems that both w^Tw^* and |w|² are scalar values. I interpret |w|² as the dot product of w with itself, as in this answer. However w^Tw^* (and similar uses of transpose throughout the proof) does not make sense to me, as I don't see how it can produce a scalar value.

My reasoning is as follows:

The vector of weights w is a row vector.
The vector of weights w^*, which assumes the "if" condition of the theorem, is also a row vector.

A transposed row vector, multiplied by a row vector, is a column vector multiplied by a row vector, which produces a matrix the size of the column multiplied by the length of the row.

For example, let:

$$\mathbf{w} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$$ $$\mathbf{w}^* = \begin{bmatrix} 2 & 4 & 6 \end{bmatrix}$$

Then:

$$\mathbf{w}^T\mathbf{w}^* = \begin{bmatrix} 1\\2\\3 \end{bmatrix} \begin{bmatrix} 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6\\4 & 8 & 12\\6 & 12 & 18 \end{bmatrix}$$

My assumption is that "≥ w^Tw^* + 1" means "equal to or larger than the sum of two scalars, w^Tw^* + 1". Hence w^Tw^* should be a scalar.

What am I missing?

Thanks in advance.

Are you sure your source works with row vectors ? Because if $\mathbf{w}$ is a column vector, it becomes clear that $\mathbf{w}^T\mathbf{w}^*$ is scalar. — Fabien, Jul 29 '20 at 15:17
Thanks @Fabien. True, I wondered about that - I can't be 100% sure. I'm more familiar with the programmatic implementation, where the weight vector is typically a row vector, same as the input feature vector x_i. The nearest hints I can find in the text are: "The task is to find a d-dimensional vector w" and "the weight vector w will be a linear combination of the x_i". To me this translates to row vectors, but I could be wrong. — Mind over matrix, Jul 29 '20 at 15:32
They stick to mathematics' conventions in this book, saying explicitly "row vectors" for probabilities and "column vectors" for space vectors. They don't say anything about $\mathbf{w}$, but I think they consider it as a space vector, and thus a column vector. — Fabien, Jul 29 '20 at 18:03
Thanks for checking, I will follow up on that. My aim is to link the mathematical approach to the programmatic implementation, which is why I find it counterintuitive, coming from the applied side - however it might still be compatible. Elsewhere I noticed that the proof assumes the following to be equivalent: $\mathbf{w}^T\mathbf{w}^* = ,\mid\mathbf{w}\mid\mid\mathbf{w}^\mid$ and $\mathbf{w}^T\mathbf{w} = ,\mid\mathbf{w}\mid^{2}$. So my reading of it is that $\mathbf{w}^T\mathbf{w}^$ is the same as the inner product of $\mathbf{w}$ and $\mathbf{w}^*$. — Mind over matrix, Jul 30 '20 at 10:46
That sounds like the Cauchy-Schwarz inequality: $|v\cdot w| \leq |v||w|$, with equality iff $\lambda v = \mu w$ for some scalars $\lambda,\mu$. — Jackozee Hakkiuz, Aug 01 '20 at 13:50

Does this notation produce a scalar?

0 Answers0