0

In Blum, Hopcroft and Kannan's "Foundations of Data Science" they discuss the Perceptron Algorithm (p. 111). In the proof it says:

We will keep track of two quantities, wTw* and |w|2. Each update increases wTw* by at least 1.

(w + xili)Tw* = wTw* + xiTliw*wTw* + 1

The implication seems that both wTw* and |w|2 are scalar values. I interpret |w|2 as the dot product of w with itself, as in this answer. However wTw* (and similar uses of transpose throughout the proof) does not make sense to me, as I don't see how it can produce a scalar value.

My reasoning is as follows:

The vector of weights w is a row vector.
The vector of weights w*, which assumes the "if" condition of the theorem, is also a row vector.

A transposed row vector, multiplied by a row vector, is a column vector multiplied by a row vector, which produces a matrix the size of the column multiplied by the length of the row.

For example, let:

$$\mathbf{w} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$$ $$\mathbf{w}^* = \begin{bmatrix} 2 & 4 & 6 \end{bmatrix}$$

Then:

$$\mathbf{w}^T\mathbf{w}^* = \begin{bmatrix} 1\\2\\3 \end{bmatrix} \begin{bmatrix} 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6\\4 & 8 & 12\\6 & 12 & 18 \end{bmatrix}$$

My assumption is that "≥ wTw* + 1" means "equal to or larger than the sum of two scalars, wTw* + 1". Hence wTw* should be a scalar.

What am I missing?

Thanks in advance.

  • 3
    Are you sure your source works with row vectors ? Because if $\mathbf{w}$ is a column vector, it becomes clear that $\mathbf{w}^T\mathbf{w}^*$ is scalar. – Fabien Jul 29 '20 at 15:17
  • Thanks @Fabien. True, I wondered about that - I can't be 100% sure. I'm more familiar with the programmatic implementation, where the weight vector is typically a row vector, same as the input feature vector x_i. The nearest hints I can find in the text are: "The task is to find a d-dimensional vector w" and "the weight vector w will be a linear combination of the x_i". To me this translates to row vectors, but I could be wrong. – Mind over matrix Jul 29 '20 at 15:32
  • 1
    They stick to mathematics' conventions in this book, saying explicitly "row vectors" for probabilities and "column vectors" for space vectors. They don't say anything about $\mathbf{w}$, but I think they consider it as a space vector, and thus a column vector. – Fabien Jul 29 '20 at 18:03
  • Thanks for checking, I will follow up on that. My aim is to link the mathematical approach to the programmatic implementation, which is why I find it counterintuitive, coming from the applied side - however it might still be compatible. Elsewhere I noticed that the proof assumes the following to be equivalent: $\mathbf{w}^T\mathbf{w}^* = ,\mid\mathbf{w}\mid\mid\mathbf{w}^\mid$ and $\mathbf{w}^T\mathbf{w} = ,\mid\mathbf{w}\mid^{2}$. So my reading of it is that $\mathbf{w}^T\mathbf{w}^$ is the same as the inner product of $\mathbf{w}$ and $\mathbf{w}^*$. – Mind over matrix Jul 30 '20 at 10:46
  • That sounds like the Cauchy-Schwarz inequality: $|v\cdot w| \leq |v||w|$, with equality iff $\lambda v = \mu w$ for some scalars $\lambda,\mu$. – Jackozee Hakkiuz Aug 01 '20 at 13:50

0 Answers0