Why does regularization have an effect in linear classifiers?

Question

I'm struggling to understand how regularisation, for example using the $l_1$ or $l_2$ norm, has any effect on linear classification problems.

Suppose we have a simple binary classification task where we are trying to find a weight vector $w$ to classify data points $x_1, \dots, x_n$ with labels $y_i \in \{+1, -1\}$. The predicted value $\hat{y}_i$ is $+1$ if $w^T x_i \geq 0$ and $-1$ if $w^T x_i < 0$. The loss function is based on the logistic loss with $l_1$ regularization, so that $L(w) = \sum_{i=1}^{n} \log(1 + \exp(-y_i w^T x_i)) + \frac{\lambda}{2}||w||_1$. Why does the regularization have any effect on the resulting weight vector?

For any value of $\lambda$, could we not just reduce the magnitude of $w$ enough that the regularization term has a negligible effect on the loss, while keeping $w$ proportional to the optimal $w$ for the training set? For example, if the optimal $w$ is $[1, 1]$, we could scale it down by the necessary factor to get $[0.00001, 0.00001]$ and thereby ignore the effects of regularization. I don't see why the parts of the loss function that penalise misclassification would not simply take full priority, since they have a significantly greater effect on the resulting loss.

Any help would be appreciated. Thank you.

The prediction does not change if $w$ is scaled down, but the logistic loss does change! – PhoemueX, May 01 '23 at 18:12
@PhoemueX OK, thank you. I worked it through with some numbers and I think it makes sense now. Is this a result of the logistic loss being lower for predicted values further from $0$, which prevents a tiny $w$? – Tommy, May 02 '23 at 18:39
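
The point made in the comments can be checked with a few numbers. The sketch below is only an illustration: the toy data set, the direction $[1, 1]$, the value $\lambda = 0.1$, and the helper names `logistic_loss` and `l1_penalty` are all assumptions chosen for this example, not taken from the question. It keeps the direction of $w$ fixed and varies only its scale, so every prediction stays the same while the two terms of $L(w)$ change.

```python
import math

def logistic_loss(w, X, y):
    """Sum over the data of log(1 + exp(-y_i * w^T x_i))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        total += math.log1p(math.exp(-margin))
    return total

def l1_penalty(w, lam):
    """(lambda / 2) * ||w||_1, matching the loss in the question."""
    return 0.5 * lam * sum(abs(w_j) for w_j in w)

# Hypothetical toy data, correctly classified by any positive multiple of [1, 1].
X = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
y = [1, 1, -1, -1]
lam = 0.1                 # assumed regularization strength
direction = (1.0, 1.0)    # assumed "optimal" direction from the question

results = {}
for scale in (1e-5, 0.1, 1.0, 10.0):
    w = tuple(scale * d for d in direction)
    data_term = logistic_loss(w, X, y)
    penalty = l1_penalty(w, lam)
    results[scale] = (data_term, penalty, data_term + penalty)
    print(f"scale={scale:g}  data={data_term:.6f}  "
          f"penalty={penalty:.6f}  total={data_term + penalty:.6f}")
```

With a tiny scale the penalty is negligible, but each point contributes almost $\log 2$ to the data term, because $\log(1 + \exp(0)) = \log 2$. With a large scale the data term is close to zero but the penalty grows linearly. The total is smallest at an intermediate scale, which is why the choice of $\lambda$ changes the minimiser even though the predicted labels are identical for every scale.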
