I'm struggling to understand how regularisation, for example with the $l_1$ or $l_2$ norm, has any effect on linear classification problems.

Suppose we have a simple binary classification task in which we want to find a weight vector $w$ to classify data points $x_1, \dots, x_n$ with labels $y_i \in \{-1, +1\}$. The predicted value $\hat{y}_i$ is $+1$ if $w^T x_i \geq 0$ and $-1$ if $w^T x_i < 0$. The loss is the logistic loss with $l_1$ regularization, $L(w) = \sum_{i=1}^{n} \log(1 + \exp(-y_i w^T x_i)) + \frac{\lambda}{2}\|w\|_1$. Why does the regularization have any effect on the resulting weight vector?

For any value of $\lambda$, could we not just reduce the magnitude of $w$ until the regularization term has a negligible effect on the loss, while keeping $w$ proportional to the optimal $w$ for the training set? For example, if the optimal $w$ is $[1, 1]$, we could scale it down to $[0.00001, 0.00001]$ and thereby ignore the effects of regularization. I don't see why the parts of the loss function that penalise misclassification would not simply take full priority, since they have a significantly greater effect on the resulting loss.

Any help would be appreciated. Thank you.

Tommy
  • The prediction does not change if $w$ is scaled down, but the logistic loss does change! – PhoemueX May 01 '23 at 18:12
  • @PhoemueX OK, thank you. I worked it through with some numbers and I think it makes sense now. Is this a result of the logistic loss being lower for predicted values further from $0$, which prevents a tiny $w$? – Tommy May 02 '23 at 18:39
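
The point raised in the comments can be checked numerically. The following is a minimal sketch, not part of the original thread: the toy data `X`, `y`, the direction `w_dir`, and `lam = 1.0` are made-up values chosen for illustration, and the loss sums the per-sample logistic term from the question over all points (an assumption about what the question intends).

```python
import numpy as np

def logistic_loss(w, X, y):
    """Sum over samples of log(1 + exp(-y_i * w^T x_i))."""
    margins = y * (X @ w)
    # logaddexp(0, -m) evaluates log(1 + exp(-m)) in a numerically stable way
    return np.sum(np.logaddexp(0.0, -margins))

def regularized_loss(w, X, y, lam):
    """Logistic loss plus the (lambda / 2) * ||w||_1 penalty from the question."""
    return logistic_loss(w, X, y) + 0.5 * lam * np.sum(np.abs(w))

# Hypothetical toy data: four linearly separable points with labels in {-1, +1}
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w_dir = np.array([1.0, 1.0])  # direction that separates the toy data
lam = 1.0                     # illustrative regularization strength

results = []
for scale in [1e-5, 1e-2, 1.0, 10.0]:
    w = scale * w_dir
    # Same decision rule as in the question: +1 if w^T x >= 0, else -1
    preds = np.where(X @ w >= 0, 1.0, -1.0)
    all_correct = bool(np.all(preds == y))
    data_term = logistic_loss(w, X, y)
    total = regularized_loss(w, X, y, lam)
    results.append((scale, all_correct, data_term, total))
    print(f"scale={scale:g}  all_correct={all_correct}  "
          f"logistic={data_term:.5f}  total={total:.5f}")
```

For this toy data the predictions are identical at every scale, but the logistic term rises toward $n \log 2$ as the scale goes to zero, while the penalty grows linearly with the scale. The total is therefore smallest at an intermediate scale rather than at a tiny one, and where that minimum sits depends on $\lambda$.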

0 Answers