0

Recently I've been studying machine learning. I want to find out the growth function of the following hypothesis set:

$$ \mathcal{H}=\left\{h_\mathbf{w}\equiv\mbox{sign}\big(\langle\mathbf{w}, \cdot\rangle\big)\,\colon \mathbf{w}\in\mathbb{R}^2\right\}\,. $$


I've read section 2.3 of this lecture, from Learning Theory course of University of Washington. In the paper (page 3) it claims that the growth function of $\mathcal{H}$ is

$$ g_\mathcal{H}(m)=\binom{m}{0} + \binom{m}{1} + \binom{m}{2} = \frac{m^2+m+2}{2}\,, $$

which is the maximum possible number of sections that $m$ lines can divide $\mathbb{R}^2$ into.

The reason is that each input vector $\mathbf{x}$ can be seen as a line that divides $\mathbb{R}^2$ into two sections. Any $\mathbf{w}_1$ from the same side of the line as $\mathbf{x}$ will assign $\mathbf{x}$ positive. Any $\mathbf{w}_2$ from the opposite side of the line as $\mathbf{x}$ will assign $\mathbf{x}$ negative. If there are $m$ input vectors $\mathbf{x}_1,\ldots,\mathbf{x}_m$ that divide $\mathbb{R}^2$ into multiple sections, any $\mathbf{w}$ from the same section will assign $\mathbf{x}_1,\ldots,\mathbf{x}_m$ the same way.


I understand the reasoning of the paper, but I have a different answer. I think the growth function is

$$ g_\mathcal{H}(m)=2m\,. $$

This is because all the $m$ lines should pass the origin. Therefore, these lines can divide $\mathbb{R}^2$ into $2m$ sections at most, instead of $(m^2+m+2)/2$ sections.

I don't see anything wrong in my answer. Which growth function is correct?

yanruijie
  • 500
  • Can you tell us what a "growth function" is? – Plop Sep 29 '23 at 09:00
  • Moreover, can we exclude the possibility that "linear" is used to mean "affine"? – Plop Sep 29 '23 at 09:02
  • @Plop Let $\mathcal{H}$ be a hypothesis set, and $S$ be a set of $m$ input vectors. Each hypothesis $h$ in $\mathcal{H}$ assigns elements in $S$ one way. The growth function of $\mathcal{H}$, denoted by $g_\mathcal{H}(m)$, is the maximum number of different ways the hypotheses in $\mathcal{H}$ can assign an arbitrary input vector set $S$ with $|S|=m$. – yanruijie Sep 29 '23 at 09:08
  • @Plop Yes. "Linear" is not used to mean "affine" here. There is another different section in the paper about affine classifiers, which I'm not interested in. – yanruijie Sep 29 '23 at 09:11
  • @ Plop I'm not sure if my explanation of growth function is clear enough. Its definition is also in the paper, and perhaps that definition is easier to understand. – yanruijie Sep 29 '23 at 09:15
  • I think you should incorporate the definition in your question, and the one that we can find in the document you give a link to. – Plop Sep 29 '23 at 10:41

1 Answers1

1

I agree with you.

Let $(x_1,\cdots, x_m) \in (\mathbb{R}^2\setminus \{0\})^m$. Consider the (vector) lines $\{d_1,\cdots, d_m\}$ such that for all $i$, $d_i$ is the orthogonal subspace to $x_i$.

Then $\mathbb{R}^2 \setminus \bigcup_i d_i$ has (usually, and at most) $2m$ connected components which are all convex cones, and on which, for all $w$, $h_w$ is constant. Moreover, for all $w$, the values that $h_w$ takes on these components are all distinct. Therefore, for all $w$, $h_w$ takes (usually and at most) $2m$ values, and therefore, the growth function of $\mathcal{H}$ is $m \mapsto 2m$.

I think that the author of the document got confused between linear things and affine things, because the argument looks valid if we consider $\mathcal{H}_{aff} := \{\langle w,\cdot\rangle + b \ \vert w\in \mathbb{R}^2,\ b \in \mathbb{R}\}$. Moreover, the document does not discuss the growth function of this set of affine classifiers. The relevant idea being, both for the linear and the affine case, "count how many sectors can be obtained if you cut through $m$ hyperplanes", maybe the author was thinking one day about linear hyperplanes, the other day about affine hyperplanes, and wrote down the correct proof for the wrong statement.

Plop
  • 2,647