1

In this question, the top answer references a definition of PAC learning from the book

Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz, Shai Ben-David"

The definition in the book says:

A hypothesis class $\mathcal{H}$ is PAC learnable if there exist a function $m_{\mathcal{H}} : (0, 1)^2 \rightarrow \mathbb{N}$ and a learning algorithm $\mathcal{A}$ with the following property: For every $\epsilon, \delta \in (0, 1)$ and for every distribution $\mathcal{D}$ over $\mathcal{X}$, every labelling function $f: \mathcal{X} \rightarrow \{0, 1\}$, when running the learning algorithm on $m \geq m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$, the algorithm returns a hypothesis $h$ such that, with probability of at least $1 − \delta$, $$L_{D, f}(h) < \epsilon$$

The answer to the question translates this to a mathematical statement that contains an if and only if statement in its definition, as opposed to an implication in a single direction (which was what I had assumed the definition was).

It isn't clear to me exactly that the above definition translates to

if ... then $\left(m \geq m_{\mathcal{H}}(\epsilon, \delta) \iff \mathbb{P}[L_{\mathcal{D}, f}(h) < \epsilon] \geq 1 - \delta \right)$

as opposed to just

if ... then $\left(m \geq m_{\mathcal{H}}(\epsilon, \delta) \implies \mathbb{P}[L_{\mathcal{D}, f}(h) < \epsilon] \geq 1 - \delta \right)$

Why is this? The answer in the original question points to a 'page 23' in the book for an explanation of the iff, but that specific page makes no mention about PAC learning whatsoever.

1 Answers1

0

Observe that if, for every $\varepsilon$ and $\delta$ there exists such $m_{\mathcal{H}}(\varepsilon, \delta)$, you can always pick the minimum of such $m_{\mathcal{H}}(\varepsilon, \delta)'s$ (valid for any distribution). If you do that, the reverse implication is true and trivial.

As a matter of fact, we normally understand by sample complexity the minimum of such $m_{\mathcal{H}}(\varepsilon, \delta)'s$. This is mentioned in the book.

Conclusion: we can safely assume the double implication.

GReyes
  • 16,446
  • 11
  • 16
  • I understand now that the 'greater than or equal' to part suggests that our hypothesis is probably-approximately correct for m greater than or equal to the minimum sample complexity function. How do we know though that a probably approximately correct hypothesis certainly can't hold for a sample size any less than that? – fomomorphism Aug 26 '23 at 04:55
  • We just define the complexity to be the minimum $m$ such that the requirements for PAC learnability hold. If we "knew" that they hold for a lesser $m$, the given one would not be the sample complexity in the first place. The minimum number satisfying any property is always well defined (unique) as long as the set of all numbers satisfying it is not empty. – GReyes Aug 26 '23 at 17:11