
I am new to machine learning and I came across the term "hypothesis space". I am trying to grasp what it is, and I am especially interested in the dimension of this "space." For example, in the context of linear regression, when fitting a linear polynomial to the data, would the dimension of the hypothesis space be $2$? What about in the context of logistic regression?

funmath
  • How was the term used? – Michael Hardy Apr 29 '20 at 04:53
  • One often speaks of a "parameter space". In the simplest logistic regression problems, one has $$ \operatorname{logit} \Pr(Y_i=1) = \alpha + \beta x_i $$ where $$\operatorname{logit} p = \log \frac p {1-p}$$ and $\Pr(Y_i\in\{0,1\}) = 1.$ Then the parameter space is the set of all possible values of the two parameters $\alpha,\beta.$ And one considers hypotheses concerning the values of these two parameters. – Michael Hardy Apr 29 '20 at 04:56
  • @MichaelHardy I think hypothesis space has more to do with function space as opposed to parameter space. I am unsure, though, whether both end up having the same dimension. – funmath Apr 29 '20 at 16:24
  • As I said: How was the term used? – Michael Hardy Apr 29 '20 at 16:30
  • @MichaelHardy A hypothesis space refers to the set of possible approximations that an algorithm can create for f. The hypothesis space consists of the set of functions the model is limited to learn. For instance, linear regression can be limited to linear functions as its hypothesis space. – funmath Apr 29 '20 at 16:40
  • OK, then that seems like about $98\%$ of the answer to your question. But let's be clear on a couple of things. Linear regression is not the same as fitting a "linear polynomial." A popular naive error is to think that the reason it's called linear regression is that a straight line is being fitted. But fitting a quadratic polynomial by ordinary least squares is another instance of linear regression. Logistic regression, on the other hand, is an instance of nonlinear regression. – Michael Hardy Apr 29 '20 at 17:06

1 Answer


In the simplest instances of logistic regression one has independent random variables $Y_1,\ldots,Y_n$ for which $$ \begin{cases} \operatorname{logit} \Pr(Y_i=1) = \phantom{+(}\alpha + \beta x_i \\[8pt] \operatorname{logit} \Pr(Y_i=0) = -(\alpha+\beta x_i) \end{cases} $$ where $$ \operatorname{logit} p = \log \frac p {1-p}, $$ and

  • $\{(x_i, Y_i) : i=1,\ldots,n\}$ are observed;
  • $\alpha,\beta$ are not observed and are to be estimated based on the above observed data;
  • As mentioned, the $Y_i$ are random variables. On the other hand, the $x_i$ are treated as constant, i.e. non-random, despite the fact that they may change if a new sample of $n$ observations is taken. The justification is that one is really interested in the conditional distribution of $Y$ given $x.$
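
As a side check on the two-line display above: the second line is not an extra assumption. Since $\Pr(Y_i=0) = 1 - \Pr(Y_i=1),$ it follows from the first line and the definition of the logit: $$ \operatorname{logit}(1-p) = \log\frac{1-p}{p} = -\log\frac{p}{1-p} = -\operatorname{logit} p, $$ so with $p = \Pr(Y_i=1)$ one gets $\operatorname{logit}\Pr(Y_i=0) = -(\alpha+\beta x_i).$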

Least squares is not the method used for estimating $\alpha$ and $\beta;$ maximum likelihood is, and the MLE is found by iteratively re-weighted least squares.
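
To make the phrase "iteratively re-weighted least squares" concrete, here is a minimal sketch in plain Python for the two-parameter model above. The function name `fit_logistic_irls`, the starting point $(0,0)$, the stopping rule, and the small data set are all illustrative assumptions, not part of the question. Each pass solves a weighted least-squares problem for a "working response" $z$, with weights $w = p(1-p)$ recomputed from the current estimates.

```python
import math

def sigmoid(t):
    """Inverse logit: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic_irls(xs, ys, max_iter=50, tol=1e-12):
    """Estimate (alpha, beta) in logit Pr(Y=1) = alpha + beta*x by IRLS.

    Assumes the data are not perfectly separable, so that the MLE exists.
    """
    alpha, beta = 0.0, 0.0  # illustrative starting values
    for _ in range(max_iter):
        # Accumulate the 2x2 weighted normal equations.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = alpha + beta * x      # current linear predictor
            p = sigmoid(eta)            # current fitted probability
            w = p * (1.0 - p)           # weight for this observation
            z = eta + (y - p) / w       # working response
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Solve [[s_w, s_wx], [s_wx, s_wxx]] @ [alpha, beta] = [s_wz, s_wxz].
        det = s_w * s_wxx - s_wx * s_wx
        new_alpha = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_beta = (s_w * s_wxz - s_wx * s_wz) / det
        change = abs(new_alpha - alpha) + abs(new_beta - beta)
        alpha, beta = new_alpha, new_beta
        if change < tol:
            break
    return alpha, beta

# Made-up, non-separable data purely for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
alpha_hat, beta_hat = fit_logistic_irls(xs, ys)
print(alpha_hat, beta_hat)
```

At convergence the likelihood equations $\sum_i (y_i - p_i) = 0$ and $\sum_i (y_i - p_i)x_i = 0$ hold, which is a convenient way to check that the iteration has reached the MLE rather than a least-squares fit to the raw $Y_i.$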

The function of most interest may be $$ p = \operatorname{logit}^{-1} (\alpha + \beta x) = \frac 1 {1 + e^{-(\alpha+\beta x)}}. $$ Every such function is completely determined by the values of $\alpha$ and $\beta.$ And in this case $\alpha$ and $\beta$ can be any real numbers at all.

Therefore the hypothesis space, if that is defined as the set of functions the model is limited to learn, is a $2$-dimensional manifold homeomorphic to the plane.

When the mapping from the parameter space to the hypothesis space is one-to-one and continuous, then the dimension of the hypothesis space is the same as the dimension of the parameter space. And "continuous" may be best defined in this context in such a way that it's always continuous, i.e. the mapping itself determines the topology on the hypothesis space.
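
For the logistic model above, the one-to-one property can be checked directly: applying the logit to the function's values at two distinct points gives two linear equations in $\alpha$ and $\beta.$ The following is a small sketch in plain Python; the helper names `hypothesis` and `recover_parameters` and the evaluation points $0$ and $1$ are illustrative choices, not standard terminology.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def hypothesis(alpha, beta):
    """Return the function x -> logit^{-1}(alpha + beta*x) for given parameters."""
    return lambda x: 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def recover_parameters(h, x0=0.0, x1=1.0):
    """Recover (alpha, beta) from the values of h at two distinct points.

    logit(h(x)) = alpha + beta*x, so two evaluations pin down the line.
    """
    l0, l1 = logit(h(x0)), logit(h(x1))
    beta = (l1 - l0) / (x1 - x0)
    alpha = l0 - beta * x0
    return alpha, beta

h = hypothesis(-1.5, 0.7)           # one point of the parameter space
alpha_rec, beta_rec = recover_parameters(h)
print(alpha_rec, beta_rec)          # agrees with (-1.5, 0.7) up to rounding
```

Because the parameters can be read back from the function, two different pairs $(\alpha,\beta)$ can never give the same function, so the hypothesis space here has the same dimension, $2,$ as the parameter space.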