
During my first statistics course I learned that a statistical model is a collection of probability measures $\mathcal{P}$, where we can index each measure by a 'parameter' $\theta$ such that $\mathcal{P} = \{P_\theta\,\,|\,\,\theta\in\Theta\}$.

My first question is: What exactly is $\Theta$?

I am now working on a project on nonparametric statistics where $\Theta$ is always an (infinite-dimensional) vector space. However, when we look at the parametric normal family $\{N(\mu,\sigma^2)\,\,|\,\,(\mu,\sigma)\in\Theta\}$ with $\Theta = \mathbb{R}\times(0,\infty)$, then clearly $\Theta$ is not a vector space.

A possible answer I thought of is that $\Theta$ is in general a metric space (or perhaps just a set is enough?). But then how do we mark the transition between a parametric and a nonparametric model? Separating them only by whether $\Theta$ is an infinite-dimensional vector space produces strange cases. For example, if we take an infinite-dimensional vector space but regard it merely as a metric space, are we suddenly dealing with a parametric model? That seems odd...

My second question: What exactly separates parametric and nonparametric models when we look at $\Theta$?

Thank you!

Marc
  • It might be helpful to state your definitions (or understanding, if these are unfamiliar) of the terms "vector space", "finite dimensional", and "infinite dimensional". – Eric Towers Mar 02 '14 at 22:40


$\Theta = \mathbb{R} \times (0,\infty)$ is not infinite dimensional. It is finite dimensional, with dimension two, since every $\theta \in \Theta$ is a pair of numbers: a mean and a standard deviation.
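To make the dimension count concrete, here is a minimal Python sketch (the function name `normal_pdf` is hypothetical, chosen only for illustration): a single point $\theta = (\mu, \sigma)$ of $\Theta$, i.e. just two real numbers, is enough to evaluate the density of $N(\mu, \sigma^2)$ at any $x$.

```python
import math

def normal_pdf(x: float, theta: tuple[float, float]) -> float:
    """Density of N(mu, sigma^2) at x, where theta = (mu, sigma) with sigma > 0."""
    mu, sigma = theta
    if sigma <= 0:
        # (mu, sigma) must lie in Theta = R x (0, inf)
        raise ValueError("sigma must lie in (0, inf)")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

theta = (1.0, 2.0)  # one point of Theta: two real coordinates
print(normal_pdf(1.0, theta))  # peak height 1 / (sigma * sqrt(2 pi))
```

Every member of the family is reached by choosing these two coordinates, which is what makes the model finite dimensional.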

It is not a mathematical vector space, since in general there is no meaningful addition operation on the "vectors" in a parameter space. It is a "space of vectors" only in the sense that it is made up of ordered tuples of numbers.
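One way to see the failure concretely, assuming we borrow the coordinatewise operations of $\mathbb{R}^2$: $\Theta$ is closed under addition, but it is not closed under multiplication by negative scalars and it does not contain the zero vector, so it is not a subspace of $\mathbb{R}^2$. A minimal Python sketch (the helper names `in_theta`, `add` and `scale` are hypothetical):

```python
def in_theta(theta):
    """Membership test for Theta = R x (0, inf), with theta = (mu, sigma)."""
    mu, sigma = theta
    return sigma > 0

def add(a, b):
    # coordinatewise addition borrowed from R^2
    return (a[0] + b[0], a[1] + b[1])

def scale(c, a):
    # coordinatewise scalar multiplication borrowed from R^2
    return (c * a[0], c * a[1])

theta = (0.0, 1.0)  # the standard normal N(0, 1)
print(in_theta(add(theta, theta)))    # True: positive sigmas add to a positive sigma
print(in_theta(scale(-1.0, theta)))   # False: sigma = -1 is not a standard deviation
print(in_theta((0.0, 0.0)))           # False: the zero vector is missing
```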

The technical terms parametric and nonparametric refer to the dimensionality of the parts of $\Theta$. For instance, the parts of the $\Theta$ you describe are each one-dimensional, as they are subsets of the real numbers. Since each parameter is finite dimensional and there are only finitely many of them, the model is parametric.

If a model has an infinite-dimensional parameter or infinitely many parameters, then it is called nonparametric. An example of such a parameter space is the set of all real-valued functions on the interval $[0,1] \subseteq \mathbb{R}$. Since the value of such a function at each point is independent of its values at every other point, the function can only be specified by listing all (uncountably many) of its values. The space of such functions is vastly larger than the real numbers. The more common examples in statistics assume that the population has a distribution about which nothing is known and attempt to construct a "best-fit" distribution over the space of all possible distributions. (The process just sketched has many technical difficulties to overcome in an implementation.)
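As a rough illustration of estimating a distribution without assuming any parametric form, one standard nonparametric estimator of a distribution function is the empirical CDF, $F_n(x) = \#\{i : X_i \le x\}/n$. Unlike $(\mu, \sigma)$ above, the set of values that determine it is not fixed in advance: it grows with the sample size. This is a minimal Python sketch; the name `ecdf` is our own and it is not the method described in the answer.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF F_n(x) = (number of observations <= x) / n."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted observations are <= x
        return bisect_right(data, x) / n

    return F

F = ecdf([2.5, 0.3, 1.7, 0.3])
print(F(0.3))  # 0.5
print(F(2.0))  # 0.75
```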

Eric Towers
  • Superb, thank you. One more question though. You refer to the dimension of the parts of $\Theta$; however, the part of $\sigma^2$ in my example is $(0,\infty)$, which is not a vector space? – Marc Feb 28 '14 at 15:01
  • It is a subset of the reals (it is the positive reals). It is not a "vector space" because it is not an abelian group under addition: it lacks the additive identity $0$ and additive inverses such as $-1$ and $-2$. It is a coordinate space, since the only requirement for such a space is that it contains the values you need for whatever you're doing. Since the variance in this model is positive, you only need the positive real numbers. – Eric Towers Mar 02 '14 at 16:37