Could someone elaborate on the difference between the two, perhaps with a simple example? I am a bit confused, as I always thought they were connected...
2 Answers
I will try to explain data space vs. parameter space (or dual space) with the following simple example.
Consider the algebraic equation of a line, $ax_1+bx_2=c$, or equivalently $\omega \cdot x = c$ with $\omega = (a, b)$, where $\omega \in \Omega=\mathbb{R}^2$, $x \in \mathcal{X}=\mathbb{R}^2$, and $c \in \mathbb{R}$ is a threshold (bias) term. The bias accounts for any contribution to the measurement $y=ax_1+bx_2-c$ (not noise) other than the variable components $x_1, x_2$.
In the dual representation of $\omega \cdot x = c$, the parameter vector $\omega$ is a 2-D vector in the parameter space ($\omega$-space), and $x$ is a 2-D vector in the data space ($x$-space). The dot product $\omega \cdot x$ projects $x$ onto the direction of $\omega$, which is normal to the line.
Furthermore, the equation $\omega \cdot x = c$ represents an affine line: it is the linear line $\omega \cdot x = 0$ (which passes through the origin) translated along the direction of $\omega$ by a distance $|c| / \lVert \omega \rVert$.
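To make the duality concrete, here is a minimal numerical sketch. The values of `omega`, `c`, `xs`, `x0`, and `omegas` are hypothetical and chosen only for illustration; they do not come from any particular dataset. Fixing $\omega$ and $c$ picks out a line of points in the data space, while fixing a single data point $x_0$ picks out a line of parameter vectors in the parameter space.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
omega = np.array([1.0, 2.0])   # one point in parameter space
c = 4.0                        # bias / threshold

# Data space: several points x that satisfy omega . x = c (they lie on one line).
xs = np.array([[4.0, 0.0], [0.0, 2.0], [2.0, 1.0]])
on_line = np.isclose(xs @ omega, c)

# Parameter space: fix one data point x0 and look at several parameter
# vectors that satisfy omega . x0 = c (they lie on one line in omega-space).
x0 = np.array([2.0, 1.0])
omegas = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 4.0]])
on_dual_line = np.isclose(omegas @ x0, c)

# Distance of the line omega . x = c from the origin, measured along omega.
dist = abs(c) / np.linalg.norm(omega)

print(on_line, on_dual_line, dist)
```

The same equation is read in two ways: as a constraint on $x$ with $\omega$ held fixed, or as a constraint on $\omega$ with $x$ held fixed.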
Here is an example. It is similar to @Laskshman's but I think it is a bit clearer.
Let's say you get data with three columns: $x_1$, $x_2$, and $y$. If the values for $x_1$, $x_2$, and $y$ can take any real value, then your data space is (an instance of) $\mathbb{R}^3$. In other words, each data point is an element of $\mathbb{R}^3$.
Now, let's say you want to model $y$ as a function of $x_1$ and $x_2$. You need to pick a model so let's say you pick the following:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}x_1 x_2.$$
When you train the model, you will try to find the best values for $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_{12}$. Each of these can be any real number, so the parameter space is $\mathbb{R}^4$.
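As a rough sketch of the two spaces side by side, the snippet below builds synthetic data and fits the model by ordinary least squares with NumPy. The coefficient values in `true_beta`, the sample size, and the noise-free setup are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic inputs; each row (x1, x2, y) below is one point in data space R^3.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical "true" parameters (beta_0, beta_1, beta_2, beta_12): one point in R^4.
true_beta = np.array([1.0, 2.0, -3.0, 0.5])

# Design matrix for y = b0 + b1*x1 + b2*x2 + b12*x1*x2 (noise-free for simplicity).
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
y = X @ true_beta

data = np.column_stack([x1, x2, y])   # shape (n, 3): n points in data space

# Training = searching parameter space R^4 for the best-fitting point.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(data.shape, beta_hat)
```

The data live in $\mathbb{R}^3$ no matter which model is chosen, whereas the dimension of the parameter space (here 4) is a property of the model.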
But the data space and parameter space can be very different. For example, let's say that the model you are using is a decision tree with only two levels. In other words, it will be the equivalent of:
    If x_i1 > t_1 then
        If x_i2 > t_2 then return z_2a
        Else return z_2b
    Else
        If x_i3 > t_3 then return z_3a
        Else return z_3b
Now, when you train this model, you are learning values for:
- i1, i2, i3 --> each of these can be $1$ or $2$
- t_1, t_2, t_3 --> each of these can be anything in $\mathbb{R}$
- z_2a, z_2b, z_3a, z_3b --> each of these can be anything in $\mathbb{R}$
So, in this case, the parameter space is (an instance of) $\{1, 2\}^3 \times \mathbb{R}^7$: three discrete feature indices, plus three thresholds and four leaf values.
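The two-level tree above can be sketched as a small Python class whose fields are exactly the learned parameters. The class name `TwoLevelTree`, the field names, and the example values are hypothetical and serve only to illustrate the structure; the feature indices are kept 1-based to match the notation above.

```python
from dataclasses import dataclass, fields


@dataclass
class TwoLevelTree:
    # Discrete parameters: which feature (1 or 2) each split looks at.
    i1: int
    i2: int
    i3: int
    # Continuous parameters: split thresholds.
    t1: float
    t2: float
    t3: float
    # Continuous parameters: values returned at the four leaves.
    z2a: float
    z2b: float
    z3a: float
    z3b: float

    def predict(self, x):
        # x is a sequence (x_1, x_2); indices are 1-based, hence the "- 1".
        if x[self.i1 - 1] > self.t1:
            return self.z2a if x[self.i2 - 1] > self.t2 else self.z2b
        return self.z3a if x[self.i3 - 1] > self.t3 else self.z3b


# One hypothetical point in the parameter space {1, 2}^3 x R^7.
tree = TwoLevelTree(i1=1, i2=2, i3=2, t1=0.0, t2=1.0, t3=-1.0,
                    z2a=10.0, z2b=20.0, z3a=30.0, z3b=40.0)

n_discrete = sum(f.type is int for f in fields(tree))
n_continuous = sum(f.type is float for f in fields(tree))

print(tree.predict((0.5, 2.0)), n_discrete, n_continuous)
```

The inputs are still points $(x_1, x_2)$ in a real-valued data space, but the object being learned is a mix of three discrete choices and seven real numbers.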