I am studying the following AdaBoost algorithm:
$\text{for } t=1,2,\dots,T:$
$h_t = \arg\min_{h\in H} \sum_{i=1}^m D_i^t\cdot 1_{[h(x_i)\neq y_i]}$
$\epsilon_t = \sum_{i=1}^m D_i^t\cdot 1_{[h_t(x_i)\neq y_i]}$
$w_t = \frac{1}{2}\ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$
$\forall i \in [m]: \hat D^{t+1}_i=D_i^t\cdot \exp(-y_i\cdot w_t\cdot h_t(x_i))$
$D_i^{t+1}=\frac{\hat D_i^{t+1}}{\sum_{j=1}^m \hat D_j^{t+1}}$
and afterwards our prediction will be:
$h_s(x)=\operatorname{sign}\left(\sum_{t=1}^T w_t\cdot h_t(x)\right)$
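To make the steps concrete, here is a minimal Python sketch of the algorithm as I read it. It assumes NumPy, labels in $\{-1,+1\}$, and that $H$ is the class of one-feature threshold stumps. The exhaustive stump search, the function names, and the clipping of $\epsilon_t$ away from 0 and 1 are my own assumptions for illustration and are not part of the formulas above.

```python
import numpy as np

def stump_predict(stump, X):
    """Predict +/-1 with a threshold stump (feature index, threshold, sign)."""
    j, thr, s = stump
    return np.where(X[:, j] <= thr, s, -s)

def best_stump(X, y, D):
    """Exhaustively search for the stump with the lowest D-weighted error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                stump = (j, thr, s)
                err = D[stump_predict(stump, X) != y].sum()
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(X, y, T):
    """Run T rounds; return the chosen stumps h_t and their weights w_t."""
    m = len(y)
    D = np.full(m, 1.0 / m)                   # D^1 is uniform
    stumps, weights = [], []
    for _ in range(T):
        stump, eps = best_stump(X, y, D)      # h_t and epsilon_t
        eps = np.clip(eps, 1e-12, 1 - 1e-12)  # assumption: avoid log(0) and division by 0
        w = 0.5 * np.log((1 - eps) / eps)     # w_t
        D = D * np.exp(-y * w * stump_predict(stump, X))  # unnormalized D^{t+1}
        D = D / D.sum()                       # normalized D^{t+1}
        stumps.append(stump)
        weights.append(w)
    return stumps, weights

def predict(stumps, weights, X):
    """h_s(x): sign of the weighted vote (np.sign returns 0 on an exact tie)."""
    score = sum(w * stump_predict(s, X) for s, w in zip(stumps, weights))
    return np.sign(score)
```

Note that in this sketch both $\epsilon_t$ and $w_t$ are computed on the training set under the current distribution $D^t$, exactly as in the formulas.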
As I understand it, the role of $w_t$ is to measure how good $h_t$ is.
If the weighted error $\epsilon_t$ is low, $w_t$ will be high, and $h_s$ will give more weight to that particular $h_t$.
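As a quick numerical check of that reading, the snippet below evaluates $w_t = \frac{1}{2}\ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$ for a few error values. The function name `hypothesis_weight` and the chosen error values are hypothetical, picked only for illustration.

```python
import math

def hypothesis_weight(eps):
    """w = 0.5 * ln((1 - eps) / eps), defined for 0 < eps < 1."""
    return 0.5 * math.log((1 - eps) / eps)

for eps in (0.1, 0.3, 0.5, 0.7):
    print(eps, round(hypothesis_weight(eps), 3))
# prints:
# 0.1 1.099
# 0.3 0.424
# 0.5 0.0
# 0.7 -0.424
```

So a hypothesis with error below 0.5 gets a positive weight that grows as the error shrinks, an error of exactly 0.5 gives weight 0, and an error above 0.5 gives a negative weight.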
I have two problems with this method. The first concerns the use of $D^t$ when calculating $w_t$. The distribution is useful for calculating $D^{t+1}$ later, but that is not what I am asking about. When I assess how good a hypothesis is, I want $\epsilon_t$ to estimate the real loss of that hypothesis.
In my opinion, the loss of the hypothesis should be calculated under the uniform distribution, which represents the real world as closely as possible.
The second problem, which is related to the first, is: why do we calculate $\epsilon_t$ on the training set?
Why don't we use a test set to estimate the loss as well as we can? Of course, the loss on the training set does not give us as much information as the loss on a new set of data would.
I know we need $\epsilon_t$ and $w_t$ for calculating $\hat D^{t+1}_i$, but can we not calculate a second set of $\epsilon$ and $w$ values to get a better hypothesis $h_s$?
Edit: I want to clarify the method I am suggesting:
We take the data and separate it into three parts: train, test1 and test2. To keep the notation simple, the training set has size $m$ and test1 has size $n$; the sum defining $\epsilon'_t$ below runs over the $n$ test1 examples.
$\text{for } t=1,2,\dots,T:$
$h_t = \arg\min_{h\in H} \sum_{i=1}^m D_i^t\cdot 1_{[h(x_i)\neq y_i]}$
$\epsilon_t = \sum_{i=1}^m D_i^t\cdot 1_{[h_t(x_i)\neq y_i]}$
$w_t = \frac{1}{2}\ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$
$\epsilon'_t = \sum_{i=1}^n \frac{1}{n}\cdot 1_{[h_t(x_i)\neq y_i]}$
$w'_t = \frac{1}{2}\ln\left(\frac{1-\epsilon'_t}{\epsilon'_t}\right)$
$\forall i \in [m]: \hat D^{t+1}_i=D_i^t\cdot \exp(-y_i\cdot w_t\cdot h_t(x_i))$
$D_i^{t+1}=\frac{\hat D_i^{t+1}}{\sum_{j=1}^m \hat D_j^{t+1}}$
Afterwards, our prediction will be:
$h_s(x)=\operatorname{sign}\left(\sum_{t=1}^T w'_t\cdot h_t(x)\right)$
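In case the formulas are unclear, here is a rough Python sketch of the suggested variant. It assumes NumPy, labels in $\{-1,+1\}$, and threshold stumps as the hypothesis class $H$. The function names, the stump search, and the clipping of the errors away from 0 and 1 are assumptions I made for illustration only. The training distribution is still updated with $w_t$, while $w'_t$ comes from the unweighted error on test1 and is used only in the final vote.

```python
import numpy as np

def stump_predict(stump, X):
    """Predict +/-1 with a threshold stump (feature index, threshold, sign)."""
    j, thr, s = stump
    return np.where(X[:, j] <= thr, s, -s)

def best_stump(X, y, D):
    """Exhaustively search for the stump with the lowest D-weighted error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                stump = (j, thr, s)
                err = D[stump_predict(stump, X) != y].sum()
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def half_log_odds(eps, tiny=1e-12):
    """0.5 * ln((1 - eps) / eps), with eps clipped (assumption) to avoid log(0)."""
    eps = np.clip(eps, tiny, 1 - tiny)
    return 0.5 * np.log((1 - eps) / eps)

def adaboost_heldout_weights(X_train, y_train, X_test1, y_test1, T):
    """Return stumps, training weights w_t, and held-out weights w'_t."""
    m = len(y_train)
    D = np.full(m, 1.0 / m)                               # D^1 is uniform
    stumps, w_train, w_held = [], [], []
    for _ in range(T):
        stump, eps = best_stump(X_train, y_train, D)      # h_t, epsilon_t
        w = half_log_odds(eps)                            # w_t (drives D update)
        eps_prime = np.mean(stump_predict(stump, X_test1) != y_test1)  # epsilon'_t
        w_prime = half_log_odds(eps_prime)                # w'_t (final vote only)
        D = D * np.exp(-y_train * w * stump_predict(stump, X_train))
        D = D / D.sum()                                   # D^{t+1}
        stumps.append(stump)
        w_train.append(w)
        w_held.append(w_prime)
    return stumps, w_train, w_held

def predict(stumps, weights, X):
    """Sign of the weighted vote; pass w_held to get the proposed h_s."""
    score = sum(w * stump_predict(s, X) for s, w in zip(stumps, weights))
    return np.sign(score)
```

Calling `predict(stumps, w_held, X)` gives the $h_s$ I am proposing, and `predict(stumps, w_train, X)` gives the standard one, so the two can be compared on test2.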
Thanks a lot