
I want to know the exact procedure involved in KNN classification. I understand the bigger picture, but I am missing the details needed to implement it.

I have 3 pieces of data: Train, Validate and Test.

1) Suppose we have training points $x_1, x_2,\dots,x_N$, each in $\Bbb R^D$, where $D$ is the number of features and $N$ is the number of training points, and the labels are $y_1,\dots,y_N$, each in $\Bbb R$.

What does training involve? Do I need to pick nearest neighbors for each point $j\in\{1,\dots,N\}$ and relabel the points based on majority vote?

2) What does validation involve?

Given a validation point $x\in\Bbb R^D$ with label $y$, what should I do?

3) Is testing the same as validation?

Turbo
  • 6,221
  • The training phase here just requires remembering the distance matrix for those points and the labels of the points, so the distance matrix will be of size N x N. Validation/prediction here means that you take some point $x$ and make a prediction for it based on the distance matrix learned above. To be more precise, you take a point $x$, take its k neighbours (you know them from the distance matrix) and make a prediction for this point based on these neighbours' labels: the mean label in the case of a regression task, the mode of the labels in the case of classification. – Joitandr Jul 23 '20 at 06:17
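The prediction step described in the comment can be sketched roughly as follows. This is only a minimal illustration, not a reference implementation: the function name `knn_predict`, the `task` parameter, the use of Euclidean distance and the toy data are all assumptions made for the example. It also assumes the distances from the query point to the stored training points are computed at prediction time.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, task="classification"):
    """Predict a label for one query point x from its k nearest training points.

    Hypothetical helper: Euclidean distance is an assumption, not a requirement.
    """
    X_train = np.asarray(X_train, dtype=float)
    # Distance from the query point to every stored training point.
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    # Indices of the k smallest distances (stable sort keeps ties in input order).
    nearest = np.argsort(dists, kind="stable")[:k]
    neighbour_labels = [y_train[i] for i in nearest]
    if task == "regression":
        # Regression: mean of the neighbours' labels.
        return float(np.mean(neighbour_labels))
    # Classification: mode (majority vote) of the neighbours' labels.
    return Counter(neighbour_labels).most_common(1)[0][0]

# Toy data, made up for illustration: two well-separated groups.
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = [0, 0, 1, 1]

label_near_origin = knn_predict(X_train, y_train, [0.2, 0.1], k=3)
label_near_far_group = knn_predict(X_train, y_train, [5.5, 5.0], k=3)
```

Note that nothing is fitted here: "training" amounts to keeping `X_train` and `y_train` around, and all the work happens when a query point arrives.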

1 Answer


I think the points to be classified are the points of $\Bbb R^D$ that are not among $x_1, x_2,\dots,x_N$; you should not relabel $x_1, x_2,\dots,x_N$ themselves.
For validation you need points from $\Bbb R^D$ with known labels (labels of the same kind as those of $x_1, x_2,\dots,x_N$).
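To make the roles of the three sets concrete, here is a small sketch under some assumptions: the validation set is used to pick $k$ (a common choice, not something stated above), the test set is used once at the end to estimate accuracy, and the distance is Euclidean. The helper names `knn_classify`, `accuracy` and `choose_k`, and the toy data, are hypothetical and only serve the example.

```python
import numpy as np

def knn_classify(X_train, y_train, X_query, k):
    """Classify each query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in np.asarray(X_query, dtype=float):
        dists = np.linalg.norm(X_train - x, axis=1)       # Euclidean distance (assumed)
        nearest = np.argsort(dists, kind="stable")[:k]    # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])           # most frequent label wins
    return np.array(preds)

def accuracy(y_true, y_pred):
    """Fraction of points whose predicted label matches the known label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5)):
    """Score each candidate k on the validation set; return the best k and all scores."""
    scores = {k: accuracy(y_val, knn_classify(X_train, y_train, X_val, k))
              for k in candidates}
    return max(scores, key=scores.get), scores

# Toy data, made up for illustration. The training labels are never changed.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
X_val, y_val = [[0.5, 0.5], [5.5, 5.5]], [0, 1]
X_test, y_test = [[1, 1], [6, 6]], [0, 1]

best_k, val_scores = choose_k(X_train, y_train, X_val, y_val)
test_acc = accuracy(y_test, knn_classify(X_train, y_train, X_test, best_k))
```

Mechanically, validation and testing run the same prediction step on labelled points that are not in the training set. In this sketch they differ only in what the resulting score is used for: choosing $k$ versus reporting a final estimate.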

VK69UK
  • 1