I have a matrix of feature values in which each row is a 3D point and each column holds one feature. I need to use this data to train an SVM model, so I first need to normalize the feature values. The problem is that every normalisation technique I have tried gives the same value for every row in a given column. For example, with min-max normalisation all the values in a particular column come out identical, possibly because of the high variance in the data. For instance, the maximum and minimum values of one such column are:

Max value for 1st column of features: 7.7409e+11
Min value for 1st column of features: -9.3142e+11

A sample of values after scaling:
---------------
 Col1     Col2
---------------
0.5461 | 0.0293
0.5461 | 0.0293
0.5461 | 0.0293
0.5461 | 0.0293
0.5461 | 0.0293
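
The constant 0.5461 in Col1 is what min-max scaling would produce if the two extremes above are outliers and the typical values are many orders of magnitude smaller. A minimal Python sketch of that effect follows. The small values in `column` are made up for illustration; only `X_MIN` and `X_MAX` come from the question.

```python
# Hypothetical data: a few values of ordinary size (an assumption), plus
# the two extremes reported in the question.
X_MIN = -9.3142e11
X_MAX = 7.7409e11
column = [0.5, -2.0, 13.7, 150.0, X_MIN, X_MAX]


def min_max_scale(values):
    """Map values linearly onto [0, 1] using their own min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale(column)
for raw, s in zip(column, scaled):
    print(f"{raw:>12.4g} -> {s:.4f}")
```

Under this assumption every ordinary value lands at about -X_MIN / (X_MAX - X_MIN) = 9.3142 / 17.0551 ≈ 0.5461, and only the two extremes map to 0 and 1.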

Because of this huge range, I am unable to obtain any useful results. I have also tried a different normalisation technique called decimal-scaling normalisation. My implementation of the method is:

MATLAB CODE:

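% j = smallest integer such that max(abs(column)) / 10^j < 1
j = floor(log10(max(abs(data(:, i))))) + 1;
% abs() is needed because the minimum (-9.3142e+11) has the largest magnitude
normalisation(:, i-3) = data(:, i) / 10^j;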
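
For comparison, here is a rough Python sketch of decimal scaling as it is usually defined: divide by 10^j, where j is the smallest integer that brings the largest absolute value below 1. The function name `decimal_scale` and the third sample value are illustrative assumptions, not part of the question.

```python
import math


def decimal_scale(values):
    """Divide by 10**j, with j the smallest integer such that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values), 0  # all zeros: nothing to scale
    j = math.floor(math.log10(max_abs)) + 1
    return [v / 10**j for v in values], j


# The two extremes from the question plus one made-up ordinary value.
scaled, j = decimal_scale([7.7409e11, -9.3142e11, 150.0])
print(j, scaled)
```

Note that decimal scaling is also a linear rescaling, so if the extremes are outliers, the ordinary values still end up squeezed together near zero (here 150.0 becomes 1.5e-10).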

But I'm not sure if this is the right approach. Please help me find the normalisation technique that best suits such data.

  • How are you scaling with such large numbers? If $x_{min}$ and $x_{max}$ are abnormally large compared with the typical values, then you have a problem: since the actual value $x$ is negligible, your calculation will just give $\frac{-x_{min}}{x_{max} - x_{min}}$ for every point. So have a look at your data and its distribution to see if these values are real. Also, with values this large you have floating-point issues. – Chinny84 Mar 14 '17 at 18:09
  • Are there many such outliers? What if you removed, say, the most outlying 1% of the data, and then tried? – user3658307 Mar 22 '17 at 23:00
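
The trimming idea in the last comment could be sketched roughly as follows. This variant clips the extremes to the trimmed range instead of removing the rows, so every 3D point keeps a feature value. The function name `clipped_min_max`, the per-tail `trim` fraction, and the sample data are all assumptions made for illustration.

```python
def clipped_min_max(values, trim=0.01):
    """Clip to the [trim, 1 - trim] empirical quantiles, then min-max scale."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * trim)  # number of points treated as outliers in each tail
    lo, hi = ordered[k], ordered[n - 1 - k]
    if hi == lo:
        raise ValueError("column is constant after clipping")
    # Values outside [lo, hi] are pulled to the nearest bound before scaling.
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in values]


# Hypothetical column: 198 ordinary values plus the two extremes from the question.
column = [float(i) for i in range(198)] + [-9.3142e11, 7.7409e11]
scaled = clipped_min_max(column, trim=0.01)
print(scaled[:3], scaled[-2:])
```

With this made-up data the ordinary values spread across [0, 1] instead of collapsing to a single number, and the two extremes map to 0 and 1. Whether clipping is appropriate depends on whether those extreme values are real measurements or errors.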
