4

In a lot of optimization algorithms I have seen that they use a scaling factor to scale the objective function. I dont understand what is the reason behind this. Why do we need scaling of objective function in optimization algorithms? Does it work without scaling. Logically it should work but I am a litte bit confused now.

I hope you can answer me and thank you indeed

Student
  • 777
  • You are correct in that solving $\min_{x \in C} f(x)$ is the same (in some sense) as solving $\min_{x \in C} \sigma f(x)$, with $\sigma>0$ a fixed constant. To further answer the question, it would help if you gave an example of an algorithm that does this scaling. – copper.hat Apr 08 '13 at 17:50
  • 2
    In principle, with exact computations, it makes no difference. In practice, with limited precision, scaling can be of great practical importance. – André Nicolas Apr 08 '13 at 17:51
  • 1
    Like @AndréNicolas said. Limited precision. If you've done any programming or read about floating point numbers you'll realize when numbers are very small or large errors can occur and create problems. – DiegoNolan Apr 08 '13 at 17:57
  • I usually think of scaling in terms of the domain rather than the objective value itself (as in, say, Newton's method). – copper.hat Apr 08 '13 at 19:38
  • @copper.hat. GlowWorm Swarm Optimization is the algorithm that I developed and that uses scaling to update its actual value in current iteration by adding an amount of value to previous iteration`s value and also a value from the objectiv function J. Like this :

    l(t+1) = (1 - rho)l(t) + ghamma*J(x(t)); ghamma is scaling factor(a constant)

    – Student Apr 08 '13 at 20:32
  • @DiegoNolan. Yes I know about that. Are you pointing to floating point errors caused because of rounding or because of limited max and min of numbers in machines...but I dont understand what`s the point of scaling here? – Student Apr 08 '13 at 20:36
  • @Drilon: I don't really follow. As in I don't know what is being optimized, what is the domain, etc. – copper.hat Apr 08 '13 at 21:10
  • @copper.hat OK no problem. It is difficult to find something on internet...and about the domain - it can be the set of real numbers R. It is not certain which function are we going to use. It can be any function. Multimodal functions(functions with multiple optima) are being optimized such that at the end we will find multiple optima (global optima and all local optima). The maximum of a function is a global optima and other values near to that maximum are local optima. Here is a link youtube.com/watch?v=_vhSu4xBoFs – Student Apr 09 '13 at 07:16
  • @Drilon: Thanks. I had a glance at the video, but it makes for a nice movie, but gives no clue as to how it functions. It looks like a multi-start method with some local optimization. – copper.hat Apr 09 '13 at 07:28
  • @copper.hat. Yes that`s whhat it does. Instead of finding just one global solution(global optima (maxima)) it finds also local optimas. It is like all the other optimization algorithms as Partical Swarm optimization or Ant Colony Optimization but this algorithm is for multiple optima instead of just global optima. Fot the algorithm in details it is difficult to explain it here. But if you can then try to look in their paper http://link.springer.com/content/pdf/10.1007/s11721-008-0021-5 – Student Apr 09 '13 at 08:17
  • @copper.hat. One another thing. Before some days I asked the author itself for this scaling issue and he just answered me: You can neglet scaling factor. He explained me some of my questions in details but not the scaling factor. He just said it is a scaling factor and you can neglet it! He made me suspect that maybe he just used it because he knows that it is VERY GOOD using scaling but why?? I am not sure yet. – Student Apr 09 '13 at 08:46
  • @copper.hat: Any idea ? – Student Apr 09 '13 at 21:18
  • @Drilon: I don't have a Springer account. – copper.hat Apr 09 '13 at 21:55

2 Answers2

3

When you study a optimization problem theoretically, scaling does not matter. However, sometimes one adds a scaling factor, such that the derivative is a little bit simpler, e.g., instead of $x^2$ one likes to minimize $\frac12 x^2$.

However, scaling is of importance for the numerically solution. If your optimization method is not scaling invariant, then you get a different sequence of iterates (not only due to rounding errors). Newton's method, e.g., is affine invariant, i.e., if you do an affine transformation of your optimization problem, you get (up to the transformation) the same sequence of iterates.

gerw
  • 31,359
  • Thank You indeed for your answers. Well most probably it is because of my limited knowledge on this field but I couldnt get it. Anyway I appreciate your explanations. Maybe what you said should be enough for me to understand it but actually I couldnt. Anyway thnx a lot for your answers. If not in what I was interested exactly, still they helped me in general knowledge. – Student Apr 09 '13 at 07:20
  • I interpret the affine invariance of Newton's method as an indication of how good the scaling is. – copper.hat Apr 09 '13 at 07:26
0

Beyond the numerical efficiency perspective there is also an impact on the model.

If your feature variables are scaled then your objective function needs to be scaled, otherwise you would need disproportionately large coefficients to get to the same target value.

Scaling on power law distributed objective functions also give skewed results. These objective functions require sublinear transformations.

From Skiena's Data Science Design Manual p276:

"Small-scale variables need small-scale targets, in order to be realized using small-scale coefficients. Trying to predict GDP from Z-scored variables will require enormously large coefficients. How else could you get to trillions of dollars from a linear combination of variables ranging from -3 to +3?"