0

Let's say I have a function of $N$ input variables, $f(x_1,x_2,...,x_N)$. I want to numerically estimate the gradient of the function at a particular point in space, $\vec{x}_0$.

Lets say my function takes a very long time to evaluate. At minimum, it seems like I would need to perform $N$ function evaluations in addition to the initial point $f(\vec{x}_0)$. Because the gradient is a $N$-dimensional vector, we need $N+1$ points to constrain our estimate. I'm fairly certain that this intuition is correct.

Question #1: Is my intuition correct?

Question #2: How would you explain why this is the case to an audience with limited mathematical background (e.g. biologists)?

1 Answers1

1

It depends on how you want to estimate the gradient. Take, for example, the forward difference quotient; in one dimension, it's just $\frac{d}{dx}f(x)\approx\frac{f(x+h)-f(x)}{h}$ for some small $h$.

For the gradient, you can now use this technique to estimate the partial derivatives in each direction, i.e. for dimension $k$: $$\frac{\partial}{\partial x_k}f(x_1,\ldots,x_N)\approx\frac{f(x_1,\ldots,x_k+h,\ldots,x_N)-f(x_1,\ldots,x_N)}{h}.$$ If you do this $N$ times, once for each dimension, you can count the function evaluations- you need $f(x_1,\ldots,x_N)$ once in each fraction, and then for each $k=1,\ldots,N$ you need $f(x_1,\ldots,x_k+h,\ldots,x_N)$. This gives exactly the $N+1$ points you mentioned.

However, if you want to estimate with a central difference quotient, $\frac{d}{dx}f(x)\approx\frac{f(x+h)-f(x-h)}{2h}$, you can't reuse the point on the right in the fraction above; this method would give you $2N$ points at which you need to evaluate $f$.

I should mention here that the problem of estimating derivatives is ill-posed, see this for more info.

  • Thanks for the response. I figured what you said is the case. Just to be clear -- you are saying that the minimal amount of evaluations you could get away with is $N$, and you could get an improved estimate by using the central difference quotient and $2N$ evaluations. But there isn't a principled approach to use less than $N$ evaluations. – digbyterrell Jan 23 '14 at 12:57
  • 1
    The gradient is defined as the vector of the $N$ partial derivatives, so you have to estimate all $N$ values to get the complete gradient. As far as I know, there does not exist an estimate that only requires one point per partial derivative, so combining forward (or backward) differences yields the least number of evaluations, $N+1$. However, if you have additional information on the structure of the gradient you might get away with less, but not in general. – rngantner Jan 24 '14 at 07:16