3

A function $f:[a,b]\to \mathbb{R}$ is given (we can assume it is continuous or differentiable) and we want to interpolate it with Gaussian functions $$\phi_i(x) = e^{-\alpha(x-x_i)^2}\text{,}$$ where $x_i$'s, $x_i\in [a, b]$, and $\alpha > 0$ are given. We are looking for coefficients $w_i$ in the expression $$F_f(x) = \sum_{i = 1}^n w_i \phi_i(x)$$ which would minimize norm $||f - F_f||$. The questions are

  1. Are there any (common) norms (maybe $||\cdot||_\infty$ or $||\cdot||_2^2$) for which there is an algorithm for computing $w_i$'s?
  2. How to choose parameters $x_i$, $1\leq i\leq n$, and $\alpha$ in an optimal way?

In the section Approximation in Wikipedia's article Radial basis function, there are only some brief (partial) answers to upper two questions. Following the link from the section, one get to the article Kernel smoother. After reading it, I doubt that there is a positive answer to the first question, but it seems that equidistant $x_i$'s are a good choice.

Feel free to add an additional tag or two, which would make description more precise. I didn't want to create new ones and I know hardly anything about this topic.

Antoine
  • 3,439

1 Answers1

2

As you point out these a are Gaussian Radial Basis functions, $k(x,y) = e^{-\alpha(x-y)^2}$ and as such they are members of what is called a universal Reproducing Kernel Hilbert Space, in this case over the reals. A universal RKHS, $H$, in this context is a Hilbert space for which given a continuous function $f$ over $\mathbb{R}$ and a compact set $X$, there is a sequence of functions $f_n \in H$ such that $f_n \to f$ in the supremum norm (over that compact set). Hence the Hilbert space is dense inside of the space of continuous functions, and henceforth we assume that any function we wish to approximate is in our Hilbert space, since it can be represented by a "close" candidate.

This is a quick and dirty definition. In fact as a Hilbert space, $H$ comes with a norm of its own, which can be determined by means of the radial basis functions (or kernel functions). Moreover, this norm satisfies $\| f - \hat f\|_\infty < C \|f- \hat f\|_H$ for some $C$, where the supremum norm is taken over the compact set.

Specifically, if the approximation in the Hilbert space norm is made to be very good (less than $\epsilon/C$) then the sup norm approximation becomes very good as well (less than $\epsilon$).

For this reason, the Hilbert space norm becomes a candidate for function approximation. This is desirable, since minimizing a Hilbert space norm is easier than minimizing the sup norm. This is because there are unique weights that can be found using a straightforward algorithm.

For instance, suppose $f \in H$ and we are using the centers $x_1,x_2,...,x_n$, then we wish to minimize the following: $$T(a_1,...,a_n) = \left\| f- \sum_{i=1}^n a_i e^{-\alpha(x-x_i)^2} \right\|^2 = \|f\|^2 - 2 (a_1,...,a_n)(f(a_1),...,f(a_n))^T +$$ $$ (f(a_1),...,f(a_n))K(f(a_1),...,f(a_n))^T$$ where $K=(e^{-\alpha(x_i-x_j)^2})_{i,j=1}^n$ is a positive definite matrix. This can be minimized using a gradient descent algorithm.

Since a RKHS is determined by its kernel function, and the kernel functions are dense in the space, an overzealous choice of points would produce a good approximation. Say by taking a sequence $\{x_i\}$ that is dense inside of the compact set you wish to approximate over would do it. Though, as the points get closer together, the matrix $K$ becomes increasing ill conditioned.

Joel
  • 16,256
  • Why not apply Gram-Schmidt to construct an othogonal basis and then expand in that basis? – Count Iblis Nov 03 '14 at 21:27
  • That is possible to do as well. The nice thing about RKHSs is that you get extra tools for computing the projection, and they are quadratic programming problems. These can be solved quickly using a gradient descent algorithm. – Joel Nov 03 '14 at 21:30
  • In the end, the values you get using the gram-schmidt process are the same as those you get from the gradient descent algorithm. – Joel Nov 03 '14 at 21:34
  • I should add, that unless specific information about the function you are trying to approximate is known, there is not much you can do other than uniformly distributing your centers. Often, a function is only known at particular points (say in signal processing or support vector machines), in this case these points govern your choice of centers. – Joel Nov 04 '14 at 15:21