Numerical integration of a data set with uncertainties

Question

I have a 1D data set {xi, yi} with no uncertainties in xi and with uncertainties dyi in yi. The resulting discrete function is monotonic and relatively smooth and I would like to integrate the function.

If there were no uncertainties, I would interpolate the data (either interpolation order 1 or 2) and then numerically integrate the interpolated function. But what do I do when there are uncertainties? My guess is that this is a well-studied problem but I can locate any references. Do you know of any references?

Here are my two thoughts on how I could proceed.

Quick technique. Look at two other data sets {xi, yi + dyi} and {xi, yi - dyi} which bound the original set. Interpolate and integrate as before. This would give an upper and lower bound to the integral.
More involved technique. For each value of xi, draw a value of yi' from a normal distribution with mean yi and standard deviation dyi. Then interpolate and integrate. Do this a large number of times and find the average value and its standard deviation.

Hope this makes sense. Comments?

Andre · Answer 1 · 2015-04-25T01:44:33.900

For two data points $(x_1, y_1)$ and $(x_2, y_2)$, with uncertainties $dy_1, dy_2$ respectively, the integral of the linear interpolant is: $$(x_2-x_1)*\tfrac12(y_1+y_2)$$ If you replace the y values with normal distributed random random variables instead, $Y_1$~$N(y_1, dy_1), Y_2$~$N(y_2, dy_2)$, you get the same formula with substituted variables. $$(x_2-x_1)*\tfrac12(Y_1+Y_2)$$ The expected value for a high number of integrals thus computed is $$\mathbb E((x_2-x_1)*\tfrac12(Y_1+Y_2)) = (x_2-x_1)*\tfrac12(\mathbb E(Y_1)+\mathbb(Y_2)) = (x_2-x_1)*\tfrac12(y_1+y_2),$$ same as the original integral without uncertainties. Thus, if I'm not mistaken, your method number 2 should in the limit yield the same results as if you ignored the uncertainties.

If you aim to maximize the smoothness of the interpolant, you could try defining and energy-minimizing spline subject to the condition that $f(x_i) \in [y_i-dy_i, y_i+dy_i]$.

Is there not a $1/2$ missing in your formulas ? – Yann Apr 24 '15 at 18:13 — Yann, Apr 24 '15 at 18:13
@Yann: Yes, of course. Edited, thanks, – Andre Apr 25 '15 at 01:45 — Andre, Apr 25 '15 at 01:45

score 0 · Answer 2 · answered Apr 20 '15 at 22:07

By the central limit theorem, your second approach is much more sensible if you want to get a sense for the typical deviation away from the expected value of your integral, as long as your interpolation method is sensible and your function is well-behaved enough. Your first approach will probably grossly under and over approximate the typical value of integral unless your number of data points is small, because it is very unlikely that you will noticeably under-estimate every value of your function or noticeably over-estimate every value of your function.

score 0 · Answer 3 · answered Apr 20 '15 at 22:21

Let's say you were to use some $n$-th order interpolation method to integrate the contral values (ignoring the uncertainties), and you are wondering whether to use a higher order interpolation scheme instead. Then unless the uncertainties are very small (on a scale of the $n$-th order interpolation errors, the higher order interpolation scheme cannot help (and indeed due to arithmetic accuracy issues -- round off -- will actually do worse than the lower order scheme.

In the practical world, this usually means that simple trapezoidal interpolation (linear interpolation between each pair of points) is as far as you can practically go. If your uncertainties are really small, the quadratic or cubic interpolation might be an improvement, but this is not likely in the real world.

And now you want (or should want) your uncertainty on the integral. Here, assuming the individual point uncertainties on the $y_i$ are not correlated with one another, you can fix as zero all the uncertainties but one and easily find the contribution of that one uncertainty on the resulting integral. Note that since you want uncertainties and not bounds (and the individual point uncertainties are typically normal variates, which themselves are not bounded) you should add the (non-correlated) uncertainties in quadrature.

One final major subtlety about decorellated uncertainties: there is a real danger that $dy_i$ is correlated with $dy_j$ when $|x_i - x_j| < \delta$ for some correlation length $\delta$. This will occur whenever your "noise" is process noise rather than measurement noise. In such a case, you cannot simply add uncertainty contributions to the integral in quadrature; you need to multiply by a factor which is in the simplest cases the square root of the typical number of points $x_i$ lying within an interval $\delta$.

Numerical integration of a data set with uncertainties

3 Answers3

Linked