
If I have a set of data points ($y_i$, $x_i$) with measurement uncertainties ($\Delta y_i$, $\Delta x_i$) giving error bars for each point, and I want to estimate the integral $$ F = \int_a^b \text{d}x \, y(x) $$ using a simple numerical integration method on the data (such as the trapezoid rule), what uncertainty should I quote on the final estimate of $F$?

Assuming that the noise on each point is Gaussian, I can imagine adding random shifts to each point, ($y_i$, $x_i$) $\rightarrow$ ($y_i + Y_i$, $x_i + X_i$) where $Y_i \sim \mathcal{N}(0,\Delta y_i^2)$ and $X_i \sim \mathcal{N}(0,\Delta x_i^2)$, and numerically estimating $F$ many times for many different draws of $Y_i$ and $X_i$. I would then quote the error as the standard deviation of the values of $F$ that I obtain.

It seems to me that this would work, but is it overkill? Are there more straightforward error propagation arguments that relate $\Delta F$ to the ($\Delta y_i$,$\Delta x_i$)?
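The resampling procedure described above can be sketched as follows. This is a minimal illustration, assuming NumPy and independent Gaussian errors on every $x_i$ and $y_i$; the function names `trapezoid` and `monte_carlo_integral_error` are hypothetical. It captures only the scatter due to measurement noise, not the discretization error of the trapezoid rule itself, and if $\Delta x_i$ is comparable to the point spacing the perturbed $x_i$ can swap order, which the sketch does not guard against.

```python
import numpy as np

def trapezoid(x, y):
    """Trapezoid-rule estimate along the last axis (non-uniform x allowed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    widths = x[..., 1:] - x[..., :-1]
    heights = 0.5 * (y[..., 1:] + y[..., :-1])
    return np.sum(widths * heights, axis=-1)

def monte_carlo_integral_error(x, y, dx, dy, n_draws=10_000, seed=0):
    """Mean and standard deviation of F over Gaussian-perturbed copies of the data.

    Assumes the errors on all points are independent (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    x, y, dx, dy = (np.asarray(a, dtype=float) for a in (x, y, dx, dy))
    # One row per draw: every point is shifted by its own Gaussian noise.
    xs = x + rng.normal(0.0, dx, size=(n_draws, x.size))
    ys = y + rng.normal(0.0, dy, size=(n_draws, y.size))
    samples = trapezoid(xs, ys)
    return samples.mean(), samples.std(ddof=1)
```

Calling `monte_carlo_integral_error(x, y, dx, dy)` returns the pair (mean of $F$, spread of $F$); the second value is the quantity that would be quoted as $\Delta F$.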

My question is very similar to this one but I am interested in the error on the estimate of $F$, rather than what the best estimator of $F$ is.

RGWinston
  • If you use the trapezoid rule you have, for each pair of adjacent points, a contribution $2I_i=(x_{i+1}-x_i)(y_{i+1}-y_i)$. Let's say that $x$ and $y$ are normal variables; their sums and differences are still normal, so $2I_i \sim \mathcal{N}(x_{i+1}-x_i, \Delta x_i^2 + \Delta x_{i+1}^2) \, \mathcal{N}(y_{i+1}-y_i, \Delta y_i^2 + \Delta y_{i+1}^2)$. Now you need the mean and variance of $I_i$, which can be evaluated since it is the product of two Gaussians; look here: https://ccrma.stanford.edu/~jos/sasp/Product_Two_Gaussian_PDFs.html – N74 Jan 12 '18 at 22:01
  • So then, if I'm following you correctly, the estimate of the integral would be $F = \sum_i I_i$ and so $F \sim \mathcal{N}( \sum_i \mu_i, \sum_i \sigma_i^2)$, where the $\mu_i$ and $\sigma_i^2$ are evaluated using the product-of-two-Gaussians formulae? – RGWinston Jan 13 '18 at 13:09
  • To be clear, the expression should be $2 I_i = (x_{i+1}-x_i)(y_{i+1}+y_i)$ rather than having a minus on the $y$, right? – RGWinston Jan 13 '18 at 13:10
  • You are right about the minus, but the product of two Gaussians is not a Gaussian, so $F$ is not normal as in your former comment. Anyway, you can estimate the variance of the sum as the sum of the variances. – N74 Jan 13 '18 at 13:23
  • Ah, okay, I see, so $I_i$ is not normally distributed. What is the argument that you can estimate the variance of $F$ as the sum of the variances of the $I_i$? Is it a central-limit type of thing? – RGWinston Jan 13 '18 at 13:36
  • It is just the law of total variance. We have to neglect covariances (i.e. suppose that the $I_i$ are uncorrelated) but, as the variables are independent, this should not be a problem. – N74 Jan 14 '18 at 09:45
  • Okay, thank you. (Excuse me if this is a dumb question.) Is there not covariance between the $I_i$, because consecutive $I_{i+1}$ and $I_i$ share one of the same points ($x$, $y$)? – RGWinston Jan 14 '18 at 13:39
  • That is a good point; indeed, we should demonstrate that the $I_i$ are independent. Maybe you can run a simulation with your algorithm (sample many normals and evaluate the mean and variance) and check whether the variances differ from the ones estimated under the assumption of no correlation. – N74 Jan 14 '18 at 18:03
  • I've had a go at this test and couldn't get it to come out well. It could just be bugs in my code, but it made me think: is your assertion that $2 I_i \sim \mathcal{N}(x_{i+1} - x_{i}, \Delta x_{i+1}^2 + \Delta x_{i}^2 ) \, \mathcal{N}(y_{i+1} + y_{i}, \Delta y_{i+1}^2 + \Delta y_{i}^2 )$, i.e. a 1D product of two Gaussians, correct? Looking at the reference you provided, that seems to be the case for two Gaussians that are functions of the same variable $x$, but aren't these two Gaussians functions of different variables, $x$ and $y$? – RGWinston Jan 21 '18 at 18:27
  • There is no 'variable' when dealing with probability distributions: the $t$ in the equations is just a 'placeholder' needed to express the value of a sample, but you normally need to integrate over it to get the sample mean or variance; the $x_i$s, $y_i$s and so on are, in this problem, all constants. So I see no issue in using that equation. – N74 Jan 22 '18 at 07:11
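
One way to check the covariance question raised in these comments without the product-of-Gaussians machinery is first-order (linear) error propagation applied directly to $F = \frac{1}{2}\sum_i (x_{i+1}-x_i)(y_{i+1}+y_i)$. Each point is differentiated once, whichever panels it appears in, so the correlation between adjacent $I_i$ is accounted for automatically: $\Delta F^2 \approx \sum_i (\partial F/\partial y_i)^2 \Delta y_i^2 + \sum_i (\partial F/\partial x_i)^2 \Delta x_i^2$. The sketch below is a hedged illustration rather than anything proposed in the thread: it assumes NumPy, independent errors on every $x_i$ and $y_i$, and errors small enough that the neglected terms of order $\Delta x^2 \Delta y^2$ do not matter; the function names are hypothetical. For comparison it also computes the sum of per-panel variances with covariances ignored.

```python
import numpy as np

def trapezoid_variance_linear(x, y, dx, dy):
    """First-order variance of the trapezoid estimate F, covariances included."""
    x, y, dx, dy = (np.asarray(a, dtype=float) for a in (x, y, dx, dy))
    w = x[1:] - x[:-1]            # panel widths
    s = 0.5 * (y[1:] + y[:-1])    # panel mean heights

    # dF/dy_i: half the width of every panel that touches point i.
    dF_dy = np.zeros_like(x)
    dF_dy[:-1] += 0.5 * w
    dF_dy[1:] += 0.5 * w

    # dF/dx_i: x_{i+1} enters panel i with a plus sign, x_i with a minus sign.
    dF_dx = np.zeros_like(x)
    dF_dx[1:] += s
    dF_dx[:-1] -= s

    return np.sum((dF_dy * dy) ** 2) + np.sum((dF_dx * dx) ** 2)

def trapezoid_variance_uncorrelated(x, y, dx, dy):
    """Sum of first-order per-panel variances, treating the I_i as uncorrelated."""
    x, y, dx, dy = (np.asarray(a, dtype=float) for a in (x, y, dx, dy))
    w = x[1:] - x[:-1]
    s = 0.5 * (y[1:] + y[:-1])
    var_panels = (0.5 * w) ** 2 * (dy[1:] ** 2 + dy[:-1] ** 2) \
        + s ** 2 * (dx[1:] ** 2 + dx[:-1] ** 2)
    return np.sum(var_panels)
```

The two functions generally disagree. With equal $\Delta y$ on equally spaced points and no $x$ errors, the uncorrelated sum approaches half the full first-order result as the number of points grows, because each interior $y_i$ enters two panels with the same sign. With $x$ errors the uncorrelated sum tends to be too large instead, because an interior $x_i$ enters its two panels with opposite signs.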

0 Answers