
This is mostly me reasking this question because I believe I have an alternative approach to a similar idea, whereas this seems to be adding some kind of discretization instead.

Let $(X_i)_{i=0,\cdots,n}$ be i.i.d. standard normal random variables. We define the linear interpolation $L_n((X_i)_{i=0,\cdots,n},Y)$ to be:

$$ L_n((X_i)_{i=0,\cdots,n},Y) = \begin{cases} X_0 + (X_1 - X_0)Y & 0 \leq Y \leq 1\\ X_1 + (X_2 - X_1)(Y - 1) & 1 < Y \leq 2 \\ \vdots \\ X_{n-1} + (X_n - X_{n-1})(Y - (n - 1)) & n-1 < Y \leq n \end{cases} $$

This gives the following intuitive plot for a given sample of $(X_i)_{i=0,\cdots,n}$:

[Figure: piecewise-linear interpolation through one sample of $(X_i)_{i=0,\cdots,n}$]
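The definition above can be sketched in a few lines of Python. This is only an illustration, assuming the knots sit at the integers $0,\dots,n$; the function name `linear_interpolation` and the seed are made up for the example.

```python
import numpy as np

def linear_interpolation(x, y):
    """Evaluate the piecewise-linear path through the points (i, x[i]) at y.

    x has length n + 1 (indices 0..n); y is a scalar or array in [0, n].
    """
    x = np.asarray(x, dtype=float)
    knots = np.arange(len(x))            # knot positions 0, 1, ..., n
    return np.interp(y, knots, x)

rng = np.random.default_rng(0)           # arbitrary seed, for reproducibility only
n = 5
x = rng.standard_normal(n + 1)           # one sample of X_0, ..., X_n
print(linear_interpolation(x, [0.0, 0.5, n]))
```

At integer values of $Y$ the function returns the corresponding $X_i$, and in between it moves linearly from one sample to the next.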

Reasking the original question with this new approach:

  1. Assuming that $Y\sim \mathcal{U}(0,n)$ what is the distribution of $L_n((X_i)_{i=0,\cdots,n},Y)$?
  2. In the limit as $n\to\infty$, what is the limiting distribution of $L_\infty((X_i)_{i\geq 0},Y)$?
  • If you write $L_n(y)$ like that, it means that $x_1, ...,x_n$ are not variables of $L$ and are known (in other words, not random). I suggest to write $L(Y, (X_i) _{i=1,...,n})$ to indicate $L$ is a function of $Y$ and $X_i$ for $i=1,..,n$. $Y$ and $X$ are written in capital letter to indicate that they are random variables. – NN2 Oct 03 '23 at 13:03
  • That's a fair comment, edited! – Thomas Pluck Oct 03 '23 at 13:20

1 Answer


Focusing on a single segment, and rescaling $Y$ so that it is uniform on $(0,1)$ within that segment, the interpolated value is:

$$X_i+(X_{i+1}-X_i)Y$$

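So we're looking at the product of $A\sim\mathcal{N}(0,2)$ (variance $2$) and $B\sim \mathcal{U}(0,1)$, plus $C\sim\mathcal{N}(0,1)$. This isn't a "named" distribution. Note that $A = X_{i+1}-X_i$ and $C = X_i$ both involve $X_i$, so strictly speaking they are correlated; the steps below treat them as independent, which is an approximation. We'll start with $AB$, the product of a uniform and a normal variable, whose PDF is: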

$$ \text{PDF}_{AB}(z) = \frac{\Gamma\left(0, \frac{z^2}{4}\right)}{4 \sqrt{\pi}} $$

Here $\Gamma(0,\cdot)$ is the upper incomplete gamma function. The PDF looks like this:

[Figure: plot of $\text{PDF}_{AB}(z)$]
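As a sanity check on this formula, here is a small numerical sketch. It relies on the identity $\Gamma(0,x) = E_1(x)$ for $x>0$, where $E_1$ is the exponential integral available as `scipy.special.exp1`. The name `pdf_ab` and the truncation of the integral at $z=40$ are choices made for this example only.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import exp1

def pdf_ab(z):
    """Density of A*B for A ~ N(0, variance 2), B ~ U(0, 1), valid for z != 0."""
    # Gamma(0, x) equals the exponential integral E1(x) for x > 0.
    return exp1(z**2 / 4.0) / (4.0 * np.sqrt(np.pi))

# The density is symmetric with an integrable log singularity at z = 0,
# so integrate over (0, 40] and double; the tail beyond 40 is negligible.
total = 2.0 * quad(pdf_ab, 0.0, 40.0)[0]
second_moment = 2.0 * quad(lambda z: z**2 * pdf_ab(z), 0.0, 40.0)[0]

print(total)           # should be close to 1
print(second_moment)   # should be close to E[A^2] * E[B^2] = 2 * (1/3)
```

The density integrates to one and its second moment matches $E[A^2]E[B^2] = 2/3$, which is what independence of $A$ and $B$ predicts.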

This is not a normal distribution, and computing the PDF of $AB+C$ (treating $AB$ and $C$ as independent) requires a convolution that I'm not sure how to evaluate in closed form:

$$ \text{PDF}_{AB+C}(z) = \frac{\Gamma\left(0, \frac{z^2}{4}\right)}{4 \sqrt{\pi}} * \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}} $$
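A possible way around the convolution, offered as a sketch and not as part of the original argument: rewrite the segment as a weighted sum of its two independent endpoints. Here $B$ denotes the position within the segment, uniform on $(0,1)$, and $\sigma^2(b)$ is notation introduced only for this sketch.

$$ X_i + (X_{i+1}-X_i)B = (1-B)X_i + BX_{i+1} $$

Conditional on $B=b$ this is $\mathcal{N}(0,\sigma^2(b))$ with $\sigma^2(b) = (1-b)^2+b^2 \in [\tfrac12, 1]$, so the density is a mixture of centred normals:

$$ f(z) = \int_0^1 \frac{1}{\sqrt{2\pi\sigma^2(b)}} \exp\left(-\frac{z^2}{2\sigma^2(b)}\right) db $$

The moments follow from $E[Z^2] = E[\sigma^2(B)] = \tfrac23$ and $E[Z^4] = 3E[\sigma^4(B)] = 3\cdot\tfrac{7}{15} = \tfrac75$, which gives an excess kurtosis of $\tfrac{7/5}{(2/3)^2} - 3 = 0.15$. If $A$ and $C$ are instead drawn independently, the same calculation with $\sigma^2(b) = 2b^2+1$ gives $3\cdot\tfrac{47/15}{(5/3)^2} - 3 = 0.384$.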

However, using Monte Carlo simulations, we get the following distribution, which isn't normal:

[Figure: Monte Carlo estimate of the distribution of $AB+C$]

It fails the Shapiro-Wilk, Anderson-Darling and Kolmogorov-Smirnov normality tests (all three reject normality) and has an excess kurtosis of $0.369$, so it is very slightly leptokurtic, i.e. fat-tailed.
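The original simulation script is not shown, so the following is only a guess at how such a Monte Carlo check might look. The sample size, seed and variable names are assumptions. It draws the segment in two ways: directly from its two endpoints (where $A$ and $C$ share $X_i$), and with $A$ and $C$ drawn independently.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)     # arbitrary seed
m = 1_000_000                       # number of Monte Carlo draws (assumed)

x_i = rng.standard_normal(m)
x_next = rng.standard_normal(m)
b = rng.uniform(0.0, 1.0, m)

# Actual segment: A = X_{i+1} - X_i and C = X_i both contain X_i.
z_segment = x_i + (x_next - x_i) * b

# Independent surrogate: A ~ N(0, variance 2) and C ~ N(0, 1) drawn separately.
a = rng.normal(0.0, np.sqrt(2.0), m)
c = rng.standard_normal(m)
z_indep = a * b + c

# scipy.stats.kurtosis returns Fisher (excess) kurtosis by default.
kurt_segment = stats.kurtosis(z_segment)
kurt_indep = stats.kurtosis(z_indep)
print(kurt_segment, kurt_indep)
```

In this sketch the direct construction should give an excess kurtosis near $0.15$ and the independent surrogate one near $0.38$. The latter is close to the figure quoted above, though how the original simulation was set up is not stated. Either way the excess kurtosis is positive, so the conclusion that the distribution is not normal is unaffected.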

Extending this to an arbitrary number of intervals: $Y$ always falls in exactly one of the intervals we've already defined, and every segment has the same distribution, so we always sample from this same distribution, even in the limit.
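The claim that the distribution does not depend on $n$ can be checked empirically. The sketch below is illustrative only: the helper `sample_interpolation`, the values of $n$, the sample size and the seed are all choices made for this example. It draws a fresh set of $X_0,\dots,X_n$ for every sample of $Y\sim\mathcal{U}(0,n)$.

```python
import numpy as np
from scipy import stats

def sample_interpolation(n, m, rng):
    """Draw m independent values of L_n((X_i), Y) with Y ~ U(0, n)."""
    x = rng.standard_normal((m, n + 1))               # fresh X_0..X_n per draw
    y = rng.uniform(0.0, n, m)
    # Segment index 0..n-1; integer values of Y have probability zero,
    # so the choice of half-open interval does not matter here.
    i = np.minimum(np.floor(y).astype(int), n - 1)
    t = y - i                                         # position within the segment
    rows = np.arange(m)
    return x[rows, i] + (x[rows, i + 1] - x[rows, i]) * t

rng = np.random.default_rng(1)                        # arbitrary seed
results = {}
for n in (1, 4, 16):
    z = sample_interpolation(n, 200_000, rng)
    results[n] = (z.var(), stats.kurtosis(z))         # variance, excess kurtosis
    print(n, results[n])
```

Up to sampling noise, the variance and excess kurtosis should come out the same for every $n$, consistent with the argument that only the within-segment distribution matters.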