
I'm looking for a way to derive the equation of a curve that will fit a set of points.

I have a set of points like this:
{130,20},
{150,30},
{160,40},
{165,50},
{175,60},
{185,70},
{192,80},
{200,90},
{205,100},
{210,120},
{215,140}

How can I programmatically derive a formula for this curve? I'm writing software that will have different data for the X values but all data will follow the general shape of the curve.

For more background information, I'm estimating Training Stress Score (Y values) from heart rate (x values). More information on that here: https://www.trainingpeaks.com/blog/estimating-training-stress-score-tss/.

So given a heart rate of 170, it would return a value of 50. However, a curve should make this more accurate, since a heart rate of 174 should have a TSS value closer to 60 than to 50.

Since I'll be allowing a user to define their own heart rate values (depending on their own personal heart rate zones), the method of finding the curve will need to be derived from those values.

  • Several approaches are possible. More context will allow Readers to help you find a suitable approach. I notice that the $y$-coordinates increase with increasing $x$-coordinates. While a polynomial function could be made to fit these eleven points exactly, it will often happen that the lowest degree such polynomial will not be monotone increasing in spite of the fact that the data are. So information about what use you will make of the "curve" would be helpful. – hardmath Oct 09 '17 at 16:15
  • I added some more detail to the original post. Thanks! – Darin Alleman Oct 09 '17 at 16:22
  • This is fairly situation dependent, as was mentioned. Some terms you could search for generically are splines, Fourier series, and polynomial interpolation. With more context and discussion, we may be able to recommend a specific choice of method. – Zach Boyd Oct 09 '17 at 16:24
  • I think that polynomial interpolation may be an adequate solution. However, how can I determine what the degree of the polynomial is, given the data set? From what I understand, the degree of the polynomial needs to be a given. – Darin Alleman Oct 09 '17 at 16:34

2 Answers


A simple approach that preserves the monotonicity (strictly increasing property) of the data is linear interpolation.

In the case you mentioned, a heart rate of $170$ lies at the midpoint between the points $(165,50)$ and $(175,60)$. Linear interpolation thus returns the midpoint value of $55$ for heart rate $170$, and for heart rate $174$ we get:

$$ 50 + (174 - 165)\frac{60-50}{175-165} = 59 $$

In general, one locates the two consecutive data points $(x_0,y_0)$ and $(x_1,y_1)$ (in ascending order of $x$) that bracket the input value $x$, i.e. $x_0 \le x \le x_1$, and calculates:

$$ y = y_0 + (x - x_0) \frac{y_1 - y_0}{x_1 - x_0} $$
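
The general formula can be sketched in Python as below. This is a minimal sketch, not code from the answer: the function name `interpolate_tss` and the list `HEART_RATE_TSS` (the question's data) are illustrative. The points are passed in as a parameter so that each user's own heart rate values can be used, and inputs outside the data range raise an error instead of being extrapolated.

```python
from bisect import bisect_right

# Data points from the question: (heart rate, TSS).
HEART_RATE_TSS = [
    (130, 20), (150, 30), (160, 40), (165, 50), (175, 60), (185, 70),
    (192, 80), (200, 90), (205, 100), (210, 120), (215, 140),
]


def interpolate_tss(heart_rate, points):
    """Estimate TSS at heart_rate by linear interpolation between data points.

    points: at least two (heart_rate, tss) pairs with distinct heart rates,
    in any order, e.g. values derived from a user's own heart rate zones.
    Raises ValueError outside the range covered by the points, since
    interpolation does not extrapolate.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    if not xs[0] <= heart_rate <= xs[-1]:
        raise ValueError(f"heart rate {heart_rate} outside [{xs[0]}, {xs[-1]}]")
    # Index of the first point strictly to the right of heart_rate; clamp it
    # so that heart_rate == xs[-1] still uses the last segment.
    i = min(bisect_right(xs, heart_rate), len(xs) - 1)
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (heart_rate - x0) * (y1 - y0) / (x1 - x0)


print(interpolate_tss(170, HEART_RATE_TSS))  # 55.0
print(interpolate_tss(174, HEART_RATE_TSS))  # 59.0
```

If NumPy is available, `numpy.interp(x, xp, fp)` computes the same values inside the data range; note that outside it, by default, it returns the end values `fp[0]` or `fp[-1]` rather than raising.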

hardmath
  • This seems by far the simplest solution. It won't be as accurate as a polynomial curve, but even a curve will still be just an estimate of the training stress of increasing heart rate, so perfect values aren't necessary. It's definitely more accurate than what I had been doing! – Darin Alleman Oct 09 '17 at 16:52
  • The most important limitation of any interpolation method is that it doesn't give "predicted" values outside the range of basic datapoints being used. In other words, it won't assign a score for heart rates below $130$ or above $215$ using your data. But arguably such an extrapolation of scores is not justified by the data and thus is acceptable as a limitation. – hardmath Oct 09 '17 at 17:15

I don't know much about the technical details for this kind of thing, but here is the way I heuristically think of polynomial approximation:

When I say "points," I mean them "generically": the points are distinct, and no $k+2$ of them lie in a $k$-dimensional affine subspace (point, line, plane, etc.). For the interpolation below, what is actually required is that the $x$-coordinates are distinct.

$2$ points determine a line.

$3$ points determine a quadratic.

$11$ points determine a polynomial of degree (at most) $10$; in general, $n+1$ points determine a polynomial of degree at most $n$.
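
To see why $n+1$ points with distinct $x$-coordinates pin down exactly one polynomial of degree at most $n$, one can write that polynomial down directly in Lagrange form (a standard construction, added here as a supporting derivation):

$$ p(x) = \sum_{i=0}^{n} y_i \prod_{\substack{j=0 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j} $$

Each product equals $1$ at $x = x_i$ and $0$ at every other $x_j$, so $p(x_i) = y_i$ for every $i$. It is also unique: if two such polynomials agreed at all $n+1$ points, their difference would have degree at most $n$ but $n+1$ roots, so it would be identically zero.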

Hence, we can make the following computation:

Assuming that the curve is $p(x)=a_0+a_1x+\cdots+a_{10}x^{10}$, we can solve for the coefficients $a_0,\dots,a_{10}$ by substituting each data point $(x,y)=(a,b)$ into $p$. Since $p$ is linear in its coefficients, this gives a linear system of equations.

{130,20}, {150,30}, {160,40}, {165,50}, {175,60}, {185,70}, {192,80}, {200,90}, {205,100}, {210,120}, {215,140}

\begin{align*}
20 &= a_0 + a_1\cdot130 + \cdots + a_{10}\cdot130^{10}\\
30 &= a_0 + a_1\cdot150 + \cdots + a_{10}\cdot150^{10}\\
40 &= a_0 + a_1\cdot160 + \cdots + a_{10}\cdot160^{10}\\
50 &= a_0 + a_1\cdot165 + \cdots + a_{10}\cdot165^{10}\\
60 &= a_0 + a_1\cdot175 + \cdots + a_{10}\cdot175^{10}\\
70 &= a_0 + a_1\cdot185 + \cdots + a_{10}\cdot185^{10}\\
80 &= a_0 + a_1\cdot192 + \cdots + a_{10}\cdot192^{10}\\
90 &= a_0 + a_1\cdot200 + \cdots + a_{10}\cdot200^{10}\\
100 &= a_0 + a_1\cdot205 + \cdots + a_{10}\cdot205^{10}\\
120 &= a_0 + a_1\cdot210 + \cdots + a_{10}\cdot210^{10}\\
140 &= a_0 + a_1\cdot215 + \cdots + a_{10}\cdot215^{10}
\end{align*}

Solving this amounts to row reducing an $11 \times 11$ matrix, since there are $11$ equations in the $11$ unknowns $a_0,\dots,a_{10}$. I don't have time to punch this into an online calculator, but I think this should give you a "reasonable" approximation.
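
A minimal sketch of that computation, assuming Python with NumPy. One deliberate deviation from the answer as written: $x$ is first rescaled to $t = (x - 172.5)/42.5 \in [-1,1]$, because powers up to $215^{10} \approx 2 \times 10^{23}$ make the raw system numerically ill-conditioned. The resulting curve is the same degree-$10$ interpolant; only its coefficients are expressed in $t$ instead of $x$. The names `to_t`, `p`, and `coeffs` are illustrative.

```python
import numpy as np
from numpy.polynomial.polynomial import polyval

# Data points from the question: heart rate (x) and TSS (y).
xs = np.array([130, 150, 160, 165, 175, 185, 192, 200, 205, 210, 215], dtype=float)
ys = np.array([20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140], dtype=float)

# Rescale x to t in [-1, 1]; raw powers up to 215**10 make the system
# badly conditioned. The interpolating curve is the same, only its
# coefficients are expressed in t instead of x.
center = (xs.min() + xs.max()) / 2        # 172.5
half_width = (xs.max() - xs.min()) / 2    # 42.5


def to_t(x):
    return (np.asarray(x, dtype=float) - center) / half_width


# Vandermonde matrix: row i is [1, t_i, t_i**2, ..., t_i**10].
V = np.vander(to_t(xs), N=len(xs), increasing=True)

# Solve the 11 x 11 system V @ a = ys for a_0, ..., a_10 (in terms of t).
coeffs = np.linalg.solve(V, ys)


def p(x):
    """Evaluate the degree-10 interpolating polynomial at heart rate x."""
    return polyval(to_t(x), coeffs)


# Check that the curve passes through every data point.
print(np.allclose(p(xs), ys))
```

Evaluating `p` on a fine grid such as `np.linspace(130, 215, 1000)` and looking for negative entries in `np.diff` of the result is one way to check for the overshoot described in the comment below.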

Andres Mejia
  • Just to show the danger of polynomial fitting, I did what you suggested. For sure, the curve matches all the data points exactly. The problem is that, in the interval, the generated curve goes through a maximum at $x=135.04$ ($y\approx 153.92$) and through a minimum at $x=152.36$ ($y\approx 27.71$). – Claude Leibovici Oct 10 '17 at 08:25