2

I'm a hobby coder and I have a pissload of data coordinates that I know. About 4000. I was wondering if instead of storing the datapoints, maybe I could store the equation that would represent that data. Problem is, I'm not very good at math, and I was wondering if all (non rule violating) nonlinear graphs had equations, and where I could read up on how to find them.

  • 2
    https://en.wikipedia.org/wiki/Curve_fitting – Giuseppe Negro Jun 25 '18 at 22:56
  • 3
    Well yes, any finite number of data points can be fitted by a polynomial. But if you have about 4000 data points, you're going to need a 3999-degree polynomial, so unless the data points coincidentally behave like a very simple polynomial, it would be more efficient just to store them as data points. – Franklin Pezzuti Dyer Jun 25 '18 at 22:56
  • ...and (+1) for the use of the word "pissload." – Franklin Pezzuti Dyer Jun 25 '18 at 22:57
  • This question on meta (and several of its answers) explain how a continuous polynomial can interpolate between data points. However, if you don't want to lose information, the data itself and the polynomial are (generally speaking) going to require the same amount of storage. Thus there is no savings in storing a function or graph over the raw data (generally speaking). EDIT: What @Frpzzd said. – Xander Henderson Jun 25 '18 at 22:57
  • There is also the additional issue of whether or not the data points come from a function (remember: functions are many-to-one or one-to-one, but never one-to-many). If the "outputs" are not uniquely determined by the "inputs", then you have the additional task of fitting a more general algebraic curve to the data. – Xander Henderson Jun 25 '18 at 23:00
  • I've also removed the [tag:graph-theory] and [tag:education] tags; graph theory is a branch of mathematics about graphs, which is distinct from the graphs of functions, and the education tag is primarily meant for questions about teaching mathematics and the process of learning mathematics. – Xander Henderson Jun 25 '18 at 23:04

1 Answers1

2

Assuming that your data is a set of (x,y) coordinates in the plane.

In terms of the intuitions you'd have as a code cutter, it's as if you're trying to take a black & white bitmap and convert it up into a vector graphic. A vector graphic is (effectively) a set of equations that can tell whether any given pixel needs to be turned on or off. Hence it is easy to go from a vector graphic to a bitmap.

Your situation may actually somewhat worse, as bitmaps are typically set up as pixels in a regular grid.

Equally, your question can be thought of in terms of data compression. You're trying to take 4,000 points and compress them into smaller storage. To do this, you're seeking to exploit redundancy in the data -- namely that the points are merely the manifestation of a small number of equations, plotted at selected x.

Hence the questions to ask about the data are:

  • What do I know about the equations that could be used? For example, are there reasons to believe that the data originated from line segments, or sections of circle? That is, is there an underlaying model to the data?

  • In going from data to representatation-as-equations, what lossiness is acceptable?

This kind of thinking will likely take you into the literature of feature extraction. An example of feature extraction in two dimensions is the Hough Transform, for inferring that data arises from lines.

PtH
  • 1,040