How to work out the formula that connects several numbers

Question

I have an interesting problem. Say I have lots of datasets like this:

a = 21
b = 23
c = 58
d = 498
etc (lots of other values)

X = 85

I need to find the formula that derives X from a, b, c, d etc, with the added complication that I don't know whether all of the values affect X or whether some have no effect on it. Is there a generic method to do that?

I do not have the ability to vary a, b, c and d and check the derived value of X; however, I have a huge number of these datasets (combinations of values and the resulting X) to look at. I have some programming skills, so I am able to analyse all of these datasets using an algorithm, but I have literally no idea what that algorithm should be. Any help would be appreciated.

Note: I am new to this site, and don't know which tags to use, so feel free to retag this.

EDIT: Each dataset contains the same number of values, and the positions are fixed, i.e. 'a' of one dataset corresponds to the 'a' in others.

In general, for a finite sequence of numbers there is no way to tell which one should be 'next', i.e. to tell what $X$ should be. Is there any additional structure to how $X$ relates to $a$, $b$, $c$, $d$, etc? — Servaes, Jun 01 '14 at 12:36
I have a general idea for which of a, b, c, d et al are related to X, but I'm not sure. But surely, with the huge volume of data that I have, I should be able to find a relationship? — Bluefire, Jun 01 '14 at 12:44
Entering $1+1$ into your calculator and pressing enter, it will respond $2$ a million times over. But there's no way to be sure (mathematically) that it will always do so unless you know something about the inner workings of your calculator. You will need to know (or assume) something about how the output relates to the input if you want to find a relationship mathematically. — Servaes, Jun 01 '14 at 12:51
I'm not quite sure what you mean. I've probably misunderstood you, but I can assume that the calculator I have here is consistent, that is, if a, b, c, d etc are the same, then X will always be the same. — Bluefire, Jun 01 '14 at 12:52
I admit I was a bit vague. I am indeed assuming that your process is consistent.
What you are asking for is an algorithm that, given an arbitrary sequence of numbers, outputs the next number in the sequence. But there is no way to determine what the next number should be. In fact, a sequence is defined by giving all of its terms, so any number could be next. — Servaes, Jun 01 '14 at 12:56
Unless you have some restrictions, i.e. some relations, which your sequence should satisfy. — Servaes, Jun 01 '14 at 12:56
Relations... like what? Maximum values? The maximum value of any parameter (a, b, c, d, etc or X) is 99. Anything else? — Bluefire, Jun 01 '14 at 12:58
This certainly narrows things down, but it is not sufficient to determine $X$ from this. What would be sufficient precisely is a difficult question. An example of a relation would be: "If $a$ is doubled, then so is $X$", or "$X$ is less than the sum of all the inputs". Do you have any relation like this between input and output? — Servaes, Jun 01 '14 at 13:01
Right, I think I understand now. I have an assumption that I'm not sure is true, but I guess I will have to stick with it. The assumption is that X is a weighted average of all the other data, so a might have half the weight of b, twice that of c, and d might have no weight at all. — Bluefire, Jun 01 '14 at 13:04
Then it remains to determine the weights of each of the variables. For this you need at least as many data sets as you have variables. However if you have more, then there might be no solution (meaning your assumption might be false). — Servaes, Jun 01 '14 at 13:05

score 2 · Accepted Answer · answered Jun 01 '14 at 13:27


If you think there is a linear relationship between the inputs $a, b, c$, etc. and the output $X$, then you could find the least-squares solution to the system of equations $\mathbf {Ay = X}$. Each row of the matrix $\mathbf A$ holds the inputs of one dataset, $[a_i\ b_i\ c_i \ldots]$, and $\mathbf X$ is the column vector of the corresponding outputs $x_i$ (the value of $X$ in dataset $i$). The vector $\mathbf y$ contains the weights in your weighted average.

With more datasets than unknown weights, the system $\mathbf {Ay = X}$ does not necessarily have an exact solution, but you can find the "best fit" by multiplying both sides by the transpose $\mathbf A^t$ and solving the resulting system; i.e., $\mathbf {A}^t\mathbf{Ay} = \mathbf{A}^t\mathbf{X}$.
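
One way to see why this works, sketched with the same symbols: the best fit is the $\mathbf y$ that minimizes the squared error $E(\mathbf y)$, and setting the gradient of that error to zero gives exactly the system above.

$$
\begin{aligned}
E(\mathbf y) &= \|\mathbf{Ay} - \mathbf{X}\|^2 = (\mathbf{Ay} - \mathbf{X})^t(\mathbf{Ay} - \mathbf{X}),\\
\nabla E(\mathbf y) &= 2\,\mathbf{A}^t(\mathbf{Ay} - \mathbf{X}) = \mathbf 0
\quad\Longrightarrow\quad \mathbf{A}^t\mathbf{Ay} = \mathbf{A}^t\mathbf{X}.
\end{aligned}
$$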

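Thus, provided $\mathbf{A}^t\mathbf{A}$ is invertible (which holds when the columns of $\mathbf A$ are linearly independent), the best-fit solution for your weights is $\mathbf{\hat y} = (\mathbf{A}^t\mathbf{A})^{-1}\mathbf{A}^t\mathbf{X}$.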
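
For readers who would rather see this as code, here is a minimal sketch in plain Python (standard library only) that forms $\mathbf{A}^t\mathbf{A}$ and $\mathbf{A}^t\mathbf{X}$ and solves for the weights. The data and the helper names (`transpose`, `solve`, `least_squares_weights`) are made up for illustration. In particular, the outputs are generated from assumed weights 0.5, 0.25, 0.25 and 0 so that the result can be checked; they are not the asker's real data.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def solve(m, rhs):
    """Solve the square system m * y = rhs by Gaussian elimination
    with partial pivoting."""
    n = len(m)
    aug = [list(row) + [r] for row, r in zip(m, rhs)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A^t A is singular: columns of A are dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    y = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - known) / aug[i][i]
    return y


def least_squares_weights(A, X):
    """Solve the normal equations (A^t A) y = A^t X for the weights y."""
    At = transpose(A)
    AtA = [[sum(p * q for p, q in zip(row_i, row_j)) for row_j in At]
           for row_i in At]
    AtX = [sum(p * x for p, x in zip(row, X)) for row in At]
    return solve(AtA, AtX)


# Hypothetical datasets: each row is one dataset's [a, b, c, d].
A = [
    [21, 23, 58, 49],
    [10, 40, 20, 77],
    [90, 12, 36, 5],
    [44, 60, 8, 31],
    [7, 88, 64, 52],
]
# Assumed "true" weights, used only to fabricate consistent outputs X.
true_weights = [0.5, 0.25, 0.25, 0.0]
X = [sum(w * v for w, v in zip(true_weights, row)) for row in A]

weights = least_squares_weights(A, X)

# Residuals A*y - X: large values would suggest the linear model is wrong.
residuals = [sum(w * v for w, v in zip(weights, row)) - x
             for row, x in zip(A, X)]

print([round(w, 6) for w in weights])
```

Under the linear assumption, a fitted weight close to zero (as for `d` here) suggests that value has little or no effect on $X$, while large residuals would suggest that $X$ is not a weighted sum of the inputs at all. With real data, a library routine such as NumPy's `numpy.linalg.lstsq` is usually a better choice than hand-rolled elimination.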


Théophile


What do you mean by $\mathbf A^t$? – Bluefire Jun 01 '14 at 13:56
I mean the transpose of the matrix $\mathbf A$. – Théophile Jun 01 '14 at 14:00
I'm not too experienced with matrices and linear algebra, so I don't know what that is D: Is software like Matlab able to process this for me? – Bluefire Jun 01 '14 at 14:06

Yes, Matlab would certainly be able to do this. In fact, it is such a common problem that, from what I see, the syntax in Matlab is as simple as "$\mathtt{y = A\backslash X}$". Here's a link to the Matlab documentation. Note that the variables on that page are slightly different from here; they're solving the system $\mathbf{Ax = B}$, so you'll have to relabel accordingly. – Théophile Jun 01 '14 at 14:36
