0

Is there a formula to find out how many times a string of N consecutive numbers occurs in a larger string of N+ numbers. So for example how many ways can a string of n2 numbers occur in a string of n3 numbers. So if i was searching for a string of any 2 numbers in a list of 1000 numbers how many possible strings could I find - i think the answer here is 200.

However what i want is a general formula to find the number of potential strings of n numbers (eg any 4 consecutive numbers) in a much larger string of say 9 numbers (eg 1000,000,000). So for 9 zero filled digits, '1234' can occur more than once, for example in the number 123,412,341 but this would count as 1 result.

So for example there are 10,000 ways you can have a string of 4 digits. How many times can those 10,000 strings be found in a string of 9 digits where only 1 occurrence of the string needs to be found to include that number, so 123,412,341 counts as 1 result.

can anyone help with this question please? Thanks in advance.

1 Answers1

2

You can build a finite state automaton that accepts the strings that contain the substring. If the substring $w$ has length $d$, the machine will have d+1 states corresponding to the suffixes of $w$. The empty suffix is the starting state and the full string an accepting state (and make it absorbing, i.e. no more transitions). Here's an example for the case $w = 1234$:

enter image description here

The thing with the transitions is to for each state and each digit $0-9$ to check the longest postfix of state+digit that is an suffix of $w$ and make a transition to that state with the digit in question. Here the only "going back cases" are transitions to the empty suffix and to $1$. But if you had for example $w=1213$, then from the state $121$ you would have a transition with the digit $2$ to the state $12$.

Now we can count the number of strings of length $n$ that contain the substring $w$ by forming a transition matrix and putting the number of transitions there. For $w=1234$ we get the matrix

$$A = \begin{pmatrix} 9 & 1 & 0 & 0 & 0 \\ 8 & 1 & 1 & 0 & 0 \\ 8 & 1 & 0 & 1 & 0 \\ 8 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 10 \\ \end{pmatrix}$$

Now the number of $n$-strings that contain $w$ is gotten as number of walks of length $n$ from the starting state to the accepting state and is given by the top-right element of $A^n$. For $n=9$ we get $599970$.

ploosu2
  • 8,707
  • Thanks but just to clarify if I haven't misunderstood your answer, search string length for a given scenario is always the same, so searching for 4 digits means 4 digits are always required. – George Piumatti Sep 06 '21 at 05:54
  • @GeorgePiumatti It works for any length search string. You'll just have to build the machine (or just the matrix) for a given search string $w$ and it will have $|w|+1$ states (matrix will have this size). Notice, how this doesn't depend on the length of the big string and with fast matrix exponentiation the calculation will be efficient. – ploosu2 Sep 06 '21 at 06:04
  • Per the OP's edit, cases where the substring occurs more than once are only counted once, as if the substring occurred once. So the matching of a search substring becomes an absorbing state in the Markov chain. – hardmath Sep 06 '21 at 13:51