1

The practical purpose of this is I want to identify users that use my app on a weekly basis. For each user I can generate a sequence of times of their interactions, and from that I can generate a sequence of the length of time between each interaction.

So given this sequence of lengths of time, how can I find sections of consecutive numbers that have an average of 7 days or less?

As an example, if I had the following sequence: [1, 11, 1, 8, 12]

[1, 11, 1, 8, 12] would be a valid stretch of numbers with an average of 7 or less, but [11, 1, 8, 12] would not be valid. [1, 2, 12] would again be valid.

Ideally, my output for every valid section would be the starting position of the first item and the length of the section. So [1, 11, 1, 8, 12] would be described as [1, 5] and [1, 2, 12] would be described as [3, 3].

There is a brute force, computational approach where I take every item in the sequence as a start point, and calculate the averages of every possible length of following numbers up until the end of the sequence. The number of calculations grows quickly though at a rate of n(n+1)/2 (Imagine for each given sequence of length N finding consecutive sequences of length N, N-1, N-2 etc.)

I ask broadly if there's a more elegant approach that doesn't require a quadratically growing number of individual calculations for means.

  • 2
    The complexity of your algorithm is not exponential in $N$. It's quadratic. – Rob Arthan Sep 16 '19 at 15:29
  • Do you have in mind additional conditions on the solutions? If there is any solution, then taking the entire sequence gives a solution. (And sometimes there is no other solution, e.g., for $[8, 5, 8]$.) – Travis Willse Sep 16 '19 at 19:15
  • Are the numbers positive integers? If so, then you don't have to try sequences longer than 7 and the resulting algorithm is $O(N)$. – Rob Arthan Sep 17 '19 at 21:21

1 Answers1

2

I think you have to investigate every contiguous sublist, so there's no way around an order $N^2$ algorithm. If you can bound the length of the sublist then you can do this in order $N$.

You can probably speed up the individual computations by noting that if you have an average $A$ of $k$ numbers you can update the value of $A$ and $k$ when you include or exclude a numerical value without having to look to see how $A$ was calculated originally.

Ethan Bolker
  • 95,224
  • 7
  • 108
  • 199