Finding the top sums of values from multiple lists

Question

There are multiple lists given. The number of lists is arbitrary. Each list contains numbers and is sorted descendingly. We shall take exactly $1$ element from each list and calculate the sum of elements. How would you go about finding the top $N$ sums?

Example:

L1 = [5,3,3,1]
L2 = [8,7,0,0]
L3 = [3,1,1,1]
Top 1: 5+8+3=16
Top 2: 5+7+3=15
etc.

The naive solution is really slow (finding all possible sums and then sorting them).

Similar problem here: Subset sum with multiple lists

in which context does this problem arise? – miracle173 May 01 '21 at 09:21 — miracle173, May 01 '21 at 09:21

miracle173 · Answer 1 · 2021-05-06T14:50:08.050

Given the three lists

L1 = [5,3,3,1]
L2 = [8,7,0,0]
L3 = [3,1,1,1]

we want to find the seventh highest summand that complies to your rules. I will show the way to do this. The design of an appropriate algorithm with appropriate data structures is up to you.

We create a table where each row represents a summand constructed from the list element in the predescribed way. We start with

0 0 0  5 8 3  16

This is the highest possible summand, because it uses the highest values of each list. The first three numbers, 0 0 0, are the indexes of the summands in their list, the second three numbers are the values of these summands and the seventh position is the sum. The igths positions is a mark that is set if a summand was already extended. At the moment the summand was not extende so the eights position is empty.

Now we extend the line by adding the possible candiated for the second highest values, these are the following summands

0 0 0  5 8 3  16 x
1 0 0
0 1 0
0 0 1

Now we complete and reorder the lines by the size of the sum. When we actually implement the algorithm we do not reorder the rows but we use an appropriate data structure to hold the rows.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15  
1 0 0  3 8 3  14
0 0 1  5 8 1  14

Now we extend the unmarked row with the highest sum and mark it.

0 0 0  5 8 3  16 01 x
0 1 0  5 7 3  15 02 x
1 0 0  3 8 3  14 03
0 0 1  5 8 1  14 04
1 1 0  
0 2 0  
0 1 1

We complete the rows and reorder it

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14
0 0 1  5 8 1  14
1 1 0  3 7 3  13
0 1 1  5 7 1  13 
0 2 0  5 0 3   8

We extend the unextended rows with the highest sum. So we extend two rows, because both have the same highest sum 14.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
1 1 0  3 7 3  13
0 1 1  5 7 1  13 
0 2 0  5 0 3   8
2 0 0
1 1 0 !
0 1 1 !
1 0 1
0 1 1 !
0 0 2

We remove summands that were already generated. They are marked by a ! and complete the rows.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14
0 0 2  5 8 1  14
1 1 0  3 7 3  13
0 1 1  5 7 1  13 
1 0 1  3 8 1  12
0 2 0  5 0 3   8

Now remove all but the first seven rows.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14
0 0 2  5 8 1  14
1 1 0  3 7 3  13

And now we extend the unextended rows with the highest sum. These are again two rows with sum 14.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14 x
0 0 2  5 8 1  14 x 
1 1 0  3 7 3  13
3 0 0
2 1 0
2 0 1
1 0 2
0 1 2
0 0 3

We complete the rows

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14 x
0 0 2  5 8 1  14 x 
0 0 3  5 8 1  14 
1 1 0  3 7 3  13
2 1 0  3 7 3  13 
0 1 2  5 7 1  13 
3 0 0  1 8 3  12 
2 0 1  3 8 1  12 
1 0 2  3 8 1  12

and remove all but the first seven

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14 x
0 0 2  5 8 1  14 x 
0 0 3  5 8 1  14

In the next step we will mark the last line and stop, because the table will not change. The seventh highest sum is 14 and there are different triples of summands that sum up to 14.

# a piece of pseudocode that describes the algorithm
a tuple is an object that contains all information needed:
the index, the position in a heap if it is member of a heap
one can get the sum associated to this tuple by tuple.sum()
the method "successors" generates all k immediate successors of a tuple
for the tuple with index (i1,i2,i3,...) these are the tuples with index
(i1+1, i2, i3, ...)
(i1, i2+1, i3, ...)
(i1, i2, i3+1, ...)
...
object largest_elements: a list of tuples
object elements_kept: a set to store tuples
object maxheap: a heap where pop retrieves and removes 
    the tuple with largest tuple.sum of the heap, 
    remove deletes an element of the heap,
object minheap: a heap where pop retrieves and removes 
    the tuple with smallest tuple.sum of the heap, 
    remove deletes an element of the heap 
N: we want to find the N_th largest tuple
initialisation
tuple=Tuple(list of list of summands)
this creates the first tuple with index (0,0,0)
maxheap.insert(tuple)
minheap.insert(tuple)
elements_kept.add(tuple)
loop
while not maxheap.is_empty and largest_elements.count<N:
  tuple=maxheap.pop() # pop the tuple with largest tuple.sum() 
  minheap.remove(tuple) # pop the tuple with smallest tuple.sum() 
  largest_elements.append(tuple)
  heaplength=heaplength-1
loop invariants:
number of tuples of largest_elements +heaplength == N
tupels of maxheap == tuples of minheap
number of tuples of maxheap <= heaplength
tuples of elements_kept = tuples of  maxheap union tuples of largest_elements
the intersection of tuples of maxheap and tuples of largest_elements is empty
if the set of successors contain a new element then remember this.
for t in tuple.successors():
    # same loop invariants as above 
    if t not in elements_kept:
      elements_kept.add(t)
      maxheap.insert(t)
      minheap.insert(t)
      # If we now remember more than heaplength elements forget the smallest
      while elements_kept.count()>heaplength:
        m=minheap.pop()
        maxheap.remove(m)
        elements_kept.remove(m)
if we have finished the loop either we have found the largest N tuples
or we cannot generate N tuples from the given lists
if largest_elements.count()==N:
  print("N-th largest element is", largest_elements.last())
else:
  print ("we cannot generate N elements from the given lists")

Here is a dynamic programming approach

N=10  # the Nth sum should be found
listOfListOfNumbers = [[5,3,3,1], [8,7,0,0], [3,1,1,1]]
new_sums[0] = ()
for listOfNumbers in listOfListOfNumbers:
  item_index=-1
  sums=new_sums  # copies new_sums to sums
  new_sums.empty() # creates an empty object
  for number in listOfNumbers:
    item_index=item_index+1
    for (sum, index) in "all key value pairs of sums":
      new_sums[sum+number]=index.append(item_index) 
        # append appends another coordinate to the index
        # e.g. (2,1,0).append(4) is (2,1,0,4)
for (sum, index) in "all key value pairs of new_sums ordered by sum":

   cnt=cnt+1
   if N==cnt:
      return (sum, index)
I nothing was retunred we did not find N different sums.

Interesting, yes you are right you don't need O(nk) space, anyway I have posted a similar question but I think harder on CodeReview, if you are interested :) — Tortar, May 04 '21 at 23:37

Tortar · Accepted Answer · 2021-05-03T13:19:43.990

First of all, the important thing is that you need to consider at each iteration new cases where just one step to the right in one of the lists is made, e.g. starting from the solution $5+8+3$ you obtain $3+8+3\ ,\ 5+7+3\ ,\ 5+8+1$ . Never more than $1$ step and never in more than $1$ list otherwise you will get a lower sum.

So the problem is to deal with the "rollbacks" (steps to the left) in all the list except the one you go one step to the right. The thing so is to create an algorithm where you always compare all the possible rollbacks at each new choice of the current top sum.

You can do this incrementing by one each of the indices of the current top sum and adding this possible new top sums to the comparison.

I clarify with your example :

L1 = [5,3,3,1]
L2 = [8,7,0,0]
L3 = [3,1,1,1]
Start with indices 0,0,0 and best is 5+8+3 and so do not use this in the comparison, 
use just the 3 incremented sums
                                  (5+8+3,[0,0,0])

                              /          |          \
               (3+8+3,[1,0,0])    (5+7+3,[0,1,0])    (5+8+1,[0,0,1])


The current best now is 5+7+3 with indices 0,1,0 and do not consider that in the 
next comparison, just use the other 5 sums
                (3+8+3,[1,0,0])      (5+7+3,[0,1,0])     (5+8+1,[0,0,1])

                                  /         |        \
                   (5+0+3,[0,2,0])   (3+7+3,[1,1,0])  (5+0+3,[0,1,1])  



The best is now 3+8+3 with indices 1,0,0 so we expand it ([0,0,1] has the same sum 
so at the next iteration this sum will be taken)

As you see all the sums which compete to be the current best sum are taken into account at every new choice.

You can use a heap queue to implement the idea, remember to use a set where you add the new indices to compare so that to avoid duplicates in the queue. With $k$ lists, the time complexity is $O(Nk\cdot\max(\log k,\log N))$.

This is a Python implementation :

import heapq
def top_solutions(N,lists):
N_best = []
k, len_k_lists = len(lists), [len(x) for x in lists]
init_sol = (-sum(x[0] for x in lists),tuple(0 for x in range(k)))
heap_list = [init_sol]
seen = {init_sol[1]}

for _ in range(N) :

    curr_best = heapq.heappop(heap_list)
    N_best.append(curr_best)

    ind = curr_best[1]

    for x in range(k) :

        if len_k_lists[x] &gt; ind[x]+1 : r = tuple(c if i != x else c+1 for i,c in enumerate(ind))
        else : continue

        if r not in seen :
            curr_sum = curr_best[0]-lists[x][r[x]]+lists[x][r[x]-1]
            curr_value = (curr_sum,r)
            heapq.heappush(heap_list,curr_value)
            seen.add(r)                

return [(-x[0],x[1]) for x in N_best]


N = 5
lists = [ [5,3,3,1],
          [8,7,0,0],
          [3,1,1,1] ]
print(top_solutions(N,lists))
output : [[16, [0, 0, 0]], [15, [0, 1, 0]], [14, [0, 0, 1]], [14, [0, 0, 2]], [14, [0, 0, 3]]]

Anyway you will obtain duplicate top sums, but with a different set of indices. If you are interested in just the values of the sums (e.g. you don't want two top sums equal to $14$ in the example ) then, first of all, you can eliminate the duplicate numbers in each list, and then check if you have $N$ different sums after $N$ iterations, if not you continue to run the procedure until you obtain them, I think there are also optimization to speed it up considerably. In this case, I think, the complexity is difficult to assess (anyway at worst roughly equal to the naive implementation, and for most inputs of $N$ less than it).

thanks for the observation, I'm using a new approach to solve the problem now, let me know your thoughts about this :) — Tortar, Apr 30 '21 at 19:50
You have made a large number of edits to this answer since posting it. This is not really a good thing to do---it repeatedly bumps the question to the top of the front page of questions. Please decide what it is that you want to say, post that answer, and leave it alone as much as possible. — Xander Henderson, May 01 '21 at 19:19
@XanderHenderson it is not the question that was repeatedly edited but an answer. — miracle173, May 01 '21 at 22:53
@Xander Henderson, yes I think they are a lot but anyway I just tried to provide the best possible answer because I found the question interesting — Tortar, May 01 '21 at 23:28
@Tortar I think your algorithm works now. But I think that the storage need O(Nk), so if n=k this O(N^2). But you only have to remember O(N) items. But I think that is not possible with Python heaps because you can only remove the largest element of a Python heap. — miracle173, May 03 '21 at 10:59
@Tortar Could you explain why the sums need to be negated? Thank you. — Akkua, May 03 '21 at 13:14
@Acqua that is needed because the functions of the library heapq are built so that to return the smallest element in the heap — Tortar, May 03 '21 at 13:16
@miracle173 If you are referring to the size of the heap, I'm not sure how to reduce it to O(N), can you explain how do you think it can be done even considering to change the heap in something else? — Tortar, May 03 '21 at 13:31
@Tortar I added a piece of pseudocode to my post that demonstrates how to do it — miracle173, May 04 '21 at 22:09

Finding the top sums of values from multiple lists

2 Answers2

a tuple is an object that contains all information needed:

the index, the position in a heap if it is member of a heap

one can get the sum associated to this tuple by tuple.sum()

the method "successors" generates all k immediate successors of a tuple

for the tuple with index (i1,i2,i3,...) these are the tuples with index

(i1+1, i2, i3, ...)

(i1, i2+1, i3, ...)

(i1, i2, i3+1, ...)

...

object largest_elements: a list of tuples

initialisation

this creates the first tuple with index (0,0,0)

loop

loop invariants:

number of tuples of largest_elements +heaplength == N

tupels of maxheap == tuples of minheap

number of tuples of maxheap <= heaplength

tuples of elements_kept = tuples of maxheap union tuples of largest_elements

the intersection of tuples of maxheap and tuples of largest_elements is empty

if the set of successors contain a new element then remember this.

if we have finished the loop either we have found the largest N tuples

or we cannot generate N tuples from the given lists

I nothing was retunred we did not find N different sums.

output : [[16, [0, 0, 0]], [15, [0, 1, 0]], [14, [0, 0, 1]], [14, [0, 0, 2]], [14, [0, 0, 3]]]