9

What is the most efficient way to to find the i'th combination of all combinations without repetition and without first creating all combinations until i. K is fixed (number of elements in each combination) and N is fixed (number of elements to be combined). The order does not matter although extra kudos for finding i in the following order...


1 2 3 4 5 6 7 8
1,2,3,4 1,2,3,5 1,2,3,6 1,2,4,5 1,2,4,6 1,2,5,6 1,3,4,5 1,3,4,6
9 10 11 12 13 14 15
1,3,5,6 1,4,5,6 2,3,4,5 2,3,4,6 2,3,5,6 2,4,5,6 3,4,5,6
Zodman
  • 103
  • BTW: If you still want all combinations, but just do not want to store such a large set of data to a list, then I would recommend using a backtrack algorithm for that. In case you want them all, that will be the fastest approach. In case you really want to be able to jump into the list, my answer below provides the best solution, I think. – String Apr 09 '15 at 22:20
  • Sorry! It is quite difficult to simplify the description given in my answer. I do not know, if you have read it yet. The idea is more simple than the write-up of it, I think. Sorry for that! – String Apr 11 '15 at 00:02
  • @MarcvanLeeuwen: I agree that it is a duplicate. What happens when this question is closed? Can people still find my answer if they want to? Does the other question link to this then? It is such a thorough answer IMO, which is why I ask/care :o) – String Apr 07 '16 at 12:14
  • 1
    @String When a question is closed, it is not deleted, so can still be found. There will be an obvious link to the duplicate question, but also a link back from the "Linked" list on the right. – Marc van Leeuwen Apr 07 '16 at 12:18
  • @MarcvanLeeuwen: Thank you! That sounds very reasonable, indeed. – String Apr 07 '16 at 12:19

2 Answers2

12

Much later update

After reading Zodman's comment below and his blog post, it turns out that this can be simplified! As it turns out, it is beneficial to change the setup as follows:

  • Start indexing from zero
  • Build combinations with zero as the smallest value, i.e. from $0,1,...,n-1$

Then we can answer the following question:

What is the lowest index a combination containing the number $m$?

Since there are $m$ values smaller than $m$, namely $0,1,...,m-1$, there will be: $$ x=\binom mk $$ combinations before $m$ appears for the first time. These will have indexes $0,1,...,x-1$ and so the first combination containing $m$ will have index $x$. This leads to both a ranking and an unranking algorithm:

Ranking

This is the name used in the blog post for finding the index given a combination. Consider this worked example: $$ rank([7,4,0])=\binom 73+\binom 42+\binom 01=35+6+0=41 $$ so the index of the combination $[7,4,0]$ is $41$, since:

  • You can make $35$ combinations of three less than $7$, so $[7,1,0]$ has index $35$
  • You can make $6$ combinations of two less than $4$, so $[4,0]$ has index $6$ in this set of subcases which is then added, since we are starting from $35$

Formula $$ rank([c_0,...,c_{k-1}])=\sum_{i=0}^k\binom{c_i}{k-i} $$ where $c_i$'s are sorted in descending order.

Unranking

This is done similarly, by finding what the blog article calls the combinadic for the index, given $(n,k)$. So if $(n,k)=(10,3)$ and we want to find the $101$-st combination, then we must solve: $$ 101=\sum_{i=0}^k\binom{c_i}{k-i}=\binom 93+\binom62+\binom21=84+15+2 $$ This is solved by being greedy with the larger terms, and we have found $[9,6,2]$ as the answer.

Original answer

Note that the first number only changes after the rest have performed a full set of combinations within the subset they should be in, ie. we have $$ |\{1xyz\mid 2\leq x<y<z\leq 6\} $$ making a total of $\binom53=10$ combinations starting with $1$ since the three numbers $x,y$ and $z$ are chosen from the five possible values $2,3,4,5$ and $6$. Similarly we have $$ |\{2xyz\mid 3\leq x<y<z\leq 6\} $$ giving us $\binom43=4$ combinations starting with $2$. Based on this principle, we can make a list of values for which the first digit changes: $$ \begin{array}{|c|c|} \hline \text{form}&\text{last }i&\text{formula for last }i\\ \hline 1xyz&10&\binom53=10\\ 2xyz&14&10+\binom43=10+4\\ 3xyz&15&14+\binom33=14+1\\ \hline \end{array} $$


This principle can be repeated down to the next level: Suppose we were to find the $58$-th combination $C=[x,y,z]$ of $k=3$ numbers chosen from $1,2,...,10=n$. Then we see that $$ \begin{array}{|c|c|} \hline x&y&z&\text{last }i&\text{formula for last }i\\ \hline 1&y&z&36&\binom92=36\\ 2&y&z&64&36+\binom82=36+28\\ \hline &3&z&43&36+\binom71=36+7\\ &4&z&49&43+\binom61=43+6\\ &5&z&54&49+\binom51=49+5\\ &6&z&58&54+\binom41=54+4\\ \hline &&7&55&54+\binom30=54+1\\ &&8&56&55+\binom20=55+1\\ &&9&57&56+\binom10=56+1\\ &&10&58&57+\binom00=57+1\\ \hline \end{array} $$ Following the logic of the table above we have $C=[2,6,10]$, and in fact: $$ 58=1+\underbrace{\binom 92}_{\text{1st digit}}+\underbrace{\binom71+\binom61+\binom51}_{\text{2nd digit}}+\underbrace{\binom30+\binom20+\binom10}_{\text{3rd digit}} $$ where each digit can now be derived from the number of terms $1,3,3$ above as follows: $$ \begin{align} x&=1+1&=2\\ y&=x+1+3&=6\\ z&=y+1+3&=10 \end{align} $$


Here is a Python-code where this is implemented:

def C(n,k): #computes nCk, the number of combinations n choose k
    result = 1
    for i in range(n):
        result*=(i+1)
    for i in range(k):
        result/=(i+1)
    for i in range(n-k):
        result/=(i+1)
    return result

def cgen(i,n,k): """ returns the i-th combination of k numbers chosen from 1,2,...,n """ c = [] r = i+0 j = 0 for s in range(1,k+1): cs = j+1 while r-C(n-cs,k-s)>0: r -= C(n-cs,k-s) cs += 1 c.append(cs) j = cs return c

Note that this method does never compute previous combinations, but it still has to compute and sum relevant figures for the amount of previous combinations. But it is really fast! For instance, I asked it to compute cgen(173103094564,100,10) which is the $173103094564$-th combination of $k=10$ numbers chosen from $1,2,...,100=n$, and it immediately responded:

[1, 3, 5, 11, 19, 25, 38, 66, 80, 82]

which I would never have guessed, nor have constructed through $173103094564$ iterations!


Just to show the strength of it, I just did

cgen(68814103099439929837637702193841,1000,15)->
[7, 198, 280, 324, 448, 456, 517, 607, 668, 695, 697, 875, 879, 924, 954]

which actually took roughly 3 seconds to compute. We should not complain when we compare this to first having to construct the $68814103099439929837637702193840$ previous combinations - that would be infeasible!

String
  • 18,395
  • What dies while r-C(n-cs,k-s)>0: do? It is the function C() that I have problems with - I have never coded Python and would like to translate this to C#. – AAsk May 08 '20 at 05:57
  • @AAsk We have r, which is the remainder from i, the index of the combination we are searching for. Then cs is the currently selected number in the range 1 through n, and s denotes the number of places of our combination currently filled in. Hence we have k-s remaining places to fill in each time before cs switches one up to the next candidate number. The “while” checks whether we have enough indices r left to get to the next value of cs, namely can we remove n-cs choose k-s combinations and skip ahead to the next value of cs? I hope I make some sort of sense here ... Let me know! – String May 08 '20 at 07:54
  • BTW, C(n, k) = n! / (k!(n-k)!) is just n choose k. – String May 08 '20 at 07:55
  • 2
    This answer helped me to get to the math I needed to create C# algorithms for rank and unrank. I've blogged about how it all works, provided the code, and rehosted a reference article here: https://www.redperegrine.net/2021/04/10/software-algorithms-for-k-combinations/ – Zodman Apr 10 '21 at 00:08
  • @String Hi String. Have you posted somewhere the full Python code for ranking and unranking? – spaceman117X Jan 01 '23 at 10:42
  • @spaceman117X Hi, I do not have the code anymore, but I made an update and will be able to help you with writing the code, if you like. I found a cleaner solution and updated this answer based on Zodman's comment ... – String Jan 01 '23 at 17:11
  • 2
    @Zodman, I have updated my explanation based on your post. Here I give a reason why your suggested algorithms should work, and how they can be simplified. There is no need for the dual at all, only the combinadic. – String Jan 01 '23 at 17:12
  • @spaceman117X Here is a Python module taking care of this. Note that the primary versions (rank, unrank) are based on zero indexed ranks and zero indexed combinations i.e. $0,1,...,n-1$ whereas (rank_one_index, unrank_one_index) is based one indexed ranks and one indexed combinations i.e. $1, 2, ..., n$. Here is the link: ranking.py – String Jan 02 '23 at 16:58
  • The unranking python code given here does not produce the same ranking system as the ranking function given at the top. The code ranking.py provided by the github link is consistent however. Thank you for this. – Noel May 15 '23 at 18:33
  • @Noel: You are quite right! The approach given by Zodman uses a different ordering, which works very well for simplifying both directions of the bijection. My other approach was fine for the original approach, I guess. – String May 20 '23 at 10:22
1

Just a quick summary of what is described under Combinatorial number system on Wikipedia (and elsewhere). The order is lexicographically increasing with individual combinations written in decreasing order (so in your example this makes all combinations with a $6$ come after those without a $6$, unlike the order you listed).

For the easiest description, two things should be $0$-based: the combinations should be from the set $\{0,1,\ldots,n-1\}$ (so subtract $1$ from every element to get to this form), and the list of combinations starts at index$~0$. The elements of a combination however will themselves by indexed $c_1,\ldots,c_k$ in increasing order. Then the $k$-combination $\{c_1,\ldots,c_k\}$ comes at position $\binom{c_1}1+\binom{c_2}2+\cdots+\binom{c_k}k$, which of course is easy to compute. The essential point is that these numbers are different for different $k$-combinations, see under the link for details.