$52$ cards reciprocal sum probability

Question

Imagine a deck of $52$ cards but instead of having suits and ranks, they only have sequential (unique) integer ranks from $1$ to $52$. You could also imagine a standard deck of $52$ cards but convert the ranks and suits to an integer number from $1$ to $52$ so that all the cards have different numbers assigned to them.

So the question is, what is the probability, if you randomly choose exactly $10$ cards from that deck without replacement, that the sum of the reciprocals of the cards ranks (from $1$ to $52$) totals exactly $1$? For example, if you chose cards $5,10,15,20,25,30,35,40,45$ and $50$, the sum of the reciprocals would only be $0.58579$... so that is too small.

As a hint, I believe if you sort the cards in ascending order, the first card (the lowest rank card) MUST be between a $2$ and $6$ (inclusive) to be a candidate solution. This is because $1/1$ is already a sum of $1$ so any additional cards will make it too large a sum. Also $7$ as a lowest card will not work cuz $1/7$ + $1/8$... + $1/16$ = $0.93$... so the lowest card MUST be a $2,3,4,5$, or $6$. I am re-running my modified simulation now with that added information to prune the state space it must check.

Also I am seeing multiple solutions so there is not just $1$ solution but the probability is likely very low, only a small fraction of $1$% I would guesstimate.

Also note than many solutions are very close to $1$ but not exactly $1$, thus making computer simulation of this type of problem more difficult. An example of a "close solution" is $2,4,13,25,35,41,47,50,51,52$ which evaluates to $0.99999995750965$. The closest sum not equal to $1$ happens with $3,4,8,14,17,22,26,29,46,47$ which evaluates to $1.00000000288991$. That is $8$ zeros after the $1$.

An interesting note... That number very close to $1$ is gotten by summing only $10$ terms. This is impressive since even summing negative powers of $2$ which converges to $1$, ($1/2$ + $1/4$ + $1/8$...), it takes $28$ terms to almost match that closeness to $1$ and $29$ terms to beat it.

Interesting question... Are you certain the probability is not zero? — Logophobic, Feb 20 '16 at 17:35
Yes I already have one solution that works but don't know how many out of 52 choose 10 there are so I cannot yet compute the probability. If you find this question interesting please upvote/favorite it. — David, Feb 20 '16 at 17:36
Other than an exhaustive computer simulation of billions of combinations, I don't know how to solve this mathematically but would like to know. Another problem with simulation is roundoff error. I will check my one solution to make sure it is really a valid solution but in my Excel spreadsheet with 17 digit decimal precision, it is showing exactly 1. That is, 1.00000000000000000. — David, Feb 20 '16 at 17:39
5,6,7,8,9,10,12,40,42,45 is my 1 solution. I would like to know how many out of 16 billion or so there are so the probability can be computed accurately. I stumbled across this answer by running a computer simulation of all 10 card hands but starting at 5 and my program told me this solution very quickly. The # of hands can be trimmed/pruned a lot since we know that card 1 cannot be a candidate since 1/1 is already sum of 1. We can also trim starting at any card higher than 10 since that is only 0.1 and any higher ranked card (such as 11) wont add enough to get it up to 1. — David, Feb 20 '16 at 17:49
Indeed $\frac{504+420+360+315+280+252+210+63+60+56}{2520}=1$ You've convinced me to make an effort... — Logophobic, Feb 20 '16 at 17:50
Yes it is fairly easy to find single solutions but how can we mathematically show all the solutions and thus compute the probability? I have to run my simulation overnight to get the answer and even then I wont be 100% sure it is correct. — David, Feb 20 '16 at 19:21
You should do calculations like this using integers, not floats. Python has a rational class that can operate with them or you can store the numerator and denominator. Your calculations are then exact and you won't get fooled by being close. — Ross Millikan, Feb 23 '16 at 05:37
My revised algorithm where I multiply all the ranks to get a common denominator then convert all the reciprocals of the ranks (for example 1/2, 1/5...) to that common denominator, then add them all up to check for equal to 1 seems to work well. Those are fractions but always with integer numerators (and a very large denominator). The simple example I gave is if you want to check if 4 cards (ranks) have reciprocals that add up to 1. The 4 cards are 2, 4, 5, and 20. Multiply all those together and get 800 for a new denominator. Rank 2 then equals 400/800 , rank 4 is 200/800... then add em up. — David, Feb 23 '16 at 13:30
Actually since I know the denominator (800 in the simple example), I only have to multiply each reciprocal (1/2, 1/5...) by the 800 and then add up the new numerators only and check for equal to 800. In reality, the new denominator will be a much large number for 10 cards, some being of high rank (such as 45). For example, a valid solution is 5,6,7,8,9,10,12,40,42,45. Those 10 numbers multiply to 137,168,640,000. Even the largest rank (45) converts to a large new numerator of 3,048,192,000. — David, Feb 23 '16 at 13:48

score 14 · Accepted Answer · edited Feb 21 '16 at 12:03

The easiest thing is enumerate all the possibilities (few seconds of CPU).

There are 1431 ways to extract distinct cards such that the sum of their reciprocals is 1.

So the probability is $$ 1431 / {52\choose 10} \approx 9.0455\cdot 10^{-8} $$

Among those 1431, the first and last lexicographically are $$ \{2, 4, 16, 26, 30, 36, 39, 45, 48, 52\} $$ and $$ \{5, 6, 8, 9, 10, 12, 15, 18, 20, 24\} $$

My algorithm is very naive. The running time of the resulting program is less than 2 seconds.

Any good set (sum of reciprocals equal to $1$), can be divided in two subsets of $5$ elements, one with sum $s\le\frac{1}{2}$ and the other with sum $\frac{1}{2}\le s<1$.

There are only ${52\choose 5}=2598960$ quintets, so generating them is not a problem. In particular, I stored the $422730$ whose sum is between $\frac{1}{2}$ and $1$.

Then I generate again the quintets, looking for those with sum $s\le\frac{1}{2}$ and I checked if among those stored there were some with sum $1-s$ and with a non-overlapping set of numbers.

I stored on file the matching quintets ($180959$) and then I used another software to eliminate the duplicates (duplicates were generated because there were many cases of $\frac{1}{2}+\frac{1}{2}$ and I was too lazy to implement a filter inside the program, since I could eliminate the duplicates later with a couple of Mathematica instructions.

For what concernes the implementation, I used C++. I'm not a C++ programmer, I'm too old for that OOP, and normally I use C, but the STL contains so many data structures already implemented...)

So I used a multimap (a map which can contain multiple elements with the same key and can be searched efficiently for a specific key), where the key was a representation of the sum of 5 reciprocals and the associated data was a representation of the 5 cards.

In practice I used a 64 bit integer for the key and a 64 bit integer for the data.

Given five numbers $a$, $b$, $c$, $d$, $e$, below 52, it is easy to see that their product fits in a signed 32 bit integer, because $52\cdot51\cdot50\cdot49\cdot48=311875200 < 2^{31}$. So, if I reduce the fraction $$ \frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}+\frac{1}{e}=\frac{p}{q} $$ where $(p,q)=1$, then I know that the value $p\cdot2^{32}+q$ will fit nicely in a 64 bit integer. This will be my key. When in the second part of the program I'm looking for matching quintes I will set my key to $\frac{q-p}{p}$, to search among those already stored one that added to the current one gives sum $1$.

The representation of the five numbers $a$, $b$, $c$, $d$, $e$ can be managed in various ways, but I decided to use an unsigned 64 bit number and set to $1$ the $a$-th, $b$-th, ..., and $e$-th bit. In this way it is easy to check if two quintuples have numbers in common since it is sufficient to check if the bit-wise AND is zero or not.

I think it's all.

Addendum. The algorithm above is due to my bad sight. When I computed the value ${51\choose 10}$ (the card '$1$' is useless) I did not realize it was just such a little number. I misread the exponent. Indeed it is only about $1.28\cdot 10^{10}$. With modern computers it is just a breeze. Indeed I hastily wrote a dumb C program that in about 36 seconds printed out the 1431 solutions.

Here is a short version (not to be taken as an example of good programming!)

#include<stdio.h>
int main()
{
  long double Inv[53],s0,s1,s2,s3,s4,s5,s6,s7,s8,s9,Err=6.0E-11;
  long double Min=1-Err, Max=1+Err;
  for (int i=1; i<53; i++) Inv[i]=1/(long double)i;
#define FOR(p,c) for(int i##c=52; i##c>i##p && (s##c=(s##p+Inv[i##c]))<Max;i##c--)
  int i0, cnt=0;
  for (i0=2, s0=0.5; i0<=52; s0=Inv[++i0])
  FOR(0,1) FOR(1,2) FOR(2,3) FOR(3,4)
  FOR(4,5) FOR(5,6) FOR(6,7) FOR(7,8) FOR(8,9)
  if (s9>Min) printf("%4d %.15Lf %d %d %d %d %d %d %d " 
  "%d %d %d\n",++cnt,s9, i0,i1,i2,i3,i4,i5,i6,i7,i8,i9);
  return 0; 
}

You enumerate it without the use of fractions, right? Or you used directly the fractions? — Masacroso, Feb 20 '16 at 19:56
+1 Share your algorithm (pseudocode)? Clearly you can stop any growing subset when the sum so far exceeds 1, or even 1-1/52, so you don't have to examine all the subsets. How about the number of solutions as a function of the size of the deck. Might that sequence be in OEIS (https://oeis.org/)? — Ethan Bolker, Feb 20 '16 at 19:56
@EthanBolker I added some info about the algorithm. Probably this sequence is already in OEIS, but since it depends on two parameters (the total number of cards and the number of addends) it is probably stored in a way (as a table, by diagonals, for example) that makes searching for it not immediate. — Giovanni Resta, Feb 20 '16 at 20:36
@Masacroso As you can see above (I edited my answer) I used the reduced representation of the fractions, because I needed to search them in a data structure, and thus it was much easier to store them as a pairs of integers. In general, I avoid using floating point numbers as much as I can, because of the problems they can introduce. — Giovanni Resta, Feb 20 '16 at 20:39
Yes, exactly, to avoid numerical errors. This is was exactly what I was asking. — Masacroso, Feb 20 '16 at 21:39
Few seconds of CPU depending on the computer, language, algorithm... In my case, it is taking many hours to run. I was lazy and didn't do much optimization first cuz I don't mind letting the computer run for several hours or even overnight. — David, Feb 21 '16 at 01:44
This is all very nice! BTW, I had trouble with the %.15Lf format specifier for long double; it turns out that if you are using gcc with MinGW, you need to compile with -D__USE_MINGW_ANSI_STDIO. Just in case others have the same problem. — TonyK, Feb 21 '16 at 12:35
FWIW, I get the same count and first & last solution sets, using fairly clunky Python code, which takes around 24 minutes on this old 2GHz single core machine. I use a Fraction class to do the arithmetic, rather than floating-point. Curiously, only one quintuplet has a sum of ½. — PM 2Ring, Feb 21 '16 at 13:00
24 minutes is pretty good. My original program took about 10 hours to run cuz it checked about 7.9 billion possible hands. When I put filtering in it , I got that same program down to about 5 minutes. I am attempting further optimizations now. This is using an interpreted language so in a compiled language, I would expect it to be super fast. The filtering seems very good cuz instead of letting the outer loop go from 2 to 5 which are the only candidates for the lowest ranked card of a winning hand, I let it run from 1 to 43 but the program only took a fraction of a second longer to run. — David, Feb 23 '16 at 03:44
You can eliminate all the cards that are primes greater than $26$ as you can't cancel them from the denominator. That will reduce the number of combinations as your deck (removing $1$ as well) is now $45$ cards. That reduces the number of five card combinations to $1,533,939$. That probably saves a factor of $3$ runtime. — Ross Millikan, Feb 23 '16 at 05:42

user5713492 · Answer 2 · 2016-02-21T11:12:12.850

4

The a=6 case can be eliminated by pencil and paper.
Consider that 1/1 = 1, 1/2 = 7, 1/3 = 9, and 1/4 = 10, all mod 13. The only multiple of 13 we can get out of sums of these numbers is 7+9+10=26, which shows that 1/26+1/39+1/52=1/12 is feasible, but 1/13 must not appear in the sum.
Also 1/1 = 1, 1/2 = 6, 1/3 = 4, 1/4 = 3, all mod 11. The only multiple of 11 in this case is 1+6+4=11, so 1/44 is excluded, and if 1/11 is in the sum, then 1/22 and 1/33 must also be.
These two considerations alone mean that a=6 is impossible:

sum(1.0/[6,7,8,9,10,11,12,13,14,15]) =    1.034896
sum(1.0/[6,7,8,9,10,11,12,14,22,33]) =   0.9670634
sum(1.0/[6,7,8,9,10,12,14,15,16,17]) =   0.9883869

Similar considerations exclude a total of 20 cards. With that, along with the exclusion of card 1 by its magnitude, the deck is down to 31 cards and there are only 4 possibilities for the lowest card. Throw in an early-out test at about the seventh card and compiled code gets down to about 30 ms. The interpreted code should go through in tolerable time. Obviously not an improvement on brute force because that is quick enough and less error-prone.

edited Feb 21 '16 at 11:12

answered Feb 21 '16 at 10:31

user5713492

15,938

1

+1. I like this approach. I think I used something similar many years ago on a problem with Egyptian fractions. In this case maybe it's overkill, but it could be useful to made tractable a problem like this but just a little bigger, since combinations grow fast if one increases 52 or 10. – Giovanni Resta Feb 21 '16 at 11:30
Generally I like to keep the algorithm simple to help ensure I get the correct answer, then later when the answer is confirmed correct, I can go back and try to speed up the program using a better method such as state space pruning, more efficiently running code... Maybe in 12 hours or so I can try some faster implementation after I think about it more. – David Feb 21 '16 at 14:31
I've eliminated the primes >= 17 and their multiples, as well as multiples of the squares of 5 and 7. And 1, of course. That brings the deck size down to 34, and that's enough to get the running time of my Python code down to 1m55s. – PM 2Ring Feb 22 '16 at 12:18
@PM 2Ring Either your deck size is down to 35 or you have eliminated one of {13,27,32,44} because all of the other cards each appear in many of the 1431 solutions. – user5713492 Feb 22 '16 at 16:37
Oops. Yes, I also eliminated 27 because it's $3^3$. Sorry, I forgot to mention that. And I suppose the same logic means I can eliminate 32 = $2^5$. And I do follow your reasons re 13 and 44, so I guess I should eliminate them as well. :) – PM 2Ring Feb 22 '16 at 16:47
a=6 elimination is of no speed increase to the program as with the checks in place I have for card combinations that can never add up to 1, I could allow a to loop from 1 to 43 without affecting the runtime of the program. If I only allow a=6 in my program (and no other value for the lowest card), it only takes 3/100ths of a second (interpreted) to terminate the process, resulting in 0 hits. I am really happy with the filtering I added since the speedup is more than 100 fold (100 times quicker). – David Feb 23 '16 at 05:27
So can someone list all the cards that can be removed please? I assume I would have to build the new deck with the "missing" cards then adjust my nested loops down accordingly. So instead of the deck being all cards from 1 to 52, which of those can I remove? Please list them all here as constants. Thanks. – David Feb 23 '16 at 06:39
1

integer, parameter :: exclude(21) = [1,13,17,19,23,25,27,29,31,32,34,37,38,41,43,44,46,47,49,50,51] – user5713492 Feb 23 '16 at 08:24
Wow with those exclusions and my optimizations, my program that used to take 10 hours to run (overnight) now takes about 6.7 seconds and this is using an interpreted language and about 200 lines of code! It is amazing to me that these numbers can be excluded without changing the answer. In my program, I keep my nested loops pretty much the same but bypassed the body of the loop if the LCV (loop control variable) is in the list of exclusions. So is this an exhaustive list of exclusions? Should I try others to see if the answer changes? The process is so quick now I can try other exclusions. – David Feb 23 '16 at 15:20
1

Exhaustive, as can be seen by searching the solutions, for example. However, the space to be searched can be reduced from 44352165 to 2201100 if your code is flexible enough to handle a slight generalization. The sets {11,22,33}, {26,39,52}, and {16,48} are all or nothing so you can make up 8 subproblems according as each set is present or not. Solve all with the remaining 23-card deck and aggregate your solutions. Maybe you can get down to under 1 second :) – user5713492 Feb 23 '16 at 15:53
My code reports it is only checking 1,262,095 full 10 card hands for equal to 1 . The other ones (of the 7.9 billion) are abandoned early. – David Feb 24 '16 at 03:03
Ah, but if you made the 8 subproblems as I remarked in my last Comment, you might have made it down to 87037 or so comparisons. The elementary number theory approach is roughly orthogonal to the magnitude approach, so many paths that are shortened by one may be missed by the other. With the 8 subproblem method and early outs at both cards 6 and 8 my program gets down to 0.80 ms to find the solutions compared to another 4.7 ms just to print them all out. I was hoping that I could get you down below 1 s; I'm cheering you on here... – user5713492 Feb 24 '16 at 05:40
I am not sure what you mean "all or nothing" for {11,22,33}, {26,39,52}, and {16,48}. Can you elaborate please? If I can understand it I can then implement it. I am up to about 200 lines of code. Interesting how the # of lines of code is not a measure of anticipated speed. My first effort was very simple and short code but very slow as it checked 7.9 billion possible 10 card hands. As the program grew is size, the # of hands went way down and thus the process was MUCH quicker. With no analysis (checking all possible 10 card hands), I would have to scan about 16 billion hands. – David Feb 24 '16 at 11:37
If you checked out a book on elementary number theory from the library and read the first couple of chapters and did a few problems, you would understand, and with less effort than you have expended on the thread. But consider if a/b and c/(11d) are two fractions in lowest terms, b not divisible by 11. Then a/b+c/(11d) = (11ad+bc)/(11bd). We can't cancel that 11 in the denominator because 11 divides neither b nor c. Thus we can only hope to cancel 11s with other denominators with 11s. So group 1/11, 1/22, 1/33, and 1/44 and determine how you can sum to a/b, 11 doesn't divide b. – user5713492 Feb 24 '16 at 16:47

David · Answer 3 · 2016-02-24T13:46:43.703

My simulation is much simpler (requires much less thought). I have $10$ nested loops (a,b,c,d,e,f,g,h,i,j). a goes from $2$ to $6$ only since the lowest numbered card can only be one of those choices. Each inner loop starts at $1$ more than the loop immediately outside of it. The upper limit of the loops is $43$ for a, $44$ for b, ... $51$ for i, $52$ for j. It is running now but is slow and will have to run overnight. I am storing the winners to a database so I can study them later. I will compare results when the program finishes and I will check all the answers in the database to make sure they actually equal $1$.

My program makes a check that the sum of the reciprocals equal $1.00000000000000000$ ($17$ zeros). I think that is the limit of the precision. I doubt any near miss would be truncated at the $18$th or higher decimal digit and not be an actual $1$.

I am actually surprised there are no reported solutions with the lowest card being 6.

So this seems like one of those problems that can only practically be solved via computers, not using a paper and pencil method. At least I haven't seen that type of solution presented here yet. I will give people more time cuz it seems like a hard problem to solve that way.

$Update$: For some mysterious reason, my program missed a few solutions such as $5,6,8,9,10,12,15,18,20,24$. It is confirmed as roundoff error so I need to use a different method where I multiply all of a,b,c,d,e,f,g,h,i,j (all the card ranks), set that product as p, then check if (p/a+p/b+p/c+p/d+p/e+p/f+p/g+p/h+p/i+p/j) = p. That way there are no decimal fractions cuz I am guaranteeing that there will be no remainder in the division since all the ranks are factored in, thus can be factored out. I have to rerun this so it will take many more hours. A simple example of how this algorithm works is if we take just $4$ cards ($2$, $4$, $5$, and $20$). We know that $1/2 + 1/4 + 1/5 + 1/20$ = $1$. To check this, we multiply $2*4*5*20$ and get $800$. Now we basically convert all the reciprocals to $800$ths in the denominator so $1/2$ = $400/800 $... The just sum them up and (in this case), if they tally $800/800$, then they are a solution.

$Update 2$. I also got $1431$ now with my new algorithm. By simulating all possibilities of $10$ randomly drawn cards from the deck of $52$ but with the first card only being a $2,3,4$, or $5$, my program is showing about $7.6$ billion card combinations scanned. So now my next step (since I got the right answer I think), is to speed up the algorithm since it is not necessary to check all 7.6 billion hands since some can be "short circuited" (stopped early) cuz the sum is already too high.

$Update 3$. I put some tweaks in my program so it doesn't have to check all $7.6$ billion hands. It basically keeps a running total of the accumulated reciprocal for however many cards have been seen so far in the hand. If it is already at $1$ (or over), that hand is abandoned early and then next hand is tried. Also, with the $10$ nested loops, I check at each loop to see if the accumulated sum will exceed $1$ if the minimum value of the inner loops is added in. This would mean that no matter what the values are of the inner loops, the sum of the reciprocals with any combination of the remaining cards added in will exceed $1$ so that hand can be abandoned too. I can do this cuz my nested loops are sorted such that each inner loop starts at $1$ higher then the loop immediately outside of it.

Some examples of abandoning a hand early (for illustrative purposes).

Remember my nested loops are named a,b,c,d,e,f,g,h,i,j staring from the outermost one.

First an easy one. $2,3,4$,d,e,f,g,h,i,j. We can abandon this hand since $1/2 + 1/3 + 1/4$ is already more than $1$ so we don't care what d,e,f,g,h,i and j are so we skip all those and try $2,3,5$,d,e,f,g,h,i,j but that is also more than $1$ ($1/2+1/3+1/5$) so we abandon all the $2,3,5$ combinations...

The 2nd check I do for abandoning a hand is when the accumulated sum so far is close to $1$ (such as $0.97619$) as is the case when a,b,c (the outermost $3$ loops) are $2,3,7$. That gives $1/2$ + $1/3$ + $1/7$ = $0.97619$.... There is no way $2,3,7$ can be part of a solution since the remaining $7$ cards (at minimum) would contribute $1/46$, $1/47$, $1/48$...$1/52$. So my program checks for this and doesn't even finish the $10$ card hand in cases like this.

$Update - 4$. I put another check in to speed up the program. I check the case where at each nested loop, the sum of the reciprocals of the remaining cards cannot possibly add up to 1 because even with the maximum value of the reciprocal of the remaining card ranks, the sum will be too low (below $1$). This tweak GREATLY improved the execution speed of the interpreted program. Just looping thru all $7.9$ billion card combinations took a very long $10$ hours or so (overnight). With the "abandon hand" checks for either too high (above $1$) or too low (below $1$) before checking for equal to $1$, I was able to get the program down to only $4.5$ minutes of runtime (more than $100$ times quicker running). Also, only about $73$ million hands were actually checked for equalness to $1$. The output on my screen is fun to watch cuz what used to crawl as all $7.9$ billion hands were checked now "rips" as about only $1$ in $100$ hands (compared to before) are now checked.

$Update-5$ With the help of the exclusion list, my almost $200$ lines of interpreted code runs in $6.7$ seconds. Special thanks to all involved with putting this exclusion list together. The original unoptimized program used to take about $10$ hours which is $36,000$ seconds. The highly optimized program now takes only $6.7$ seconds which is more than $5000$ times quicker to get the same correct answer of $1431$. There still may be a few more optimizations possible. Perhaps the exclusion list is not complete?

I don't know which language are you using, but I also did the 10-cycles brute force in C and it took just 36 seconds. — Giovanni Resta, Feb 21 '16 at 02:08
Interpreted language, not compiled. Probably 3 orders of magnitude slower. (36,000 seconds = 10 hours) — David, Feb 21 '16 at 02:13

$52$ cards reciprocal sum probability

3 Answers3