tl;dr (summary)
I'm a beginner in mathematics. My question is: is there a formula to calculate the rightmost column (ExpectedScore) in the following table from the values of the leftmost column (AwardedCount)?
$$ \begin{array}{|c|c|c|} \hline {\rm AwardedCount} & {\rm CalculatedScore} & {\rm ExpectedScore} \\ \hline 1303591 & 10 & 10 \\ \hline 108023 & 186 & 125 \\ \hline \text{other data} & \dots & \dots \\ \hline 12114 & 339 & 250 \\ \hline 20 & 790 & 500 \\ \hline 1 & 1000 & 1000 \\ \hline \end{array} $$
Note that the expected values are only examples; they just show the kind of distribution I want in the end.
The ExpectedScore value should lie in the range from 10 to 1000: $ExpectedScore \in [10, 1000]$
The middle column shows the score I was able to calculate. The formula I currently use is (thanks to MPW and Brian Rushton):
$$ CalculatedScore = \Big(\big(1 - \log_{MaxAwardedCount}(AwardedCount)\big) \times 990 \Big) + 10 $$
Here $\log_{MaxAwardedCount}$ is the logarithm with base MaxAwardedCount, so the function takes two values: the input (AwardedCount) and the base (MaxAwardedCount).
This formula is what I was looking for, but I am keeping this question open in case someone has a simpler idea.
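For anyone who wants to try the logarithmic formula, here is a minimal Python sketch. The function name `calculated_score` is made up for illustration, and rounding up is my assumption (it is what reproduces the CalculatedScore column of the table).

```python
import math

def calculated_score(awarded_count: int, max_awarded_count: int) -> int:
    """Map an award count to a score between 10 (most common) and 1000 (awarded once)."""
    # Logarithm of awarded_count in base max_awarded_count:
    # 0 when the count is 1, and 1 when the count equals the maximum.
    ratio = math.log(awarded_count, max_awarded_count)
    # Assumption: round up to the next integer.
    return math.ceil((1 - ratio) * 990 + 10)

MAX_AWARDED_COUNT = 1303591  # largest AwardedCount in the table

for count in (1303591, 108023, 12114, 20, 1):
    print(count, calculated_score(count, MAX_AWARDED_COUNT))
```

Because the base of the logarithm is MaxAwardedCount, the same function can be called with a different maximum for each site.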
Original question with details
(Adjust scores based on rarity)
I'm working on a personal project which involves calculating scores based on badges earned on StackExchange. I want to calculate a score from 1 to 1000 based on the rarity of a badge. A badge awarded to very few users (e.g. only one user) should have a score of 1000. A badge awarded to many users should have a lower score (always greater than or equal to 1).
Let's take some examples:
- The most awarded badge is Popular Question, earned 1303591 times; let's call this number MaxAwardedCount
- One of the least awarded badges is castle-activerecord, earned only once
- Another badge, Famous Question, was awarded 108023 times; let's call this number AwardedCount
- The Favorite Question badge was awarded 12114 times
Current method
My formula is:
$$ Score = \frac{\frac{MaxAwardedCount}{AwardedCount}}{MaxAwardedCount} \times 1000 = \frac{1000}{AwardedCount} $$
The value is then rounded up to the next integer.
Results
The Famous Question badge, earned very often (108023 times): $\frac{1000}{108023} \approx 0.009257$, rounded up to $1$
A rare badge, awarded only 20 times: $\frac{1000}{20} = 50$
A rare badge, awarded only twice: $\frac{1000}{2} = 500$
The rarest badge, awarded only once: $\frac{1000}{1} = 1000$
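The results above can be reproduced with a short Python sketch of my current method. The function name `current_score` is made up for illustration.

```python
import math

def current_score(awarded_count: int) -> int:
    """Current method: MaxAwardedCount cancels out, leaving 1000 / AwardedCount."""
    # Round up to the next integer so the score is never below 1.
    return math.ceil(1000 / awarded_count)

for count in (108023, 20, 2, 1):
    print(count, current_score(count))
```

Since MaxAwardedCount disappears from the formula, this method gives the same score on every site, whatever its size.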
Expectation
The problem is that it gives significant scores only to rare badges. Is it possible to find a formula which will give bigger scores even to more common badges?
Also, a badge earned 10 times on sites A and B is more valuable on site A if A has more users than B (it is less frequent on A than on B). So the score has to be adjusted depending on the MaxAwardedCount value of each site.
To go further, would it be interesting to calculate the mean of the award counts of all badges, in order to give a score of about 500 to badges which are neither rare nor frequent? Since I have all the data for the StackOverflow badges, I have calculated that the average AwardedCount is 2451.6115. Is it possible to give an arbitrary score of 500 to a badge with this award count, and then calculate the scores of all the other badges from it?
I don't know much about mathematics so please use only simple words.