I am trying to work out the formula behind a specific set of weighting factors.
I want to determine a rarity ranking among species based on the traits each one has.
My sample set contains 9971 species in total, and each trait category has the following number of possible traits:
| Trait Category | Number of Traits |
|---|---|
| Trait1 | 18 |
| Trait2 | 22 |
| Trait3 | 21 |
| Trait4 | 29 |
| Trait5 | 10 |
| Trait6 | 10 |
| Trait7 | 5 |
A tool that I used applied the weighting multipliers below to each trait category. I recovered these multipliers by dividing the tool's score by my own score, which I calculated as: Rarity Score = 1 / (Number of Species with that Trait / Total Number of Species)
| Trait Category | Weighting Multiplier |
|---|---|
| Trait1 | 0.69 |
| Trait2 | 0.56 |
| Trait3 | 0.59 |
| Trait4 | 0.59 |
| Trait5 | 0.43 |
| Trait6 | 1.25 |
| Trait7 | 2.5 |
To illustrate with an example, Species #1 has Trait1 = Strong. There are 148 species with the value Strong in Trait1, so my formula gives a rarity score of ~67.37. The tool gave a score of 46.92, which is ~0.696 times my score.

Species #1 also has Trait7 = Green, which is extremely rare: only 17 species have it. My score is therefore 586.53, while the tool gave 1470.59 (a multiplier of ~2.5).
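
The arithmetic in the two examples above can be reproduced with a short Python sketch. The function names `rarity_score` and `implied_multiplier` are made up for illustration; only the formula and the numbers come from the post.

```python
TOTAL_SPECIES = 9971  # size of the sample set


def rarity_score(species_with_trait, total_species=TOTAL_SPECIES):
    """Rarity Score = 1 / (Number of Species with that Trait / Total Number of Species)."""
    return 1 / (species_with_trait / total_species)


def implied_multiplier(tool_score, species_with_trait):
    """Ratio of the tool's score to the unweighted rarity score."""
    return tool_score / rarity_score(species_with_trait)


# (label, number of species with the trait, score reported by the tool)
examples = [
    ("Trait1 = Strong", 148, 46.92),
    ("Trait7 = Green", 17, 1470.59),
]

for label, count, tool_score in examples:
    mine = rarity_score(count)
    ratio = implied_multiplier(tool_score, count)
    print(f"{label}: my score = {mine:.2f}, tool = {tool_score}, multiplier = {ratio:.3f}")
```

Dividing the tool's score by the unweighted score in this way is how the multiplier table was derived.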
Again, I am hoping to find out why those specific multipliers were chosen as the weighting factors.
I tried approaches based on normalization and on standard deviation, but couldn't reproduce those numbers.
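
As a minimal sketch of one normalization check, the snippet below multiplies each category's multiplier by its number of traits, using the values from the two tables above. The idea that the multiplier might be inversely proportional to the trait count is only an assumption to test, not something the tool documents. If it held, the product would be roughly the same for every category.

```python
# Number of traits per category (first table)
trait_counts = {
    "Trait1": 18, "Trait2": 22, "Trait3": 21, "Trait4": 29,
    "Trait5": 10, "Trait6": 10, "Trait7": 5,
}

# Weighting multipliers recovered from the tool (second table)
multipliers = {
    "Trait1": 0.69, "Trait2": 0.56, "Trait3": 0.59, "Trait4": 0.59,
    "Trait5": 0.43, "Trait6": 1.25, "Trait7": 2.5,
}

# Under the assumed relationship multiplier = constant / trait_count,
# this product should be about the same for every category.
products = {
    name: trait_counts[name] * multipliers[name] for name in trait_counts
}

for name, product in products.items():
    print(f"{name}: {trait_counts[name]} traits x {multipliers[name]} = {product:.2f}")
```

Any category whose product stands apart from the others would be worth re-checking against the tool's output.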
Thanks for the help in advance!