In a system with chunks (categories) of an arbitrary number (5-200) of questions with quantifiable answers, I'm calculating a Bayesian average for each chunk. Because the questions are fuzzy by nature, the result is more satisfying than the plain mean.
However, when the number of answered questions (i.e. votes) approaches zero, the resulting value approaches the mean of the whole report, which is exactly what the formula is designed to do. Thus, if I answer zero questions I get the mean value. I want to avoid that!
Q: How can I modify the parameters (or the formula itself) to get a fairer result when too few questions have been answered? So far I have tried setting m to different values between 0 and v, but the impact is negligible. (Maybe there are alternative formulas that suit my needs better?)
Bayesian average = (v / (v + m)) * R + (m / (v + m)) * C, where
R = average for the answers in this category (mean)
v = number of answered points for this category
m = minimum number of answered points required
C = the mean answer across the whole report.
C is usually around 50-66%.
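To make the behaviour concrete, here is a minimal Python sketch of the formula above. The function name and the sample numbers (R = 0.9, m = 10, C = 0.6) are only illustrative assumptions, not values from my system.

```python
def bayesian_average(R, v, m, C):
    """Blend the category mean R with the overall mean C.

    R: mean of the answers in this category
    v: number of answered points in this category
    m: minimum number of answered points required
    C: mean answer across the whole report
    """
    if v + m == 0:
        raise ValueError("v + m must be positive")
    return (v / (v + m)) * R + (m / (v + m)) * C


# Illustrative values only: with v = 0 the result is exactly C.
for v in (0, 1, 10, 100):
    print(v, round(bayesian_average(0.9, v, 10, 0.6), 3))
```

With v = 0 the first term vanishes and the result is C, which is the effect I want to get rid of.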
To summarize, this is what I want to accomplish:
- The resulting value should be weighted in such a way that the number of answered questions has a significant impact.
- If too few questions are answered, the value should be low.
- If a large number of the questions have a relatively "high" answer, the value should be very high.
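One way the three goals above might be expressed, purely as an untested sketch: keep the same weighting but pull towards a deliberately low prior instead of towards C. The name shrunk_score and the default prior of 0.0 are hypothetical choices for illustration, and I don't know whether this is statistically sound.

```python
def shrunk_score(R, v, m, prior=0.0):
    """Same weighting as the Bayesian average, but with a pessimistic prior.

    Assumption: 'prior' replaces C, so a category with few answers is
    pulled towards a low value instead of towards the overall mean.
    """
    if v + m == 0:
        raise ValueError("v + m must be positive")
    return (v / (v + m)) * R + (m / (v + m)) * prior


# Illustrative values only: category mean R = 0.9, m = 10.
for v in (0, 5, 50, 200):
    print(v, round(shrunk_score(0.9, v, 10), 3))
```

In this sketch zero answers give the prior itself, and the score only approaches R once v is large compared to m, so m would control how strongly the number of answers matters.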
Do I need to introduce threshold values? If yes, how do I calculate decent thresholds?