In a system with chunks of an arbitrary number (5-200) of questions with quantifiable answers, I'm calculating a Bayesian average for each chunk. Due to the fuzzy nature of these questions, the result is quite satisfying compared to the plain mean value.

However, when the number of answered questions (i.e. votes) approaches zero, the resulting value approaches the mean of the whole report. Obviously. Thus, if I answer zero questions I get the mean value. I want to avoid that!

Q: How can I modify the parameters (or the formula itself) to get a fairer result when too few questions have been answered? So far I have tried setting m to different values in the range 0..v, but the impact is negligible. (Maybe there are alternative formulas that suit my needs better?)

Bayesian average = (v / (v + m)) * R + (m / (v + m)) * C, where
R = average for the answers in this category (mean)
v = number of answered points for this category
m = minimum number of answered points required
C = the mean answer across the whole report (all categories).

C is usually around 50-66%.
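
For reference, a minimal Python sketch of the formula above. The function name `bayesian_average` and the numbers (R = 0.90, C = 0.60, m = 10) are made up for illustration, not taken from my real data. It shows the result sliding from C toward R as v grows, and returning exactly C when v = 0:

```python
def bayesian_average(R, v, m, C):
    """Weighted mean of the category mean R and the overall mean C.

    R: mean answer in this category
    v: number of answered points in this category
    m: minimum number of answered points required (the weight given to C)
    C: mean answer across the whole report
    """
    if v + m == 0:
        raise ValueError("v + m must be positive")
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical inputs: category mean 0.90, overall mean 0.60, m = 10.
for v in (0, 1, 5, 50, 200):
    print(v, round(bayesian_average(R=0.90, v=v, m=10, C=0.60), 3))
```

With v = 0 the first term vanishes and the function returns C, which is the behaviour I'm asking about.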

To summarize, this is what I want to accomplish:

  • The resulting value should be weighted in such a way that the number of answered questions has a significant impact.
  • If too few questions are answered, the value should be low.
  • If a large number of the questions have a relatively "high" answer, the value should be very high.

Do I need to introduce threshold values? If yes, how do I calculate decent thresholds?

l33t

1 Answer

Wait, why do you want to avoid getting the mean value if zero questions are answered?

That corresponds to the roughest possible estimate, which is what you should get if no information is given (zero questions are answered).

Think of conditional expectation: conditioning on zero questions answered is roughly equivalent to conditioning on the trivial sigma field (i.e. no information), hence should produce the crudest estimate possible: the mean/expectation.

I know this doesn't answer your question; I am just confused about what your reasoning is (and the formatting in comments is poor).

Chill2Macht
  • I see your point. What if we introduce some conditions that detect estimates that are too crude and then allow us to skip those values altogether? What conditions would be good for this? – l33t Apr 08 '16 at 20:09
  • To be honest, my area of expertise (right now; I want to study Bayesian statistics in grad school) is pure probability theory, so I don't really know how to provide any technical answers to your question. I was just trying to understand the overall heuristic you were using. – Chill2Macht Apr 09 '16 at 01:55
  • I guess what I still don't understand is why m is set to "minimum number of answered points required" and not "number of unanswered questions", so that (v+m) = "total number of questions". Then you would just take the weighted average of the informed estimate, the (v/(m+v)) term, and of the completely uninformed estimate, the (m/(m+v)) term? – Chill2Macht Apr 09 '16 at 01:57
  • I can see why you would want to avoid too crude estimates, but I don't understand how it could be possible to ever avoid them if the information supplied is too crude? The quality of an estimate necessarily corresponds to the quality of the information (as well as of the methodology used to produce it) -- if the information supplied is crude, then any estimate will necessarily be crude. Basically I am confused about the thinking or motivation behind this problem, because I don't understand it based on what was already written. – Chill2Macht Apr 09 '16 at 02:00
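
The variant raised in the comments, where m is the number of unanswered questions so that v + m equals the total number of questions in the category, could be sketched roughly as follows. This is only an illustration: the function name `coverage_weighted_average` and the parameter `total` are hypothetical, and nothing in the thread confirms that this is what the asker ended up using.

```python
def coverage_weighted_average(R, v, total, C):
    """Bayesian-average formula with m = number of unanswered questions.

    R:     mean answer in this category
    v:     number of answered questions in this category
    total: total number of questions in this category (assumed known)
    C:     mean answer across the whole report
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= v <= total:
        raise ValueError("v must be between 0 and total")
    m = total - v  # unanswered questions carry the uninformed estimate C
    return (v / total) * R + (m / total) * C


# Hypothetical category with 20 questions, category mean 0.90, overall mean 0.60.
for v in (0, 5, 10, 20):
    print(v, coverage_weighted_average(R=0.90, v=v, total=20, C=0.60))
```

Here the weight on R is simply the fraction of questions answered, so the number of answers has a direct effect on the result. With zero answers it still returns C, consistent with the point made in the answer that no information yields the crudest estimate.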