In this link, the Wilson Score Interval is used to calculate the interval of a discrete distribution in which possible outcomes are 1 star, 2 stars, 3 stars, 4 stars and 5 stars (this is used to calculate a score for rating items/comments on a webpage).

However, I was under the impression that the Wilson Score interval can only be used for a binomial distribution, so assuming that the link is computing the bounds correctly, how does it use WSI to get the values if the distribution isn't a binomial one?

It indicates near the bottom of the page that the final score is 1 + 4 * (the result of the formula)... can someone please explain why this will give the answer?

u3l
  • 201

1 Answer

You are correct. The Wilson score interval is for a sample proportion (two outcomes). What the link appears to be doing is converting the average rating (between 1 and 5) to a value between 0 and 4, then dividing by 4 to get a "proportion" between 0 and 1. Multiplying the result of the formula by 4 and adding 1 simply maps it back onto the 1-to-5 star scale.
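As a rough sketch of that rescaling (not the linked page's actual code), the steps could look like the following in Python. The function names are made up for illustration, and using the lower end of the Wilson interval as "the result of the formula" is an assumption about what the link does.

```python
import math


def wilson_lower_bound(p_hat, n, z=1.96):
    """Lower end of the Wilson score interval for a proportion p_hat seen in n trials."""
    if n == 0:
        return 0.0
    denom = 1 + z**2 / n
    centre = p_hat + z**2 / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom


def rescaled_star_score(ratings, z=1.96):
    """Hypothetical helper: treat the rescaled mean rating as if it were a proportion."""
    n = len(ratings)
    mean = sum(ratings) / n
    p_hat = (mean - 1) / 4          # map the 1..5 average onto 0..1
    return 1 + 4 * wilson_lower_bound(p_hat, n, z)  # map back onto 1..5


print(rescaled_star_score([5, 5, 4, 5, 3, 4, 5]))
```

Note that only the mean and the number of ratings enter the calculation; the spread of the individual ratings is never used.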

The implicit assumption here is a strict relation between the average rating and its variability: getting five 1's and five 5's is treated as equivalent to getting ten 3's, even though the latter has NO variability and hence no uncertainty.
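A quick illustration of those two samples, using only Python's standard `statistics` module (the variable names are just for this example): the means agree, but the sample standard deviations do not, so any method that looks only at the mean and the count cannot tell them apart.

```python
import statistics

split = [1] * 5 + [5] * 5   # five 1's and five 5's
flat = [3] * 10             # ten 3's

for name, ratings in [("split", split), ("flat", flat)]:
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    print(f"{name}: mean={mean}, sd={sd:.3f}")
```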

I don't think this is a valid way of generating a confidence interval. I would recommend one of three other approaches:

  1. (Most accurate) Model the distribution of stars as a multinomial distribution, and use inferential procedures geared towards it.
  2. (More work) Estimate the multinomial rating probabilities from the data, then draw bootstrap samples to determine the variability, OR do the same thing but with the average rating in the sample.
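  3. (Less accurate) Take the average rating, $\bar R$, and form an approximate confidence interval using the normal approximation: $\bar R \pm z_{\alpha/2}\, s/\sqrt{n}$, where $s$ is the sample standard deviation of the ratings and $n$ is the number of ratings.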

Any of the above is probably more valid/accurate than what is on that link.
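For what it's worth, here is a minimal sketch of the bootstrap-on-the-average idea and the normal approximation, using only the Python standard library. The function names, the percentile-style bootstrap interval, the number of resamples, and the fixed seed are all choices made for this illustration, and the normal interval uses the standard error $s/\sqrt{n}$ of the mean.

```python
import math
import random
import statistics


def normal_ci(ratings, z=1.96):
    """Normal-approximation interval for the mean rating: mean +/- z * s / sqrt(n)."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    se = statistics.stdev(ratings) / math.sqrt(n)
    return mean - z * se, mean + z * se


def bootstrap_ci(ratings, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean rating (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(ratings)
    means = sorted(
        statistics.fmean(rng.choices(ratings, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


split = [1] * 5 + [5] * 5
flat = [3] * 10
print(normal_ci(split), bootstrap_ci(split))
print(normal_ci(flat), bootstrap_ci(flat))
```

Unlike the rescaled Wilson score, both intervals are wide for the split sample and collapse to a single point for the ten 3's. Neither is clipped to the 1-to-5 range, so with few ratings the normal interval can extend past the ends of the scale.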

  • Thanks for the answer, this is what I was looking for. Could you perhaps point me to a link or two describing the 1st or second approach you mentioned? – u3l Mar 18 '14 at 05:04
  • @Trust sure: Approach 1: http://www.biostat.umn.edu/~dipankar/bmtry711.11/lecture_03.pdf, Approach 2: http://tx.liberal.ntu.edu.tw/~purplewoo/literature/!Methodology/!Distribution_SampleSize/SimultConfIntervJSPI.pdf –  Mar 18 '14 at 14:29