Say I have ratings (1-100) for a list of movies:
- Guardians of the Galaxy: 92
- Interstellar: 83
- Wonder Woman: 95
Every time I watch one of the movies, I may provide a new rating. E.g. after watching each movie four times:
- Guardians of the Galaxy: 92, 60, 85, 83
- Interstellar: 83, 80, 82, 80
- Wonder Woman: 95, 98, 78, 88
I would then want to see the average:
- Guardians of the Galaxy: 80
- Interstellar: 81.25
- Wonder Woman: 89.75
If I store n numbers (all the ratings), I can easily get the average by adding them all and dividing by n.
If I store 2 numbers (currentAverage, numberTimesSeen), I can also easily get the average when I add a new rating. E.g. after seeing each movie 3 times, I would get:
- Guardians of the Galaxy: (79, 3)
- Interstellar: (81.66, 3)
- Wonder Woman: (90.33, 3)
$$\frac{v_1 + v_2 + v_3}{n - 1} \cdot \frac{n - 1}{n} + \frac{v_4}{n} = \frac{v_1 + v_2 + v_3 + v_4}{n}$$
When I add the new ratings (83, 80, 88, respectively), I would get:
- Guardians of the Galaxy: 79*3/4 + 83/4 = 80
- Interstellar: 81.66*3/4 + 80/4 = 81.25
- Wonder Woman: 90.33*3/4 + 88/4 = 89.75
Our new values would then be:
- Guardians of the Galaxy: (80, 4)
- Interstellar: (81.25, 4)
- Wonder Woman: (89.75, 4)
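
For reference, here is a minimal Python sketch of the two-value update described above. The function name `update_average` and its parameter names are made up for illustration; it uses an algebraically equivalent form of the same formula (old average plus the new rating's deviation divided by the new count).

```python
def update_average(current_average, times_seen, new_rating):
    """Fold one new rating into a stored (average, count) pair.

    Hypothetical helper, not from any library.
    """
    new_count = times_seen + 1
    # Equivalent to: current_average * times_seen / new_count + new_rating / new_count
    new_average = current_average + (new_rating - current_average) / new_count
    return new_average, new_count


# Stored state for Guardians of the Galaxy after three viewings: (79, 3)
average, count = update_average(79, 3, 83)
print(average, count)  # 80.0 4
```

Starting from `(0, 0)` and feeding in the ratings one at a time gives the same result as averaging the full list, up to floating-point rounding.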
However, I'm still forced to store two values.
Can I get a reasonably accurate value that takes the history into account by storing only one value? What's the best formula to use?
