How should I normalize the grade of a movie by users (y)?

Question

I have two datasets. The first contains the grades $y$ that users gave to a movie. The second records whether each user has rated the movie: $r = 1$ if the user rated it and $r = 0$ otherwise. I don't know how I should normalize when there are two variables, $y$ and $r$. Would it be wrong to use this formula? $$y_{\text{mean}} = \frac{y - y_{\min}}{y_{\max} - y_{\min}}$$

For some basic information about writing mathematics at this site see, e.g., here, here, here and here. — Another User, Oct 23 '22 at 15:53
I'm missing context here, but I assume you want a "neutral" rating to correspond to $0$ (i.e. we assume that someone not rating a movie neither lowers nor raises the movie's score). In that case, it's entirely a matter of what you consider a "neutral" rating to be. Is it the mean rating for that film, for all films, for all films of the same genre, etc.? — Charles Hudgins, Oct 23 '22 at 15:58
@CharlesHudgins Yes, it's the mean rating of all users for a movie. — bento, Oct 23 '22 at 16:12
With the formula in the question, a movie with ratings that were all either $1$ or $2$ could score as high as (or even higher than) a movie whose ratings were all $4$ or $5.$ If you want the mean rating, why not just divide the total of the $y$ values by the number of ratings (sum of $r$)? — David K, Oct 23 '22 at 21:19
@DavidK Yes, I've thought about it, but I'm not sure about the normalization of $y$ because it's related to $r$. — bento, Oct 24 '22 at 00:57
If $r=0$ the user didn't rate the movie and they shouldn't have a rating to count. Either skip their $y$ value, store $y=0$ by default, or multiply whatever the $y$ value is by $r,$ that is, the total of (meaningful) $y$ values is $\sum r_iy_i$. — David K, Oct 24 '22 at 01:02
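The two suggestions in the comments can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `mean_rating` and `min_max` and the sample data are hypothetical, and ratings are assumed to be stored as plain lists with one entry per user. It computes the masked mean $\sum r_i y_i / \sum r_i$ and also shows how the min-max formula from the question can rank a poorly rated movie above a well rated one.

```python
def mean_rating(y, r):
    """Mean of the ratings y_i with r_i == 1, i.e. sum(r_i * y_i) / sum(r_i).

    Returns None when nobody rated the movie, since the mean is undefined.
    """
    n_rated = sum(r)
    if n_rated == 0:
        return None
    return sum(ri * yi for ri, yi in zip(r, y)) / n_rated


def min_max(y, r):
    """Min-max scale only the ratings that were actually given (r_i == 1)."""
    rated = [yi for ri, yi in zip(r, y) if ri == 1]
    lo, hi = min(rated), max(rated)
    if hi == lo:
        # All ratings equal: the formula would divide by zero, so return zeros.
        return [0.0 for _ in rated]
    return [(yi - lo) / (hi - lo) for yi in rated]


# Hypothetical data: four users, the third did not rate either movie.
y_low, r_low = [1, 2, 0, 2], [1, 1, 0, 1]    # ratings are all 1 or 2
y_high, r_high = [4, 5, 0, 4], [1, 1, 0, 1]  # ratings are all 4 or 5

print(mean_rating(y_low, r_low), mean_rating(y_high, r_high))

# After min-max scaling, the low-rated movie ends up with the higher average.
scaled_low = min_max(y_low, r_low)
scaled_high = min_max(y_high, r_high)
print(sum(scaled_low) / len(scaled_low), sum(scaled_high) / len(scaled_high))
```

With this sample data the masked means are $5/3$ and $13/3$, as expected, while the averages of the min-max scaled ratings are $2/3$ for the low-rated movie and $1/3$ for the high-rated one, which is the reversal described in the comments.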

0 Answers