
I have two datasets. The first one contains the rating each user gave a movie ($y$). The second one shows whether each user has rated the movie or not ($r$): if the user has rated the movie, $r = 1$; otherwise $r = 0$. I don't know how to normalize using the two variables $y$ and $r$. Would it be wrong to use this formula: $$y_{\text{mean}} = \frac{y - y_{\min}}{y_{\max} - y_{\min}}$$
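For concreteness, here is a minimal sketch of the min-max formula above applied only to the entries with $r = 1$. It assumes (this is not stated in the question) that $y$ and $r$ are same-shaped grids with one row per movie and one column per user, stored as plain Python lists; the function name `min_max_rated` is hypothetical. Note that an unrated entry and the lowest rating both end up as `0.0`, so $r$ still has to be kept alongside the scaled values.

```python
def min_max_rated(y, r):
    """Scale y to [0, 1] with (y - y_min) / (y_max - y_min), using only entries where r == 1.

    Assumed layout: y and r are lists of rows (movies x users) of equal shape.
    Unrated entries (r == 0) are set to 0.0, which collides with the lowest
    rating, so r must be kept to tell them apart.
    """
    # y_min and y_max are taken over rated entries only; unrated y values are meaningless.
    rated = [yij for yrow, rrow in zip(y, r) for yij, rij in zip(yrow, rrow) if rij == 1]
    y_min, y_max = min(rated), max(rated)
    span = (y_max - y_min) or 1  # avoid dividing by zero when all ratings are equal
    return [
        [(yij - y_min) / span if rij == 1 else 0.0 for yij, rij in zip(yrow, rrow)]
        for yrow, rrow in zip(y, r)
    ]


y = [[5, 0, 4], [1, 2, 0]]  # hypothetical ratings
r = [[1, 0, 1], [1, 1, 0]]  # 1 = rated, 0 = not rated
print(min_max_rated(y, r))
```

Here the rated values are 5, 4, 1 and 2, so $y_{\min} = 1$, $y_{\max} = 5$ and the result is `[[1.0, 0.0, 0.75], [0.0, 0.25, 0.0]]`.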

bento
  • I'm missing context here, but I assume you want a "neutral" rating to correspond to $0$ (i.e. we assume that someone not rating a movie neither lowers nor raises the movie's score). In that case, it's entirely a matter of what you consider a "neutral" rating to be. Is it the mean rating for that film, for all films, for all films of the same genre, etc.? – Charles Hudgins Oct 23 '22 at 15:58
  • @CharlesHudgins Yes, it's the mean rating of all users for a movie. – bento Oct 23 '22 at 16:12
  • With the formula in the question, a movie with ratings that were all either $1$ or $2$ could score as high as (or even higher than) a movie whose ratings were all $4$ or $5.$ If you want the mean rating, why not just divide the total of the $y$ values by the number of ratings (the sum of $r$)? – David K Oct 23 '22 at 21:19
  • @DavidK Yes, I've thought about it, but I'm not sure how to normalize $y$, because it's related to $r$. – bento Oct 24 '22 at 00:57
  • If $r=0$ the user didn't rate the movie and they shouldn't have a rating to count. Either skip their $y$ value, store $y=0$ by default, or multiply whatever the $y$ value is by $r,$ that is, the total of (meaningful) $y$ values is $\sum r_iy_i$. – David K Oct 24 '22 at 01:02
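The suggestions in the comments can be sketched as follows: the per-movie mean is $\mu = \sum_i r_i y_i / \sum_i r_i$, and subtracting $\mu$ from the rated entries (leaving unrated entries at $0$) makes a "neutral" rating correspond to $0$. This is a sketch under assumptions, not a method from the thread: $y$ and $r$ are taken to be same-shaped lists of rows (movies x users), the function names are hypothetical, and a movie nobody rated is given a mean of $0.0$.

```python
def movie_means(y, r):
    """Per-movie mean rating: sum_i r_i * y_i / sum_i r_i (0.0 if nobody rated the movie)."""
    means = []
    for yrow, rrow in zip(y, r):
        total = sum(rij * yij for yij, rij in zip(yrow, rrow))  # multiplying by r masks unrated y values
        count = sum(rrow)  # number of ratings for this movie
        means.append(total / count if count else 0.0)
    return means


def mean_normalize(y, r):
    """Subtract each movie's mean from its rated entries; unrated entries become 0.0 (neutral)."""
    means = movie_means(y, r)
    normalized = [
        [yij - mu if rij == 1 else 0.0 for yij, rij in zip(yrow, rrow)]
        for yrow, rrow, mu in zip(y, r, means)
    ]
    return normalized, means


y = [[5, 0, 4], [1, 2, 0]]  # hypothetical ratings
r = [[1, 0, 1], [1, 1, 0]]  # 1 = rated, 0 = not rated
normalized, means = mean_normalize(y, r)
print(means)       # per-movie means
print(normalized)  # centered ratings, 0.0 where unrated
```

With this data the means are `[4.5, 1.5]` and the centered ratings are `[[0.5, 0.0, -0.5], [-0.5, 0.5, 0.0]]`. Unlike the min-max formula, the means keep the movie rated $1$ or $2$ clearly below the one rated $4$ or $5$, and a centered value can be turned back into a rating by adding that movie's $\mu$.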

0 Answers