4

I want a value between 0 and 1 for each user that shows how much they like movies of a specific type (comedy, etc.), based on how many movies of that type they have watched. I have data on which movies users watched over several years.

data format

  • user - movie - year - type

I was thinking of something like

user1_comedy = number of comedy movies watched / total number of movies watched

How can I use the years in the equation to make recent years more important?

What is this called in math? I didn't know which tags were best to use.

Thanks

Gerry Myerson
tnaser

2 Answers

1

Long comment:

Let's focus on one particular user, Alice, and on comedy movies, over a period of years $\{y_1, y_2, \ldots, y_n\},$ where $y_n$ is the most recent year. Assume Alice watched a total of $t_{i}$ movies during year $y_i,$ of which $c_{i}$ were comedies.

To model Alice's preference for comedy movies, we can use the fraction $$ \frac{\sum_{i = 1}^{n} c_i}{\sum_{i = 1}^{n} t_i} = \frac{\text{total number of comedy movies watched}}{\text{total number of movies watched}} $$

To add more weight to recent years, you can use a weighted sum, and assign higher weights to recent years. In other words, Alice's comedy score would be: $$ \frac{\sum_{i = 1}^{n} w_i c_i}{\sum_{i = 1}^{n} t_i} $$ where, for example, $$w_n = n, w_{n-1} = n-1, \ldots, w_1 = 1.$$ This weight assignment intuitively says that if Alice watched a comedy movie recently then we are going to count it more than once.

You can use different weighting schemes. For example, exponential weights are described in the Wikipedia article on moving averages.
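The weighted score above can be sketched in Python. This is only an illustrative sketch under stated assumptions: the function name `weighted_score`, the per-year count lists, the `normalize` flag, and the sample numbers are all hypothetical and not part of the original answer. With `normalize=True` the denominator is $\sum_i w_i t_i$ instead of $\sum_i t_i$, which keeps the result in $[0, 1]$ because $c_i \le t_i$ for every year.

```python
def weighted_score(comedy_counts, total_counts, normalize=True):
    """Recency-weighted comedy score for one user.

    comedy_counts[i] and total_counts[i] are the counts for one year,
    ordered from the oldest year to the most recent one.
    Weights are linear: 1 for the oldest year, n for the most recent.
    """
    if len(comedy_counts) != len(total_counts):
        raise ValueError("both lists must cover the same years")
    n = len(total_counts)
    weights = range(1, n + 1)  # w_1 = 1, ..., w_n = n
    numerator = sum(w * c for w, c in zip(weights, comedy_counts))
    if normalize:
        # Weighted denominator: the score stays within [0, 1].
        denominator = sum(w * t for w, t in zip(weights, total_counts))
    else:
        # Unweighted denominator, as in the formula above: may exceed 1.
        denominator = sum(total_counts)
    return numerator / denominator if denominator else 0.0


# Hypothetical data: Alice watched 10 movies per year for three years.
comedies = [1, 2, 9]
totals = [10, 10, 10]
unweighted = sum(comedies) / sum(totals)
normalized = weighted_score(comedies, totals)
unnormalized = weighted_score(comedies, totals, normalize=False)
```

With these made-up numbers the unweighted fraction is 0.4, the normalized weighted score is about 0.53, and the score with the unweighted denominator is about 1.07, which falls outside $[0, 1]$.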

  • I was wondering about the min and max of the formula with weighted sums. It looks like it is not going to be between 0 and 1 as in the OP. – NoChance Mar 29 '12 at 23:28
  • @EmmadKareem The current range is $[0, x]$ (too lazy to figure out $x$). You can always rescale/normalize to $[0, 1].$ –  Mar 29 '12 at 23:34
  • In fact, if we change the denominator to $\sum_{i=1}^{n} w_i t_i,$ then it should be well within $[0, 1].$ –  Mar 29 '12 at 23:35
  • Thanks for the clarification. – NoChance Mar 29 '12 at 23:41
  • Do we call this function/equation a model, and the answer below another model? Is a model a function that can be simple or complicated? – tnaser Mar 30 '12 at 15:28
  • @tnaser We would call this a model. Models are composed of functions, and constants, and mathematical expressions. There could be different models to solve the same problem. For example, the answer by Emre presents a different model. –  Mar 30 '12 at 15:32
  • How can we increase the weight? I am trying this on some data and getting very small numbers like 0.00004. – tnaser Apr 05 '12 at 02:23
  • @tnaser I guess you need to post a new question, show us your model, your equations, and your sample data. –  Apr 05 '12 at 02:56
  • But I guess (without seeing your model), a temporary solution would be to scale all weights, i.e. $\dfrac{w_i}{\displaystyle\max_j {w_j}}$ should be fine. –  Apr 05 '12 at 02:58
  • I didn't get that. What is j? – tnaser Apr 05 '12 at 03:18
  • $\max_j w_j$ is another way to say "find the maximum weight among all the weights you computed." –  Apr 05 '12 at 03:19
  • What about using log, square root, 2^, exp, e? I see them a lot but don't know why and how they were selected. – tnaser Apr 05 '12 at 03:40
0

A simple weighting function is $w(\text{movie})=\exp(t_{\text{movie}}-t_0)$, where $t_0$ is the present year and $t_{\text{movie}}$ is the year of the movie. For user $i$:

$u_i(\text{comedy}) \equiv \frac{1}{T_i} \sum_{k \in \text{comedies}} w(\text{movie}_k)$

where $T_i=\sum_{n \in \text{movies}} w(\text{movie}_n)$

This will give you a number between 0 and 1, because every weight is positive and the comedies are a subset of all the movies the user watched.
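As a rough illustration, the exponentially weighted score could be computed as below. This is a sketch under assumptions rather than a definitive implementation: the function name `exp_weighted_score`, the `(year, genre)` input format, the `decay` parameter, and the sample data are hypothetical. With `decay=1.0` each weight is $\exp(t_{movie}-t_0)$ as in the formula above.

```python
import math


def exp_weighted_score(views, genre, present_year, decay=1.0):
    """Share of exponentially weighted views that belong to `genre`.

    views is a list of (year, genre) pairs for a single user.
    Each view gets weight exp(-decay * (present_year - year)),
    so older views count less. `decay` is a hypothetical tuning knob.
    """
    weighted = [
        (math.exp(-decay * (present_year - year)), g) for year, g in views
    ]
    total = sum(w for w, _ in weighted)  # T_i: sum over all movies
    if total == 0:
        return 0.0  # no views (or all weights underflowed to zero)
    genre_sum = sum(w for w, g in weighted if g == genre)
    return genre_sum / total


# Hypothetical viewing history: two comedies out of four movies.
views = [(2009, "comedy"), (2010, "drama"), (2011, "drama"), (2012, "comedy")]
score = exp_weighted_score(views, "comedy", present_year=2012)
```

With this made-up history the plain fraction of comedies is 0.5, while the weighted score is about 0.68, because one of the two comedies was watched in the most recent year.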

Emre
  • Why use exp? – tnaser Mar 30 '12 at 01:05
  • Because it is a simple function that smoothly tends to zero, and works with arbitrarily distant dates. If you want to tweak the attenuation rate, you can use $\exp(-a\Delta t)$ instead of just $\exp(-\Delta t)$, where $a$ is your tweaking knob. – Emre Mar 30 '12 at 01:07
  • What is the difference between calculating exp(-1) + exp(-2) and calculating exp(-3)? – tnaser Mar 30 '12 at 02:05
  • Should I sum the years and then take the exp, or sum the exps? – tnaser Mar 30 '12 at 02:11
  • Sum the exponentials. The numerator is taken over the set of comedies. The denominator is taken over the set of all movies viewed by a particular user. – Emre Mar 30 '12 at 03:00
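The difference between the two orders of operation asked about in the comments can be checked numerically. This is a small sketch; the variable names and the two sample ages are assumptions made for illustration.

```python
import math

# Hypothetical ages: two movies watched 1 and 2 years ago.
ages = [1, 2]

# Sum of exponentials: each movie contributes its own weight.
sum_of_exps = sum(math.exp(-age) for age in ages)  # exp(-1) + exp(-2)

# Exponential of the summed ages: equals the product of the weights.
exp_of_sum = math.exp(-sum(ages))  # exp(-3)
```

Here `sum_of_exps` is about 0.503 and `exp_of_sum` is about 0.050. Since $\exp(-3) = \exp(-1)\exp(-2)$, taking the exponential of the summed ages multiplies the weights together, so every additional movie would shrink the total instead of adding to it.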