Suppose I want to predict the number of people infected by Covid-19 one week and one month from now using the least squares method. The function to be approximated is F(t), the number of people infected on day t, with t >= 0.

Should I use all the data I have to make the predictions? By all the data I mean the number of people infected at the beginning of the pandemic, the number infected on the following day, and so on until today. Or would it be better to use only recent data? If so, how can I determine which data to exclude?

Thank you for your attention.

  • If you have a principled reason to use a subset of the data to make a prediction, then that reason dictates which subset is best suited for inclusion. In what you describe, the reason seems to be that rates of infection vary with time, so recent data is more closely related to the current rate of infection than data from long-past observations. You might be interested in the various methods called moving averages. – hardmath May 27 '22 at 16:11
  • It would be better if you look at all the data – MathGeek May 27 '22 at 16:12
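
One way to weigh the two suggestions above against each other is to hold back the most recent days, fit by least squares on windows of different lengths, and see which window predicts the held-back days best. The sketch below is only an illustration under stated assumptions: the data are synthetic (a made-up series whose growth rate changes on day 60), the model is a straight line fitted with `numpy.polyfit`, and the helper names `fit_and_predict` and `holdout_error` are hypothetical, not part of any library.

```python
import numpy as np

def fit_and_predict(t, y, window, horizon, degree=1):
    """Least-squares polynomial fit on the last `window` points,
    evaluated `horizon` days after the last observed day."""
    coeffs = np.polyfit(t[-window:], y[-window:], degree)
    return np.polyval(coeffs, t[-1] + horizon)

def holdout_error(t, y, window, horizon, degree=1):
    """Hide the last `horizon` days, fit on the `window` days before them,
    and return the absolute error of the prediction for the final day."""
    pred = fit_and_predict(t[:-horizon], y[:-horizon], window, horizon, degree)
    return abs(pred - y[-1])

# Synthetic, noise-free data (an assumption for illustration only):
# 2 new cases per day until day 60, then 10 per day.
t = np.arange(90, dtype=float)
y = np.where(t < 60, 100 + 2 * t, 220 + 10 * (t - 60))

horizon = 7                       # predict one week ahead
n_train = len(t) - horizon        # days available once the last week is hidden
candidates = [14, 30, n_train]    # last two weeks, last 30 days, all data

errors = {w: holdout_error(t, y, w, horizon) for w in candidates}
best_window = min(errors, key=errors.get)

for w in candidates:
    print(f"window={w:3d} days  holdout error={errors[w]:.2f}")
print("best window:", best_window)
```

On this made-up series the short window wins because it lies entirely after the change in growth rate. With real, noisy data a window that is too short fits the noise instead, so the holdout comparison (ideally repeated over several cut-off dates) is what decides which data to exclude, not a fixed rule.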
