I have a lot of football data for 2010 and I'm interested in how the average number of goals per game depends on a) the competition (e.g. Premiership, World Cup, FA Cup) and b) the weather (e.g. sun, rain, snow). Note that both are categorical variables. What is the best guide to testing statistical dependency for categorical variables (i.e. which tests should be carried out)?

Also, the question asks whether the average goals per game depends on the year. Given that I only have data for one year, I'm assuming this is unanswerable?

Daniel

1 Answer

Do you have data for each game on weather and a variable for competition? If so, here's what I would do. Define dummies for weather, e.g., $Sunny_i=1$ if and only if there is sun on the day of game $i$, otherwise $Sunny_i=0$. Depending on your weather data, you may be able to make this more detailed (e.g., factor in temperature, or make separate dummies for sun, rain and snow).
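As a rough illustration of the dummy coding, here is a minimal sketch in plain Python. The records, the field names and the helper `make_dummies` are made up for the example, not taken from your data. One category is left out as the baseline, since a full set of dummies plus an intercept would be perfectly collinear.

```python
# Hypothetical per-game records; field names and values are illustrative only.
games = [
    {"weather": "sun", "goals": 3},
    {"weather": "rain", "goals": 1},
    {"weather": "snow", "goals": 2},
    {"weather": "sun", "goals": 4},
]


def make_dummies(records, field, baseline):
    """Return one dict of 0/1 indicators per record, omitting the baseline category."""
    categories = sorted({r[field] for r in records} - {baseline})
    return [
        {f"{field}_{c}": int(r[field] == c) for c in categories}
        for r in records
    ]


# "rain" is the (arbitrarily chosen) reference category.
dummies = make_dummies(games, "weather", baseline="rain")
```

With rain as the baseline, each weather coefficient in a later regression is read as a difference relative to rainy games.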

What is your variable for competition? Perhaps you have a number between 1 and 10 for each of the leagues. Then this variable would take the same value for all games in the same league (of the same year). The easiest thing to do is to run OLS on this, i.e., you estimate the equation $$SumGoals_i=\beta_0+\beta_1 Sunny_i+\beta_2 Competition_i+\epsilon_i.$$ The coefficient $\beta_1$ then tells you how many more goals per game you get on average when the weather is sunny than when it is not, holding the competition fixed. Similarly, the coefficient $\beta_2$ tells you how many more goals you get per game on average if your competition index increases by 1 (e.g., from 7 to 8). That reading only makes sense if the index is ordered in a meaningful way; if the numbers are just labels, use one dummy per competition (leaving one out as the baseline) instead, exactly as for weather.
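A minimal sketch of this regression, assuming NumPy is available. The data are synthetic: the goal totals are built from known coefficients (2.0, 0.5, 0.25) with no noise, purely so that you can see the fit recover them. Real data would of course include an error term, and the variable names are my own.

```python
import numpy as np

# Synthetic data for illustration only (not real match results).
sunny = np.array([1, 0, 1, 0, 1, 0])        # Sunny_i dummy
competition = np.array([1, 1, 2, 2, 3, 3])  # Competition_i index

# Outcome constructed from known coefficients, without noise.
sum_goals = 2.0 + 0.5 * sunny + 0.25 * competition

# Design matrix: intercept column, Sunny_i, Competition_i.
X = np.column_stack([np.ones(len(sunny)), sunny, competition])

# Least-squares solution of X @ beta = sum_goals.
beta, *_ = np.linalg.lstsq(X, sum_goals, rcond=None)
beta0, beta1, beta2 = beta
```

Here `beta1` is the estimated sunny-versus-not difference and `beta2` the change per unit of the competition index. For standard errors and tests you would want a statistics package rather than a bare least-squares solve.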

This is a very crude estimation. Since you observe multiple games per team, you should probably use more sophisticated methods such as team fixed effects, which control for the fact that some teams (with weak strikers) score very few goals, while others score a lot.
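To give a feel for what team fixed effects do, here is a small sketch using the within transformation (subtracting each team's own averages), which yields the same slope as OLS with one dummy per team. I am assuming one row per team per game, with the goals that team scored. The numbers are invented and chosen so that the strong team happens to play in the sun more often.

```python
from collections import defaultdict

# Made-up team-game observations: (team, sunny, goals scored by that team).
rows = [
    ("A", 1, 4), ("A", 0, 3),
    ("B", 1, 1), ("B", 0, 0), ("B", 0, 0),
]


def mean(values):
    return sum(values) / len(values)


# Naive pooled comparison: sunny games versus all other games.
naive = (mean([g for _, s, g in rows if s == 1])
         - mean([g for _, s, g in rows if s == 0]))

# Group the observations by team.
by_team = defaultdict(list)
for team, s, g in rows:
    by_team[team].append((s, g))

# Within transformation: demean both variables by team, then
# compute the one-regressor OLS slope on the demeaned data.
num = den = 0.0
for obs in by_team.values():
    s_bar = mean([s for s, _ in obs])
    g_bar = mean([g for _, g in obs])
    for s, g in obs:
        num += (s - s_bar) * (g - g_bar)
        den += (s - s_bar) ** 2

within = num / den
```

In this toy data the pooled difference is 1.5 goals, while the within-team estimate is 1.0, because part of the pooled gap only reflects team A being the stronger side.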

Finally, you cannot test whether there are more goals in one year than in another if you have no data on the other year. If you had the data, you could estimate time trends very easily by including dummies for each year in the OLS estimation.

Nameless