
I have 5 years of data giving the salaries for each department in an organisation shaped roughly like this (with the number of employees and salary changing each year):

Department Employees Salary
A 118 25,834
B 375 22,356
C 235 26,519

The question I'm trying to answer is: what was the average salary over the five years for the whole organisation? At first I thought I could do a weighted average for each year using the employee column as the weights (as this is the total number of employees in each department) and then average the averages; however, I have read that it is usually invalid to average averages.

If I were to compute the weighted average for each year and then take another weighted average of those results, using each year's total number of employees across all departments as the weight, would that be valid, or is there no case in which this is valid?

tmnsnmt
  • In an extreme case... Think of this... two people make \$10 a year on average. Ten billion people make a billion dollars a year on average. Do you think it makes sense then to average the two and say the average of the averages is about half a billion? – JMoravitz Aug 10 '21 at 23:35
  • In the end... it depends on what question you are trying to answer, and most of the time it makes no sense to do an average of averages and they should absolutely be weighted first. – JMoravitz Aug 10 '21 at 23:36
  • The weighted average salary for each year gives the overall average salary for that year (if weighted correctly). If the total number of employees did not change over the five years, then the average of the averages, the weighted average of the averages, and the overall average are all the same. However, if the number of employees changed much from year to year, then this doesn't work. – Jair Taylor Aug 10 '21 at 23:49
  • Now... suppose we had a story... that John Doe worked at this company for a year in Department A, then got changed to department B for a year, then finally department C... where he made an average salary from each during his time. What was the average amount that John made in salary over these three years? For that you would just average the averages... but that is because each of the three averages contributed the same amount to the final question you were asking. – JMoravitz Aug 11 '21 at 00:22
  • So does this mean I should weight the average of the weighted averages using the total employees in each department for each year as the weights? I just want to be able to say what the average salary was between 2015 & 2020. – tmnsnmt Aug 11 '21 at 08:08
  • What do you mean by "minimum average salary"? – Eric Wofsey Aug 11 '21 at 20:15
  • I am taking the average of the lowest salaries across all departments and the highest salaries (I excluded the latter from the example as it wasn't necessary), so by minimum average salary I mean the average of the lowest salaries across all departments. – tmnsnmt Aug 11 '21 at 20:27

1 Answer


You need to decide what “the average salary over the five years for the whole organisation” means. Here's an example to show how it can be ambiguous. In this example, there are no departments, but after the example, I will mention what to do if you only have aggregated data by department, which is the case for you.

Suppose the organization had one employee for the first four years, and that employee earned \$15,000 every year. In the fifth year, there were ten employees: the one who had been there before, still earning \$15,000, and nine new employees, each earning \$100,000.

What is the answer you want? Is it the average of all one-year salaries ever paid at the company, which would be $$A_s={15000+15000+15000+15000+15000+9\cdot100000\over14}$$

or would it be the average of the yearly average salaries, which would be

$$A_y={15000+15000+15000+15000+{15000+9\cdot100000\over10}\over5}\mbox{ ?}$$
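To see how far apart the two candidates are, here is a minimal Python sketch that evaluates both formulas for the example above. The variable names (`years`, `all_salaries`, `yearly_means`) are only illustrative.

```python
# Salaries paid in each year of the example: one employee for four years,
# then ten employees in year five.
years = [[15_000]] * 4 + [[15_000] + [100_000] * 9]

# A_s: pool every one-year salary ever paid, then average once.
all_salaries = [s for year in years for s in year]
A_s = sum(all_salaries) / len(all_salaries)

# A_y: average within each year, then take the plain mean of the five results.
yearly_means = [sum(year) / len(year) for year in years]
A_y = sum(yearly_means) / len(yearly_means)

print(round(A_s, 2))  # 69642.86
print(A_y)            # 30300.0
```

The two numbers differ by more than a factor of two, so the choice of definition matters here.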

Now let's see how to get each of these if we have departments.

Let the \$15,000 person be the sole person in department A, and let the \$100,000 employees be in department B.

So your input data is four years of this:

Department Employees Salary
A 1 15,000

and one year of this:

Department Employees Salary
A 1 15,000
B 9 100,000

It's easy to see that $A_y$ is the average of the five weighted average salaries (so you weight for each yearly average, but then you do not weight a second time when you average over years).

On the other hand, $A_s$ is the weighted-weighted average, where you weight by employee count twice: first when you calculate each yearly average salary, and then again when you average the five results.

$${1\cdot15000+1\cdot15000+1\cdot15000+1\cdot15000+10\cdot{15000+9\cdot100000\over10}\over14}$$

In other words, there is a case where weighting by number of employees twice is appropriate.
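The same two numbers can be recovered from the department-level tables alone. The following is a rough Python sketch, assuming the "Salary" column holds each department's mean salary; the helper names `headcount` and `yearly_average` are made up for this illustration.

```python
# Each year is a list of (employees, mean_salary) rows, one per department.
# Assumption: the "Salary" column is the department's mean salary.
year_with_A = [(1, 15_000)]
year_with_A_and_B = [(1, 15_000), (9, 100_000)]
years = [year_with_A] * 4 + [year_with_A_and_B]

def headcount(rows):
    """Total employees in one year."""
    return sum(n for n, _ in rows)

def yearly_average(rows):
    """First weighting: department means weighted by department size."""
    return sum(n * salary for n, salary in rows) / headcount(rows)

averages = [yearly_average(rows) for rows in years]
sizes = [headcount(rows) for rows in years]

# A_y: no second weighting, every year counts equally.
A_y = sum(averages) / len(averages)

# A_s: second weighting, every year counts in proportion to its headcount.
A_s = sum(n * avg for n, avg in zip(sizes, averages)) / sum(sizes)

# Cross-check: A_s is total payroll divided by total employee-years.
payroll = sum(n * salary for rows in years for n, salary in rows)
assert abs(A_s - payroll / sum(sizes)) < 1e-9
```

The cross-check at the end is the reason the double weighting works: multiplying each yearly average by that year's headcount gives back the year's payroll, so the weights cancel and only total payroll over total employee-years remains.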

tl;dr: Always work out a simple but non-trivial example!

Steve Kass
  • I think I confused things by mentioning the minimum factor. Have edited the question to make clear exactly what problem I'm trying to solve. – tmnsnmt Aug 11 '21 at 21:45
  • I rewrote the answer to match your edited question. – Steve Kass Aug 11 '21 at 22:55
  • Thanks for setting out these examples. Based on what you set out, I would say by “the average salary over the five years for the whole organisation” I meant the average of the yearly salaries for the five-year period. I guess my follow-up question is how do I determine which is the most valid method to use? Would I be correct in saying that if the number of employees in each department and the respective salaries don't differ by too much from year to year, then it's not problematic to do it as in $A_y$? And if they do differ quite a lot, as in your example, then $A_s$ would be more accurate? – tmnsnmt Aug 12 '21 at 10:46
  • If you want the answer you say in the comment you want, find $A_s$. It’s not just “more accurate,” it’s just correct. But yes, if the number of employees stays the same (even if department sizes change), the employee-count-weighted average of year averages ($A_s$) will equal the unweighted average of yearly averages ($A_y$). Neither calculation is hard, so I’m not sure why you’d choose the one that depends on an extra assumption to be right; but regardless, you can look at some made-up data to see how far off the wrong calculation is when there are only modest changes in number of employees. – Steve Kass Aug 12 '21 at 18:30
  • Which one do you mean depends on an extra assumption to be right? – tmnsnmt Aug 13 '21 at 08:56
  • The extra assumption that the total number of employees stays the same each year. – Steve Kass Aug 13 '21 at 16:50
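
Following the suggestion in the comments to try some made-up data, here is a small Python sketch comparing $A_s$ and $A_y$ in two scenarios. Only the first year's rows come from the table in the question; every later year is invented purely for illustration, and `both_averages` is a hypothetical helper name. As before, it assumes the "Salary" column is the department's mean salary.

```python
def both_averages(years):
    """Return (A_s, A_y) for a list of years of (employees, mean_salary) rows."""
    sizes = [sum(n for n, _ in rows) for rows in years]
    payrolls = [sum(n * s for n, s in rows) for rows in years]
    yearly = [p / n for p, n in zip(payrolls, sizes)]
    A_s = sum(payrolls) / sum(sizes)   # headcount-weighted over years
    A_y = sum(yearly) / len(yearly)    # plain mean of yearly averages
    return A_s, A_y

first_year = [(118, 25_834), (375, 22_356), (235, 26_519)]  # from the question

# Made-up data: departments change size, total headcount stays at 728.
constant_total = [
    first_year,
    [(128, 26_000), (365, 22_500), (235, 26_700)],
    [(120, 26_300), (370, 22_900), (238, 27_000)],
]

# Made-up data: total headcount grows modestly (728, 755, 790).
growing_total = [
    first_year,
    [(125, 26_000), (390, 22_500), (240, 26_700)],
    [(130, 26_300), (410, 22_900), (250, 27_000)],
]

for label, data in [("constant", constant_total), ("growing", growing_total)]:
    A_s, A_y = both_averages(data)
    print(f"{label}: A_s={A_s:.2f}  A_y={A_y:.2f}  gap={A_s - A_y:.2f}")
```

With a constant total headcount the gap is zero up to floating-point rounding, even though the department sizes move around. With growing headcount the two values separate, and the size of the gap depends on how much both headcount and the yearly averages change.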