I have a data set describing a group of people. Suppose we know each person's hair color and income:
N | hair | income
---|--------|------
1 | brown | £2000
2 | black | £1400
3 | brown | £1800
4 | red | £1600
5 | brown | £2500
6 | black | £2800
7 | white | £3000
8 | white | £1800
9 | red | £1600
Is it possible to find out whether the independent variable, hair color, has an impact on the dependent variable, income? The problem I see is that hair colors cannot be "sorted": they are categories with no natural order. However, I would like to get a result similar to:
- Red -> highest income
- Brown and black -> middle, no significant difference between them
- White -> lowest income
What is the best method to get such a result? Is it safe to assign numbers to the hair colors, or do we need to create a separate dummy variable for each color?
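
To make the two options concrete, here is a minimal sketch in plain Python using the nine rows from the table. The variable names (`codes`, `dummies`, `is_brown`, and so on) are my own, and the one-way ANOVA F statistic at the end is only one candidate I am assuming might fit this kind of question, not a confirmed answer.

```python
# Data copied from the table above.
hair = ["brown", "black", "brown", "red", "brown",
        "black", "white", "white", "red"]
income = [2000, 1400, 1800, 1600, 2500, 2800, 3000, 1800, 1600]

levels = sorted(set(hair))  # ['black', 'brown', 'red', 'white']

# Option 1: number the colors. The order here is just alphabetical,
# so any model that treats these codes as numbers would assume
# black < brown < red < white, which has no real meaning.
codes = [levels.index(h) for h in hair]

# Option 2: one 0/1 dummy variable per color, leaving out one
# reference level (here 'black') so the dummies are not redundant.
dummies = [
    {f"is_{lvl}": int(h == lvl) for lvl in levels[1:]}
    for h in hair
]

# Group the incomes by hair color and compute the group means.
groups = {lvl: [y for h, y in zip(hair, income) if h == lvl]
          for lvl in levels}
group_means = {lvl: sum(ys) / len(ys) for lvl, ys in groups.items()}

# One-way ANOVA F statistic: variation between the group means
# divided by variation within the groups.
n = len(income)   # number of observations
k = len(levels)   # number of groups
grand_mean = sum(income) / n
ss_between = sum(len(ys) * (group_means[lvl] - grand_mean) ** 2
                 for lvl, ys in groups.items())
ss_within = sum((y - group_means[lvl]) ** 2
                for lvl, ys in groups.items() for y in ys)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))

print(group_means)
print(round(f_stat, 3))
```

Turning the F statistic into a p-value would additionally need the F distribution with (k - 1, n - k) = (3, 5) degrees of freedom, which this sketch leaves out. With only nine rows the numbers are purely illustrative.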