I am working on one-way and two-way ANOVA analyses as part of a Python data science project and find myself lost in a rabbit hole when thinking through the combinations of possible comparisons between distributions. For example, using the Northwind database, you could compare Quantity (the outcome variable) between different discount levels, i.e., is there a significant difference between discount levels of 5%, 10%, 15%, etc.? This is simple enough, but what if we want to look at those distributions by Region, and what if we also want to drill down further and look at Quantities for each of 10 products, by 5 Regions, for the 6 levels of discount? The data manipulation quickly becomes complex and I don't know if I'm doing it right.

I'd like to learn more but don't know where to start since I don't even know what to call it. Can anyone help me with a subject or discipline name?
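
To make the drill-down concrete, here is a minimal sketch of the bookkeeping, not a definitive analysis. It assumes each order line is a dict with the keys `region`, `product`, `discount` and `quantity` (hypothetical names, not the actual Northwind column names), and the helper functions `one_way_f` and `f_by_subgroup` are made up for illustration. It computes the one-way ANOVA F statistic for Quantity across discount levels separately within each (Region, Product) cell, using only the standard library. The sample rows are invented.

```python
from collections import defaultdict
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of samples (each a list of numbers)."""
    groups = [g for g in groups if g]          # ignore empty groups
    k = len(groups)                            # number of groups (discount levels)
    n = sum(len(g) for g in groups)            # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Raises ZeroDivisionError if there is no within-group variation at all.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def f_by_subgroup(rows, by=("region", "product")):
    """Split rows into cells defined by `by`, then compare discount levels in each cell."""
    cells = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[col] for col in by)
        cells[key][row["discount"]].append(row["quantity"])
    return {key: one_way_f(list(levels.values())) for key, levels in cells.items()}


# Invented example rows (hypothetical data, one region and one product only).
rows = [
    {"region": "A", "product": "X", "discount": 0.05, "quantity": 1},
    {"region": "A", "product": "X", "discount": 0.05, "quantity": 2},
    {"region": "A", "product": "X", "discount": 0.05, "quantity": 3},
    {"region": "A", "product": "X", "discount": 0.10, "quantity": 2},
    {"region": "A", "product": "X", "discount": 0.10, "quantity": 3},
    {"region": "A", "product": "X", "discount": 0.10, "quantity": 4},
]
print(f_by_subgroup(rows))
```

Changing `by` to `("region",)` gives the coarser Region-only breakdown. Note that running a separate one-way test in every cell is not the same model as a single multi-factor ANOVA, and with 10 products and 5 Regions it means 50 separate tests.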

Matan
  • You need to have an objective. Studying data in a vacuum will be hard and may not be as fruitful. An objective may be "See if there is a correlation between discounting items and the sales rate across regions". People in Marketing may set many of those "small goals" and study the results to improve the service. This may be related: https://knowledge.wharton.upenn.edu/article/data-analytics-challenges/ – NoChance May 21 '19 at 14:57
  • Thanks @NoChance, the objective would be to determine whether the distributions of Order Quantities are similar across Regions, for the different products at different Discount Rates.

    $H_0$: Distributions are equal

    $H_a$: Distributions are not equal

    – Matan May 21 '19 at 15:41
  • As a piece of advice, make your objective very clear. Also post the question on a statistics forum such as https://stats.stackexchange.com/ – NoChance May 21 '19 at 22:19

0 Answers