How do I get a realistic average when I have highly segregated groups of data?

Question

I have an instrument that collects travel_angle, work_angle and speed as the user moves it. The data is sampled at a constant time step (delta T). Assuming counting starts at 0ms and the increment is 100ms:

|==========+========+======+=======|
| interval | travel | work | speed |
|==========+========+======+=======|
| 0ms      |      3 |   14 |    15 |
| 100ms    |      8 |   15 |    16 |
| 200ms    |      1 |   15 |    14 |
| 300ms    |      6 |   19 |    15 |
| 400ms    |      2 |   22 |    15 |
| ...      |    ... |  ... |   ... |
|==========+========+======+=======|

The numbers in the travel, work and speed columns do not follow any pattern. You can assume that they are completely arbitrary.

The next part applies to the travel, work and speed columns alike, but to keep things simple I will use the work column as an example.

I have a pre-defined range in which the system prefers the values to be. For work, this range is 15 to 30. At the end of a session, I have to calculate the average_work_angle, which is currently the sum of all the entries in the work column divided by the number of entries in the work column.

The problem arises when we have groups of highly-segregated data which are at a similar or equal distance from the range (15 to 30), on opposite ends of the range.

Say I have an array of entries [10, 10, 10, 35, 35, 35]. These are all at an equal distance from the range (the first three entries are 5 below the lower bound and the last three are 5 above the upper bound), and they sit on opposite sides of the range.

Although the user has held the instrument incorrectly throughout the entire session, the administrator will not know this, because the average works out to 135 / 6 = 22.5, which falls exactly in the middle of the accepted range (the middle is the exact correct value; the range around the middle is the acceptable error).
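To make the problem concrete, here is a minimal Python sketch of the current calculation applied to the example array. The names `LOW`, `HIGH` and `plain_average` are only illustrative and do not come from the real system.

```python
from statistics import mean

# Preferred range for the work angle, taken from the example above.
LOW, HIGH = 15, 30

def plain_average(values):
    """Current method: sum of the entries divided by the number of entries."""
    return mean(values)

work = [10, 10, 10, 35, 35, 35]

avg = plain_average(work)
avg_looks_fine = LOW <= avg <= HIGH
samples_in_range = [v for v in work if LOW <= v <= HIGH]

print(avg)               # 22.5
print(avg_looks_fine)    # True
print(samples_in_range)  # []
```

The average lands on the ideal value even though not a single sample is inside the range, which is exactly what hides the bad session from the administrator.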

I'd ask for some help in finding a more realistic average. I really don't know what to look for, so just providing me with a link to a method that solves this problem or a similar problem will be enough.

I would of course use a time-weighted average here but the problem is that I am working with already collected data and the data collected until now did not record the interval inside of my database.

Please be kind, I am not a math major.

I don't think an average will work. Maybe you need the standard deviation, which shows how wide the spread is. — Daniel, Oct 04 '23 at 13:46
Why don't you just supplement the data with the counts of "in range" and "out of range"? — user619894, Oct 04 '23 at 13:48
Thanks for the info. I think I can use standard deviation in this case and provide the ideal value as reference value. — Dimitar Veljanovski, Oct 04 '23 at 14:02
Yes, so I worked it out thanks to @Daniel's hint to use standard deviation. The concept is that, given an array of [10, 10, 10, 35, 35, 35] and an ideal value of 22.5 (the center between 15 and 30), the deviation of this list from 22.5 is 12.5. Then I calculate 22.5 - 15 (15 is the lower threshold) and get 7.5, the largest deviation that still counts as in range. I can then compare these two values and see how far off the user is from the target. I will have to implement a filter for extremes, i.e. any value above 52.5 (30 + 22.5). The same applies to the lower threshold (15 - 22.5), but that falls below zero and I only have positive integers. — Dimitar Veljanovski, Oct 04 '23 at 14:27
Let me know if I am talking nonsense or if this is a good way to measure user accuracy. @user619894 & Daniel. Thanks for the hint. — Dimitar Veljanovski, Oct 04 '23 at 14:28
The question is what properties you need. If you are just interested in misses, counting misses as @user619894 suggested is the best way to go. The advantage of calculating the deviation is that you get an estimate of how big the misses are. The disadvantage is that data without any misses at all can have a higher deviation than data with misses. — mihaild, Oct 04 '23 at 16:23
@mihaild I do not want to count the misses, I want to see how close the user was to hitting the target during a single session. I am aware that standard deviation is not ideal but in this case I really don't have any other clue how I could calculate this information more precisely. If you have any ideas, feel free to share. — Dimitar Veljanovski, Oct 05 '23 at 09:17
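The suggestions from the comments can be sketched side by side in Python. This is only an illustration under stated assumptions: the function names are made up, the "deviation" is taken as the root-mean-square distance from the fixed target 22.5 (which equals the standard deviation only when the mean happens to be 22.5, as in the example), and `mean_distance_to_range` is an extra variation that nobody in the thread proposed.

```python
import math

LOW, HIGH = 15, 30          # preferred range for the work angle (from the question)
TARGET = (LOW + HIGH) / 2   # 22.5, the ideal value

def rms_deviation_from_target(values, target):
    """Root-mean-square distance of the samples from a fixed target value."""
    return math.sqrt(sum((v - target) ** 2 for v in values) / len(values))

def fraction_in_range(values, low, high):
    """Share of samples inside [low, high] (the in range / out of range count)."""
    return sum(low <= v <= high for v in values) / len(values)

def mean_distance_to_range(values, low, high):
    """Assumed variation: average distance to the nearest bound, 0 if in range."""
    return sum(max(low - v, v - high, 0) for v in values) / len(values)

session = [10, 10, 10, 35, 35, 35]

rms = rms_deviation_from_target(session, TARGET)
share = fraction_in_range(session, LOW, HIGH)
dist = mean_distance_to_range(session, LOW, HIGH)

print(rms, share, dist)  # 12.5 0.0 5.0
```

The RMS value of 12.5 is larger than 7.5 (the distance from the target to either bound), which flags the session as poor. The caveat about deviation shows up with other inputs: [15, 15, 30, 30] has no misses and an RMS deviation of 7.5, while [22.5, 22.5, 22.5, 31] has one miss and a lower RMS deviation of 4.25. Reporting the in-range share or the distance to the range next to the deviation separates those two cases.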
