
Let me explain what I mean by cell size suppression, in case it is known by another term, and then describe my problem.

I need to display potentially sensitive medical data in an interactive web application. It shows counts of the number of people with a given medical condition, age, gender, height, region and other demographics as the user applies filters to investigate the data.
To keep individuals from being identifiable, the imposed policy is that no displayed count may be below a set number, the cell size (e.g. 5), so that when filtering is applied any count of 5 or more can be displayed but none below.
In a pie chart of 100 people with various conditions, the slices could be:
condition a = 80, b = 10, c = 6, d = 4 (d fails the cell size limit, so it must be hidden)

The solution is to roll up the smallest segments until every displayed count is 5 or above, like so:
condition a = 80, b = 10, c & d rolled up = 10

Now you can't determine the exact number of people with condition c or d.
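The roll-up step can be sketched in Python as follows. This is only a minimal illustration of the idea for a single chart: the function name `roll_up`, the `min_cell` parameter and the choice to drop zero-count slices are my own assumptions, not part of any policy or library.

```python
def roll_up(counts, min_cell=5):
    """Merge the smallest slices until every displayed slice is >= min_cell.

    `counts` maps a slice label to its count. Hypothetical helper for
    illustration only.
    """
    # Assumption: zero-count slices are not drawn, so they are dropped.
    items = sorted(
        ((label, n) for label, n in counts.items() if n > 0),
        key=lambda kv: kv[1],
    )
    merged_labels, merged_total = [], 0
    # Keep absorbing the smallest slice while it is too small, or while
    # the merged slice itself is still below the limit.
    while items and (items[0][1] < min_cell or 0 < merged_total < min_cell):
        label, n = items.pop(0)
        merged_labels.append(label)
        merged_total += n
    # Everything was absorbed and the total is still too small:
    # nothing can be displayed safely.
    if merged_labels and merged_total < min_cell:
        return {}
    result = dict(items)
    if merged_labels:
        result[" & ".join(sorted(merged_labels))] = merged_total
    return result


print(roll_up({"a": 80, "b": 10, "c": 6, "d": 4}))
# {'b': 10, 'a': 80, 'c & d': 10}
```

With the pie chart above, d is too small on its own, so it is merged with the next smallest slice, c, giving the rolled-up value of 10.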

This is straightforward for a single chart, but suppose another pie chart displays their age (or another demographic):
age20 = 80, age21 = 10, age22 = 5, age23 = 5

Then cell size suppressing each chart individually may not be enough: by iterating through all values of another filter and seeing how the different slices are rolled up, you may be able to determine how many people are in the 'd' condition.
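To make the worry concrete, here is a small Python sketch of the kind of leak I have in mind. The records are invented for illustration, the helper `displayed` is hypothetical, and for brevity a too-small count is simply hidden (`None`) rather than rolled up, which is a simplification of what the application would really do.

```python
MIN_CELL = 5  # the policy limit from the text

# Hypothetical (condition, age) records, made up for this example.
records = [("c", 20)] * 6 + [("c", 21)] * 5 + [("c", 22)] * 1


def displayed(records, condition, age=None, min_cell=MIN_CELL):
    """Count shown to the user, or None when the count is suppressed."""
    n = sum(
        1
        for cond, a in records
        if cond == condition and (age is None or a == age)
    )
    return n if n >= min_cell else None


# Unfiltered chart, then the same slice under each value of the age filter.
total = displayed(records, "c")
per_age = {age: displayed(records, "c", age) for age in (20, 21, 22)}

hidden = [age for age, n in per_age.items() if n is None]
shown_sum = sum(n for n in per_age.values() if n is not None)

# With exactly one suppressed age, subtraction recovers its count.
inferred = None
if total is not None and len(hidden) == 1:
    inferred = total - shown_sum
    print(f"age {hidden[0]}: inferred count {inferred}")
```

Every count that is displayed here is 5 or more, yet the count that was suppressed can be recovered by subtraction and is below the limit.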
Presumably, increasing the cell suppression size to, say, 6 (or some other value) in this two-chart case would protect the underlying policy limit of 5.
With 3 charts it may be that I need to apply a suppression size of, say, 10. I think this is analogous to solving multiple simultaneous equations, except that the filters can be iterated through a large range of values.

The solution I'm looking for is a generalised formula that tells me, for n pie charts, what suppression size I need to apply to all of them to ensure that the underlying suppression value of 5 is not breached for any one of them. One possibly complicating factor is that each data attribute/demographic has a different number of distinct values, e.g. there might be 1000 medical conditions, 100 ages, 50 ethnic categories, a few gender categories, etc. But maybe that doesn't matter.

I'm sure there's a basic mathematical problem at the heart of this, but I'm afraid I'm unable to pose it as one. Thanks in advance for any help.

