0

I have $n$ unknown finite sets $A_1,A_2,...,A_n$. I only know:

  • $|A_n| = X_n$
  • $ |\cup^n_{x=1}A_x| = X$
  • $ |A_x \cup A_y| \leq |A_x| + |A_y|\ with\ x,y \in \{1,...,n\}$

Can I estimate the size of $ \cup^6_{x=0}A_x $ better than using the inequality above.

My current approach would be running a simulation but I would expect a mathematical approach that is simpler and probably a less ad-hoc.

Asaf Karagila
  • 393,674
Sim
  • 208
  • You mention something about taking the largest sets, so I think you didn't gave all information about your question. – Esgeriath Sep 02 '22 at 08:51
  • @Esgeriath what information is missing? Given that I have the individual sizes of the sets, I do know the largest ones? However, I deleted the second part of the question as it is confusing and can be answered by the first part of the question. – Sim Sep 02 '22 at 08:52
  • Ahh, sorry you're right – Esgeriath Sep 02 '22 at 08:53

2 Answers2

1

Since you mentioned using computer simulation, I assume your sets are finite.

The last condition you have does not give any information, since for any sets we have $$ | A \cup B | + | A \cap B | = | A | + | B | $$ so your inequality is trivial and you don't have any information about sets relation at all. The best you can do is to bound $$ \max\left\{\max_{i \leq 6}|A_i|, X - \sum_{i=7}^n|A_i|\right\}\leq|\bigcup_{i = 1}^6 A_i|\leq \min\left\{\sum_{i =1}^6|A_i|, X\right\}. $$

If $n > 6$ you can easily find examples which show that those bounds are the best you can get.

Esgeriath
  • 2,181
  • Yes, my sets are finite. The last condition does add the information that there may be intersections between sets. The answer to my question would be trivial if $|A_x| + |A_y| = |A_x \cup A_y|$. – Sim Sep 02 '22 at 09:06
  • 2
    That does not add information, since you always have $|A_x| + |A_y| \geq |A_x \cup A_y|$. If you had $|A_x| + |A_y| = |A_x \cup A_y|$ or $|A_x| + |A_y| > |A_x \cup A_y|$ that would be some information – Esgeriath Sep 02 '22 at 09:11
1

Are you looking for a probabalistic answer?

Without any additional knowledge about the sets, we assume the elements are random. If all the sets were disjoint then we would have $|\cup_{x=1}^n A_x|=\sum_{x=1}^n |A_x|:=Y$, that is $X=Y$. But if we have non-empty intersections, then $X<Y$. Hence, we could say, that the probably a random element from a random set will repeat in another set is $\frac{Y-X}{Y}$.

This would then appear to form a binomial calculation based on how large $n$ is and how large each set is. For example, given a set $A$, how much of $A$ will repeat in another set $B$. Well for each element in $A$ the odds it will repeat in any of the sets is $\frac{Y-X}{Y}$, so the odds it will repeat specifically in $B$ is $\frac{Y-X}{Y}\frac{|B|}{Y-|A|}$.

Continuing in this fashion we could find what the expected size of the union of two random sets is. Similarly we could go through the calculations for 3 random sets unioning, and 4, etc.

Bobby Ocean
  • 3,205
  • Yes! This is what I am looking for. How high is the confidence associated with that estimate? I.e., assuming the probability of an element of A being in B is 30% and A has the size of 100. How likely are 30 in B? I would assume that reducing the number to 20 would increase the confidence of the estimate? – Sim Sep 02 '22 at 09:57
  • I haven't worked through all the calculations. But in general, given a collection of N sets, there is some expected joint-wise overlap, such that when combined altogether we end up with on average an overall overlap that gives us a size of X. That is, we should expect that any two sets have a little bit of overlap, even when all combined together we get a lot of overlap. I don't the exact calculation between the two. – Bobby Ocean Sep 02 '22 at 10:57
  • Sounds about right. What field of math, or what topic of math is this? I would like to dive deeper into that topic matter but I am lacking the required vocabulary to find a starting point. – Sim Sep 02 '22 at 11:01
  • Combinatorics is the field of mathematics about counting. Like counting all possible ways that a given property can be satisfied. – Bobby Ocean Sep 02 '22 at 11:07
  • Statistics is the field of mathematics about the effects of transformations on a random variable. Like what happens on average to a sample of random numbers as my sample size in grows? Or how much worse (in probability) is the sample median capable of estimating the population mean than the sample mean? – Bobby Ocean Sep 02 '22 at 11:07
  • Measure theory is the field of mathematics that establishes the validity and usage of the words "probability" and "random variable". – Bobby Ocean Sep 02 '22 at 11:08