1

My goal is to find a systematic way of generating a set $B = \{b_1,b_2,...b_N\}$ (size $N$) of binary numbers of length $L$ with a fixed amount of "ones" $O$ so that the difference from each to each number is maximized.

Example $1$:

Let $N=2$, $L=4$ and $O=2$, so we are searching for $2$ binary numbers of Length $4$ that both have $2$ ones in them. (With maximal difference)

Possible solutions would be $B = \{1010, 0101\}$ or $B = \{1100,0011\}$

Example $2$:

Let $N=6$, $L=4$ and $O=2$, so we are searching for $6$ binary numbers of Length $4$ that all have $2$ ones in them. (With maximal difference)

The only solution in this case would be $$B=\{1100, 1010, 1001, 0110, 0101, 0011\}$$

My path of finding a general solution for higher numbers of $L$,$N$, and $O$ led me to code theory and hamming cubes.

My problem thereafter gets boiled down to finding $N$ vertices on an $L$ dimensional hamming cube so that all chosen vertices are constrained to having $O$ ones and yield to a maximized hamming distance.

In this post (which states a quite similar problem) the top answer states that there is no other method than brute-forcing.

With my actual parameters $N=90$, $L=16$, $O=4$ there are $1.83121849778 \cdot 10^{154}$ different permutations to check and here brute force is not applicable.

Do you know of any other approaches to find one (not all) of the possible solutions?

Jyrki Lahtonen
  • 133,153

2 Answers2

3

In this area the standard terminology is $n$ (your $L$) for codeword length and $w$ (your $O$) for Hamming weight.

This is a well studied problem, very difficult in general, but constructions for particular cases may exist. There is also a vast literature. The specific topic constant weight codes. What you call the difference is called the minimum distance $d$ of a code.

Let $d=2k-1$ if $d$ is odd and $d=2k$ if $d$ is even.

The Johnson bound gives an upper bound to the maximum number $A(n,d,w)$ of codewords for a constant weight $w$ code of length $n:$ $$ A(n,d,w)\leq \left \lfloor \frac{n}{w} \left\lfloor \frac{n-1}{w-1} \left \lfloor \cdots \left \lfloor \frac{n-w+k}{k}\right \rfloor \cdots \right \rfloor \right\rfloor \right \rfloor. $$

Clearly $A(n,2w,w)=\lfloor \frac{n}{w} \rfloor$ since if you could keep the support of the ones all distinct for each codeword you'd get distance $d=2w.$

Applying Johnson bound, with $n=16,w=4$ (please check my computations--thanks @JyrkiLahtonen) gives $$ A(16,6,4)=20. $$ See the nice construction by Jyrki in the comments achieving this. So no code achieving your requirement of 80 codewords exists.

There are other tabulated bounds online (for specific cases better than Johnson bound) but require a lot of trawling through (which you may want to do). I have linked to one of those sites by Erik Agrell here.

kodlu
  • 9,338
  • +1 for explaining the Johnson bound to the OP (and for the link to a table). – Jyrki Lahtonen Jun 28 '21 at 03:08
  • Also adding the remark that the minimum distance of a binary constant weight code is always even. – Jyrki Lahtonen Jun 28 '21 at 03:12
  • The Johnson bound $A(16,6,4)=20$ is achieved by the twenty lines in the affine plane $\Bbb{F}_4^2$. The lines have five possible slopes, the four elements of the field and the infinite slope. Each line comes with four parallel copies of itself for a total of $5\cdot4=20$ lines. No two lines can intersect at more than one point, so at most a single shared $1$, and hence minimum distance $d=6$. – Jyrki Lahtonen Jun 28 '21 at 04:34
  • 1
    Thanks for explaining the correct terminology and the Johnson Bound. Telling me that no codeword with my requirements exists sparked a new thought process about the underlying problem I try to solve. Great answer! – Freakwave Jun 28 '21 at 07:12
  • I am glad it led to you looking at the problem with fresh eyes. You can accept the answer if you like. – kodlu Jun 28 '21 at 07:14
1

As explained by @kodlu you are looking to maximixe the minimum distance of a length $n=16$ code of a constant weight $w=4$. The Johnson bound is tight in this case. The fact that $n$ happens to be a power of two plays a big role.

It is impossible that your bit sequences would have a maximum of one shared $1$ (corresponding to a minimum Hamming distance of $d=6$). The Johnson bound gives $$ A(16,6,4)=\lfloor\frac{16}4\lfloor\frac{15}3\rfloor\rfloor=20 $$ as the maximum size, but you wanted $90$.

So you need to allow overlaps of size two (corresponding to $d=4$), when Johnson gives $$ A(16,4,4)=\lfloor\frac{16}4\lfloor\frac{15}3\lfloor\frac{14}2\rfloor\rfloor\rfloor=140 $$ as the maximum size.

In general there is no guarantee that the Johnson bound could be achieved, but here it is easy to describe a set of size $140$.

Think of the $16$ bit positions as indexed by four bits, or points in the vector space $V=\{0,1\}^4$ over the field $\Bbb{Z}_2$. We can choose two non-zero vectors, call them $x$ and $y$, from $V$ in $\binom {15}2=210$ ways. We can then form the 2-dimensional subspace they generate: $\langle (x,y)\rangle=\{0,x,y,x+y\}$. Observe that the six pairs $(x,y)$, $(y,x)$, $(x,y+x)$, $(y+x,x)$, $(y,y+x)$, $(y+x,y)$ all generate the same subspace. So the total number of 2-dimensional subspaces is $210/6=35$.

Each such subspace has four distinct parallel translations of itself (called cosets, if you know a bit of algebra), giving us a total of $140$ cosets of 2-dimensional subspaces, also known as 2-dimensional affine subspaces of $V$.

The point is that two distinct 2-dimensional subspaces can intersect in at most a 1-dimensional affine subspace. Or, very conveniently for us, two affine 2-dimensional subspaces can share at most two points. This means that we have constructed a constant weight code of length $n=16$, $w=4$ and minimum distance $d=4$ containing a maximum possible of $N=140$ words.


A few examples of codewords $b_i$.

Let $x=1000$ and $y=0100$. Then $\langle(x,y)\rangle=\{0000,1000,0100,1100\}$. So this 2-dimensional subspace consists of those 4-tuples that end with two zeros. This means that its cosets are easy to describe: $$ \begin{aligned} 0001+\langle(x,y)\rangle)&=\{0001,1001,0101,1101\},\\ 0010+\langle(x,y)\rangle)&=\{0010,1010,0110,1110\},\\ 0011+\langle(x,y)\rangle)&=\{0011,1011,0111,1111\}. \end{aligned} $$ Observe that, between them. these four cosets cover all the sixteen 4-tuples of bits. This happens always with cosets of a subspace (consult algebra books again).

Equating sequences of four bits with the integers $0\ldots15$ in the obvious way associate these affine spaces with binary numbers as follows $$ \begin{aligned} \{0000,1000,0100,1100\}&\to\{0,8,4,12\}&\to 0001\,0001\,0001\,0001,\\ \{0001,1001,0101,1101\}&\to\{1,9,5,13\}&\to 0010\,0010\,0010\,0010,\\ \{0010,1010,0110,1110\}&\to\{2,10,6,14\}&\to 0100\,0100\,0100\,0100,\\ \{0011,1011,0111,1111\}&\to\{3,11,7,15\}&\to 1000\,1000\,1000\,1000.\\ \end{aligned} $$ If, instead, we use the subspace $\langle(1000,0001)\rangle=\{0000,1000,0001,1001\}$, its cosets yield the 16-tuples $$ \begin{aligned} \{0000,0001,1000,1001\}&\to\{0,1,8,9\}&\to 0000\,0011\,0000\,0011,\\ \{0010,0011,1010,1011\}&\to\{2,3,10,11\}&\to 0000\,1100\,0000\,1100,\\ \{0100,0101,1100,1101\}&\to\{4,5,12,13\}&\to 0011\,0000\,0011\,0000,\\ \{0110,0111,1110,1111\}&\to\{6,7,14,15\}&\to 1100\,0000\,1100\,0000.\\ \end{aligned} $$ You see how each 16-tuple in the second group share two ones with two 16-tuples from the first group. It is also possible that a 16-tuple from one group shares a single one with each 16-tuple from yet another group. But there will never be more than two shared ones.


If you want to read/search more, the construction I gave can be described as consisting of weight four words of the binary Reed-Muller code $RM(2,4)$.

Jyrki Lahtonen
  • 133,153
  • Thanks for clarifying the Johnson bound with overlaps. Your examples for construction really helped as well. – Freakwave Jun 28 '21 at 07:20
  • @Freakwave Digging more stuff about RM(2,4) may give you a copy/pastable algorithm for generating the codewords. You get a set of size 140 by tossing aside all the 16-tuples with weights $\neq4$. Depending, of course, on how comfortable you are with the algebra of my construction. One systematic way of using the idea of my answer is to use an ordering of combos of 4 bits, and only treat those where $x<y<x+y$. Here the plus in $x+y$ is the bitwise XOR, but you probably knew about that already. – Jyrki Lahtonen Jun 28 '21 at 07:59