How to compress a multi-dimensional unit vector?

Question

I would like to compress a multi dimensional unit vector to an integer [0, N), but with a guarantee that the restored vector and the original one, have angle at most a degrees. The goal is to make the number of required integers, N as small as possible.

For example, for 2D unit vector, and desired max angle of 60 degrees we can simply compress the vector to the nearest point on a circle-inscribed hexagon and use integers [0, 6) to encode the vertex.

Is there an easy way to achieve that in D dimensions? Perhaps https://en.wikipedia.org/wiki/Regular_polytope could be used? Approximate solutions are fine. Thanks a lot!

So in generalization of your example, you could be looking for some generator in $G\in SO(n)$ that is not cyclic or cyclic to a very large power so that $G^ne_1$, $n=0,...,N-1$, fills the unit sphere densely enough? This might not give a unique admissible approximation, and finding $n$ could also be non-trivial. — Lutz Lehmann, Feb 12 '22 at 08:21
A first important step in your issue is to consider the different ways to generate an almost-even distribution of points on an hypersphere. You can be interested by this reference: http://extremelearning.com.au/how-to-evenly-distribute-points-on-a-sphere-more-effectively-than-the-canonical-fibonacci-lattice/ for the 3D case — Jean Marie, Feb 12 '22 at 09:07
If still interested, see edited answer for code illustrating one method you might want to consider. — eigengrau, Feb 12 '22 at 22:08
What you are looking for is a net on the $D$-dimensional unit sphere. That is a set of point on the sphere so that any point on the sphere is not further away than angle $a$ from a point of the net, but that uses as few points as possible. Apparently, random lattices can be used to find such nets. Spherical codes might be another relevant keyword. — M. Winter, Feb 12 '22 at 23:52

eigengrau · Answer 1 · 2022-02-12T22:04:50.663

Not sure to what extent (if any) this answers/addresses your question. If you've already "compressed" the $2D$ case to $[1,n)$ like you illustrated, then the $3D$ case boils down to finding an $\mathbb N\times\mathbb N\to\mathbb N$ mapping (and its inverse), giving you one enumeration for two circles in, say, the $x,y-$plane and then in the orthogonal plane containing that $x,y-$unit vector. That mapping is canonically given by the pairing function, https://en.wikipedia.org/wiki/Pairing_function And then you can iterate that for $\mathbb N^n\to\mathbb N$ for higher-dimensional cases.

Or maybe even more straightforward, just use discrete $x_i,i=1\ldots D$ integer coordinates for the $D-$dimensional case (with some $\Delta x_i$'s of your choice for the discrete intervals). Then $n\in\mathbb N^D$ ($\sim\mathbb N$ by the iterated pairing function) specifies a point in that space. And just normalize it for a unit vector.

>>E d i t<<
-----------------------

If still interested, below's some code in C (I'll leave you to translate to javascript/python as needed) that accomplishes your "compression" pretty well. It's discussed a little further below the listing...

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
/* ==========================================================================

Function:    packcoords ( coords, ncoords )
Purpose:     pack coords[i],i=0...ncoords-1 into a single long word


Arguments:   coords (I)      int* to integer coords to be packed
         ncoords (I)     int containing #coords




Returns:     ( uint_64t )    usigned 64-bit int containing
                         single-word representation of coords




Notes:     o
======================================================================= */

/* --- entry point --- /
uint64_t packcoords ( int coords, int ncoords ) {
  /* ---

allocations and declarations
------------------------------- */

/* --- program parameters --- /
  static int maxrange = 1024,           / hival-loval<=maxrange /
             maxzero  = 1024;           / |loval|<=maxzero +1 bit for sign /
  / --- internal variables --- /
  uint64_t packed = 0ULL,               / packed coords[] back to caller /
           base   = 1ULL,               / current "base" /
           maxbase= (UINT64_MAX);       / divided by range, below /
  int   icoord= 0,                      / coords[] index /
        loval = (+9999999), hival = (-9999999), /low,high coords[] values/
        range = hival-loval,            / range of values to be packed /
        zero  = -loval;                 / added to each packed coords[] /
  / --- check caller's input --- /
  if ( coords==NULL || ncoords<1 ) goto end_of_job; /input error returns 0/
  / ---

find low,high values and set scaling accordingly
--------------------------------------------------- */

for ( icoord=0; icoord<ncoords; icoord++ ) {
    if ( coords[icoord] < loval ) loval=coords[icoord];   /* lowest coords[]/
    if ( coords[icoord] > hival ) hival=coords[icoord]; } /highest coords[]/
  range = hival-loval+2;                / range of values to be packed /
  zero  = -loval;                       / added to each packed coords[] /
  maxbase /= ((uint64_t)range);         / leaving "room" for last coord /
  if ( range>maxrange || abs(zero)>maxzero ) goto end_of_job; / sorry /
  / ---

pack caller's coords[]
------------------------- */

/* --- first pack zero and range so packed coords[] can be unpacked --- /
  if ( zero < 0 ) packed += base(1ULL); /* set "sign bit" for negative /
  base = 2ULL;                         /* "shift" base past sign bit /
  packed += base((uint64_t)abs(zero)); /* pack zero after sign bit /
  base = ((uint64_t)maxzero);          /* "shift" base past zero /
  packed += base((uint64_t)range);     /* pack range after zero /
  base = ((uint64_t)maxrange);         /* "shift" base past range /
  / --- now pack the coords[] --- /
  for ( icoord=0; icoord<ncoords; icoord++ ) {
    int thiscoord = coords[icoord] + zero + 1; / translate coord /
    packed += base((uint64_t)thiscoord); /* pack translated coord /
    if ( base > maxbase ) break;        / can't pack any more coords[] /
    base = ((uint64_t)range); }        /* "shift" base past coords[icoord]/
  / --- back to caller, returning packed coords[] --- /
  end_of_job:
    return ( packed );
  } / --- end-of-function packcoords() --- */
/* ==========================================================================

Function:    unpackcoords ( packed, coords )
Purpose:     unpack packed, returning coords[]


Arguments:   packed (I)      uint64_t containing output from packcoords()
         coords (O)      int * returning individual coords




Returns:     ( int )         #coords returned in coords[]


Notes:     o
======================================================================= */

/* --- entry point --- /
int unpackcoords ( uint64_t packed, int coords ) {
  /* ---

allocations and declarations
------------------------------- */

/* --- program parameters --- /
  static int maxrange = 1024,           / hival-loval<=maxrange /
             maxzero  = 1024;           / |loval|<=maxzero +1 bit for sign /
  / --- internal variables --- /
  int   ncoords = 0;                    / #coords returned to caller /
  int   range = 0,                      / range of packed values /
        zero  = 0,                      / subtracted from each coord /
        iszeronegative = 0;             / set true if zero<0 /
  / --- check caller's input --- /
  if ( coords == NULL ) goto end_of_job; / input error returns 0 /
  / ---

unpack caller's packed coords
-------------------------------- */

/* --- first unpack zero and range --- /
  iszeronegative = (packed)%(2ULL);     / first/low-order bit signals <0 /
  packed /= 2ULL;                       / "shift" out low-order bit /
  zero = (packed)%((uint64_t)maxzero);  / unpack zero value /
  if ( iszeronegative ) zero = (-zero); / set sign accordingly /
  packed /= ((uint64_t)maxzero);        / "shift" out maxzero /
  range = (packed)%((uint64_t)maxrange); / unpack maxrange value /
  packed /= ((uint64_t)maxrange);       / "shift" out maxrange /
  / --- now unpack the coords[] from packed --- /
  while ( packed > 0ULL ) {
    int thiscoord = (int)((packed)%((uint64_t)range)); /unpack next coord/
    coords[ncoords++] = thiscoord - zero - 1; / re-translate appropriately /
    packed /= ((uint64_t)range); }      / "shift" out this coord /
  / --- back to caller with unpacked coords[], returning #coords --- /
  end_of_job:
    return ( ncoords );
  } / --- end-of-function unpackcoords() --- */
#if defined(TESTDRIVE)
int main ( int argc, char argv[] ) {
  / ---

allocations and declarations
------------------------------- */

uint64_t packcoords(); int unpackcoords(); /* functions to be tested /
  uint64_t packed = 0ULL;       / packed coords[] /
  int   coords[99], ncoords=0,  / run as  ./packcoords  1 2 3 etc /
        iarg   = 0;             / indexes through your 1 2 3 argv[]'s /
  / --- accumulate input --- /
  if ( argc < 2 ) goto end_of_job;      / no command-line coord args /
  for ( iarg=1; iarg<argc; iarg++ )     / run through all argv[]'s /
    coords[ncoords++] = atoi(argv[iarg]); / and interpret as int coords /
  / --- pack user's coords[] and then unpack them --- /
  packed  =   packcoords(coords,ncoords); / pack the coords /
  ncoords = unpackcoords(packed,coords);  / and then unpack them /
  / --- display results --- /
  printf("Packed %d coords into &quot;%llu&quot;\nand unpacked...\n",ncoords,packed);
  for ( iarg=0; iarg<ncoords; iarg++ )
    printf(" %4d%s",coords[iarg],((iarg+1)%10==0||iarg+1==ncoords?"\n":" "));
  end_of_job:                   / testing completed... /
    exit ( 0 );                 / ...have a nice day /
  } / --- end-of-function main --- */
#endif

It basically uses the "JeanMarie" method rather than my original pairing function suggestion. This turns out to be much more compact if your range of coordinate values is small, because it first calculates range=hivalue-lovalue and uses that range as a base. And it also subtracts the smallest of your values from each value, so that your coordinates are translated from low-to-high to 0-to-range (with a slight little alteration). So all that can be somewhat more efficient than just choosing a constant number of bits for each coordinate value.

For example, save it as coordpack.c and compile as
cc -DTESTDRIVE coordpack.c -o coordpack
and then if you run it as, for example,
./coordpack 5 3 1 -2 4 4 2 6 -1 1 -1 2 3
the output is

bash-5.1$ ./coordpack 5 3 1 -2 4 4 2 6 -1 1 -1 2 3
Packed 13 coords into "13682439925725679620"
and unpacked...
    5     3     1    -2     4     4     2     6    -1     1
   -1     2     3

So it packed 13 numbers with a range of $6-(-2)=8$ into one 64-bit (unsigned long long) integer whose decimal value is $13682439925725679620$, as printed. That's about the max it can do for that range. It does a few checks to make sure your input isn't out-of-bounds, but it'll still get fooled and screw up -- needs a few more checks.

Sorry but your answer is too allusive. I don't see in particular how the pairing function brings something here... — Jean Marie, Feb 12 '22 at 10:14
Thanks, do you mean something like the euler angles but in higher dimensions? Maybe indeed quantizing each dimension would be the simplest - eg choosing for each dimension -1, 0, or 1 would give approx 3^D points. If we need less points then we can select only say K largest dimensions to quantize giving (K over N)*3^K. If we need more points then we can quantize to -1, 0.5, 0, 0.5, 1 or otherwise... I guess these points will be more or less "uniformly distributed"? — user1354439, Feb 12 '22 at 10:56
@JeanMarie It reduces the number of required integers to 1. Of course, that one integer typically assumes a pretty large value. But depending on your computer's internal representation, e.g., 64 bits, just one such integer may be enough. Of course, you could equally easily just use, say, two decimal digits of one large integer for "compression" if you only need values $0\ldots99$, e.g., represent the four integers $5,7,11,17$ as the one large integer $17110705$, or something like that. Lots of ways to skin this cat. — eigengrau, Feb 12 '22 at 15:17
@user1354439 Sure, uniformly distributed vis-a-vis a cartesian grid. And if you're only using a few discrete coordinate values in each dimension, as illustrated in your preceding comment, then I'd just use the way simpler "compression" method suggested in the last sentence of my above comment to JeanMarie. It's essentially just representing the $i$th coordinate as the $i$th digit of a base $100$ number. (And it's even easier to code than the pairing function.) — eigengrau, Feb 12 '22 at 15:25

How to compress a multi-dimensional unit vector?

1 Answers1