
I am trying to understand how different floating-point types are stored in memory.

First I looked at System.Double (accessible through the keyword Double in vb.net), which I think I understand. It is stored as follows:

$\pm (I+a)\times 2^b$

where

  • $\pm$ requires one bit.

  • $a \in [0,\sum_{k=1}^{52}2^{-k}]$ and consumes $52$ bits.

  • $b \in [-2^{10}+2, 2^{10}-1]$ and consumes the remaining $11$ bits.

So in total, this representation requires $64$ bits.

Note that $11$ bits can represent $2^{11}=2048$ distinct values, while $b$ takes only $|[-2^{10}+2, 2^{10}-1]|=2046$ values. The two remaining bit patterns, $All[0]:="00000000000"$ and $All[1]:="11111111111"$, are reserved for the special cases $\pm 0$, subnormal numbers, $\pm \infty$ and the different types of not a number, $NaN$.

  • $I$ is $1$, except in the special case $b=All[0]$ where it is $0$, so it does not consume an additional bit.

  • For example, if $b=All[0]$ and all bits of $a$ are $0$, the number represents $\pm0$.

  • Similarly, if $b=All[1]$ and all bits of $a$ are $0$, the number represents $\pm\infty$.

  • If $b=All[1]$ and at least one bit of $a$ is $1$, the number represents one of the various types of $NaN$.

  • If $b=All[0]$ and at least one bit of $a$ is $1$, then $b$ is replaced by its minimum value $-2^{10}+2$. Since $I=0$ in this case, this allows numbers closer to $0$ to be represented (subnormal numbers).
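To check this reading of Double, here is a minimal sketch in Python (not vb.net, purely for illustration). It assumes that a Python `float` is an IEEE 754 64-bit double, which is the case on common platforms. The function names `decode_double`, `classify` and `rebuild` are made up for this example.

```python
import struct
from fractions import Fraction


def decode_double(x):
    """Split a float into (sign bit, 11-bit exponent field, 52-bit fraction field).

    Assumes the float is stored as an IEEE 754 64-bit double.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exp_field = (bits >> 52) & 0x7FF       # the 11 bits that encode b
    frac_field = bits & ((1 << 52) - 1)    # the 52 bits that encode a
    return sign, exp_field, frac_field


def classify(x):
    """Name the case that the bit pattern of x falls into."""
    _, exp_field, frac_field = decode_double(x)
    if exp_field == 0x7FF:                 # All[1]
        return "infinity" if frac_field == 0 else "NaN"
    if exp_field == 0:                     # All[0]
        return "zero" if frac_field == 0 else "subnormal"
    return "normal"


def rebuild(x):
    """Recompute a finite x exactly as +/- (I + a) * 2**b from its fields."""
    sign, exp_field, frac_field = decode_double(x)
    if exp_field == 0x7FF:
        raise ValueError("infinity and NaN have no finite value")
    a = Fraction(frac_field, 2**52)
    if exp_field == 0:
        I, b = 0, -(2**10) + 2             # zero or subnormal: I = 0, minimum b
    else:
        I, b = 1, exp_field - 1023         # normal: stored field is b + 1023
    value = (I + a) * Fraction(2) ** b
    return -value if sign else value
```

For instance, `decode_double(1.0)` gives the fields `(0, 1023, 0)`, and `rebuild(0.1) == Fraction(0.1)` holds because both sides are exact rationals.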

This is how I understand the entire Double (64-bit floating point). Now, when I try to understand System.Decimal in the same spirit, I am unable to work out all the details.

I understand that decimal has the following structure:

$\pm a \times {10}^{-b}$

where $a$ is a $96$-bit integer, so $a \in [0,2^{96}-1]$, while $b \in [0,28]$ is a scaling factor that decides where to put the decimal point. Thus it can store up to $28$ decimal places.
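The value described by this structure can be sketched in Python as follows. This only models the formula $\pm a \times 10^{-b}$ with the stated ranges; it says nothing about how .NET actually lays out the bits. The names `MAX_A`, `MAX_B` and `decimal_value` are hypothetical, chosen for this example.

```python
from fractions import Fraction

MAX_A = 2**96 - 1   # largest 96-bit integer a (hypothetical constant name)
MAX_B = 28          # largest scaling factor b (hypothetical constant name)


def decimal_value(negative, a, b):
    """Return the exact value +/- a * 10**(-b) for the structure above."""
    if not 0 <= a <= MAX_A:
        raise ValueError("a must fit in 96 bits")
    if not 0 <= b <= MAX_B:
        raise ValueError("b must lie in [0, 28]")
    value = Fraction(a, 10**b)
    return -value if negative else value
```

For example, `decimal_value(False, 12345, 2)` is the exact rational $123.45$, and `MAX_B.bit_length()` is $5$, the number of bits a plain base-two encoding of $b$ would need.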

The documentation states that this type requires $128$ bits. With $96$ bits for $a$ and one bit for the sign, that leaves $128-96-1=31$ bits for the scaling factor. A plain base-two representation of $b \in [0,28]$ would need only $5$ bits, so presumably something else is going on. Can someone who knows explain how these $31$ bits are used in the Decimal data type?

https://docs.microsoft.com/en-us/dotnet/visual-basic/language-reference/data-types/decimal-data-type

user13892
  • I'm not sure a question about number representation for a specific programming language is appropriate here. Probably more of a question for StackOverflow? – Captain Lama Apr 09 '20 at 19:27
  • I agree with Captain Lama. This is a computer science question, not a math question, and should be asked on one of the many computer science related forums. That said, the point of the Decimal type is to calculate in decimal, not binary (thus avoiding certain discrepancies between people's expectations and how binary rounding behaves). This requires a decimal representation, which wastes some space, i.e. it takes more space to store the same number in decimal than it does in binary. – Paul Sinclair Apr 10 '20 at 03:35
