Help with calculating relative error in approximation of x

Question

So I have

$$x=\displaystyle\sum_{k=1}^{\infty}2^{-k}+\displaystyle\sum_{k=0}^{\infty}2^{-6k-1}$$

and I need to calculate the relative error when approximating the above $x$ in $$MARC-32 \ \dots P(2,24,-127,127)$$ Hopefully the notation is understandable: it is a 32-bit format with base 2, a 24-bit significand, and exponents in the interval $[-127,127]$.

I know I will have to calculate $$x_{-}, x_{+}, fl(x)$$ which are essential for calculating the relative error, but I believe that is not the hardest part. (I used the notation I normally use for this, so I apologize if it is not familiar to everyone.)

I am stuck on converting $x$ to binary. I would appreciate any help.

Wouldn't the exponents normally be in the interval $[-128,127]$ if you are using 8-bit signed notation for it? Also are you assuming the significand has a leading one or does that need to be stored? I.e. it is a normalized significand? (I could find no references in google to MARC-32.) — Ian Miller, Apr 09 '16 at 07:51
At my university, or possibly in my country, MARC-32 was used as a standard. I presume that it has a leading one. That is why I wrote the notation down the way I remembered using it. — MathIsTheWayOfLife, Apr 09 '16 at 07:58
I am familiar with the IEEE standard for 32-bit. Your description of MARC-32 sounded slightly different so I was wondering if you could provide a reference. — Ian Miller, Apr 09 '16 at 08:07
I think it's possible to find it on webpages in my university — MathIsTheWayOfLife, Apr 09 '16 at 08:18
@IanMiller you are right, it is the IEEE standard, but we assume the significand has a leading one. I think MARC-32 was only used at our university, my bad. — MathIsTheWayOfLife, Apr 09 '16 at 08:24

score 1 · Accepted Answer · edited Apr 13 '17 at 12:20

The stored value depends on how you calculate it. If you have the computer increase $k$ by one in a loop and add each term to a running total, it will get a different answer than if you work out the value mathematically and then store that.

Option 1: Do the math first, then store the answer. Both sums are geometric series, so mathematically $x=1+\frac{32}{63}=\frac{95}{63}$. If you use the process in my answer to your other question you'll get:

$$\frac{95}{63}=1.100000\,100000\,100000\,100000\,100000\ldots$$
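As a quick check of the closed form and of the binary expansion, here is a minimal sketch using exact rational arithmetic. The helper name `fraction_bits` is made up for this illustration; it uses the usual "double and take the integer part" conversion.

```python
from fractions import Fraction

def fraction_bits(value, count):
    """Return the first `count` binary digits after the point of a value in [0, 1)."""
    bits = []
    for _ in range(count):
        value *= 2                # shift the next binary digit in front of the point
        bit = int(value >= 1)
        bits.append(str(bit))
        value -= bit              # keep only the fractional part
    return "".join(bits)

# Closed forms of the two geometric series in the definition of x.
first_sum = Fraction(1, 2) / (1 - Fraction(1, 2))     # sum over k >= 1 of 2^-k
second_sum = Fraction(1, 2) / (1 - Fraction(1, 64))   # sum over k >= 0 of 2^(-6k-1)
x = first_sum + second_sum

bits = fraction_bits(x - 1, 30)
print(x)     # 95/63
print(bits)  # 100000100000100000100000100000
```

The block `100000` repeats with period 6, so the ones after the point sit at positions 1, 7, 13, 19, 25, and so on.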

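The 25th bit after the binary point is a one, so, as we can only store 24 bits after the point, we need to work out whether to round up or down. Because that bit is a one and further nonzero bits follow it, the discarded part is more than half a unit in the last place, so we round up the 24th bit.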

As we have a leading one, the 24 bits of the significand are therefore: 100000100000100000100001. The exponent is clearly zero. The stored value is equal to: $$1+2^{-1}+2^{-7}+2^{-13}+2^{-19}+2^{-24}=\frac{25298977}{16777216}$$

This is slightly more than the value of $x$ by: $\frac{31}{1056964608}\approx3\times10^{-8}$.
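Since the question asks for the relative error, the following sketch redoes the rounding with exact fractions and divides the absolute error by $x$. It assumes 24 stored bits after the point, as in this answer, and reads the question's $x_{-}$ and $x_{+}$ as the representable neighbours just below and just above $x$ (my reading of that notation, not something stated in the thread).

```python
from fractions import Fraction
import math

FRACTION_BITS = 24             # assumption: 24 stored bits after the binary point
SCALE = 2 ** FRACTION_BITS

x = Fraction(95, 63)

# x lies in [1, 2), so the exponent is 0 and the representable numbers
# near x are the multiples of 2^-24.
x_minus = Fraction(math.floor(x * SCALE), SCALE)   # neighbour just below x
x_plus = x_minus + Fraction(1, SCALE)              # neighbour just above x

# Round to nearest; there is no tie for this particular x.
fl_x = x_plus if (x_plus - x) < (x - x_minus) else x_minus

abs_err = abs(fl_x - x)
rel_err = abs_err / x

print(fl_x)     # 25298977/16777216
print(abs_err)  # 31/1056964608
print(rel_err)  # 31/1593835520
```

Under these assumptions the relative error is $\frac{31}{1593835520}\approx1.9\times10^{-8}$, which is below the bound $2^{-25}\approx3\times10^{-8}$ for rounding to 24 fractional bits.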

Option 2: You get the computer to work it out using a loop similar to:

total = 0
for (k=0 to 50) //50 is big enough to use all bits
  total = total + 2^-(k+1) + 2^(-6*k-1) //the first sum starts at k=1, so its exponent is shifted by one

After the 4th pass through the loop the second term no longer contributes anything, as the terms from $2^{-25}$ onwards are too small to be stored in total, which is of order 1.

Carrying out the loop ends up storing the value $1.100000100000100000011111$, which is equal to: $$\frac{25298975}{16777216}$$ The difference arises because all terms equal to or lower than $2^{-25}$ cannot be added on.

This will give a value below the real value of $x$ by: $\frac{95}{1056964608}\approx9\times10^{-8}$.
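The loop can be simulated with exact fractions under the simplified model used in this option: the running total keeps 24 bits after the point and anything smaller than $2^{-24}$ is simply dropped. This is a sketch of that model only. The helper `chop` is made up for the illustration, and a machine that rounds to nearest instead of chopping may end up with a slightly different total.

```python
from fractions import Fraction
import math

FRACTION_BITS = 24             # assumption: 24 bits after the point, as in this answer
SCALE = 2 ** FRACTION_BITS

def chop(value):
    """Drop everything below 2^-24; models the stored total only while it is below 2."""
    return Fraction(math.floor(value * SCALE), SCALE)

total = Fraction(0)
for k in range(51):
    # The exponent of the first series is shifted by one so that it starts
    # at 2^-1, matching the sum from k=1 in the definition of x.
    term = Fraction(1, 2 ** (k + 1)) + Fraction(1, 2 ** (6 * k + 1))
    total = chop(total + term)

x = Fraction(95, 63)
shortfall = x - total

print(total)      # 25298975/16777216
print(shortfall)  # 95/1056964608
```

In this model the first series contributes $1-2^{-24}$ and the second contributes $2^{-1}+2^{-7}+2^{-13}+2^{-19}$, which is where the trailing `011111` in the stored bit pattern comes from.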

I understand the given format as a theoretical formulation of the 32-bit float, i.e., the 24 bits of the mantissa/significand include the leading hidden bit, so only 23 bits are used after the point. — Lutz Lehmann, Apr 10 '16 at 18:57
In IEEE standards (and how I read the OP's comment for his system) the leading bit is an implied one so doesn't need to be stored giving you 24 bits after the binary point. — Ian Miller, Apr 10 '16 at 23:48
I read his last comment as saying that the first of the 24 bits is the fixed 1 and the others are the 23 variable bits of the IEEE format. — Lutz Lehmann, Apr 11 '16 at 14:55

score 0 · Answer 2 · answered Apr 09 '16 at 09:33


Remember that $$ \sum_{k=1}^\infty2^{-k}=\frac12\cdot\frac1{1-\frac12}=1 $$ to get the bit sequence directly from the definition of $x$. In the significand, bits 23, 24 and 25 are zero; bit 26 would be 1.
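The second sum can be treated the same way. As a worked step (the overline marks the repeating block of bits, a notation not used elsewhere in this thread):

$$ \sum_{k=0}^\infty2^{-6k-1}=\frac12\cdot\frac1{1-2^{-6}}=\frac{32}{63}=0.\overline{100000}_2, \qquad x=1+\frac{32}{63}=\frac{95}{63}=1.\overline{100000}_2 $$

So the ones after the binary point sit at positions 1, 7, 13, 19, 25, and so on. Counting the leading one as bit 1, these are bits 2, 8, 14, 20 and 26 of the significand.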

answered Apr 09 '16 at 09:33 by Lutz Lehmann
