Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.
Questions tagged [floating-point]
465 questions
0
votes
1 answer
Max Mantissa $2^{bits}-1$
If we look at a $5$-bit mantissa, the max value will be $11111_2$, which is $2^5-1$. Why is it of the form $2^{bits}-1$? Is there a combinatorial explanation?
newhere
- 3,115
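One way to see this, sketched as a geometric sum (the symbol $n$ for the number of bits is introduced here only for illustration):

$$\underbrace{11\dots1}_{n\text{ ones}}{}_2 \;=\; \sum_{i=0}^{n-1} 2^i \;=\; 2^n - 1, \qquad n = 5:\quad 1+2+4+8+16 = 31 = 2^5 - 1.$$

The counting view gives the same result: $n$ bits have $2^n$ patterns, and since they are numbered starting from $0$, the largest one is $2^n - 1$.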
0
votes
0 answers
How many discrete points can be expressed in $[-1/2, 1/2)$ with IEEE 754 double floats and what is the meaning of precision?
I know that IEEE 754 double floats (64-bit floating-point numbers) are known to provide 52 bits of precision (or 53 bits including the implicit 1).
But I do not know the exact meaning of the precision.
Suppose we want to approximate a rational number $v$ using…
user9414424
- 111
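A minimal sketch of one way to count the doubles in $[-1/2, 1/2)$, assuming that $+0$ and $-0$ count as a single point and that subnormals are included. It relies on the fact that non-negative doubles are ordered the same way as their bit patterns; the helper name `double_bits` is made up for this example.

```python
import struct

def double_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Patterns 0 .. bits(0.5) - 1 are exactly the doubles in [0, 0.5).
nonneg = double_bits(0.5)
# Negative doubles in [-0.5, 0) have magnitudes in (0, 0.5],
# i.e. patterns 1 .. bits(0.5) once the sign bit is ignored.
neg = double_bits(0.5)

count = nonneg + neg
print(count)                     # 9205357638345293824
print(count == 2044 * 2 ** 52)   # True
```

Most of these points lie very close to $0$: each binade $[2^{-k-1}, 2^{-k})$ holds $2^{52}$ of them, so the grid is far from uniform on the interval.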
0
votes
1 answer
Floating-point arithmetic and loss of precision: Shifting mantissa until exponents match
My book says the following about floating-point arithmetic involving the addition/subtraction of two numbers, $x$ and $y$, that differ in their exponent:
In adding or subtracting two floating-point numbers, their exponents must match before their…
Aleksandr Hovhannisyan
- 2,983
- 4
- 34
- 59
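A small illustration of the effect in IEEE 754 double precision (53 significant bits), assuming the default round-to-nearest-even mode. The values are chosen for this example and are not taken from the book.

```python
big = 2.0 ** 53    # 1.0 x 2^53: the last kept bit of the significand has weight 2
small = 1.0        # aligned to exponent 53, its only set bit falls past the last kept bit

total = big + small
print(total == big)        # True: the 1.0 is lost when the result is rounded
print((big + 2.0) - big)   # 2.0: a bit that survives the alignment is kept
```

The wider the gap between the two exponents, the more low-order bits of the smaller operand are shifted out before the addition.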
0
votes
1 answer
What is the purpose of regime bits in posit encoding?
Why do we need regime bits in posit?
posit encoding:
kevin998x
- 123
0
votes
0 answers
Why is the cardinality not the same for rounded floats?
I need to solve a question:
Two float values are equivalent if they return the same integer with Math.round(). Why do the equivalence classes arising from this equivalence relation not have the same cardinality?
I was thinking that cardinality…
MichiganMagician
- 157
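A rough sketch of how the class sizes can be counted, assuming 32-bit floats and Java-style `Math.round`, which rounds ties toward positive infinity, so the class of $n$ is the set of floats in $[n - 0.5,\, n + 0.5)$. The helper names are hypothetical.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def class_size(n: int) -> int:
    """Count float32 values x with n - 0.5 <= x < n + 0.5, for 1 <= n < 2**23."""
    # Positive floats are ordered like their bit patterns,
    # so the count is a difference of two patterns.
    return float32_bits(n + 0.5) - float32_bits(n - 0.5)

for n in (1, 2, 1000):
    print(n, class_size(n))
# 1 12582912
# 2 6291456
# 1000 16384
```

The spacing between adjacent floats doubles at every power of two, so an interval of fixed width 1 contains fewer and fewer floats as $n$ grows.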
0
votes
1 answer
How to calculate a floating point of negative number?
In IEEE double precision n=53. So to represent 16 I can do the following:
The next largest number after $16$ is $+(.10 \dots 01)_2 \cdot 2^5=2^{-1}\cdot 2^5+2^{-53}\cdot 2^5=16+2^{-48}$
Now the next largest number after $-16$ is $-16+2^{-49}$, but how to show this formally like I did…
ASROMA
- 579
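A quick numerical check of both claims, assuming Python 3.9 or later, where `math.nextafter` is available. This only confirms the values; it is not a formal argument.

```python
import math

up_from_16 = math.nextafter(16.0, math.inf)          # next double above 16
up_from_minus_16 = math.nextafter(-16.0, math.inf)   # next double above -16

print(up_from_16 - 16.0 == 2.0 ** -48)         # True
print(up_from_minus_16 + 16.0 == 2.0 ** -49)   # True
```

The asymmetry comes from the sign-magnitude layout: the number just above $-16$ is the negative of the number just below $16$, which lies in $[8, 16)$, where the spacing is $2^{4-53} = 2^{-49}$ rather than $2^{5-53} = 2^{-48}$.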
0
votes
1 answer
Conversion of floating point to 8-bit binary word
What is the representation of $-(52.625)_{10}$ in an 8-bit word? In the 8-bit word, the first bit represents the sign of the number, the next three bits represent the exponent, and the last 4 bits the mantissa. Now since there are no bits to represent the sign…
shadow kh
- 953
- 5
- 15
0
votes
2 answers
How to convert 601.0 to IEEE-754 Single Precision
I am trying to understand how to convert from decimal to IEEE-754 Single Precision binary representation.
I made up a random number, which happens to be 601.00.
I tried my best to figure it out and this is what I got:
Step 01: I divided 601 by 9 (since…
malhobayyeb
- 421
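For checking a hand conversion, here is a short sketch that asks the machine for the single-precision fields of 601.0. The function name `float32_fields` is made up for this example.

```python
import struct

def float32_fields(x: float):
    """Split the IEEE 754 single-precision encoding of x into bit strings."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = f"{bits:032b}"
    return s[0], s[1:9], s[9:]   # sign, 8 exponent bits, 23 fraction bits

sign, exponent, fraction = float32_fields(601.0)
print(sign, exponent, fraction)
# 0 10001000 00101100100000000000000
```

This agrees with $601 = 1001011001_2 = 1.001011001_2 \times 2^9$ and a stored exponent of $9 + 127 = 136 = 10001000_2$.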
0
votes
0 answers
Calculate the largest and smallest number given exp, bias, and frac
Given 8 bits total, where 3 bits are exp bits, 4 bits are frac bits, and 1 is the sign bit, I have to find the largest and smallest values.
0 110 1111 - largest
1 110 1111 - smallest
1) E = exponent - Bias = 6 - 3 =3
2) M = 1 + f = 1 + 1/2 + 1/4 + 1/8 +…
Inferno
- 11
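A worked sketch of the two bit patterns above, assuming bias $3$ as in step 1, an implicit leading $1$, and an all-ones exponent field reserved as in IEEE 754. The symbols $s$, $e$, $f$ for the sign, exponent field, and fraction field are introduced here for illustration:

$$(-1)^s \left(1 + \frac{f}{16}\right) 2^{\,e-3}: \qquad 0\;110\;1111 \;\mapsto\; \left(1 + \frac{15}{16}\right) 2^{6-3} = 1.9375 \cdot 8 = 15.5, \qquad 1\;110\;1111 \;\mapsto\; -15.5.$$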
0
votes
1 answer
Closest number to 1.22
Given:
1 bit for sign
3 bits for exp
4 bits for fract
How to find the closest floating-point number to 1.22?
Inferno
- 11
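A brute-force sketch that simply enumerates every value of the format and picks the nearest one. It assumes an IEEE-style layout that the question does not state: bias $3$, implicit leading $1$, exponent field `000` for subnormals, and `111` reserved.

```python
from fractions import Fraction

BIAS = 3  # assumption: 2**(3 - 1) - 1, the IEEE-style bias for a 3-bit exponent

def value(sign: int, exp: int, frac: int) -> Fraction:
    """Exact value encoded by the (sign, exponent field, fraction field) triple."""
    if exp == 0:   # subnormal: no implicit leading 1
        mag = Fraction(frac, 16) * Fraction(2) ** (1 - BIAS)
    else:          # normal: implicit leading 1
        mag = (1 + Fraction(frac, 16)) * Fraction(2) ** (exp - BIAS)
    return -mag if sign else mag

target = Fraction("1.22")
# Exponent field 7 (binary 111) is skipped because it is assumed reserved.
candidates = [(s, e, f) for s in (0, 1) for e in range(7) for f in range(16)]
best = min(candidates, key=lambda c: abs(value(*c) - target))

print(best, float(value(*best)))   # (0, 3, 4) 1.25
```

Under these assumptions the nearest value is $1.25$, encoded as `0 011 0100`; the neighbour below, $1.1875$, is slightly farther from $1.22$.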
0
votes
0 answers
Is assigning a single-precision IEEE 754 float to a double reversible?
Within the scope of the IEEE 754 standard, let's assign single-precision variable s to double-precision variable d and then assign d to single-precision variable s'.
Is this operation reversible (lossless) for any value that can be represented in…
Vlad
- 129
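A spot check of the round trip on a few bit patterns, as a sketch rather than a proof. It relies on Python floats being IEEE 754 doubles, and it deliberately leaves out NaNs, whose payload handling can differ between platforms. The name `roundtrip_ok` is made up for this example.

```python
import struct

def roundtrip_ok(bits32: int) -> bool:
    """Widen a float32 bit pattern to double, narrow it back, compare the bits."""
    raw = struct.pack("<I", bits32)
    d = struct.unpack("<f", raw)[0]      # widening: the result is a Python double
    return struct.pack("<f", d) == raw   # narrowing back to float32

samples = [
    0x00000000,  # +0
    0x80000000,  # -0
    0x00000001,  # smallest subnormal
    0x3DCCCCCD,  # float32 nearest to 0.1
    0x3F800000,  # 1.0
    0x7F7FFFFF,  # largest finite float32
    0x7F800000,  # +infinity
    0xFF800000,  # -infinity
]
print(all(roundtrip_ok(b) for b in samples))   # True
```

The underlying reason is that every finite single-precision value, subnormals included, is exactly representable in double precision, so the widening step loses nothing.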
0
votes
2 answers
How to convert 16.75 to floating-point representation?
Convert 16.75 to base-2 floating-point representation.
I need help with the formula. Thanks.
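A sketch of the conversion up to the normalized binary form. The question does not say which storage format is wanted, so the exponent bias and field widths are left open here:

$$16.75 = 16 + \frac12 + \frac14 = 10000.11_2 = 1.000011_2 \times 2^4.$$

The integer part comes from repeated division by $2$, and the fractional part from repeated multiplication by $2$: $0.75 \cdot 2 = 1.5$ gives the bit $1$, then $0.5 \cdot 2 = 1.0$ gives the bit $1$.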
0
votes
1 answer
Can anyone please explain how 2.8 modulo 2 is 0.7999999999999998?
I am not a mathematician, and I just started programming in JavaScript and wonder how 2.8 % 2 = 0.7999999999999998.
Note: I know it is the remainder operation.
Maybe I forgot my school mathematics concepts. Can anyone please explain this?
Thanks.
jsingh
- 31
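A short sketch in Python, whose `float` uses the same IEEE 754 double format as a JavaScript number, showing that the surprise comes from how 2.8 is stored and not from the remainder operation itself.

```python
from fractions import Fraction

stored = Fraction(2.8)             # exact value of the double nearest to 2.8
print(stored < Fraction(28, 10))   # True: the stored value is slightly below 2.8

remainder = 2.8 % 2
print(remainder)                          # 0.7999999999999998
print(Fraction(remainder) == stored - 2)  # True: the remainder step is exact
```

So the remainder is computed exactly; it is the literal 2.8 that cannot be represented in binary, and subtracting 2 makes the small representation error visible.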
0
votes
1 answer
biased exponent vs unbiased exponent
Following IEEE-754, I am looking for an example that shows processing an unbiased floating point representation is harder than processing a biased one. All I see in the texts is that unbiased numbers have to be compared with a negative system…
mahmood
- 223
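One possible illustration, assuming IEEE 754 single precision with bias 127: with a biased exponent, positive floats can be compared by treating their bit patterns as unsigned integers. The two's complement exponent field below is a hypothetical alternative used only for contrast, not something from the standard.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Increasing positive values, all exactly representable in float32.
values = [2.0 ** -10, 0.75, 1.0, 3.5, 1024.0]
patterns = [float32_bits(v) for v in values]
print(patterns == sorted(patterns))   # True: unsigned integer order matches value order

biased = [(p >> 23) & 0xFF for p in patterns]   # stored exponent fields
unbiased = [b - 127 for b in biased]            # [-10, -1, 0, 1, 10]
twos = [u & 0xFF for u in unbiased]             # the same exponents as 8-bit two's complement
print(twos == sorted(twos))           # False: negative exponents would compare as large
```

With the hypothetical two's complement field, a comparison would first have to interpret the exponent as signed, whereas the biased field lets a plain unsigned comparison do the job for same-sign values.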
0
votes
1 answer
Precision and Accuracy
How would I go about calculating the precision and accuracy of a given number?
For example
0.05 has an accuracy of 2 and a precision of 3.
1 has an accuracy of 0 and a precision of 1.
Is there an algorithm for calculating this?
cxdf
- 111