Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.
Questions tagged [floating-point]
465 questions
2
votes
2 answers
Floating point number, Mantissa, Exponent
In this computer, numbers are stored in $12$ bits. We will also assume
that for a floating point (real) number, $6$ of these bits are reserved for
the mantissa (or significand) with $2^{k-1}-1$ as the exponent bias (where
$k$ is the number of…
Wolverine
- 37
1
vote
1 answer
What is the set of all numbers that can be represented with a floating-point format?
Computers use single- (or, for more precise calculations, double-) precision floating-point formats to represent a subset of real numbers. While a decent chunk of real numbers can be stored with these formats, most real numbers including obviously…
SMMH
- 303
1
vote
1 answer
Floating Point Arithmetic
I have been experimenting with floating-point arithmetic to understand it better.
I have a 64-bit processor. I have asked Matlab to use format longe, which should display a floating-point number with double precision.
I see that…
Celestina
- 1,146
1
vote
1 answer
How to subtract IEEE754 floating point?
I have two numbers represented in floating point:
$A: 10101001001110000000000000000000$
$B: 01000011011000000000000000000000$
For $A$ I know $e=82$ and for $B$, $e=134$ ($e$=exponent), but I don't know how to subtract these numbers in floating…
John D
- 545
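As an illustration of the bit patterns quoted in this excerpt, here is a minimal Python sketch that decodes $A$ and $B$ as IEEE 754 binary32 values and forms $A - B$. The helper names `decode_binary32` and `to_binary32` are hypothetical, and the sketch assumes it is acceptable to subtract in Python's binary64 floats and then round once to binary32; it is not the hand procedure the question asks about.

```python
import struct

def decode_binary32(bits: str) -> float:
    """Interpret a 32-character bit string as a big-endian IEEE 754 binary32 value."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

def to_binary32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

A = "10101001001110000000000000000000"
B = "01000011011000000000000000000000"

# Bits 1..8 (after the sign bit) hold the biased exponent.
biased_exp_a = int(A[1:9], 2)
biased_exp_b = int(B[1:9], 2)

a = decode_binary32(A)
b = decode_binary32(B)

# Subtract in binary64, then round the result to binary32 (assumption, see above).
difference = to_binary32(a - b)

print(biased_exp_a, biased_exp_b)
print(a, b, difference)
```

Because the two exponents differ by far more than the 24 significand bits of binary32, the smaller operand is lost entirely when the exponents are aligned, so the rounded difference equals $-B$.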
1
vote
1 answer
The floating point function of Chopping. Absolute error and Relative error.
Consider a number $(x)_\beta$ :
$$x = \pm 0.d_1d_2 \ldots d_pd_{p+1}d_{p+2} \ldots \times \beta^E$$
The function $chop(x)$ keeps only the first $p$ digits, ignoring all digits from the $(p+1)$th onward. So the machine number $\tilde{x}$ is the…
JB-Franco
- 852
1
vote
3 answers
Representing $100001$ in floating point system
There's an example in my textbook about cancellation error that I'm not totally getting. It says that with $5$-digit decimal arithmetic, $100001$ cannot be represented.
I think that's because when you try to represent it you get $1*10^5$, which is…
wzsun
- 89
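The claim in this excerpt can be tried out with Python's `decimal` module. This is only a sketch: it assumes that "5-digit decimal arithmetic" means five significant digits with round-half-even rounding, which the excerpt does not state, and the variable names are illustrative.

```python
from decimal import Context, Decimal

# Assumption: 5 significant decimal digits, default round-half-even rounding.
ctx = Context(prec=5)

stored = ctx.create_decimal(100001)    # six digits, so it must be rounded
neighbor = ctx.create_decimal(100000)  # same leading five digits

# Both inputs collapse to the same 5-digit value, so the difference vanishes
# even though the exact difference is 1.
difference = ctx.subtract(stored, neighbor)

print(stored, neighbor, difference)
```

Under these assumptions the sixth digit is dropped on input, which is why a later subtraction cannot recover it.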
1
vote
1 answer
Floating Point Numbers - Machine numbers
Consider the set of machine numbers $M(10, 2, 0)$. (The "zero-length" for the exponent is to be understood such that there is only the sign ± and 0 available for the exponent. We interpret "+" as "+1" and "−" as "−1". The available exponential…
NabbKitha
- 520
1
vote
0 answers
16-bit Floating point range of different values
Let's say there is a floating-point code that fits in 16 bits, with 1 bit for the sign, 4 bits for the exponent, and the rest (11) for the significand. I've been able to find the range of the normalized exponent, which I believe is [-6,7], by…
Andrew
- 133
1
vote
1 answer
How to add IEEE 754 half precision numbers
I'm getting stuck on an exercise about adding two IEEE 754 half precision numbers. The numbers are:
$1110001000000001$
$0000001100001111$
I have tried to solve it using this procedure:
Half precision is:
$1$ sign bit, $5$ bits exponent and $10$…
Giuseppe
- 11
1
vote
2 answers
Floating Point Calculation
Take the polynomial $x^2+(-4*10^3)x+2$.
In the floating-point system with $b=10$, $m=4$, $e=4$, if I wanted to find the roots using the quadratic formula what would be the values of the roots?
I got 3.999 as one of my roots and 1.000 as the other…
user153009
- 634
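A small Python sketch of the computation described in this excerpt, using the `decimal` module with four significant digits. Several things here are assumptions rather than part of the question: the rounding mode is round-half-even (a system that chops instead would give different digits), the exponent limit $e=4$ is ignored, and the "stable" variant based on the product of the roots is a standard alternative that the excerpt does not mention.

```python
from decimal import Context, Decimal

# Assumption: 4 significant decimal digits, round-half-even, no exponent limit.
ctx = Context(prec=4)

a, b, c = Decimal(1), Decimal(-4000), Decimal(2)

# Discriminant b^2 - 4ac, rounding after every operation.
disc = ctx.subtract(ctx.multiply(b, b),
                    ctx.multiply(Decimal(4), ctx.multiply(a, c)))
sqrt_disc = ctx.sqrt(disc)
two_a = ctx.multiply(Decimal(2), a)
minus_b = ctx.minus(b)

# Textbook quadratic formula for both roots.
root_large = ctx.divide(ctx.add(minus_b, sqrt_disc), two_a)
root_small_naive = ctx.divide(ctx.subtract(minus_b, sqrt_disc), two_a)

# Alternative for the small root: the product of the roots equals c / a.
root_small_stable = ctx.divide(c, ctx.multiply(a, root_large))

print(root_large, root_small_naive, root_small_stable)
```

With these settings the term $4ac = 8$ is too small to change $b^2$ at four digits, so the naive formula loses the small root to cancellation while the product-based variant keeps it.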
0
votes
1 answer
Accurate computation of KL divergence between binary RVs
I was wondering how one can compute the KL Divergence between two binary distributions (say, with parameters $p$ and $q$ and assume $p < \frac12$ and $q < \frac12$ for simplicity) accurately. The formula is clearly:
\begin{equation}
D(p,q) = p \log…
MikeL
- 627
0
votes
0 answers
Floating Point Precision Algorithm
In my database, data is stored with a precision of 10 decimal places, as Decimal(30,10).
The user can enter x or 1/x. I need to save it as 1/x. If the user enters 1310, it will be saved in the database as 1/1310=0.0007633588. When I want to bring it back, 1/0.0007633588=1309.999963…
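The round trip described above can be reproduced with Python's `decimal` module. This sketch assumes the database rounds to 10 decimal places (here with round-half-even, which matches the stored value quoted in the question); the names `SCALE`, `stored`, and `recovered` are illustrative only.

```python
from decimal import Decimal

# Assumption: Decimal(30,10) keeps exactly 10 digits after the decimal point.
SCALE = Decimal("1e-10")

x = Decimal(1310)

# Value as it would be stored: 1/x rounded to 10 decimal places.
stored = (Decimal(1) / x).quantize(SCALE)

# Inverting the stored value does not return exactly to x.
recovered = Decimal(1) / stored
recovered_rounded = recovered.quantize(Decimal("1e-6"))

print(stored, recovered_rounded)
```

The stored reciprocal carries only about seven significant digits for an input of this size, so the recovered value can be trusted to roughly that many digits and no more.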
0
votes
0 answers
Bias in Single Precision Floating Point Numbers
I have a question about single-precision floating-point numbers. It concerns the bias, which can be derived from the exponent part of this representation.
From searching on Google, most answers say that the bias should just be…
crimsonKnight
- 162
0
votes
0 answers
How to Multiply 2 arrays with unique non-integers to produce an array with unique results?
Is there an algorithm or formula to multiply two arrays (1D & 2D) of unique numbers such that the resultant array contains unique results?
Would one have to create the 2 initial arrays in a certain pattern in order to guarantee unique results?
How…
D.Price
- 1
0
votes
1 answer
Approach to address errors from float precision limits by its storage limit
Background
In computing, the limited precision of floats due to the storage limit (e.g. 64 bits) can cause problems. I am trying to understand what approaches are available and in use to cope with or overcome these precision limitations.
The original issue is…
mon
- 250