Questions tagged [floating-point]

Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.

465 questions
2
votes
2 answers

Floating point number, Mantissa, Exponent

In this computer, numbers are stored in $12$ bits. We will also assume that for a floating point (real) number, $6$ of these bits are reserved for the mantissa (or significand) with $2^{k-1}-1$ as the exponent bias (where $k$ is the number of…
1
vote
1 answer

What is the set of all numbers that can be represented with a floating-point format?

Computers use single- (or, for more precise calculations, double-) precision floating-point formats to represent a subset of real numbers. While a decent chunk of real numbers can be stored with these formats, most real numbers including obviously…
SMMH
  • 303
1
vote
1 answer

Floating Point Arithmetic

I have been experimenting with understanding floating-point arithmetic. I have a 64-bit processor. I have asked Matlab to use format longe, which should display a floating-point number in double precision. I see that…
Celestina
  • 1,146
1
vote
1 answer

How to subtract IEEE754 floating point?

I have two numbers represented in floating point: $A: 10101001001110000000000000000000$ $B: 01000011011000000000000000000000$ For $A$ I know $e=82$ and for $B$, $e=134$ ($e$=exponent), but I don't know how to subtract these numbers in floating…
John D
  • 545
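
The exponent fields quoted in the question above can be checked with a minimal Python sketch. It assumes both patterns are IEEE 754 binary32 with the sign bit first; the helper names `decode_float32` and `to_float32` are hypothetical and only serve this illustration.

```python
import struct

def decode_float32(bits):
    """Split a 32-character bit string into (sign, biased exponent, fraction, value)."""
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)   # biased exponent field (bias 127)
    fraction = int(bits[9:], 2)    # 23 stored fraction bits
    value = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]
    return sign, exponent, fraction, value

def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Bit patterns from the question, written as sign + exponent + fraction.
A = "1" + "01010010" + "0111" + "0" * 19
B = "0" + "10000110" + "11" + "0" * 21

sa, ea, fa, a = decode_float32(A)
sb, eb, fb, b = decode_float32(B)
difference = to_float32(a - b)

print(ea, eb)            # biased exponents of A and B
print(a, b, difference)  # decoded values and A - B rounded to binary32
```

The two exponents differ by 52, which is more than the 24 bits of significand precision in binary32, so $A$ is smaller than half a unit in the last place of $B$ and $A - B$ rounds to $-B$.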
1
vote
1 answer

The floating point function of Chopping. Absolute error and Relative error.

Consider a number $(x)_\beta$ : $$x = \pm 0.d_1d_2 \ldots d_pd_{p+1}d_{p+2} \ldots \times \beta^E$$ The function $chop(x)$ keeps only the first $p$ digits, ignoring all digits from the $(p+1)$th onward. So the machine number $\tilde{x}$ is the…
JB-Franco
  • 852
1
vote
3 answers

Representing $100001$ in floating point system

There's an example in my textbook about cancellation error that I'm not totally getting. It says that with $5$-digit decimal arithmetic, $100001$ cannot be represented. I think that's because when you try to represent it you get $1 \times 10^5$, which is…
wzsun
  • 89
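
The rounding described in the question above can be reproduced with Python's `decimal` module. This is a small sketch that assumes "5-digit decimal arithmetic" means five significant digits with round-to-nearest; the variable names are illustrative only.

```python
from decimal import Context

# Assumed model: 5 significant decimal digits, default round-half-even.
five_digits = Context(prec=5)

stored = five_digits.create_decimal(100001)
print(stored)  # 1.0000E+5

# Subtracting 100000 in the same arithmetic loses the trailing 1 entirely.
lost = five_digits.subtract(stored, five_digits.create_decimal(100000))
print(lost == 0)
```

Since $100001$ needs six significant digits, it is stored as $1.0000 \times 10^5$, and the final digit cannot be recovered by a later subtraction.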
1
vote
1 answer

Floating Point Numbers - Machine numbers

Consider the set of machine numbers $M(10, 2, 0)$. (The "zero-length" for the exponent is to be understood such that there is only the sign ± and 0 available for the exponent. We interpret "+" as "+1" and "−" as "−1". The available exponential…
NabbKitha
  • 520
1
vote
0 answers

16-bit Floating point range of different values

Let's say there is a floating-point code that fits in 16 bits, with 1 bit for the sign, 4 bits for the exponent, and the rest (11) for the significand. I've been able to find the range of the normalized exponent, which I believe is [-6,7], by…
Andrew
  • 133
1
vote
1 answer

How to add IEEE754 half precision numbers

I'm getting stuck on an exercise on adding two IEEE754 half precision numbers. The numbers are: $1110001000000001$ $0000001100001111$ I have tried to solve it using this procedure: Half precision is: $1$ sign bit, $5$ bits exponent and $10$…
1
vote
2 answers

Floating Point Calculation

Take the polynomial $x^2+(-4*10^3)x+2$. In the floating-point system with $b=10$, $m=4$, $e=4$, if I wanted to find the roots using the quadratic formula what would be the values of the roots? I got 3.999 as one of my roots and 1.000 as the other…
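
A sketch of the computation in the question above, using Python's `decimal` module. It assumes $m=4$ means four significant decimal digits with round-to-nearest, and it ignores the exponent limit $e=4$; the stable variant at the end is a standard alternative, not something taken from the question.

```python
from decimal import Decimal, localcontext

a, b, c = Decimal(1), Decimal(-4000), Decimal(2)

with localcontext() as ctx:
    ctx.prec = 4  # assumed: 4 significant decimal digits

    disc = b * b - 4 * a * c           # 15999992 rounds to 1.600E+7
    root = disc.sqrt()                 # 4000 in this arithmetic
    x1 = (-b + root) / (2 * a)         # larger root, no cancellation
    x2_naive = (-b - root) / (2 * a)   # smaller root, cancels to zero
    x2_stable = c / (a * x1)           # uses x1 * x2 = c / a instead

print(x1, x2_naive, x2_stable)
```

In this model the quadratic formula returns $0$ for the smaller root because $-b$ and $\sqrt{b^2-4ac}$ round to the same value, while dividing $c/a$ by the larger root gives $0.0005$.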
0
votes
1 answer

Accurate computation of KL divergence between binary RVs

I was wondering how one can compute the KL Divergence between two binary distributions (say, with parameters $p$ and $q$ and assume $p < \frac12$ and $q < \frac12$ for simplicity) accurately. The formula is clearly: \begin{equation} D(p,q) = p \log…
MikeL
  • 627
0
votes
0 answers

Floating Point Precision Algorithm

In my database, data is stored with a precision of 10 decimal digits, as Decimal(30,10). The user can enter x or 1/x. I need to save it as 1/x. If the user enters 1310 it will be saved in the database as 1/1310=0.0007633588. When I want to bring it back 1/0.0007633588=1309.999963…
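
The round trip in the question above can be reproduced with a short Python sketch. It assumes the database rounds half up to 10 decimal places, which the question does not state; `SCALE` is a hypothetical name for that quantum.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed storage format: 10 digits after the decimal point, as in Decimal(30,10).
SCALE = Decimal("0.0000000001")

x = Decimal(1310)
stored = (1 / x).quantize(SCALE, rounding=ROUND_HALF_UP)
recovered = 1 / stored

print(stored)     # value written to the database
print(recovered)  # value obtained when inverting it again
```

The stored value keeps only 7 significant digits, so the recovered reciprocal agrees with 1310 to about 7 digits and no further.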
0
votes
0 answers

Bias in Single Precision Floating-Point Numbers

I have a question about single-precision floating-point numbers. It is about the bias, which can be derived from the exponent part of this representation. Searching on Google, most answers say that the bias should just be…
0
votes
0 answers

How to multiply 2 arrays with unique non-integers to produce an array with unique results?

Is there an algorithm or formula to multiply two arrays (1D & 2D) of unique numbers such that the resultant array contains unique results? Would one have to create the 2 initial arrays in a certain pattern in order to guarantee unique results? How…
0
votes
1 answer

Approaches to address errors from float precision limits imposed by storage size

Background: In computers, limited floating-point precision due to the storage limit (e.g. 64 bits) can cause problems. I am trying to understand what approaches are available and used to cope with or overcome these precision limitations. The original issue is…
mon
  • 250