Hi,
This question is mainly directed to Gino van den Bergen, since it concerns a small point from his book. However, anyone able to help is very welcome.
So, from page 59 I don't understand what Gino is trying to explain in the sentence that starts with "Assume". It's located right at the bottom of the page, just before he derives the upper bound for the relative error in the approximation of gamma. I don't want to break any copyright law; otherwise I would have copied it here. There's no need to explain anything below the upper-bound inequality.
Thank you!
Gino's Book - Robustness section
-
- Posts: 23
- Joined: Mon Jun 27, 2005 4:30 pm
- Location: Los Angeles
Re: Gino's Book - Robustness section
omicron wrote: So, from page 59 I'm not understanding what Gino is trying to explain in the phrase that starts with "Assume".
I'm not Gino, but perhaps I can explain anyway. Let's start by noting that a typical floating-point operation 'op' introduces an error along the lines of:
fl(a op b) = (a op b)(1 + e), where |e| <= u and u is the machine epsilon.
Now consider the computation of e.g. a*b*c in floating point. This is equivalent to [((a*b)*(1+e1))*c]*(1+e2) = a*b*c*(1+e1)*(1+e2) = a*b*c*(1+e1+e2+e1*e2). This is just straightforward manipulation so far.
At this point, with |ei| <= u, we can simply say that 1+e1+e2+e1*e2 = 1+e1+e2+O(u^2) and we then promptly ignore the O(u^2) term because it is so small that it (typically) won't affect the result. That leaves us with:
a*b*c*(1+e1+e2)
as the value computed for the floating-point expression a*b*c. Gino's sentence on page 59 is in reference to dropping this O(u^2) term that I talked about above.
Hope that helps to clarify what's meant.
-
- Physics Researcher
- Posts: 22
- Joined: Mon Jun 27, 2005 9:28 am
- Location: Helmond, Netherlands
Re: Gino's Book - Robustness section
Christer Ericson wrote: Hope that helps to clarify what's meant.
Well put. Thanks, Christer.
The only thing I'd like to add is that computations would still classify as 'fairly accurate' if |ei| <= C * u for some C greater than 1. For example, with errors ei of 100 * u, the term e1 * e2 would still be nowhere near the machine epsilon of single-precision floats and can safely be ignored in the analysis.
-
- Posts: 8
- Joined: Tue Jan 17, 2006 6:27 pm
Well, first off, thank you both for your help!
So, I wrote this little program to test cancellation when subtracting two nearly equal values.
Code:
#include <float.h>   // FLT_EPSILON
#include <math.h>    // fabs
#include <stdio.h>

int main(void)
{
    // Both delta0 and delta1 suffer from rounding errors
    // because they're not exactly representable in IEEE-754
    float delta0 = 0.432343547394f;
    float delta1 = 0.432343587394f;
    // Relative errors of delta0 and delta1 are bounded by machine epsilon
    float delta0_rel_error = FLT_EPSILON;
    float delta1_rel_error = FLT_EPSILON;
    float add = delta1 + delta0;
    float sub = delta1 - delta0;
    float mul = delta1 * delta0;
    // First-order propagated relative error bounds
    float add_rel_error = ( fabs( delta0 * delta0_rel_error ) + fabs( delta1 * delta1_rel_error ) ) / fabs( add ) + FLT_EPSILON;
    float sub_rel_error = ( fabs( delta0 * delta0_rel_error ) + fabs( delta1 * delta1_rel_error ) ) / fabs( sub ) + FLT_EPSILON;
    float mul_rel_error = fabs( delta0_rel_error ) + fabs( delta1_rel_error ) + FLT_EPSILON;
    printf( "add = %.17g, rel error bound = %.17g\n", add, add_rel_error );
    printf( "sub = %.17g, rel error bound = %.17g\n", sub, sub_rel_error );
    printf( "mul = %.17g, rel error bound = %.17g\n", mul, mul_rel_error );
    return 0;
}
The results are:
add = 0.86468714475631714
sub = 5.96046644775390625e-8
mul = 0.18692097067832947
add_rel_error = 2.384185791015625e-7
sub_rel_error = 1.7293744087219238
mul_rel_error = 3.5763786865234375e-7
Using exact arithmetic, sub would be 4e-8, but with floating-point arithmetic it is quite far from that (see above). So I guess this turned out to be a case of catastrophic cancellation: the two results (floating-point and exact) have no digits in common, and therefore the relative error is huge (see above). Am I right?
Thanks!
-
- Physics Researcher
- Posts: 22
- Joined: Mon Jun 27, 2005 9:28 am
- Location: Helmond, Netherlands
omicron wrote: Using exact arithmetic sub would be 4e-8 but with floating-point arithmetic it is quite far from it. So, I guess that this turned out to be a catastrophic cancellation situation because both results (from floating-point and exact arithmetic) don't have any digits in common, therefore, the relative error is huge, am I right?
Correct. Note that IEEE 754 single-precision numbers have only 24 significand bits (roughly 7 decimal digits of accuracy). In reality things may not be that bad (on Intel at least), since temporary values are often evaluated and stored at higher precision (the IA-32 x87 floating-point registers have 64-bit mantissas).