Real numbers and numerical precision

Introduction

An important aspect of computational physics is the numerical precision involved. To design a good algorithm, one needs a basic understanding of how inaccuracies and errors propagate in a calculation. There is no magic recipe for dealing with underflow, overflow, accumulation of errors, and loss of precision; only a careful analysis of the functions involved can save one from serious problems.
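As a minimal illustration, assuming standard IEEE 64-bit floating-point arithmetic, the following Python sketch shows both overflow and underflow at work:

<syntaxhighlight lang="python">
# Minimal sketch of overflow and underflow with IEEE 64-bit floats.
print(1.0e308 * 10.0)     # inf: overflow past the largest double (~1.8e308)
print(5e-324)             # smallest positive (subnormal) double
print(1.0e-308 / 1.0e20)  # 0.0: gradual underflow eventually reaches zero
</syntaxhighlight>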

Since we are interested in the precision of numerical calculations, we need to understand how computers represent real and integer numbers. Most computers deal with real numbers in the binary system (or in octal and hexadecimal), in contrast to the decimal system that we humans prefer to use. The binary system uses 2 as its base, in much the same way that the decimal system uses 10. Since the typical computer communicates with us in the decimal system but works internally in, e.g., the binary system, conversion procedures must be executed by the computer, and these conversions hopefully involve only small roundoff errors.
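These conversion roundoff errors can be seen directly. The following short Python sketch (assuming standard 64-bit floats) shows that the decimal number 0.1 has no finite binary expansion, so what is actually stored is only the nearest representable binary value:

<syntaxhighlight lang="python">
# The decimal 0.1 is not exactly representable in binary, so both sides
# of the comparison below carry tiny decimal-to-binary conversion errors.
print(0.1 + 0.2 == 0.3)  # False
print(f"{0.1:.20f}")     # 0.10000000000000000555... (the value actually stored)
print((0.1).hex())       # 0x1.999999999999ap-4, the exact binary representation
</syntaxhighlight>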

Computers are also not able to operate on real numbers expressed with more than a fixed number of digits, so the set of representable values is only a subset of the mathematical integers or real numbers. The so-called word length we reserve for a given number places a restriction on the precision with which that number is represented. This means, in turn, that floating-point numbers are always rounded to a machine-dependent precision, typically with 6-15 leading digits to the right of the decimal point. Furthermore, each such set of values has a processor-dependent smallest negative and largest positive value. Why do we care at all about rounding and machine precision?
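As a concrete illustration of these machine-dependent limits, Python exposes them for its 64-bit floats through sys.float_info; a minimal sketch:

<syntaxhighlight lang="python">
import sys

# Machine-dependent limits of 64-bit floats (values below are for IEEE doubles).
eps = sys.float_info.epsilon   # gap between 1.0 and the next representable float
print(eps)                     # 2.220446049250313e-16, i.e. ~15-16 decimal digits
print(sys.float_info.max)      # largest representable double, ~1.8e308
print(sys.float_info.min)      # smallest normalized positive double, ~2.2e-308
print(1.0 + eps / 2 == 1.0)    # True: a correction below eps/2 is rounded away
</syntaxhighlight>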

Example: Loss of precision in subtracting nearly equal numbers

Assume that we can represent a floating-point number with a precision of five digits only to the right of the decimal point. This is nothing but a choice of ours, but it mimics the way numbers are represented in the machine. Then we try to evaluate the function

<math> f(x) = \frac{1-\cos(x)}{\sin(x)} </math>

for small values of <math> x </math>. Note that we can also rewrite this expression by multiplying the denominator and numerator by <math> 1+\cos(x) </math> to obtain the equivalent expression

<math> f(x) = \frac{\sin(x)}{1+\cos(x)} </math>.

If we now choose <math> x = 0.007 </math> (in radians), our choice of precision results in

<math> \sin(0.007) \approx 0.69999 \times 10^{-2} </math>,

and

<math> \cos(0.007) \approx 0.99998 </math>.

The first expression for <math> f(x) </math> results in

<math> f(x) = \frac{1-0.99998}{0.69999 \times 10^{-2}}=\frac{0.2 \times 10^{-4}}{0.69999 \times 10^{-2}}=0.28572 \times 10^{-2}, </math>

while the second expression results in

<math> f(x) = \frac{0.69999 \times 10^{-2}}{1+0.99998}=\frac{0.69999 \times 10^{-2}}{1.99998}=0.35000 \times 10^{-2}, </math>

which is also the exact result. In the first expression, due to our choice of precision, we have only one relevant digit in the numerator after the subtraction. This leads to a loss of precision and a wrong result, due to the cancellation of two nearly equal numbers. If we had chosen a precision of six leading digits, both expressions would yield the same answer. If we were to evaluate <math> x \sim \pi </math>, then the second expression for <math> f(x) </math> could lead to potential losses of precision due to cancellations of nearly equal numbers.
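The same cancellation can be reproduced in ordinary double precision. The following Python sketch (a minimal illustration; the function names f1 and f2 are ours) compares the two algebraically equivalent expressions at a much smaller <math> x </math>, where even 64-bit floats lose every significant digit in the subtraction:

<syntaxhighlight lang="python">
import math

def f1(x):
    # Suffers cancellation: 1 - cos(x) subtracts two nearly equal numbers.
    return (1.0 - math.cos(x)) / math.sin(x)

def f2(x):
    # Algebraically identical, but free of the cancellation for small x.
    return math.sin(x) / (1.0 + math.cos(x))

x = 1.0e-8
print(f1(x))   # 0.0   -- all significant digits lost, since cos(x) rounds to 1.0
print(f2(x))   # 5e-09 -- correct; the exact value is tan(x/2) ~ x/2
</syntaxhighlight>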

This simple example demonstrates the loss of numerical precision due to roundoff errors, where a number of leading digits is lost in the subtraction of two nearly equal numbers. The lesson to be drawn is that we cannot blindly compute a function. We will always need to carefully analyze our algorithm in search of potential pitfalls. There is no magic recipe, however; the only guideline is an understanding of the fact that a machine cannot represent all numbers correctly.

Theory: Representation of real numbers in digital computers

Real numbers are stored with a decimal precision (or mantissa) and a decimal exponent range. The mantissa contains the significant figures of the number (and thereby the precision of the number). A number like 9.90625 in the decimal representation is given in a binary representation by

<math> (1001.11101)_2 = 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3} + 0 \times 2^{-4} + 1 \times 2^{-5} </math>
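A short Python sketch (the variable names are ours, purely for illustration) confirms that this binary expansion reproduces 9.90625, and that this particular value is exactly representable as a 64-bit float:

<syntaxhighlight lang="python">
# Check the binary expansion (1001.11101)_2 digit by digit.
bits_int  = [1, 0, 0, 1]       # coefficients of 2^3 ... 2^0
bits_frac = [1, 1, 1, 0, 1]    # coefficients of 2^-1 ... 2^-5

value  = sum(b * 2**(3 - i)    for i, b in enumerate(bits_int))
value += sum(b * 2**(-(i + 1)) for i, b in enumerate(bits_frac))
print(value)            # 9.90625
print((9.90625).hex())  # 0x1.3d00000000000p+3: a finite binary expansion
</syntaxhighlight>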