## Binary, Floats, and Modern Computers

I have been reading a lot about floats and how computers process floating-point operations. The biggest question I see when reading about them is: why are they so inaccurate? I understand this is because binary cannot exactly represent all real numbers, so the numbers are rounded to the 'best' approximation.

My question is, knowing this, why do we still use binary as the base for computer operations? Surely using a larger base number than 2 would increase the accuracy of floating-point operations exponentially, would it not?

What are the advantages of using a binary number system for computers as opposed to another base, and has another base ever been tried? Or is it even possible?

Computers are built on transistors, which have a "switched on" state, and a "switched off" state. This corresponds to high and low voltage. Pretty much all digital integrated circuits work in this binary fashion.

Ignoring the fact that transistors simply work this way, using a different base (e.g. base 3) would require these circuits to operate at one or more intermediate voltage states as well as at 0V and their highest operating voltage. This is more complicated, and can cause problems at high frequencies: how can you tell whether a signal is merely transitioning between 2V and 0V, or actually sitting at 1V?

When we get down to the floating point level, we are (as nhahtdh mentioned in their answer) mapping an infinite space of numbers down to a finite storage space. It's an absolute guarantee that we'll lose some precision. One advantage of IEEE floats, though, is that the precision is relative to the magnitude of the value.

Update: You should also check out Tunguska, a ternary computer emulator. It uses base-3 instead of base-2, which makes for some interesting (albeit mind-bending) concepts.

## printing float, preserving precision

I am writing a program that prints floating point literals to be used inside another program.

How many digits do I need to print in order to preserve the precision of the original float?

Since a float has `24 * (log(2) / log(10)) = 7.2247199` decimal digits of precision, my initial thought was that printing 8 digits should be enough. But if I'm unlucky, those `0.2247199` get distributed to the left and to the right of the 7 significant digits, so I should probably print 9 decimal digits.

Is my analysis correct? Is 9 decimal digits enough for all cases? Like `printf("%.9g", x);`?

Is there a standard function that converts a float to a string with the minimum number of decimal digits required for that value, in the cases where 7 or 8 are enough, so I don't print unnecessary digits?

Note: I cannot use hexadecimal floating-point literals, because standard C++ (before C++17) does not support them.

In order to guarantee that a binary->decimal->binary roundtrip recovers the original binary value, IEEE 754 requires

``````
The original binary value will be preserved by converting to decimal and back again using:[10]

5 decimal digits for binary16
9 decimal digits for binary32
17 decimal digits for binary64
36 decimal digits for binary128

For other binary formats the required number of decimal digits is

1 + ceiling(p*log10(2))

where p is the number of significant bits in the binary format, e.g. 24 bits for binary32.
``````

In C, the functions you can use for these conversions are snprintf() and strtof/strtod/strtold().

Of course, in some cases even more digits can be useful (no, they are not always "noise"; it depends on the implementation of the decimal conversion routines such as snprintf()). Consider e.g. printing dyadic fractions.

## Why does C# allow an *implicit* conversion from Long to Float, when this could lose precision?

A similar question, "Long in Float, why?", does not answer what I am searching for.

The C# standard allows implicit conversion from long to float. But a long greater than 2^24, when represented as a float, is liable to lose its 'value'. The C# standard clearly states that a long-to-float conversion may lose 'precision' but will never lose 'magnitude'.

My Questions are
1. In reference to integral types, what is meant by 'precision' and 'magnitude'? Isn't a number n totally different from the number n+1, unlike real numbers where 3.333333 and 3.333329 may be considered close enough for a calculation (depending on what precision the programmer wants)?
2. Isn't allowing an implicit conversion from long to float an invitation to subtle bugs, since it can cause a long to 'silently' lose value? (As a C# programmer I am accustomed to the compiler doing an excellent job of guarding me against such issues.)

So what could have been the rationale of C# language design team in allowing this conversion as implicit? What is it that I am missing here that justifies implicit conversion from long to float?

In general, floating-point numbers cannot represent most values exactly. By their nature they are inexact and subject to precision errors. It really doesn't add value to warn you about what is always the case with floating point.

## Round-twice error in .NET's Double.ToString method

Mathematically, consider for this question the rational number

``````8725724278030350 / 2**48
``````

where `**` in the denominator denotes exponentiation, i.e. the denominator is `2` to the `48`th power. (The fraction is not in lowest terms, reducible by 2.) This number is exactly representable as a `System.Double`. Its decimal expansion is

``````31.0000000000000'49'73799150320701301097869873046875 (exact)
``````

where the apostrophes do not represent missing digits but merely mark the boundaries where rounding to 15 and 17 digits, respectively, is to be performed.

Note the following: If this number is rounded to 15 digits, the result will be `31` (followed by thirteen `0`s) because the next digits (`49...`) begin with a `4` (meaning round down). But if the number is first rounded to 17 digits and then rounded to 15 digits, the result could be `31.0000000000001`. This is because the first rounding rounds up by increasing the `49...` digits to `50 (terminates)` (next digits were `73...`), and the second rounding might then round up again (when the midpoint-rounding rule says "round away from zero").

(There are many more numbers with the above characteristics, of course.)

Now, it turns out that .NET's standard string representation of this number is `"31.0000000000001"`. The question: Isn't this a bug? By standard string representation we mean the `String` produced by the parameterless `Double.ToString()` instance method, which is of course identical to what is produced by `ToString("G")`.

An interesting thing to note is that if you cast the above number to `System.Decimal` then you get a `decimal` that is `31` exactly! See this Stack Overflow question for a discussion of the surprising fact that casting a `Double` to `Decimal` involves first rounding to 15 digits. This means that casting to `Decimal` makes a correct round to 15 digits, whereas calling `ToString()` makes an incorrect one.

To sum up, we have a floating-point number that, when output to the user, is `31.0000000000001`, but when converted to `Decimal` (where 29 digits are available), becomes `31` exactly. This is unfortunate.

Here's some C# code for you to verify the problem:

``````// Requires: using System; using System.Globalization; using System.Linq;
static void Main()
{
    const double evil = 31.0000000000000497;
    string exactString = DoubleConverter.ToExactString(evil); // Jon Skeet, http://csharpindepth.com/Articles/General/FloatingPoint.aspx

    Console.WriteLine("Exact value (Jon Skeet): {0}", exactString);   // writes 31.00000000000004973799150320701301097869873046875
    Console.WriteLine("General format (G): {0}", evil);               // writes 31.0000000000001
    Console.WriteLine("Round-trip format (R): {0:R}", evil);          // writes 31.00000000000005

    Console.WriteLine();
    Console.WriteLine("Binary repr.: {0}", String.Join(", ", BitConverter.GetBytes(evil).Select(b => "0x" + b.ToString("X2"))));

    Console.WriteLine();
    decimal converted = (decimal)evil;
    Console.WriteLine("Decimal version: {0}", converted);             // writes 31
    decimal preciseDecimal = decimal.Parse(exactString, CultureInfo.InvariantCulture);
    Console.WriteLine("Better decimal: {0}", preciseDecimal);         // writes 31.000000000000049737991503207
}
``````

The above code uses Skeet's `ToExactString` method. If you don't want to use his stuff (can be found through the URL), just delete the code lines above dependent on `exactString`. You can still see how the `Double` in question (`evil`) is rounded and cast.

OK, so I tested some more numbers, and here's a table:

``````  exact value (truncated)       "R" format         "G" format     decimal cast
-------------------------  ------------------  ----------------  ------------
6.00000000000000'53'29...  6.0000000000000053  6.00000000000001  6
9.00000000000000'53'29...  9.0000000000000053  9.00000000000001  9
30.0000000000000'49'73...  30.00000000000005   30.0000000000001  30
50.0000000000000'49'73...  50.00000000000005   50.0000000000001  50
200.000000000000'51'15...  200.00000000000051  200.000000000001  200
500.000000000000'51'15...  500.00000000000051  500.000000000001  500
1020.00000000000'50'02...  1020.000000000005   1020.00000000001  1020
2000.00000000000'50'02...  2000.000000000005   2000.00000000001  2000
3000.00000000000'50'02...  3000.000000000005   3000.00000000001  3000
9000.00000000000'54'56...  9000.0000000000055  9000.00000000001  9000
20000.0000000000'50'93...  20000.000000000051  20000.0000000001  20000
50000.0000000000'50'93...  50000.000000000051  50000.0000000001  50000
500000.000000000'52'38...  500000.00000000052  500000.000000001  500000
1020000.00000000'50'05...  1020000.000000005   1020000.00000001  1020000
``````

The first column gives the exact (though truncated) value that the `Double` represents. The second column gives the string representation from the `"R"` format string. The third column gives the usual string representation. And finally the fourth column gives the `System.Decimal` that results from converting this `Double`.

We conclude the following:

• Rounding to 15 digits via `ToString()` and rounding to 15 digits via conversion to `Decimal` disagree in very many cases
• Conversion to `Decimal` also rounds incorrectly in many cases, and the errors in these cases cannot be described as "round-twice" errors
• In my cases, `ToString()` seems to yield a bigger number than `Decimal` conversion when they disagree (no matter which of the two rounds correctly)

I only experimented with cases like the above. I haven't checked if there are rounding errors with numbers of other "forms".

So from your experiments, it appears that `Double.ToString` doesn't do correct rounding.

That's rather unfortunate, but not particularly surprising: doing correct rounding for binary to decimal conversions is nontrivial, and also potentially quite slow, requiring multiprecision arithmetic in corner cases. See David Gay's `dtoa.c` code here for one example of what's involved in correctly-rounded double-to-string and string-to-double conversion. (Python currently uses a variant of this code for its float-to-string and string-to-float conversions.)

Even the current IEEE 754 standard for floating-point arithmetic recommends, but does not require, that conversions from binary floating-point types to decimal strings be correctly rounded. Here's a snippet from section 5.12.2, "External decimal character sequences representing finite numbers".

There might be an implementation-defined limit on the number of significant digits that can be converted with correct rounding to and from supported binary formats. That limit, H, shall be such that H ≥ M+3 and it should be that H is unbounded.

Here `M` is defined as the maximum of `Pmin(bf)` over all supported binary formats `bf`, and since `Pmin(float64)` is defined as `17` and .NET supports the float64 format via the `Double` type, `M` should be at least `17` on .NET. In short, this means that if .NET were to follow the standard, it would be providing correctly rounded string conversions up to at least 20 significant digits. So it looks as though the .NET `Double` doesn't meet this standard.

In answer to the 'Is this a bug' question, much as I'd like it to be a bug, there really doesn't seem to be any claim of accuracy or IEEE 754 conformance anywhere that I can find in the number formatting documentation for .NET. So it might be considered undesirable, but I'd have a hard time calling it an actual bug.

EDIT: Jeppe Stig Nielsen points out that the System.Double page on MSDN states that

Double complies with the IEC 60559:1989 (IEEE 754) standard for binary floating-point arithmetic.

It's not clear to me exactly what this statement of compliance is supposed to cover, but even for the older 1985 version of IEEE 754, the string conversion described seems to violate the binary-to-decimal requirements of that standard.

Given that, I'll happily upgrade my assessment to 'possible bug'.