One approach is to use "scaled" numbers, e.g. represent your dollar amount in cents or hundredths of cents so you are always working in integers. And it requires big integers, 32 bits is only about 9 decimal digits and the 52 bits of double floats is about 15. You really need 64 bit integers which are about 19 digits. (10 bits ~ 3 decimal digits) But that still doesn't give you the ability to deal with general purpose floating point.
So Suneido has always had a decimal floating point numeric type. (Internally, for performance, it also uses plain integers when possible.) Another advantage of a decimal type is that it is simple and quick to convert to and from string form.
Back when I first wrote Suneido (~ 15 years ago) there were no 64 bit integers in C++ compilers ("long" was 32 bits) and 32 bits wasn't sufficient precision. So I had to use multiple values to hold the coefficient. Since I had to use multiple integers anyway, to simplify overflow (by using 32 bit ints for intermediate results) I used four 16 bit ints, each one holding four decimal digits for an overall precision of 16 decimal digits. (To simplify "shifting" the exponent is in terms of the 16 bit ints, i.e. it jumps 4 decimals at a time. This "granularity" causes problems with precision. Depending on the exponent, in the worst case you get as few as 10 decimal digits of precision.)
Of course, having to use multiple integers and trying to get decent performance complicated the code, especially division. I won't claim it's the greatest code, but nevertheless it's worked reasonably well for a long time.
When I implemented jSuneido, I used Java's BigDecimal. Because of the different implementation there were a few minor differences, but they were mostly edge cases that didn't matter in practical usage. (Unfortunately I had made the external dump format for numbers mirror cSuneido's internal representation so it's a little awkward converting to and from BigDecimals.)
Recently, we've started to run into issues with using API's that deal with 64 bit integers, because we don't have enough precision to store them. In jSuneido it would be easy to bump up the BigDecimal precision to 20 digits. In theory I could do the same with cSuneido, but unfortunately, the code is fairly specific to the current precision. e.g. loops are unrolled. The thought of making this change is not pleasant :-(
The other problem is that some of the code assumes that you can convert to and from 64 bit integers losslessly. But 20 decimal digits won't always fit in a 64 bit integer.
Now that we have 64 bit integer types, the obvious answer seems to be to use a 64 bit integer for the coefficient. This will be faster and simpler than using multiple small integers, and probably faster than BigDecimal since it handles arbitrary precision. And if I used the same approach in both cSuneido and jSuneido this would ensure consistent results.
Since I'm in the middle of playing with Go, I figured I'd try writing a Go version first. It should be relatively easy to port to C++ and Java if I decide to.
It took me a couple of days to write it. One of the challenges is detecting overflow when calculating with 64 bit integers, since you don't have a larger type to use for intermediate calculations. Hacker's Delight provided a few tips for this. Another useful reference was General Decimal Arithmetic.
It's about 500 lines for add, subtract, multiply, divide, and conversion to and from strings. (That's about half the size of the cSuneido C++ code.) Since I'm new to Go, it may not be the most idiomatic code. And I have only done basic testing and refactoring. "float10" isn't the greatest name. Maybe "decimal" or even "dec"? (in keeping with Go's predilection for short names) I'm open to suggestions...
I chose to pass and return by value rather than by pointer. I'm not sure if this is the best choice for a 10 byte struct. Would it be faster to pass by pointer? Returning by pointer forces heap allocation for intermediate results which isn't ideal. Pass and return by value is a good fit for immutable values which are my preference.
Go makes it really easy to benchmark so I checked the speed of division (the slowest operation). Micro-benchmarks are always dubious, but it gave me a rough idea. It showed about 400 ns per divide (on my iMac). I don't have comparable benchmarks for cSuneido or jSuneido, but that seems pretty good. I'm pretty sure it's better than cSuneido. (Of course, it's nowhere near as fast as native binary floating point done in hardware. The same benchmark with float64 gives about 7 ns per divide, although this is so small that it's even less likely to be accurate.)
As far as evaluating Go, so far I like it. Of course, it's well suited to low level code like this. Sublime Text + GoSublime works well. (The only issues have been with learning Sublime since I haven't used it much.) I might have broken out the debugger a couple of times if I'd been working in Java or C++, but I get the impression the debugger story for Go isn't that great. I managed easily enough with old school prints :-) I plan to give Eclipse + GoClipse a try at some point since I'm already familiar with Eclipse.
The code is embedded below but it's probably easier to read (or download) on GitHub.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Package float10 implements decimal floating point numbers | |
// using uint64 to hold the coefficient. | |
package float10 | |
import "strconv" | |
import "errors" | |
import "strings" | |
import "math" | |
// value is -1^sign * coef * 10^exp | |
// zeroed value = 0 | |
type Float10 struct { | |
coef uint64 | |
sign int8 | |
exp int8 | |
} | |
const ( | |
POSITIVE = 0 | |
NEGATIVE = 1 | |
INF_EXP = math.MaxInt8 | |
) | |
var ( | |
Zero = Float10{} | |
Inf = Float10{exp: INF_EXP} | |
MinusInf = Float10{sign: NEGATIVE, exp: INF_EXP} | |
) | |
// Parse convert a string to a Float10 | |
func Parse(s string) (Float10, error) { | |
if len(s) < 1 { | |
return Zero, errors.New("cannot convert empty string to Float10") | |
} | |
if s == "0" { | |
return Zero, nil | |
} | |
var f Float10 | |
i := 0 | |
if s[i] == '+' { | |
i++ | |
} else if s[i] == '-' { | |
f.sign = NEGATIVE | |
i++ | |
} | |
before := spanDigits(s[i:]) | |
i += len(before) | |
after := "" | |
if i < len(s) && s[i] == '.' { | |
i++ | |
after = spanDigits(s[i:]) | |
i += len(after) | |
} | |
after = strings.TrimRight(after, "0") | |
coef, err := strconv.ParseUint(before+after, 10, 64) | |
if err != nil { | |
return Zero, errors.New("invalid number (" + s + ")") | |
} | |
f.coef = coef | |
exp := 0 | |
if i < len(s) && (s[i] == 'e' || s[i] == 'E') { | |
i++ | |
e, err := strconv.ParseInt(s[i:], 10, 8) | |
if err != nil { | |
return Zero, errors.New("invalid exponent (" + s + ")") | |
} | |
exp = int(e) | |
} | |
if coef == 0 { | |
return Zero, nil | |
} | |
exp -= len(after) | |
if exp < -127 || exp >= 127 { | |
return Zero, errors.New("exponent out of range (" + s + ")") | |
} | |
f.exp = int8(exp) | |
return f, nil | |
} | |
// spanDigits returns the number of leading digits | |
func spanDigits(s string) string { | |
i := 0 | |
for i < len(s) && '0' <= s[i] && s[i] <= '9' { | |
i++ | |
} | |
return s[:i] | |
} | |
// String converts a Float10 to a string. | |
// It will avoid scientific notation | |
// adding up to 4 zeroes at the end or 3 zeroes at the beginning. | |
// If the exponent is 0 it will print the number as an integer. | |
func (f Float10) String() string { | |
if f == Zero { | |
return "0" | |
} | |
sign := "" | |
if f.sign == NEGATIVE { | |
sign = "-" | |
} | |
if f.exp == INF_EXP { | |
return sign + "inf" | |
} | |
sexp := "" | |
exp := int(f.exp) | |
digits := strconv.FormatUint(f.coef, 10) | |
if 0 <= exp && exp <= 4 { | |
// decimal to the right | |
digits += strings.Repeat("0", exp) | |
} else if -len(digits)-4 < exp && exp <= -len(digits) { | |
// decimal to the left | |
digits = "." + strings.Repeat("0", -exp-len(digits)) + digits | |
digits = strings.TrimRight(digits, "0.") | |
} else if -len(digits) < exp && exp <= -1 { | |
// decimal within | |
i := len(digits) + exp | |
digits = digits[:i] + "." + digits[i:] | |
digits = strings.TrimRight(digits, "0.") | |
} else { | |
// use exponent | |
sexp = "e" + strconv.FormatInt(int64(exp), 10) | |
} | |
return sign + digits + sexp | |
} | |
// operations ------------------------------------------------------------------ | |
func (f Float10) Neg() Float10 { | |
if f == Zero { | |
return Zero | |
} else { | |
return Float10{f.coef, f.sign ^ 1, f.exp} | |
} | |
} | |
func Cmp(x Float10, y Float10) int { | |
switch { | |
case x == y: | |
return 0 | |
case x == MinusInf, y == Inf: | |
return -1 | |
case x == Inf, y == MinusInf: | |
return 1 | |
} | |
switch d := Sub(x, y); { | |
case d == Zero: | |
return 0 | |
case d.sign == NEGATIVE: | |
return -1 | |
default: | |
return +1 | |
} | |
} | |
func Add(x Float10, y Float10) Float10 { | |
switch { | |
case x == Zero: | |
return y | |
case y == Zero: | |
return x | |
case x == Inf: | |
if y == MinusInf { | |
return Zero | |
} else { | |
return Inf | |
} | |
case x == MinusInf: | |
if y == Inf { | |
return Zero | |
} else { | |
return MinusInf | |
} | |
case y == Inf: | |
return Inf | |
case y == MinusInf: | |
return MinusInf | |
case x.sign != y.sign: | |
return usub(x, y) | |
default: | |
return uadd(x, y) | |
} | |
} | |
func Sub(x Float10, y Float10) Float10 { | |
switch { | |
case x == Zero: | |
return y.Neg() | |
case y == Zero: | |
return x | |
case x == Inf: | |
if y == Inf { | |
return Zero | |
} else { | |
return Inf | |
} | |
case x == MinusInf: | |
if y == MinusInf { | |
return Zero | |
} else { | |
return MinusInf | |
} | |
case y == Inf: | |
return MinusInf | |
case y == MinusInf: | |
return Inf | |
case x.sign != y.sign: | |
return uadd(x, y) | |
default: | |
return usub(x, y) | |
} | |
} | |
func uadd(x Float10, y Float10) Float10 { | |
sign := x.sign | |
align(&x, &y) | |
if x.coef == 0 { | |
return Float10{y.coef, sign, y.exp} | |
} | |
coef := x.coef + y.coef | |
if coef < x.coef || coef < y.coef { // overflow | |
x.shiftRight() | |
y.shiftRight() | |
coef = x.coef + y.coef | |
} | |
return result(coef, sign, int(x.exp)) | |
} | |
func align(x *Float10, y *Float10) (flipped int8) { | |
if x.exp == y.exp { | |
return | |
} | |
if x.exp > y.exp { | |
*x, *y = *y, *x // swap | |
flipped = 1 | |
} | |
for y.exp > x.exp && y.shiftLeft() { | |
} | |
for y.exp > x.exp && x.shiftRight() { | |
} | |
return | |
} | |
// returns true if it was able to shift (losslessly) | |
func (f *Float10) shiftLeft() bool { | |
if !mul10safe(f.coef) { | |
return false | |
} | |
f.coef *= 10 | |
// don't decrement past min | |
if f.exp > math.MinInt8 { | |
f.exp-- | |
} | |
return true | |
} | |
const HI4 = 0xf << 60 | |
func mul10safe(n uint64) bool { | |
return (n & HI4) == 0 | |
} | |
// NOTE: may lose precision and round | |
// returns false only if coef is 0 | |
func (f *Float10) shiftRight() bool { | |
if f.coef == 0 { | |
return false | |
} | |
roundup := (f.coef % 10) >= 5 | |
f.coef /= 10 | |
if roundup { | |
f.coef++ // can't overflow | |
} | |
// don't increment past max | |
if f.exp < math.MaxInt8 { | |
f.exp++ | |
} | |
return true | |
} | |
func result(coef uint64, sign int8, exp int) Float10 { | |
switch { | |
case exp >= INF_EXP: | |
return inf(sign) | |
case exp < math.MinInt8 || coef == 0: | |
return Zero | |
default: | |
return Float10{coef, sign, int8(exp)} | |
} | |
} | |
func usub(x Float10, y Float10) Float10 { | |
sign := x.sign | |
sign ^= align(&x, &y) | |
if x.coef < y.coef { | |
x, y = y, x | |
sign ^= 1 // flip sign | |
} | |
return result(x.coef-y.coef, sign, int(x.exp)) | |
} | |
func Mul(x Float10, y Float10) Float10 { | |
sign := x.sign ^ y.sign | |
switch { | |
case x == Zero || y == Zero: | |
return Zero | |
case x.IsInf() || y.IsInf(): | |
return result(0, sign, INF_EXP) | |
} | |
x.minCoef() | |
y.minCoef() | |
if bits.Nlz(x.coef)+bits.Nlz(y.coef) >= 64 { | |
// coef won't overflow | |
if int(x.exp)+int(y.exp) >= INF_EXP { | |
return result(0, sign, INF_EXP) | |
} | |
return result(x.coef*y.coef, sign, int(x.exp)+int(y.exp)) | |
} | |
// drop 5 least significant bits off x and y | |
// 59 bits < 18 decimal digits | |
// 32 bits > 9 decmal digits | |
// so we can split x & y into two pieces | |
// and multiply separately guaranteed not to overflow | |
xlo, xhi := x.split() | |
ylo, yhi := y.split() | |
exp := x.exp + y.exp | |
r1 := result(xlo*ylo, sign, int(exp)) | |
r2 := result(xlo*yhi, sign, int(exp)+9) | |
r3 := result(xhi*ylo, sign, int(exp)+9) | |
r4 := result(xhi*yhi, sign, int(exp)+18) | |
return Add(r1, Add(r2, Add(r3, r4))) | |
} | |
// makes coef as small as possible (losslessly) | |
// i.e. trim trailing zero decimal digits | |
func (f *Float10) minCoef() { | |
for f.coef > 0 && f.coef%10 == 0 { | |
f.shiftRight() | |
} | |
} | |
// makes coef as large as possible (losslessly) | |
// i.e. trim leading zero decimal digits | |
func (f *Float10) maxCoef() { | |
for f.shiftLeft() { | |
} | |
} | |
func (f *Float10) split() (lo, hi uint64) { | |
const HI5 = 0x1f << 59 | |
for f.coef&HI5 != 0 { | |
f.shiftRight() | |
} | |
const NINE = 1000000000 | |
return f.coef % NINE, f.coef / NINE | |
} | |
func Div(x Float10, y Float10) Float10 { | |
sign := x.sign ^ y.sign | |
switch { | |
case x == Zero: | |
return Zero | |
case y == Zero || x.IsInf(): | |
return inf(sign) | |
case y.IsInf(): | |
return Zero | |
} | |
if x.coef%y.coef == 0 { | |
// divides evenly | |
return result(x.coef/y.coef, sign, int(x.exp)-int(y.exp)) | |
} | |
x.maxCoef() | |
y.minCoef() | |
if x.coef%y.coef == 0 { | |
// divides evenly | |
return result(x.coef/y.coef, sign, int(x.exp)-int(y.exp)) | |
} | |
return longDiv(x, y) | |
} | |
func longDiv(x Float10, y Float10) Float10 { | |
// shift y so it is just less than x | |
xdiv10 := x.coef / 10 | |
for y.coef < xdiv10 && y.shiftLeft() { | |
} | |
exp := int(x.exp) - int(y.exp) | |
rem := x.coef | |
ycoef := y.coef | |
coef := uint64(0) | |
// each iteration calculates one digit of the result | |
for rem != 0 && mul10safe(coef) { | |
// shift so y is less than the remainder | |
for ycoef > rem { | |
rem, ycoef = shift(rem, ycoef) | |
coef *= 10 | |
exp-- | |
} | |
if ycoef == 0 { | |
break | |
} | |
// subtract repeatedly | |
for rem >= ycoef { | |
rem -= ycoef | |
coef++ | |
} | |
} | |
// round final digit | |
if 2*rem >= ycoef { | |
coef++ | |
} | |
return result(coef, x.sign^y.sign, exp) | |
} | |
// shift x left (preferably) or y right | |
func shift(x, y uint64) (x2, y2 uint64) { | |
if mul10safe(x) { | |
x *= 10 | |
} else { | |
roundup := (y % 10) >= 5 | |
y /= 10 | |
if roundup { | |
y++ | |
} | |
} | |
return x, y | |
} | |
func (f Float10) IsInf() bool { | |
return f.exp == INF_EXP | |
} | |
func inf(sign int8) Float10 { | |
switch sign { | |
case POSITIVE: | |
return Inf | |
case NEGATIVE: | |
return MinusInf | |
default: | |
panic("invalid sign") | |
} | |
} | |
// Nlz returns the number of leading zero bits | |
// Algorithm from Hacker's Delight by Henry Warren | |
func Nlz(x uint64) int { | |
if x == 0 { | |
return 64 | |
} | |
n := 1 | |
if (x >> 32) == 0 { | |
n = n + 32 | |
x = x << 32 | |
} | |
if (x >> 48) == 0 { | |
n = n + 16 | |
x = x << 16 | |
} | |
if (x >> 56) == 0 { | |
n = n + 8 | |
x = x << 8 | |
} | |
if (x >> 60) == 0 { | |
n = n + 4 | |
x = x << 4 | |
} | |
if (x >> 62) == 0 { | |
n = n + 2 | |
x = x << 2 | |
} | |
n = n - int(x>>63) | |
return n | |
} |
3 comments:
Based on just a quick browse of the source code, it looks pretty good. Two things popped out at me. One is that elements with common type can be elided, so
func Add(x, y Float10) Float10
rather than
func Add(x Float10, y Float10) Float10
Second, I don't like the Nlz function name (coming from someone who doesn't do decimal math). Maybe LeadZero, or LZero or something? Nlz sounds like some form of algorithms. LeadZero combined with the signature is pretty clear, though maybe too long for the simplicity. LZero at least hilights the zero rather than the N
Thanks for the review.
Yeah, I still forget about eliding common types, and reading it still takes me a minute to figure out what it means. My first response is to think there's a type missing. I'll get used to it.
"Nlz" (Number of Leading Zeroes) is a somewhat standard name (e.g. in Hacker's Delight) I agree it's a bit cryptic if you're not familiar with it.
I wonder why gofmt doesn't clean up the common types? Does that mean there isn't a consensus on whether that's "good" or not?
Could you clean this up with a fmt rewrite rule?
Post a Comment