PROPOSAL FOR AN OPTIONAL IEEE 754, BINARY FLOATING-POINT WORD SET

version 0.5.1
dnw 7-Jun-09

TABLE of CONTENTS

  1 INTRODUCTION
  2 TERMINOLOGY AND NOTATION
  3 IMPLEMENTATION
  4 DATA TYPES
  5 ENVIRONMENTAL QUERIES
  6 TEXT INPUT
    6.1 Decimal input
    6.2 Hexadecimal input
  7 GLOSSARY
    7.1 Conversion
    7.2 Output
    7.3 Comparison
    7.4 Classification
    7.5 Arithmetic
    7.6 Math functions
    7.7 Sign bit operations
    7.8 Nearest integer functions
    7.9 Number manipulation
    7.10 Exceptions
    7.11 Rounding modes
  8 REFERENCES AND FOOTNOTES


1 INTRODUCTION

This is a proposal for an optional Forth 200x word set that supports the binary part of the IEEE 754-2008 standard for floating-point arithmetic [1]. The most recent freely available, but less comprehensive, version is IEEE 754 draft 1.2.9, January 27, 2007 [2]. There is also a Wikipedia summary [3].

This specification requires that ISO Forth [8,9] floating-point and floating-point extension words in the optional Floating-Point word set, when present, satisfy additional IEEE 754-2008 requirements. Words in that word set and in this one that correspond to operations or functions in IEEE 754-2008 adopt, by reference, the behavior required or recommended there, as far as that is possible and makes sense, unless otherwise stated.

The specification is compatible with, rather than conformant to, IEEE 754-2008, because not all IEEE requirements are to be implemented, and some are qualified here by "should" rather than an IEEE "shall".

The current C99 standard [4-7] has a comprehensive treatment of IEEE 754-1985, which offers a route to implementation for those Forth systems that can call C libraries.

[**Bracketed statements like this are for editorial questions and comments, eventually to be removed.]


2 TERMINOLOGY AND NOTATION

"IEEE 754" or "IEEE": IEEE 754-2008. Section numbers in the standard [1] are indicated by "IEEE" followed by the section number, as in IEEE 5.12.

"C99": ISO/IEC 9899:1999 C [4].

"DPANS94": American National Standard for Forth, ANSI X3.215-1994, final draft [8]. The final draft is believed to be the same as the published version, which is now ISO/IEC 15145:1997 [9]. This document adopts the official terminology of DPANS94 unless otherwise stated. Section numbers in that document are indicated by "DPANS94" followed by the section number, as in DPANS94 12.3.7.

"fp" or "bfp": Short for "binary floating point". The Forth floating-point stack is called the "fp stack".

"special number": Signed zero, a quiet or signaling signed nan, or signed infinity.

"full set of numbers": For an IEEE binary format, the set of normal, subnormal, and special numbers that it represents.

"IEEE number": Any member of a full set of numbers.

"IEEE arithmetic": Arithmetic defined by IEEE 754-2008 for IEEE numbers.

"nan load" or "nan payload": The value of the fractional bits in the binary format of a nan, excluding the quiet bit, considered as a positive integer. The smallest signaling load is unity, and the smallest quiet load is zero.

"qnan", "snan": Any quiet or signaling nan, respectively, of any sign or load.

"single": In the context of Forth fp, the IEEE 754-2008 32-bit interchange format.

"double": In the context of Forth fp, the IEEE 754-2008 64-bit interchange format.

"default": In the context of Forth fp, the default DPANS94 float format for numbers that can appear on the fp stack.

"exception": Used in the IEEE sense, rather than the throw sense. [**Anton suggests a new terminology is needed for this.]

[**Other terminology defined by IEEE has probably been used in this document and needs to be spelled out.]

Each IEEE bfp format has two fixed parameters, p > 0 (precision) and emax > 0 (maximum exponent), and defines emin = 1 - emax (minimum exponent). Each such format represents all real numbers of the form

    r = (-1)^s * 2^e * b_0.b_1 ... b_{p-1}

where s = 0 or 1, emin <= e <= emax, b_i = 0 or 1, and p is the number of significand bits.

IEEE 754-2008 defines three basic binary fp formats, binary32, binary64, and binary128, plus three corresponding extended binary formats, whose parameters are shown in Tables 1 and 2 below. It also defines four binary interchange formats, of which only three are shown in Table 3 (IEEE binary16 is omitted).

Table 1: Parameters for IEEE 754-2008 basic binary formats.

                binary32   binary64   binary128
  ---------------------------------------------
  p    =        24         53         113
  emax =        127        1023       16383

Table 2: Parameters for IEEE 754-2008 extended binary formats.

                binary32   binary64   binary128
  ---------------------------------------------
  p    >=       32         64         128
  emax >=       1023       16383      65535

Table 3: Parameters for IEEE 754-2008 binary interchange formats (k is the storage width in bits).

                binary32   binary64   binary128
  ---------------------------------------------
  k    =        32         64         128
  p    =        24         53         113
  emax =        127        1023       16383

Note that the Intel 80-bit format corresponds to binary64 extended, with p = 64 and emax = 16383. Its precision is greater than that of basic binary64 and less than that of basic binary128, and its exponent range is the same as that of basic binary128. Although it is not defined as a basic IEEE binary format, we call it the "binary80" basic format. Note also that binary128 is the only interchange format in Table 3 that can contain the binary80 basic format.


3 IMPLEMENTATION

According to DPANS94, Section 3, "Usage requirements":

  A system shall provide all of the words defined in 6.1 Core Words. It may also provide any words defined in the optional word sets and extension word sets.

The DPANS94 Floating-Point word set is an optional word set, and so is the word set described by this document.

The word "shall" in the remainder of this document states a requirement when the environmental query for IEEE-FP returns true. "Should" means "strongly recommended".

The internal fp representation of default fp numbers, i.e., numbers that can appear on the fp stack, shall correspond to one of the IEEE basic or extended full binary formats.


4 DATA TYPES

The DPANS94 r type is extended to include the special numbers.


5 ENVIRONMENTAL QUERIES

[**UNDER CONSTRUCTION]

A true result for the IEEE-FP environmental query shall mean that at least the DPANS94 floating-point words are present, and that all new [**required] words in this document are present.
Those and any DPANS94 floating-point words that are present shall obey the specifications of this document.

  Value String  Data Type  Constant?  Meaning
  ------------------------------------------------------------------
  IEEE-FP       n          no         binary float format:
                                        n=0,   extended and not listed below
                                        n=32,  binary32
                                        n=64,  binary64
                                        n=80,  binary80
                                        n=128, binary128
  ------------------------------------------------------------------

[**Case n=0 seems unlikely. Better way to handle it? How can a portable application know its parameters when n=0?]


6 TEXT INPUT

6.1 Decimal input
-----------------

IEEE requires that conversion between text and binary fp formats shall include signed zero, signed infinity, and signed nans, with and without loads. See IEEE 5.4.2, "Conversion operations for floating-point formats and decimal character sequences", and IEEE 5.12, "Details of conversion between floating-point data and external character sequences".

The syntax specification in DPANS94 12.3.7, "Text input number conversion", shall be replaced by

  Convertible string := { <significand><exponent> | <special> }
      [**Separate RfD to allow []. as well.]
  <significand> := [<sign>]{ <digits>[.<digits0>] | .<digits> }
      [**Note the extra .<digits> option.]
  <exponent>    := { <e-char>[<sign>]<digits> | <e-char> }
      [**Note that a sign with no digits is not allowed.]
  <digits>      := <digit><digits0>
  <digits0>     := <digit>*
  <sign>        := { + | - }
  <e-char>      := { E | e }
  <special>     := [<sign>]{ <inf> | <nan> }
  <inf>         := { Inf | inf | INF | Infinity | infinity | INFINITY }
  <nan>         := <nan-chars>[:<digits>]
  <nan-chars>   := { NaN | nan | NAN }

Interpretation shall convert to the default IEEE format, with roundTiesToEven for real numbers, subject to overflow and underflow exceptions. See IEEE 7.4 for overflow and IEEE 7.5 for underflow.

The special number produced for an unloaded NaN without a sign or with a plus sign is the positive quiet nan with zero load, and that produced by an unloaded negative NaN is its negative.

[**Make overflow/underflow ambiguous conditions?]

Nan loads shall be encoded as if they were binary integers fitting into the trailing significand field of the default fp format, with corresponding least significant bits.
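As an illustration only, here is a minimal C sketch of this load encoding rule. It assumes that the default fp format is binary64, that C's double uses that encoding, and that the quiet bit is the most significant bit of the trailing significand field, as IEEE 6.2.1 recommends; the helper names make_nan and nan_load are hypothetical and not part of this proposal. The sketch rejects a load that does not fit in the remaining 51 bits (the case of an input load too large for the default format), and a zero signaling load, whose bit pattern would be infinity.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, assuming the default fp format is IEEE binary64,
   that C's double uses that encoding, and that the quiet bit is the most
   significant bit of the trailing significand field (IEEE 6.2.1). */
#define FRAC_BITS 52                               /* trailing significand */
#define QUIET_BIT ((uint64_t)1 << (FRAC_BITS - 1)) /* top fraction bit */
#define LOAD_MASK (QUIET_BIT - 1)                  /* 51 bits for the load */
#define EXP_MASK  ((uint64_t)0x7FF << FRAC_BITS)   /* all-ones exponent */
#define SIGN_BIT  ((uint64_t)1 << 63)

/* Encode a nan from its sign, kind, and load, placing the load in the
   least significant bits of the trailing significand field.
   Returns -1, leaving *out untouched, if the load does not fit or if a
   signaling nan would have a zero load (that bit pattern is infinity). */
int make_nan(int negative, int quiet, uint64_t load, double *out)
{
    uint64_t bits;

    if (load > LOAD_MASK || (!quiet && load == 0))
        return -1;
    bits = EXP_MASK | load;
    if (quiet)
        bits |= QUIET_BIT;
    if (negative)
        bits |= SIGN_BIT;
    memcpy(out, &bits, sizeof bits);  /* copy the bits; no fp arithmetic */
    return 0;
}

/* Recover the load: the trailing significand without the quiet bit. */
uint64_t nan_load(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);
    return bits & LOAD_MASK;
}
```

Copying the bit pattern with memcpy, rather than through a pointer cast, sidesteps C's aliasing rules and keeps the nan out of any fp arithmetic that might change it.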
An ambiguous condition exists if the input load is too large for the default format.

[**IEEE is not specific about the least significant bit, AFAICT, but this makes sense to me.]

[**Separate RfD for some of this? If so, which part?]


6.2 Hexadecimal input
---------------------

IEEE requires a text format for numbers with a hexadecimal significand and a decimal exponent of radix two, with exact conversion to and from binary fp formats where possible. See IEEE 5.12.3, "External hexadecimal-significand character sequences representing finite numbers".

The following syntax, with a hexadecimal significand and a decimal radix-two exponent, is based on IEEE 5.12.3, and is believed to be the same as that of C99. The definition of <digit> is the natural one, and that of <hex-digit> is also natural, with both upper and lower case allowed.

  Convertible string := <hex-significand><binary-exponent>
  <hex-significand>  := [<sign>]<hex-prefix>{ <hex-digits>[.<hex-digits0>] | .<hex-digits> }
  <binary-exponent>  := { <p-char>[<sign>]<digits> | <p-char> }
      [**Note that a sign with no digits is not allowed.]
  <hex-digits>       := <hex-digit><hex-digits0>
  <hex-digits0>      := <hex-digit>*
  <digits>           := <digit><digits0>
  <digits0>          := <digit>*
  <sign>             := { + | - }
  <p-char>           := { P | p }
  <hex-prefix>       := { 0X | 0x }

Interpretation shall convert real numbers expressed with the above syntax to the default fp format, with rounding via roundTiesToEven, subject to overflow and underflow exceptions. See IEEE 7.4 for overflow and IEEE 7.5 for underflow. When the input number is exactly representable in the default format, the conversion shall be exact.

[**Make overflow/underflow ambiguous conditions?]

[**Separate RfD? The sentiment in comp.lang.forth seemed to be yes, but I'm not sure that took into account the IEEE requirement.]


7 GLOSSARY

Unless otherwise stated, all fp words that do computations or comparisons shall obey the requirements and recommendations of IEEE 5 and IEEE 6 for binary formats.

From IEEE 4, "Attributes and rounding", this specification adopts only IEEE 4.3, "Rounding-direction attributes", including roundTiesToEven, with the following caveat. The directed rounding modes, roundTowardPositive, roundTowardNegative, and roundTowardZero, need be implemented only if supported by the underlying cpu.
The roundTiesToAway mode, which IEEE does not require, need not be implemented. See Sec. 7.11, "Rounding modes".


7.1 Conversion
--------------

D>F        ( d -- f: r )
F>D        ( f: r -- s: d )

The DPANS94 specification is amended to include the specification of IEEE 5.8, "Details of conversions from floating-point to integer formats". In particular, when the input r is signed zero, the output integer is zero; and when the input integer is zero, the output r is +0. [**IEEE mentions nan input, not that clear to me.]

>FLOAT     ( c-addr u -- f: r s: flag )

The DPANS94 specification is extended to include special numbers, with modified syntax. Conversion shall be governed by IEEE 5.12, "Details of conversion between floating-point data and external character sequences", except for paragraph 4 on hexadecimal conversions. [**Separate RfD for hex? A separate word, say, MIX>FLOAT ?]

A string of blanks [**or the empty string?] shall be treated as +0. [**Allow formatting blanks or characters?]

Syntax of a convertible string:

  Convertible string := { <significand>[<exponent>] | <special> }
  <significand> := [<sign>]{ <digits>[.<digits0>] | .<digits> }
  <exponent>    := <marker><digits0>
  <marker>      := { <e-form> | <sign-form> }
  <e-form>      := <e-char>[<sign-form>]
  <sign-form>   := { + | - }
  <e-char>      := { D | d | E | e }
  <special>     := [<sign>]{ <inf> | <nan> }
  <inf>         := { Inf | inf | INF | infinity | Infinity }
  <nan>         := { NaN | nan | NAN }

[**mh prefers a mandatory in . dnw favors keeping it optional, in line with the omnivorous philosophy of DPANS94 A.12.6.1.0558.]

REPRESENT  ( f: r s: c-addr u -- n flag1 flag2 )

The DPANS94 specification is extended to include special numbers. The valid-result flag, flag2, is false if r is infinity or a nan.

[**This needs discussion.
  1. Should the current rounding mode be used, instead of roundTiesToEven?
  2. Marcel says "It should be possible to format/output +Inf or +NaN, also with the existing REPRESENT." Other opinions? If a nan is output, should its load be output as well?]

SF!        ( f: r s: sf-addr -- )
SF@        ( sf-addr -- f: r )
DF!        ( f: r s: df-addr -- )
DF@        ( df-addr -- f: r )

The specification for these DPANS94 words is amended to explicitly require conversion to or from the respective IEEE binary32 or binary64 interchange format, with exact conversion in either direction for signed zero and signed infinity, exact conversion of real numbers to a wider format, and roundTiesToEven for conversion of real numbers to a narrower format (see IEEE 5.4.2, formatOf-convertFormat). The conversion of nans shall preserve the sign and signaling bit, shall not signal an exception, and should treat payloads according to IEEE 6.2.3.

[**Should the current rounding mode be used, instead of roundTiesToEven?]


7.2 Output
----------

F.         ( f: r -- )
FE.        ( f: r -- )
FS.        ( f: r -- )

The DPANS94 specification is extended to include the special numbers, with output text of the appropriate form below, up to implementation-dependent letter case:

  [<sign>]0{ E | e }
  [<sign>]<inf>
  [<sign>]<nan>[:<digits>]

[**Default rounding? Or use the current rounding mode?]


7.3 Comparison
--------------

[**Much of the following might go into a rationale section.]

IEEE has twenty-two required comparisons, which apply to the full set of numbers in any implemented fp format. Twelve of these are quiet, and ten are signaling. See IEEE 5.6.1, "Comparisons", and IEEE 5.11, "Details of comparison predicates". This proposal requires only the quiet comparisons, which do not signal an exception for quiet nan operands.

IEEE identifies four fundamental, mutually exclusive comparisons: less than ("<"), equal ("="), greater than (">"), and unordered ("?"). Each of these is true iff all of the others are false. We call these comparisons "fundamentals". Together with logical negation and OR combinations, the fundamentals can be used to express all twelve required comparisons. The basic rules are the following:

  * The sign of zero is ignored.
  * The sign of infinity is not ignored, and is treated in the natural way for the "ordinary" comparisons with real numbers or infinity, namely <, =, and >.
    In particular, either signed infinity is equal to itself.
  * The r1 ? r2 unordered comparison is true iff at least one of r1 and r2 is a nan. That implies that any of the other three, "ordinary" comparisons involving a nan is false.

The twelve required comparisons are the following, where "N" indicates logical negation:

  <   =   >   ?   N<   N=   N>   N?   <=   >=   <?   >?

The four comparisons that involve two fundamentals, such as "<=", are defined as the OR of those fundamentals, as the notation suggests. The "<=" and ">=" comparisons are no longer simple negations of ">" and "<", but rather the ANDs of those negations with "N?", so fp stack gymnastics cannot be avoided if they are to be expressed in terms of fundamentals. It can be shown that "<?" is N(>=), and ">?" is N(<=). See IEEE Table 5.3, "Required unordered-quiet predicates and negations".

DPANS94 has only F< and F~. Since IEEE "=" is semantically different from 0E F~, adding F= is inevitable. IEEE ">" is not expressible in terms of "<" and "=" plus logical operations, but F> is expressible in Forth as FSWAP F<. Furthermore, IEEE "?" is expressible as the AND of the negations of the other three. Thus there is an absolute minimum of two Forth words, say F< and F=, which are sufficient to express all twelve IEEE comparisons, with the help of 0=, AND, OR, and fp stack gymnastics. The stack gymnastics, however, are already pretty bad for combinations of two fundamentals, like F<=.

The upshot is that we propose two required and six recommended Forth words, which can express the twelve IEEE comparisons with the help of negations. As negations of the recommended F>= and F<=, F<? and F>? could have been left out. They are included because their names have more intuitive meaning than the corresponding 0= phrases. Implementation of the recommended words can be either primitive or high level. The same discussion applies to the F0< family of words.

F<      ( f: r1 r2 -- s: [r1<r2]? )
F=      ( f: r1 r2 -- s: [r1=r2]? )
F>      ( f: r1 r2 -- s: [r1>r2]? )       rec
F?      ( f: r1 r2 -- s: [r1?r2]? )       rec
F<=     ( f: r1 r2 -- s: [r1<=r2]? )      rec
F>=     ( f: r1 r2 -- s: [r1>=r2]? )      rec
F<?     ( f: r1 r2 -- s: [r1<?r2]? )      rec
F>?     ( f: r1 r2 -- s: [r1>?r2]? )      rec

F0<     ( f: r -- s: [r<0]? )
F0=     ( f: r -- s: [r=0]? )
F0>     ( f: r -- s: [r>0]? )             rec
F0?     ( f: r -- s: [r?0]? )             rec
F0<=    ( f: r -- s: [r<=0]? )            rec
F0>=    ( f: r -- s: [r>=0]? )            rec
F0<?    ( f: r -- s: [r<?0]? )            rec
F0>?    ( f: r -- s: [r>?0]? )            rec

The DPANS94 specification for F<, F0<, and F0= is extended to the IEEE comparison semantics for the special numbers, with the indicated predicate flag. The remaining words implement the indicated IEEE comparisons. Words marked with "rec" are recommended. The others are required.

[**The "?" for "unordered" can of course be confused with flag notation, although it does have a bit of the right connotation. Maybe FU and F0U, etc.?]

[**High level reference implementations of the recommended words to be added later.]

[**Personally, I would add recommended words for the negations of the four fundamentals, FN<, etc. :-) ]

F~      ( f: r1 r2 r3 -- s: flag )

If the sign of r3 is plus, flag is true iff the absolute value of r1 minus r2 is less than r3, taking into account IEEE arithmetic and comparison rules. If r3 is signed zero, flag is true iff r1 and r2 have identical encodings. If the sign of r3 is minus, flag is true iff the absolute value of r1 minus r2 is less than the absolute value of r3 times the sum of the absolute values of r1 and r2, taking into account IEEE arithmetic and comparison rules.

[**Double check that the middle case is identical to the other two when r3 is signed zero.]


7.4 Classification
------------------

IEEE 5.7.2, "General operations", requires a large number of classification operations. This document defines only those corresponding to:

  isSignMinus   isNormal     isFinite   isZero
  isSubnormal   isInfinite   isNaN      isSignaling

Since isSignMinus corresponds to FSIGNBIT, and isZero to F0=, that leaves the following:

FINITE?       ( f: r -- s: [zero|normal|subnormal]? )
FNORMAL?      ( f: r -- s: normal? )
FSUBNORMAL?   ( f: r -- s: subnormal? )
FINFINITE?    ( f: r -- s: [+|-]Inf? )
FNAN?         ( f: r -- s: nan? )
FSIGNALING?   ( f: r -- s: snan? )

[**Should FQUIET? be recommended?]


7.5 Arithmetic
--------------

See IEEE 5.4.1, "Arithmetic operations".

F*      ( f: r1 r2 -- r1*r2 )
F*+     ( f: r1 r2 r3 -- [r2*r3]+r1 )     new
F+      ( f: r1 r2 -- r1+r2 )
F-      ( f: r1 r2 -- r1-r2 )
F/      ( f: r1 r2 -- r1/r2 )
FSQRT   ( f: r -- sqrt[r] )

The DPANS94 specification is extended to IEEE arithmetic. See IEEE 5.1, "Overview", for precision, rounding, special number treatment, and exceptions, and IEEE 5.4.1, "Arithmetic operations", for the arithmetic words.

[**F*+ is the Forth name for the IEEE required fusedMultiplyAdd operation.]


7.6 Math functions
------------------

The Forth words FABS, FMAX, FMIN, and FSQRT are covered elsewhere. The DPANS94 specification for the following words is extended to adopt the corresponding IEEE behavior. See IEEE 9.2, "Recommended correctly rounded functions", and IEEE 9.2.1, "Special values".

  F**       FACOS     FACOSH    FALOG     FASIN     FASINH
  FATAN     FATAN2    FATANH    FCOS      FCOSH     FEXP
  FEXPM1    FLN       FLNP1     FLOG      FSIN      FSINCOS
  FSINH     FTAN      FTANH

IEEE recommends additional functions, whose recommended Forth names are:

  FEXP2     FEXP2M1   FEXP10    FEXP10M1  FLOG2     FLOGP1
  FLOG2P1   FHYPOT    1/FSQRT   FCOMPOUND FROOTN    F**N
  |F|**     FSINPI    FCOSPI    FATANPI   FATAN2PI

[**Separate RfD?]


7.7 Sign bit operations
-----------------------

FSIGNBIT    ( f: r -- s: minus? )         C99:7.12.3.6

The following are all required by IEEE. See IEEE 5.5.1, "Sign bit operations". The IEEE copy() function is superfluous in Forth [**IIUC].

FNEGATE     ( f: r -- -r )
FABS        ( f: r -- |r| )

The DPANS94 specification is extended to the special numbers.

FCOPYSIGN   ( f: r1 r2 -- r3 )

The output r3 is r1 with its sign bit replaced by that of r2.
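For Forth systems that can call C libraries, the sign bit words map directly onto C99 <math.h>: FSIGNBIT onto signbit() (C99 7.12.3.6), FABS onto fabs(), FCOPYSIGN onto copysign(), and FNEGATE onto unary minus. The following sketch assumes that the default fp format is C's double in IEEE binary64; the f_ wrapper names are hypothetical. It is meant to show that these operations act only on the sign bit, so they behave sensibly for signed zero, infinities, and nans, where ordinary comparisons do not.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical C-side wrappers for the 7.7 words, assuming the Forth
   default fp format is C99 double in IEEE binary64.  Each one acts on
   the sign bit only, so it works for signed zero, infinities, and nans. */

/* FSIGNBIT ( f: r -- s: minus? ): true for -0 and negative nans too,
   unlike the comparison r < 0. */
bool f_signbit(double r) { return signbit(r) != 0; }

/* FNEGATE ( f: r -- -r ): flips the sign bit, so +0 becomes -0. */
double f_negate(double r) { return -r; }

/* FABS ( f: r -- |r| ): clears the sign bit. */
double f_abs(double r) { return fabs(r); }

/* FCOPYSIGN ( f: r1 r2 -- r3 ): r1 with the sign bit of r2. */
double f_copysign(double r1, double r2) { return copysign(r1, r2); }
```

For example, f_signbit(-0.0) is true while -0.0 < 0.0 is false, which is why FSIGNBIT, rather than F0<, corresponds to IEEE isSignMinus.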
7.8 Nearest integer functions
-----------------------------

FCEIL        ( f: r1 -- r2 )      new
FLOOR        ( f: r1 -- r2 )
FROUND       ( f: r1 -- r2 )
FNEARBYINT   ( f: r1 -- r2 )      new
FTRUNC       ( f: r1 -- r2 )      new

These words correspond to the respective IEEE required operations:

  FCEIL        roundToIntegralTowardPositive
  FLOOR        roundToIntegralTowardNegative
  FROUND       roundToIntegralTiesToEven
  FNEARBYINT   roundToIntegralExact
  FTRUNC       roundToIntegralTowardZero

See IEEE 5.3.1, "General operations", and IEEE 5.9, "Details of operations to round a floating-point datum to integral value". The names are based on C99. FTRUNC has already passed a CfV. FNEARBYINT performs the function of whichever of the other four corresponds to the current rounding mode [**IIUC]. No word is defined for IEEE roundToIntegralTiesToAway.

[**Separate RfD for the new words?]


7.9 Number manipulation
-----------------------

FMAX        ( f: r1 r2 -- r3 )
FMIN        ( f: r1 r2 -- r3 )

The DPANS94 specification is extended to IEEE behavior for the special numbers. See minNum and maxNum in IEEE 5.3.1, "General operations", and IEEE 6.2, "Operations with NaNs".

[**IEEE also requires the equivalent of |F|MAX and |F|MIN. Should we? It also requires an FREMAINDER.]

FNEXTUP     ( f: r1 r2 -- r3 )

FNEXTUP returns the next number after r1 in the direction of r2. See IEEE 5.3.1, "General operations". FNEXTDOWN is not defined; according to IEEE, nextDown(x) is -nextUp(-x).

FSCALBN     ( f: r s: n -- f: r*2^n.r )

The output is r efficiently scaled by 2^n. See IEEE 5.3.3, "logBFormat operations".

FLOGB       ( f: r -- e.r )

Leave the radix-two exponent e of the fp representation of r. If r is subnormal, the exponent is computed as if r were normalized, with e < emin. See IEEE 5.3.3, "logBFormat operations".


7.10 Exceptions
---------------

[**UNDER CONSTRUCTION]


7.11 Rounding modes
-------------------

[**UNDER CONSTRUCTION]

From IEEE 9.3, "Operations on dynamic modes for attributes", we define only words corresponding to IEEE 9.3.1, "Operations on individual dynamic modes". As stated earlier, roundTiesToAway is not defined.
[**This scheme is based on a suggestion by Andrew Haley in comp.lang.forth, which emerged from a discussion with Anton Ertl.]

NEAR-ROUNDING      ( -- )
CEIL-ROUNDING      ( -- )
FLOOR-ROUNDING     ( -- )
TRUNC-ROUNDING     ( -- )
NEAR-ROUNDING{     ( -- )     compilation only
CEIL-ROUNDING{     ( -- )     compilation only
FLOOR-ROUNDING{    ( -- )     compilation only
TRUNC-ROUNDING{    ( -- )     compilation only
}ROUNDING          ( -- )     compilation only

The first four words set the current rounding mode to roundTiesToEven, roundTowardPositive, roundTowardNegative, and roundTowardZero, respectively.

The next four, compilation-only words first save the current rounding mode, then set the current mode as indicated. They must occur in balanced pairs with }ROUNDING (within the same word definition), which restores the mode saved by the first member of the pair. The return stack may be used for saving and restoring the mode, which entails the usual restrictions on the use of these words with DO ... LOOP, locals, etc.


8 REFERENCES AND FOOTNOTES

[1] "IEEE Standard for Floating-Point Arithmetic", approved June 12, 2008 as IEEE Std 754-2008:
    http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935
[2] "DRAFT Standard for Floating-Point Arithmetic P754", IEEE 754 draft 1.2.9, January 27, 2007:
    http://www.validlab.com/754R/nonabelian.com/754/comments/Q754.129.pdf
[3] Wikipedia, "IEEE 754-2008":
    http://en.wikipedia.org/wiki/IEEE_754
[4] ISO/IEC 9899:1999 (December 1, 1999), ISO/IEC 9899:1999 Cor. 1:2001(E), ISO/IEC 9899:1999 Cor. 2:2004(E), ISO/IEC 9899:1999 Cor. 3:2007(E):
    http://www.open-std.org/jtc1/sc22/wg14/www/standards.html#9899
[5] C99 + TC1 + TC2 is included in WG14/N1124, May 6, 2005:
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
[6] TC3:
    http://www.iec.ch/cgi-bin/getcorr.pl/yab/iso/isoiec9899-cor3{ed1.0}en.pdf?file=iso/isoiec9899-cor3{ed1.0}en.pdf
[7] Single UNIX 3:
    http://www.unix.org/single_unix_specification/
[8] ANSI X3.215-1994 final draft:
    http://www.taygeta.com/forth/dpans.html
[9] ISO/IEC 15145:1997:
    http://webstore.ansi.org/RecordDetail.aspx?sku=ISO%2fIEC+15145%3a1997
    http://www.iso.org/iso/catalogue_detail.htm?csnumber=26479