PROPOSAL FOR AN OPTIONAL IEEE 754, BINARY FLOATING-POINT WORD SET

version 0.5.2
dnw 07-Jul-09

TABLE of CONTENTS

1 INTRODUCTION
2 TERMINOLOGY AND NOTATION
2a IEEE BINARY FLOATING-POINT FORMATS
3 IMPLEMENTATION
4 DATA TYPES
5 ENVIRONMENTAL QUERIES
6 TEXT INPUT
  6.1 Constants
  6.2 Decimal input
  6.3 Hexadecimal input
7 GLOSSARY
  7.1 Conversion
  7.2 Output
  7.3 Comparison
  7.4 Classification
  7.5 Arithmetic
  7.6 Math functions
  7.7 Sign bit operations
  7.8 Nearest integer functions
  7.9 Number manipulation
  7.10 Exceptions
  7.11 Rounding modes
8 REFERENCES
A.2a IEEE BINARY FLOATING-POINT FORMATS
A.6.1 NAN SIGNS AND LOADS
A.7.3 COMPARISON (INFORMATIVE RATIONALE)


1 INTRODUCTION

This is a proposal for an optional Forth 200x word set, called the "IEEE floating-point word set", that supports the binary part of the IEEE 754-2008 standard for floating-point arithmetic [1]. The most recent, freely available, but less comprehensive version is IEEE 754 draft 1.2.9, January 27, 2007 [2]. There is also a Wikipedia summary [3]. The standard [1] is hereafter referred to as "IEEE 754-2008", with section numbers indicated by the prefix "IEEE".

This specification requires that ISO Forth [4,5] floating-point and floating-point extension words in the optional floating-point word set, when present *with the IEEE floating-point word set*, satisfy additional IEEE 754-2008 requirements.

Words in that word set and in this one that correspond to mathematical, including logical, operations or functions in IEEE 754-2008 adopt the behavior required or recommended there by reference, as far as that is possible and makes sense, unless otherwise stated.

The specification is compatible with, rather than conformant to, IEEE 754-2008, because it includes only a subset of the IEEE requirements.

Reference [4], the final draft of "ANSI X3.215-1994, American National Standard for Information Systems--Programming Languages--Forth", is hereafter referred to as "DPANS94". It is believed to be the same as the published version, ISO/IEC 15145:1997 [5]. This document adopts the official terminology of DPANS94 unless otherwise stated. Section numbers in that document are indicated by the prefix "DPANS94".

When it refers to the IEEE floating-point word set, the term "required" in this document is to be understood in the context of the following two paragraphs from DPANS94 A.1.3.1, which discuss the meaning of "optional word set":

  The basic requirement is that if the implementor claims to have a particular optional word set the entire required portion of that word set must be available. If the implementor wishes to offer only part of an optional word set, it is acceptable to say, for example, "This system offers portions of the [named] word set", particularly if the selected or excluded words are itemized clearly.

and

  Optional word sets may be offered in source form or otherwise factored so that the user may selectively load them.

The current C99 standard [6-8], ISO/IEC 9899:1999, has a comprehensive treatment of IEEE 754-1985, which offers a route to implementation for those Forth systems that can call C libraries. Section numbers from reference [7] are indicated by the prefix "C99".

[**Bracketed statements like this are for editorial questions and comments, eventually to be removed.]


2 TERMINOLOGY AND NOTATION

"fp" or "bfp": Short for "binary floating point". The Forth floating-point stack is called the "fp stack".

"IEEE special datum", or an "IEEE special": Signed zero, a quiet or signaling signed nan, or signed infinity.

"full IEEE set": For an IEEE binary format, the set of normal and subnormal numbers plus special data that it represents.

"IEEE datum": Any member of a full IEEE set.

"IEEE arithmetic": Arithmetic defined by IEEE 754-2008 for IEEE data.

"affinely extended reals": Finite real numbers and +/-infinity, with -infinity < {every finite number} < +infinity.

"nan load" or "nan payload": The value of the fractional bits in the binary format of a nan, excluding the quiet bit, considered as a positive integer. The smallest signaling load is unity, and the smallest quiet load is zero.

"qnan", resp., "snan": A quiet or signaling nan, respectively, of any sign or load.

"single": In the context of Forth fp, an IEEE 754-2008 32-bit interchange format.

"double": In the context of Forth fp, an IEEE 754-2008 64-bit interchange format.

"default": In the context of Forth fp, the float format for data that can appear on the fp stack.

"exception": Used in the sense of IEEE 2.1.18.

[**QUOTE
  An event that occurs when an operation on some particular operands has no outcome suitable for every reasonable application. That operation might signal one or more exceptions by invoking the default or, if explicitly requested, a language-defined alternate handling. Note that "event", "exception", and "signal" are defined in diverse ways in different programming environments.
]


2a IEEE BINARY FLOATING-POINT FORMATS

Each IEEE bfp format has two fixed parameters, p > 0 (precision) and emax > 0 (maximum exponent), and defines emin = 1 - emax (minimum exponent). Each such format represents all real numbers of the form

  r = (-1)^s * 2^e * b_0.b_1 ... b_{p-1}

where s = 0 or 1, emin <= e <= emax, b_i = 0 or 1, and p is the number of significand bits.

See Sec. A.2a for more information about IEEE bfp formats.


3 IMPLEMENTATION

According to DPANS94, Section 3, "Usage requirements":

  A system shall provide all of the words defined in 6.1 Core Words. It may also provide any words defined in the optional word sets and extension word sets.

The DPANS94 Floating-Point word set is an optional word set, and so is the word set described by this document.

The word "shall" in the remainder of this document states a requirement when the environmental query for IEEE-FP returns true. "Should" means "strongly recommended".

The internal fp representation of default fp data, i.e., data that can appear on the fp stack, shall correspond to one of the IEEE basic, or extended, full binary formats.


4 DATA TYPES

For the purpose of this document, the DPANS94 r type is extended to include all IEEE data for the default fp format.


5 ENVIRONMENTAL QUERIES

[**CHANGES
  There are two queries instead of one. It seemed essential that there be a query returning the fp format parameters, even when not all words are present.
]

                Value
String          Data Type  Constant?  Meaning
------------------------------------------------------------------
IEEE-FP         flag       no         IEEE and DPANS94 floating-point
                                      word sets present
IEEE-FP-FORMAT  d          no         in usual stack notation, the
                                      default format has IEEE
                                      parameters ( emax p )
------------------------------------------------------------------

A true result for the IEEE-FP environmental query (not the data value) shall mean that any words that are present from the DPANS94 floating-point word set, the DPANS94 floating-point extensions word set, or the IEEE floating-point word set shall obey the specifications of this document. A false data value means that only a subset of the IEEE and DPANS94 word sets is present.

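Informative example, not part of the specification. The distinction above between the query result and the data value can be sketched with the DPANS94 word ENVIRONMENT?. The helper names IEEE-FP? and .FP-FORMAT are hypothetical, chosen only for illustration, and the sketch assumes that IEEE-FP-FORMAT leaves its parameters as ( emax p ) with p on top, as the table indicates.

```forth
\ Sketch only: IEEE-FP? and .FP-FORMAT are hypothetical helper names.

\ Leave the IEEE-FP data value; an unrecognized query counts as false.
: IEEE-FP? ( -- flag )
  S" IEEE-FP" ENVIRONMENT?          \ -- false | flag true
  IF  ( flag )  ELSE  FALSE  THEN ;

\ Display the default format parameters, if the query is recognized.
: .FP-FORMAT ( -- )
  S" IEEE-FP-FORMAT" ENVIRONMENT?   \ -- false | emax p true
  IF    ." p = " .  ." emax = " .   \ p is on top, so it is shown first
  ELSE  ." IEEE-FP-FORMAT not recognized"
  THEN ;
```

A program that needs the full word sets would test IEEE-FP? before relying on them, while .FP-FORMAT remains usable on systems that offer only a subset.
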
A true result for the IEEE-FP-FORMAT query shall mean that the DPANS94 MAX-FLOAT query shall return true and the largest, finite IEEE number in the default format.

Nothing in this document depends on the encoding of the format corresponding to emax and p.


6 TEXT INPUT

6.1 Constants
-------------

+INF ( f: -- +Inf )
-INF ( f: -- -Inf )
+NAN ( f: -- +NaN )
-NAN ( f: -- -NaN )

These words return, respectively, IEEE signed infinity and the quiet signed nan with zero load, in the default format. See Sec. A.6.1 for more information about the encoding of NaN.

6.2 Decimal input
-----------------

IEEE requires that conversion between text and binary fp formats shall include signed zero, signed infinity, and signed nans, with and without loads. See IEEE 5.4.2, "Conversion operations for floating-point formats and decimal character sequences", and IEEE 5.12, "Details of conversion between floating-point data and external character sequences".

Conversion of nan loads is not included in this specification. Signed infinity and signed, unloaded nans are covered by the constants defined in Sec. 6.1.

Signed zero is already included in the syntax specification in DPANS94 12.3.7, "Text input number conversion". When IEEE-FP is present, that specification shall be replaced by

  Convertible string := <significand><exponent>

  <significand> := [<sign>]{ <digits>[.<digits0>] | .<digits> }
                   [**Note the extra .<digits> option.]
  <exponent>    := <e-char>[<sign>]<digits0>
                   [**Note that a sign with no digits is still
                   recognized, as in DPANS94.]
  <digits>      := <digit><digits0>
  <digits0>     := <digit>*
  <sign>        := { + | - }
  <e-char>      := { E | e }
                   [**Note the extra "e" option.]
  <digit>       := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }

Interpretation shall convert to the default IEEE format, with roundTiesToEven for real numbers, and subject to overflow and underflow exceptions. See IEEE 7.4 for overflow and IEEE 7.5 for underflow.

6.3 Hexadecimal input
---------------------

IEEE requires a text format for numbers with a hexadecimal significand, and decimal radix two exponent, with exact conversion to and from binary fp formats where possible.
See IEEE 5.12.3, "External hexadecimal-significand character sequences representing finite numbers". Conversion of that format is not included in this specification.


7 GLOSSARY

Unless otherwise stated, all fp words that do computations or comparisons shall obey the requirements and recommendations of IEEE 5 and IEEE 6, for binary formats.

7.1 Conversion
--------------

D>F ( d -- f: r )

The DPANS94 specification is amended to require that when d cannot be represented precisely in the default fp format, r shall be the roundTiesToEven value.

[**IEEE and C99 background:
  IEEE does not seem to specify an equivalent. C99-n1256 6.3.1.4, paragraph 2 says:

    When a value of integer type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined.

  Note that d is always in range for any of the formats binary32, binary64, binary80, and binary128.
]

F>D ( f: r -- s: d )

[**Suggested specification #1:
  The DPANS94 specification is amended to state that an ambiguous condition exists, not only when the integer part of r is not representable by a signed, double number, but also when r is a nan or infinity.
]

[**Suggested specification #2:
  The DPANS94 specification is amended to require that a Forth exception be thrown when r is a nan or infinity, or has integer part out of range for a signed, double number.
]

[**Suggested specification #3:
  The DPANS94 specification is amended to require that the invalid operation exception be quietly signaled when r is a nan or infinity, or has integer part out of range for a signed, double number. In that case the value of d is undefined.
]

[**IEEE background:
  The DPANS94 version corresponds to convertToIntegerTowardZero(); see IEEE 5.8, "Details of conversions from floating-point to integer formats". The IEEE version requires that the invalid operation exception be signaled when r is a nan or infinity or out of range of the destination format.

  IEEE also requires

    convertToIntegerTiesToEven()
    convertToIntegerTowardPositive()
    convertToIntegerTowardNegative()
    convertToIntegerTiesToAway()

  plus versions of the five conversions that signal inexact when appropriate.
]

[**Candidate for rationale section:
  Note that some of the other conversions in IEEE 5.8, "Details of conversions from floating-point to integer formats", are equivalent to Forth phrases such as "FROUND F>D".
]

[** ALTERNATIVE 1:

>FLOAT ( c-addr u -- [f: r s: true]|[false] )

The DPANS94 specification is extended to include IEEE specials, with modified syntax. Conversion shall be governed by IEEE 5.12, "Details of conversion between floating-point data and external character sequences", except for paragraph 4 on hexadecimal conversions.

If the string represents a valid IEEE datum in the syntax below, the datum r and true are returned. Otherwise only false is returned. A string of blanks [**or the empty string?] shall be treated as +0.

Syntax of a convertible string

  Convertible string := { <significand>[<exponent>] | <special> }

  <significand> := [<sign>]{ <digits>[.<digits0>] | .<digits> }
  <exponent>    := <marker><digits0>
  <marker>      := { <e-form> | <sign-form> }
  <e-form>      := <e-char>[<sign-form>]
  <sign-form>   := { + | - }
  <e-char>      := { D | d | E | e }
  <special>     := [<sign>]{ <inf> | <nan> }
  <inf>         := { Inf | inf | INF | infinity | Infinity }
  <nan>         := { NaN | nan | NAN }

[**mh prefers a mandatory <sign> in <special>. dnw favors keeping it optional, in line with the omnivorous philosophy of DPANS94 A.12.6.1.0558.]
]

[**ALTERNATIVE 2:

>FLOAT ( c-addr u -- [f: r s: true]|[false] )

The expression "the string represents a valid floating-point number" in the DPANS94 specification shall be interpreted to mean that it represents a finite number in the range of the default format.

>IEEE-FLOAT ( c-addr u -- [f: r s: true]|[false] )

The specification is exactly that of >FLOAT in alternative 1.
]

[**I prefer alternative 1. I mean, really, a "float" in the IEEE context is any IEEE default datum. But I would accept a variant of 2 if 1 should prove to be a show stopper.]

REPRESENT ( f: r s: c-addr u -- n flag1 flag2 )

The DPANS94 specification is extended to include IEEE specials. The valid-result flag, flag2, is false if r is infinity or a nan.

[**IMHO this is still unsettled, still needing discussion.

  1. Should the current rounding mode be used, instead of roundTiesToEven? I'll have to check to what extent that has already been discussed.

  2. Marcel says "It should be possible to format/output +Inf or +NaN, also with the existing REPRESENT." The DPANS94 spec for false flag2, that n and flag1 are implementation defined, does seem (perhaps intentionally?) well-suited to the spec that flag1 continues to be the negative sign flag, while n distinguishes among nans and infinity. For example, n = 0 means infinity, and n <> 0 means nan with implementation defined values related to quietness and load.
]

SF! ( f: r s: sf-addr -- )
SF@ ( sf-addr -- f: r )
DF! ( f: r s: df-addr -- )
DF@ ( df-addr -- f: r )

The specification for these DPANS94 words is amended to explicitly require conversion to or from the respective IEEE binary32 or binary64 interchange formats, with exact conversion in either direction for signed zero, signed infinity, and real numbers to a wider format, and with the current rounding mode for conversion of real numbers to a narrower format (see IEEE Sec. 5.4.2, formatOf-convertFormat). The conversion of nans shall preserve the sign and signaling bit, shall not signal an exception, and should treat payloads according to IEEE 6.2.3.

The addresses sf-addr and df-addr are aligned according to SFALIGNED and DFALIGNED. The memory representation of the corresponding binary interchange formats is implementation defined [**AFAIK that's the common interpretation of IEEE].

[**CHANGE: current rounding mode is used instead of the former roundTiesToEven.]

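Informative example, not part of the specification. The narrowing and widening rules above imply that a store followed by a fetch rounds a default-format number to binary32 precision. A minimal sketch, using only DPANS94 floating-point extension words; the names SF-BUF and >SINGLE are hypothetical:

```forth
\ Sketch only: SF-BUF and >SINGLE are hypothetical names.

\ Reserve one single-float aligned slot in data space.
SFALIGN HERE 1 SFLOATS ALLOT CONSTANT SF-BUF

: >SINGLE ( f: r1 -- r2 )
  SF-BUF SF!     \ narrow to binary32, using the current rounding mode
  SF-BUF SF@ ;   \ widen back to the default format, which is exact
```

When the default format is itself binary32, both conversions are exact and r2 equals r1. Under the nan rule above, the sign and signaling bit of a nan would survive the round trip.
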
[**I think an RfD for XF@ and XF!, for binary80, is in order. And perhaps names like QF@ and QF!, for binary128, should be reserved.]

7.2 Output
----------

F. ( f: r -- )
FE. ( f: r -- )
FS. ( f: r -- )

The DPANS94 specification is extended to include IEEE specials, with output text of the appropriate form below, with implementation-dependent case sensitivity: [**Default rounding? Or use the current rounding mode?]

  [<sign>]0{ E | e }
  [<sign>]{ Inf | INF | inf }
  [<sign>]{ NaN | NAN | nan }

[**CHANGE: Removed nan load.]

7.3 Comparison
--------------

IEEE has twenty-two required comparisons which apply to the full set of IEEE data. Twelve of these are quiet, and ten are signaling. See IEEE 5.6.1, "Comparisons", and IEEE 5.11, "Details of comparison predicates".

This proposal requires only quiet comparisons, which do not raise exceptions, and of those, only a subset of five, which is sufficient for expressing all twelve. See Sec. A.7.3 for rationale and more information about the remaining comparisons, with high-level implementation examples.

IEEE identifies four fundamental, mutually exclusive comparisons: less than ("<"), equal ("="), greater than (">"), and unordered (see rule 3 below). Each of these is true iff each of the others is false.

The basic rules are the following:

  1. The sign of zero is ignored.

  2. The sign of infinity is not ignored, and is treated in the natural way for the "ordinary" comparisons with real numbers or infinity, namely <, =, and >. In particular, either signed infinity is equal to itself.

  3. The unordered comparison is true iff at least one of its two arguments is a nan. That implies that any of the other three, "ordinary" comparisons involving a nan is false.

The five required comparisons are "<", ">", "=", "<=", and ">=", where "<=" and ">=" stand for the usual phrases "less than or equal" and "greater than or equal".

Note that familiar identities for real numbers are generally not satisfied by IEEE comparisons.
For example, the negation of "<" is not the same as ">=". See Sec. A.7.3.

F<   ( f: r1 r2 -- s: [r1<r2]? )
F=   ( f: r1 r2 -- s: [r1=r2]? )
F>   ( f: r1 r2 -- s: [r1>r2]? )
F<=  ( f: r1 r2 -- s: [r1<=r2]? )
F>=  ( f: r1 r2 -- s: [r1>=r2]? )
F0<  ( f: r -- s: [r<0]? )
F0=  ( f: r -- s: [r=0]? )
F0>  ( f: r -- s: [r>0]? )
F0<= ( f: r -- s: [r<=0]? )
F0>= ( f: r -- s: [r>=0]? )

The data stack outputs are DPANS94 flags corresponding to the indicated IEEE predicates. In particular, the specifications for the existing DPANS94 words F<, F0<, and F0= are extended to include IEEE specials.

[**
  All of the F0< family of words could be replaced by simple phrases like "0E F<", so maybe we shouldn't have them at all.

  An argument for keeping them is that the principle of least surprise demands the extension of the already existing DPANS94 F0= to IEEE, just like F<. Given that, the same principle demands that the names and meanings of the other five be reserved.

  An alternative might be to declare ambiguous conditions for F0= and F0< evaluated at IEEE specials, and remove all five from the IEEE-FP word set in favor of the corresponding phrases. That violates the principle of least surprise, because F< has no ambiguity.
]

F~ ( f: r1 r2 r3 -- s: flag )

If r3 is positive and not a nan or zero, flag is true iff the absolute value of r1 minus r2 is less than r3, taking into account IEEE arithmetic and comparison rules. [**+Inf is regarded as positive.]

If r3 is signed zero, flag is true iff r1 and r2 have identical encodings.

If r3 is negative and not a nan or zero, flag is true iff the absolute value of r1 minus r2 is less than the absolute value of r3 times the sum of the absolute values of r1 and r2, taking into account IEEE arithmetic and comparison rules. [**-Inf is regarded as negative.]

If r3 is a nan, flag is false.

[**
  There is a lot of feeling that F~ is a nasty word. It has three distinct functions:

    1. Do a comparison of the form: |r1 - r2| < |r3|

    2. Do a comparison of the form: |r1 - r2| < |r3| * (|r1| + |r2|)

    3. Test whether r1 and r2 have identical fp encodings.

  In the IEEE context, the specification above that flag be false when r3 is a nan is natural because of comparisons 1 and 2.

  In the IEEE context, something equivalent to comparison 3 is a convenience for comparing to signed zero, since F= ignores the sign of zero. It can be used to distinguish nan types and loads, although better tools might be designed for that.

  Basically, the above specification for r3 equal to signed zero seems the natural extension of the DPANS94 spec. Any alternative would have to specify some other behavior when any of r1, r2, or r3 is an IEEE special.

  Either way, IMHO, a separate RfD should be submitted to declare F~ obsolescent and implement its functionality with two or three new words.
]

7.4 Classification
------------------

IEEE 5.7.2, "General operations", requires a large number of classification operations. This document defines only those corresponding to:

  isSignMinus  isNormal    isFinite  isZero
  isSubnormal  isInfinite  isNaN     isSignaling

Actually isSignMinus corresponds to FSIGNBIT, and isZero corresponds to F0=, which leaves the following:

FINITE?     ( f: r -- s: [zero|normal|subnormal]? )
FNORMAL?    ( f: r -- s: normal? )
FSUBNORMAL? ( f: r -- s: subnormal? )
FINFINITE?  ( f: r -- s: [+|-]Inf? )
FNAN?       ( f: r -- s: nan? )
FSIGNALING? ( f: r -- s: snan? )

7.5 Arithmetic
--------------

See IEEE 5.4.1, "Arithmetic operations".

F*    ( f: r1 r2 -- r1*r2 )
F*+   ( f: r1 r2 r3 -- [r2*r3]+r1 )  [**new]
F+    ( f: r1 r2 -- r1+r2 )
F-    ( f: r1 r2 -- r1-r2 )
F/    ( f: r1 r2 -- r1/r2 )
FSQRT ( f: r -- sqrt[r] )

The DPANS94 specification is extended to IEEE arithmetic. See IEEE 5.1, "Overview" for precision, rounding, special data treatment, and exceptions. See IEEE 5.4.1, "Arithmetic operations", for the arithmetic words.

[** F*+ is the Forth name for the IEEE required fusedMultiplyAdd operation.]

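Informative example, not part of the specification. The operand order in the F*+ stack comment can be illustrated by an unfused stand-in built from F* and F+. The name F*+-UNFUSED is hypothetical. The stand-in rounds twice, so it is not a conforming fusedMultiplyAdd, which rounds only once.

```forth
\ Sketch only: F*+-UNFUSED is a hypothetical name. It rounds once in F*
\ and again in F+, so it is NOT a conforming fusedMultiplyAdd.
: F*+-UNFUSED ( f: r1 r2 r3 -- [r2*r3]+r1 )
  F*     \ f: r1 r2*r3
  F+ ;   \ f: r1+[r2*r3]
```

For instance, the phrase "1E 2E 3E F*+-UNFUSED" leaves 7E, since 2*3+1 is exact. The two versions can differ only when the intermediate product r2*r3 is not exactly representable.
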
[**Note that IEEE requires these to be "correctly rounded".]

7.6 Math functions
------------------

The Forth words FABS, FMAX, FMIN, and FSQRT are covered elsewhere.

The DPANS94 specification for the following words is extended to adopt the corresponding IEEE behavior. See IEEE 9.2, "Recommended correctly rounded functions", and 9.2.1, "Special values".

  F**    FACOS  FACOSH   FALOG   FASIN  FASINH
  FATAN  FATAN2 FATANH   FCOS    FCOSH  FEXP
  FEXPM1 FLN    FLNP1    FLOG    FSIN   FSINCOS
  FSINH  FSQRT  FTAN     FTANH

IEEE recommends additional functions, whose recommended Forth names would be:

  FEXP2   FEXP2M1  FEXP10     FEXP10M1
  FLOG2   FLOGP1   FLOG2P1    FHYPOT
  1/FSQRT FCOMPOUND FROOTN    F**N
  |F|**   FSINPI   FCOSPI     FATANPI
  FATAN2PI

[**Separate RfD for the new names?]

7.7 Sign bit operations
-----------------------

FSIGNBIT ( f: r -- s: minus? )

This word corresponds to isSignMinus in IEEE 5.7.2, "General operations". The name is based on C99.

The following are all required by IEEE. See IEEE 5.5.1, "Sign bit operations". The IEEE copy() function is superfluous in Forth [**IIUC].

FNEGATE ( f: r -- -r )
FABS    ( f: r -- |r| )

The DPANS94 specification is extended to IEEE specials.

FCOPYSIGN ( f: r1 r2 -- r3 )

The output r3 is r1 with its sign bit replaced by that of r2.

7.8 Nearest integer functions
-----------------------------

FCEIL  ( f: r1 -- r2 )  [**new]
FLOOR  ( f: r1 -- r2 )
FROUND ( f: r1 -- r2 )
FTRUNC ( f: r1 -- r2 )

These words correspond to the respective IEEE required operations:

  roundToIntegralTowardPositive
  roundToIntegralTowardNegative
  roundToIntegralTiesToEven
  roundToIntegralTowardZero

See IEEE 5.3.1, "General operations" and 5.9, "Details of operations to round a floating-point datum to integral value". The names are based on C99.

No word is defined for IEEE roundToIntegralTiesToAway.

[**Separate RfD for FCEIL ?]

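Informative example, not part of the specification. Given FLOOR, FNEGATE, and F0< extended to IEEE data as described in this document, two of the other nearest integer words admit high-level sketches. These are offered only as one possible factoring, based on the identity ceil(x) = -floor(-x), and not as the intended implementation.

```forth
\ Sketch only; assumes FNEGATE, FLOOR, and F0< behave as specified in
\ this document for IEEE specials.

: FCEIL ( f: r1 -- r2 )
  FNEGATE FLOOR FNEGATE ;   \ ceil(x) = -floor(-x)

: FTRUNC ( f: r1 -- r2 )
  FDUP F0<                  \ negative? false for either zero or a nan
  IF  FCEIL  ELSE  FLOOR  THEN ;
```

The double negation in FCEIL keeps the sign of zero consistent with roundToIntegralTowardPositive. For example, -0.5E gives -0E, and +0E stays +0E.
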
FNEARBYINT ( f: r1 -- r2 )

This word corresponds to the IEEE required operation:

  roundToIntegralExact

It performs the function of whichever of the other four corresponds to the current rounding mode, and shall be provided only if the rounding mode words are provided.

7.9 Number manipulation
-----------------------

FMAX ( f: r1 r2 -- r3 )
FMIN ( f: r1 r2 -- r3 )

The DPANS94 specification is extended to IEEE specials. See minNum and maxNum in IEEE 5.3.1, "General operations" and 6.2, "Operations with NaNs".

FNEXTUP ( f: r1 -- r2 )

When r1 is a nonzero real number, FNEXTUP returns the next affinely extended real in the default format that compares larger than r1. See IEEE 5.3.1, "General operations" for the behavior when r1 is an IEEE special.

FNEXTDOWN is not defined. According to IEEE, nextDown(x) is -nextUp(-x).

FSCALBN ( f: r s: n -- f: r*2^n )

The output is efficiently scaled by 2^n. See IEEE 5.3.3, "logBFormat operations".

FLOGB ( f: r -- e )

Leave the radix-two exponent e of the fp representation as an fp integer. If r is subnormal, the exponent is computed as if r were normalized, with e < emin. See IEEE 5.3.3, "logBFormat operations" for treatment of IEEE specials.

7.10 Exceptions
---------------

[**UNDER CONSTRUCTION
  This is an important section, because the IEEE treatment of exceptions is central to the philosophy of its treatment of nans, infinities, and subnormal numbers. AFAIU, that was strongly influenced by William Kahan. However, it doesn't follow that it needs to be in this document. It certainly needs discussion.
]

7.11 Rounding modes
-------------------

[**UNDER CONSTRUCTION
  This section is currently under discussion in comp.lang.forth. See IEEE 9.3, "Operations on dynamic modes for attributes". Only words corresponding to 9.3.1, "Operations on individual dynamic modes", are expected to be implemented, and among those, roundTiesToAway is not expected to be implemented.
]


8 REFERENCES

[1] "IEEE Standard for Floating-Point Arithmetic", approved June 12, 2008 as IEEE Std 754-2008:
    http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935

[2] "DRAFT Standard for Floating-Point Arithmetic P754", IEEE 754 draft 1.2.9, January 27, 2007:
    http://www.validlab.com/754R/nonabelian.com/754/comments/Q754.129.pdf

[3] Wikipedia, "IEEE 754-2008":
    http://en.wikipedia.org/wiki/IEEE_754

[4] ANSI X3.215-1994 final draft:
    http://www.taygeta.com/forth/dpans.html

[5] ISO/IEC 15145:1997:
    http://webstore.ansi.org/RecordDetail.aspx?sku=ISO%2fIEC+15145%3a1997
    http://www.iso.org/iso/catalogue_detail.htm?csnumber=26479

[6] ISO/IEC 9899:1999 (December 1, 1999), ISO/IEC 9899:1999 Cor. 1:2001(E), ISO/IEC 9899:1999 Cor. 2:2004(E), ISO/IEC 9899:1999 Cor. 3:2007(E):
    http://www.open-std.org/jtc1/sc22/wg14/www/standards.html#9899

[7] C99 + TC1 + TC2 + TC3 is included in the freely available WG14/N1256, September 7, 2007 [**thanks to David Thompson]:
    http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

[8] Single UNIX 3, AFAICS duplicates the C99 library spec, with some things pinned down more tightly [**thanks to David Thompson]:
    http://www.unix.org/single_unix_specification/


A.2a IEEE BINARY FLOATING-POINT FORMATS

IEEE 754-2008 defines three basic binary fp formats, binary32, binary64, and binary128, plus three corresponding extended binary formats, whose parameters are shown in Tables 1 and 2 below. It also defines the four binary interchange formats shown in Table 3, plus those with storage widths of more than 128 bits that are a multiple of 32 bits.

Table 1: Parameters for IEEE 754-2008 basic binary formats.

            binary32  binary64  binary128
  ---------------------------------------
  p    =    24        53        113
  emax =    127       1023      16383

Table 2: Parameters for IEEE 754-2008 extended binary formats.

            binary32  binary64  binary128
  ---------------------------------------
  p    >=   32        64        128
  emax >=   1023      16383     65535

Table 3: Parameters for IEEE 754-2008 binary interchange formats (k is the storage width in bits).

            binary16  binary32  binary64  binary128
  -------------------------------------------------
  k    =    16        32        64        128
  p    =    11        24        53        113
  emax =    15        127       1023      16383

Note that the Intel 80-bit format corresponds to one of the extended binary64 formats, with p = 64 and emax = 16383. Its precision is greater than that of basic binary64 and less than that of basic binary128, with exponent range the same as basic binary128. Although it is not defined as a basic IEEE binary format, it may be called the "binary80" basic format. Its implementation normally differs from that of the other basic formats by having an explicit leading bit for normal and subnormal numbers.

Binary interchange formats are logical formats, with unspecified memory layout. They all have an implicit leading bit for normal and subnormal numbers.

Note that the binary128 interchange format is the only one in Table 3 that can contain the binary80 basic format. IEEE does not define a binary80 interchange format.


A.6.1 NAN SIGNS AND LOADS

IEEE allows the load for nan results like 0E 0E F/ to be anything, so the following does not necessarily give a zero load:

  0E 0E F/ FABS FCONSTANT +NAN

Aside from the ambiguous load, the FABS (extended to nan) is necessary here, because not only does IEEE not specify the sign, but both pfe and gforth actually give opposite signs for 0E 0E F/ under ppc (+) vs. Intel (-) Mac OS X. They do both give quiet nans with zero load.
As a matter of fact, the Intel QNaN "floating-point indefinite" is the qnan with zero load and negative sign, according to "Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture", Table 4-3:

  http://www.intel.com/Assets/PDF/manual/253665.pdf


A.7.3 COMPARISON (INFORMATIVE RATIONALE)

[**Any high-level definitions in this section assume a separate floating-point stack.]

The twelve IEEE required comparisons are the following, where "N" means logical negation, "?" stands for "unordered", and "<?" and ">?" stand for "less than or unordered" and "greater than or unordered":

  <    =    >    ?
  N<   N=   N>   N?
  <=   >=   <?   >?

Unfortunately, the common notation "?" for the unordered predicate clashes with Forth practice, where "?" usually [**always?] indicates a flag or test. The ? notation is used here as a convenience for IEEE predicates, and does not appear in any corresponding Forth names.

The <= and >= comparisons are no longer simple negations of > and <, but are rather the AND's of those negations with N?. It can be shown that <? is N(>=), and >? is N(<=). See IEEE Table 5.3, "Required unordered-quiet predicates and negations".

It is possible to implement all of the IEEE comparisons via high-level definitions in terms of a few low-level words, even fewer than the five required for this word set. The following remarks are offered as a guide to possibilities for the choice of low-level words.

DPANS94 has only F< and F~. Since IEEE "=" is semantically different from 0E F~, low-level implementation of F= seems inevitable. The two words F< and F= are probably a minimum set for high-level implementation of the rest. For example, IEEE ">" is not expressible in terms of "<" and "=" plus logical operations, but F> can be defined as:

  : F> ( f: r1 r2 -- s: [r1>r2]? ) FSWAP F< ;

FUNORDERED is not required by this document; in particular, the name is not reserved; but it can be defined as

  : FUNORDERED ( f: r1 r2 -- s: [r1?r2]? )
    FDUP F= FDUP F= AND 0= ;

The logical negations N<, N=, N>, and N? can be expressed with the Forth phrases "F< 0=", etc.

Forth words for the <= and >= predicates can be defined as

  : F2DUP ( f: r1 r2 -- r1 r2 r1 r2 ) FOVER FOVER ;
  : F<= ( f: r1 r2 -- s: [r1<=r2]? ) F2DUP F< F= OR ;
  : F>= ( f: r1 r2 -- s: [r1>=r2]? ) F2DUP F> F= OR ;

The <? and >? predicates can be expressed by the Forth phrases "F>= 0=" and "F<= 0=". Thus all twelve IEEE predicates can be expressed with the five required words, and as few as two.

[**Treat the following as a footnote.] It can be shown that the closure of the four fundamental IEEE comparison predicates under AND, OR, and negation consists of sixteen independent relations, including the twelve that IEEE requires. Two additional elements are the trivial, identically true and false relations, and the other two are "less than or greater than" and its negation, "unordered or equal". The five words of this word set, plus their five negations, implement the only nontrivial, transitive relations among the sixteen.

On the other hand, several current cpu's have efficient operations for all four of the fundamental IEEE comparisons, <, >, =, and ?. Low-level implementations of at least F<, F>, and F=, would be natural for such systems.

All of the words in the F0< family can be defined by analogy to

  : F0< ( f: r -- s: [r<0]? ) 0E F< ;
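
Informative example, not part of the specification. Spelled out, the analogy gives the following sketch for the remaining four words of the family. It assumes that F=, F>, F<=, and F>= already behave as specified in Sec. 7.3.

```forth
\ Sketch only; assumes F= F> F<= F>= follow the IEEE rules of Sec. 7.3.
: F0=  ( f: r -- s: [r=0]? )   0E F=  ;
: F0>  ( f: r -- s: [r>0]? )   0E F>  ;
: F0<= ( f: r -- s: [r<=0]? )  0E F<= ;
: F0>= ( f: r -- s: [r>=0]? )  0E F>= ;
```

Because every ordinary comparison involving a nan is false, all of these return false for a nan argument. Because the sign of zero is ignored, F0= is true for both +0 and -0.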