PROPOSAL FOR AN OPTIONAL IEEE 754, BINARY
FLOATING-POINT WORD SET
version 0.5.2
dnw 07-Jul-09
TABLE of CONTENTS
1 INTRODUCTION
2 TERMINOLOGY AND NOTATION
2a IEEE BINARY FLOATING-POINT FORMATS
3 IMPLEMENTATION
4 DATA TYPES
5 ENVIRONMENTAL QUERIES
6 TEXT INPUT
6.1 Constants
6.2 Decimal input
6.3 Hexadecimal input
7 GLOSSARY
7.1 Conversion
7.2 Output
7.3 Comparison
7.4 Classification
7.5 Arithmetic
7.6 Math functions
7.7 Sign bit operations
7.8 Nearest integer functions
7.9 Number manipulation
7.10 Exceptions
7.11 Rounding modes
8 REFERENCES
A.2a IEEE FLOATING-POINT FORMATS
A.6.1 NaN signs and loads
A.7.3 Comparison (informative rationale)
1 INTRODUCTION
This is a proposal for an optional Forth 200x word set, called
the "IEEE floating-point word set", that supports the binary
part of the IEEE 754-2008 standard for floating-point arithmetic
[1]. The most recent, freely available, but less comprehensive
version is IEEE 754 draft 1.2.9, January 27, 2007 [2]. There is
also a Wikipedia summary [3].
The standard [1] is hereafter referred to as "IEEE 754-2008",
with section numbers indicated by the prefix "IEEE", as in "IEEE 5.4.1".
This specification requires that ISO Forth [4,5] floating-point
and floating-point extension words in the optional
floating-point word set, when present *with the IEEE
floating-point word set*, satisfy additional IEEE 754-2008
requirements. Words in that word set and in this one that
correspond to mathematical (including logical) operations or
functions in IEEE 754-2008 adopt by reference the behavior
required or recommended there, as far as that is possible and
sensible, unless otherwise stated.
The specification is compatible with, rather than conformant to,
IEEE 754-2008, because it includes only a subset of the IEEE
requirements.
Reference [4], the final draft of "ANSI X3.215-1994, American
National Standard for Information Systems--Programming
Languages--Forth", is hereafter referred to as "DPANS94". It is
believed to be the same as the published version, ISO/IEC
15145:1997 [5]. This document adopts the official terminology
of DPANS94 unless otherwise stated. Section numbers in that
document are indicated by the prefix "DPANS94", as in "DPANS94 A.1.3.1".
When it refers to the IEEE floating-point word set, the term
"required" in this document is to be understood in the context
of the following two paragraphs from DPANS94 A.1.3.1, which
discuss the meaning of "optional word set":
The basic requirement is that if the implementor claims to
have a particular optional word set the entire required
portion of that word set must be available. If the
implementor wishes to offer only part of an optional word set,
it is acceptable to say, for example, "This system offers
portions of the [named] word set", particularly if the
selected or excluded words are itemized clearly.
and
Optional word sets may be offered in source form or otherwise
factored so that the user may selectively load them.
The current C99 standard [6-8], ISO/IEC 9899:1999, has a
comprehensive treatment of IEEE 754-1985, which offers a route
to implementation, for those Forth systems that can call C
libraries. Section numbers from reference [7] are indicated by
the prefix "C99", as in "C99-n1256 6.3.1.4".
[**Bracketed statements like this are for editorial questions
and comments, eventually to be removed.]
2 TERMINOLOGY AND NOTATION
"fp" or "bfp": Short for "binary floating point". The Forth
floating-point stack is called the "fp stack".
"IEEE special datum", or an "IEEE special": Signed zero, a
quiet or signaling signed nan, or signed infinity.
"full IEEE set": For an IEEE binary format, the set of normal
and subnormal numbers plus special data that it represents.
"IEEE datum": Any member of a full IEEE set.
"IEEE arithmetic": Arithmetic defined by IEEE 754-2008 for IEEE
data.
"affinely extended reals": Finite real numbers and +/-infinity,
with -infinity < {every finite number} < +infinity.
"nan load" or "nan payload": The value of the fractional bits
in the binary format of a nan, excluding the quiet bit,
considered as a nonnegative integer. The smallest signaling load
is unity, and the smallest quiet load is zero.
"qnan", resp., "snan": A quiet or signaling nan, respectively,
of any sign or load.
"single": In the context of Forth fp, an IEEE 754-2008 32-bit
interchange format.
"double": In the context of Forth fp, an IEEE 754-2008 64-bit
interchange format.
"default": In the context of Forth fp, the float format for
data that can appear on the fp stack.
"exception": Used in the sense of IEEE 2.1.18.
[**QUOTE
An event that occurs when an operation on some particular
operands has no outcome suitable for every reasonable
application. That operation might signal one or more
exceptions by invoking the default or, if explicitly
requested, a language-defined alternate handling. Note that
"event", "exception", and "signal" are defined in diverse
ways in different programming environments.
]
2a IEEE BINARY FLOATING-POINT FORMATS
Each IEEE bfp format has two fixed parameters, p > 0 (precision)
and emax > 0 (maximum exponent), and defines emin = 1 - emax
(minimum exponent). Each such format represents all real
numbers of the form
r = (-1)^s * 2^e * b_0.b_1 ... b_{p-1}
where
s = 0 or 1, emin <= e <= emax,
b_i = 0 or 1, p = #significand bits.
See Sec. A.2a for more information about IEEE bfp formats.
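As a worked check of the formula above, the following C sketch (C99 <math.h>, the implementation route mentioned in Sec. 1) computes the extremal positive values of a format from p and emax alone. The function names are hypothetical, and the arithmetic is done in C double, so the sketch assumes a format no wider than binary64.

```c
#include <math.h>

/* Extremal positive values of an IEEE binary format (p, emax), from
   r = (-1)^s * 2^e * b_0.b_1...b_{p-1} with emin = 1 - emax. */

/* All p significand bits set, e = emax: (2 - 2^(1-p)) * 2^emax. */
double largest_finite(int p, int emax) {
    return ldexp(2.0 - ldexp(1.0, 1 - p), emax);
}

/* b_0 = 1, all other bits clear, e = emin: 2^emin. */
double smallest_normal(int emax) {
    return ldexp(1.0, 1 - emax);
}

/* Only b_{p-1} set, e = emin: 2^(emin - (p - 1)). */
double smallest_subnormal(int p, int emax) {
    return ldexp(1.0, (1 - emax) - (p - 1));
}
```

For example, largest_finite(24, 127) is the binary32 maximum, 0x1.fffffep127, and largest_finite(11, 15) is the binary16 maximum, 65504.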
3 IMPLEMENTATION
According to DPANS94, Section 3, "Usage requirements":
A system shall provide all of the words defined in 6.1 Core
Words. It may also provide any words defined in the optional
word sets and extension word sets.
The DPANS94 Floating-Point word set is an optional word set,
and so is the word set described by this document.
The word "shall" in the remainder of this document states a
requirement when the environmental query for IEEE-FP returns
true. "Should" means "strongly recommended".
The internal fp representation of default fp data, i.e., data
that can appear on the fp stack, shall correspond to the full
IEEE set of one of the IEEE basic or extended binary formats.
4 DATA TYPES
For the purpose of this document, the DPANS94 r type is extended
to include all IEEE data for the default fp format.
5 ENVIRONMENTAL QUERIES
[**CHANGES
There are two queries instead of one. It seemed essential that
there be a query returning the fp format parameters, even when
not all words are present.
]
String          Value       Constant?  Meaning
                data type
------------------------------------------------------------------
IEEE-FP         flag        no         IEEE and DPANS94 floating-point
                                       word sets present
IEEE-FP-FORMAT  d           no         in usual stack notation, the
                                       default format has IEEE
                                       parameters ( emax p )
------------------------------------------------------------------
A true result for the IEEE-FP environmental query (not the data
value) shall mean that any words that are present from the
DPANS94 floating-point word set, the DPANS94 floating-point
extensions word set, or the IEEE floating-point word set shall
obey the specifications of this document.
A false data value means that only a subset of the IEEE and
DPANS94 word sets is present.
A true result for the IEEE-FP-FORMAT query shall mean that the
DPANS94 MAX-FLOAT query also returns true, together with the
largest finite IEEE number in the default format.
Nothing in this document depends on the encoding of the format
corresponding to emax and p.
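A Forth system built on C might derive the IEEE-FP-FORMAT pair from <float.h>, as in the sketch below. It assumes that the default fp format is C's double, and the function name is hypothetical. The corresponding MAX-FLOAT value would then be DBL_MAX.

```c
#include <float.h>

/* Hypothetical source of the IEEE-FP-FORMAT data ( emax p ).  The
   <float.h> model writes the significand as 0.b_1 b_2 ..., so its
   DBL_MAX_EXP is one more than the IEEE emax; DBL_MANT_DIG is p. */
void ieee_fp_format(int *emax, int *p) {
    *emax = DBL_MAX_EXP - 1;
    *p = DBL_MANT_DIG;
}
```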
6 TEXT INPUT
6.1 Constants
-------------
+INF ( f: -- +Inf )
-INF ( f: -- -Inf )
+NAN ( f: -- +NaN )
-NAN ( f: -- -NaN )
+INF and -INF return IEEE positive and negative infinity, and
+NAN and -NAN return the positive and negative quiet nan with
zero load, all in the default format.
See Sec. A.6.1 for more information about the encoding of NaN.
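One way to obtain these constants exactly, without relying on an arithmetic expression such as 0E 0E F/, is to build them from their bit patterns. This C sketch assumes a binary64 default format and the IEEE 754-2008 recommended encoding in which the leading fraction bit is the quiet bit; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a binary64 double. */
static double from_bits(uint64_t bits) {
    double r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

double plus_inf(void)  { return from_bits(UINT64_C(0x7FF0000000000000)); }
double minus_inf(void) { return from_bits(UINT64_C(0xFFF0000000000000)); }
/* Quiet bit set, load (remaining fraction bits) zero. */
double plus_nan(void)  { return from_bits(UINT64_C(0x7FF8000000000000)); }
double minus_nan(void) { return from_bits(UINT64_C(0xFFF8000000000000)); }
```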
6.2 Decimal input
-----------------
IEEE requires that conversion between text and binary fp formats
shall include signed zero, signed infinity, and signed nans,
with and without loads. See IEEE 5.4.2, "Conversion operations
for floating-point formats and decimal character sequences", and
IEEE 5.12, "Details of conversion between floating-point data
and external character sequences".
Conversion of nan loads is not included in this specification.
Signed infinity and signed, unloaded nans are covered by the
constants defined in Sec. 6.1. Signed zero is already included
in the syntax specification in DPANS94 12.3.7, "Text input
number conversion". When IEEE-FP is present, that specification
shall be replaced by
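Convertible string := <significand><exponent>
<significand> := [<sign>]{ <digits>[.<digits0>] | .<digits> }
[**Note the extra . option.]
<exponent> := <e-char>[<sign>]<digits0>
[**Note that a sign with no digits is still
recognized, as in DPANS94.]
<digits> := <digit><digits0>
<digits0> := <digit>*
<sign> := { + | - }
<e-char> := { E | e }
[**Note the extra "e" option.]
<digit> := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }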
Interpretation shall convert to the default IEEE format, with
roundTiesToEven for real numbers, and subject to overflow and
underflow exceptions. See IEEE 7.4 for overflow and IEEE 7.5
for underflow.
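The grammar above is small enough to recognize directly. The following C sketch (the name is_convertible_string is hypothetical) only decides whether a string is convertible under this syntax; the conversion itself, with roundTiesToEven, is not shown.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of the run of decimal digits at s. */
static size_t count_digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* <significand><exponent> with
   <significand> := [<sign>]{ <digits>[.<digits0>] | .<digits> }
   <exponent>    := <e-char>[<sign>]<digits0>,  <e-char> := { E | e } */
bool is_convertible_string(const char *s) {
    if (*s == '+' || *s == '-') s++;               /* optional sign */
    size_t lead = count_digits(s);
    s += lead;
    if (*s == '.') {
        s++;
        size_t frac = count_digits(s);
        if (lead == 0 && frac == 0) return false;  /* "." alone */
        s += frac;
    } else if (lead == 0) {
        return false;                              /* no digits at all */
    }
    if (*s != 'E' && *s != 'e') return false;      /* exponent required */
    s++;
    if (*s == '+' || *s == '-') s++;               /* sign, digits optional */
    s += count_digits(s);
    return *s == '\0';
}
```

For example, "1E", "-0E", ".5e0", and "1.E+" are accepted, while "1.5" (no exponent) and ".E0" (no significand digits) are rejected.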
6.3 Hexadecimal input
---------------------
IEEE requires a text format for numbers with a hexadecimal
significand and a decimal exponent for radix two, with exact
conversion to and from binary fp formats where possible. See
IEEE 5.12.3, "External hexadecimal-significand character
sequences representing finite numbers".
Conversion of that format is not included in this specification.
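Although this specification leaves hexadecimal input out, C99 already provides the format through the "%a" conversion of printf and through strtod, which a Forth system that calls C could use. The sketch below (hex_round_trip is a hypothetical name) relies on C99's guarantee that, with a binary radix, "%a" without a precision prints enough digits for an exact representation, and that strtod converts such text exactly when the value is representable.

```c
#include <stdio.h>
#include <stdlib.h>

/* Print r in hexadecimal-significand form and read it back. */
double hex_round_trip(double r) {
    char buf[64];
    snprintf(buf, sizeof buf, "%a", r);  /* glibc prints 3.0 as 0x1.8p+1 */
    return strtod(buf, NULL);            /* exact: no rounding occurs */
}
```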
7 GLOSSARY
Unless otherwise stated, all fp words that do computations or
comparisons shall obey the requirements and recommendations of
IEEE 5 and IEEE 6, for binary formats.
7.1 Conversion
--------------
D>F ( d -- f: r )
The DPANS94 specification is amended to require that when d
cannot be represented exactly in the default fp format, r
shall be the roundTiesToEven value.
[**IEEE and C99 background:
IEEE does not seem to specify an equivalent. C99-n1256 6.3.1.4,
paragraph 2 says:
When a value of integer type is converted to a real floating
type, if the value being converted can be represented exactly
in the new type, it is unchanged. If the value being
converted is in the range of values that can be represented
but cannot be represented exactly, the result is either the
nearest higher or nearest lower representable value, chosen in
an implementation-defined manner. If the value being
converted is outside the range of values that can be
represented, the behavior is undefined.
Note that d is always in range for any of the formats
binary32, binary64, binary80, and binary128.
]
F>D ( f: r -- s: d )
[**Suggested specification #1:
The DPANS94 specification is amended to state that an
ambiguous condition exists, not only when the integer part of
r is not representable by a signed, double number, but also
when r is a nan or infinity.
]
[**Suggested specification #2:
The DPANS94 specification is amended to require that a Forth
exception be thrown when r is a nan or infinity, or has
integer part out of range for a signed, double number.
]
[**Suggested specification #3:
The DPANS94 specification is amended to require that the
invalid operation exception be quietly signaled when r is a
nan or infinity, or has integer part out of range for a
signed, double number. In that case the value of d is
undefined.
]
[**IEEE background:
The DPANS94 version corresponds to
convertToIntegerTowardZero(); see IEEE 5.8, "Details of
conversions from floating-point to integer formats". The IEEE
version requires that the invalid operation exception be
signaled when r is a nan or infinity or out of range of the
destination format.
IEEE also requires
convertToIntegerTiesToEven()
convertToIntegerTowardPositive()
convertToIntegerTowardNegative()
convertToIntegerTiesToAway()
plus versions of the five conversions that signal inexact when
appropriate.
]
[**Candidate for rationale section:
Note that some of the other conversions in IEEE 5.8, "Details
of conversions from floating-point to integer formats", are
equivalent to Forth phrases such as "FROUND F>D".
]
[** ALTERNATIVE 1:
>FLOAT ( c-addr u -- [r: r s: true]|[false] )
The DPANS94 specification is extended to include IEEE
specials, with modified syntax. Conversion shall be governed
by IEEE 5.12, "Details of conversion between floating-point
data and external character sequences", except for paragraph 4
on hexadecimal conversions. If the string represents a valid
IEEE datum in the syntax below, the datum r and true are
returned. Otherwise only false is returned.
A string of blanks [**or the empty string?] shall be treated
as +0.
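Syntax of a convertible string := { <significand>[<exponent>]
| <special> }
<significand> := [<sign>]{ <digits>[.<digits0>] | .<digits> }
<exponent> := <marker><digits0>
<marker> := { <e-form> | <sign-form> }
<e-form> := <e-char>[<sign-form>]
<sign-form> := { + | - }
<e-char> := { D | d | E | e }
<special> := [<sign>]{ <infinity> | <nan> }
<infinity> := { Inf | inf | INF | infinity | Infinity }
<nan> := { NaN | nan | NAN }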
[**mh prefers a mandatory in . dnw favors
keeping it optional, in line with the omnivorous philosophy of
DPANS94 A.12.6.1.0558.]
]
[**ALTERNATIVE 2:
>FLOAT ( c-addr u -- [r: r s: true]|[false] )
The expression "the string represents a valid floating-point
number" in the DPANS94 specification shall be interpreted to
mean that it represents a finite number in the range of the
default format.
>IEEE-FLOAT ( c-addr u -- [r: r s: true]|[false] )
The specification is exactly that of >FLOAT in alternative 1.
]
[**I prefer alternative 1. I mean, really, a "float" in the
IEEE context is any IEEE default datum. But I would accept a
variant of 2 if 1 should prove to be a show stopper.]
REPRESENT ( f: r s: c-addr u -- n flag1 flag2 )
The DPANS94 specification is extended to include IEEE
specials. The valid-result flag, flag2, is false if r is
infinity or a nan.
[**IMHO this is still unsettled, still needing discussion.
1. Should the current rounding mode be used, instead of
roundTiesToEven? I'll have to check to what extent that
has already been discussed.
2. Marcel says "It should be possible to format/output +Inf or
+NaN, also with the existing REPRESENT." The DPANS94 spec
for false flag2, that n and flag1 are implementation
defined, does seem (perhaps intentionally?) well-suited to
the spec that flag2 continues to be the negative sign flag,
while n distinguishes among nans and infinity. For
example, n = 0 means infinity, and n <> 0 means nan with
implementation defined values related to quietness and
load.
]
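The convention floated in point 2 above could be realized as a front end to REPRESENT, as in this C sketch. The encoding n = 0 for infinity and n = 1 for a nan is only the hypothetical example from that point, not a proposal, and represent_special is a hypothetical name.

```c
#include <math.h>
#include <stdbool.h>

/* Returns true, filling in n, flag1, and flag2, if r is an infinity
   or a nan; returns false for finite r, which the caller formats as
   digits in the usual way. */
bool represent_special(double r, int *n, bool *flag1, bool *flag2) {
    if (isfinite(r))
        return false;
    *flag1 = signbit(r) != 0;   /* flag1 remains the sign flag */
    *flag2 = false;             /* no valid digit string */
    *n = isinf(r) ? 0 : 1;      /* hypothetical: 0 infinity, 1 nan */
    return true;
}
```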
SF! ( f: r s: sf-addr -- )
SF@ ( sf-addr -- f: r )
DF! ( f: r s: df-addr -- )
DF@ ( df-addr -- f: r )
The specification for these DPANS94 words is amended to
explicitly require conversion to or from the respective IEEE
binary32 or binary64 interchange formats, with exact
conversion in either direction for signed zero, signed
infinity, and real numbers to a wider format, and with
the current rounding mode for conversion of real numbers to a
narrower format (see IEEE 5.4.2, formatOf-convertFormat).
The conversion of nans shall preserve the sign and signaling
bit, shall not signal an exception, and should treat payloads
according to IEEE 6.2.3.
The addresses sf-addr and df-addr are aligned according to
SFALIGNED and DFALIGNED. The memory representation of the
corresponding binary interchange formats is implementation
defined [**AFAIK that's the common interpretation of IEEE].
[**CHANGE: current rounding mode is used instead of the
former roundTiesToEven.]
[**I think an RfD for XF@ and XF!, for binary80, is in order.
And perhaps names like QF@ and QF!, for binary128, should be
reserved.]
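Assuming a binary64 default format and a C float that is binary32, SF! and SF@ reduce to C conversions plus a byte copy, as in this sketch (sf_store and sf_fetch are hypothetical names). Under C99 Annex F the narrowing conversion rounds in the current rounding mode and the widening conversion is exact, which matches the amended specification; how nan payloads are narrowed is left to the hardware.

```c
#include <math.h>
#include <string.h>

/* Hypothetical SF! ( f: r s: sf-addr -- ): narrow, then copy bytes. */
void sf_store(double r, void *sf_addr) {
    float f = (float)r;              /* current rounding mode */
    memcpy(sf_addr, &f, sizeof f);   /* platform byte layout */
}

/* Hypothetical SF@ ( sf-addr -- f: r ): copy bytes, then widen. */
double sf_fetch(const void *sf_addr) {
    float f;
    memcpy(&f, sf_addr, sizeof f);
    return (double)f;                /* exact */
}
```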
7.2 Output
----------
F. ( f: r -- )
FE. ( f: r -- )
FS. ( f: r -- )
The DPANS94 specification is extended to include IEEE
specials, with output text of the appropriate form below, with
implementation-dependent case sensitivity:
[**Default rounding? Or use the current rounding mode?]
[<sign>]0{ E | e }
[<sign>]{ Inf | INF | inf }
[<sign>]{ NaN | NAN | nan }
[**CHANGE: Removed nan load.]
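A sketch of how the special forms might be produced, in C. The helper name format_special is hypothetical, and it picks the "0E", "Inf", and "NaN" spellings from the choices above.

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Writes the output text for signed zero, signed infinity, or a nan
   into buf and returns true; returns false for nonzero finite r,
   which the caller formats as usual. */
bool format_special(double r, char *buf, size_t size) {
    const char *sign = signbit(r) ? "-" : "";
    if (isnan(r))
        snprintf(buf, size, "%sNaN", sign);
    else if (isinf(r))
        snprintf(buf, size, "%sInf", sign);
    else if (r == 0.0)
        snprintf(buf, size, "%s0E", sign);
    else
        return false;
    return true;
}
```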
7.3 Comparison
--------------
IEEE has twenty-two required comparisons which apply to the full
set of IEEE data. Twelve of these are quiet, and ten are
signaling. See IEEE 5.6.1, "Comparisons", and IEEE 5.11,
"Details of comparison predicates".
This proposal requires only quiet comparisons, which do not
raise exceptions, and of those, only a subset of five, which is
sufficient for expressing all twelve.
See Sec. A.7.3 for rationale and more information about the
remaining comparisons, with high-level implementation examples.
IEEE identifies four fundamental, mutually exclusive
comparisons: less than ("<"), equal ("="), greater than (">"),
and unordered (see rule 3 below). Each of these is true iff
each of the others is false.
The basic rules are the following:
1. The sign of zero is ignored.
2. The sign of infinity is not ignored, and is treated in the
natural way for the "ordinary" comparisons with real numbers
or infinity, namely <, =, and >. In particular, either
signed infinity is equal to itself.
3. The unordered comparison is true iff at least one of its two
arguments is a nan. That implies that any of the other
three, "ordinary" comparisons involving a nan is false.
The five required comparisons are "<", ">", "=", "<=", and ">=",
where "<=" and ">=" stand for the usual phrases "less than or
equal" and "greater than or equal". Note that familiar
identities for real numbers are generally not satisfied by IEEE
comparisons. For example, the negation of "<" is not the same
as ">=". See Sec. A.7.3.
F< ( f: r1 r2 -- s: [r1<r2]? )
F= ( f: r1 r2 -- s: [r1=r2]? )
F> ( f: r1 r2 -- s: [r1>r2]? )
F<= ( f: r1 r2 -- s: [r1<=r2]? )
F>= ( f: r1 r2 -- s: [r1>=r2]? )
F0< ( f: r -- s: [r<0]? )
F0= ( f: r -- s: [r=0]? )
F0> ( f: r -- s: [r>0]? )
F0<= ( f: r -- s: [r<=0]? )
F0>= ( f: r -- s: [r>=0]? )
The data stack outputs are DPANS94 flags corresponding to the
indicated IEEE predicates. In particular, the specifications
for the existing DPANS94 words F<, F0<, and F0= are extended
to include IEEE specials.
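C99 offers quiet comparison macros that correspond directly to the five required words, while its relational operators <, >, <=, and >= may raise the invalid exception for quiet nans. The function names in this sketch are hypothetical, and a Forth binding would still have to turn the C truth value 1 into a Forth true flag.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical C bodies for F<, F>, F=, F<=, and F>=.  The C99 macros
   isless etc. are quiet for nans; C's == is already a quiet compare. */
bool f_less(double r1, double r2)          { return isless(r1, r2); }
bool f_greater(double r1, double r2)       { return isgreater(r1, r2); }
bool f_equal(double r1, double r2)         { return r1 == r2; }
bool f_less_equal(double r1, double r2)    { return islessequal(r1, r2); }
bool f_greater_equal(double r1, double r2) { return isgreaterequal(r1, r2); }
```

With a nan argument all five are false, so, for example, the negation of f_less(NAN, 1.0) is true while f_greater_equal(NAN, 1.0) is false.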
[**
All of the F0< family of words could be replaced by simple
phrases like "0E F<", so maybe we shouldn't have them at all.
An argument for keeping them is that the principle of least
surprise demands the extension of the already existing DPANS94
F0= to IEEE, just like F<. Given that, the same principle
demands that the names and meanings of the other five be
reserved.
An alternative might be to declare ambiguous conditions for F0=
and F0< evaluated at IEEE specials, and remove all five from the
IEEE-FP word set in favor of the corresponding phrases. That
violates the principle of least surprise, because F< has no
ambiguity.
]
F~ ( f: r1 r2 r3 -- s: flag )
If r3 is positive and not a nan or zero, flag is true iff
the absolute value of r1 minus r2 is less than r3, taking into
account IEEE arithmetic and comparison rules. [**+Inf is
regarded as positive.]
If r3 is signed zero, flag is true iff r1 and r2 have
identical encodings.
If r3 is negative and not a nan or zero, flag is true iff the
absolute value of r1 minus r2 is less than the absolute value
of r3 times the sum of the absolute values of r1 and r2,
taking into account IEEE arithmetic and comparison rules.
[**-Inf is regarded as negative.]
If r3 is a nan, flag is false.
[**
There is a lot of feeling that F~ is a nasty word. It has
three distinct functions:
1. Do a comparison of the form:
|r1 - r2| < |r3|
2. Do a comparison of the form:
|r1 - r2| < |r3| * (|r1| + |r2|)
3. Test whether r1 and r2 have identical fp encodings.
In the IEEE context, the specification above that flag be
false when r3 is a nan is natural because of comparisons 1 and 2.
In the IEEE context, something equivalent to comparison 3 is a
convenience for comparing to signed zero, since F= ignores the
sign of zero. It can be used to distinguish nan types and
loads, although better tools might be designed for that.
Basically, the above specification for r3 equal to signed zero
seems the natural extension of the DPANS94 spec.
Any alternative would have to specify some other behavior when
any of r1, r2, or r3 is an IEEE special.
Either way, IMHO, a separate RfD should be submitted to
declare F~ obsolescent and implement its functionality with
two or three new words.
]
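For concreteness, the specification above might be rendered in C as follows (binary64 default format; f_approx is a hypothetical name). The quiet macro isless keeps comparisons involving a nan difference false without raising an exception, and memcmp compares encodings for the signed-zero case.

```c
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical C rendering of F~ ( f: r1 r2 r3 -- s: flag ). */
bool f_approx(double r1, double r2, double r3) {
    if (isnan(r3))
        return false;
    if (r3 == 0.0)                                /* +0 or -0 */
        return memcmp(&r1, &r2, sizeof r1) == 0;  /* identical encodings */
    double diff = fabs(r1 - r2);                  /* nan if, e.g., Inf - Inf */
    if (r3 > 0.0)                                 /* absolute; +Inf allowed */
        return isless(diff, r3);
    /* relative: |r1 - r2| < |r3| * (|r1| + |r2|); -Inf allowed */
    return isless(diff, -r3 * (fabs(r1) + fabs(r2)));
}
```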
7.4 Classification
------------------
IEEE 5.7.2, "General operations", requires a large number of
classification operations. This document defines only those
corresponding to:
isSignMinus
isNormal
isFinite
isZero
isSubnormal
isInfinite
isNaN
isSignaling
Actually isSignMinus corresponds to FSIGNBIT, and isZero
corresponds to F0=, which leaves the following:
FINITE? ( f: r -- s: [zero|subnormal|normal]? )
FNORMAL? ( f: r -- s: normal? )
FSUBNORMAL? ( f: r -- s: subnormal? )
FINFINITE? ( f: r -- s: [+|-]Inf? )
FNAN? ( f: r -- s: nan? )
FSIGNALING? ( f: r -- s: snan? )
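Most of these words map onto C99 classification macros, as in this sketch with hypothetical names. C99 has no test for signaling nans, so FSIGNALING? is sketched from the bit pattern, assuming a binary64 default format and the IEEE 754-2008 recommended encoding in which a clear leading fraction bit marks a signaling nan.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C bodies for the Sec. 7.4 words. */
bool f_finite(double r)    { return isfinite(r); }  /* zero, subnormal, normal */
bool f_normal(double r)    { return isnormal(r); }
bool f_subnormal(double r) { return fpclassify(r) == FP_SUBNORMAL; }
bool f_infinite(double r)  { return isinf(r); }
bool f_nan(double r)       { return isnan(r); }

/* Assumes binary64 with the quiet bit as the leading fraction bit. */
bool f_signaling(double r) {
    uint64_t bits;
    memcpy(&bits, &r, sizeof bits);
    return isnan(r) && (bits & UINT64_C(0x0008000000000000)) == 0;
}
```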
7.5 Arithmetic
--------------
See IEEE 5.4.1, "Arithmetic operations".
F* ( f: r1 r2 -- r1*r2 )
F*+ ( f: r1 r2 r3 -- [r2*r3]+r1 ) [**new]
F+ ( f: r1 r2 -- r1+r2 )
F- ( f: r1 r2 -- r1-r2 )
F/ ( f: r1 r2 -- r1/r2 )
FSQRT ( f: r -- sqrt[r] )
The DPANS94 specification is extended to IEEE arithmetic. See
IEEE 5.1, "Overview" for precision, rounding, special data
treatment, and exceptions. See IEEE 5.4.1, "Arithmetic
operations", for the arithmetic words.
[** F*+ is the Forth name for the IEEE required
fusedMultiplyAdd operation.]
[**Note that IEEE requires these to be "correctly rounded".]
7.6 Math functions
-------------------
The Forth words FABS, FMAX, FMIN, and FSQRT are covered
elsewhere.
The DPANS94 specification for the following words is extended to
adopt the corresponding IEEE behavior. See IEEE 9.2, "Recommended
correctly rounded functions", and 9.2.1, "Special values".
F** FACOS FACOSH FALOG FASIN FASINH FATAN FATAN2
FATANH FCOS FCOSH FEXP FEXPM1 FLN FLNP1 FLOG FSIN
FSINCOS FSINH FTAN FTANH
IEEE recommends additional functions, whose recommended Forth
names would be:
FEXP2 FEXP2M1 FEXP10 FEXP10M1
FLOG2 FLOGP1 FLOG2P1 FHYPOT 1/FSQRT
FCOMPOUND FROOTN F**N |F|**
FSINPI FCOSPI FATANPI FATAN2PI
[**Separate RfD for the new names?]
7.7 Sign bit operations
-----------------------
FSIGNBIT ( f: r -- s: minus? )
This word corresponds to isSignMinus in IEEE 5.7.2, "General
operations". The name is based on C99.
The following are all required by IEEE. See IEEE 5.5.1, "Sign
bit operations". The IEEE copy() function is superfluous in
Forth [**IIUC].
FNEGATE ( f: r -- -r )
FABS ( f: r -- |r| )
The DPANS94 specification is extended to IEEE specials.
FCOPYSIGN ( f: r1 r2 -- r3 )
The output r3 is r1 with its sign bit replaced by that of r2.
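These words touch only the sign bit. A C99 sketch with hypothetical names follows; signbit, fabs, and copysign read or write the sign bit even when the argument is a nan.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical C bodies for the Sec. 7.7 words. */
bool   f_signbit(double r)              { return signbit(r) != 0; }  /* FSIGNBIT */
double f_negate(double r)               { return -r; }               /* FNEGATE */
double f_abs(double r)                  { return fabs(r); }          /* FABS */
double f_copysign(double r1, double r2) { return copysign(r1, r2); } /* FCOPYSIGN */
```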
7.8 Nearest integer functions
-----------------------------
FCEIL ( f: r1 -- r2 ) [**new]
FLOOR ( f: r1 -- r2 )
FROUND ( f: r1 -- r2 )
FTRUNC ( f: r1 -- r2 )
These words correspond to the respective IEEE required
operations:
roundToIntegralTowardPositive
roundToIntegralTowardNegative
roundToIntegralTiesToEven
roundToIntegralTowardZero
See IEEE 5.3.1, "General operations" and 5.9, "Details of
operations to round a floating-point datum to integral value".
The names are based on C99. No word is defined for IEEE
roundToIntegralTiesToAway.
[**Separate RfD for FCEIL ?]
FNEARBYINT ( f: r1 -- r2 )
This word corresponds to the IEEE required operation:
roundToIntegralExact
It performs the function of whichever of the other four
corresponds to the current rounding mode, and shall be
provided only if the rounding mode words are provided.
7.9 Number manipulation
------------------------
FMAX ( f: r1 r2 -- r3 )
FMIN ( f: r1 r2 -- r3 )
The DPANS94 specification is extended to IEEE specials. See
minNum and maxNum in IEEE 5.3.1, "General operations" and 6.2,
"Operations with NaNs".
FNEXTUP ( f: r1 -- r2 )
When r1 is a nonzero real number, FNEXTUP returns the next
affinely extended real in the default format that compares
larger than r1. See IEEE 5.3.1, "General operations" for the
behavior when r1 is an IEEE special. FNEXTDOWN is not
defined. According to IEEE, nextDown(x) is -nextUp(-x).
FSCALBN ( f: r s: n -- f: r*2^n )
The output is r scaled efficiently by 2^n, as by IEEE scaleB.
See IEEE 5.3.3, "logBFormat operations".
FLOGB ( f: r -- e )
Leave the radix-two exponent e of the fp representation as an
fp integer. If r is subnormal, the exponent is computed as if
r were normalized, with e < emin. See IEEE 5.3.3, "logBFormat
operations" for treatment of IEEE specials.
7.10 Exceptions
---------------
[**UNDER CONSTRUCTION
This is an important section, because the IEEE treatment of
exceptions is central to the philosophy of its treatment of
nans, infinities, and subnormal numbers. AFAIU, that was
strongly influenced by William Kahan.
However, it doesn't follow that it needs to be in this document.
It certainly needs discussion.
]
7.11 Rounding modes
-------------------
[**UNDER CONSTRUCTION
This section is currently under discussion in comp.lang.forth.
See IEEE 9.3, "Operations on dynamic modes for attributes". Only
words corresponding to 9.3.1, "Operations on individual dynamic
modes", are expected to be implemented, and among those,
roundTiesToAway is not expected to be implemented.
]
8 REFERENCES
[1] "IEEE Standard for Floating-Point Arithmetic", approved June
12, 2008 as IEEE Std 754-2008:
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935
[2] "DRAFT Standard for Floating-Point Arithmetic P754", IEEE
754 draft 1.2.9, January 27, 2007:
http://www.validlab.com/754R/nonabelian.com/754/comments/Q754.129.pdf
[3] Wikipedia, "IEEE 754-2008":
http://en.wikipedia.org/wiki/IEEE_754
[4] ANSI X3.215-1994 final draft:
http://www.taygeta.com/forth/dpans.html
[5] ISO/IEC 15145:1997:
http://webstore.ansi.org/RecordDetail.aspx?sku=ISO%2fIEC+15145%3a1997
http://www.iso.org/iso/catalogue_detail.htm?csnumber=26479
[6] ISO/IEC 9899:1999 (December 1, 1999),
ISO/IEC 9899:1999 Cor. 1:2001(E),
ISO/IEC 9899:1999 Cor. 2:2004(E),
ISO/IEC 9899:1999 Cor. 3:2007(E):
http://www.open-std.org/jtc1/sc22/wg14/www/standards.html#9899
[7] C99 + TC1 + TC2 + TC3 is included in the freely available
WG14/N1256, September 7, 2007 [**thanks to David Thompson]:
http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
[8] Single UNIX 3, AFAICS duplicates the C99 library spec, with
some things pinned down more tightly [**thanks to David
Thompson]:
http://www.unix.org/single_unix_specification/
A.2a IEEE BINARY FLOATING-POINT FORMATS
IEEE 754-2008 defines three basic binary fp formats, binary32,
binary64, and binary128, plus three corresponding extended
binary formats, whose parameters are shown in Tables 1 and 2
below. It also defines the four binary interchange formats
shown in Table 3, plus those with storage widths of more than
128 bits that are a multiple of 32 bits.
Table 1: Parameters for IEEE 754-2008
basic binary formats.
binary32 binary64 binary128
---------------------------------------
p = 24 53 113
emax = 127 1023 16383
Table 2: Parameters for IEEE 754-2008
extended binary formats.
binary32 binary64 binary128
---------------------------------------
p >= 32 64 128
emax >= 1023 16383 65535
Table 3: Parameters for IEEE 754-2008
binary interchange formats
(k is the storage width in
bits).
binary16 binary32 binary64 binary128
---------------------------------------------------
k = 16 32 64 128
p = 11 24 53 113
emax = 15 127 1023 16383
Note that the Intel 80-bit format corresponds to one of the
extended binary64 formats, with p = 64 and emax = 16383. Its
precision is greater than that of basic binary64 and less than
that of basic binary128, and its exponent range is the same as
that of basic binary128. Although IEEE does not define it as a
basic binary format, it may be called the "binary80" format.
Its implementation normally differs from that of the basic
formats by having an explicit leading bit for normal and
subnormal numbers.
Binary interchange formats are logical formats, with unspecified
memory layout. They all have an implicit leading bit for normal
and subnormal numbers.
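To make the implicit leading bit concrete, here is a C sketch (the name decode_binary16 is hypothetical) that decodes a binary16 encoding using the Table 3 parameters: after the sign bit come k - p = 5 exponent bits with bias emax = 15, then p - 1 = 10 trailing significand bits. It takes the encoding as a 16-bit integer, so byte order in memory does not enter.

```c
#include <math.h>
#include <stdint.h>

/* Decode binary16 (k = 16, p = 11, emax = 15, emin = -14). */
double decode_binary16(uint16_t h) {
    int sign = h >> 15;
    int biased = (h >> 10) & 0x1F;   /* 5-bit biased exponent */
    int frac = h & 0x3FF;            /* 10 trailing significand bits */
    double mag;
    if (biased == 0x1F)              /* all ones: infinity or nan */
        mag = frac ? NAN : INFINITY;
    else if (biased == 0)            /* zero or subnormal: 0.f * 2^emin */
        mag = ldexp(frac, -14 - 10);
    else                             /* normal: implicit 1.f * 2^(biased - 15) */
        mag = ldexp(0x400 + frac, biased - 15 - 10);
    return sign ? -mag : mag;
}
```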
Note that the binary128 interchange format is the only one in
Table 3 that can contain the binary80 basic format. IEEE does
not define a binary80 interchange format.
A.6.1 NAN SIGNS AND LOADS
IEEE allows the load for nan results like 0E 0E F/ to be
anything, so the following does not necessarily give a zero
load:
0E 0E F/ FABS FCONSTANT +NAN
Aside from the ambiguous load, the FABS (extended to nan) is
necessary here because IEEE does not specify the sign of that
nan either. In fact, pfe and gforth both give a positive sign
for 0E 0E F/ under Mac OS X on ppc and a negative sign on Intel.
Both do give quiet nans with zero load.
As a matter of fact, the Intel QNaN "floating-point indefinite"
is the qnan with zero load and negative sign, according to
"Intel(R) 64 and IA-32 Architectures Software Developer's
Manual, Volume 1: Basic Architecture", Table 4-3:
http://www.intel.com/Assets/PDF/manual/253665.pdf
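The same phrase can be written in C, as in this sketch (the name is hypothetical). The volatile qualifier keeps the compiler from folding 0/0 at compile time, so the sign comes from the hardware before fabs clears it; as noted above, the load is still not guaranteed to be zero.

```c
#include <math.h>

/* C analogue of 0E 0E F/ FABS: a nan with cleared sign bit. */
double plus_nan_from_division(void) {
    volatile double zero = 0.0;   /* prevent compile-time folding */
    return fabs(zero / zero);     /* raises invalid; sign then cleared */
}
```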
A.7.3 COMPARISON (INFORMATIVE RATIONALE)
[**Any high-level definitions in this section assume a separate
floating-point stack.]
The twelve IEEE required comparisons are the following, where
"N" means logical negation, "?" stands for "unordered", and "<?"
and ">?" stand for "less than or unordered" and "greater than or
unordered":
<  =  >  ?  N<  N=  N>  N?  <=  >=  <?  >?
Unfortunately, the common notation "?" for the unordered
predicate clashes with Forth practice, where "?" usually
[**always?] indicates a flag or test. The ? notation is used
here as a convenience for IEEE predicates, and does not appear
in any corresponding Forth names.
The <= and >= comparisons are no longer simple negations of >
and <, but are rather the AND's of those negations with N?. It
can be shown that <? is N(>=), and >? is N(<=). See IEEE Table
5.3, "Required unordered-quiet predicates and negations".
It is possible to implement all of the IEEE comparisons via
high-level definitions in terms of a few low-level words, even
fewer than the five required for this word set. The following
remarks are offered as a guide to possibilities for the choice
of low-level words.
Among two-argument comparisons, DPANS94 has only F< and F~. Since IEEE "=" is semantically
different from 0E F~, low-level implementation of F= seems
inevitable. The two words F< and F= are probably a minimum set
for high-level implementation of the rest.
For example, IEEE ">" is not expressible in terms of "<" and "="
plus logical operations, but F> can be defined as:
: F> ( f: r1 r2 -- s: [r1>r2]? ) FSWAP F< ;
FUNORDERED is not required by this document; in particular, the
name is not reserved; but it can be defined as
: FUNORDERED ( f: r1 r2 -- s: [r1?r2]? )
FDUP F= FDUP F= AND 0= ;
The logical negations N<, N=, N>, and N? can be expressed with
the Forth phrases "F< 0=", etc.
Forth words for the <= and >= predicates can be defined as
: F2DUP ( f: r1 r2 -- r1 r2 r1 r2 ) FOVER FOVER ;
: F<= ( f: r1 r2 -- s: [r1<=r2]? ) F2DUP F< F= OR ;
: F>= ( f: r1 r2 -- s: [r1>=r2]? ) F2DUP F> F= OR ;
The <? and >? predicates can be expressed by the Forth
phrases "F>= 0=" and "F<= 0=".
Thus all twelve IEEE predicates can be expressed with the five
required words, and as few as two.
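The high-level definitions above can be checked mechanically. This C sketch (the function name is hypothetical) compares each definition, and the identities <? = N(>=) and >? = N(<=), against the C99 quiet comparison macros on a small sample of IEEE data that includes signed zeros, infinities, and a nan. It is a spot check, not a proof.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every check holds for all sample pairs. */
bool check_comparison_identities(void) {
    const double v[] = { -INFINITY, -1.0, -0.0, 0.0, 1.0, INFINITY, NAN };
    const size_t n = sizeof v / sizeof v[0];
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double a = v[i], b = v[j];
            bool lt = isless(a, b), eq = (a == b);
            bool unord = isunordered(a, b);
            /* : F> FSWAP F< ; */
            if (isgreater(a, b) != isless(b, a)) return false;
            /* : FUNORDERED FDUP F= FDUP F= AND 0= ; */
            if (unord != !((b == b) && (a == a))) return false;
            /* : F<= F2DUP F< F= OR ; */
            if (islessequal(a, b) != (lt || eq)) return false;
            /* <? is N(>=), and >? is N(<=) */
            if ((lt || unord) != !isgreaterequal(a, b)) return false;
            if ((isgreater(a, b) || unord) != !islessequal(a, b)) return false;
        }
    }
    return true;
}
```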
[**Treat the following as a footnote.]
It can be shown that the closure of the four fundamental IEEE
comparison predicates under AND, OR, and negation consists of
sixteen independent relations, including the twelve that IEEE
requires. Two additional elements are the trivial,
identically true and false relations, and the other two are
"less than or greater than" and its negation, "unordered or
equal". The five words of this word set, plus their five
negations, implement the only nontrivial, transitive relations
among the sixteen.
On the other hand, several current CPUs have efficient
operations for all four of the fundamental IEEE comparisons, <,
>, =, and ?. Low-level implementations of at least F<, F>, and
F= would be natural for such systems.
All of the words in the F0< family can be defined by analogy to
: F0< ( f: r -- s: [r<0]? ) 0E F< ;