( Title: Paartial Reference Implementation
for IEEE 754-2008 Floating Point
File: ieeefp-ref.fs
Test file: ieeefp-ref-test.fs
Version: 0.5.0
Revised: December 23, 2020
Author: David N. Williams
License: Creative Commons CC0 1.0 Universal
The revision date may reflect cosmetic changes not logged below.
Version 0.5.0
11Sep20 * Started.
* Added bit manipulation constants, RAW-CONSTANT, RAW-DROP, RAW=.
* Added classification queries.
12Sep20 * Added the 10 raw constants with positive sign, |RAW|.
* Rewrote classification queries.
15Sep20 * Started comparisons with T-F<.
18Sep20 * Finished T-F<, a serious effort.
19Seo20 * Added: T-F> T-F= T-F<= T-F>=
30Sep20 * Made into a module named IEEEFP-REF.
* Removed T- prefixes.
1Oct20 * Added: FDATUM= FABSDIFF< F~ FABS
2Oct20 * Added: FNEGATE FCOPYSIGN FMAX FMIN
3Oct20 * Rewrote analysis of FMAX and FMIN, not easy.
9Oct20 * Added: RAW-1+ RAW-1- FNEXTUP FNEXTDOWN -MAX-N-RAW
10Oct20 * Added 80-bit gap handling to FNEXTUP.
11Oct20 * Added: FSNAN? FQNAN?
12Oct20 * Moved raw constants and RAW-DROP into rawfloat.fs.
15Oct20 * Replaced 32-BITS/CELL? by bits/cell expression.
25Oct20 * Added: #LZB 2#LZB 4#LZB
* Added fraction masks and RAW-FRAC-#LZB.
26Oct20 * Added: prec emax emin
27Oct20 * Added exp masks and FLOGB.
28Oct20 * Added FILOGB.
29Oct20 * Moved QUIET from tests to here.
* Moved to float-bit-constants.fs: prec emax emin
* Added quieting to: FMAX FMIN FLOGB
23Dec20 * Added missing 128-bit code for RAW-1+ and RAW-1-;
\ *** OVERVIEW
This library works with 32|64-bit cells, and a total big|little-endian
binary floating-point format of the same endianness as that for cpu
integers.
It provides reference implementations of selected parts of "Proposal for
an Optional IEEE 754, Binary Floating-Point Word Set", draft version
0.5.5, August 31, 2009. A copy can be found at:
http://www.umich.edu/~williams/archive/forth/ieeefp-drafts/
Further development of the proosal is being taken up by Krisnha Myneni:
https://forth-standard.org/~KrishnaMyneni
This library loads rawfloat.fs, a small libary based on a multi-cell,
cell-wise big-endian representation on the data stack of the Forth
system default binary floating-point format, called the raw-float
format, to get convenient access its bits. The rawfloat library
requires that the default format have a width of 32, 64, 80, or 128
bits, corresponding to an IEEE 754-2008 single, double, [intel] extended
double, or quad format.
The library has not been tested with quad floats, nor on a big-endian
system.
Much of the action has to do with the extended ieee data types, signed
zero, signed infinity, and the various nans. Because the reference
implementation treats these quantities through the raw-float format, it
is expected to work on Forth systems which do not explicitly implement
extended data.
There is, however, an assumption that the cpu has IEEE 754 floating
point, and that the Forth system has an implementation of the optional
Floating-Point word set for which F@ amd F! properly handle extended
data.
The test file ieeefp-ext-test.fs assumes that this has been verfied by
testing the rawfloat library, so it freely uses conversions between the
raw and default formats based on F@ and F!. It double checks that with
a few of its own tests.
MODULARITY: This library loads the modules.fs library [km, dnw], which
allows the selective configuration of tests to use either reference or
native host implementations. That includes the use of F~ in the
floating-point part of the tester itself.
NOTATION: Since Forth floating point is being extended to include IEEE
754-2008 special values, in floating-point stack comments we use the
Forth r for finite real values including signed zero, x for extended
real values including signed infinity, and f for a general ieee value
including quiet or signaling signed nans.
For floating-point quantities, we use "=", "<", and ">" to mean IEEE 754
comparisons. When we want to say that two fp quantities have identical
encodings, we try to use "is" instead of "=". Thus, +0 = -0 is true,
but +0 is -0 is false.
For raw quantities and their individual cells, we use "=", "<", and ">"
in the conventional sense.
)
\ *** WORDS
(
See rawfloat.fs for raw multicell definitions and basic manipulations.
THE 32-BIT AND BINARY32 BRANCHES OF THIS CODE ARE UNTESTED.
)
\ CONSTANTS
\ signbit-raw.hi +inf-raw.hi +max-subn-raw.hi +min-qnan-raw.hi+
\ [extended double only] intbit-raw.nx +min-qnan-raw.nx
\ RAW32-FRAC-MASK RAW64-FRAC-MASK RAW80-FRAC-MASK RAW128-FRAC-MASK
\ RAW-FRAC-MASK
\ RAW32-EXP-MASK RAW64-EXP-MASK RAW80-EXP-MASK RAW128-EXP-MASK
\ RAW-EXP-MASK
\ RAW MANIPULATION
\ raw= |raw| raw-negate
\ raw-1+ raw-1-
\ #LZB 2#LZB 4#LZB RAW-FRAC-#LZB
\ raw-quiet
\ QUIETING
\ QUIET
\ CLASSIFICATION
\ FSIGNBIT F0= FINFINITE? FNAN? FSNAN? FQNAN? FFINITE?
\ FSUBNORMAL? FNORMAL?
\ SIGN MANIPULATION
\ FNEGATE FABS FCOOPYSIGN
\ COMPARISON
\ F< F> F= F<= F>=
\ FDATUM= FABSDIFF< F~
\ DATA MANIPULATION
\ FMAX FMIN FNEXTUP FNEXTDOWN
\ FLOGB FILOGB
\
\ [not implemented yet]
\ FSCALBN FREMAINDER
\ FCEIL FLOOR FROUND FTRUNC FNEARBYINT
[UNDEFINED] REQUIRE [IF] INCLUDE required-ref.fs [THEN]
REQUIRE modules.fs
REQUIRE rawfloat.fs
[UNDEFINED] (f: [IF] : (f: postpone ( ; IMMEDIATE [THEN]
[UNDEFINED] -FROT [IF]
: -FROT (f: f1 f2 f3 -- f3 f1 f2 ) frot frot ; [THEN]
[UNDEFINED] F2DUP [IF]
: F2DUP (f: f1 f2 -- f1 f2 f1 f2 ) fover fover ; [THEN]
[UNDEFINED] F2DROP [IF]
: F2DROP (f: f1 f2 -- f1 f2 f1 f2 ) fdrop fdrop ; [THEN]
[UNDEFINED] FNIP [IF]
: FNIP (f: f1 f2 -- f2 ) fswap fdrop ; [THEN]
[UNDEFINED] MAX-N [IF]
-1 1 rshift CONSTANT MAX-N
MAX-N negate CONSTANT -MAX-N
-MAX-N 1- CONSTANT MIN-N [THEN]
MODULE: IEEEFP-REF
\ Debugging
: .# ( n -- ) cr ." #" . .s cr ;
\ *** CONSTANTS
1 \ for lshift
bits/float #32 = [IF] #31 [THEN]
bits/float #64 = [IF] bits/cell #32 = [IF] #31 [ELSE] #63 [THEN] [THEN]
bits/float #80 = [IF] #15 [THEN]
bits/float #128 = [IF] bits/cell #32 = [IF] #31 [ELSE] #63 [THEN] [THEN]
lshift CONSTANT signbit-raw.hi \ .hi means most significant cell
bits/float #80 = [IF]
bits/cell #32 = [IF] $80000000 [ELSE] $8000000000000000 [THEN]
CONSTANT intbit-raw.nx \ .nx means next most significant cell
[THEN]
+inf-raw dup CONSTANT +inf-raw.hi raw-drop
+max-subn-raw dup CONSTANT +max-subn-raw.hi raw-drop
+min-qnan-raw dup CONSTANT +min-qnan-raw.hi
bits/float #80 = [IF]
over CONSTANT +min-qnan-raw.nx [THEN] raw-drop
(
The following FRAC-MASK's are used to find the normalized exponent of
subnormalized numbers, by counting zero bits of the fractional part of
the significand. The raw format has leading unused bits cleared. To
leave the significand-fraction field, we only need masks for the leading
raw cell containing significand bits. That's the leading cell for all
but raw80, where it's the next leading cell.
)
$007fffff CONSTANT RAW32-FRAC-MASK
bits/cell #32 = [IF] $000fffff [ELSE] $000fffffffffffff [THEN]
CONSTANT RAW64-FRAC-MASK
bits/cell #32 = [IF] $7fffffff [ELSE] $7fffffffffffffff [THEN]
CONSTANT RAW80-FRAC-MASK
bits/cell #32 = [IF] $0000ffff [ELSE] $0000ffffffffffff [THEN]
CONSTANT RAW128-FRAC-MASK
bits/float #32 = [IF] RAW32-FRAC-MASK [THEN]
bits/float #64 = [IF] RAW64-FRAC-MASK [THEN]
bits/float #80 = [IF] RAW80-FRAC-MASK [THEN]
bits/float #128 = [IF] RAW128-FRAC-MASK [THEN]
CONSTANT RAW-FRAC-MASK
$7f800000 CONSTANT RAW32-EXP-MASK
bits/cell #32 = [IF] $7ff00000 [ELSE] $7ff0000000000000 [THEN]
CONSTANT RAW64-EXP-MASK
$7fff CONSTANT RAW80-EXP-MASK
bits/cell #32 = [IF] $7fff0000 [ELSE] $7fff000000000000 [THEN]
CONSTANT RAW128-EXP-MASK
bits/float #32 = [IF] RAW32-EXP-MASK [THEN]
bits/float #64 = [IF] RAW64-EXP-MASK [THEN]
bits/float #80 = [IF] RAW80-EXP-MASK [THEN]
bits/float #128 = [IF] RAW128-EXP-MASK [THEN]
CONSTANT RAW-EXP-MASK
\ *** FCONSTANTS
[UNDEFINED] +INF [IF] +inf-raw raw>f FCONSTANT +INF [THEN]
[UNDEFINED] -INF [IF] -inf-raw raw>f FCONSTANT -INF [THEN]
[UNDEFINED] +QNAN [IF] +min-qnan-raw raw>f FCONSTANT +QNAN [THEN]
[UNDEFINED] -QNAN [IF] -min-qnan-raw raw>f FCONSTANT -QNAN [THEN]
\ *** RAW OPERATIONS
: raw= ( raw1 raw2 -- [=]? ) \ raw data exactly equal
[ bits/float #32 = ] [IF] = [THEN]
[ bits/float #64 = ] [IF] [ bits/cell #32 = ]
[IF] d= [ELSE] = [THEN] [THEN]
[ bits/float #80 = ] [IF] [ bits/cell #32 = ]
[IF] d= >r = r> and [ELSE] d= [THEN] [THEN]
[ bits/float #128 = ] [IF] [ bits/cell #32 = ]
[IF] d= >r d= r> and [ELSE] d= [THEN] [THEN] ;
: |raw| ( raw -- |raw| ) \ clear the raw sign bit
[ signbit-raw.hi invert ] literal and ;
: raw-negate ( raw -- -raw ) signbit-raw.hi xor ;
bits/cell #32 = [IF]
bits/float #80 = bits/float #128 = or
[IF]
: [UD1+]-CARRY ( ud -- ud' carry )
dup $80000000 and >r 1 0
d+ dup $80000000 and 0= r> and [ bits/cell 1- ] literal rshift ;
: [UD1-]-BORROW ( ud -- ud' borrow )
dup $80000000 and >r 1 0
d- dup $80000000 and 0= r> and negate ;
[THEN]
bits/float #80 = [IF]
: UT1+ ( ut -- ut+1 ) >r [ud1+]-carry r> + ;
: UT1- ( ut -- ut-1 ) >r [ud1-]-borrow r> + ;
[THEN]
bits/float #128 = [IF]
: UQ1+ ( uq -- uq+1 ) 2>r [ud1+]-carry 2r> d+ ;
: UQ1- ( uq -- uq-1 ) 2>r [uq1-]-borrow 2r> d+ ;
[THEN]
[THEN] \ bits/cell
: raw-1+ ( raw -- raw+1 )
[ bits/float #32 = ] [IF] 1+ [THEN]
[ bits/float #64 = ] [IF]
[ bits/cell #32 = ] [IF] 1 0 d+ [ELSE] 1+ [THEN] [THEN]
[ bits/float #80 = ] [IF]
[ bits/cell #32 = ] [IF] ut1+ [ELSE] 1 0 d+ [THEN] [THEN]
[ bits/float #128 = ] [IF]
[ bits/cell #32 = ] [IF] uq1+ [ELSE] 1 0 d+ [THEN] [THEN] ;
: raw-1- ( raw -- raw-1 )
[ bits/float #32 = ] [IF] 1- [THEN]
[ bits/float #64 = ] [IF]
[ bits/cell #32 = ] [IF] 1 0 d- [ELSE] 1- [THEN] [THEN]
[ bits/float #80 = ] [IF]
[ bits/cell #32 = ] [IF] ut1- [ELSE] 1 0 d- [THEN] [THEN]
[ bits/float #128 = ] [IF]
[ bits/cell #32 = ] [IF] uq1- [ELSE] 1 0 d- [THEN] [THEN] ;
: #LZB ( u -- +n ) \ based on LOG2_0, Wil Baden / Marcel Hendrix
(
Return the number +n of leading zero bits in u. For this inefficient
version, the more leading zero bits the faster. It could possibly be
replaced by a single cpu command.
)
0 >r
BEGIN
?DUP WHILE
r> 1+ >r 1 rshift
REPEAT
bits/cell r> - ;
: 2#LZB ( ud -- +n )
#LZB dup bits/cell <> IF nip EXIT THEN swap #LZB + ;
: 4#LZB ( uq -- +n )
2#LZB dup >r bits/cell 2* <> IF 2drop r> EXIT THEN 2#LZB r> + ;
: raw-frac-#lzb ( raw -- +n )
(
Leave the number of leading zero bits in the fractional part of the raw
significand field. For use with subnormals.
)
[ bits/float #80 = ] [IF] \ leading cell contains no frac bits
drop [THEN] RAW-FRAC-MASK and
[ bits/float #32 = ] [IF] #LZB
[ bits/cell #32 = ] [IF] #15 [ELSE] #47 [THEN]
[THEN]
[ bits/float #64 = ] [IF]
[ bits/cell #32 = ] [IF] 2#LZB [ELSE] #LZB [THEN] #12
[THEN]
[ bits/float #80 = ] [IF]
[ bits/cell #32 = ] [IF] 2#LZB [ELSE] #LZB [THEN] #1
[THEN]
[ bits/float #128 = ] [IF]
[ bits/cell #32 = ] [IF] 4#LZB [ELSE] 2#LZB [THEN] #16
[THEN]
( #non_frac_bits) - ;
: (raw-exp) ( raw -- exp )
(
Return the unshifted, exp field of raw without removing the bias.
)
dup >r raw-drop r> RAW-EXP-MASK and ;
: raw-exp ( raw -- e )
(
Return the exp field shifted to the least significant end of its cell
and subtract the bias. The exponent when raw is zero is minus the bias,
mathematically correct since 0 x 2^x = 0 for any x, but not particularly
useful.
)
(raw-exp)
[ bits/float #80 = ] [IF] emax - ; [ELSE]
[ bits/float #32 = ] [IF] #23 [THEN]
[ bits/float #64 = ] [IF]
[ bits/cell #32 = ] [IF] #20 [ELSE] #52 [THEN]
[THEN]
[ bits/float #128 = ] [IF]
[ bits/cell #32 = ] [IF] #16 [ELSE] #48 [THEN]
[THEN]
rshift emax - ; [THEN]
: raw-quiet ( nan.raw -- qnan.raw )
(
Assume the input to be a raw nan. Leave the raw nan with quiet bit on
and all other bits unchanged [simply quieted].
)
[ bits/cell #32 = ] [IF]
[ bits/float #32 = ] [IF] $00400000 or [THEN]
[ bits/float #64 = ] [IF] $00080000 or [THEN]
[ bits/float #80 = ] [IF] swap $40000000 or swap [THEN]
[ bits/float #128 = ] [IF] $00008000 or [THEN]
[ELSE] \ bits/cell = 64
[ bits/float #32 = ] [IF] $00400000 or [THEN]
[ bits/float #64 = ] [IF] $0008000000000000 or [THEN]
[ bits/float #80 = ] [IF] swap $4000000000000000 or swap [THEN]
[ bits/float #128 = ] [IF] $0000800000000000 or [THEN]
[THEN] ;
BEGIN-MODULE
PUBLIC:
\ *** QUIETING
: QUIET (f: snan|qnan -- qnan )
(
Any floating-point result produced by an operation that signals an
invalid operation shall default to a qnan [7.2 Invalid Operation]. If
that qnan is propagated from an snan operand [6.2.3 NaN propagation], it
shall be the snan quieted by setting the quiet bit and leaving the load
unchanged [6.2.1 NaN encodings in binary formats]. What happens to the
sign bit is unspecified. This implementation leaves it unchanged.
)
f>raw raw-quiet raw>f ;
\ *** CLASSIFICATION
: FSIGNBIT (f: f -- ) ( -- [-]? )
f>raw dup >r raw-drop r> signbit-raw.hi and 0<> ;
: F0= (f: f -- ) ( -- [0]? )
(
The IEEE 754 comparison ignores the sign of zero.
)
f>raw |raw| +0-raw raw= ;
: FINFINITE? (f: f -- ) ( -- [+|-inf]? ) f>raw |raw| +inf-raw raw= ;
: FNAN? (f: f -- ) ( -- [nan]? )
(
True if f is not infinite, and the magnitude of the most sig raw cell is
>= the most sig cell of +inf-raw. Algorithm from inspection of bit
patterns.
)
fdup FINFINITE? 0= ( [not.inf]?)
f>raw |raw| dup >r raw-drop r> +inf-raw.hi >= and ;
: FSNAN? (f: f -- ) ( -- [snan]? )
fdup FINFINITE? 0= ( [not.inf]?)
[ bits/float #80 = ] [IF]
f>raw |raw| dup +inf-raw.hi = ( [snan|qnqn]?) >r ( [not.inf]? |f|.raw)
over ( raw.nx) >r raw-drop ( [not.inf]?)
+min-qnan-raw.nx r> ( raw.nx ) u> and r> and ;
[ELSE]
f>raw |raw| dup >r raw-drop r@ +inf-raw.hi >= and
+min-qnan-raw.hi r> u> and ;
[THEN]
\ Not required by IEEE 754-2008.
: FQNAN? (f: f -- ) ( -- [qnan]? ) fdup FNAN? FSNAN? 0= and ;
: FFINITE? (f: f -- ) ( -- [not-inf & not-nan]? )
fdup FINFINITE? FNAN? or 0= ;
: FSUBNORMAL? (f: f -- ) ( -- [subnormal]? )
fdup F0= 0= fdup FFINITE? and
f>raw dup >r raw-drop r> |raw| +max-subn-raw.hi <= and ;
: FNORMAL? (f: f -- ) ( -- [normal]? )
fdup F0= 0= fdup FFINITE? and FSUBNORMAL? 0= and ;
\ *** SIGN MANIPULATION
: FNEGATE (f: f -- -f ) f>raw raw-negate raw>f ;
: FABS (f: f -- |f|) f>raw |raw| raw>f ;
: FCOPYSIGN (f: f1 f2 -- f3 )
f>raw dup >r raw-drop f>raw
[ signbit-raw.hi invert ] literal and r> signbit-raw.hi and or raw>f ;
\ *** COMPARISON
: F< ( f: f1 f2 -- ) ( -- [f1 (f: f1 f2 -- ) ( -- [f1>f2]? ) fswap F< ;
: F= (f: f1 f2 -- ) ( -- [f1=f2]? )
f2dup F0= F0= and IF f2drop true EXIT THEN
f2dup FNAN? FNAN? or IF f2drop false EXIT THEN
f>raw f>raw ( raw2 raw1) raw= ; \ RAW= is symmetric
: F<= (f: f1 f2 -- ) ( -- [f1<=f2]? ) f2dup F< F= or ;
: F>= (f: f1 f2 -- ) ( -- [f1>=f2]? ) f2dup F> F= or ;
: FDATUM= (f: f1 f2 -- ) ( -- [f1.dat=f2.dat]? ) f>raw f>raw raw= ;
: FABSDIFF< (f: f1 f2 f3 -- ) ( -- flag ) -frot f- FABS F> ;
: F~ (f: f1 f2 f3 -- ) ( -- flag )
fdup F0= IF fdrop FDATUM= EXIT THEN
fdup FNAN? IF f2drop fdrop false EXIT THEN
fdup FSIGNBIT 0= IF FABSDIFF< EXIT THEN
\ From here the nan's will propapate and be handled correctly by F<.
FNEGATE -frot (f: |f3| f1 f2)
f2dup f- FABS (f: |f3| f1 f2 |f1-f2|)
-frot FABS fswap FABS f+ (f: |f3| |f1-f2| |f2|+|f1|)
frot (f: |f1-f2| |f2|+|f1| |f3|)
f* F< ;
\ *** DATA MANIPULATION
: FMAX (f: f1 f2 -- x | nan )
(
The following is based on IEEE 754-2008 [5.3.1 General operations].
IF f1 < f2, then return f2.
IF f1 > f2, then return f1.
In either of these cases, [f1,f2] is [x1,x2], and it is impossible that
both values are signed zeros.
If only one of f1,f2 is a nan, return the other, which is an extended
real. [This is unlike the usual operations on nans that produce an f
result.]
Otherwise return either. This has two subcases:
a. [f1,f2] is [x1,x2] with x1 = x2. Hence if [x1,x2] is [+0,-0] or
[-0,+0], the return can be either +0 or -0.
b. [f1,f2] is [nan1,nan2], and either can be returned.
The IEEE spec mentions that x returns are "canonicalized", which is
irrelevant for binary formats.
It also mentions that with signaling nan input, the result must obey
Sec. 6.2. This implementation is possibly noncompliant in that it
ignores any distinction between quiet and signaling.
)
f2dup F< IF fnip (f: x2) EXIT THEN
f2dup F> IF fdrop (f: x1) EXIT THEN
fdup FNAN? IF fdrop (f: f1) fdup FNAN? IF QUIET THEN EXIT THEN
fnip ;
: FMIN (f: f1 f2 -- min[x1,x2] | nan )
(
The specification is the same as that for FMAX, but with < and >
interchanged.
)
f2dup F> IF fnip (f: x2) EXIT THEN
f2dup F< IF fdrop (f: x1) EXIT THEN
fdup FNAN? IF fdrop (f: f1) fdup FNAN? IF QUIET THEN EXIT THEN
fnip ;
: FNEXTUP (f: f1 -- f2 )
(
Paraphrased from IEEE 754-2008 Standard Sec. 3.5.1:
a. if f1 is the negative number of least magnitude, then f2 is -0;
b. nextUp[±0] is the positive nonzero number of least magnitude;
c. nextUp[+∞] is +∞;
d. nextUp[−∞] is the finite negative number of largest magnitude;
e. when f1 is NaN, then the result is according to Sec. 6.2;
f. nextUp[f1] does not signal an exception except for sNaNs.
The interpretation of Section 6.2 is not totally clear to us. We take
it to mean that a quiet nan "should" be preserved, and that a signaling
nan "shall" be returned as quiet under default exception handling.
The basic algorithm is from Andrew Haley.
)
fdup FINFINITE?
IF FSIGNBIT IF -max-n-raw ELSE +inf-raw THEN raw>f EXIT THEN
\ finite or nan
0.0e f+ \ convert -0 into +0, sNaN into qNaN
f>raw dup signbit-raw.hi and ( signbit) >r
|raw| dup +inf-raw.hi <
IF \ finite
[ bits/float #80 = ] [IF]
r@ ( signbit)
IF \ minus
over intbit-raw.nx and
IF \ normal
raw-1- over intbit-raw.nx and 0=
IF \ possible gap
( exp) 1-
dup ( exp) 0<> ( gap?)
IF \ output is normal, restore intbit to skip any gap
swap intbit-raw.nx or swap
THEN
\ if no gap, output is subnormal and correct
THEN
ELSE \ subnormal, not zero
raw-1-
THEN
ELSE \ plus
over intbit-raw.nx and
IF \ normal
raw-1+ swap intbit-raw.nx or ( any_gap_skipped) swap
ELSE \ subnormal
raw-1+ over intbit-raw.nx and IF drop 1 ( min-n) THEN
THEN
THEN
[ELSE] \ bits/float #80 <>
r@ ( signbit) IF raw-1- ELSE raw-1+ THEN
[THEN]
r> ( signbit) or raw>f EXIT
THEN
\ nan
r> or raw>f ;
: FNEXTDOWN (f: f1 -- f2 ) FNEGATE FNEXTUP FNEGATE ;
: FLOGB (f: f -- n )
(
This is the floating-point format version of logB[f] [IEEE 754-2008
5.3.3]. It obeys logB[f] = logB[|f|]. For binary encodings like ours,
a finite f = x is treated as represented in normalized form, even when
subnormal; that is, x = 1.y * 2^n with y a string of binary digits.
Then logB[x] = floor[log_2[x]] = n. When x is subnormal, so that x =
0.0 ... 01y * 2^emin, we have n = emin - z - 1, where z is the number of
leading zero binary digits in the fractional part of the significand.
Requirements for special values:
a. logB[1] = +0;
b. logB[0] = -inf and signals the divideByZero exception;
c. logB[inf] = +inf;
d. logB[NaN] is a NaN, which we take to be the same as the input when it
is quiet, and preferably the quieting of the input when it is
signaling.
)
fdup f0= IF fdrop -1e 0e f/ EXIT THEN
fdup FNORMAL? IF f>raw raw-exp s>f EXIT THEN
fdup FSUBNORMAL? IF f>raw raw-frac-#lzb ( z) 1+ emin - negate s>f EXIT THEN
fdup FINFINITE? IF fabs EXIT THEN
( nan) QUIET ;
: FILOGB (f: f -- ) ( -- n )
(
This is the integer format version of logB[f]. It differs from the
floating-point format version by its integer representation of special
output values. Namely, logB[NaN], logB[inf], and logB[0] are
language-defined large integer values, and all signal the invalid
operation exception.
This implementation returns:
a. logB[0] = -max-n;
c. logB[inf] = max-n;
d. logB[NaN] = min-n = $80...0 .
)
fdup f0= IF +QNAN f< drop -max-n EXIT THEN
fdup FNORMAL? IF f>raw raw-exp EXIT THEN
fdup FSUBNORMAL? IF f>raw raw-frac-#lzb ( z) 1+ emin - negate EXIT THEN
fdup FINFINITE? IF +QNAN f< drop max-n EXIT THEN
f0< drop min-n ;
END-MODULE