( Title: ANS Forth Complex Arithmetic Lexicon with Kahan Algorithms File: complex-kahan.fs Log file: complex-kahan.log Test files: complex-test.fs, complex-ieee-test.fs Version: 1.0.3 Original Author: Julian V. Noble Modified by: David N. Williams License: LGPL Last revision: December 12, 2010 The original code is ) \ Copyright (C) 1998 Julian V. Noble ( Modifications not derived from the original, including expressions of the Kahan algorithms, are ) \ Copyright (C) 2002-2005, 2008-2010 David N. Williams ( This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or at your option any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. If you take advantage of the option in the LGPL to put a particular version of this library under the GPL, the author[s] would regard it as polite if you would put any direct modifications under the LGPL as well, and include a copy of this request near the beginning of the modified library source. A "direct modification" is one that enhances or extends the library in line with its original concept, as opposed to developing a distinct application or library which might use it. ANS Forth environmental dependences: 1. Requires the Floating-Point and Floating-Point extension word sets. 2. Assumes a separate floating point stack. 3. Does not construct a separate complex number stack. 4. Case insensitivity for standard words. 5. The ANS FORTH default floats must be 32-bit or 64-bit IEEE 754 to work correctly with signed zero/infinity. 6. Some of the Kahan algorithms are optimized for radix 2. 7. Machine parameters like EPSILON and a few others have to be uncommented, or supplied if they are not 32-bit or 64-bit IEEE 754. The parameter section assumes that raw floats are represented in memory as contiguous sets of bytes that are totally big-endian or little-endian, with the raw representation of -1E nonzero in the logical first byte and zero in the logical last byte. 8. If not present in the host FORTH, FSIGNBIT assumes default floats to be IEEE 754 32 or 64 bits, and also assumes the cell size to be a multiple 32 bits. 9. There is a system dependence on the sign of NaN produced by an arithmetic operation. In particular, Z^ currently gives NaN's with plus signs on ppc and minus signs on intel systems, for both 0+i0^0+i0 and 0+i0^1+i0, with both pfe and gforth. So does F/ for 0/0. Complex numbers z=x+iy are stored on the fp stack as [f: x y], also written as [f: z]. Angles are in radians. This code uses higher-accuracy algorithms by William Kahan [1]. It uses the principal argument, with -pi < arg <= pi, for ARG, ZSQRT, ZLN, and Z^. The Kahan alorithms implement an OpenMath compliant treatment of principal expressions, branch cuts, and branches for the elementary functions [2], which is based on Abramowitz and Stegun [3]. Currently the only high-level way we have to save and restore IEEE underflow and overflow flags, and to use IEEE scaling functions to avoid underflow/overflow, is through a pfe environment, IEEEFP-EXT. The parts of the Kahan algorithms that do that are conditionally compiled here, when the environment is present. Kahan says that the algorithms degrade gracefully under simple omission of these procedures, which is what happens here when IEEEFP-EXT is absent. Kahan pays attention to signed zero, where available in IEEE 754/854 implementations. We address that in this library as follows. 1. ZSINH, ZASINH, ZTANH, ZATANH, ZSIN, ZASIN, ZTAN, and ZATAN conserve the sign of zero, and 1/Z and the minimal computation words Z*F, Z/F, F*Z, F/Z, Z*I*F, -I*Z/F, I*F*Z, I*F/Z do the right thing. 2. We would like the analytic functions that are real and analytic on the real axis to do the right thing for the sign of the zero imaginary part. This works for all of our tests so far. 3. For functions having branch cuts, signed zero in the appropriate x or y input produces correct values on the cuts. ) s" FLOATING-EXT" environment? dup [IF] ( flag) and [THEN] 0= [IF] cr .( ***Floating-point extensions not available.) cr ABORT [THEN] s" FLOATING-STACK" environment? dup [IF] ( maxdepth) and [THEN] 0= [IF] cr .( ***Floating-point stack not separate.) cr ABORT [THEN] s" IEEEFP-EXT" environment? [IF] ( version) drop true [ELSE] false [THEN] CONSTANT IEEEFP immediate \ *** WORDS ( NONSTANDARD FLOATING POINT (f: -frot fnip ftuck s>f 1/f f^2 f^n f2* f2/ f0> f= f> fsignbit CONSTANTS BIG-ENDIAN BITS/FLOAT IEEEFP BADEN-STYLE FLOAT LOCALS framef unframe fpframe>r r>frame &fa &fb fa fb fc fd fe ff fg fh to-fa to-fb to-fc to-fd to-fe to-ff to-fg to-fh FCONSTANTS pi/2 pi -pi sqrt2 1/sqrt ln2 2pi RAW FLOAT GENERATION [deleted with a marker] hex-byte raw-hex>float GENERATED EXACT FCONSTANTS r2p1 t2p1 +Inf -Inf +NaN epsilon xmax 2/epsilon sqrt[xmax]/4 4/sqrt[xmax] sinh[xmax]/4 LOAD AND STORE COMPLEX COMPLEXES COMPLEX+ ZVARIABLE ZCONSTANT ZLITERAL z@ z! x@ x! y@ y! za zb zc zd to-za to-zb to-zc to-zd OUTPUT z. zs. FLOATING POINT STACK z= zdrop zdup zswap zover znip ztuck zrot -zrot MINIMAL [MIXED] OPERATIONS x+ x- y+ y- z*>real z*>imag z*f z/f f*z f/z z*i*f -i*z/f i*f*z i*f/z ARITHMETIC conjg real imag cmplx i* -i* znegate z+ z- z* z/ z2* z2/ 1/z z^2 z^n |z|^2 FUNCTIONS |z| arg >polar polar> zbox zsqrt zexp zln z^ zsinh zcosh ztanh zsin zcos ztan INVERSE FUNCTIONS zasinh zacosh zatanh zasin zacos zatan ) \ *** CONSTANTS -1E pad faligned dup f! c@ [IF] true \ memory floats are big endian [ELSE] false \ memory floats are little endian [THEN] CONSTANT BIG-ENDIAN immediate [UNDEFINED] BITS/FLOAT [IF] :NONAME ( -- +n ) s" MAX-FLOAT" environment? 0= ABORT" ***can't determine MAX-FLOAT" pad 3 represent 0= ABORT" ***MAX-FLOAT invalid" \ shouldn't happen ( exp plus?) drop CASE 39 OF 32 ENDOF \ .340282347e+39 309 OF 64 ENDOF \ .17976931348623157e+309 4933 OF 80 ENDOF \ .118973149535723176505e+4933 ABORT" ***BITS/FLOAT must be 32, 64, or 80" ENDCASE ; EXECUTE CONSTANT BITS/FLOAT [THEN] \ *** BADEN-STYLE FLOAT LOCALS ( Wil Baden's cheap, nestable locals written for floats. ) : framef ( -- addr ) here falign 8 floats allot ; : unframe ( addr -- ) here - allot ; : fpframe>r ( -- ) postpone framef postpone >r ; immediate : r>frame ( -- ) postpone r> postpone unframe ; immediate : &fa here [ 8 floats ] literal - ; : &fb here [ 7 floats ] literal - ; : fa here [ 8 floats ] literal - f@ ; : fb here [ 7 floats ] literal - f@ ; : fc here [ 6 floats ] literal - f@ ; : fd here [ 5 floats ] literal - f@ ; : fe here [ 4 floats ] literal - f@ ; : ff here [ 3 floats ] literal - f@ ; : fg here [ 2 floats ] literal - f@ ; : fh here [ 1 floats ] literal - f@ ; : to-fa here [ 8 floats ] literal - f! ; : to-fb here [ 7 floats ] literal - f! ; : to-fc here [ 6 floats ] literal - f! ; : to-fd here [ 5 floats ] literal - f! ; : to-fe here [ 4 floats ] literal - f! ; : to-ff here [ 3 floats ] literal - f! ; : to-fg here [ 2 floats ] literal - f! ; : to-fh here [ 1 floats ] literal - f! ; \ *** NONSTANDARD FLOATING POINT \ non-Standard fp words jvn has found useful [UNDEFINED] s>f [IF] : s>f s>d d>f ; [THEN] [UNDEFINED] -frot [IF] : -frot frot frot ; [THEN] [UNDEFINED] fnip [IF] : fnip fswap fdrop ; [THEN] [UNDEFINED] ftuck [IF] : ftuck fswap fover ; [THEN] [UNDEFINED] 1/f [IF] : 1/f 1E fswap f/ ; [THEN] [UNDEFINED] f^2 [IF] : f^2 fdup f* ; [THEN] \ added by dnw [UNDEFINED] (f: [IF] : (f: postpone ( ; IMMEDIATE [THEN] [UNDEFINED] f2* [IF] : f2* 2E f* ; [THEN] [UNDEFINED] f2/ [IF] : f2/ 0.5E f* ; [THEN] [UNDEFINED] f0> [IF] : f0> fnegate f0< ; [THEN] [UNDEFINED] f= [IF] : f= f- f0= ; [THEN] [UNDEFINED] f> [IF] : f> fswap f< ; [THEN] [UNDEFINED] f^n [IF] : f^n ( u f: x -- x^u ) 1E ( u) 0 ?DO fover f* LOOP fnip ; [THEN] [UNDEFINED] fsignbit [IF] \ emulate IEEE signbit(x) : fsignbit (f: x -- s: minus? ) fpframe>r (f: x) to-fa BIG-ENDIAN [IF] &fa [ELSE] &fb 1- [THEN] c@ 128 and 0= invert r>frame ; [THEN] ( In ANS Forth 94 FATAN2 is ambiguous. The following assumes it to follow one of only two known conventions, -pi < arg < pi [or maybe <=], or 0 <= arg < 2pi. ) -1E 1E (f: y x) fatan2 f0> [IF] : fatan2 (f: y x -- atan2[y,x]) ( x) fdup f0< FATAN2 IF 2pi F- THEN ; [THEN] ( The x y order below follows the readability principle for implementing functions in Forth based on other standards. ) [UNDEFINED] fcopysign [IF] \ emulate IEEE copysign(x,y) = |x|*sgn(y) : fcopysign ( f: x y -- |x|*sgn[y] ) fsignbit fabs IF fnegate THEN ; [THEN] s" IFORTH" environment? [IF] ( flag) drop : FASINH fdup fsignbit fabs fasinh IF fnegate THEN ; : FLNP1 1e f+ fln ; [THEN] \ *** FCONSTANTS \ accuracies good up to the 21 significant digits required \ for Intel extended double (Sparc requires 36) 1.570796326794896619231E FCONSTANT pi/2 6.283185307179586476925E FCONSTANT 2pi 3.141592653589793238463E FCONSTANT pi -3.141592653589793238463E FCONSTANT -pi 1.414213562373095048802E FCONSTANT sqrt2 \ 0.7071067811865475244008E FCONSTANT 1/sqrt2 0.70710678118654752440084436210484903928E FCONSTANT 1/sqrt2 \ 0.6931471805599453094172E FCONSTANT ln2 0.69314718055994530941723212145817656808E FCONSTANT ln2 \ *** RAW FLOAT GENERATION MARKER -EXACT-PARAMS BITS/FLOAT 8 / CONSTANT /float \ assume 1au = 1 byte : hex-byte ( caddr u -- caddr+2 u-2 byte ) base @ >r hex over 0 0 rot 2 >number ( caddr u dbyte=0 addr #rem) ABORT" ***Not a hex string." ( dhi=0 addr) 2drop ( byte) >r 2 /string r> ( byte) r> base ! ; falign here /float allot CONSTANT fscratch : raw-hex>float ( caddr u -- f: f ) ( u) dup /float 2* <> ABORT" ***Wrong size for raw float string." BIG-ENDIAN [IF] fscratch /float + fscratch ( 'end+1 'start) DO hex-byte i c! LOOP [ELSE] fscratch dup /float + 1- ( 'start ;end) DO hex-byte i c! -1 +LOOP [THEN] ( caddr' 0) 2drop fscratch f@ ; \ In the following, the constants for 1+sqrt(2) used by the \ Kahan algorithms represent double the precision of the \ corresponding float type. The leading one is chopped, and the \ trailing one is rounded to nearest with even ties. BITS/FLOAT 80 = [IF] \ double extended s" 3FC08000000000000000" raw-hex>float (f: eps) \ 0x1p-63 s" 7FFEFFFFFFFFFFFFFFFF" raw-hex>float (f: xmax) s" 7FFF8000000000000000" raw-hex>float (f: +Inf) s" FFFF8000000000000000" raw-hex>float (f: -Inf) s" 7FFFC000000000000000" raw-hex>float (f: +NaN) \ quiet \ used by Kahan algorithms s" 40009A827999FCEF3242" raw-hex>float (f: r2p1=1+sqrt[2]) s" 3FBEB2FB1366EA957D3F" raw-hex>float (f: t2p1=[1+sqrt[2]]-r2p1) [THEN] BITS/FLOAT 64 = [IF] \ double s" 3CB0000000000000" raw-hex>float (f: eps) \ 0x1p-52 s" 7FEFFFFFFFFFFFFF" raw-hex>float (f: xmax) s" 7FF0000000000000" raw-hex>float (f: +Inf) s" FFF0000000000000" raw-hex>float (f: -Inf) s" 7FF8000000000000" raw-hex>float (f: +NaN) \ quiet \ used by Kahan algorithms s" 4003504F333F9DE6" raw-hex>float (f: r2p1=1+sqrt[2]) s" 3CA21165F626CDD5" raw-hex>float (f: t2p1=[1+sqrt[2]]-r2p1) [THEN] BITS/FLOAT 32 = [IF] \ single s" 34000000" raw-hex>float (f: eps) \ 0x1p-23 s" 7F7FFFFF" raw-hex>float (f: xmax) s" 7F800000" raw-hex>float (f: +Inf) s" FF800000" raw-hex>float (f: -Inf) s" 7FC00000" raw-hex>float (f: +NaN) \ quiet \ used by Kahan algorithms s" 401A8279" raw-hex>float (f: r2p1=1+sqrt[2]) s" 3419FCEF" raw-hex>float (f: t2p1=[1+sqrt[2]]-r2p1) [THEN] -EXACT-PARAMS \ *** GENERATED RAW FCONSTANTS (f: eps xmax +Inf -Inf +NaN r2p1 t2p1) FCONSTANT t2p1 FCONSTANT r2p1 FCONSTANT +NaN FCONSTANT -Inf FCONSTANT +Inf FCONSTANT xmax FCONSTANT epsilon \ used by Kahan algorithms 2E epsilon f/ FCONSTANT 2/epsilon xmax fsqrt 4E f/ FCONSTANT sqrt[xmax]/4 sqrt[xmax]/4 1/f FCONSTANT 4/sqrt[xmax] xmax fasinh 4E f/ FCONSTANT sinh[xmax]/4 1/sqrt2 FCONSTANT T0 \ for ZLN 5E 4E f/ FCONSTANT T1 3E FCONSTANT T2 IEEEFP [IF] 4E 1E epsilon f- f* xmax f/ epsilon f/ FCONSTANT lambda/epsilon [THEN] \ *** LOAD AND STORE : COMPLEXES ( n -- n*/complex ) 2* FLOATS ; : COMPLEX+ ( f-addr -- f-addr+/complex ) [ 1 COMPLEXES ] literal + ; [UNDEFINED] COMPLEX [IF] 1 COMPLEXES CONSTANT COMPLEX [THEN] \ hidden, *nonnestable* scratch storage for stuff from fp stack \ falign here 6 floats allot value fnoname falign here 3 complexes allot value fnoname : z@ ( zaddr -- f: -- z ) dup f@ float+ f@ ; : z! ( zaddr -- f: z -- ) dup float+ f! f! ; : x@ ( zaddr -- f: x ) f@ ; : y@ ( zaddr -- f: y ) float+ f@ ; : x! ( zaddr f: x -- ) f! ; : y! ( zaddr f: y -- ) float+ f! ; \ cheap locals for complex values, sharing the floats space: : za here [ 4 complexes ] literal - z@ ; : zb here [ 3 complexes ] literal - z@ ; : zc here [ 2 complexes ] literal - z@ ; : zd here [ 1 complexes ] literal - z@ ; : to-za here [ 4 complexes ] literal - z! ; : to-zb here [ 3 complexes ] literal - z! ; : to-zc here [ 2 complexes ] literal - z! ; : to-zd here [ 1 complexes ] literal - z! ; : ZVARIABLE ( "name" -- faddr ) falign here [ 1 complexes ] literal allot CONSTANT ; : ZLITERAL (f: z -- ) \ compile (f: -- z ) \ run state @ 0= IF -14 throw THEN fswap postpone fliteral postpone fliteral ; immediate : ZCONSTANT ( "name" f: z -- ) \ compile (f: -- z ) \ run ( Based on Anton Ertl's portable definition of CONSTANT, comp.lang.forth, "Re: Alternative DEFER strategies?", 16 Dec 2008, but avoiding the return stack. ) [ fnoname ] literal z! : [ fnoname ] literal z@ postpone zliteral postpone ; ; \ *** OUTPUT : .f[bl|-] (f: r -- |r| ) fdup fsignbit fabs dup invert bl and swap [char] - and + emit ; : .f[+|-]i (f: r -- |r| ) fdup fsignbit fabs dup invert [char] + and swap [char] - and + emit ." i" ; : z. (f: z -- ) \ emit complex # fswap .f[bl|-] f. .f[+|-]i f. ; : zs. (f: z -- ) \ emit complex #, scientific notation fswap .f[bl|-] fs. .f[+|-]i fs. ; \ *** FLOATING POINT STACK : real (f: x y -- y ) fdrop ; : imag (f: x y -- x ) fnip ; : conjg (f: x y -- x -y ) fnegate ; : zdrop (f: z --) fdrop fdrop ; : zdup (f: z -- z z ) fover fover ; : zswap (f: z1 z2 -- z2 z1) [ fnoname ] literal f! -frot [ fnoname ] literal f@ -frot ; : znip (f: z1 z2 -- z2 ) zswap zdrop ; : zover (f: z1 z2 -- z1 z2 z1 ) frot [ fnoname float+ ] literal f! (f: -- x u v) frot fdup [ fnoname ] literal f! (f: -- u v x) -frot [ fnoname float+ ] literal f@ (f: -- x u v y) -frot [ fnoname ] literal z@ ; (f: -- x y u v x y) : ztuck (f: z1 z2 -- z2 z1 z2 ) zswap zover ; : z= (f: z1 z2 -- s: flag ) frot f= f= and ; : z0= (f: z -- s: flag ) f0= f0= and ; : zrot ( f: z1 z2 z3 -- z2 z3 z1 ) [ fnoname ] literal z! [ fnoname 1 complexes + ] literal z! [ fnoname 2 complexes + ] literal z! [ fnoname 1 complexes + ] literal z@ [ fnoname ] literal z@ [ fnoname 2 complexes + ] literal z@ ; : -zrot ( f: z1 z2 z3 -- z3 z1 z2 ) [ fnoname ] literal z! [ fnoname 1 complexes + ] literal z! [ fnoname 2 complexes + ] literal z! [ fnoname ] literal z@ [ fnoname 2 complexes + ] literal z@ [ fnoname 1 complexes + ] literal z@ ; \ *** MINIMAL [MIXED] OPERATIONS ( These words implement Kahan's rules [pp. 194, 195] for minimal computation, to avoid gratuitous overflow, underflow, and signed zero distortion in real or imaginary parts that don't need to be computed. ) : x+ (f: x y a -- x+a y ) frot f+ fswap ; : y+ (f: x y a -- x y+a ) f+ ; : x- (f: x y a -- x-a y ) fnegate frot f+ fswap ; : y- (f: x y a -- x y-a ) f- ; : z*>real (f: z1 z2 -- re[z1*z2] ) \ compute z* without doing Im frot [ fnoname ] literal f* f! (f: x1 x2) f* [ fnoname ] literal f@ f- ; (f: x1*x2-y1*y2) : z*>imag (f: z1 z2 -- im[z1*z2] ) \ compute z* without doing Re [ fnoname ] literal f! f* fswap (f: y1*x2 x1) [ fnoname ] literal f@ f* f+ ; (f: y1*x2+x1*y2) : z*f (f: z f -- f*z ) frot fover f* (f: y f x*f) -frot f* ; : z/f (f: z f -- z/f ) frot fover f/ (f: y f x/f) -frot f/ ; : f*z (f: f z -- f*z ) -frot fover f* (f: y f x*f) -frot f* ; : (f/z) (f: f a b -- f/[b+[a/b]a] [a/b]f/[b+[a/b]a] ) ( A factor in F/Z, used with |a| <= |b|. ) fpframe>r zdup f/ to-fa (f: f a b) \ fa=a/b fswap fa f* f+ f/ (f: u=f/[b+[a/b]a]) fdup fa f* (f: u v=u[a/b]) r>frame ; : f/z (f: f z -- f/z ) ( Kahan algorithm *without* due attention to spurious over/underflows and zeros and infinities. ) zdup fabs fswap fabs (f: f z |y| |x|) f< IF fswap (f/z) ELSE (f/z) fswap THEN fnegate ; (f: f*x/|z|^2 -f*y/|z|^2) : z*i*f (f: z f -- z*if ) frot fover f* (f: y f x*f) -frot f* fnegate fswap ; (f: -y*f x*f) : -i*z/f (f: z f -- z/[if] ) frot fover f/ fnegate (f: y f -x/f) -frot f/ fswap ; (f: y/f -x/f) : i*f*z (f: f z -- if*z ) -frot fover f* (f: y f x*f) -frot f* fnegate fswap ; (f: -y*f x*f) : i*f/z (f: f z -- if/z ) ( Kahan algorithm *without* due attention to spurious over/underflows and zeros and infinities. ) zdup fabs fswap fabs (f: f z |y| |x|) f< IF fswap (f/z) ELSE (f/z) fswap THEN fswap ; (f: f*y/|z|^2 f*x/|z|^2) \ *** ARITHMETIC : z* (f: x y u v -- x*u-y*v x*v+y*u) ( Uses the algorithm followed by jvn: [x+iy]*[u+iv] = [[x+y]*u - y*[u+v]] + i[[x+y]*u + x*[v-u]] Requires 3 multiplications and 5 additions. ) zdup f+ (f: x y u v u+v) [ fnoname ] literal f! (f: x y u v) fover f- (f: x y u v-u) [ fnoname float+ ] literal f! (f: x y u) frot fdup (f: y u x x) [ fnoname float+ ] literal f@ (f: y u x x v-u) f* [ fnoname float+ ] literal f! (f: y u x) frot fdup (f: u x y y) [ fnoname ] literal f@ (f: u x y y u+v) f* [ fnoname ] literal f! (f: u x y) f+ f* fdup (f: u*[x+y] u*[x+y]) [ fnoname ] literal f@ f- (f: u*[x+y] x*u-y*v) fswap [ fnoname float+ ] literal f@ (f: x*u-y*v u*[x+y] x*[v-u]) f+ ; (f: x*u-y*v x*v+y*u) : z+ (f: a b x y -- a+x b+y) frot f+ -frot f+ fswap ; : znegate (f: x y -- -x -y ) fswap fnegate fswap fnegate ; : z- (f: a b x y -- a-x b-y ) ( Kahan says, to conserve signed zero, write -y+b for b-y instead of -[y-b]. ) fnegate frot f+ -frot f- fswap ; : 1/z (f: z -- 1/z ) 1E -frot f/z ; : z/ (f: z1 z2 -- z1/z2 ) 1/z z* ; : z2/ (f: z -- z/2 ) f2/ fswap f2/ fswap ; : z2* (f: z -- z*2 ) f2* fswap f2* fswap ; : i* (f: x+iy -- -y+ix) fnegate fswap ; : -i* (f: x+iy -- y-ix) fswap fnegate ; : z^2 (f: z -- z^2 ) ( Kahan algorithm without removal of any spurious NaN created by overflow. It deliberately uses [x-y][x+y] instead of x^2-y^2 for the real part. ) zdup f+ [ fnoname ] literal f! zdup f- [ fnoname ] literal f@ f* (f: x y [x-y][x+y]) -frot f* f2* ; : |z|^2 (f: z -- x^2+y^2 ) f^2 fswap f^2 f+ ; 0 [IF] : z^3 zdup z^2 z* ; : z^4 z^2 z^2 ; [THEN] : z^n ( n -- f: z -- z^n ) \ raise z to integer power ( Use Z^ instead for n > 50 or so. ) 1E 0E zswap BEGIN dup 0> WHILE dup 1 and IF ztuck z* zswap THEN z^2 2/ REPEAT zdrop drop ; \ *** FUNCTIONS : |z| (f: x y -- |z|) ( This is Kahan's algorithm, which treats infinity, the precision of 1+sqrt[2], and the roundoff threshhold for large ratios of the maximum and minimum absolute values of x and y. When the IEEEFP word list is present, it also treats the IEEE underflow and invalid flags. ) fpframe>r IEEEFP [IF] FINVALID get-fflags ( invalid?.bit) [THEN] fabs fswap fabs zdup f< IF fswap THEN \ a >= b >= 0 if not NaN (f: a b ) fdup +Inf 0E f~ IF fnip +Inf (f: a=inf b=inf) THEN zdup to-fb to-fa f- fdup fa (f: t=a-b t a) fdup +Inf 0E f~ 0E f~ or (f: t) ( t=a.or.a=inf?) IF \ b is negligible (f: t) fdrop 0E (f: s=0) ELSE \ b is not negligible IEEEFP [IF] FUNDERFLOW get-fflags ( invalid?.bit underflow?.bit) [THEN] fa fover (f: t b t) f< IF \ 2 < a/b (f: t) fdrop fa fb f/ (f: u=a/b) fdup 2/epsilon f< IF \ a/b < 2/epsilon fdup f^2 1E f+ fsqrt f+ (f: s=u+sqrt[1+u^2]) ELSE fdrop 0E (f: s=0) THEN ELSE \ 1 <= a/b <= 2 (f: t) fb f/ (f: v=t/b) fdup fdup 2E f+ f* (f: v w=[2+v]v) fdup 2E f+ fsqrt sqrt2 f+ f/ f+ (f: h=w/[sqrt[2+w]+sqrt2]+v) t2p1 f+ r2p1 f+ (f: s=[t2p1+h]+r2p1) \ substitute v = u - 1 and w = u^2 - 1 to prove that \ this s expression is the same as the one above THEN (f: s) fb fswap f/ (f: s=b/s) IEEEFP [IF] ( invalid?.bit underflow?.bit) SET-FFLAGS ( invalid?.bit) \ any underflow is now deserved [THEN] THEN IEEEFP [IF] ( invalid?.bit) SET-FFLAGS [THEN] (f: s) fa f+ (f: s+a) r>frame ; : zbox (f: z -- box[z] ) ( Defined *only* for zero and infinite arguments. This differs from Kahan's CBOX [1] by conserving signs when only one of x or y is infinite, consistent with the other cases, and with its use in his ARG. ) zdup z0= IF 1E frot fcopysign fswap EXIT (f: cps[1,x] y) THEN fover fabs +Inf f= (f: z s: |x|=inf?) IF fdup fabs +Inf f= (f: z s: |y|=inf?) IF 1E frot fcopysign (f: y cps[1,x]) 1E frot fcopysign (f: cps[1,x] cps[1,y]) ELSE fover 1E fswap fcopysign (f: x y cps[1,x]) -frot fswap fabs f/ (f: cps[1,x] y/|x|) THEN ELSE fdup fabs +Inf f= (f: z s: |y|=inf?) IF fswap fover fabs f/ (f: y x/|y|) 1E frot fcopysign (f: x/|y| cps[1,y]) ELSE zdrop +NaN +NaN (f: NaN+iNaN) \ invalid use THEN THEN ; : arg (f: x y -- arg[x+iy] ) ( Kahan's algorithm, including underflow signal suppression when The IEEEFP word set is present. He says better accuracy can be obtained by further case reduction and identities like atan[y/x] = pi/4 + atan[[y-x]/[y+x]]. ) fpframe>r to-fb to-fa \ za=x+iy fa f0= fb f0= and IF 1E fa fcopysign to-fa \ fa=x=cps[1,x] THEN fa fabs fb fabs (f: |x| |y|) zdup +Inf f= +Inf f= or IF za zbox to-za \ za=z=zbox[z] THEN \ leaves signs unchanged (f: |x| |y|) f< IF pi/2 fb fcopysign fa fb f/ fatan f- (f: theta=cps[pi/2,y]-atan[x/y]) ELSE fa f0< IF \ x < 0 pi fb fcopysign fb fa f/ fatan f+ (f: theta=cps[pi,y]+atan[y/x]) ELSE fb fa f/ fatan (f: theta=atan[y/x]) THEN THEN IEEEFP [IF] fdup fabs 0.125E f> IF FUNDERFLOW clear-fflags THEN [THEN] r>frame ; IEEEFP [IF] : zssqs (f: z -- |z/2^k|^2 s: k ) ( Compute |[x+iy]/2^k|^2, scaled to avoid overflow or underflow, and leave the scaling integer k. Kahan's CSSQS, p. 200. ) fpframe>r (f: z) to-za \ fa=x, fb=y [ FOVERFLOW FUNDERFLOW or dup ] literal get-fflags ( flag.bits) literal clear-fflags ( flag.bits) >r fa f^2 fb f^2 f+ to-fc \ fc=rho=x^2+y^2 fc fnan? fc finfinite? or fa finfinite? fb finfinite? or and IF +Inf 0 (f: rho=inf s: k=0) ELSE FOVERFLOW get-fflags FUNDERFLOW get-fflags fc (f: rho) lambda/epsilon f< and or IF fa fabs fb fabs fmax filogb ( k) dup negate dup ( -k) fa fscalbn ( -k) fb fscalbn |z|^2 (f: rho.scaled s: k) ELSE fc 0 (f: rho.unscaled s: 0) THEN THEN r> ( flag.bits) set-fflags r>frame ; [THEN] : >polar (f: x+iy -- r theta ) zdup |z| -frot arg ; : polar> (f: r theta -- x+iy ) fsincos frot z*f fswap ; IEEEFP [IF] \ variable #k's 0 #k's ! \ DEBUG : zlns ( j f: z -- ln[z*2^j] ) ( Kahan, p. 202. He says to use this for nonzero j only when |z| is huge, and that it's good for functions like inverse trig and hyperbolic which go like ln[2z] at huge |z|. We blindly use his choices of threshholds, T0 = 1/sqrt2, T1 = 5/4, T2 = 3, which he says should depend on the accuracy of FLNP1. ) fpframe>r zdup zssqs to-fa (f: z s: j k) \ fa=rho=|z|^2-scaled \ ( k) dup IF 1 #k's +! THEN \ DEBUG zdup arg to-fb \ fb=theta fabs fswap fabs (f: |y| |x|) zdup fmin to-fc \ fc=min[|x|,|y|] fmax to-fd \ fd=max[|x|,|y|] ( k) dup 0= fd T1 f<= fa T2 f< or and T0 fd f< and IF ( j k) 2drop fd 1E f- fd 1E f+ f* fc f^2 f+ (f: a=[max-1][max+1]+min^2) flnp1 f2/ (f: ln1p[a]/2) ELSE fa fln f2/ (f: ln[rho]/2) ( j k) + s>d d>f ln2 f* f+ THEN fb r>frame ; [THEN] : zln (f: z -- ln|z|+iarg[z] ) ( Kahan with blind choices of threshholds T0 = 1/sqrt2, T1 = 5/4, T2 = 3, which should depend on the accuracy of FLNP1, and with large agument treatment when IEEEFP is true. ) IEEEFP [IF] 0 zlns ; [ELSE] fpframe>r zdup |z|^2 to-fa \ fa=rho=|z|^2 zdup arg to-fb \ fb=theta fabs fswap fabs (f: |y| |x|) zdup fmin to-fc \ fc=min[|x|,|y|] fmax to-fd \ fd=max[|x|,|y|] fd T1 f<= fa T2 f< or T0 fd f< and IF fd 1E f- fd 1E f+ f* fc f^2 f+ (f: a=[max-1][max+1]+min^2) flnp1 f2/ (f: ln1p[a]/2) ELSE fa fln f2/ (f: ln[rho]/2) THEN fb r>frame ; [THEN] : zexp (f: z -- exp[z] ) fsincos fswap frot fexp z*f ; : z^ (f: x y u v -- [x+iy]^[u+iv] ) zswap zln z* zexp ; : zsqrt (f: z -- sqrt[z] ) ( Kahan's algorithm, with overflow/underflow avoidance when IEEEFP is true. In either case, it should handle signed zero and infinity properly. ) fpframe>r zdup to-za IEEEFP [IF] zssqs (f: |z|^2.scaled s: k) fa fa f= IF \ x is not a NaN fsqrt fa fabs ( k) dup negate fscalbn f+ THEN ( k) dup 2 mod IF \ k is odd 1- 2/ ( [k-1]/2) ELSE \ k is even 2/ 1- ( [k/2]-1) f2* THEN fsqrt fscalbn (f: rho=sqrt[[|z|+|x|]/2]) [ELSE] |z| fa fabs f+ f2/ fsqrt (f: rho=sqrt[[|z|+|x|]/2]) [THEN] fdup f0= fb (f: rho y ) 0= IF \ rho <> 0 fdup fabs +Inf f<> (f: rho eta=y) IF fover f/ f2/ THEN (f: rho eta=[y/rho]/2) fa f0< IF \ x < 0 fabs fswap fb fcopysign (f: |eta| cps[rho,y]) THEN THEN r>frame ; \ Trigonometric and hyperbolic functions : zcosh (f: z -- cosh[z] ) ( This version is reasonably accurate and preserves signed zero on the real axis. ) fsincos [ fnoname ] literal (f: cos[y]) f! [ fnoname float+ ] literal (f: sin[y]) f! fdup fcosh [ fnoname ] literal f@ f* (f: x cosh[x]cos[y]) fswap fsinh [ fnoname float+ ] literal f@ f* (f: cosh[x]cos[y] sinh[x]sin[y]) ; : zcos (f: z -- cos[z] ) i* zcosh ; : zsinh (f: z -- sinh[z] ) ( This version is reasonably accurate and preserves signed zero. ) fsincos [ fnoname ] literal (f: cos[y]) f! [ fnoname float+ ] literal (f: sin[y]) f! fdup fsinh [ fnoname ] literal f@ f* (f: x sinh[x]cos[y]) fswap fcosh [ fnoname float+ ] literal f@ f* (f: sinh[x]cos[y] cosh[x]sin[y]) ; : zsin (f: z -- sin[z] ) i* zsinh -i* ; : ztanh (f: z -- tanh[z] ) fpframe>r fover fabs sinh[xmax]/4 f> IF \ avoid overflow 0E fswap fcopysign (f: x cps[0,y]) fswap 1E fswap fcopysign fswap (f: cps[1,x] cps[0,y]) ELSE \ divide by zero signal should have been suppressed here ftan fdup to-fa (f: x t=tan[y]) \ fa=t f^2 1E f+ to-fb (f: x) \ fb=beta=1+t^2=1/cos^2[y] fsinh fdup to-fc (f: s=sinh[x]) \ fc=s f^2 1E f+ fsqrt (f: rho=sqrt[1+s^2]=cosh[x]) fa fabs +Inf f= IF \ |t|=Inf, signal when s=0 ok (f: rho) fc f/ fa 1/f (f: rho/s+i/t) ELSE (f: rho) fb f* fc f* fa (f: z=beta*rho*s+it) fb fc f^2 f* 1E f+ z/f (f: z=z/[1+beta*s^2]) THEN THEN r>frame ; : zcoth (f: z -- 1/tanh[z] ) ztanh 1/z ; : ztan (f: z -- tan[z] ) i* ztanh -i* ; : zcot (f: z -- cot[z] ) -i* zcoth -i* ; \ *** INVERSE FUNCTIONS : zacos (f: z -- u+iv=-2i*ln[sqrt[[1+z]/2]+i*sqrt[[1-z]/2]] ) ( The principal expression is computed as: zacos[z] = 2 * atan[ Re[sqrt[1-z]] / Re[sqrt[1+z]] ] + i * asinh[ Im[ [sqrt[1+z]]* * sqrt[1-z] ] ] ) fpframe>r fover 1E f+ fover zsqrt (f: z c+id=sqrt[z+1]) to-za \ za=c+id znegate fswap 1E f+ fswap zsqrt (f: a+ib=sqrt[1-z]) fover fa f/ (f: a+ib a/c) fatan f2* -frot (f: u=2*atan[a/c] a+ib) za conjg z*>imag (f: u h=im[[a+ib][c-id]]) fasinh (f: u v=asinh[h]) r>frame ; : zacosh (f: z -- u+iv=2*ln[sqrt[[z+1]/2]+sqrt[[z-1]/2]] ) ( The principal expression is computed as: zacosh[z] = asinh[ Re[ [sqrt[z-1]]* * sqrt[z+1] ] ] + 2i * atan[ Im[ sqrt[z-1] / Re[ sqrt[z+1] ] ] ) fpframe>r fover 1E f+ fover zsqrt (f: z c+id=sqrt[z+1]) to-za \ za=c+id fswap 1E f- fswap zsqrt (f: a+ib=sqrt[z-1]) fdup fa f/ (f: a+ib b/c) fatan f2* -frot (f: v=2*atan[b/c] a+ib) conjg za z*>real (f: v h=re[[a-ib][c+id]]) fasinh fswap (f: u=asinh[h] v) r>frame ; : zasin (f: z -- u+iv=-iln[iz+sqrt[1-z^2]] ) ( The principal expression is computed as: zasin[z] = atan[ x / Re[ sqrt[1-z] * sqrt[1+z] ] ] + i * asinh[ Im[ [sqrt[1-z]]* * sqrt[1+z] ] ] ) fpframe>r fover (f: x) fdup to-fc (f: z x) 1E f+ fover zsqrt (f: z c+id=sqrt[z+1]) to-za \ za=c+id znegate fswap 1E f+ fswap zsqrt (f: a+ib=sqrt[1-z]) zdup za z*>real (f: a+ib g=re[[a+ib]*[c+id]]) fc (f: a+ib g x) fswap f/ fatan -frot conjg (f: u=atan[x/g] a-ib) za z*>imag (f: u h=im[[a-ib]*[c+id]) fasinh (f: u v=asinh[h]) r>frame ; : zasinh (f: z -- ln[z+sqrt[1+z^2] ) i* zasin -i* ; : zatanh (f: z -- u+iv=[ln[1+z]-ln[1-z]]/2 ) fpframe>r \ handle unsigned zero fover 1E fswap fcopysign (f: x y beta=cps[1,x]) fdup to-fa \ fa=beta z*f conjg (f: z=beta*conjg[z]) zdup fabs sqrt[xmax]/4 f> sqrt[xmax]/4 f> or IF \ avoid overflow pi/2 fover fcopysign -frot (f: v=cps[pi/2,y] z) 1/z real fswap (f: u=re[1/z] v) ELSE fdup fabs 4/sqrt[xmax] f+ (f: z a=|y|+rho) to-fb \ fb=a fover (f: x) 1E f= (f: z) IF \ x=1 fnip (f: y) fdup fdup f* 4E f+ fsqrt fsqrt (f: y b=sqrt[sqrt[4+y^2]]) fb fsqrt f/ fln fswap (f: u=ln[b/sqrt[a]] y) fb f2/ fatan pi/2 f+ (f: u y d=pi/2+atan[a/2]) fswap fcopysign f2/ (f: u v=cps[d,y]/2) ELSE \ normal case, use ln1p(u) accurately even if u is tiny fswap fdup fnegate 1E f+ (f: y x 1-x) fover 1E f+ (f: y x 1-x 1+x) fover f* to-fc (f: y x 1-x) \ fc=g=[1-x^2] f^2 fb f^2 (f: y x h=[1-x]^2 a^2) fdup to-fb \ fb=a^2 f+ f/ 4E f* flnp1 4E f/ (f: y u=ln1p[4x/[h+a^2]]/4) fswap f2* fc fb (f: u 2y g a^2) f- fswap arg f2/ (f: u v=arg[g-a^2+i2y]/2) THEN THEN conjg fa z*f (f: u+iv=beta*[u-iv]) r>frame ; : zatan (f: z -- [ln[1+iz]-ln[1-iz]]/2i ) i* zatanh -i* ; \ *** REFERENCES ( 1. William Kahan, "Branch cuts for complex elementary functions", The State of the Art in Numerical Analysis, A. Iserles and M.J.D. Powell, eds., Clarendon Press, Oxford, 1987, pp. 165-211. http://www.cims.nyu.edu/~dbindel/class/cs279/slits02.pdf http://www.cims.nyu.edu/~dbindel/class/cs279/slits34.pdf http://www.cims.nyu.edu/~dbindel/class/cs279/slits56.pdf 2. Robert M. Corless, James H. Davenport, David J. Jeffrey, Stephen M. Watt, "'According to Abramowitz and Stegun' or arcoth needn't be uncouth", ACM SIGSAM Bulletin, June, 2000, pp. 58-65. http://portal.acm.org/citation.cfm?doid=362001.362023 http://www.apmaths.uwo.ca/~djeffrey/Offprints/couth.pdf 3. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, US Government Printing Office, 10th Printing December 1972, Secs. 4.4, 4.6. http://www.convertit.com/Go/ConvertIt/Reference/AMS55.ASP )