(         Title:  ANS Forth Complex Arithmetic Lexicon
                  with Kahan Algorithms
           File:  complex-kahan.fs
       Log file:  complex-kahan.log
     Test files:  complex-test.fs, complex-ieee-test.fs
        Version:  1.0.3
Original Author:  Julian V. Noble
    Modified by:  David N. Williams
        License:  LGPL
  Last revision:  December 12, 2010

The original code is
)
\ Copyright (C) 1998 Julian V. Noble
(
Modifications not derived from the original, including
expressions of the Kahan algorithms, are
)
\ Copyright (C) 2002-2005, 2008-2010 David N. Williams
(
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or at your option any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

If you take advantage of the option in the LGPL to put a
particular version of this library under the GPL, the author[s]
would regard it as polite if you would put any direct
modifications under the LGPL as well, and include a copy of this
request near the beginning of the modified library source.  A
"direct modification" is one that enhances or extends the
library in line with its original concept, as opposed to
developing a distinct application or library which might use it.

ANS Forth environmental dependences:

  1. Requires the Floating-Point and Floating-Point extension
     word sets.
  2. Assumes a separate floating point stack.
  3. Does not construct a separate complex number stack.
  4. Case insensitivity for standard words.
  5. The ANS FORTH default floats must be 32-bit or 64-bit IEEE
     754 to work correctly with signed zero/infinity.
  6. Some of the Kahan algorithms are optimized for radix 2.
  7. Machine parameters like EPSILON and a few others have to be
     uncommented, or supplied if they are not 32-bit or 64-bit
     IEEE 754.  The parameter section assumes that raw floats
     are represented in memory as contiguous sets of bytes that
     are totally big-endian or little-endian, with the raw
     representation of -1E nonzero in the logical first byte and
     zero in the logical last byte.
  8. If not present in the host FORTH, FSIGNBIT assumes default
     floats to be IEEE 754 32 or 64 bits, and also assumes the
     cell size to be a multiple 32 bits.
  9. There is a system dependence on the sign of NaN produced by
     an arithmetic operation.  In particular, Z^ currently gives
     NaN's with plus signs on ppc and minus signs on intel
     systems, for both 0+i0^0+i0 and 0+i0^1+i0, with both pfe
     and gforth.  So does F/ for 0/0.

Complex numbers z=x+iy are stored on the fp stack as [f: x y],
also written as [f: z].

Angles are in radians.

This code uses higher-accuracy algorithms by William Kahan [1].
It uses the principal argument, with -pi < arg <= pi, for ARG,
ZSQRT, ZLN, and Z^.  The Kahan alorithms implement an OpenMath
compliant treatment of principal expressions, branch cuts, and
branches for the elementary functions [2], which is based on
Abramowitz and Stegun [3].

Currently the only high-level way we have to save and restore
IEEE underflow and overflow flags, and to use IEEE scaling
functions to avoid underflow/overflow, is through a pfe
environment, IEEEFP-EXT.  The parts of the Kahan algorithms that
do that are conditionally compiled here, when the environment is
present.  Kahan says that the algorithms degrade gracefully
under simple omission of these procedures, which is what happens
here when IEEEFP-EXT is absent.

Kahan pays attention to signed zero, where available in IEEE
754/854 implementations.  We address that in this library as
follows.

1. ZSINH, ZASINH, ZTANH, ZATANH, ZSIN, ZASIN, ZTAN, and ZATAN
   conserve the sign of zero, and 1/Z and the minimal
   computation words Z*F, Z/F, F*Z, F/Z, Z*I*F, -I*Z/F, I*F*Z,
   I*F/Z do the right thing.

2. We would like the analytic functions that are real and
   analytic on the real axis to do the right thing for the sign
   of the zero imaginary part.  This works for all of our tests
   so far.

3. For functions having branch cuts, signed zero in the
   appropriate x or y input produces correct values on the cuts.
)

s" FLOATING-EXT" environment? dup [IF] ( flag) and [THEN]
0= [IF] cr .( ***Floating-point extensions not available.) cr
ABORT [THEN]

s" FLOATING-STACK" environment? dup [IF] ( maxdepth) and [THEN]
0= [IF] cr .( ***Floating-point stack not separate.) cr
ABORT [THEN]

s" IEEEFP-EXT" environment? [IF] ( version) drop
  true [ELSE] false [THEN] CONSTANT IEEEFP immediate


\ *** WORDS
(
NONSTANDARD FLOATING POINT
  (f:
  -frot  fnip  ftuck
  s>f  1/f  f^2  f^n  f2*  f2/  f0>  f=  f>  fsignbit

CONSTANTS
  BIG-ENDIAN  BITS/FLOAT  IEEEFP

BADEN-STYLE FLOAT LOCALS
  framef  unframe  fpframe>r  r>frame
  &fa  &fb
  fa  fb  fc  fd  fe  ff  fg  fh
  to-fa  to-fb  to-fc  to-fd  to-fe  to-ff  to-fg  to-fh

FCONSTANTS
  pi/2  pi  -pi  sqrt2  1/sqrt  ln2  2pi

RAW FLOAT GENERATION [deleted with a marker]
  hex-byte  raw-hex>float

GENERATED EXACT FCONSTANTS
  r2p1  t2p1
  +Inf  -Inf  +NaN  epsilon  xmax
  2/epsilon  sqrt[xmax]/4  4/sqrt[xmax]  sinh[xmax]/4

LOAD AND STORE
  COMPLEX  COMPLEXES  COMPLEX+
  ZVARIABLE  ZCONSTANT  ZLITERAL
  z@  z!  x@  x!  y@  y!
  za  zb  zc  zd  to-za  to-zb  to-zc  to-zd
 
OUTPUT
  z.  zs.

FLOATING POINT STACK
  z=  zdrop  zdup  zswap  zover  znip  ztuck  zrot  -zrot

MINIMAL [MIXED] OPERATIONS
  x+  x-  y+  y-
  z*>real  z*>imag
  z*f  z/f  f*z  f/z  z*i*f  -i*z/f  i*f*z  i*f/z

ARITHMETIC
  conjg  real  imag  cmplx
  i*  -i*  znegate
  z+  z-  z*  z/
  z2*  z2/  1/z  z^2  z^n  |z|^2

FUNCTIONS
  |z|  arg  >polar  polar>  zbox
  zsqrt  zexp  zln  z^
  zsinh  zcosh  ztanh
  zsin  zcos  ztan

INVERSE FUNCTIONS
  zasinh  zacosh  zatanh
  zasin  zacos  zatan
)


\ *** CONSTANTS

-1E pad faligned dup f! c@
[IF]   true  \ memory floats are big endian
[ELSE] false \ memory floats are little endian
[THEN] CONSTANT BIG-ENDIAN immediate

[UNDEFINED] BITS/FLOAT [IF]
:NONAME  ( -- +n )
  s" MAX-FLOAT" environment? 0= ABORT" ***can't determine MAX-FLOAT"
  pad 3 represent 0= ABORT" ***MAX-FLOAT invalid" \ shouldn't happen
  ( exp plus?) drop
  CASE
      39 OF 32 ENDOF  \ .340282347e+39
     309 OF 64 ENDOF  \ .17976931348623157e+309
    4933 OF 80 ENDOF  \ .118973149535723176505e+4933
    ABORT" ***BITS/FLOAT must be 32, 64, or 80"
  ENDCASE ; EXECUTE CONSTANT BITS/FLOAT
[THEN]


\ *** BADEN-STYLE FLOAT LOCALS
(
Wil Baden's cheap, nestable locals written for floats.
)

: framef  ( -- addr )  here falign 8 floats allot ;
: unframe ( addr -- )  here - allot ; 

: fpframe>r  ( -- )  postpone framef postpone >r ; immediate
: r>frame    ( -- )  postpone r> postpone unframe ; immediate

: &fa    here [ 8 floats ] literal - ;
: &fb    here [ 7 floats ] literal - ;
: fa     here [ 8 floats ] literal - f@ ;
: fb     here [ 7 floats ] literal - f@ ;
: fc     here [ 6 floats ] literal - f@ ;
: fd     here [ 5 floats ] literal - f@ ;
: fe     here [ 4 floats ] literal - f@ ;
: ff     here [ 3 floats ] literal - f@ ;
: fg     here [ 2 floats ] literal - f@ ;
: fh     here [ 1 floats ] literal - f@ ;
: to-fa  here [ 8 floats ] literal - f! ;
: to-fb  here [ 7 floats ] literal - f! ;
: to-fc  here [ 6 floats ] literal - f! ;
: to-fd  here [ 5 floats ] literal - f! ;
: to-fe  here [ 4 floats ] literal - f! ;
: to-ff  here [ 3 floats ] literal - f! ;
: to-fg  here [ 2 floats ] literal - f! ;
: to-fh  here [ 1 floats ] literal - f! ;


\ *** NONSTANDARD FLOATING POINT

\ non-Standard fp words jvn has found useful
[UNDEFINED] s>f    [IF]  : s>f    s>d  d>f  ;           [THEN]
[UNDEFINED] -frot  [IF]  : -frot  frot  frot  ;         [THEN]
[UNDEFINED] fnip   [IF]  : fnip   fswap  fdrop  ;       [THEN]
[UNDEFINED] ftuck  [IF]  : ftuck  fswap  fover ;        [THEN]
[UNDEFINED] 1/f    [IF]  : 1/f    1E  fswap  f/ ;       [THEN]
[UNDEFINED] f^2    [IF]  : f^2    fdup  f*  ;           [THEN]
\ added by dnw
[UNDEFINED] (f:    [IF]  : (f:   postpone ( ; IMMEDIATE [THEN]
[UNDEFINED] f2*    [IF]  : f2*   2E f* ;                [THEN]
[UNDEFINED] f2/    [IF]  : f2/   0.5E f* ;              [THEN]
[UNDEFINED] f0>    [IF]  : f0>   fnegate f0< ;          [THEN]
[UNDEFINED] f=     [IF]  : f=    f- f0= ;               [THEN]    
[UNDEFINED] f>     [IF]  : f>    fswap f< ;             [THEN]    

[UNDEFINED] f^n [IF]
  : f^n    ( u f: x -- x^u )
    1E ( u) 0 ?DO fover f* LOOP fnip ;
[THEN]

[UNDEFINED] fsignbit [IF]  \ emulate IEEE signbit(x)
  : fsignbit  (f: x -- s: minus? )
    fpframe>r
      (f: x) to-fa
      BIG-ENDIAN 
      [IF]   &fa
      [ELSE] &fb 1- [THEN]
      c@ 128 and 0= invert
    r>frame ;
[THEN]

(
In ANS Forth 94 FATAN2 is ambiguous.  The following assumes it
to follow one of only two known conventions, -pi < arg < pi [or
maybe <=], or 0 <= arg < 2pi.
)
  -1E 1E (f: y x) fatan2 f0>
  [IF] : fatan2  (f: y x -- atan2[y,x])
         ( x) fdup f0< FATAN2
         IF 2pi F- THEN ; [THEN]

(
The x y order below follows the readability principle for
implementing functions in Forth based on other standards.
)
[UNDEFINED] fcopysign [IF] \ emulate IEEE copysign(x,y) = |x|*sgn(y)
  : fcopysign  ( f: x y -- |x|*sgn[y] )
    fsignbit fabs IF fnegate THEN ;
[THEN]

s" IFORTH" environment? [IF]
  ( flag) drop
: FASINH  fdup fsignbit fabs fasinh IF fnegate THEN ;
: FLNP1  1e f+ fln ;
[THEN]


\ *** FCONSTANTS

\ accuracies good up to the 21 significant digits required
\ for Intel extended double (Sparc requires 36)
 1.570796326794896619231E  FCONSTANT pi/2
 6.283185307179586476925E  FCONSTANT 2pi
 3.141592653589793238463E  FCONSTANT pi
-3.141592653589793238463E  FCONSTANT -pi
 1.414213562373095048802E  FCONSTANT sqrt2
\ 0.7071067811865475244008E FCONSTANT 1/sqrt2
 0.70710678118654752440084436210484903928E FCONSTANT 1/sqrt2
\ 0.6931471805599453094172E FCONSTANT ln2
 0.69314718055994530941723212145817656808E FCONSTANT ln2


\ *** RAW FLOAT GENERATION

MARKER -EXACT-PARAMS

BITS/FLOAT 8 / CONSTANT /float  \ assume 1au = 1 byte

: hex-byte  ( caddr u -- caddr+2 u-2 byte )
  base @ >r hex
  over 0 0 rot 2 >number                 ( caddr u dbyte=0 addr #rem)
  ABORT" ***Not a hex string."
  ( dhi=0 addr) 2drop ( byte) >r 2 /string r> ( byte)
  r> base ! ;

falign here /float allot CONSTANT fscratch 

: raw-hex>float  ( caddr u -- f: f )
  ( u) dup /float 2* <> ABORT" ***Wrong size for raw float string."
BIG-ENDIAN [IF]
  fscratch /float + fscratch ( 'end+1 'start)
  DO hex-byte i c! LOOP
[ELSE]
  fscratch dup /float + 1- ( 'start ;end)
  DO hex-byte i c! -1 +LOOP
[THEN]
  ( caddr' 0) 2drop fscratch f@ ;

\ In the following, the constants for 1+sqrt(2) used by the
\ Kahan algorithms represent double the precision of the
\ corresponding float type.  The leading one is chopped, and the
\ trailing one is rounded to nearest with even ties.

BITS/FLOAT 80 = [IF]  \ double extended
  s" 3FC08000000000000000" raw-hex>float (f: eps)  \ 0x1p-63
  s" 7FFEFFFFFFFFFFFFFFFF" raw-hex>float (f: xmax)
  s" 7FFF8000000000000000" raw-hex>float (f: +Inf)
  s" FFFF8000000000000000" raw-hex>float (f: -Inf)
  s" 7FFFC000000000000000" raw-hex>float (f: +NaN)  \ quiet
  \ used by Kahan algorithms
  s" 40009A827999FCEF3242" raw-hex>float (f: r2p1=1+sqrt[2])
  s" 3FBEB2FB1366EA957D3F" raw-hex>float (f: t2p1=[1+sqrt[2]]-r2p1)
[THEN]

BITS/FLOAT 64 = [IF]  \ double
  s" 3CB0000000000000" raw-hex>float (f: eps)  \ 0x1p-52
  s" 7FEFFFFFFFFFFFFF" raw-hex>float (f: xmax)
  s" 7FF0000000000000" raw-hex>float (f: +Inf)
  s" FFF0000000000000" raw-hex>float (f: -Inf)
  s" 7FF8000000000000" raw-hex>float (f: +NaN)  \ quiet
  \ used by Kahan algorithms
  s" 4003504F333F9DE6" raw-hex>float (f: r2p1=1+sqrt[2])
  s" 3CA21165F626CDD5" raw-hex>float (f: t2p1=[1+sqrt[2]]-r2p1)
[THEN]

BITS/FLOAT 32 = [IF]  \ single
  s" 34000000" raw-hex>float (f: eps)   \ 0x1p-23
  s" 7F7FFFFF" raw-hex>float (f: xmax)
  s" 7F800000" raw-hex>float (f: +Inf)
  s" FF800000" raw-hex>float (f: -Inf)
  s" 7FC00000" raw-hex>float (f: +NaN)  \ quiet
  \ used by Kahan algorithms
  s" 401A8279" raw-hex>float (f: r2p1=1+sqrt[2])
  s" 3419FCEF" raw-hex>float (f: t2p1=[1+sqrt[2]]-r2p1)
[THEN]

-EXACT-PARAMS


\ *** GENERATED RAW FCONSTANTS

(f: eps xmax +Inf -Inf +NaN r2p1 t2p1)
FCONSTANT t2p1    FCONSTANT r2p1
FCONSTANT +NaN    FCONSTANT -Inf   FCONSTANT +Inf
FCONSTANT xmax    FCONSTANT epsilon

\ used by Kahan algorithms
2E epsilon f/     FCONSTANT 2/epsilon
xmax fsqrt 4E f/  FCONSTANT sqrt[xmax]/4
sqrt[xmax]/4 1/f  FCONSTANT 4/sqrt[xmax]
xmax fasinh 4E f/ FCONSTANT sinh[xmax]/4
1/sqrt2           FCONSTANT T0  \ for ZLN
5E 4E f/          FCONSTANT T1
3E                FCONSTANT T2

IEEEFP [IF]
  4E 1E epsilon f- f* xmax f/ epsilon f/
      FCONSTANT lambda/epsilon 
[THEN]


\ *** LOAD AND STORE

: COMPLEXES ( n -- n*/complex )           2* FLOATS ;
: COMPLEX+  ( f-addr -- f-addr+/complex ) [ 1 COMPLEXES ] literal + ;
[UNDEFINED] COMPLEX [IF] 1 COMPLEXES CONSTANT COMPLEX [THEN]

\ hidden, *nonnestable* scratch storage for stuff from fp stack
\ falign here 6 floats allot value fnoname
  falign here 3 complexes allot value fnoname

: z@  ( zaddr --  f: -- z )  dup  f@  float+  f@ ;
: z!  ( zaddr --  f: z -- )  dup  float+  f!  f! ;
: x@  ( zaddr -- f: x )  f@ ;
: y@  ( zaddr -- f: y )  float+ f@ ;
: x!  ( zaddr f: x -- )  f! ;
: y!  ( zaddr f: y -- )  float+ f! ; 

\ cheap locals for complex values, sharing the floats space:
: za     here [ 4 complexes ] literal - z@ ;
: zb     here [ 3 complexes ] literal - z@ ;
: zc     here [ 2 complexes ] literal - z@ ;
: zd     here [ 1 complexes ] literal - z@ ;
: to-za  here [ 4 complexes ] literal - z! ;
: to-zb  here [ 3 complexes ] literal - z! ;
: to-zc  here [ 2 complexes ] literal - z! ;
: to-zd  here [ 1 complexes ] literal - z! ;

: ZVARIABLE  ( "name" -- faddr )
  falign here [ 1 complexes ] literal allot CONSTANT ;

: ZLITERAL  (f: z -- )  \ compile
            (f: -- z )  \ run
  state @ 0= IF -14 throw THEN
  fswap postpone fliteral postpone fliteral ; immediate

: ZCONSTANT  ( "name" f: z -- )  \ compile
             (f: -- z )          \ run
(
Based on Anton Ertl's portable definition of CONSTANT,
comp.lang.forth, "Re: Alternative DEFER strategies?", 16 Dec
2008, but avoiding the return stack.
)
  [ fnoname ] literal z!
  : [ fnoname ] literal z@ postpone zliteral postpone ; ;


\ *** OUTPUT

: .f[bl|-]  (f: r -- |r| )
  fdup fsignbit fabs
  dup invert bl and   swap [char] - and + emit ;

: .f[+|-]i  (f: r -- |r| )
  fdup fsignbit fabs
  dup invert [char] + and   swap [char] - and + emit ."  i" ;

: z.   (f: z -- )    \ emit complex #
  fswap .f[bl|-] f.   .f[+|-]i f. ;

: zs.  (f: z -- )    \ emit complex #, scientific notation
  fswap .f[bl|-] fs.   .f[+|-]i fs. ;


\ *** FLOATING POINT STACK

: real  (f: x y -- y )  fdrop ;
: imag  (f: x y -- x )  fnip ;
: conjg (f: x y -- x -y )  fnegate  ;
: zdrop (f: z --)      fdrop fdrop ;
: zdup  (f: z -- z z ) fover fover ;

: zswap  (f: z1 z2 -- z2 z1)
  [ fnoname ] literal  f!  -frot
  [ fnoname ] literal  f@  -frot  ;

: znip   (f: z1 z2 -- z2 )  zswap  zdrop ;

: zover  (f: z1 z2 -- z1 z2 z1 )
  frot [ fnoname float+ ]  literal f!   (f: -- x u v)
  frot fdup [ fnoname ]    literal f!   (f: -- u v x)
  -frot [ fnoname float+ ] literal f@   (f: -- x u v y)
  -frot [ fnoname ]        literal z@ ; (f: -- x y u v x y)

: ztuck  (f: z1 z2 -- z2 z1 z2 )  zswap zover ;
: z=     (f: z1 z2 -- s: flag )   frot f= f= and ;
: z0=    (f: z -- s: flag )       f0= f0= and ;

: zrot    ( f: z1 z2 z3 -- z2 z3 z1 )
  [ fnoname ]            literal z!
  [ fnoname 1 complexes + ] literal z!
  [ fnoname 2 complexes + ] literal z!
  [ fnoname 1 complexes + ] literal z@
  [ fnoname ]            literal z@
  [ fnoname 2 complexes + ] literal z@ ;

: -zrot   ( f: z1 z2 z3 -- z3 z1 z2 )
  [ fnoname ]            literal z!
  [ fnoname 1 complexes + ] literal z!
  [ fnoname 2 complexes + ] literal z!
  [ fnoname ]            literal z@
  [ fnoname 2 complexes + ] literal z@
  [ fnoname 1 complexes + ] literal z@ ;


\ *** MINIMAL [MIXED] OPERATIONS
(
These words implement Kahan's rules [pp. 194, 195] for minimal
computation, to avoid gratuitous overflow, underflow, and signed
zero distortion in real or imaginary parts that don't need to be
computed.
)

: x+  (f: x y a -- x+a y )  frot f+ fswap ;
: y+  (f: x y a -- x y+a )  f+ ;
: x-  (f: x y a -- x-a y )  fnegate frot f+ fswap ;
: y-  (f: x y a -- x y-a )  f- ;

: z*>real  (f: z1 z2 -- re[z1*z2] )  \ compute z* without doing Im
  frot [ fnoname ] literal f* f!        (f: x1 x2)
  f* [ fnoname ] literal f@ f- ;        (f: x1*x2-y1*y2)

: z*>imag  (f: z1 z2 -- im[z1*z2] )  \ compute z* without doing Re
  [ fnoname ] literal f! f* fswap       (f: y1*x2 x1)
  [ fnoname ] literal f@ f* f+ ;        (f: y1*x2+x1*y2)

: z*f  (f: z f -- f*z )
  frot fover f*                         (f: y f x*f)
  -frot f* ;

: z/f  (f: z f -- z/f )
  frot fover f/                         (f: y f x/f)
  -frot f/ ;

: f*z  (f: f z -- f*z )
  -frot fover f*                        (f: y f x*f)
  -frot f* ;

: (f/z)  (f: f a b -- f/[b+[a/b]a] [a/b]f/[b+[a/b]a] )
(
A factor in F/Z, used with |a| <= |b|.
)
fpframe>r
  zdup f/ to-fa                         (f: f a b) \ fa=a/b
  fswap fa f* f+ f/                     (f: u=f/[b+[a/b]a])
  fdup  fa f*                           (f: u v=u[a/b])
r>frame ;

: f/z  (f: f z  -- f/z )
(
Kahan algorithm *without* due attention to spurious
over/underflows and zeros and infinities.
)
  zdup fabs fswap fabs                  (f: f z |y| |x|)
  f< IF fswap (f/z)
  ELSE  (f/z) fswap
  THEN  fnegate ;                       (f: f*x/|z|^2 -f*y/|z|^2)

: z*i*f  (f: z f -- z*if )
  frot fover f*                         (f: y f x*f)
  -frot f* fnegate fswap ;              (f: -y*f x*f)

: -i*z/f  (f: z f -- z/[if] )
  frot fover f/ fnegate                 (f: y f -x/f)
  -frot f/ fswap ;                      (f: y/f -x/f)

: i*f*z  (f: f z -- if*z )
  -frot fover f*                        (f: y f x*f)
  -frot f* fnegate fswap ;              (f: -y*f x*f)                           

: i*f/z  (f: f z -- if/z )
(
Kahan algorithm *without* due attention to spurious
over/underflows and zeros and infinities.
)
  zdup fabs fswap fabs                  (f: f z |y| |x|)
  f< IF fswap (f/z)
  ELSE  (f/z) fswap
  THEN fswap ;                          (f: f*y/|z|^2 f*x/|z|^2)


\ *** ARITHMETIC

: z*    (f: x y u v -- x*u-y*v  x*v+y*u)
(
Uses the algorithm followed by jvn:
  [x+iy]*[u+iv] = [[x+y]*u - y*[u+v]] + i[[x+y]*u + x*[v-u]]
Requires 3 multiplications and 5 additions.
)
  zdup f+                               (f: x y u v u+v)
  [ fnoname ] literal f!                (f: x y u v)
  fover f-                              (f: x y u v-u)
  [ fnoname float+ ] literal f!         (f: x y u)
  frot fdup                             (f: y u x x)
  [ fnoname float+ ] literal f@         (f: y u x x v-u)
  f*
  [ fnoname float+ ] literal f!         (f: y u x)
  frot fdup                             (f: u x y y)
  [ fnoname ] literal f@                (f: u x y y u+v)
  f*
  [ fnoname ] literal f!                (f: u x y)
  f+  f* fdup                           (f: u*[x+y] u*[x+y])
  [ fnoname ] literal f@ f-             (f: u*[x+y] x*u-y*v)
  fswap
  [ fnoname float+ ] literal f@         (f: x*u-y*v u*[x+y] x*[v-u])
  f+ ;                                  (f: x*u-y*v x*v+y*u)

: z+  (f: a b x y -- a+x b+y)  frot f+ -frot f+ fswap ;

: znegate  (f: x y -- -x -y )  fswap fnegate fswap fnegate ;

: z-  (f: a b x y -- a-x b-y )
(
Kahan says, to conserve signed zero, write -y+b for b-y instead
of -[y-b].
)
  fnegate frot f+ -frot f- fswap ;

: 1/z  (f: z -- 1/z )        1E -frot f/z ;
: z/   (f: z1 z2 -- z1/z2 )  1/z  z* ;
: z2/  (f: z -- z/2 )  f2/   fswap  f2/  fswap  ;
: z2*  (f: z -- z*2 )  f2*   fswap  f2*  fswap  ;
: i*   (f: x+iy -- -y+ix)    fnegate fswap ;
: -i*  (f: x+iy -- y-ix)     fswap fnegate ;

: z^2  (f: z -- z^2 )
(
Kahan algorithm without removal of any spurious NaN created by
overflow.  It deliberately uses [x-y][x+y] instead of x^2-y^2
for the real part.
)
  zdup f+ [ fnoname ] literal f!
  zdup f- [ fnoname ] literal f@ f*     (f: x y [x-y][x+y])
  -frot f* f2* ;

: |z|^2  (f: z -- x^2+y^2 )  f^2 fswap f^2 f+  ;

0 [IF]
: z^3   zdup  z^2  z* ;
: z^4   z^2  z^2  ;
[THEN]

: z^n      ( n -- f: z -- z^n )    \ raise z to integer power
(
Use Z^ instead for n > 50 or so.
)
  1E 0E zswap
  BEGIN  dup  0>
  WHILE  dup 1 and IF ztuck z* zswap THEN z^2 2/
  REPEAT zdrop drop ;


\ *** FUNCTIONS

: |z|   (f: x y -- |z|)
(
This is Kahan's algorithm, which treats infinity, the precision
of 1+sqrt[2], and the roundoff threshhold for large ratios of
the maximum and minimum absolute values of x and y.  When the
IEEEFP word list is present, it also treats the IEEE underflow
and invalid flags.
)
fpframe>r
IEEEFP [IF]
  FINVALID get-fflags                   ( invalid?.bit)
[THEN]
  fabs fswap fabs
  zdup f<
  IF  fswap
  THEN  \ a >= b >= 0 if not NaN        (f: a b )
  fdup +Inf 0E f~
  IF  fnip +Inf                         (f: a=inf b=inf)
  THEN
   
  zdup to-fb  to-fa
  f- fdup fa                            (f: t=a-b t a)
  fdup +Inf 0E f~   0E f~ or            (f: t) 

  ( t=a.or.a=inf?)
  IF    \ b is negligible
    (f: t) fdrop 0E                     (f: s=0) 

  ELSE  \ b is not negligible
IEEEFP [IF]
    FUNDERFLOW get-fflags               ( invalid?.bit underflow?.bit)
[THEN]
    fa fover                            (f: t b t)

    f< IF  \ 2 < a/b
      (f: t) fdrop
      fa fb f/                          (f: u=a/b)
      fdup 2/epsilon f<
      IF  \ a/b < 2/epsilon
        fdup f^2 1E f+ fsqrt f+         (f: s=u+sqrt[1+u^2])
      ELSE
        fdrop 0E                        (f: s=0)
      THEN

    ELSE  \ 1 <= a/b <= 2
      (f: t)
      fb f/                             (f: v=t/b)
      fdup fdup 2E f+ f*                (f: v w=[2+v]v)
      fdup 2E f+ fsqrt sqrt2 f+ f/ f+   (f: h=w/[sqrt[2+w]+sqrt2]+v)
      t2p1 f+ r2p1 f+                   (f: s=[t2p1+h]+r2p1)
      \ substitute v = u - 1 and w = u^2 - 1 to prove that
      \ this s expression is the same as the one above

    THEN
    (f: s)
    fb fswap f/                         (f: s=b/s)
IEEEFP [IF]
    ( invalid?.bit underflow?.bit)
    SET-FFLAGS                          ( invalid?.bit)
    \ any underflow is now deserved
[THEN]

  THEN
IEEEFP [IF]
  ( invalid?.bit) SET-FFLAGS
[THEN]
  (f: s) fa  f+                         (f: s+a)
r>frame ;

: zbox  (f: z -- box[z] )
(
Defined *only* for zero and infinite arguments.  This differs
from Kahan's CBOX [1] by conserving signs when only one of x or
y is infinite, consistent with the other cases, and with its use
in his ARG.
)
  zdup z0=
  IF
    1E frot fcopysign fswap EXIT        (f: cps[1,x] y)
  THEN
  fover fabs +Inf f=                    (f: z s: |x|=inf?)
  IF fdup fabs +Inf f=                  (f: z s: |y|=inf?)
    IF
      1E frot fcopysign                 (f: y cps[1,x])
      1E frot fcopysign                 (f: cps[1,x] cps[1,y])
    ELSE
      fover 1E fswap fcopysign          (f: x y cps[1,x])
      -frot fswap fabs f/               (f: cps[1,x] y/|x|)
    THEN
  ELSE fdup fabs +Inf f=                (f: z s: |y|=inf?)
    IF
      fswap fover fabs f/               (f: y x/|y|)
      1E frot fcopysign                 (f: x/|y| cps[1,y])
    ELSE
      zdrop +NaN +NaN                   (f: NaN+iNaN)  \ invalid use
    THEN
  THEN ;

: arg  (f: x y -- arg[x+iy] )
(
Kahan's algorithm, including underflow signal suppression when
The IEEEFP word set is present.  He says better accuracy can be
obtained by further case reduction and identities like atan[y/x]
= pi/4 + atan[[y-x]/[y+x]].
)
fpframe>r
  to-fb to-fa                           \ za=x+iy
  fa f0= fb f0= and
  IF
    1E fa fcopysign to-fa                \ fa=x=cps[1,x]
  THEN
  fa fabs fb fabs                       (f: |x| |y|)
  zdup +Inf f= +Inf f= or
  IF
    za zbox to-za                       \ za=z=zbox[z]
  THEN  \ leaves signs unchanged
  (f: |x| |y|) f<
  IF
    pi/2 fb fcopysign fa fb f/ fatan f-  (f: theta=cps[pi/2,y]-atan[x/y])
  ELSE
    fa f0<
    IF   \ x < 0
      pi fb fcopysign fb fa f/ fatan f+  (f: theta=cps[pi,y]+atan[y/x])
    ELSE
      fb fa f/ fatan                     (f: theta=atan[y/x])
    THEN
  THEN
IEEEFP [IF]
  fdup fabs 0.125E f> IF FUNDERFLOW clear-fflags THEN
[THEN]
r>frame ;

IEEEFP [IF]
: zssqs  (f: z -- |z/2^k|^2 s: k )
(
Compute |[x+iy]/2^k|^2, scaled to avoid overflow or underflow,
and leave the scaling integer k.  Kahan's CSSQS, p. 200.
)
fpframe>r
  (f: z) to-za                          \ fa=x, fb=y
  [ FOVERFLOW FUNDERFLOW or dup ]
      literal get-fflags                ( flag.bits)
      literal clear-fflags
  ( flag.bits) >r
  fa f^2 fb f^2 f+ to-fc                \ fc=rho=x^2+y^2
  fc fnan? fc finfinite? or
  fa finfinite? fb finfinite? or and
  IF +Inf 0                             (f: rho=inf s: k=0)
  ELSE
    FOVERFLOW  get-fflags
    FUNDERFLOW get-fflags
    fc (f: rho) lambda/epsilon f<
    and or
    IF
      fa fabs fb fabs fmax filogb       ( k)
      dup negate dup
      ( -k) fa fscalbn ( -k) fb fscalbn
      |z|^2                             (f: rho.scaled s: k)
    ELSE fc 0                           (f: rho.unscaled s: 0)
    THEN
  THEN
  r> ( flag.bits) set-fflags
r>frame ;
[THEN]

: >polar  (f: x+iy -- r theta )  zdup |z| -frot arg ;
: polar>  (f: r theta -- x+iy )  fsincos frot z*f fswap  ;

IEEEFP [IF]
\ variable #k's 0 #k's !   \ DEBUG

: zlns  ( j f: z -- ln[z*2^j] )
(
Kahan, p. 202.  He says to use this for nonzero j only when |z|
is huge, and that it's good for functions like inverse trig and
hyperbolic which go like ln[2z] at huge |z|.  We blindly use his
choices of threshholds, T0 = 1/sqrt2, T1 = 5/4, T2 = 3, which he
says should depend on the accuracy of FLNP1.
)
fpframe>r
  zdup zssqs to-fa                      (f: z s: j k)
                                        \ fa=rho=|z|^2-scaled
\ ( k) dup IF 1 #k's +! THEN  \ DEBUG
  zdup arg to-fb                        \ fb=theta
  fabs fswap fabs                       (f: |y| |x|)
  zdup fmin to-fc                       \ fc=min[|x|,|y|]
  fmax to-fd                            \ fd=max[|x|,|y|]
  ( k) dup 0=
  fd T1 f<=   fa T2 f<   or and
  T0 fd f<    and
  IF
    ( j k) 2drop
    fd 1E f-   fd 1E f+ f*
        fc f^2 f+                       (f: a=[max-1][max+1]+min^2)
    flnp1 f2/                           (f: ln1p[a]/2)
  ELSE
    fa fln f2/                          (f: ln[rho]/2)
    ( j k) + s>d d>f ln2 f* f+
  THEN
  fb
r>frame ;
[THEN]

: zln  (f: z -- ln|z|+iarg[z] )
(
Kahan with blind choices of threshholds T0 = 1/sqrt2, T1 = 5/4,
T2 = 3, which should depend on the accuracy of FLNP1, and with
large agument treatment when IEEEFP is true.
)
IEEEFP [IF]
  0 zlns ;
[ELSE]
fpframe>r
  zdup |z|^2 to-fa                      \ fa=rho=|z|^2
  zdup arg to-fb                        \ fb=theta
  fabs fswap fabs                       (f: |y| |x|)
  zdup fmin to-fc                       \ fc=min[|x|,|y|]
  fmax to-fd                            \ fd=max[|x|,|y|]
  fd T1 f<=   fa T2 f<   or
  T0 fd f<    and
  IF
    fd 1E f-   fd 1E f+ f*
        fc f^2 f+                       (f: a=[max-1][max+1]+min^2)
    flnp1 f2/                           (f: ln1p[a]/2)
  ELSE
    fa fln f2/                          (f: ln[rho]/2)
  THEN
  fb
r>frame ;
[THEN]

: zexp  (f: z -- exp[z] )  fsincos fswap frot fexp z*f ;
: z^    (f:  x y u v --  [x+iy]^[u+iv] )  zswap zln  z* zexp  ;

: zsqrt  (f: z -- sqrt[z] )
(
Kahan's algorithm, with overflow/underflow avoidance when IEEEFP
is true.  In either case, it should handle signed zero and
infinity properly.
)
fpframe>r
  zdup to-za
IEEEFP [IF]
  zssqs                                 (f: |z|^2.scaled s: k)
  fa fa f=
  IF  \ x is not a NaN
    fsqrt fa fabs ( k) dup negate fscalbn f+
  THEN
  ( k) dup 2 mod
  IF   \ k is odd
    1- 2/ ( [k-1]/2)
  ELSE \ k is even
    2/ 1- ( [k/2]-1)   f2*
  THEN
  fsqrt fscalbn                         (f: rho=sqrt[[|z|+|x|]/2])
[ELSE]
  |z| fa fabs f+ f2/ fsqrt              (f: rho=sqrt[[|z|+|x|]/2])
[THEN]
  fdup f0= fb                           (f: rho y )
  0= IF  \ rho <> 0
    fdup fabs +Inf f<>                  (f: rho eta=y)
    IF fover f/ f2/ THEN                (f: rho eta=[y/rho]/2)
    fa f0< IF  \ x < 0
      fabs fswap fb fcopysign           (f: |eta| cps[rho,y])
    THEN      
  THEN
r>frame ;

\ Trigonometric and hyperbolic functions

: zcosh  (f: z -- cosh[z] )
(
This version is reasonably accurate and preserves signed zero on
the real axis.
)
  fsincos [ fnoname ] literal (f: cos[y]) f!
          [ fnoname float+ ] literal (f: sin[y]) f!
  fdup fcosh [ fnoname ] literal f@ f*  (f: x cosh[x]cos[y])
  fswap fsinh
      [ fnoname float+ ] literal f@ f*  (f: cosh[x]cos[y] sinh[x]sin[y])
;

: zcos  (f: z -- cos[z] )  i* zcosh  ;

: zsinh  (f: z -- sinh[z] )
(
This version is reasonably accurate and preserves signed zero.
)
  fsincos [ fnoname ] literal (f: cos[y]) f!
          [ fnoname float+ ] literal (f: sin[y]) f!
  fdup fsinh [ fnoname ] literal f@ f*  (f: x sinh[x]cos[y])
  fswap fcosh
      [ fnoname float+ ] literal f@ f*  (f: sinh[x]cos[y] cosh[x]sin[y])
;

: zsin  (f: z -- sin[z] )  i* zsinh -i* ;

: ztanh  (f: z -- tanh[z] )
fpframe>r
  fover fabs sinh[xmax]/4 f>
  IF  \ avoid overflow
    0E fswap fcopysign                  (f: x cps[0,y])
    fswap 1E fswap fcopysign fswap      (f: cps[1,x] cps[0,y])
  ELSE
    \ divide by zero signal should have been suppressed here
    ftan fdup to-fa                     (f: x t=tan[y]) \ fa=t
    f^2 1E f+ to-fb                     (f: x)   \ fb=beta=1+t^2=1/cos^2[y]
    fsinh fdup to-fc                    (f: s=sinh[x]) \ fc=s
    f^2 1E f+ fsqrt                     (f: rho=sqrt[1+s^2]=cosh[x])
    fa fabs +Inf f=
    IF  \ |t|=Inf, signal when s=0 ok
      (f: rho) fc f/ fa 1/f             (f: rho/s+i/t)
    ELSE
      (f: rho) fb f* fc f* fa           (f: z=beta*rho*s+it)
      fb fc f^2 f* 1E f+ z/f            (f: z=z/[1+beta*s^2])  
    THEN
  THEN
r>frame ;

: zcoth  (f: z -- 1/tanh[z] ) ztanh  1/z ;
: ztan   (f: z -- tan[z] )    i* ztanh -i* ;
: zcot   (f: z -- cot[z] )   -i* zcoth -i* ;


\ *** INVERSE FUNCTIONS

: zacos  (f: z -- u+iv=-2i*ln[sqrt[[1+z]/2]+i*sqrt[[1-z]/2]] )
(
The principal expression is computed as:

  zacos[z] = 2 * atan[ Re[sqrt[1-z]] / Re[sqrt[1+z]] ]
               + i * asinh[ Im[ [sqrt[1+z]]* * sqrt[1-z] ] ]
)
fpframe>r
  fover 1E f+ fover zsqrt               (f: z c+id=sqrt[z+1])
  to-za                                 \ za=c+id
  znegate fswap 1E f+ fswap zsqrt       (f: a+ib=sqrt[1-z])
  fover fa f/                           (f: a+ib a/c)
  fatan f2* -frot                       (f: u=2*atan[a/c] a+ib)
  za conjg z*>imag                      (f: u h=im[[a+ib][c-id]])
  fasinh                                (f: u v=asinh[h]) 
r>frame ; 

: zacosh  (f: z -- u+iv=2*ln[sqrt[[z+1]/2]+sqrt[[z-1]/2]] )
(
The principal expression is computed as:

  zacosh[z] = asinh[ Re[ [sqrt[z-1]]* * sqrt[z+1] ] ]
               + 2i * atan[ Im[ sqrt[z-1] / Re[ sqrt[z+1] ] ]
)
fpframe>r
  fover 1E f+ fover zsqrt               (f: z c+id=sqrt[z+1])
  to-za                                 \ za=c+id
  fswap 1E f- fswap zsqrt               (f: a+ib=sqrt[z-1])
  fdup fa f/                            (f: a+ib b/c)
  fatan f2* -frot                       (f: v=2*atan[b/c] a+ib)
  conjg za z*>real                      (f: v h=re[[a-ib][c+id]])
  fasinh fswap                          (f: u=asinh[h] v)
r>frame ;

: zasin  (f: z -- u+iv=-iln[iz+sqrt[1-z^2]] )
(
The principal expression is computed as:

  zasin[z] = atan[ x / Re[ sqrt[1-z] * sqrt[1+z] ] ]
               + i * asinh[ Im[ [sqrt[1-z]]* * sqrt[1+z] ] ]
)
fpframe>r
  fover (f: x) fdup to-fc               (f: z x)
  1E f+ fover zsqrt                     (f: z c+id=sqrt[z+1])
  to-za                                 \ za=c+id
  znegate fswap 1E f+ fswap zsqrt       (f: a+ib=sqrt[1-z])
  zdup za z*>real                       (f: a+ib g=re[[a+ib]*[c+id]])
  fc                                    (f: a+ib g x)
  fswap f/ fatan -frot conjg            (f: u=atan[x/g] a-ib)
  za z*>imag                            (f: u h=im[[a-ib]*[c+id]) 
  fasinh                                (f: u v=asinh[h])
r>frame ;

: zasinh  (f: z -- ln[z+sqrt[1+z^2] )  i* zasin -i* ;

: zatanh  (f: z -- u+iv=[ln[1+z]-ln[1-z]]/2 )
fpframe>r
  \ handle unsigned zero
  fover 1E fswap fcopysign              (f: x y beta=cps[1,x])
  fdup to-fa                            \ fa=beta
  z*f conjg                             (f: z=beta*conjg[z])

  zdup fabs sqrt[xmax]/4 f>
      sqrt[xmax]/4 f> or
  IF   \ avoid overflow
    pi/2 fover fcopysign -frot          (f: v=cps[pi/2,y] z)
    1/z real fswap                      (f: u=re[1/z] v)
  ELSE
    fdup fabs 4/sqrt[xmax] f+           (f: z a=|y|+rho)
    to-fb                               \ fb=a
    fover (f: x) 1E f=                  (f: z)
    IF   \ x=1
      fnip                              (f: y)
      fdup fdup f* 4E f+ fsqrt fsqrt    (f: y b=sqrt[sqrt[4+y^2]])
      fb fsqrt f/ fln fswap             (f: u=ln[b/sqrt[a]] y)
      fb f2/ fatan pi/2 f+              (f: u y d=pi/2+atan[a/2])
      fswap fcopysign f2/               (f: u v=cps[d,y]/2)
    ELSE \ normal case, use ln1p(u) accurately even if u is tiny
      fswap fdup fnegate 1E f+          (f: y x 1-x)
      fover 1E f+                       (f: y x 1-x 1+x)
      fover f* to-fc                    (f: y x 1-x) \ fc=g=[1-x^2]
      f^2 fb f^2                        (f: y x h=[1-x]^2 a^2)
      fdup to-fb                        \ fb=a^2
      f+ f/ 4E f* flnp1 4E f/           (f: y u=ln1p[4x/[h+a^2]]/4)
      fswap f2*   fc  fb                (f: u 2y g a^2)
      f- fswap arg f2/                  (f: u v=arg[g-a^2+i2y]/2)
    THEN
  THEN

  conjg fa z*f                         (f: u+iv=beta*[u-iv])
r>frame ;

: zatan  (f: z -- [ln[1+iz]-ln[1-iz]]/2i )  i* zatanh -i* ;


\ *** REFERENCES
(
1. William Kahan, "Branch cuts for complex elementary
   functions", The State of the Art in Numerical Analysis, A.
   Iserles and M.J.D. Powell, eds., Clarendon Press, Oxford,
   1987, pp. 165-211.

   http://www.cims.nyu.edu/~dbindel/class/cs279/slits02.pdf
   http://www.cims.nyu.edu/~dbindel/class/cs279/slits34.pdf
   http://www.cims.nyu.edu/~dbindel/class/cs279/slits56.pdf

2. Robert M. Corless, James H. Davenport, David J. Jeffrey,
   Stephen M. Watt, "'According to Abramowitz and Stegun' or
   arcoth needn't be uncouth", ACM SIGSAM Bulletin, June, 2000,
   pp. 58-65.

   http://portal.acm.org/citation.cfm?doid=362001.362023
   http://www.apmaths.uwo.ca/~djeffrey/Offprints/couth.pdf

3. M. Abramowitz and I. Stegun, Handbook of Mathematical
   Functions with Formulas, Graphs, and Mathematical Tables, US
   Government Printing Office, 10th Printing December 1972,
   Secs. 4.4, 4.6.

   http://www.convertit.com/Go/ConvertIt/Reference/AMS55.ASP
)