( Title: Convert between IEEE 754 32-bit or 64-bit floats and C99 mixed hex, decimal radix two power notation File: hexfloat.fs Version: 0.1.2 Author: David N. Williams License: LGPL Last revision: March 14, 2005 This code uses some words from Wil Baden's Tool Belt, which we believe to be in the public domain. For the sake of the LGPL, the rest is ) \ Copyright (C) 2003, 2005 David N. Williams ( This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or at your option any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. If you take advantage of the option in the LGPL to put a particular version of this library under the GPL, the author[s] would regard it as polite if you would put any direct modifications under the LGPL as well, and include this paragraph in your license notice. A direct modification is one that enhances or extends the library in line with its original concept, as opposed to developing a distinct application or library which might use it. Unattributed changes are by David N. Williams. The last revision date above may reflect cosmetic changes not logged here. Version 0.1.2 2Feb05 * Interchanged T2P1 and R2P1 in examples to agree with Kahan's notation. * Started DF>MIX. 3Feb05 * Finished DF>MIX, added SF>MIX, F>MIX. * Fixed bugs in MIX>FRAW32 and MIX>FRAW64 for converting zero. 6Feb05 * Renamed file from hex2float.fs to hexfloat.fs, since we convert both ways now. 2Mar05 * Changed "FRAW" to "RAW", because the float context is always clear, and we found ourselves forgetting to type the "F". * Added overflow to signed infinity and subnormal conversion with underflow to signed zero to MIX>RAW64 and MIX>RAW32. 10Mar05 * Fixed subnormal bugs in MIX>RAW64 and MIX>RAW32. * Added subnormal output to DF>MIX and SF>MIX. 11Mar05 * Added signed infinity, signed zero, and NaN output to DF>MIX and SF>MIX. * Started hexfloat-test.fs. 12Mar05 * Finished hexfloat-test.fs and fixed a few bugs it revealed. * Replaced [UNDEFINED] by standalone [UNDEF]. * Replaced generic " NaN" output in DF>MIX and SF>MIX by " NaN" and "-NaN". The sign of NaN is not portable, but is accessible in IEEE 754. Version 0.1.1 22Feb03 * Added single precision example. Version 0.1.0 25Jan03 * Start. 1Feb03 * Last revision. Notation: "s" or "something.s" in stack comments stands for an ANS Forth string [addr len] pair. This code has an environmental dependence on lower case. This code uses LOCALS| ... |. See MIX>FPARTS for a description of the mixed input format, and DF>MIX and SF>MIX for the mixed output format. Much of the conversion of mixed hex and radix two decimal power strings to floating point is independent of IEEE floating point formats, but code near the end assumes either IEEE-754 32-bit single or 64-bit double precision. We eventually assume that cells are 32 bits, and that the host floating point system is either totally big-endian or totally little-endian in memory, as accessed by DF! and DF@ or SF! and SF@. Some of the code near the end may depend on two's complement arithmetic. This code manipulates both floating-point numbers on the float stack and their raw representations on the data stack. Raw 64-bit representations on the data stack are ANS Forth double cell numbers, with the most significant 32 bits nearer the top, containing the sign bit. They can be displayed naturally by "HEX UD.". Upon inspection, we see no words with mixed data/float inputs, outputs, or intermediate calculations; so we expect the code to work the same way whether or not the float stack is separate from the data stack. This may not be true of the tests in hexfloat-test.fs. We have used this code as a tool for manipulating floating point constants for high-precision algorithms, and as a tool for checking floating point extensions to ANS Forth. There are a few examples of use commented out at the end, but the code has evolved beyond the examples, so they don't cover that much anymore. See hexfloat-test.fs for many syntax examples. ) \ *** USER WORDS ( raw64>df raw32>sf df>raw64 sf>raw32 mix>raw64 mix>raw32 mix>f mix>df mix>sf f>mix df>mix sf>mix ) \ *** GENERAL USE decimal s" [UNDEF]" pad c! pad char+ pad c@ move pad find nip 0= [IF] : [UNDEF] ( "name" -- flag ) ( Leave true if name is in the search order, else leave false. ) bl word find nip 0= ; immediate [THEN] [UNDEF] -rot [IF] : -rot rot rot ; [THEN] [UNDEF] 0>= [IF] : 0>= 0< 0= ; [THEN] [UNDEF] 0<= [IF] : 0<= 0> 0= ; [THEN] s" ADDRESS-UNIT-BITS" environment? drop 1 cells * constant bits/cell 1 bits/cell 1- lshift invert constant ~sign-mask : logical-dshift ( ud n -- ud' ) ( Do a left/right logical shift of ud when n is positive/negative. ) ( n) dup 0> IF ( n) 0 DO d2* LOOP ELSE ( n) abs 0 ?DO d2/ ~sign-mask and LOOP THEN ; : logical-sshift ( u n -- u' ) ( Do a left/right logical shift of u when n is positive/negative. ) ( n) dup 0> IF lshift ELSE negate rshift THEN ; : -spaces ( s -- s' ) ( Remove all blanks from the ANS Forth string s and leave the compacted string s'. The new string body address is at PAD. This works even if the input string s starts somewhere in PAD, but then it may get partially written over. ) pad >r BEGIN ( len) dup 0> WHILE over c@ bl <> IF over c@ r@ c! r> 1+ >r THEN 1 /string REPEAT 2drop pad r> pad - ; : {}in ( char &chars -- char|0 ) ( Leave char if it is in the null-terminated list of characters starting at &chars. Otherwise leave zero. ) ( &chars) >r BEGIN ( char) r@ c@ dup WHILE r> 1+ >r over = UNTIL ELSE 2drop false THEN r> drop ; : {}scan ( s &chars -- s' ) ( Leave the substring of s starting at the first occurrence of one of the characters in the null-terminated string starting at &chars. If no such character is found and s is not empty, s' is the empty string, with address just after the end of s. If s is empty, s' is s. This is a variation on SCAN from Wil Baden's Tool Belt. ) ( &chars) >r BEGIN ( len) dup WHILE over c@ r@ {}in 0= WHILE 1 /string REPEAT THEN r> drop ; : {}scan-past ( s &chars -- s' char|0 ) ( Scan s for the first occurrence of any of the characters in the null-terminated string starting at &chars. If the character is found, leave it with s' the part of s that follows it, or the empty string with address just after the character if it is found at the end of s. If s is empty, leave s and zero. If s is nonempty and the character is not found, leave zero and s' the empty string with address just after the end of s. ) {}scan dup IF over c@ >r 1 /string r> ELSE 0 THEN ; : /split ( a m a+i m-i -- a+i m-i a i ) ( Split the character string a m at the place given by a+i m-i. Wil Baden's "cut-split", from the Tool Belt. ) dup >r 2swap r> - ; : {}separate ( s &chars -- after.s before.s char | s false ) ( Scan s for the first occurrence of any of the null-terminated sequence of characters at &chars. If a character is found, leave it as char, and leave the strings after and before. If not found, leave zero instead and the original string. ) ( &chars) >r ( s) 2dup r> {}scan-past ( char|0) dup IF ( char) >r ( s after.s) /split 1- r> ELSE -rot 2drop THEN ; : skip ( addr len char -- addr+i len-i ) ( Advance past leading characters. From Wil Baden's Tool Belt. ) >r BEGIN dup WHILE over c@ r@ = WHILE 1 /string REPEAT THEN r> drop ; : skip-back ( addr len char -- addr len-i ) ( Trim trailing instances of char from the input string. Based on TRIM from Wil Baden's Tool Belt. ) ( char) >r BEGIN ( len) dup WHILE 1- 2dup + c@ r@ <> UNTIL 1+ THEN r> drop ; : scan ( addr len char -- addr+i len-i ) ( Look for a particular character in the input string. From Wil Baden's Tool Belt. ) >r BEGIN dup WHILE over c@ r@ - WHILE 1 /string REPEAT THEN r> drop ; \ *** GENERIC MIXED FLOATING POINT : minus? ( s -- s' flag ) ( If s is not empty and has a plus or minus sign as its first character, s' is s skipped past the first character. Otherwise s' is s. If the first character is a minus sign, flag is true. Flag is false if the first character is a plus sign, not a sign, or if s is empty. ) ( len) dup IF over c@ >r r@ [char] - = r@ [char] + = or IF 1 /string THEN r> [char] - = ELSE false THEN ; create p|P char p c, char P c, 0 c, : hex>ud ( s -- ud flag ) ( If the string contains only hexadecimal digits with no sign, or is empty, leave true and the unsigned double number conversion [0 if there are no digits]. Otherwise leave an undefined ud result and false. ) base @ >r hex 0 0 2swap >number ( ud rest.s) nip 0= r> base ! ; : sdec>n ( s -- n flag ) ( If the string consists of an optional leading plus or minus sign followed only by optional decimal digits, leave true and the signed single number conversion [0 if there are no digits]. Otherwise leave an undefined result and false. There is no check for overflow. ) base @ >r decimal minus? >r 0 0 2swap >number nip nip ( ud.lo len) 0= swap ( valid? ud.lo) r> ( minus?) IF negate THEN swap ( n valid?) r> base ! ; : -first-dot ( s -- s' #before ) ( Delete the first "." [dot] from s. If there is no dot, s' is the same as s, and #before is a copy of its length. If there is a dot, any characters following it are moved up one position, so s' has the same address as s and length reduced by one. Then #before is the number of characters before the dot in s. ) 2dup [char] . scan ( len-i) dup 0= IF 2drop dup ELSE ( s dot.addr 1+#after) 1- ( #after) >r ( dot.addr) dup 1+ swap ( after.addr dot.addr) r@ cmove ( s.len) 1- dup r> - THEN ; : bin-defect ( nonzero.nibble -- -#leading.bin.zeros ) 0 >r BEGIN 2* dup 16 and 0= WHILE r> 1+ >r REPEAT drop r> negate ; : mix>fparts ( s -- udmant #mantbits binexp fminus? ) ( The input string is a mixed hexadecimal number with radix two exponent written in decimal, in the style of the C99 standard but without a leading "0x". Blanks are allowed anywhere but are ignored. The hex part on the left and the power part on the right must be separated by "p" or "P". A plus or minus sign may precede either part, and the hex part can include at most one hex point. The absence of any digits in either part makes it zero. The double number udmant is the integer that results from the hex part by dropping the hex point, if present, and then dropping trailing hex zeros. It is considered to have an implicit binary point just to the right of its leading nonzero bit, called the "normalized position", unless it is zero. There is an abort if the hex mantissa is too large to fit into two cells. If the hex part is zero, including having no digits, udmant is zero. The number of binary digits in udmant, starting with the leading nonzero bit, is #mantbits. Although a nonzero udmant has no trailing zero hex digits, it may have up to three trailing zero bits. If udmant is zero, so is #mantbits. When udfrac is nonzero, binexp is the signed power of two required to move the implicit binary point to the correct position in nonexponential notation, taking into account both the power of two part and any hex point in the hex part. It is said to be normalized and is assumed to fit into a single cell. When udfrac is zero, binexp is zero. The flag fminus? determines the sign associated with the mantissa. That can be minus even when udmant is zero. ) -spaces minus? 0 0 0 0 locals| binexp #mantbits mant.hi mant.lo fminus? | p|P {}separate 0= ABORT" ***Missing 'p' or 'P'." ( exp.s hex.s) \ Remove leading 0's from the hex part. [char] 0 skip \ Remove any hex point and get #intdigs. Note that \ if the hex point is the first char, #intdigs is 0. -first-dot ( #intdigs) >r \ Remove trailing 0's from the hex part. [char] 0 skip-back \ Remove and negatively count any leading frac 0's \ revealed if the hex point was the first char. ( len) dup >r [char] 0 skip ( exp.s mant.s) ( len') dup r> ( len) - r> ( -#fracdigs #intdigs) \ one of these *has* to be zero + ( hexexp) 4 * to binexp \ maybe zero \ At this point the implicit binary point is just left of \ the leading hex digit, if there is one. ( #hexdigs=len) dup 4 cells > ABORT" ***Hex mantissa too large." dup 4 * to #mantbits \ includes leading hex digit \ Adjust for leading 0 bits in the first hex digit, and move \ the implicit binary point just right of the leading 1 bit. ( #hexdigs) dup IF over 1 ( first.dig.s) hex>ud 0= ABORT" ***Number not hexadecimal." ( hi) drop ( nibble) bin-defect dup #mantbits + to #mantbits binexp + 1- to binexp \ binary point moved 1 bit to right THEN ( mant.s) hex>ud 0= ABORT" ***Number not hexadecimal." ( udmant) to mant.hi to mant.lo ( exp.s) sdec>n 0= ABORT" ***Improper decimal power of 2." #mantbits IF ( power) binexp + to binexp ELSE ( power) drop 0 to binexp THEN mant.lo mant.hi #mantbits binexp fminus? ; \ *** IEEE-754 SPECIFIC WORDS ( From here on, we assume that a cell is 32 bits, and two's complement arithmetic. ) \ In the case of IEEE-754 doubles, these masks are only needed \ for the high 32 bits. decimal 1 31 lshift constant fsign-mask fsign-mask 1- constant fbody-mask 1 23 lshift 1- constant smantissa-mask 1 20 lshift 1- constant dmantissa-mask fbody-mask dmantissa-mask xor constant dexp-mask fbody-mask smantissa-mask xor constant sexp-mask hex 7F800000 constant Inf-raw32 7FF0000000000000. 2constant Inf-raw64 decimal : >24-bits ( udmant #bits -- mantissa.24) ( Assume that neither udmant nor #bits is zero, and that the leading nonzero bit of udmant and following bits occupy the rightmost #bits of udmant. Do a logical shift of udmant so that its leading nonzero bit and as many trailing zero bits as necessary occupy the rightmost 24 bits of a cell, and leave that as mantissa.24. If #bits is more than 24, udmant is truncated. ) ( #bits) dup 0= IF 2drop EXIT THEN over ( frac.hi) IF \ 2 cells ( #bits) [ bits/cell 24 + ] literal - negate logical-dshift nip ELSE \ 1 cell nip ( #bits) 24 - negate logical-sshift THEN ; : >53-bits ( udmant #bits -- dmantissa.53) ( Assume that neither udmant nor #bits is zero, and that the leading nonzero bit of udmant and following bits occupy the rightmost #bits of udmant. Do a logical shift of udmant so that its leading nonzero bit and as many trailing zero bits as necessary occupy the rightmost 53 bits of a double cell, and leave that as dmantissa.53. If #bits is more than 53, udmant is truncated. ) ( #bits) dup 0= IF 2drop EXIT THEN ( #bits) 53 - negate logical-dshift ; : mix>raw32 ( s -- ieee.32.raw ) ( Convert the string s from mixed hex with radix 2 decimal exponent floating point notation to raw IEEE-754 32-bit single precision floating point format, and truncate any mantissa bits that do not fit. Abort if the string is not legally formatted. If the number is zero, leave IEEE raw signed zero. If the number is nonzero and the normalized binary exponent is larger than +127, leave IEEE raw Inf with the appropriate sign. If the number is nonzero and the normalized binary exponent is less than -126, leave the corresponding IEEE raw subnormal number. Truncate any mantissa bits that do no fit, and leave zero with the appropriate sign when none fit. ) mix>fparts ( fminus?) IF fsign-mask ELSE 0 THEN locals| sign.bit exp #bdigs mant.hi | ( mant.lo) #bdigs IF ( mant.lo) mant.hi #bdigs >24-bits to mant.hi exp 127 > IF Inf-raw32 sign.bit or EXIT THEN exp 127 + to exp exp 0<= IF exp -23 < IF 0 ( hi) ELSE mant.hi 1 exp - rshift THEN to mant.hi 0 to exp THEN mant.hi smantissa-mask and exp 23 lshift or ELSE ( mant.lo) drop 0 THEN sign.bit or ; : mix>raw64 ( s -- ieee.64.raw ) ( Convert the string s from mixed hex with radix 2 decimal exponent floating point notation to raw IEEE-754 64-bit double precision floating point format, and truncate any mantissa bits that do not fit. Abort if the string is not legally formatted. If the number is zero, leave IEEE raw signed zero. If the number is nonzero and the normalized binary exponent is larger than +1023, leave IEEE raw Inf with the appropriate sign. If the number is nonzsero and the normalized binary exponent is less than -1022, leave the corresponding IEEE raw subnormal number. Truncate any mantissa bits that do no fit, and leave zero with the appropriate sign when none fit. ) mix>fparts ( fminus?) IF fsign-mask ELSE 0 THEN locals| sign.bit exp #bdigs mant.hi mant.lo | #bdigs IF mant.lo mant.hi #bdigs >53-bits to mant.hi to mant.lo exp 1023 > IF Inf-raw64 sign.bit or EXIT THEN exp 1023 + to exp exp 0<= IF exp -52 < IF 0 0 ( lo hi) \ zero ELSE mant.lo mant.hi exp 1- logical-dshift \ subnormal THEN to mant.hi to mant.lo 0 to exp THEN mant.lo mant.hi dmantissa-mask and exp 20 lshift or ELSE 0 0 THEN sign.bit or ; \ *** WORDS SPECIFIC TO MEMORY REPRESENTATION ( We assume here that the default for floats is not only IEEE-754 single or double, but that each is either big-endian or totally little-endian in memory. ) s" FLOATING-EXT" environment? [IF] ( flag) drop -1E pad dfaligned df! pad dfaligned c@ 0= constant LITTLE-ENDIAN immediate : raw32>sf ( raw.32 -- f: sf ) pad sfaligned ! pad sfaligned sf@ ; : sf>raw32 ( f: sf -- s: raw.32 ) pad sfaligned sf! pad sfaligned @ ; : raw64>df ( raw.64 -- f: df ) LITTLE-ENDIAN [IF] swap [THEN] pad dfaligned 2! pad dfaligned df@ ; : df>raw64 ( f: df -- s: raw.64 ) pad dfaligned df! pad dfaligned 2@ LITTLE-ENDIAN [IF] swap [THEN] ; : mix>sf ( mix.s -- f: sf ) mix>raw32 raw32>sf ; : mix>df ( mix.s -- f: df ) mix>raw64 raw64>df ; : mix>f ( mix.s -- f: f ) [ 1 floats 8 = ] [IF] mix>df [ELSE] mix>sf [THEN] ; : num>s ( n -- n.s ) ( Convert the number according to the current BASE to a Forth string in transient memory. Traditional. ) <# dup >r abs s>d #s r> sign #> ; ( Because char-counted strings are deprecated, we use "measured strings" [mstrings], which are strings with an implementation- defined count-field size. Our library defaults to cell-size. In stack comments, a $ suffix indicates the address of an mstring. ) : m$>s ( m$ -- s ) \ COUNT for cell count-field >r r@ cell+ ( addr) r> @ ( len) ; : s-m$+ ( s m$ -- ) ( Append the Forth string to the mstring. Transliteration of Wil Baden's APPEND. ) 2dup 2>r m$>s + swap move 2r> +! ; hex FFFF constant lo16-mask FFFF0000 constant hi16-mask decimal create mix$ 32 chars allot \ actually 30 is enough : 0mix$ ( -- ) 0 mix$ ! ; : mix$>s ( -- ) mix$ m$>s ; : mix+ ( s -- ) mix$ s-m$+ ; : bl+ ( -- ) s" " mix+ ; : num+ ( n -- ) num>s mix+ ; : fsign+ ( fraw.hi32 -- ) fsign-mask and IF s" -" ELSE s" " THEN mix+ ; : mix+>s ( s -- mix.s ) mix+ mix$>s ; : "Inf"+>s ( -- ) s" Inf" mix+>s ; : "0e0"+>s ( -- ) s" 0e0" mix+>s ; : "NaN"+>s ( -- ) s" NaN" mix+>s ; : digs+ ( +num #digs -- ) ( #digs) >r num>s r> over ( len) - dup 0< ABORT" ***Too many digits." ( #digs-len) 0 ?DO s" 0" mix+ LOOP ( num.s) mix+ ; : bl-4digs+ ( +4.digit.num -- ) bl+ 4 digs+ ; : raw64-num+ ( raw64 -- ) ( The main factor in DF>MIX. Assume raw64 to be normal or subnormal. Assume that HEX$ is already the sign of the mantissa, which is ignored here. Convert and append the mantissa and exponent to HEX$ as a mixed hex/decimal string using the specification of DF>MIX. This implementation converts NaN and infinity to something unspecified. It converts zero correctly, but DF>MIX does not use that. ) base @ >r dup dexp-mask and IF s" 1." mix+ ELSE s" 0." mix+ THEN hex \ mantissa after "1." or "0." \ single digit ( raw.hi32) dup dmantissa-mask and dup lo16-mask xor 16 rshift num+ \ remaining 4 digits of high mantissa ( raw.hi32) lo16-mask and bl-4digs+ \ high 4 digits of low mantissa swap ( mantissa.low32) dup hi16-mask and 16 rshift bl-4digs+ \ low 4 digits of low mantissa ( mantissa.low32) lo16-mask and bl-4digs+ decimal \ binary exponent s" p" mix+ ( raw.hi32) dexp-mask and 20 rshift dup 0= IF 1+ THEN \ adjust when subnormal 1023 - dup 0>= IF s" +" mix+ THEN num+ r> base ! ; : df>mix ( f: df -- s: dmix.s ) ( When df is normal or subnormal, convert it to a mixed hex/decimal string in the form: normal: <" "|"-">1.h hhhh hhhh hhhh p<"+"|"-">d...d subnormal: <" "|"-">0.h hhhh hhhh hhhh p-1022 The point of the format is to display the mantissa with hex digits that correspond directly to the raw bits. Otherwise leave the appropriate string " Inf", "-Inf", " 0e0", "-0e0", " NaN", or "-NaN" ignoring the arbitrary bits of a raw NaN, except for the sign. ) 0mix$ df>raw64 dup fsign+ fbody-mask and Inf-raw64 2over d< IF 2drop "NaN"+>s EXIT THEN 2dup Inf-raw64 d= IF 2drop "Inf"+>s EXIT THEN 2dup d0= IF 2drop "0e0"+>s EXIT THEN raw64-num+ mix$>s ; : raw32-num+ ( raw32 -- ) ( The main factor in SF>MIX. Assume raw32 to be normal or subnormal. Assume that HEX$ is already the sign of the mantissa, which is ignored here. Convert and append the mantissa and exponent to HEX$ as a mixed hex/decimal string using the specification of SF>MIX. This implementation converts NaN and infinity to something unspecified. It converts zero correctly, but SF>MIX does not use that. ) base @ >r s" 0." mix+ dup sexp-mask and >r hex ( raw) dup smantissa-mask and \ dup lo16-mask xor 16 rshift r@ ( sexp) IF 128 ( 0x80) or THEN 2 digs+ ( raw) lo16-mask and bl-4digs+ decimal s" p" mix+ r> ( sexp) dup 0= IF ( 0) drop 1 ELSE 23 rshift THEN 126 - dup 0>= IF s" +" mix+ THEN num+ r> base ! ; : sf>mix ( f: sf -- s: smix.s ) ( When sf is normal or subnormal, convert it to a mixed hex/decimal string in the form: normal: <" "|"-">0.hh hhhh p<"+"|"-">d...d subnormal: <" "|"-">0.hh hhhh p-125 The leading bit of the leading "h" is one for normal numbers. The point of the format is to display the mantissa with hex digits that correspond directly to the raw bits. Otherwise leave the appropriate string " Inf", "-Inf", " 0e0", "-0e0", " NaN", or "-NaN", ignoring the arbitrary bits of a raw NaN except for the sign. ) 0mix$ sf>raw32 dup fsign+ fbody-mask and Inf-raw32 over < IF drop "NaN"+>s EXIT THEN dup Inf-raw32 = IF drop "Inf"+>s EXIT THEN dup 0= IF drop "0e0"+>s EXIT THEN raw32-num+ mix$>s ; : f>mix ( f: f -- mix.s ) [ 1 floats 8 = ] [IF] df>mix [ELSE] sf>mix [THEN] ; 0 [IF] \ EXAMPLES: turn this on to see them \ *** DOUBLE PRECISION EXAMPLE ( Some high accuracy algorithms decompose certain constants into two floats, where the first contains all the higher bits that will fit without rounding, and the second contains as many of the next lower bits as will fit, with rounding possible in the last bit. We do that here for 1+sqrt[2] in IEEE 64-bit format. Mathematica gives: In[1]:= BaseForm[N[1+Sqrt[2], 40], 16] Out[1]//BaseForm= 2.6a09e667f3bcc908b2fb1366ea957d3e 16 We need only the first 3 bits of the last hex digit c in the first line below to get 53 unrounded bits for the leading part. Since the last bit of c is 0, this split works: 2.6a09e667f3bcc + .0000000000000908b2fb1366ea957d3e ------------------------------------- 2.6a09e667f3bcc908b2fb1366ea957d3e ) s" 2.6a09 e667 f3bc c p0" mix>df s" 0.908b 2fb1 366e b p-52" mix>df \ rounded to nearest even \ All of the code above could be thrown away at this point with \ a marker, if we only care about converted floats on the stack. dfalign here 1 dfloats allot constant t2p1d t2p1d df! dfalign here 1 dfloats allot constant r2p1d r2p1d df! \ Depending on how F. is implemented, the following may not work \ as intended. 39 set-precision cr 1 spaces r2p1d df@ f. cr .( +) t2p1d df@ f. cr .( ------------------------------------------) cr .( 2.414213562373095048801688724209698078570) cr .( The last number is 1 + sqrt[2] to 40 significant figures.) cr .( Adding the first two should be correct to about 32 figures.) cr hex cr .( raw r2p1d: ) r2p1d df@ df>raw64 ud. cr .( should be: 4003504F333F9DE6) cr .( raw t2p1d: ) t2p1d df@ df>raw64 ud. cr .( should be: 3CA21165F626CDD6) cr decimal \ *** SINGLE PRECISION EXAMPLE ( We decompose 1+sqrt[2] in IEEE 32-bit format. Now we need 24 unrounded bits for the leading part: 2.6a09e4 .00000267f3bcc ------------------ 2.6a09e667f3bcc ) s" 2.6a09 e4 p0" mix>sf s" 0.267f 3bcc p-20" mix>sf sfalign here 1 sfloats allot constant t2p1s t2p1s sf! sfalign here 1 sfloats allot constant r2p1s r2p1s sf! hex cr .( raw r2p1s: ) r2p1s sf@ sf>raw32 u. cr .( should be: 401A8279) cr .( raw t2p1s: ) t2p1s sf@ sf>raw32 u. cr .( should be: 3419FCEF) cr decimal [THEN] \ END EXAMPLES [ELSE] .( ***Floating point not available.) [THEN]