; vm-osxppc.s ; ; The kForth Virtual Machine ; Mac OS X 32-bit ppc architecture ; ; Any part of this file that is not a derivation of the kForth ; virtual machine source code is ; ; Copyright (c) 2006 David N. Williams ; ; The original kForth virtual machine source code is ; ; Copyright (c) 1998--2004, 2006 Krishna Myneni, ; Creative Consulting for Research and Education ; ; The software in this file is provided under the terms of the ; GNU General Public License and also under the terms of the GNU ; Lesser General Purpose License. Anyone who modifies this code ; is requested to follow the same dual license policy. ; Usage from C++ ; ; extern "C" int vm (byte* ip); ; ecode = vm(ip); ; INITIAL HISTORY ; Originally written for the A386 assembler by ; Krishna Myneni. ; 08-25-1998 Start. ; 08-15-1998 Ported to GNU assembler under Linux. ; ... ; 09-28-2005 Last Linux '386 revision. ; MAC OS X REVISIONS ; 02-07-2006 Port started by dnw. ; 02-15-2006 Started the nonregister ppc version in the ; same file as the i386 version, with a switch ; to select between them. The i386 version ; compiled under OS X gcc 4.0.1, but no i386 ; host was available for testing. ; 03-03-2006 Split the ppc part into two versions, ; nonregister and register, to be developed in ; parallel. ; 03-04-2006 Extracted i386 version into a separate file. ; 04-08-2006 Terminated parallel development of ; nonregister and register versions, retiring the ; nonregister code. Essentially everything except ; floating point was coded, and passed a suite of ; Hayes-style tests. Started this file. ; 04-12-2006 All words coded and tested except L_ftod. ; 04-15-2006 Fixed L_wfetch to do a signed load. ; 04-16-2006 Added missing L_dfloats and L_dfloatplus. ; 04-17-2006 Replaced "48" by minFrameSz, which is 32, ; and reduced vmFrameSz from 96 to 64, because ; there are no "params". New macros _DLT and ; _QDNEG. Added L_ftod. ; 04-19-2006 Added C_syscall. ; 04-20-2006 Fixed erroneous statement about r11 in SCRATCH ; REGISTERS and removed others that were holdovers ; from the nonregs version. ; 06-6-2006 Added L_move and revised L_cmove and ; L_cmovefrom to use persistent regs. ; REFERENCES ; AG = "Introduction to Mac OS X Assembler Guide", Apple ; Computer, Inc., Last updated 2005-04-29. ; FCG = "Mac OS X ABI Function Call Guide", Apple Computer, ; Inc., Last updated 2005-10-31. ; CWG = "The PowerPc Compiler Writer's Guide", eds. Steve Hoxey, ; Faraydon Karim, Bill Hay, Hank Warren, IBM Microelectronics ; Division, 1996. ; STACKS ; The four stacks build from their bottom towards lower memory. ; The stack pointer addresses the next available location, one ; slot lower in memory than the top item. Preincrement the ; stack pointer to pop an item, and postdecrement it to push an ; item. ; The virtual machine works within a return stack frame, with ; bottom marked by vmEntryRp. In comments: ; dsp = data stack pointer ; dtsp = data type stack pointer ; rsp = return stack pointer ; rtsp = return type stack pointer ; bdsp = bottom data stack pointer ; bdtsp = bottom data type stack pointer ; brsp = bottom return stack pointer ; brtsp = bottom return type stack pointer ; rsfp = return stack frame pointer, bottom of current frame ; ip = instruction pointer ; The above are to be taken as the values upon entry to a word. ; Adding a prime, as in dsp', means the value upon exit from the ; word. Also in comments: ; S1 = (dsp+WSIZE), S2 = (dsp+2*WSIZE), S3 = ... ; T1 = (dtsp+1), T2 = (dtsp+2), ... ; with a similar convention for primed versus unprimed versions. ; For example, S1' = (dsp'+WSIZE). Occasionally we use ; S0 = (dsp), S[-1] = (dsp - WSIZE), S[-2] = ... ; and analogues, for the next available slots above the top of ; the stacks. ; The analogous notation for return and return type stack items ; uses R instead of S and RT instead of T. ; SCRATCH REGISTERS ; In the OS X ABI, r2 is volatile and available. Register r3 is ; volatile and used to pass arguments and return 4-byte results ; of function calls. For returns, it plays the role of %eax in ; the Intel ABI. Registers r4-r12 are volatile and used for ; passing arguments. They are available, but we generally don't ; uses r12. Also volatile and available is r0, modulo the cases ; where ppc opcodes treat it as zero. ; We use r10 to save and restore the link register for calls ; from vm words to our own subroutines, and r11 for the second ; level of nesting. That is, our own subroutines must avoid ; clobbering r10, and when they are not leaf subroutines they do ; clobber r11. Leaf subroutines are those that don't call a ; subroutine, or the logical equivalent, as for example in a ; tail recursion. ; For forming logical predicates involving integers, we use ; condition registers cr0, cr2, cr3, and cr4. ; ERROR RETURNS ; Words that end with NEXT don't need r3 = err. Those that end ; with blr do. ; PERSISTENT PPC REGISTERS ; We do not declare global register variables in the C code ; because Apple's gcc 4.0.1 still doesn't get them right. But ; we do use registers from the list r13-r31 declared nonvolatile ; in the OS X ABI, and they do persist within this module. They ; must be seen as unchanged by code that calls into this module ; from outside. ; The persistent registers listed below are initialized in vm() ; in this module. The simplest policy, which we follow here, is ; that words that can be called from outside the virtual machine ; should avoid using them and go directly to their memory ; counterparts, in case the registers haven't been initialized. ; The values in the registers defined here are all addresses, ; either of the jump table or of global memory variables defined ; in the .comm section at the end of vm.s. ; Because of the following #define's, we build this module via ; cc ${CFLAGS} -o vm-osxppc.o vm-osxppc.s ; and not ; as -o vm-osxppc.o vm-osxppc.s ; so we can use the C preprocessor. As far as we know, Apple's ; gnu assembler doesn't have an equivalent. #define JumpTableA r13 #define GlobalIpA r14 #define GlobalSpA r15 #define GlobalTpA r16 #define GlobalRpA r17 #define GlobalRtpA r18 ; r13-r18, vmEntryRp and GlobalIp: 24+4+4 .set vmSavedDataSz, 32 ; linkage+params+saved, 16-byte aligned: 24+0+32+8 .set vmFrameSz, 64 ; linkage+params+saved, 16-byte aligned: 24+0+0+8 .set minFrameSz, 32 ; C++ INTERFACE .set WSIZE, 4 .set OP_ADDR, 65 .set OP_FVAL, 70 .set OP_IVAL, 73 .set OP_RET, 238 ; Error Codes .set E_NOT_ADDR, 1 .set E_DIV_ZERO, 4 .set E_RET_STK_CORRUPT, 5 .set E_UNKNOWN_OP, 6 .set E_QUIT, 8 .set E_DIV_OVERFLOW, 20 .data FCONST_180_by_pi: .double 0.572957795130823208769e2 FCONST_pi_by_180: .double 0.174532925199432957692e-1 FCONST_sconv: .long 0x43300000, 0x80000000 FCONST_dconv_lo: .long 0x43300000, 0 FCONST_dconv_hi: .long 0x45300000, 0 FCONST_maxftod: .long 0x43dfffff, 0xffffffff ; 0x1.f ffff ffff ffff p62 .comm _GlobalSp,4 .comm _GlobalTp,4 .comm _GlobalIp,4 .comm _GlobalRp,4 .comm _GlobalRtp,4 .comm _BottomOfStack,4 .comm _BottomOfReturnStack,4 .comm _BottomOfTypeStack,4 .comm _BottomOfReturnTypeStack,4 .comm _vmEntryRp,4 .comm _Base,4 .comm _State,4 .comm _pTIB,4 .comm _TIB,256 .comm _WordBuf,256 .comm _NumberCount,4 .comm _NumberBuf,256 _JumpTable: .long _L_false, _L_true, _L_cells, _L_cellplus ; 0 -- 3 .long _L_dfloats, _L_dfloatplus, _CPP_case, _CPP_endcase ; 4 -- 7 .long _CPP_of, _CPP_endof, _C_open, _C_lseek ; 8 -- 11 .long _C_close, _C_read, _C_write, _C_ioctl ; 12 -- 15 .long _L_usleep, _L_ms, _C_msfetch, _C_syscall ; 16 -- 19 .long _L_fill, _L_cmove, _L_cmovefrom, _CPP_dotparen ; 20 -- 23 .long _CPP_bracketsharp, _CPP_tofile, _CPP_console, _CPP_sharpbracket ; 24 -- 27 .long _CPP_sharps, _CPP_squote, _CPP_cr, _L_bl ; 28 -- 31 .long _CPP_spaces, _L_store, _CPP_cquote, _CPP_sharp ; 32 -- 35 .long _CPP_sign, _L_mod, _L_and, _L_tick ; 36 -- 39 .long _CPP_lparen, _CPP_hold, _L_mul, _L_add ; 40 -- 43 .long _L_nop, _L_sub, _CPP_dot, _L_div ; 44 -- 47 .long _L_dabs, _L_dnegate, _L_umstar, _L_umslashmod ; 48 -- 51 .long _L_mstar, _L_mplus, _L_mslash, _L_mstarslash ; 52 -- 55 .long _L_fmslashmod, _L_smslashrem, _CPP_colon, _CPP_semicolon ; 56 -- 59 .long _L_lt, _L_eq, _L_gt, _L_question ; 60 -- 63 .long _L_fetch, _L_addr, _L_base, _L_call ; 64 -- 67 .long _L_definition, _L_erase, _L_fval, _CPP_forget ; 68 -- 71 .long _L_tobody, _L_ival, _CPP_evaluate, _C_key ; 72 -- 75 .long _L_lshift, _L_slashmod, _C_numberquery, _CPP_dotr ; 76 -- 79 .long _CPP_ddot, _C_keyquery, _L_rshift, _CPP_dots ; 80 -- 83 .long _C_accept, _CPP_char, _CPP_bracketchar, _CPP_word ; 84 -- 87 .long _L_starslash, _L_starslashmod, _CPP_udotr, _CPP_lbracket ; 88 -- 91 .long _CPP_backslash, _CPP_rbracket, _L_xor, _CPP_literal ; 92 -- 95 .long _CPP_queryallot, _CPP_allot, _L_binary, _L_count ; 96 -- 99 .long _L_decimal, _CPP_emit, _CPP_fdot, _CPP_cold ; 100 -- 103 .long _L_hex, _L_i, _L_j, _CPP_brackettick ; 104 -- 107 .long _CPP_fvariable, _C_timeanddate, _CPP_find, _CPP_constant ; 108 -- 111 .long _CPP_immediate, _CPP_fconstant, _CPP_create, _CPP_dotquote ; 112 -- 115 .long _CPP_type, _CPP_udot, _CPP_variable, _CPP_words ; 116 -- 119 .long _CPP_does, _C_system, _C_chdir, _C_search ; 120 -- 123 .long _L_or, _C_compare, _L_not, _L_move ; 124 -- 127 .long _L_fsin, _L_fcos, _C_ftan, _C_fasin ; 128 -- 131 .long _C_facos, _C_fatan, _C_fexp, _C_fln ; 132 -- 135 .long _C_flog, _L_fatan2, _L_ftrunc, _L_ftrunctos ; 136 -- 139 .long _C_fmin, _C_fmax, _L_floor, _L_fround ; 140 -- 143 .long _L_dlt, _L_dzeroeq, _L_deq, _L_twopush ; 144 -- 147 .long _L_twopop, _L_tworfetch, _L_stod, _L_stof ; 148 -- 151 .long _L_dtof, _L_froundtos, _L_ftod, _L_degtorad ; 152 -- 155 .long _L_radtodeg, _L_dplus, _L_dminus, _L_dult ; 156 -- 159 .long _L_inc, _L_dec, _L_abs, _L_neg ; 160 -- 163 .long _L_min, _L_max, _L_twostar, _L_twodiv ; 164 -- 167 .long _L_twoplus, _L_twominus, _L_cfetch, _L_cstore ; 168 -- 171 .long _L_wfetch, _L_wstore, _L_dffetch, _L_dfstore ; 172 -- 175 .long _L_sffetch, _L_sfstore, _L_spfetch, _L_plusstore ; 176 -- 179 .long _L_fadd, _L_fsub, _L_fmul, _L_fdiv ; 180 -- 183 .long _L_fabs, _L_fneg, _C_fpow, _L_fsqrt ; 184 -- 187 .long _CPP_spstore, _CPP_rpstore, _L_feq, _L_fne ; 188 -- 191 .long _L_flt, _L_fgt, _L_fle, _L_fge ; 192 -- 195 .long _L_fzeroeq, _L_fzerolt, _L_fzerogt, _L_nop ; 196 -- 199 .long _L_drop, _L_dup, _L_swap, _L_over ; 200 -- 203 .long _L_rot, _L_minusrot, _L_nip, _L_tuck ; 204 -- 207 .long _L_pick, _L_roll, _L_2drop, _L_2dup ; 208 -- 211 .long _L_2swap, _L_2over, _L_2rot, _L_depth ; 212 -- 215 .long _L_querydup, _CPP_if, _CPP_else, _CPP_then ; 216 -- 219 .long _L_push, _L_pop, _L_puship, _L_rfetch ; 220 -- 223 .long _L_rpfetch, _L_afetch, _CPP_do, _CPP_leave ; 224 -- 227 .long _CPP_querydo, _CPP_abortquote, _L_jz, _L_jnz ; 228 -- 231 .long _L_jmp, _L_loop, _L_plusloop, _L_unloop ; 232 -- 235 .long _L_execute, _CPP_recurse, _L_ret, _L_abort ; 236 -- 239 .long _L_quit, _L_ge, _L_le, _L_ne ; 240 -- 243 .long _L_zeroeq, _L_zerone, _L_zerolt, _L_zerogt ; 244 -- 247 .long _L_ult, _L_ugt, _CPP_begin, _CPP_while ; 248 -- 251 .long _CPP_repeat, _CPP_until, _CPP_again, _CPP_bye ; 252 -- 255 .long _L_utmslash, _L_utsslashmod, _L_stsslashrem, _L_udmstar ; 256 -- 259 .long _CPP_include, _CPP_source, _CPP_refill, _CPP_nondeferred ; 260--263 .long _CPP_state, _CPP_allocate, _CPP_free, _L_nop ; 264--267 .long _L_cputest, _L_dsstar, _CPP_compilecomma, _L_nop ; 268--271 .long _CPP_postpone, _L_nop, _L_nop, _C_forth_signal ; 272--275 .long _C_raise, _C_setitimer, _C_getitimer, _L_nop ; 276--279 .long _L_nop, _L_fsincos, _C_facosh, _C_fasinh ; 280--283 .long _C_fatanh, _C_fcosh, _C_fsinh, _C_ftanh ; 284--287 .long _L_dmax,_L_dmin,_L_nop,_L_nop ; 288--291 ; The following C library function pointers are a substitute for ; the more involved function call protocol in the Apple PPC ABI. ; The basic point is that the linker wants external labels to be ; in the data section so they can be patched in. We're not sure ; this is kosher, but it works with our system (dnw). kf_usleep: .long _usleep kf_lround: .long _lround kf_round: .long _round kf_trunc: .long _trunc kf_floor: .long _floor kf_sqrt: .long _sqrt kf_cos: .long _cos kf_sin: .long _sin kf_atan2: .long _atan2 .text .align 4 ; It is legal to call any function in this .globl list from ; outside the virtual machine. They must end with blr, not ; NEXT. They must also be coded to avoid using any persistent ; registers, because those are initialized only by vm(). .globl _JumpTable .globl _L_depth, _L_tick, _L_quit, _L_abort, _L_ret, _L_dabs .globl _L_dminus, _L_mstarslash, _L_udmstar, _L_utmslash ; PERSISTENT REGISTER MACROS ; The following macros act on the JumpTable, the individual ; stacks or the instruction pointer. In the comments, xp and ; xp' are notations for any of dsp and dsp', etc., and GlobalXp ; stands for any of GlobalSp, etc. ; This section contains all macros that cannot be included from ; an external file. They cannot be included because the C ; preprocessor won't find the persistent register names ; #defined'd above behind the assembly language .include ; directive. ; Get the code address from the JumpTable into register $0, ; corresponding to the opcode in register $1. .macro GET_CODE_ADDR slwi $1,$1,2 ; $1 = 4*opcode lwzx $0,JumpTableA,$1 .endmacro ; Push r0 onto the stack. Leave r2 = xp'. .macro PUSH_D lwz r2,0(GlobalSpA) ; r2 = dsp stw r0,0(r2) ; S0 = r0 subi r0,r2,WSIZE stw r0,0(GlobalSpA) ; dsp' = dsp - WSIZE .endmacro .macro PUSH_DT lwz r2,0(GlobalTpA) ; r2 = dtsp stb r0,0(r2) ; T0 = r0 subi r2,r2,1 stw r2,0(GlobalTpA) ; dtsp' = dtsp - 1 .endmacro .macro PUSH_R lwz r2,0(GlobalRpA) ; r2 = rsp stw r0,0(r2) ; R0 = r0 subi r2,r2,WSIZE stw r2,0(GlobalRpA) ; rsp' = rsp - WSIZE .endmacro .macro PUSH_RT lwz r2,0(GlobalRtpA) ; r2 = rtsp stb r0,0(r2) ; RT0 = r0 subi r2,r2,1 stw r2,0(GlobalRtpA); rtsp' = rtsp - 1 .endmacro ; Decrement xp. Leave r2 = xp'. .macro DEC_DSP lwz r2,0(GlobalSpA) ; r2 = dsp subi r2,r2,WSIZE stw r2,0(GlobalSpA) ; dsp' = dsp - WSIZE .endmacro .macro DEC_DTSP lwz r2,0(GlobalTpA) ; r2 = dtsp subi r2,r2,1 stw r2,0(GlobalTpA) ; dtsp' = dtsp - 1 .endmacro ; Pop stack into register $0. Leave r2 = xp'. .macro POP_D lwz r2,0(GlobalSpA) ; r2 = dsp lwzu $0,WSIZE(r2) ; $0 = S1, r2 = dsp' = dsp + WSIZE stw r2,0(GlobalSpA) .endmacro .macro POP_DT lwz r2,0(GlobalTpA) ; r2 = dtsp lbzu $0,1(r2) ; $0 = T1, r2 = rsp' = rsp + 1 stw r2,0(GlobalTpA) .endmacro .macro POP_R lwz r2,0(GlobalRpA) ; r2 = rsp lwzu $0,WSIZE(r2) ; $0 = $1, r2 = rsp' = rsp + WSIZE stw r2,0(GlobalRpA) .endmacro .macro POP_RT lwz r2,0(GlobalRtpA) ; r2 = rtsp lbzu $0,1(r2) ; $0 = RT1, r2 = rtsp' = rtsp + 1 stw r2,0(GlobalRtpA) .endmacro ; Increment xp. Leave r2 = xp'. .macro INC_DSP lwz r2,0(GlobalSpA) ; r2 = dsp addi r2,r2,WSIZE stw r2,0(GlobalSpA) ; dsp' = dsp + WSIZE .endmacro .macro INC_DTSP lwz r2,0(GlobalTpA) ; r2 = dtsp addi r2,r2,1 stw r2,0(GlobalTpA) ; dtsp' = dtsp + 1 .endmacro ; Get xp into register $0. .macro GET_DSP lwz $0,0(GlobalSpA) .endmacro .macro GET_DTSP lwz $0,0(GlobalTpA) .endmacro .macro GET_RSP lwz $0,0(GlobalRpA) .endmacro .macro GET_RTSP lwz $0,0(GlobalRtpA) .endmacro .macro GET_IP lwz $0,0(GlobalIpA) .endmacro ; Put $0 into _GlobalXp. .macro PUT_DSP stw $0,0(GlobalSpA) .endmacro .macro PUT_DTSP stw $0,0(GlobalTpA) .endmacro .macro PUT_RSP stw $0,0(GlobalRpA) .endmacro .macro PUT_RTSP stw $0,0(GlobalRtpA) .endmacro .macro PUT_IP stw $0,0(GlobalIpA) .endmacro ; The next set of macros is used exclusively by _vm. ; Save the persistent registers used by the virtual machine. .macro SAVE_VM_REGS stw JumpTableA,-24(r1) stw GlobalIpA,-20(r1) stw GlobalSpA,-16(r1) stw GlobalTpA,-12(r1) stw GlobalRpA,-8(r1) stw GlobalRtpA,-4(r1) .endmacro ; Initialize the virtual machine persistent registers. .macro INIT_VM_REGS LDA JumpTableA,_JumpTable LDA GlobalIpA,_GlobalIp LDA GlobalRpA,_GlobalRp LDA GlobalSpA,_GlobalSp LDA GlobalTpA,_GlobalTp LDA GlobalRtpA,_GlobalRtp .endmacro ; Upon entry to _vm, the ip argument of vm(ip), is in r3. This ; macro saves the old ip in the entry stack frame and sets the ; new ip to the argument. Of course it must be used before r3 ; is clobbered. .macro SET_VM_ENTRY_IP lwz r0,0(GlobalIpA) stw r3,0(GlobalIpA) ; new ip = vm() arg stw r0,-vmSavedDataSz(r1) ; save old ip .endmacro ; Restore the persistent registers used by the virtual machine. .macro RESTORE_VM_REGS lwz JumpTableA,-24(r1) lwz GlobalIpA,-20(r1) lwz GlobalSpA,-16(r1) lwz GlobalTpA,-12(r1) lwz GlobalRpA,-8(r1) lwz GlobalRtpA,-4(r1) .endmacro ; GENERIC PPC MACROS ; Load data address $1 into register $0. .macro LDA addis $0,0,hi16($1) ori $0,$0,lo16($1) .endmacro ; Load data word at address $2 into register $0, using $1 as ; half-address register. .macro LDWZ addis $1,0,ha16($2) lwz $0,lo16($2)($1) .endmacro ; Load data word at address $2 into register $0, and leave ; address in register $1. .macro LDWZU addis $1,0,ha16($2) lwzu $0,lo16($2)($1) .endmacro ; Load data byte at address $2 into register $0, and leave ; address in register $1. .macro LDBZU addis $1,0,ha16($2) lbzu $0,lo16($2)($1) .endmacro ; Store data word in register $0 into address $2, using ; $1 as half-address register. .macro STDW addis $1,0,ha16($2) stw $0,lo16($2)($1) .endmacro ; Store data word in register $0 into address $2, and leave ; address in register $1. .macro STDWU addis $1,0,ha16($2) stwu $0,lo16($2)($1) .endmacro ; KFORTH MACROS .macro NEXT GET_IP r2 ; r2 = ip lbzu r3,1(r2) ; r3 = (ip') = opcode PUT_IP r2 GET_CODE_ADDR r2,r3 mtctr r2 bctr ; call the word .endmacro ; Let registers $0:$1 = d1.hi:d1.lo and $2:$3 = d2.hi:d2.lo. ; Leave $0:$1 = (d1-d2).hi:(d1-d2).lo. .macro DMINUS subfc $1,$3,$1 ; $1 = (d1-d2).lo subfe $0,$2,$0 ; $0 = (d1-d2).hi .endmacro ; Let register $0 = n. Leave |n| in register $0, using register ; $1 for intermediate calculation. $1 must be different from ; $0. Leaves $1 = (n < 0) ? -1 : 0. The algorithm is from CWG. .macro _ABS srawi $1,$0,31 ; $1 = (n < 0) ? -1 : 0 xor $0,$1,$0 ; $0 = (n < 0) ? ~n : n subf $0,$1,$0 ; $0 = |n| .endmacro ; Let $0:$1 = d.hi:d.lo. Leave $0:$1 = |d|.hi:|d|.lo, using $2 ; for intermediate calculation. Leaves $2 = (d < 0) ? 1 : 0. .macro _DABS srawi $2,$0,31 ; $2 = (d < 0) ? -1 : 0 xor $0,$2,$0 ; $0 = (d < 0) ? ~d.hi : d.hi xor $1,$2,$1 ; $1 = (d < 0) ? ~d.lo : d.lo neg $2,$2 ; $2 = (d < 0) ? 1 : 0 addc $1,$2,$1 addze $0,$0 .endmacro ; Let $0:$1 = d.hi:d.lo. Leave $0:$1 = (-d).hi:(-d).lo. .macro _DNEG nor $0,$0,$0 nor $1,$1,$1 addic $1,$1,1 addze $0,$0 .endmacro ; Let $0:$1 = d.hi:d.lo, where d>=0. Use register $2 for ; scratch. Leave $0:$1 = cr0.LT ? -d : d. .macro _QDNEG mfcr $2 srawi $2,$2,31 ; $2 = cr0.LT ? -1 : 0 xor $0,$2,$0 ; $0 = cr0.LT ? ~d.hi : d.hi xor $1,$2,$1 ; $1 = cr0.LT ? ~d.lo : d.lo neg $2,$2 ; $2 = cr0.LT ? 1 : 0 addc $1,$2,$1 addze $0,$0 .endmacro ; Let $0:$1:$2 = t.hi:t.mi:t.lo. Leave $0:$1:$2 = ; (-t).hi:(-t).mi:(-t).lo. .macro _TNEG nor $0,$0,$0 nor $1,$1,$1 nor $2,$2,$2 addic $2,$2,1 addze $1,$1 addze $0,$0 .endmacro ; Let $0:$1 = d1.hi:d1.lo and $2:$3 = d2.hi:d2.lo. Leave ; cr0.LT = (d1 < d2) = (d1.hi < d2.hi) ; | [(d1.hi = d2.hi) & (d1.lo u< d2.lo)] ; Also uses cr2. .macro _DLT cmp cr0,$0,$2 ; cr0.LT = (d1.hi < d2.hi), ; cr0.EQ = (d1.hi = d2.hi) cmpl cr2,$1,$3 ; cr2.LT = (d1.lo u< d2.lo) crand 8,2,8 ; cr2.LT = (d1.hi = d2.hi) ; & (d1.lo u< d2.lo) cror 0,0,8 ; cr0.LT = predicate .endmacro ; Let $0 = d.hi, $1 = d.lo, $2 = s, where d and s are signed, using ; $3 and $4 for intermediate calculation. ; Form the triple product t with $0:$1:$2 = t.hi:t.mi:t.lo. ; Uses r0, and cr2. .macro _DSSTAR cmpwi $0,0 ; cr0.LT = (d < 0) cmpi cr2,0,$2,0 ; cr2.LT = (s < 0) crxor 8,0,8 ; cr2.LT = cr0.LT xor cr2.LT _ABS $2,$3 ; $2 = |s|, $3 = ??? _DABS $0,$1,$3 ; $0:$1 = |d|.hi:|d|.lo ; let x, y, z = |d|.lo, |d|.hi, |s| ; $0:$1:$2 = y:x:z ; get $2:$1:$0 = xz.lo : xz.hi + yz.lo : yz.hi + CA mullw $3,$0,$2 ; $3 = yz.lo mulhwu $0,$0,$2 ; $0 = yz.hi mulhwu $4,$1,$2 ; $4 = xz.hi mullw $2,$1,$2 ; $2 = xz.lo addc $1,$4,$3 ; $1 = xz.hi + yz.lo, CA addze $0,$0 ; $0 = yz.hi + CA bge cr2,0f _TNEG $0,$1,$2 0: .endmacro ; VIRTUAL MACHINE .globl _vm _vm: ; int vm(byte*) ; initialize vm mflr r0 stw r0,8(r1) SAVE_VM_REGS INIT_VM_REGS SET_VM_ENTRY_IP ; save old ip, new ip = r3 = arg GET_RSP r11 LDWZU r0,r2,_vmEntryRp ; r0 = old rsfp, r2 = &_vmEntryRp stw r0,(-vmSavedDataSz+4)(r1) ; save old rsfp stw r11,0(r2) ; new rsfp = rsp stwu r1,-vmFrameSz(r1) ; vm next loop 0: GET_IP r2 lbz r2,0(r2) ; r2 = (ip) = opcode GET_CODE_ADDR r0,r2 ; r0 = word code address mtctr r0 bctrl ; call word, r3 = error code GET_IP r2 addi r2,r2,1 ; r2 = ip' = ip + 1 PUT_IP r2 cmpwi r3,0 ; check for error beq 0b ; vm exit cmpwi r3,OP_RET ; return from vm? bne 1f li r3,0 ; clear the error 1: addi r1,r1,vmFrameSz lwz r4,8(r1) mtlr r4 lwz r4,(-vmSavedDataSz+4)(r1) STDW r4,r2,_vmEntryRp ; restore old rsfp lwz r2,-vmSavedDataSz(r1) PUT_IP r2 ; restore old ip RESTORE_VM_REGS blr _L_cputest: ; This word is for testing and understanding ppc code. Its ; stack effects depend on the test. ; Test "u" versus "addi". The body of the loop in the first ; version below is included in the body of the loop in the ; second version, which should run less than twice as long if ; two lwzu's is more efficient than two lwz's with addi (subi in ; this case). ; ; First version results ; (#iters = 1 000 000 000, dual 1.4GHz G4): ; ; ms@ 1000000000 ppctest ms@ swap - . ; 1413 ok ; ms@ 1000000000 ppctest ms@ swap - . ; 1413 ok ; ms@ 1000000000 ppctest ms@ swap - . ; 1415 ok ; ms@ 1000000000 ppctest ms@ swap - . ; 1414 ok ; ; Second version results. ; ; ms@ 1000000000 ppctest ms@ swap - . ; 3531 ok ; ms@ 1000000000 ppctest ms@ swap - . ; 3535 ok ; ms@ 1000000000 ppctest ms@ swap - . ; 3536 ok ; 3535 1414 - . ; 2121 ok ; ; So lwzu doesn't win here. .if 0 ; ( #iters -- ) POP_D r0 ; r2 = dsp' = dsp - WSIZE lwz r0,0(r2) mtctr r0 0: subi r2,r2,0 lwz r3,WSIZE(r2) lwz r4,2*WSIZE(r2) bdnz 0b NEXT .else ; ( #iters -- ) POP_D r0 ; r2 = dsp' = dsp - WSIZE lwz r0,0(r2) mtctr r0 0: lwzu r3,WSIZE(r2) lwzu r4,WSIZE(r2) subi r2,r2,2*WSIZE lwz r3,WSIZE(r2) lwz r4,2*WSIZE(r2) bdnz 0b NEXT .endif _L_nop: li r3,E_UNKNOWN_OP blr _L_abort: ; called by ForthCompiler.cpp, ForthVM.cpp ; Clear the data and return stacks. LDWZ r0,r2,_BottomOfStack ; r0 = bdsp STDW r0,r2,_GlobalSp ; dsp = bdsp LDWZ r0,r2,_BottomOfTypeStack ; r0 = bdtsp STDW r0,r2,_GlobalTp ; dtsp = bdtsp li r3,E_QUIT b quit1 _L_quit: ; Clear the return stacks. li r3,0 ; clear err quit1: LDWZ r0,r2,_BottomOfReturnStack ; r0 = brsp STDW r0,r2,_GlobalRp ; rsp = brsp STDW r0,r2,_vmEntryRp ; rsfp = brsp LDWZ r0,r2,_BottomOfReturnTypeStack ; r2 = brtsp STDW r0,r2,_GlobalRtp ; rtsp = brtsp blr _L_base: LDA r0,_Base PUSH_D li r0,OP_ADDR PUSH_DT NEXT _L_binary: li r0,2 STDW r0,r2,_Base NEXT _L_decimal: li r0,10 STDW r0,r2,_Base NEXT _L_hex: li r0,16 STDW r0,r2,_Base NEXT _L_false: li r0,0 PUSH_D li r0,OP_IVAL PUSH_DT NEXT _L_true: li r0,-1 PUSH_D li r0,OP_IVAL PUSH_DT NEXT _L_cells: GET_DSP r2 lwz r0,WSIZE(r2) slwi r0,r0,2 stw r0,WSIZE(r2) ; S1' = 4*S1 NEXT _L_cellplus: GET_DSP r2 lwz r3,WSIZE(r2) addi r0,r3,WSIZE stw r0,WSIZE(r2) ; S1 = S1 + WSIZE NEXT _L_dfloats: ; ( n -- n' ) GET_DSP r2 lwz r0,WSIZE(r2) slwi r0,r0,3 stw r0,WSIZE(r2) NEXT _L_dfloatplus: GET_DSP r2 lwz r3,WSIZE(r2) addi r3,r3,2*WSIZE stw r3,WSIZE(r2) NEXT _L_bl: li r0,32 PUSH_D li r0,OP_IVAL PUSH_DT NEXT _L_ret: ; called by ForthVM.cpp LDWZ r0,r2,_vmEntryRp ; r0 = rsfp LDWZU r2,r3,_GlobalRp ; r2 =rsp, r3 = &_GlobalRp cmplw r2,r0 ; if rsp >= rsfp bge 0f ; then rframe exhausted, exit vm addi r4,r2,WSIZE ; r0 = rsp' = rsp + WSIZE stw r4,0(r3) LDWZU r2,r3,_GlobalRtp ; r2 = rtsp, r3 = &_GlobalRtp lbzu r0,1(r2) ; r4 = RT1 = next op, r2 = rtsp' = rtsp + 1 stw r2,0(r3) cmpwi r0,OP_ADDR ; if op <> OP_ADDR bne 1f ; then err exit lwz r0,0(r4) STDW r0,r3,_GlobalIp ; ip = (rsp') li r3,0 ; clear err blr 0: li r3,OP_RET blr 1: li r3,E_RET_STK_CORRUPT blr _L_tick: ; called by ForthVM.cpp mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) li r0,32 LDWZU r2,r3,_GlobalSp ; r2 = dsp, r3 = &_GlobalSp stw r0,0(r2) ; S0 = r0 subi r2,r2,WSIZE stw r2,0(r3) ; dsp' = dsp - WSIZE LDWZU r2,r3,_GlobalTp ; r2 = dtsp, r3 = &_GlobalTp subi r2,r2,1 stw r2,0(r3) ; dtsp' = dtsp - 1 bl _CPP_word bl _CPP_find LDWZU r2,r3,_GlobalSp ; r2 = dsp, r3 = &_GlobalSp addi r2,r2,WSIZE stw r2,0(r3) ; dtsp' = dtsp + WSIZE LDWZU r2,r3,_GlobalTp ; r2 = dtsp, r3 = &_GlobalTp addi r2,r2,1 stw r2,0(r3) ; dtsp' = dtsp + 1 li r3,0 ; clear err addi r1,r1,minFrameSz lwz r0,8(r1) mtlr r0 blr _L_tobody: GET_DSP r2 lwz r3,WSIZE(r2) ; r3 = xt (code address) lwz r0,1(r3) ; data address is offset by one stw r0,WSIZE(r2) li r3,0 blr ; ; For precision delays, use MS instead of USLEEP ; Use USLEEP when task can be put to sleep and reawakened by OS ; _L_usleep: mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) INC_DTSP POP_D r3 ; r3 = microseconds LDWZ r0,r2,kf_usleep mtctr r0 bctrl ; r3 = 0 if successful, else -1 addi r1,r1,minFrameSz lwz r0,8(r1) mtlr r0 blr _L_ms: mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = millsecs mulli r0,r0,1000 ; r0 = microsecs stw r0,WSIZE(r2) bl _C_usec ; returns r3 = 0 addi r1,r1,minFrameSz lwz r0,8(r1) mtlr r0 blr _L_erase: ; ( addr len -- ) DEC_DTSP li r0,OP_IVAL stb r0,1(r2) DEC_DSP li r0,0 stw r0,WSIZE(r2) ; fall thru to _L_fill _L_fill: ; ( addr len char -- ) GET_DTSP r2 lbz r0,3(r2) cmpwi r0,OP_ADDR bne 0f addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 li r3,0 lwz r0,-WSIZE(r2) ; r0 = len cmpwi r0,0 ; len = 0 ? mtctr r0 lwz r4,0(r2) ; r4 = addr lwz r0,-2*WSIZE(r2) ; r0 = char beqlr ; if len = 0 then exit subi r4,r4,1 1: stbu r0,1(r4) bdnz 1b blr 0: li r3,E_NOT_ADDR blr _L_move: ; ( &source &target u -- ) GET_DSP r2 lwz r3,3*WSIZE(r2) ; r3 = &source lwz r4,2*WSIZE(r2) ; r4 = &target cmplw r3,r4 ; if &source u< &target blt _L_cmovefrom ; then move from end ; else fall thru to move from start _L_cmove: ; ( &source &target u -- ) GET_DTSP r2 lbz r0,2(r2) cmpwi r0,OP_ADDR ; T2' = OP_ADDR? lbz r0,3(r2) cmpi cr2,0,r0,OP_ADDR ; T3' = OP_ADDR? cror 2,2,10 ; cr0.EQ = cr0.EQ | cr2.EQ li r3,E_NOT_ADDR bnelr ; if not addr then err exit addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 lwz r0,-2*WSIZE(r2) ; r0 = #bytes to move cmpwi r0,0 ; #bytes = 0? mtctr r0 li r3,0 lwz r5,-WSIZE(r2) ; r5 = &target lwz r4,0(r2) ; r4 = &source beqlr ; if 0 bytes then exit subi r4,r4,1 subi r5,r5,1 0: lbzu r0,1(r4) stbu r0,1(r5) bdnz 0b blr _L_cmovefrom: ; ( &source &target u -- ) GET_DTSP r2 lbz r0,2(r2) cmpwi r0,OP_ADDR ; T2' = OP_ADDR? lbz r0,3(r2) cmpi cr2,0,r0,OP_ADDR ; T3' = OP_ADDR? cror 2,2,10 ; cr0.EQ = cr0.EQ | cr2.EQ li r3,E_NOT_ADDR bnelr ; if not addr then err exit addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 lwz r0,-2*WSIZE(r2) ; r0 = #bytes to move cmpwi r0,0 ; #bytes = 0? mtctr r0 li r3,0 lwz r5,-WSIZE(r2) ; r5 = &target lwz r4,0(r2) ; r4 = &source beqlr ; if 0 bytes then exit add r4,r4,r0 add r5,r5,r0 0: lbzu r0,-1(r4) stbu r0,-1(r5) bdnz 0b blr _L_call: .if 1 INC_DTSP POP_D r0 ; r0 = code addr mtctr r0 bctr .else mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) INC_DTSP POP_D r0 ; r0 = code addr mtctr r0 bctrl addi r1,r1,minFrameSz lwz r0,8(r1) mtlr r0 li r3,0 blr .endif _L_push: POP_D r0 PUSH_R POP_DT r0 PUSH_RT NEXT _L_pop: POP_R r0 PUSH_D POP_RT r0 PUSH_DT NEXT _L_twopush: GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = S1 lwz r4,2*WSIZE(r2) ; r4 = S2 addi r2,r2,2*WSIZE PUT_DSP r2 GET_RSP r2 stw r0,-WSIZE(r2) ; R1' = S1 stw r4,0(r2) ; R2' = S2 subi r2,r2,2*WSIZE PUT_RSP r2 GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 lbz r4,2(r2) ; r4 = T2 addi r2,r2,2 PUT_DTSP r2 GET_RTSP r2 stb r0,-WSIZE(r2) ; R1' = T1 stb r4,0(r2) ; R2' = T2 subi r2,r2,2 PUT_RTSP r2 NEXT _L_twopop: GET_RSP r2 lwz r0,WSIZE(r2) ; r0 = R1 lwz r4,2*WSIZE(r2) ; r4 = R2 addi r2,r2,2*WSIZE PUT_RSP r2 GET_DSP r2 stw r0,-WSIZE(r2) ; S1' = R1 stw r4,0(r2) ; S2' = R2 subi r2,r2,2*WSIZE PUT_DSP r2 GET_RTSP r2 lbz r0,1(r2) ; r0 = RT1 lbz r4,2(r2) ; r4 = RT2 addi r2,r2,2 PUT_RTSP r2 GET_DTSP r2 stb r0,-WSIZE(r2) ; T1' = RT1 stb r4,0(r2) ; T2' = RT2 subi r2,r2,2 PUT_DTSP r2 NEXT _L_puship: GET_IP r0 PUSH_R li r0,OP_ADDR PUSH_RT NEXT _L_execute: POP_D r3 ; r3 = S1 subi r3,r3,1 ; r3 = ip' = S1 - 1 GET_IP r0 PUT_IP r3 PUSH_R ; R1' = ip INC_DTSP li r0,OP_ADDR PUSH_RT NEXT _L_definition: GET_IP r2 lwz r3,1(r2) ; r3 = (ip+1) = addr to execute subi r0,r3,1 PUT_IP r0 ; ip' = addr to execute - 1 addi r0,r2,4 ; r0 = ip + 4 PUSH_R li r0,OP_ADDR PUSH_RT NEXT _L_rfetch: GET_RSP r3 lwz r0,WSIZE(r3) ; r0 = R1 PUSH_D GET_RTSP r3 lbz r0,1(r3) ; r0 = RT1 PUSH_DT NEXT _L_tworfetch: ; ( r: x1 x2 -- x1 x2 s: x1 x2 ) GET_RSP r3 lwz r0,WSIZE(r3) ; r0 = R1 GET_DSP r2 stw r0,-WSIZE(r2) ; S1' = R1 lwz r0,2*WSIZE(r3) ; r0 = R2 stw r0,0(r2) ; S2' = R2 subi r2,r2,2*WSIZE PUT_DSP r2 GET_RTSP r3 lbz r0,1(r3) ; r0 = RT1 GET_DTSP r2 stb r0,-1(r2) ; T1' = RT1 lbz r0,2(r3) ; r0 = RT2 stb r0,0(r2) ; T2' = RT2 subi r2,r2,2 PUT_DTSP r2 NEXT _L_rpfetch: ; ( -- &R1 ) LDA r3,_GlobalRp lwz r2,0(r3) addi r0,r2,WSIZE PUSH_D li r0,OP_ADDR PUSH_DT NEXT _L_spfetch: ; ( -- &S1 ) LDA r3,_GlobalSp lwz r2,0(r3) addi r0,r2,WSIZE PUSH_D li r0,OP_ADDR PUSH_DT NEXT _L_i: GET_RTSP r2 lbz r0,3(r2) ; r0 = RT3 PUSH_DT GET_RSP r2 lwz r0,3*WSIZE(r2) ; r0 = R3 PUSH_D NEXT _L_j: GET_RTSP r2 lbz r0,6(r2) ; r0 = RT6 PUSH_DT GET_RSP r2 lwz r0,6*WSIZE(r2) ; r0 = R6 PUSH_D NEXT _L_loop: GET_RTSP r3 lbz r0,1(r3) cmpwi r0,OP_ADDR bne loopbad GET_RSP r3 lwz r0,2*WSIZE(r3) ; r0 = R2 = loop limit lwz r2,3*WSIZE(r3) ; r2 = R3 = loop counter addi r2,r2,1 ; r2 = counter + 1 cmpw r0,r2 ; if limit = counter + 1 beq _L_unloop ; then unloop stw r2,3*WSIZE(r3) ; R3 = counter + 1 lwz r0,WSIZE(r3) ; r0 = R1 PUT_IP r0 ; ip' = R1 = loop start NEXT _L_unloop: ; terminal count reached, discard top 3 items GET_RSP r2 addi r2,r2,3*WSIZE ; r2 = rsp' = rsp + 3*WSIZE PUT_RSP r2 GET_RTSP r2 addi r2,r2,3 ; r2 = rtsp' = rtsp + 3 PUT_RTSP r2 NEXT loopbad: li r3,E_RET_STK_CORRUPT blr _L_plusloop: GET_RTSP r3 lbz r0,1(r3) cmpwi r0,OP_ADDR bne loopbad INC_DTSP GET_DSP r2 lwzu r4,WSIZE(r2) ; r4 = del cmpwi r4,0 ; cr0.LT = (delta < 0) PUT_DSP r2 GET_RSP r3 lwz r0,2*WSIZE(r3) ; r0 = R2 = loop lim lwz r2,3*WSIZE(r3) ; r2 = R3 = i add r4,r4,r2 ; r4 = i' = i + del cmp cr2,0,r4,r0 ; cr2.LT = (i' < lim) ; get cr0.LT = (del >= 0 & i' >= lim) | (del < 0 & i' < lim) ; = ~(del >= 0 XOR i' >= lim) creqv 0,0,8 ; cr0.LT = ~(cr0.LT XOR cr2.LT) blt _L_unloop stw r4,3*WSIZE(r3) ; R3 = i' lwz r0,WSIZE(r3) ; r0 = R1 PUT_IP r0 ; ip' = R1 = loop start NEXT _L_jz: POP_D r0 INC_DTSP cmpwi r0,0 ; if zero beq 0f ; then do jump GET_IP r2 addi r2,r2,4 ; r2 = ip' = ip + 4 PUT_IP r2 NEXT 0: GET_IP r2 lwz r0,1(r2) ; r0 = rel jump count subi r2,r2,1 ; r2 = ip - 1 add r0,r0,r2 ; r0 = ip' = ip - 1 + rel jump count PUT_IP r0 NEXT _L_jnz: blr _L_jmp: GET_IP r2 lwz r0,1(r2) ; r0 = rel jump count subi r2,r2,1 ; r2 = ip - 1 add r0,r0,r2 ; r0 = ip' = ip - 1 + rel jump count PUT_IP r0 NEXT _L_count: GET_DTSP r2 ; r2 = dtsp lbz r0,1(r2) cmpwi r0,OP_ADDR ; if not OP_ADDR bne 0f ; then err exit subi r2,r2,1 PUT_DTSP r2 li r0,OP_IVAL stb r0,1(r2) ; T1' = OP_IVAL DEC_DSP ; r2 = dsp' = dsp - WSIZE lwz r3,2*WSIZE(r2) ; r3 = &count lbz r0,0(r3) stw r0,WSIZE(r2) ; S1' = count addi r3,r3,1 stw r3,2*WSIZE(r2) ; S2' = &count + 1 = &body li r3,0 blr 0: li r3,E_NOT_ADDR blr _L_ival: GET_IP r2 lwz r0,1(r2) ; r0 = emdedded data addi r2,r2,WSIZE ; r2 = ip' = ip + WSIZE PUT_IP r2 PUSH_D li r0,OP_IVAL PUSH_DT NEXT _L_addr: GET_IP r2 lwz r0,1(r2) ; r0 = embedded data addi r2,r2,WSIZE ; r2 = ip' = ip + WSIZE PUT_IP r2 PUSH_D li r0,OP_ADDR PUSH_DT NEXT _L_fval: ; ( -- (ip+5) (ip+1) ) ; used by vm, not a visible word GET_DTSP r2 subi r2,r2,2 ; r2 = dtsp' = dtsp - 2 PUT_DTSP r2 li r0,OP_IVAL stb r0,1(r2) ; T1' = OP_IVAL stb r0,2(r2) ; T2' = OP_IVAL GET_IP r2 lwz r0,1(r2) ; r0 = (ip+1) = val.hi lwz r3,5(r2) ; r3 = (ip+5) = val.lo addi r2,r2,2*WSIZE ; r2 = ip' = ip + 2*WSIZE PUT_IP r2 GET_DSP r2 subi r2,r2,2*WSIZE ; r2 = dsp' = dsp - 2*WSIZE PUT_DSP r2 stw r0,WSIZE(r2) ; S1' = (ip+1) stw r3,2*WSIZE(r2) ; S2' = (ip+5) NEXT _L_and: POP_D r0 ; r0 = S1, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S2 and r0,r0,r3 stw r0,WSIZE(r2) ; S1' = S1 + S2 POP_DT r0 ; r0 = T1, r2 = dtsp' lbz r3,1(r2) ; r3 = T2 and r0,r0,r3 ; and to preserve addr type stb r0,1(r2) ; T1' = T1 and T2 NEXT _L_or: POP_D r0 ; r0 = S1, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S2 or r0,r3,r0 stw r0,WSIZE(r2) ; S1' = S2 - S1 INC_DTSP ; r2 = dtsp' li r0,OP_IVAL stb r0,1(r2) NEXT _L_not: GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = S1 nor r0,r0,r0 stw r0,WSIZE(r2) ; S1' = not S1 NEXT _L_xor: POP_D r0 ; r0 = S1, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S2 xor r0,r3,r0 stw r0,WSIZE(r2) ; S1' = S2 xor S1 INC_DTSP ; r2 = dtsp' li r0,OP_IVAL stb r0,1(r2) ; T1' = OP_IVAL NEXT _L_lshift: POP_D r0 ; r0 = S1, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S2 slw r0,r3,r0 stw r0,WSIZE(r2) ; S1' = S1 << S0 INC_DTSP ; result has type of first operand NEXT _L_rshift: POP_D r0 ; r0 = S1, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S2 srw r0,r3,r0 stw r0,WSIZE(r2) ; S1' = S1 >> S0 INC_DTSP ; result has type of first operand NEXT _L_eq: ; ( n1 n2 -- flag ) INC_DTSP POP_D r0 ; r0 = n2, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = n1 cmpw r3,r0 ; cr0.EQ = (n1 = n2) cror 0,2,2 ; cr0.0 = (n1 = n2) mfcr r0 srawi r0,r0,31 ; r0 = (n1 = n2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_ne: ; ( n1 n2 -- flag ) INC_DTSP POP_D r0 ; r0 = n2, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = n1 cmpw r3,r0 ; cr0.EQ = (n1 = n2) crnor 0,2,2 ; cr0.0 = (n1 <> n2) mfcr r0 srawi r0,r0,31 ; r0 = (n1 <> n2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_ult: ; ( u1 u2 -- flag ) INC_DTSP POP_D r0 ; r0 = u2, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = u1 cmplw r3,r0 ; cr0.LT = (u1 < u2) mfcr r0 srawi r0,r0,31 ; r0 = (u1 < u2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_ugt: ; ( u1 u2 -- flag ) INC_DTSP POP_D r0 ; r0 = u2, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = u1 cmplw r3,r0 ; cr0.GT = (u1 > u2) cror 0,1,1 ; cr0.0 = (u1 > u2) mfcr r0 srawi r0,r0,31 ; r0 = (u1 > u2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_lt: ; ( n1 n2 -- flag ) INC_DTSP POP_D r0 ; r0 = n2, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = n1 cmpw r3,r0 ; cr0.LT = (n1 < n2) mfcr r0 srawi r0,r0,31 ; r0 = (n1 < n2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_gt: ; ( n1 n2 -- flag ) INC_DTSP POP_D r0 ; r0 = n2, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = n1 cmpw r3,r0 ; cr0.GT = (n1 > n2) cror 0,1,1 ; cr0.0 = (n1 > n2) mfcr r0 srawi r0,r0,31 ; r0 = (n1 > n2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_le: ; ( n1 n2 -- flag ) INC_DTSP POP_D r0 ; r0 = n2, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = n1 cmpw r3,r0 ; cr0.GT = (n1 > n2) crnor 0,1,1 ; cr0.0 = (n1 <= n2) mfcr r0 srawi r0,r0,31 ; r0 = (n1 <= n2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_ge: ; ( n1 n2 -- flag ) INC_DTSP POP_D r0 ; r0 = n2, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = n1 cmpw r3,r0 ; cr0.LT = (n1 < n2) crnor 0,0,0 ; cr0.LT = (n1 >= n2) mfcr r0 srawi r0,r0,31 ; r0 = (n1 >= n2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_zerolt: ; ( n -- flag ) GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = S1 cmpwi r0,0 ; cr0.LT = (n < 0) mfcr r0 srawi r0,r0,31 ; r0 = (n < 0) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_zeroeq: ; ( n -- flag ) GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = S1 cmpwi r0,0 ; cr0.EQ = (n = 0) cror 0,2,2 ; cr0.0 = (n = 0) mfcr r0 srawi r0,r0,31 ; r0 = (n = 0) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_zerone: ; ( n -- flag ) GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = S1 cmpwi r0,0 ; cr0.EQ = (n = 0) crnor 0,2,2 ; cr0.0 = (n <> 0) mfcr r0 srawi r0,r0,31 ; r0 = (n <> 0) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_zerogt: ; ( n -- flag ) GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = S1 cmpwi r0,0 ; cr0.GT = (n > 0) cror 0,1,1 ; cr0.0 = (n > 0) mfcr r0 srawi r0,r0,31 ; r0 = (n > 0) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_deq: ; ( d1 d2 -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 li r0,OP_IVAL stb r0,1(r2) GET_DSP r2 addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 lwz r0,WSIZE(r2) ; r0 = d1.lo lwz r3,-WSIZE(r2) ; r3 = d2.lo cmplw r0,r3 ; cr0.EQ = (d1.lo = d2.lo) lwz r0,0(r2) ; r0 = d1.hi lwz r3,-2*WSIZE(r2) ; r3 = d2.hi cmpl cr2,0,r0,r3 ; cr2.EQ = (d1.hi = d2.hi) crand 0,2,10 ; cr0.0 = (d1.lo = d2.lo) & (d1.hi = d2.hi) mfcr r0 srawi r0,r0,31 ; r0 = (d1 = d2) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_dzeroeq: ; ( d -- flag ) INC_DTSP ; r2 = dtsp' = dtsp + 1 li r0,OP_IVAL stb r0,1(r2) POP_D r0 ; r0 = d.hi, r2 = dsp' = dsp + WSIZE lwz r3,WSIZE(r2) ; r3 = d.lo cmpwi r0,0 ; cr0.EQ = (d.hi = 0) cmpi cr2,r3,0 ; cr2.EQ = (d.lo = 0) crand 0,2,10 ; cr0.0 = (d.hi = 0) & (d.lo = 0) mfcr r0 srawi r0,r0,31 ; r0 = (d = 0) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_dlt: ; ( d1 d2 -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lwz r4,WSIZE(r2) ; r4 = d2.hi lwz r5,2*WSIZE(r2) ; r5 = d2.lo lwz r6,3*WSIZE(r2) ; r6 = d1.hi lwz r7,4*WSIZE(r2) ; r7 = d1.lo addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 _DLT r6,r7,r4,r5 ; cr0.LT = (d1 < d2) mfcr r0 srawi r0,r0,31 ; r0 = (d1 < d2) ? -1 : 0 stw r0,WSIZE(r2) ; S1' = r0 NEXT _L_dult: ; ( ud1 ud2 -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lwz r4,WSIZE(r2) ; r4 = ud2.hi lwz r5,2*WSIZE(r2) ; r5 = ud2.lo lwz r6,3*WSIZE(r2) ; r6 = ud1.hi lwz r7,4*WSIZE(r2) ; r7 = ud1.lo addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 ; predicate: (ud1.h < ud2.hi) ; | [(ud1.hi = ud2.hi) & (ud1.lo < ud2.lo)] cmpl cr0,r6,r4 ; cr0.LT = (ud1.hi < ud2.hi), ; cr0.EQ = (ud1.hi = ud2.hi) cmpl cr2,r7,r5 ; cr2.LT = (ud1.lo < ud2.lo) crand 8,2,8 ; cr2.LT = (ud1.hi = ud2.hi) ; & (ud1.lo < ud2.lo) cror 0,0,8 ; cr0.LT = predicate mfcr r0 srawi r0,r0,31 ; r0 = (ud1 < ud2) ? -1 : 0 stw r0,WSIZE(r2) ; S1' = r0 NEXT _L_querydup: GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = S1 cmpwi r0,0 beq 0f stw r0,0(r2) ; S1' = S1 subi r2,r2,WSIZE PUT_DSP r2 DEC_DTSP ; r2 = dtsp' = dtsp - 1 lbz r0,2(r2) ; r0 = T1 stb r0,1(r2) ; T1' = T1 0: NEXT _L_drop: INC_DSP INC_DTSP NEXT _L_dup: DEC_DSP ; r2 = dsp' = dsp - WSIZE lwz r0,2*WSIZE(r2) stw r0,WSIZE(r2) ; S0 = S1 DEC_DTSP lbz r0,2(r2) stb r0,1(r2) ; T0 = T1 NEXT _L_swap: GET_DSP r2 lwz r0,2*WSIZE(r2) lwz r3,WSIZE(r2) stw r3,2*WSIZE(r2) stw r0,WSIZE(r2) GET_DTSP r2 lbz r0,2(r2) lbz r3,1(r2) stb r3,2(r2) stb r0,1(r2) NEXT _L_over: GET_DSP r2 lwz r0,2*WSIZE(r2) stw r0,0(r2) ; S0 = S2 subi r2,r2,WSIZE ; r2 = dsp' = dsp - WSIZE PUT_DSP r2 GET_DTSP r2 lbz r0,2(r2) stb r0,0(r2) ; T0 = T2 subi r2,r2,1 ; r2 = dtsp' = dtsp - 1 PUT_DTSP r2 NEXT _L_rot: GET_DSP r2 lwz r0,3*WSIZE(r2) ; r0 = S3 lwz r3,2*WSIZE(r2) ; r3 = S2 stw r3,3*WSIZE(r2) ; S3' = S2 lwz r3,WSIZE(r2) ; r3 = S1 stw r3,2*WSIZE(r2) ; S2' = S1 stw r0,WSIZE(r2) ; S1' = S3 GET_DTSP r2 lbz r0,3(r2) lbz r3,2(r2) stb r3,3(r2) lbz r3,1(r2) stb r3,2(r2) stb r0,1(r2) NEXT _L_minusrot: GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = S1 lwz r3,2*WSIZE(r2) ; r3 = S2 stw r3,WSIZE(r2) ; S1' = S2 lwz r3,3*WSIZE(r2) ; r3 = S3 stw r3,2*WSIZE(r2) ; S2' = S3 stw r0,3*WSIZE(r2) ; S3' = S1 GET_DTSP r2 lbz r0,1(r2) lbz r3,2(r2) stb r3,1(r2) lbz r3,3(r2) stb r3,2(r2) stb r0,3(r2) NEXT _L_nip: ; ( x1 x2 -- x2 ) POP_D r0 ; r2 = dsp' = dsp + WSIZE stw r0,WSIZE(r2) ; S1' = S1 POP_DT r0 ; r2 = dtsp' = dtsp + 1 stb r0,1(r2) NEXT _L_tuck: ; ( x1 x2 -- x2 x1 x2 ) DEC_DSP ; r2 = dsp' = dsp - WSIZE lwz r0,3*WSIZE(r2) ; r0 = x1 lwz r3,2*WSIZE(r2) ; r3 = x2 stw r3,WSIZE(r2) ; S1' = x2 stw r0,2*WSIZE(r2) ; S2' = x1 stw r3,3*WSIZE(r2) ; S3' = x2 DEC_DTSP ; r2 = dtsp' = dtsp - 1 lbz r0,3(r2) ; r0 = t1 lbz r3,2(r2) ; r3 = t2 stb r3,1(r2) ; T1' = t2 stb r0,2(r2) ; T2' = t1 stb r3,3(r2) ; T3' = t2 NEXT _L_pick: ; ( x_u ... x_0 u -- x_[u-1] ... x_0 x_u ) GET_DSP r2 lwz r4,WSIZE(r2) addi r4,r4,2 ; r4 = u + 2 mulli r3,r4,WSIZE ; r3 = (u+2)*WSIZE lwzx r3,r2,r3 stw r3,WSIZE(r2) ; S1 = x_u GET_DTSP r2 lbzx r3,r2,r4 stb r3,1(r2) ; T1 = t_u NEXT _L_roll: ; (x_u x_[u-1] ... x_0 u -- x_[u-1] ... x_0 x_u ) POP_D r4 ; r2 = dsp' = dsp + WSIZE cmpwi r4,0 ; if u = 0 beq 0f ; then drop dtstack and exit mtctr r4 addi r4,r4,1 ; r4 = u + 1 mulli r3,r4,WSIZE ; r3 = [u+1]*WSIZE add r2,r2,r3 ; r2 = dsp' + [u+1]*WSIZE GET_DTSP r3 addi r3,r3,1 ; r3 = dtsp' = dtsp + 1 PUT_DTSP r3 add r3,r3,r4 ; r3 = dtsp' + u + 1 lwz r4,0(r2) ; r4 = x_u lbz r5,0(r3) ; r5 = t_u 2: lwzu r0,-WSIZE(r2) ; r0 = x_i stw r0,WSIZE(r2) lbzu r0,-1(r3) stb r0,1(r3) bdnz 2b stw r4,0(r2) ; S1' = x_u stb r5,0(r3) ; T1' = t_u 1: li r3,0 blr 0: INC_DTSP b 1b _L_depth: ; called by ForthVM.cpp LDWZ r0,r3,_BottomOfStack ; r0 = bdsp LDWZU r2,r3,_GlobalSp ; r2 = dsp, r3 = &_GlobalSp subf r0,r2,r0 ; r0 = bdsp - dsp subi r2,r2,WSIZE ; r2 = dsp' = dsp - WSIZE stw r2,0(r3) li r3,WSIZE divwu r0,r0,r3 stw r0,WSIZE(r2) ; S1' = depth li r0,OP_IVAL LDWZU r2,r3,_GlobalTp ; r2 = dtsp, r3 = &_GlobalTp stb r0,0(r2) ; T1' = r0 subi r2,r2,1 stw r2,0(r3) ; dtsp' = dtsp - 1 li r3,0 blr _L_2drop: GET_DSP r2 addi r2,r2,2*WSIZE PUT_DSP r2 GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 NEXT _L_2dup: GET_DSP r2 subi r2,r2,2*WSIZE ; r2 = dsp' = dsp - 2*WSIZE PUT_DSP r2 lwz r0,4*WSIZE(r2) lwz r3,3*WSIZE(r2) stw r0,2*WSIZE(r2) ; S2' = S4 stw r3,1*WSIZE(r2) ; S1' = S3 GET_DTSP r2 subi r2,r2,2 ; r2 = dtsp' = dtsp - 2 PUT_DTSP r2 lbz r0,4(r2) lbz r3,3(r2) stb r0,2(r2) ; T2' = T4 stb r3,1(r2) ; T1' = T3 NEXT _L_2swap: GET_DSP r3 lwz r0,4*WSIZE(r3) ; r0 = S4 lwz r2,2*WSIZE(r3) ; r2 = S2 stw r2,4*WSIZE(r3) ; S4' = S2 stw r0,2*WSIZE(r3) ; S2' = S4 lwz r0,3*WSIZE(r3) ; r0 = S3 lwz r2,WSIZE(r3) ; r2 = S1 stw r2,3*WSIZE(r3) ; S3' = S1 stw r0,WSIZE(r3) ; S1' = S3 GET_DTSP r3 lbz r0,4(r3) lbz r2,2(r3) stb r2,4(r3) stb r0,2(r3) lbz r0,3(r3) lbz r2,1(r3) stb r2,3(r3) stb r0,1(r3) NEXT _L_2over: GET_DSP r2 subi r2,r2,2*WSIZE ; r2 = dsp' = dsp - 2*WSIZE PUT_DSP r2 lwz r0,6*WSIZE(r2) stw r0,2*WSIZE(r2) lwz r0,5*WSIZE(r2) stw r0,1*WSIZE(r2) GET_DTSP r2 subi r2,r2,2 ; r2 = dtsp' = dtsp - 2 PUT_DTSP r2 lbz r0,6(r2) stb r0,2(r2) lbz r0,5(r2) stb r0,1(r2) NEXT _L_2rot: GET_DSP r3 lwz r0,6*WSIZE(r3) ; r0 = S6 lwz r2,4*WSIZE(r3) stw r2,6*WSIZE(r3) ; S6' = S4 lwz r2,2*WSIZE(r3) stw r2,4*WSIZE(r3) ; S4' = S2 stw r0,2*WSIZE(r3) ; S2' = S6 lwz r0,5*WSIZE(r3) ; r0 = S5 lwz r2,3*WSIZE(r3) stw r2,5*WSIZE(r3) ; S5' = S3 lwz r2,1*WSIZE(r3) stw r2,3*WSIZE(r3) ; S3' = S1 stw r0,1*WSIZE(r3) ; S1' = S5 GET_DTSP r3 lbz r0,6(r3) ; r0 = T6 lbz r2,4(r3) stb r2,6(r3) ; T6' = T4 lbz r2,2(r3) stb r2,4(r3) ; T4' = T2 stb r0,2(r3) ; T2' = T6 lbz r0,5(r3) ; r0 = T5 lbz r2,3(r3) stb r2,5(r3) ; T5' = T3 lbz r2,1(r3) stb r2,3(r3) ; T3' = T1 stb r0,1(r3) ; T1' = T5 NEXT _L_question: mflr r0 stw r0,8(r1) bl _L_fetch cmpwi r3,0 bne 0f stwu r1,-minFrameSz(r1) bl _CPP_dot ; r3 = 0 addi r1,r1,minFrameSz 0: lwz r0,8(r1) mtlr r0 blr _L_fetch: GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR bne fetcherror li r0,OP_IVAL stb r0,1(r2) ; T1 = OP_IVAL GET_DSP r3 lwz r2,WSIZE(r3) ; r2 = S1 = addr lwz r0,0(r2) ; r0 = (addr) = data stw r0,WSIZE(r3) ; S1 = data li r3,0 blr fetcherror: li r3,E_NOT_ADDR blr _L_store: GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR ; if T1 not OP_ADDR, bne fetcherror ; then error exit, addi r2,r2,2 ; else r2 = dtsp' = dtsp + 2 PUT_DTSP r2 GET_DSP r2 lwzu r3,WSIZE(r2) ; r3 = target addr, r2 = dsp + WSIZE lwzu r0,WSIZE(r2) ; r0 = target value, r2 = dsp + 2*WSIZE stw r0,0(r3) ; (addr) = value PUT_DSP r2 ; dsp' = dsp + 2*WSIZE NEXT _L_afetch: GET_DTSP r3 lbz r0,1(r3) cmpwi r0,OP_ADDR ; if T1 <> OP_ADDR bne fetcherror ; then err exit GET_DSP r3 lwz r2,WSIZE(r3) ; r2 = S1 = addr lwz r0,0(r2) stw r0,WSIZE(r3) ; S1 = (addr) NEXT _L_cfetch: GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR ; if T1 <> OP_ADDR bne fetcherror ; then err exit li r0,OP_IVAL ; else stb r0,1(r2) ; T1 = OP_IVAL GET_DSP r3 lwz r2,WSIZE(r3) ; r2 = S1 = addr lbz r0,0(r2) ; r0 = (addr) = data stw r0,WSIZE(r3) ; S1' = data NEXT _L_cstore: GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR ; if T1 not OP_ADDR, bne fetcherror ; then error exit, addi r2,r2,2 ; else r2 = dtsp' = dtsp + 2 PUT_DTSP r2 GET_DSP r2 lwzu r3,WSIZE(r2) ; r3 = target addr, r2 = dsp + WSIZE lwzu r0,WSIZE(r2) ; r0 = target value, r2 = dsp + 2*WSIZE stb r0,0(r3) ; (addr) = value PUT_DSP r2 ; dsp' = dsp + 2*WSIZE NEXT _L_wfetch: GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR ; if T1 <> OP_ADDR bne fetcherror ; then err exit li r0,OP_IVAL ; else stb r0,1(r2) ; T1 = OP_IVAL GET_DSP r3 lwz r2,WSIZE(r3) ; r2 = S1 = addr lha r0,0(r2) ; r0 = (addr) = data stw r0,WSIZE(r3) ; S1 = data NEXT _L_wstore: GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR ; if T1 not OP_ADDR, bne fetcherror ; then error exit, addi r2,r2,2 ; else r2 = dtsp' = dtsp + 2 PUT_DTSP r2 GET_DSP r2 lwzu r3,WSIZE(r2) ; r3 = target addr, r2 = dsp + WSIZE lwzu r0,WSIZE(r2) ; r0 = target value, r2 = dsp + 2*WSIZE sth r0,0(r3) ; (addr) = value PUT_DSP r2 ; dsp' = dsp + 2*WSIZE NEXT _L_sffetch: ; ( sf-addr -- df ) GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR bne fetcherror subi r2,r2,2 ; r2 = dtsp' = dtsp - 2 li r0,OP_IVAL PUT_DTSP r2 stb r0,2(r2) ; T2' = OP_IVAL stb r0,1(r2) ; T1' = OP_IVAL GET_DSP r2 lwz r3,WSIZE(r2) ; r3 = sf-addr subi r2,r2,WSIZE ; r2 = dsp' = dsp - WSIZE lfs f0,0(r3) ; f0 = converted df stfd f0,WSIZE(r2) PUT_DSP r2 NEXT _L_sfstore: ; ( df sf-addr -- ) GET_DTSP r2 lbz r0,1(r2) cmpwi r0,OP_ADDR bne fetcherror addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lwz r3,WSIZE(r2) ; r3 = sf-addr lfd f0,2*WSIZE(r2) ; f0 = df stfs f0,0(r3) addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 NEXT _L_dffetch: GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR bne fetcherror li r0,OP_IVAL stb r0,1(r2) stb r0,0(r2) ; T0 = T1 = OP_IVAL subi r2,r2,1 PUT_DTSP r2 DEC_DSP lwz r3,2*WSIZE(r2) ; r3 = S1 = addr lwz r0,0(r3) stw r0,WSIZE(r2) ; S1' = (addr) = data.hi lwz r0,WSIZE(r3) stw r0,2*WSIZE(r2) ; S2' = (addr+WSIZE) = data.lo NEXT _L_dfstore: GET_DTSP r2 lbz r0,1(r2) ; r0 = T1 cmpwi r0,OP_ADDR bne fetcherror addi r2,r2,3 PUT_DTSP r2 GET_DSP r2 addi r0,r2,3*WSIZE PUT_DSP r0 lwz r3,WSIZE(r2) ; r3 = S1 = addr lwz r0,2*WSIZE(r2) stw r0,0(r3) ; (addr) = S2 = data.hi lwz r0,3*WSIZE(r2) stw r0,WSIZE(r3) ; (addr+WSIZE) = S3 = data.lo NEXT _L_inc: GET_DSP r3 lwz r2,WSIZE(r3) addi r2,r2,1 stw r2,WSIZE(r3) ; S1 = S1 + 1 NEXT _L_dec: GET_DSP r3 lwz r2,WSIZE(r3) subi r2,r2,1 stw r2,WSIZE(r3) ; S1 = S1 - 1 NEXT _L_twoplus: GET_DSP r3 lwz r2,WSIZE(r3) addi r2,r2,2 stw r2,WSIZE(r3) ; S1 = S1 + 2 NEXT _L_twominus: GET_DSP r3 lwz r2,WSIZE(r3) subi r2,r2,2 stw r2,WSIZE(r3) ; S1 = S1 - 2 NEXT _L_abs: ; ( d -- u ) ; Because the output is unsigned , | 0x80000000 | = 0x80000000, ; and positive number overflow is impossible. Algorithm based ; on the CWG. GET_DSP r3 lwz r2,WSIZE(r3) ; r2 = S1 = n _ABS r2,r0 stw r2,WSIZE(r3) NEXT _L_neg: GET_DSP r3 lwz r0,WSIZE(r3) neg r0,r0 ; What about 0x8000 0000? stw r0,WSIZE(r3) ; S1 = -S1 NEXT _L_max: INC_DTSP POP_D r0 ; r0 = S0, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S1 cmpw r3,r0 ; if S1 > S0 bgt 0f ; then leave S1 stw r0,WSIZE(r2) ; else leave S0 0: NEXT _L_min: ; ( n1 n2 -- min ) INC_DTSP POP_D r0 ; r0 = S0, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S1 cmpw r3,r0 ; if S1 < S0 blt 0f ; then leave S1 stw r0,WSIZE(r2) ; else leave S0 0: NEXT _L_dmax: ; ( d1 d2 -- dmax ) GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 GET_DSP r2 lwz r7,4*WSIZE(r2) ; r7 = d1.lo lwz r6,3*WSIZE(r2) ; r6 = d1.hi lwz r5,2*WSIZE(r2) ; r5 = d2.lo lwz r4,WSIZE(r2) ; r4 = d2.hi addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE PUT_DSP r2 ; predicate: (d1.hi < d2.hi) ; | [(d1.hi = d2.hi) & (d1.lo u< d2.lo)] cmp cr0,r6,r4 ; cr0.LT = (d1.hi < d2.hi), ; cr0.EQ = (d1.hi = d2.hi) cmpl cr2,r7,r5 ; cr2.LT = (d1.lo u< d2.lo) crand 8,2,8 ; cr2.LT = (d1.hi = d2.hi) ; & (d1.lo u< d2.lo) cror 0,0,8 ; cr0.LT = predicate bge 0f stw r5,2*WSIZE(r2) ; else S2' = d2.lo stw r4,WSIZE(r2) ; S1' = d2.hi 0: PUT_DSP r2 li r3,0 blr _L_dmin: ; ( d1 d2 -- dmin ) GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 GET_DSP r2 lwz r7,4*WSIZE(r2) ; r7 = d1.lo lwz r6,3*WSIZE(r2) ; r6 = d1.hi lwz r5,2*WSIZE(r2) ; r5 = d2.lo lwz r4,WSIZE(r2) ; r4 = d2.hi addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE PUT_DSP r2 ; predicate: (d1.hi < d2.hi) ; | [(d1.hi = d2.hi) & (d1.lo u< d2.lo)] cmp cr0,r6,r4 ; cr0.LT = (d1.hi < d2.hi), ; cr0.EQ = (d1.hi = d2.hi) cmpl cr2,r7,r5 ; cr2.LT = (d1.lo u< d2.lo) crand 8,2,8 ; cr2.LT = (d1.hi = d2.hi) ; & (d1.lo u< d2.lo) cror 0,0,8 ; cr0.LT = predicate blt 0f stw r5,2*WSIZE(r2) ; else S2' = d2.lo stw r4,WSIZE(r2) ; S1' = d2.hi 0: PUT_DSP r2 li r3,0 blr _L_twostar: GET_DSP r3 lwz r2,WSIZE(r3) slwi r2,r2,1 stw r2,WSIZE(r3) ; S1 = S1 << 1 NEXT _L_twodiv: GET_DSP r3 lwz r2,WSIZE(r3) srawi r2,r2,1 stw r2,WSIZE(r3) ; S1 = S1 >> 1 (algebraic) NEXT _L_add: POP_D r0 ; r0 = S1, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S2 add r0,r0,r3 stw r0,WSIZE(r2) ; S1' = S1 + S2 POP_DT r0 ; r0 = T1, r2 = dtsp' lbz r3,1(r2) ; r3 = T2 and r0,r0,r3 ; and to preserve addr type stb r0,1(r2) ; T1' = T1 and T2 NEXT _L_sub: POP_D r0 ; r0 = S1, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S2 sub r0,r3,r0 stw r0,WSIZE(r2) ; S1' = S2 - S1 INC_DTSP ; result has type of first operand NEXT _L_mul: POP_D r0 ; r0 = S1, r2 = dsp' lwz r3,WSIZE(r2) ; r3 = S2 mullw r0,r3,r0 stw r0,WSIZE(r2) ; S1' = S2 * S1 INC_DTSP ; result has type of first operand NEXT _L_div: ; ( num den -- quo ) GET_DSP r4 ; r4 = dsp lwzu r3,WSIZE(r4) ; r3 = den, r2 = dsp' = dsp + WSIZE lwz r0,WSIZE(r4) ; r0 = num divwo. r0,r0,r3 ; r0 = num/den bso diverr PUT_DSP r4 INC_DTSP stw r0,WSIZE(r4) ; S1' = r0 li r3,0 blr diverr: subfo. r0,r0,r0 ; clear OV cmplwi r3,0 ; if denom <> 0 bne divov ; then 0x8000 0000 / -1 divby0: li r3,E_DIV_ZERO ; else num = 0 blr divov: li r3,E_DIV_OVERFLOW blr _L_mod: ; ( num den -- rem ) GET_DSP r5 ; r5 = dsp lwzu r3,WSIZE(r5) ; r3 = den, r5 = dsp' = dsp + WSIZE lwz r4,WSIZE(r5) ; r4 = num divwo. r0,r4,r3 ; r0 = num/den bso diverr PUT_DSP r5 mullw r0,r0,r3 subf r0,r0,r4 ; r0 = remainder stw r0,WSIZE(r5) ; S1' = r0 INC_DTSP NEXT _L_slashmod: ; ( num denom -- rem quo ) GET_DSP r2 lwz r3,WSIZE(r2) ; r3 = S1 = denom lwz r4,2*WSIZE(r2) ; r4 = S2 = num divwo. r0,r4,r3 ; r0 = S2/S1 stw r0,WSIZE(r2) ; S1' = quo mullw r0,r0,r3 subf r0,r0,r4 stw r0,2*WSIZE(r2) ; S2' = rem bso diverr NEXT _L_starslash: ; ( n1 n2 denom -- quo ) GET_DSP r2 addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE lwz r5,WSIZE(r2) ; r5 = n1 lwz r6,0(r2) ; r6 = n2 mulhw r4,r5,r6 ; r4 = (n1*n2).hi mullw r5,r5,r6 ; r5 = (n1*n2).lo lwz r6,-WSIZE(r2) ; r6 = denom cmpwi r6,0 ; denom = 0? beq divby0 ; if denom = 0 then err exit mflr r10 bl sdiv64by32 ; uses r11, pov impossible mtlr r10 cmpwi r4,0 ; cr0.EQ = (quo.hi = 0) cmpi cr2,r5,0 ; cr2.EQ = (quo.lo = 0), cr2.LT = (quo.lo < 0) crandc 2,2,8 ; cr0.EQ = (quo.hi = 0) & (quo.lo >= 0) cmpi cr3,r4,-1 ; cr3.EQ = (quo.hi = -1) crand 10,8,14 ; cr2.EQ = cr2.LT & cr3.EQ cror 2,2,10 ; cr0.EQ = cr0.EQ | cr2,EQ bne divov stw r5,WSIZE(r2) ; S1' = quo.lo PUT_DSP r2 GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 ; dtsp' = dtsp + 2 NEXT _L_starslashmod: ; ( n1 n2 den -- rem quo ) GET_DSP r2 lwz r5,3*WSIZE(r2) ; r5 = n1 lwz r6,2*WSIZE(r2) ; r6 = n2 mulhw. r4,r5,r6 ; r4 = (n1*n2).hi mullw r5,r5,r6 ; r5 = (n1*n2).lo lwzu r6,WSIZE(r2) ; r6 = den, r2 = dsp' = dsp + WSIZE cmpwi r6,0 beq divby0 ; if den = 0 then err exit mflr r10 bl sdiv64by32 ; uses r11, pov impossible mtlr r10 cmpwi r4,0 ; cr0.EQ = (quo.hi = 0) cmpi cr2,r5,0 ; cr2.EQ = (quo.lo = 0), cr2.LT = (quo.lo < 0) crandc 2,2,8 ; cr0.EQ = (quo.hi = 0) & (quo.lo >= 0) cmpi cr3,r4,-1 ; cr3.EQ = (quo.hi = -1) crand 10,8,14 ; cr2.EQ = cr2.LT & cr3.EQ cror 2,2,10 ; cr0.EQ = cr0.EQ | cr2,EQ bne divov stw r5,WSIZE(r2) ; S1' = quo.lo stw r6,2*WSIZE(r2) ; S2' = rem PUT_DSP r2 INC_DTSP NEXT _L_plusstore: GET_DTSP r2 lbz r0,1(r2) cmpwi r0,OP_ADDR bne fetcherror addi r0,r2,2 ; dtsp' = dtsp + 3 PUT_DTSP r0 GET_DSP r2 addi r0,r2,2*WSIZE ; dsp' = dsp + 3*WSIZe PUT_DSP r0 lwz r0,2*WSIZE(r2) ; r0 = S2 lwz r2,WSIZE(r2) ; r2 = S1 = addr lwz r3,0(r2) ; r3 = (addr) add r0,r0,r3 stw r0,0(r2) ; (addr) = r3 + S1 NEXT _L_dabs: LDWZ r4,r4,_GlobalSp ; r4 = dsp lwz r2,WSIZE(r4) ; r2 = S1 = d.hi lwz r3,2*WSIZE(r4) ; r3 = S2 = d.lo _DABS r2,r3,r0 ; r2:r3: = |d|.hi:|d|.lo stw r2,WSIZE(r4) stw r3,2*WSIZE(r4) li r3,0 blr _L_dnegate: ; ( d -- -d ) GET_DSP r3 lwz r0,WSIZE(r3) ; r0 = d.hi lwz r2,2*WSIZE(r3) ; r2 = d.lo _DNEG r0,r2 stw r2,2*WSIZE(r3) ; S1 = (-d).lo stw r0,WSIZE(r3) ; S0 = (-d).hi NEXT _L_dplus: ; ( d1 d2 -- d1+d2 ) GET_DSP r2 lwzu r3,2*WSIZE(r2) ; r3 = d2.lo, r2 = dsp' = dsp + 2*WSIZE PUT_DSP r2 lwz r0,2*WSIZE(r2) ; r0 = d1.lo addc r0,r0,r3 stw r0,2*WSIZE(r2) ; S2' = (d1+d2).lo lwz r3,-WSIZE(r2) ; r3 = d2.hi lwz r0,WSIZE(r2) ; r0 = d1.hi adde r0,r0,r3 stw r0,WSIZE(r2) ; S1' = (d1+d2).hi GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 NEXT _L_dminus: ; ( d1 d2 -- d1-d2 ) ; Called by CPP_sharp() in ForthVM.cpp. GET_DTSP r2 addi r2,r2,2 ; dtsp' = dtsp + 2 PUT_DTSP r2 LDWZU r2,r11,_GlobalSp ; r2 = dsp, r11 = &_GlobalSp lwz r3,4*WSIZE(r2) ; r0 = d1.lo lwz r0,3*WSIZE(r2) ; r3 = d1.hi lwz r5,2*WSIZE(r2) ; r4 = d2.lo lwz r4,WSIZE(r2) ; r5 = d2.hi DMINUS r0,r3,r4,r5 ; r0:r3 = (d1-d2).hi:(d1-d2).lo addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE stw r3,2*WSIZE(r2) ; S2' = (d1-d2).lo stw r0,WSIZE(r2) ; S1' = (d1-d2).hi stw r2,0(r11) li r3,0 blr _L_umstar: GET_DSP r4 lwz r2,2*WSIZE(r4) ; r2 = S2 lwz r3,WSIZE(r4) ; r3 = S1 mulhwu r0,r2,r3 stw r0,WSIZE(r4) ; S1' = (S1*S2).hi mullw r0,r2,r3 ; same for signed or unsigned stw r0,2*WSIZE(r4) ; S2' = (S1*S2).lo li r3,0 blr _L_dsstar: ; ( d s -- [d*s].lo [d*s].mi [d*s].hi ) GET_DSP r2 lwz r5, WSIZE(r2) ; r5 = s lwz r3,2*WSIZE(r2) ; r3 = d.hi lwz r4,3*WSIZE(r2) ; r4 = d.lo _DSSTAR r3,r4,r5,r6,r7 stw r5,3*WSIZE(r2) ; S3'= [d*s].lo stw r4,2*WSIZE(r2) ; S2' = [d*s].mi stw r3,WSIZE(r2) ; S1' = [d*s].hi NEXT .include "div.s" _L_umslashmod: ; ( num.lo num.hi denom -- rem quo ) ; Divide unsigned double length by unsigned single length to ; give unsigned single quotient and remainder. A "Divide ; overflow" error results if the quotient doesn't fit into a ; single word. GET_DSP r2 lwzu r6,WSIZE(r2) ; r6 = S1 = denom, r2 = dsp' = dsp + WSIZE cmpwi r6,0 ; denom = 0? beq divby0 ; if denom = 0, then exit mflr r10 lwz r4,WSIZE(r2) ; r4 = S2 = num.hi lwz r5,2*WSIZE(r2) ; r5 = S3 = num.lo bl udiv64by32 cmpwi r4,0 mtlr r10 bne divov ; if quo.hi <> 0 then overflow exit PUT_DSP r2 stw r5,WSIZE(r2) ; S1' = quo stw r6,2*WSIZE(r2) ; S2' = rem INC_DTSP NEXT _L_mstar: ; ( n1 n2 -- d ) GET_DSP r2 lwz r3,2*WSIZE(r2) ; r3 = n1 lwz r4,WSIZE(r2) ; r4 = n2 mulhw r0,r3,r4 stw r0,WSIZE(r2) ; S1' = (n1*n2).hi mullw r0,r3,r4 stw r0,2*WSIZE(r2) ; S2' = (n1*n2).lo li r3,0 blr _L_mplus: ; ( d1|ud1 n -- d2|ud2 ) INC_DTSP POP_D r4 ; r4 = n, r2 = dsp' li r5,32 sraw r0,r4,r5 ; all r0 bits = sign bit lwz r5,2*WSIZE(r2) ; r5 = d1.lo addc r3,r4,r5 ; r3 = (d1+n).lo stw r3,2*WSIZE(r2) ; S2' = sum.lo lwz r5,WSIZE(r2) ; r5 = d1.hi adde r0,r0,r5 ; r0 = (d1+n).hi stw r0,WSIZE(r2) ; S1' = sum.hi NEXT _L_mslash: ; ( num.lo num.hi den -- quo ) GET_DSP r2 lwz r6,WSIZE(r2) ; r6 = den cmpwi r6,0 ; den = 0? beq divby0 lwz r4,2*WSIZE(r2) ; r4 = num.hi lwz r5,3*WSIZE(r2) ; r5 = num.lo mflr r10 bl sdiv64by32 ; uses r11, cr0.EQ = pov mtlr r10 cmpi cr2,r4,0 ; cr2.EQ = (quo.hi = 0) cmpi cr3,r5,0 ; cr3.LT = (quo.lo < 0) crandc 10,10,12 ; cr2.EQ = (quo.hi = 0) & (quo.lo >= 0) cmpi cr4,r4,-1 ; cr4.EQ = (quo.hi = -1) crand 18,18,12 ; cr4.EQ = (quo.hi = -1) & (quo.lo < 0) cror 10,10,18 ; cr2.EQ = [(quo.hi = 0) & (quo.lo >= 0)] ; | [(quo.hi = -1) & (quo.lo < 0)] crandc 2,10,2 ; cr0.EQ = cr2.EQ & ~pov bne divov addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE PUT_DSP r2 stw r5,WSIZE(r2) ; S1' = quo GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 NEXT _L_udmstar: ; ( ud.lo ud.hi u -- ut.lo ut.mid ut.hi ) ; Called by CPP_sharp() in ForthVM.cpp. ; Multiply the unsigned double ud and unsigned single u to give ; the triple length product ut. The algorithm is expressed by ; ( x y z -- xz.lo xz.hi+yz.lo yz.hi+CA ) ; When called by CPP_sharp(), z = Base. We treat z as the ; "smaller" multiplier, and put it in the rightmost argument of ; multiply commands, for a possible speedup. LDWZ r3,r3,_GlobalSp ; r3 = dsp lwz r2,WSIZE(r3) ; r2 = z lwz r0,3*WSIZE(r3) ; r0 = x mulhwu r4,r0,r2 ; r4 = xz.hi mullw r0,r0,r2 stw r0,3*WSIZE(r3) ; s3 = zx.lo lwz r0,2*WSIZE(r3) ; r0 = y mulhwu r5,r0,r2 ; r5 = yz.hi mullw r0,r0,r2 ; r0 = yz.lo addc r0,r4,r0 stw r0,2*WSIZE(r3) ; s2 = xz.hi + yz.lo addze r0,r5 stw r0,WSIZE(r3) ; s1 = yz.hi + carry li r3,0 blr _L_utsslashmod: ; ( num.lo num.mi num.hi denom ; -- rem quo.lo quo.mi quo.hi ) ; Divide unsigned triple length by unsigned single length to ; give an unsigned triple quotient and single remainder. LDWZ r2,r3,_GlobalSp ; r2 = dsp lwz r6,WSIZE(r2) cmpwi r6,0 ; denom = 0? beq divby0 ; if denom = 0, then err exit mflr r11 lwz r3,2*WSIZE(r2) ; r3 = num.hi lwz r4,3*WSIZE(r2) ; r4 = num.mi lwz r5,4*WSIZE(r2) ; r5 = num.lo bl udiv96by32 stw r3,WSIZE(r2) ; S1' = quo.hi stw r4,2*WSIZE(r2) ; S2' = quo.mi stw r5,3*WSIZE(r2) ; S3' = quo.lo stw r6,4*WSIZE(r2) ; S4' = rem mtlr r11 NEXT _L_stsslashrem: ; ( num.lo num.mi num.hi denom ; -- rem quo.lo quo.mi quo.hi ) ; Divide signed triple length by signed single length to give a ; signed triple quotient and single remainder, according to the ; rule for symmetric division. Aside from the normal "Divide by ; zero" error, a "Divide overflow" error results from ; 0x80...0 / -1. GET_DSP r2 mflr r10 lwz r6,WSIZE(r2) ; r6 = denom cmpwi r6,0 ; denom = 0? lwz r5,4*WSIZE(r2) ; r5 = num.lo lwz r4,3*WSIZE(r2) ; r4 = num.mi lwz r3,2*WSIZE(r2) ; r3 = num.hi beq divby0 ; if denom = 0 then err exit bl sdiv96by32 mtlr r10 beq divov ; if pos number overflow then err exit stw r3,WSIZE(r2) ; S1' = quo.hi stw r4,2*WSIZE(r2) ; S2' = quo.mi stw r5,3*WSIZE(r2) ; S3' = quo.lo stw r6,4*WSIZE(r2) ; S4' = rem NEXT _L_utmslash: ; ( num.lo num.mi num.hi denom -- quo.lo quo.mi ) ; Called by ForthVM.cpp. ; Divide unsigned triple length by unsigned single to give ; unsigned double quotient. A "Divide overflow" error results ; if the quotient doesn't fit into a double word. LDWZU r2,r11,_GlobalSp ; r2 = dtsp, r11 = &_GlobalSp addi r2,r2,2*WSIZE ; r2 = dtsp' = dtsp + 2*WSIZE lwz r6,-WSIZE(r2) ; r6 = denom cmpwi r6,0 ; denom = 0? beq divby0 ; if denom = 0 then err exit mflr r10 ; save in case of call lwz r3,0(r2) ; r3 = num.hi lwz r4,WSIZE(r2) ; r4 = num.mi lwz r5,2*WSIZE(r2) ; r5 = num.lo bl udiv96by32 cmpwi r3,0 ; quo.hi = 0? mtlr r10 stw r4,WSIZE(r2) ; S1' = quo.mi stw r5,2*WSIZE(r2) ; S2' = quo.lo bne divov ; if quo.hi <> 0 then overflow exit stw r2,0(r11) ; update _GlobalSp LDWZU r2,r11,_GlobalTp ; r2 = dtsp, r11 = &_GlobalTp addi r2,r2,2 stw r2,0(r11) ; dtsp' = dtsp + 2 blr ; else r3 = 0 _L_mstarslash: ; ( d n +denom -- [d*n/denom].lo [d*n/denom].mi ) ; Callable from C, but currently not called by kForth. ; The ANS "ambiguous result" when denom < 0 is the same as ; symmetric division here. An error is signaled when the ; quotient doesn't fit into two cells. LDWZ r2,r3,_GlobalSp ; r2 = dsp addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE lwz r5,0(r2) ; r5 = n lwz r3,WSIZE(r2) ; r3 = d.hi lwz r4,2*WSIZE(r2) ; r4 = d.lo _DSSTAR r3,r4,r5,r6,r7 ; r3:r4:r5 = [d*n].hi:[d*n].mi:[d*n].lo mflr r10 lwz r6,-WSIZE(r2) ; r6 = denom cmpwi r6,0 ; if denom = 0 beq divby0 ; then error exit bl sdiv96by32 ; pos number ov impossible mtlr r10 cmpwi r3,0 ; cr0.EQ = (quo.hi = 0) cmpi cr2,r4,0 ; cr2.EQ = (quo.mi = 0), cr2.LT = (quo.mi < 0) crandc 2,2,8 ; cr0.EQ = (quo.hi = 0) & (quo.mi >= 0) cmpi cr3,r3,-1 ; cr3.EQ = (quo.hi = -1) crand 10,8,14 ; cr2.EQ = (quo.mi < 0) & (quo.hi = -1) cror 2,2,10 ; cr0.EQ = cr0.EQ | cr2.EQ bne divov stw r4,WSIZE(r2) ; S1' = quo.mi stw r5,2*WSIZE(r2) ; S2' = quo.lo STDW r2,r3,_GlobalSp LDWZU r2,r3,_GlobalTp ; r2 = dtsp, r3 = &_GlobalTp addi r2,r2,2 stw r2,0(r3) ; dtsp' = dtsp + 2 li r3,0 blr _L_fmslashmod: ; ( num.lo num.hi den -- rem quo ) GET_DSP r2 mflr r10 lwz r4,2*WSIZE(r2) ; r4 = num.hi lwz r5,3*WSIZE(r2) ; r5 = num.lo lwzu r6,WSIZE(r2) ; r6 = den, r2 = dsp' = dsp + WSIZE cmpwi r6,0 beq divby0 ; if den = 0 then err exit bl sdiv64by32 ; uses r11, cr0.EQ = pov mtlr r10 cmpi cr2,r4,0 ; cr2.EQ = (quo.hi = 0) cmpi cr3,r5,0 ; cr3.LT = (quo.lo < 0) crandc 10,10,12 ; cr2.EQ = (quo.hi = 0) & (quo.lo >= 0) cmpi cr4,r4,-1 ; cr4.EQ = (quo.hi = -1) crand 18,18,12 ; cr4.EQ = (quo.hi = -1) & (quo.lo < 0) cror 10,10,18 ; cr2.EQ = [(quo.hi = 0) & (quo.lo >= 0)] ; | [(quo.hi = -1) & (quo.lo < 0)] crandc 2,10,2 ; cr0.EQ = cr2.EQ & ~pov bne divov PUT_DSP r2 cmpwi r6,0 ; cr0.EQ = (rem.sym = 0) cmpi cr2,r4,0 ; cr2.LT = (quo < 0) crorc 2,2,8 ; cr0.EQ = (rem.sym = 0) | (quo >= 0) beq 0f ; skip floor if same as sym lwz r0,0(r2) ; else floor add r6,r6,r0 ; r6 = rem.sym + denom subic r5,r5,1 subfze r4,r4 ; r4 = quo.sym - 1 0: stw r5,WSIZE(r2) ; S1' = quo.lo stw r6,2*WSIZE(r2) ; S2' = rem INC_DTSP NEXT _L_smslashrem: ; ( num.lo num.hi den -- rem quo) GET_DSP r2 mflr r10 lwz r4,2*WSIZE(r2) ; r4 = num.hi lwz r5,3*WSIZE(r2) ; r5 = num.lo lwzu r6,WSIZE(r2) ; r6 = den, r2 = dsp' = dsp + WSIZE cmpwi r6,0 beq divby0 ; err exit if den = 0 bl sdiv64by32 ; uses r11, cr0.EQ = pov mtlr r10 cmpi cr2,r4,0 ; cr2.EQ = (quo.hi = 0) cmpi cr3,r5,0 ; cr3.LT = (quo.lo < 0) crandc 10,10,12 ; cr2.EQ = (quo.hi = 0) & (quo.lo >= 0) cmpi cr4,r4,-1 ; cr4.EQ = (quo.hi = -1) crand 18,18,12 ; cr4.EQ = (quo.hi = -1) & (quo.lo < 0) cror 10,10,18 ; cr2.EQ = [(quo.hi = 0) & (quo.lo >= 0)] ; | [(quo.hi = -1) & (quo.lo < 0)] crandc 2,10,2 ; cr0.EQ = cr2.EQ & ~pov bne divov PUT_DSP r2 stw r5,WSIZE(r2) ; S1' = quo.lo stw r6,2*WSIZE(r2) ; S2' = rem INC_DTSP NEXT _L_stod: DEC_DSP ; r2 = dsp' = dsp - WSIZE lwz r3,2*WSIZE(r2) ; r3 = S2' li r0,32 sraw r0,r3,r0 ; all r0 bits = sign bit stw r0,WSIZE(r2) li r0,OP_IVAL PUSH_DT NEXT ; In the following stack comments, x, y, and arithmetical ; results are double precision, 64-bit floating point numbers, ; while flag is a single cell forth true or false. _L_stof: ; ( n -- x ) ; based on CWG GET_DTSP r2 subi r2,r2,1 li r0,OP_IVAL stb r0,1(r2) PUT_DTSP r2 LDA r3,FCONST_sconv lfd f1,0(r3) ; f1 = 0x43300000 80000000 GET_DSP r2 lis r0,0x4330 stw r0,0(r2) ; upper half lwz r3,WSIZE(r2) ; r3 = n xoris r3,r3,0x8000 ; toggle sign bit stw r3,WSIZE(r2) ; lower half lfd f0,0(r2) subi r2,r2,WSIZE ; r2 = dsp' = dsp - WSIZE fsub f0,f0,f1 stfd f0,WSIZE(r2) PUT_DSP r2 NEXT _L_dtof: ; ( d -- x ) GET_DSP r2 lwz r3,WSIZE(r2) ; r3 = d.hi lwz r4,2*WSIZE(r2) ; r4 = d.lo _DABS r3,r4,r5 ; r3:r4 = |d|.hi:|d|.lo ; r5 = (d < 0) ? 1 : 0 lis r0,0x4330 stw r0,WSIZE(r2) ; upper half of lo stw r4,2*WSIZE(r2) ; lower half of lo LDA r4,FCONST_dconv_lo lfd f2,0(r4) ; f2 = 0x43300000 00000000 lfd f0,WSIZE(r2) fsub f0,f0,f2 ; f0 = float(|d|.lo) lis r0,0x4530 stw r0,WSIZE(r2) ; upper half of hi stw r3,2*WSIZE(r2) ; lower half of hi cmpwi r5,0 ; cr0.EQ = (d >= 0) LDA r4,FCONST_dconv_hi lfd f3,0(r4) ; f3 = 0x45300000 00000000 lfd f1,WSIZE(r2) fsub f1,f1,f3 ; f1 = float(|d|.hi) fadd f0,f0,f1 beq 0f fneg f0,f0 0: stfd f0,WSIZE(r2) NEXT _L_froundtos: ; ( x -- n ) INC_DTSP mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,WSIZE(r2) LDWZ r0,r2,kf_lround mtctr r0 bctrl ; r3 = n GET_DSP r2 addi r2,r2,WSIZE ; r2 = dsp' = dsp + WSIZE PUT_DSP r2 addi r1,r1,minFrameSz lwz r0,8(r1) stw r3,WSIZE(r2) mtlr r0 NEXT _L_ftrunctos: ; ( x -- n ) INC_DTSP GET_DSP r2 lfdu f0,WSIZE(r2) ; f0 = x, r2 = dsp' = dsp + WSIZE PUT_DSP r2 fctiwz f0,f0 ; round towards zero stfd f0,0(r2) ; S1 = n, S0 = ?? NEXT _L_ftod: ; ( x -- d ) ; Based on CWG and AG. GET_DSP r2 lwz r0,WSIZE(r2) ; r0 = x.raw.hi cmpi cr3,0,r0,0 ; cr3.LT = ~signbit rlwinm r0,r0,0,1,31 ; clear sign, r0 = |x|.raw.hi lwz r3,2*WSIZE(r2) ; r3 = |x|.raw.lo LDA r6,FCONST_maxftod lwz r4,0(r6) lwz r5,4(r6) ; r4:r5 = 0x43ef ffff ffff ffff _DLT r4,r5,r0,r3 ; cr0.LT = (maxftod < |x|), cr2 = ? blt 4f ; if maxftod < |x|, use maxdint lis r4,0x3ff0 ; r4 = 1e.raw.hi cmpw r0,r4 blt 1f ; if |x| < 1, use 0 rlwinm r6,r0,12,20,31 subi r6,r6,1023 ; r6 = exp rlwinm r0,r0,0,12,31 ; r0 = mant.hi (low 20 bits) oris r0,r0,0x10 ; r0 = mant.hi | 0x10 0000 addic. r6,r6,-52 ; if exp - 52 >= 0 bge 3f ; then shift left neg r6,r6 ; else shift right ; r6 = sh = 52 - exp < 52 < 64 subfic r7,r6,32 ; r7 = 32 - sh srw r3,r3,r6 slw r4,r0,r7 or r3,r3,r4 addi r7,r6,-32 ; r7 = sh - 32 srw r4,r0,r7 or r5,r3,r4 srw r4,r0,r6 b 0f 3: ; r6 = sh <= 10 < 32 subfic r7,r6,32 ; r7 = 32 - sh slw r0,r0,r6 srw r4,r3,r7 or r4,r0,r4 slw r5,r3,r6 0: cror 0,12,12 ; cr0.LT = cr2.LT _QDNEG r4,r5,r0 ; attach sign to result 2: stw r4,WSIZE(r2) stw r5,2*WSIZE(r2) NEXT 4: li r5,-1 addis r4,0,0x7fff ori r4,r4,0xffff b 0b 1: li r4,0 li r5,0 b 2b _L_degtorad: ; ( x -- x*pi/180 ) GET_DSP r2 LDA r3,FCONST_pi_by_180 lfd f2,WSIZE(r2) lfd f3,0(r3) fmul f0,f2,f3 stfd f0,WSIZE(r2) NEXT _L_radtodeg: ; ( x -- x*180/pi ) GET_DSP r2 LDA r3,FCONST_180_by_pi lfd f2,WSIZE(r2) lfd f3,0(r3) fmul f0,f2,f3 stfd f0,WSIZE(r2) NEXT _L_fne: ; ( x y -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lfd f0,3*WSIZE(r2) lfd f1,WSIZE(r2) addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 fcmpu cr1,f0,f1 ; cr1.FE = (x = y) crnor 0,6,6 ; cr0.0 = (x <> y) mfcr r0 srawi r0,r0,31 ; r0 = (x = y) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_feq: ; ( x y -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lfd f0,3*WSIZE(r2) lfd f1,WSIZE(r2) addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 fcmpu cr1,f0,f1 ; cr1.FE = (x = y) cror 0,6,6 ; cr0.0 = cr1.FE mfcr r0 srawi r0,r0,31 ; r0 = (x = y) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_flt: ; ( x y -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lfd f0,3*WSIZE(r2) ; f0 = x lfd f1,WSIZE(r2) ; f1 = y addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 fcmpu cr0,f0,f1 ; cr0.FLT = (x < y) mfcr r0 srawi r0,r0,31 ; r0 = (x = y) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_fgt: ; ( x y -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lfd f0,3*WSIZE(r2) lfd f1,WSIZE(r2) addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 fcmpu cr1,f0,f1 ; cr1.FGT = (x > y) cror 0,5,5 ; cr0.0 = cr1.FGT mfcr r0 srawi r0,r0,31 ; r0 = (x > y) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_fle: ; ( x y -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lfd f0,3*WSIZE(r2) lfd f1,WSIZE(r2) addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 fcmpu cr1,f0,f1 ; cr1.FGT = (x > y) crnor 0,5,5 ; cr0.0 = ~cr1.FGT mfcr r0 srawi r0,r0,31 ; r0 = (x <= y) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_fge: ; ( x y -- flag ) GET_DTSP r2 addi r2,r2,3 ; r2 = dtsp' = dtsp + 3 PUT_DTSP r2 GET_DSP r2 lfd f0,3*WSIZE(r2) ; f0 = x lfd f1,WSIZE(r2) ; f1 = y addi r2,r2,3*WSIZE ; r2 = dsp' = dsp + 3*WSIZE PUT_DSP r2 fcmpu cr1,f0,f1 ; cr1.FLT = (x < y) crnor 0,4,4 ; cr0.0 = ~cr1.FLT mfcr r0 srawi r0,r0,31 ; r0 = (x >= y) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_fzeroeq: ; ( fp -- flag ) INC_DTSP GET_DSP r2 lfd f0,WSIZE(r2) li r0,0 stwu r0,WSIZE(r2) ; r2 = dsp' = dsp + WSIZE stw r0,WSIZE(r2) ; 0 0 in the cache? lfd f1,0(r2) ; fcmpu cr1,f0,f1 ; cr1.FEQ = (fp = 0) cror 0,6,6 ; cr0.0 = cr1.FEQ PUT_DSP r2 mfcr r0 srawi r0,r0,31 ; r0 = (fp = 0) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_fzerolt: ; ( fp -- flag ) INC_DTSP GET_DSP r2 lfd f0,WSIZE(r2) li r0,0 stwu r0,WSIZE(r2) ; r2 = dsp' = dsp + WSIZE stw r0,WSIZE(r2) ; 0 0 in the cache? lfd f1,0(r2) ; fcmpu cr0,f0,f1 ; cr0.LT = (fp < 0) PUT_DSP r2 mfcr r0 srawi r0,r0,31 ; r0 = (fp < 0) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_fzerogt: ; ( fp -- flag ) INC_DTSP GET_DSP r2 lfd f0,WSIZE(r2) li r0,0 stwu r0,WSIZE(r2) ; r2 = dsp' = dsp + WSIZE stw r0,WSIZE(r2) ; 0 0 in the cache? lfd f1,0(r2) ; fcmpu cr1,f0,f1 ; cr1.GT = (fp > 0) PUT_DSP r2 cror 0,5,5 ; cr0.0 = cr1.GT mfcr r0 srawi r0,r0,31 ; r0 = (fp > 0) ? -1 : 0 stw r0,WSIZE(r2) NEXT _L_fadd: ; ( x y -- x+y ) GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 GET_DSP r2 lfd f1,WSIZE(r2) ; f1 = y lfd f0,3*WSIZE(r2) ; f0 = x addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE fadd f0,f0,f1 stfd f0,WSIZE(r2) PUT_DSP r2 NEXT _L_fsub: ; ( x y -- x-y ) GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 GET_DSP r2 lfd f1,WSIZE(r2) ; f1 = y lfd f0,3*WSIZE(r2) ; f0 = x addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE fsub f0,f0,f1 stfd f0,WSIZE(r2) PUT_DSP r2 NEXT _L_fmul: ; ( x y -- x*y ) GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 GET_DSP r2 lfd f1,WSIZE(r2) ; f1 = y lfd f0,3*WSIZE(r2) ; f0 = x addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE fmul f0,f0,f1 stfd f0,WSIZE(r2) PUT_DSP r2 NEXT _L_fdiv: ; ( x y -- x/y ) GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 GET_DSP r2 lfd f1,WSIZE(r2) ; f1 = y lfd f0,3*WSIZE(r2) ; f0 = x addi r2,r2,2*WSIZE ; r2 = dsp' = dsp + 2*WSIZE fdiv f0,f0,f1 stfd f0,WSIZE(r2) PUT_DSP r2 NEXT _L_fabs: ; ( x -- |x|) GET_DSP r2 lfd f0,WSIZE(r2) fabs f0,f0 stfd f0,WSIZE(r2) NEXT _L_fneg: ; ( x -- -x ) GET_DSP r2 lfd f0,WSIZE(r2) fneg f0,f0 stfd f0,WSIZE(r2) NEXT _L_floor: ; ( x -- floor[x] ) mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,WSIZE(r2) LDWZ r0,r2,kf_floor mtctr r0 bctrl ; f1 = result GET_DSP r2 addi r1,r1,minFrameSz lwz r0,8(r1) stfd f1,WSIZE(r2) mtlr r0 NEXT _L_fround: ; ( x -- round[x] ) mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,WSIZE(r2) LDWZ r0,r2,kf_round mtctr r0 bctrl ; f1 = result GET_DSP r2 addi r1,r1,minFrameSz lwz r0,8(r1) stfd f1,WSIZE(r2) mtlr r0 NEXT _L_ftrunc: ; ( x -- trunc[x] ) mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,WSIZE(r2) LDWZ r0,r2,kf_trunc mtctr r0 bctrl ; f1 = result GET_DSP r2 addi r1,r1,minFrameSz lwz r0,8(r1) stfd f1,WSIZE(r2) mtlr r0 NEXT _L_fsqrt: ; ( x -- sqrt[x] ) .if 1 mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,WSIZE(r2) LDWZ r0,r2,kf_sqrt mtctr r0 bctrl ; f1 = result GET_DSP r2 addi r1,r1,minFrameSz lwz r0,8(r1) stfd f1,WSIZE(r2) mtlr r0 NEXT .else ; This gives "Illegal instruction" when compiled with ; -force_cpusubtype_ALL on our dual G4. GET_DSP r2 lfd f0,WSIZE(r2) fsqrt f0,f0 stfd f0,WSIZE(r2) NEXT .endif _L_fcos: ; ( x -- cos[x] ) mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,WSIZE(r2) LDWZ r0,r2,kf_cos mtctr r0 bctrl ; f1 = result GET_DSP r2 addi r1,r1,minFrameSz lwz r0,8(r1) stfd f1,WSIZE(r2) mtlr r0 NEXT _L_fsin: ; ( x -- sin[x] ) mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,WSIZE(r2) LDWZ r0,r2,kf_sin mtctr r0 bctrl ; f1 = result GET_DSP r2 addi r1,r1,minFrameSz lwz r0,8(r1) stfd f1,WSIZE(r2) mtlr r0 NEXT _L_fatan2: ; ( y x -- atan[y/x] ) mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,3*WSIZE(r2) lfd f2,WSIZE(r2) LDWZ r0,r2,kf_atan2 mtctr r0 bctrl ; f1 = result GET_DSP r2 addi r1,r1,minFrameSz addi r2,r2,2*WSIZE lwz r0,8(r1) stfd f1,WSIZE(r2) mtlr r0 PUT_DSP r2 GET_DTSP r2 addi r2,r2,2 PUT_DTSP r2 NEXT _L_fsincos: ; ( x -- sin[x] cos[x] ) mflr r0 stw r0,8(r1) stwu r1,-minFrameSz(r1) GET_DSP r2 lfd f1,WSIZE(r2) LDWZ r0,r2,kf_sin mtctr r0 bctrl ; f1 = result GET_DSP r2 lfd f0,WSIZE(r2) stfd f1,WSIZE(r2) fmr f1,f0 LDWZ r0,r2,kf_cos mtctr r0 bctrl ; f1 = result GET_DSP r2 addi r1,r1,minFrameSz subi r2,r2,2*WSIZE lwz r0,8(r1) stfd f1,WSIZE(r2) mtlr r0 PUT_DSP r2 GET_DTSP r2 subi r2,r2,2 PUT_DTSP r2 NEXT