QRZ! Ham Radio 6

home *** CD-ROM | disk | FTP | other *** search

/ QRZ! Ham Radio 6 / QRZ Ham Radio Callsign Database - Volume 6.iso / pc / files / dsp / 56100tar.z / 56100tar / 56100 / source.1 < prev next >

Wrap

Text File | 1992-04-28 | 43.4 KB | 1,138 lines

======================================================================== Program Program Benchmark Length Length in Icyc in Words B.2.1 Real Multiply 3 3 B.2.2 N Real Multiplies 2N 11 B.2.3 Real Update 4 4 B.2.4 N Real Updates 3N 14 B.2.5 N Term Real Convolution (FIR) 1N 9 B.2.6 N Term Real*Complex Convolution 2N 15 B.2.7 Complex Multiply 6 6 B.2.8 N Complex Multiplies 4N 14 B.2.9 Complex Update 7 7 B.2.10 N Complex Updates 6N 18 B.2.11 N Term Complex Convolution (FIR) 4N 14 B.2.12 Nth Order Power Series 1N 13 B.2.13 2nd Order Real Biquad Filter 12 12 B.2.14 N Cascaded 2nd Order Biquads 5N 23 B.2.15 N Radix 2 FFT Butterflies 10N 13 B.2.16 Adaptive LMS FIR 2N+19 22 B.2.17 FIr Lattice Filter 4N+7 10 B.2.18 All Pole Iir Lattice Filter 3N+11 14 B.2.19 General Lattice Filter 4N+12 15 B.2.20 Normalized Lattice Filter 5N+11 15 B.2.21 [1x3][3x3] Matrix Multiply 21 21 B.2.22 [NxN][NxN] Matrix multiply N3+7N2 25 B.2.23 3x3 2-D FIR Kernel 12 44 B.2.24 Signed 16 Bit Result Divide 36 18 B.2.25 Signed Integer Divide 32 B.2.26 Multiply 32/48-bit Fractions 4+8 B.3.1 Wave Generation Double Integration 2N 15 B.3.2 Wave Generation 2nd Order Oscillator 4N 16 B.3.3 Cascaded Transpose BIQUAD Cell 8N 15 B.3.4 IIR nth Order Direct Form II Canonic 2N 11 B.3.5 Find Index Of A Max Value In Array 3N 10 B.3.6 PID Algorithm 5 5 B.3.7 Reed Solomon Main Loop 18N 17 B.3.8 N Double Precision Real Multiplies 9N 18 B.3.9 Double Precision Autocorrelation 19 ======================================================================== APPENDIX B DSP BENCHMARKS B.1 INTRODUCTION Appendix B consists of a set of DSP Benchmarks intended to highlight the DSP56100 performance in various applications, show examples of programming techniques, and provide code fragments for user application programs. Additional code will be put on the Dr. Bub Electronic Bulletin Board System as it becomes available. The following table lists these Benchmark programs and provides an overview of the program's performance. The assembly language source is organized into 5 columns as shown below. Label Opcode Operands Data Bus Comment FIR MAC X1,X0,A X:(R0)+,X1 X:(R3)+,X0 ;Do each tap The Label column is used for program entry points and end of loop indication. The Opcode column indicates the Data ALU, Address ALU or Program Controller operation to be performed. The Operands column specifies the operands to be used by the opcode. The Data Bus specifies an optional data transfer over the Data Bus and the addressing mode to be used. The Comment column is used for documentation purposes and does not affect the assembled code. The Opcode column must always be included in the source code. For each benchmark, the number of program words and instruction cycles are given. The following equates are used in the benchmark programs. page 132 opt cc ;define section AD EQU 0 BD EQU $100 bd EQU $100 C EQU $200 c EQU $200 D EQU $300 N EQU 100 AR EQU $300 AI EQU $400 OUTPUT EQU $500 output EQU $FFF1 INPUT EQU $501 input EQU $FFF1 W EQU 0 w EQU 0 H EQU 0 XM EQU 0 state equ 0 ntaps equ $10 k equ 0 n equ 32 p equ 10 mask equ 10 image equ $40 dividend equ .25 divisor equ .5 paddr equ 0 qaddr equ 4 w1 equ 0 w2 equ 10 s equ 0 tablebase equ 0 lpc equ 8 frame equ 0 cor equ $100 shift equ $80 ;shift constant table equ $180 ;base address of a-law table org p:$40 B.2 FIRST SET OF BENCHMARKS ======================================================================== B.2.1 Real Multiply ;c = a * b move x:(r0)+n0,x1 x:(r3)+n3,x0 mpyr x1,x0,a move a,x:(r1) ======================================================================== B.2.2 N Real Multiplies ;c(I) = a(I) * b(I), I=1,...,N move #AD,r0 move #BD,r3 move #C,r2 move x:(r0)+,y0 x:(r3)+,x0 do #N,END_DO2 mpyr y0,x0,a x:(r0)+,y0 x:(r3)+,x0 move a,x:(r2)+ END_DO2 ======================================================================== B.2.3 Real Update ;d = c + a * b move x:(r0)+n0,x1 x:(r3)+n3,x0 move x:(r2),a macr x1,x0,a move a,x:(r1) ======================================================================== B.2.4 N Real Updates ;d(I) = c(I) + a(I) * b(I), I=1,...,N move #AD,r0 move #BD,r3 move #C,r2 move #D,r1 move x:(r0)+,y0 x:(r3)+,x0 do #N,END_DO4 move x:(r2)+,a macr y0,x0,a x:(r0)+,y0 x:(r3)+,x0 move a,x:(r1)+ END_DO4 ======================================================================== B.2.5 Real Correlation Or Convolution (FIR Filter) ;c(n) = SUM(I=0,...,N-1) {a(I) * b(n-I)} move #AD,r0 move #BD,r3 clr a x:(r0)+,y0 move x:(r3)+,x0 rep #N mac y0,x0,a x:(r0)+,y0 x:(r3)+,x0 rnd a ======================================================================== B.2.6 Real * Complex Correlation Or Convolution (FIR Filter) ;cr(n) + jci(n) = SUM(I=0,...,N-1) {(ar(I) + jai(I)) * b(n-I)} ;cr(n) = SUM(I=0,...,N-1) {ar(I) * b(n-I)} ;ci(n) = SUM(I=0,...,N-1) {ai(I) * b(n-I)} move #AR,r0 move #AI,r1 move #BD,r3 clr a x:(r0)+,x1 clr b x:(r1)+,y1 move x:(r3)+,x0 do #N,END_DO6 mac x0,x1,a x:(r0)+,x1 mac x0,y1,b x:(r1)+,y1 x:(r3)+,x0 END_DO6 rnd a rnd b ======================================================================== B.2.7 Complex Multiply move x:(r0)+,y1 x:(r3)+,x1 ;ar br mpy y1,x1,a x:(r0)+,y0 x:(r3)+,x0 ;ar*br, ai, bi macr -y0,x0,a ;ar*br-ai*bi mpy y1,x0,b a,x:(r2)+ ;ar*bi macr y1,x0,b ;ar*bi+ai*br move b,x:(r2)+ ======================================================================== B.2.8 N Complex Multiplies ; cr(I) + jci(I) = (ar(I) + jai(I)) * (br(I) + jbi(I)), I=1,...,N ; cr(I) = ar(I) * br(I) - ai(I) * bi(I) y1=ar x1=br ; ci(I) = ar(I) * bi(I) + ai(I) * br(I) y0=ai x0=bi move #AD,r0 move #C-1,r2 move #BD,r3 move x:(r2),b ;dummy move! move x:(r0)+,y1 x:(r3)+,x1 ;ar;br do #N,END_DO8 mpy y1,x1,a x:(r0)+,y0 x:(r3)+,x0 ;ar*br, ai, bi macr -y0,x0,a b,x:(r2)+ ;ar*br-ai*bi mpy y0,x1,b a,x:(r2)+ ;ai*br macr y1,x0,b x:(r0)+,y1 x:(r3)+,x1 ;ar*bi+ai*br, ar END_DO8 move b,x:(r2)+ ======================================================================== B.2.9 Complex Update move x:(r2)+,a ;cr move x:(r0)+,y1 x:(r3)+,x1 mac y1,x1,a x:(r0)+,y0 x:(r3)+,x0 ;cr+ar*br,ai,bi macr -y0,x0,a x:(r2)+,b ;cr+ar*br ai*bi mac y1,x0,b a,x:(r1)+ ;ci+ar*bi macr y0,x1,b ;ci+ar*bi+ai*br move b,x:(r1)+ ======================================================================== B.2.10 N Complex Updates move #AD,r0 move #BD,r3 move #D-1,r1 move #C,r2 move x:(r1),b ;dummy in B move x:(r0)+,y1 ;ar do #N,END_DOA move x:(r2)+,a x:(r3)+,x0 ;cr,br mac y1,x0,a x:(r0)+,y0 x:(r3)+,x1 ;cr+ar*br, ai, bi macr -y0,x1,a b,x:(r1)+ ;cr+ar*br ai*bi move x:(r2)+,b ;ci mpy y1,x1,b a,x:(r1)+ ;ci+ar*bi, dr macr y0,x0,b x:(r0)+,y1 ;ci+ar*bi+ai*br END_DOA move b,x:(r1)+ ======================================================================== B.2.11 Complex Correlation Or Convolution (Complex FIR) ; cr(n) + jci(n) = ; SUM(I=0,...,N-1) {(ar(I) + jai(I)) * (br(n-I) + jbi(n-I))} ; cr(n) = SUM(I=0,...,N-1) {ar(I) * br(n-I) - ai(I) * bi(n-I)} ; ci(n) = SUM(I=0,...,N-1) {ar(I) * bi(n-I) + ai(I) * br(n-I)} ; y1=ar x1=br ; y0=ai x0=bi move #AD,r0 move #BD,r3 clr a x:(r0)+,y1 ;ar clr b x:(r3)+,x1 ;br do #N,END_DOB mac y1,x1,a x:(r0)+,y0 x:(r3)+,x0 ;ar*br, ai, bi mac y1,x0,b ;ar*bi mac y0,x1,b x:(r0)+,y1 x:(r3)+,x1 ;ar*bi+ai*br, ar mac -y0,x0,a ;ar*br-ai*bi END_DOB rnd a rnd b ======================================================================== B.2.12 Nth Order Power Series (Real) ; c = SUM(I=0,...,N) {a(I) * b**I} ; = [[[a(n) *b+a(n-1)] *b+a(n-2)]*b+a(n-3)]... move #BD,r1 move #AD,r0 move x:(r1),y0 ;b move y0,x0 move x:(r0)+,a ;a(n) move x:(r0)+,b ;a(n-1) do #N/2,END_DOC mac a1,y0,b x:(r0)+,a ;a(n-2) mac b1,x0,a x:(r0)+,b ;a(0)+a(1)*b END_DOC rnd a ======================================================================== B.2.13 2nd Order Real Biquad IIR Filter ; w(n)/2 = x(n)/2 - (a1/2) * w(n-1) - (a2/2) * w(n-2) ; y(n)/2 = w(n)/2 + (b1/2) * w(n-1) + (b2/2) * w(n-2) ; DHigh Memory Order - w(n-2), w(n-1) ; DLow Memory Order - (a2/2), (a1/2), (b2/2), (b1/2) ; this version uses two pointers move #-1,n0 ori #$08,mr rnd a x:(r3)+,x1 ;x1=a2/2 move x:(r0)+,x0 ;x0=wn-2 mac y1,x0,a x:(r0)+n0,y1 x:(r3)+,x1 ;y1=wn-1 mac y1,x1,a x1,x:(r0)+ ;a=wn move x:(r3)+,x1 ;x1=b2/2 mac x1,x0,a a,x:(r0)+ move x:(r3)+,x1 ;x1=b1/2 macr y1,x1,a move a,x:<<output ======================================================================== B.2.14 N Cascaded Real Biquad IIR Filters ; w(n)/2 = x(n)/2 - (a1/2) * w(n-1) - (a2/2) * w(n-2) ; y(n)/2 = w(n)/2 + (b1/2) * w(n-1) + (b2/2) * w(n-2) ; D High Memory Order - w(n-2)1,w(n-1)1,w(n-2)2,w(n-1)2,... ; D Low Memory Order - (a2/2)1,(a1/2)1,(b2/2)1,(b1/2)1,(a2/2)2,... ; this version uses two pointers ori #$08,mr move #W,r0 move #C,r3 move #-1,n0 movep x:<<input,a rnd a x:(r3)+,x1 ;x1=a2/2 move x:(r0)+,y0 ;y0=wn-2 do #N,END_DOE mac y0,x1,a x:(r0)+n0,y1 x:(r3)+,x1 ;y1=wn-1 macr y1,x1,a x1,x:(r0)+ move x:(r3)+,x1 ;x1= b2/2 mac y0,x1,a a,x:(r0)+ move x:(r3)+,x1 ;x1=b1/2 mac y1,x1,a x:(r0)+,y0 x:(r3)+,x1 END_DOE ; this version uses three pointers ori #$08,mr move #W,r0 move #C,r3 move #C+2,r1 move #2,n3 move #4,n1 move #-1,n0 movep x:<<input,a ;a=x move x:(r0)+,y0 x:(r3)+,x0 ;y0=w-2 do #N,END_DOF mac y0,x0,a x:(r0)+n0,y1 x:(r3)+n3,x0 ;w-1;a1/2 macr y1,x0,a y1,x:(r0)+ ;a=w move x:(r1)+n1,x0 x:(r3)+,x1 ;x0=b2/2 mac y0,x0,a a,x:(r0)+ ;a=w+b2/2w-2 mac y1,x1,a x:(r0)+,y0 x:(r3)+,x0 ;a=y; next w-2 END_DOF ======================================================================== B.2.15 N Radix 2 FFT Butterflies ; Decimation in time (DIT), in-place algorithm ; Twiddle Factor Wk= wr - jwi = cos(2pk/N) -j sin(2pk/N) pointed by r1 ; which must be saved on each pass. ; xr = ar + wr * br - wi * bi ; xi = ai + wi * br + wr * bi ; yr = ar - wr * br + wi * bi = 2 * ar - xr ; yi = ai - wi * br - wr * bi = 2 * ai - xi move x:(r1)+,y0 x:(r3)+,x1 ;y0=wr; x1=br move x:(r0),b ;b=ar move x:(r1)+n1,y1 ;y1=wi ; save r1, update r1 to point last bi/yi do #n,end_bfly mac y0,x1,b x:(r3)+,x0 ;b=ar+wrbr macr y1,x0,b a,x:(r1)+ ;b=xr move x:(r0)+,a ;a=ar subl b,a b,x:(r2)+ ;a=2ar-xr=yr move x:(r0),b move a,x:(r1)+ ;b=ai mac y1,x1,b x:(r3)+,x1 ;b=ai+wibr macr y0,x0,b x:(r0)+,a x:(r3)+,x0 ;b=xi;a=ai subl b,a b,x:(r2)+ ;a=2ai-xi=yi move x:(r0),b ;b=ar end_bfly move b,x:(r1)+n1 ;save last yi ; save r1, update r1 to point last bi/yi ======================================================================== B.2.16 LMS ADAPTIVE FILTER ;Notation and symbols: ; x(n) - Input sample at time n. ; d(n) - Desired signal at time n. ; y(n) - FIR filter output at time n. ; H(n) - Filter coefficient vector at time n. ; H={c0,c1,c2,...,ck,...,c(N-1)} ; X(n) - Filter state variable vector at time N. ; X={x(n),x(n-1),....,x(n-N+1)} ; Mu - Adaptation gain. ; N - Number of coefficient taps in the filter. ; True LMS Algorithm Delayed LMS Algorithm ; Get input sample Get input sample ; Save input sample Save input sample ; Do FIR Do FIR ; Get d(n), find e(n) Update coefficients ; Update coefficients Get d(n), find e(n) ; Output y(n) Output y(n) ; Shift vector X Shift vector X ; System equations: ; e(n)=d(n)-H(n)X(n) e(n)=d(n)-H(n)X(n) (FIR filter and error) ; H(n+1)=H(n)+uX(n)e(n) H(n+1)=H(n)+uX(n-1)e(n-1) (Coefficient update) ;References: ;"Adaptive Digital Filters and Signal Analysis", Maurice G. Bellanger, ; Marcel Deker Inc. New York and Basel ;"The DLMS Algorithm Suitable for the Pipelined Realization of Adaptive ; Filters", Proc. IEEE ASSP Workshop, Academia Sinica, Beijing, 1986 ;Note: ;The sections of code shown describe how to initialize all registers, ;filter an input sample and do the coefficient update. Only the instruc- ;tions relating to the filtering and coefficient update are shown as ;part of the benchmark. Instructions executed only once (for initial- ;ization) or instructions that may be user application dependent are not ;included in the benchmark. ======================================================================== ;Implementation of the true LMS on the DSP56100 move #XM,r0 ;start of X move #N-1,m0 ;mod 4 move #-2,n0 ;adjustment for filtering move m0,m2 ;mod N movep x:<<input,y0 ;get input sample move #H,r3 ;coefficients clr a y0,x:(r0)+ ;save x(n) move x:(r3)+,x1 ;get c0 rep #N-1 ;do fir mac y0,x1,a x:(r0)+,y0 x:(r3)+,x1 macr y0,x1,a ;last tap movep a,x:<<output ;output fir if desired ;Get d(n), subtract fir output, multiply by "u", put the result in x0. ;This section is application dependent. move #H,r3 ;coefficients move r3,r2 ;coefficients move x:(r0)+,y0 ;get x(n) move x:(r3)+,a ;a=c0 do #ntaps,_coefupdate ;update coef. macr x0,y0,a x:(r0)+,y0 x:(r3)+,x1 tfr x1,a a,x:(r2)+ ;copy c, _coefupdate move x:(r0)+n0,y0 ;update r0 move x:(r3)-,y0 ;update r3 ======================================================================== ; Implementation of the delayed LMS on the DSP56100 ; Delayed LMS algorithm with matched coefficient and data vectors ; Algorithm runs in 2N (2 coeffs processed in each 4 cycle loop) ; Register Usage: ; Data Sample is stored in Y0 and Y1. ; Coefficient is stored in X1 ; Loop Gain * Error is stored in X0. ; FIR operation done in B. ; Coeff update operation done in A. ; FIR sum = a = a +c(k)old*x(n-k) ; c(k)new= b = c(k)old -mu*eold *x(n-k-1) move #state,r0 move #ntaps,m0 move #-2,n0 move #1,n1 move #c+1,r3 move #c,r1 clr b x:(r0)+,y0 ;y0 = x(n) move x:(r0)+,y1 x:(r3)+,x1 ;y1=x(n-1) do #ntaps/2,end_lms mac y0,x1,b a,x:(r1)+n1 x1,a macr x0,y1,a x:(r0)+,y0 x:(r3)+,x1 mac x1,y1,b a,x:(r1)+n1 x1,a macr y0,x0,a x:(r0)+,y1 x:(r3)+,x1 end_lms move a,x:(r1)+ move (r0)+n0 ======================================================================== ; Implementation of the double precision true LMS on the DSP56100 ; Memory map: move #XM,r0 ;start of X move #N-1,m0 ;mod 4 move #-2,n0 ;adjustment for filtering move #2,n3 move m0,m2 ;mod N movep x:<<input,y0 ;get input sample move #H,r3 ;coefficients clr a y0,x:(r0)+ ;save x(n) move x:(r3)+n3,x1 ;get c0 rep #N-1 ;do fir mac x1,y0,a x:(r0)+,y0 x:(r3)+n3,x1 ;mac; next x macr x1,y0,a ;last tap movep a,x:<<output ;output fir if desired ;Get d(n), subtract fir output, multiply by "u", put the result in x0. ;This section is application dependent. move #H,r3 ;coefficients move r3,r2 ;coefficients move x:(r0)+,y0 ;get x(n) move x:(r3)+,a ;a1=c0h move x:(r3)+,a0 ;a0=col do #ntaps,_coefupdat ;update coef. mac x0,y0,a x:(r0)+,y0 move x:(r3)+,b ;u e(n) x(n)+c move x:(r3)+,b0 ;b0=next c()l move a1,x:(r2)+ ;save next c()h tfr b,a a0,x:(r2)+ ;copy c _coefupdat move x:(r0)+n0,y0 ;update r0 move (r3)- ;update r3 move (r3)- ======================================================================== ;Implementation of the double precision delayed LMS on the DSP56100 ; Delayed LMS algorithm with matched coefficient and data vectors ; Algorithm runs in 4N (2 coeffs processed in each 8 cycle loop) ; Register Usage: ; Data Sample is stored in Y0 and Y1. ; Coefficient is stored in X1 ; Loop Gain * Error is stored in X0. ; FIR operation done in B. ; Coeff update operation done in A. ; FIR sum = a = a +c(k)old*x(n-k) ; c(k)new = b = c(k)old -mu*eold *x(n-k-1) move #state,r0 move #ntaps,m0 move #-2,n0 move #1,n1 move #c,r3 move #c-2,r1 clr b x:(r0)+,y0 ;y0 = x(n) move x:(r0)+,y1 x:(r3)+,x1 ;y1= x(n-1) x1=c0h do #ntaps/2,end_lms2 mac y0,x1,b a,x:(r1)+n1 tfr x1,a a0,x:(r1)+n1 ;a1=ckh move x:(r3)+,a0 ;a0=ckl macr x0,y1,a x:(r0)+,y0 x:(r3)+,x1 ;x1=c(k+1)h mac x1,y1,b a,x:(r1)+n1 tfr x1,a a0,x:(r1)+n1 move x:(r3)+,a0 macr y0,x0,a x:(r0)+,y1 x:(r3)+,x1 end_lms2 move a,x:(r1)+ move a0,x:(r1)+ move (r0)+n0 ======================================================================== ;----------------------------------------------------------------------- ; The complete code for a true LMS that executes in two instruction ; cycles per tap is shown below. A brief description of how the algorithm ; is derived precedes the LMS code. Note that the coefficients stored to ; memory are saturated (should overflow occur), whereas the coefficients ; used in the FIR filter are not saturated. Therefore, the coefficients ; stored to memory, and the coefficients used in the FIR filter calcula- ; tion, are not guaranteed to be the same. This should not be a problem ; in designs where the echo gain is guaranteed to be less than one. ;----------------------------------------------------------------------- opt cc,cex page 132,66 section FAST_LMS n_tap equ 16 org x:$0100 ref_buf dsm n_tap ;Ref_buf is a modulo n_tap buffer, containing ;a reference signal. coeff ds n_tap ;Note: Coefficients are stored in reverse order ref_ptr dc ref_buf ;data pointer for reference buffer scaled_error dc 0 ;scaled error sample from last call of echo_input norm_factor dc 0.1 ;scale factor for error signal org p:0 jmp Test_EC org p:$0100 ;----------------------------------------------------------------------- ; The following pseudo code is for the "standard" LMS echo canceller ; algorithm. ; y(n) = estimate of echo at time sample n. ; x(n) = reference input signal at time n. ; input (n) = input signal (containing echo signal) at time n. ; c(n,k) = k'th coefficient at time n. ; ; /* initialize N coefficients at time 0 to 0 */ ; ; for (k = 0 to n-1) { ; c(0,k) = 0; ; } ; ; /* LMS follows, do forever */ ; ; n = 0; ; do forever { ; y(n) = 0; ; ; for (k=0 to N-1) { ; y(n) = y(n) + c(n,k)*x(n-k); /* FIR filter */ ; } ; ; error(n) = input(n) - y(n); ; ; for (k = 0 to N-1) { ; c(n+1,k) = c(n,k) + delta*error(n)*x(n-k) ; /* Coefficient Update */ ; ; } ; ; n = n+1; ; } ;----------------------------------------------------------------------- ;----------------------------------------------------------------------- ; The following is equivalent to the above (i.e., given the same input ; signals, the error signal and coefficients will follow the exact ; same trajectories. Note the calculations are run from the back of ; the filter to the front. This saves two registers. Also note that ; the calculation order of the coefficient and the FIR filter has been ; reversed. ; ; /* initialize N coefficients at time -1 to 0 */ ; ; for (k = 0 to N-1) { ; c(-1,k) = 0; ; } ; ; error (-1) = 0 ;The initial error must be set to zero. ; ; /* LMS follows, do forever */ ; ; n = 0; ; do forever { ; ; y(n) = 0; ; ; for (k = N-1 to 0) { ; c(n,k) = c(n-1,k) + delta*error(n-1)*x(n-1-k); /* Coefficient */ ; } ; ; for (k= N-1to 0) { ; y(n) = y(n) + c(n,k)*x(n-k); /* FIR filter */ ; } ; ; error(n) = input(n) - y(n); ; ; n = n+1; ; ; } ;----------------------------------------------------------------------- ; Note that the two "for" loops in the do forever loop can now be combined. ;----------------------------------------------------------------------- ; ; /* initialize N coefficients at time -1 to 0 */ ; ; for (k = 0 to N-1) { ; c(-1,k) = 0; ; } ; ; error(-1) = 0; ; ; /* LMS follows, do forever */ ; ; n = 0; ; do forever { ; ; y(n) = 0; ; ; for (k = N-1 to 0) { ; c(n,k) = c(n-1,k) + delta*error(n-1)*x(n-1-k); /* Coefficient */ ; y(n) = y(n) + c(n,k)*x(n-k); /* FIR filter */ ; } ; ; error(n) = input(n) - y(n); ; ; n = n+1; ; ; } ;----------------------------------------------------------------------- ;----------------------------------------------------------------------- ; ; Echo Canceller Routine (Fast LMS) ; ; Upon Entry ; x1 should contain newest reference sample ; y1 should contain newest input sample ; ; Upon Exit ; b will contain echo cancelled output ; ; Note that the coefficients are stored in reverse time order. ;----------------------------------------------------------------------- FAST_LMS: move #+1,n1 move #n_tap-1,m0 move #-1,m1 move m1,m3 move x:ref_ptr,r0 ;r0 is the get reference signal pointer move #coeff,r3 ;r3 is the get coefficient pointer move r3,r1 ;r1 is the put coefficient pointer move x:(r0),y0 ;y0 contains the oldest reference sample move x1,x:(r0)+ ;store newest reference sample in reference register clr b x:(r3)+,a ;fetch first coefficient, and clear b for FIR move x:scaled_error,x0 ;x0 is the scaled error sample do #n_tap,end_fir_update macr x0,y0,a x:(r0)+,y0 x:(r3)+,x1 mac a1,y0,b a,x:(r1)+n1 x1,a end_fir_update neg b move r0,x:ref_ptr ;store get reference pointer add y1,b ;b = EC output = input - echo_estimate move x:norm_factor,x0 move b,y0 mpyr y0,x0,a move a,x:scaled_error move b,x:output_port rts ;----------------------------------------------------------------------- ;; Test shell follows ; Remote signal is an impulse train, period greater than echo span ; Input is the resulting echo signal ;----------------------------------------------------------------------- org x:$1000 output_port ds 1 ;write output to D/A org x:$0400 Remote_signal dc 0.8 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 org x:$0420 echo_input dc 0.0 dc 0.0 dc 0.2 dc 0.4 dc 0.7 dc 0.4 dc 0.2 dc 0.1 dc 0.0 dc -0.1 dc -0.2 dc -0.1 dc 0.0 dc 0.1 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 dc 0.0 remote_get dc remote_signal input_get dc echo_input org p: Test_EC move #0 ,x0 move #-1,m0 move #coeff,r0 rep #n_tap move x0,x:(r0)+ ;zero coefficients move #$ffff,x0 do x0,end_test_loop move x:remote_get,r0 move #19,m0 move x:input_get,r1 move #19,m1 move x:(r0)+,x1 move x:(r1)+,y1 move r0,x:remote_get move r1,x:input_get jsr FAST_LMS end_test_loop nop nop debug endsec end ======================================================================== B.2.17 FIR Lattice Filter ; Lattice filter benchmarks. N refers to the number of "k" coefficients ; in the lattice filter. Some filters may have other coefficients other ; than the "k" coefficients but their number may be determined from k. move #state,r0 ;point to state variable storage move #N,m0 ;N=number of k coefficients move #k,r1 ;point to k coefficients move #N-1,m1 ;mod for k's move #0,n0 movep x:<<input,b ;get input move b,x:(r0)+ ;save 1st state move x:(r1)+,x0 ;get k do #N,end_elat move x:(r0)+n0,a b,y0 ;get s;copy t macr x0,y0,a x:(r0)+n0,x1 ;t*k+s, copy s macr x1,x0,b x:(r1)+,x0 ;s*k+t, nxt k move a,x:(r0)+ ;sv st end_elat move x:(r0)-,y1 move x:(r1)-,x0 movep b,x:<<output ;output ======================================================================== B.2.18 All Pole IIR Lattice Filter move #k+N-1,r0 ;point to k move #N-1,m0 ;number of k's-1 move #-1,n1 move n1,n3 movep x:<<input,a ;get input sample move #state,r3 ;pt to x() move x:(r0)-,y1 ;y1=k3 move x:(r3)+,x1 ;x1=s3 macr -x1,y1,a x:(r0)+n0,y1 ;a=in-k3s3;y1=k2 move x:(r3)-,x1 ;x1=s2 do #n-1,endlat macr -x1,y1,a b,x:(r3)+ ;a=a-s2k2=t2;update s3 move x:(r3)+,b a,x1 ;b=s2 macr x1,y1,b x:(r0)+n0,y1 x:(r3)+n3,x1 ;b=s2+t2k2;get s1,k1 endlat move b,x:(r3)+ ;sv 2nd last s move x:(r0)+,y1 ;update r0 move a,x:(r3)+ ;save last s movep a,x:<<output ;output ======================================================================== B.2.19 General Lattice Filter move #k,r0 ;point to coefficients move #2*N,m0 ;mod 2*(# of k's)+1 move #-1,n3 movep x:<<input,a ;get input sample move #state,r3 ;pto filter states move x:(r0)+,y1 ;get first k move x:(r3)-,x1 ;first s do #N,el ;do filter macr -y1,x1,a b,x:(r3)+ ;t-k*s, save s move x:(r3)+,b a,x1 ;get s again macr x1,y1,b x:(r0)+,y1 x:(r3)+n3,x1 ;t'*k+s,get k& s el move b,x:(r3)+ ;s 2nd to1st st clr a a,x:(r3)+ ;s first state move x:(r3)+,x1 ;get last state rep #N ;do fir taps mac y1,x1,a x:(r0)+,y1 x:(r3)+,x1 macr y1,x1,a x:(r3)+,x1 ;finish, adj pter movep a,x:<<output ======================================================================== B.2.20 Normalized Lattice Filter move #c,r0 ;point to coefficients move #3*N,m0 ;mod on coefficients move #0,n3 movep x:<<input,a ;get input sample move #state,r3 ;pt to state move x:(r0)+,y1 a,x1 ;get first Q do #n,endnlat mpy x1,y1,a x:(r0)+,y0 x:(r3)+n3,x0 ;q*t; get k & s macr -x0,y0,a b,x:(r3)+ ;q*t-k*s,save s mpy y0,x1,b a,x1 ;k*t, set t' macr y1,x0,b x:(r0)+,y1 ;k*t+q*s, get q endnlat move b,x:(r3)+ ;sv scnd lst st move a,x:(r3)+ ;save state clr a x:(r3)+,x1 ;clr acc rep #n ;do fir taps mac x1,y1,a x:(r0)+,y1 x:(r3)+,x1 macr x1,y1,a x:(r3)+,x1 ;rnd, adj pter movep a,x:<<output ;output sample ======================================================================== B.2.21 [1x3][3x3] Matrix Multiply move #AD,r3 ;point to mat a move #bd,r0 ;point to vec b move #2,m0 ;addrb mod 3 move #c,r2 ;point to vec c move x:(r0)+,y0 x:(r3)+,x0 ;y0=a11;x0=b1 mpy y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;a11*b1 mac y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;+a12*b2 macr y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;+a13*b3 move a,x:(r2)+ ;store c1 mpy y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;a21*b1 mac y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;+a22*b2 macr y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;+a23*b3 move a,x:(r2)+ ;store c2 mpy y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;a31*b1 mac y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;+a32*b2 macr y0,x0,a ;+a33*b3->c3 move a,x:(r2)+ ;store c3 ======================================================================== B.2.22 [NxN][NxN] Matrix Multiply ;The matrix multiplications are for square NxN matrices. ;All the elements;are stored in "row major" format. i.e. for the array A: move #AD,r0 ;point to A move #bd,r3 ;point to B move #c,r2 ;output mat C move #N,b ;array size move b,n3 do #N,erows ;do rows do #N,ecols ;do columns move x1,r0 ;copy row A move r1,r3 ;copy col B clr a x:(r0)+,y0 move x:(r3)+n3,x0 ;clr sum & pipe rep #N-1 ;sum mac y0,x0,a x:(r0)+,y0 x:(r3)+n3,x0 macr y0,x0,a x:(r3)+,y1 ;finish, next col move a,x:(r2)+ ;save output ecols add x1,b ;next row A move b,x1 move #bd,r1 ;first element B erows ======================================================================== B.2.23 N Point 3x3 2-D FIR Convolution ;The two dimensional FIR uses a 3x3 coefficient mask: ; ;The image is an array of 512x512 pixels. To provide boundary conditions ;for the FIR filtering, the;image is surrounded by a set of zeros such ;that the image is actually stored as a 514x514 array. i.e ; ;The image (with boundary) is stored in row major storage. The first ;element of the array image is image(1,1) followed by image(1,2). The ;last element of the first row is image(1,514);followed by the beginning ;of the next column image(2,1). These are stored sequentially in the ;array; "im" in d memory. ; ;Image(1,1) maps to index 0, image(1,514) maps to index 513, ;Image(2,1) maps to index 514 (row major storage). ; ;Although many other implementations are possible, this is a realistic ;type of image environment;where the actual size of the image may not ;be an exact power of 2. Other possibilities include storing a 512x512 ;image but computing only a 511x511 result, computing a 512x512 result ;without boundary conditions but throwing away the pixels on the border, ;etc. ; r0 -> image(n,m) image(n,m+1) image(n,m+2) ; r1 -> image(n+514,m) image(n+514,m+1) image(n+514,m+2) ; r2 -> image(n+2*514,m) image(n+2*514,m+2) image(n+2*514,m+3) ; r3 -> FIR coefficients ; b -> output image move #mask,r3 ;pt to coef. move #-8,n3 move #image,r0 ;top boundary move #image+514,r1 ;left of first pixel move #image+2*514,r2 ;left of 2nd row move #512,y1 move #-1,n1 ;adjust. move n1,n2 move #output,b ;output image move x:(r0)+,y0 ;y0=im(1,1) move x:(r3)+,x0 ;x0=c11 do y1,rows do y1,cols mpy y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;im(1,1)*c11 mac y0,x0,a x:(r0)+n0,y0 x:(r3)+,x0 ;+im(1,2)*c12 mac y0,x0,a x:(r1)+,y0 x:(r3)+,x0 ;+im(1,3)*c13 mac y0,x0,a x:(r1)+,y0 x:(r3)+,x0 ;+im(2,1)*c21 mac y0,x0,a x:(r1)+n1,y0 x:(r3)+,x0 ;+im(2,2)*c22 mac y0,x0,a x:(r2)+,y0 x:(r3)+,x0 ;+im(2,3)*c23 mac y0,x0,a x:(r2)+,y0 x:(r3)+,x0 ;+im(3,1)*c31 mac y0,x0,a x:(r2)+n2,y0 x:(r3)+n3,x0 ;+im(3,2)*c32 macr y0,x0,a x:(r0)+,y0 x:(r3)+,x0 ;+im(3,3)*c33 move a,x:(b1) inc24 b cols ; adjust pointers for frame boundary move #2,n1 move n1,n2 inc b x:(r0)+,x1 ;adj r0 inc b x:(r1)+n1,x1 ;adj r1 move (r2)+n2 ;adj r2 move x:(r0)+,x1 ;preload move #-1,n1 ;adjust. move n1,n2 rows ======================================================================== B.2.24 Signed 16 Bit Result Divide ;This is a routine for a 4 quadrant divide (i.e., a signed divisor and ;a signed dividend) which generated a 16-bit signed quotient and a 32-bit ;signed remainder. The quotient is stored in the lower 16 bits of accum- ;ulator a, a0, and the remainder in the upper 16 bits a1. The true ;(restored) remainder is stored in b1. The original dividend must occupy ;the low order 32 bits of the destination accumulator, a, and must be a ;POSITIVE number. The divisor must be larger than the dividend so that a ;fractional quotient is generated. abs a a,b ;make dividend positive move b,x:$0 ;save rem. sign in x:$0 eor x0,b ;quo. sign in N bit of CCR andi #$fe,ccr ;clear carry bit C (quotient sign bit) rep #$10 ;form a 16-bit quotient div x0,a ;form quot. in a0, remainder in a1 tfr a,b ;save remainder and quot. in b1,b0 jpl savequo ;go to savequo if quot. is positive neg b ;complement quotient if N bit is set savequo tfr x0,b ;get signed divisor move b0,x1 ;save quo. in x1 abs b ;get abs value of signed divisor add a,b ;restore remainder in b1 bftstl #$8000,x:$0 ;test if remainder is positive beq <done1 ;branch if positive move #$0,b0 ;prevent unwanted carry neg b ;complement remainder done1 ======================================================================== B.2.25 Signed Integer Divide ;Registers usex: a,b,x0 ;Output: Quotient -> a0 move #dividend,a ;sign ext A2 move a2,a1 ;and A1 move #dividend,a0 ;move into A asl a ;prep divide move #divisor,x0 ;divisor into x0 abs a a,b ;dividend pos andi #$fe,ccr ;clr the carry rep #$10 ;16bit quotient div x0,a ;form quot. a0 eor x0,b ;save sign in N bpl <done2 neg a ;comp.bit is set done2 ======================================================================== B.2.26 Multiply 32-bit Fractions ;This routine will execute the multiplication of two 32-bit FRACTIONAL ;numbers that are already stored in memory as follows: ; r0 -> X:$Paddr P0 ; X:$Paddr P1 ; r3 -> X:$Qaddr Q0 ; X:$Qaddr Q1 ;The initial 32-bit numbers are: ; P = P1:P0 (16:16 bits) ; Q = Q1:Q0 (16:16 bits) ;The result, R, is a 64 bit number that is stored in the two ;accumulators A and B as follows: ; R = R3:R2:R1:R0 ; = A1:A0:B1:B0 (32:32bits) ; = A2:A1:A0:B1:B0 (sign extended) move #paddr,r0 ;init pter for P move #qaddr,r3 ;init pter for Q nop move x:(r0)+,y0 x:(r3)+,x0 ;P0,Q0 move x:(r0)+,y1 x:(r3)+,x1 ;P1,Q1 mpyuu x0,y0,a move a0,b0 ;b0=P0*Q0=R0 dmacsu x1,y0,a ;a=P0*Q1+a1 macsu y1,x0,a ;a=a+ P1*Q0 move a0,b1 ;b1=R1 dmacss x1,y1,a ;a=P1*Q1+ a1=R3:R2