How to get a fast 2x2-FLI routine
by Wolfram Sang (Ninja/The Dreams - www.the-dreams.de)
======================================================

Shortly before X2004, Oswald/Resource asked me whether I could do a 2x2-FLI routine that is fast enough to leave some extra cycles for the main routine while the FLI is being displayed. That was an interesting problem, so maybe the things I came up with will be inspiring for you, too. To see the routine in action, have a look at the demo "REAL" by Resource and The Dreams. Without such a routine, probably no one would have dared to do a tunnel and the julia-effect at such a resolution, because they would have been awfully slow. Okay, now enjoy the article!

Preface
-------

First of all, be aware of the terms "interrupt", "IRQ" and "NMI". When I say "interrupt", I mean interrupts in general. When I say "IRQ" or "NMI", I mean that specific kind of interrupt. We need to keep this distinction strict.

Furthermore, it will be helpful if you know (at least in theory) how to get a stable raster using a VIC-IRQ and a timer. In general, you should not be afraid of timers.

The task
--------

We need to do FLI on the 3rd, 5th and 7th line of every charline (counting from 1). The 1st line is, of course, handled by the VIC automatically. Doing this full-screen, we end up with 3 * 25 = 75 interrupts per frame altogether! So the interrupt doing the FLI must be as fast as possible: every cycle saved there gives us 75 additional cycles per frame for the main routine. So what can be done?

Double Timer
------------

Of course, for doing FLI we need a stable raster. The fastest way to get one is the double timer method. Look at 4x4-routines which use the VIC-IRQ: as this interrupt always occurs at the beginning of a rasterline, they usually need some NOPs to throw away cycles until the correct position for FLI is reached. Horrible! Using a double timer, we can make the interrupt start anywhere we want.
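The numbers from "The task" can be reproduced with a small Python model. It is only an illustration, not part of the original routine; it assumes the usual 25 charlines of 8 rasterlines each, and the function names are hypothetical.

```python
# Rough model of the workload in "The task" (not part of the original routine).
# Assumption: 25 charlines of 8 rasterlines each; lines inside a
# charline are counted from 1, as in the text.

CHARLINES = 25
LINES_PER_CHARLINE = 8

def fli_lines_in_charline():
    """Charline lines (1-based) that need an FLI interrupt in 2x2 mode.

    A 2x2 pixel is two rasterlines tall, so fresh colour data is needed
    on lines 1, 3, 5 and 7. Line 1 is an ordinary badline that the VIC
    handles by itself, so only the remaining lines need an interrupt.
    """
    new_colour_lines = range(1, LINES_PER_CHARLINE + 1, 2)  # 1, 3, 5, 7
    return [line for line in new_colour_lines if line != 1]

def interrupts_per_frame():
    return CHARLINES * len(fli_lines_in_charline())

def cycles_gained(cycles_saved_per_interrupt):
    """Cycles per frame won for the main routine by a faster interrupt."""
    return cycles_saved_per_interrupt * interrupts_per_frame()

if __name__ == "__main__":
    print(fli_lines_in_charline())  # [3, 5, 7]
    print(interrupts_per_frame())   # 75
    print(cycles_gained(1))         # 75
```

The last line is the "every cycle counts 75 times" argument: shaving one cycle off the FLI interrupt hands 75 cycles per frame back to the main routine.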
Okay, measuring the correct position in the init-routine can be nasty at first, but keep at it, the result pays off. Here, of course, we set up the interrupt so that we reach the FLI-position "just in time".

Keep in mind that you always have to use a CIA-detection routine when using timers for stable rasters. New CIAs trigger an IRQ one cycle earlier than old ones; not taking care of this can lead to crashes. You can find an example in my source-code, if needed.

Using NMI
---------

To get the desired three interrupts per charline, we can use two timers. One fires every 4th rasterline, doing FLI on the 3rd and 7th line of a charline. The other fires every 8th rasterline, doing FLI on the 5th line of a charline.

If we use the timers of CIA2, which trigger the NMI, we get an elegant way to keep boundary checks out of our FLI-routine. Again, look at some 4x4-routines: they often check inside the FLI-IRQ whether a certain rasterline has been reached, so that displaying FLI has to stop. We can do it differently now: we use VIC-IRQs to allow or forbid the timers of CIA2 to trigger NMIs, which is equivalent to starting or stopping the FLI display. So the boundary checks live in two tiny VIC-IRQs instead of in the NMIs that are called 75 times per frame.

Making stable
-------------

Okay, now we have already gained some cycles between two NMIs, but we need more. What is left to optimize? The routine to get a stable raster. You probably know routines like this:

              LDA $DC06
              EOR #$0F
              STA self_mod+1
    self_mod: bpl *
              ...

They take up to around 22 cycles to get a stable raster, so there is a lot to win here. We have to pay a price, though: usually, if you want a *very* fast routine, you need to sacrifice memory (and vice versa). So, for a significant speed-up we need 8 pages of memory. How is this achieved? We will use another timer to tell us the position within a rasterline.
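The two-timer schedule from "Using NMI" can be sanity-checked with a rough model. It assumes PAL timing (63 cycles per rasterline), that a CIA timer in continuous mode underflows every latch + 1 cycles, and that rows are counted from 0 at the top of the FLI area. The start rows and function names are hypothetical and not taken from the Ninja source.

```python
# Rough model of the NMI schedule from "Using NMI".
# Assumptions (not from the original source): PAL timing with 63 cycles
# per rasterline, CIA timers in continuous mode that underflow every
# (latch + 1) cycles, and rows counted from 0 at the top of the FLI area,
# so that the line inside a charline is row % 8 + 1.

CYCLES_PER_LINE = 63
FLI_ROWS = 25 * 8

def latch_for(period_lines):
    """Latch value giving one underflow every `period_lines` rasterlines."""
    return period_lines * CYCLES_PER_LINE - 1

def timer_hits(first_row, period_lines, start=0, end=FLI_ROWS):
    """Rows on which a timer fires while NMIs are enabled.

    In the real routine two VIC-IRQs enable/disable the CIA2 NMIs;
    here that gating is just the range check start <= row < end.
    """
    return [row for row in range(first_row, end, period_lines) if row >= start]

def nmi_schedule():
    rows_a = timer_hits(2, 4)  # every 4th line: lines 3 and 7 of a charline
    rows_b = timer_hits(4, 8)  # every 8th line: line 5 of a charline
    return sorted(rows_a + rows_b)

if __name__ == "__main__":
    print(hex(latch_for(4)), hex(latch_for(8)))   # 0xfb 0x1f7
    rows = nmi_schedule()
    print(len(rows))                              # 75
    print(sorted({row % 8 + 1 for row in rows}))  # [3, 5, 7]
```

The two timers never fire on the same row (one hits rows 2 mod 4, the other rows 4 mod 8), which is why no extra check is needed inside the NMI itself.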
The trick is to use the value of this timer as part of a jump-instruction, so that for every value in this timer there is an appropriate interrupt-routine.

In detail: the timer at $DC06/7 runs from $003E down to $0000 (the x-position within the rasterline). The timer at $DC04 does not run and just serves as simple memory: it stores the jump-opcode $4C (in $DC04) and the low-byte of the desired NMI-routine (in $DC05). Now we set the NMI-vector to $DC04, and depending on the value in $DC06, which becomes the high-byte of the jump address, this or that or the other FLI-routine is used. As the jitter can be 8 cycles, we need 8 pages of NMI-routines. We need 3 different interrupts per charline, so every page must hold 3 appropriate NMIs to do FLI in the desired rasterline. That makes 8*3=24 NMI routines in those 8 pages. When using interlace inside the same VIC-bank, this number doubles to 48! So it gets quite messy in that memory, but it is fast.

As every jitter-value gets its own NMI-routine, we get another small bonus: we don't have to use NOPs to clean the jitter, we can use "sensible" opcodes instead. For example, if the jitter is at least 4 cycles, you could acknowledge the NMI with BIT $DD0D before the FLI takes place. If the jitter is below 4 cycles, you simply do it after the FLI. In both cases, you didn't waste the cycles. As a result, this way of making a stable raster needs just 6 cycles in the worst case (3 for the JMP and 3 to clean the maximum jitter). Compared to the 22-cycle-routine before, this is a gain of 16 cycles * 75 interrupts/frame = another 1200 cycles/frame. Yeah!

The outcome
-----------

For the best case, the FLI-routine now looks as simple as this:

    DC04: JMP $xxxx

    xxxx: sta nmi_a        ; save accu
          lda #d018val
          sta $d018
          lda #d011val
          sta $d011        ; do FLI
          bit $dd0d        ; acknowledge NMI
          lda #next_nmi    ; low byte of next NMI-handler
          sta $dc05        ; set it
          lda nmi_a        ; get accu
          rti              ; exit from NMI

Not much left to gain anymore.
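The dispatch through the CIA registers can be sketched in the same way. This model assumes, as described above, that $DC04 holds $4C, $DC05 holds the handler's low byte, and $DC06 supplies the high byte when the JMP operand is fetched. The low byte $40 is a made-up example, and the page range $08-$0F follows the Ninja version's layout.

```python
# Model of the "timer as opcode" dispatch from "Making stable".
# Assumptions: CIA1 timer A is stopped, so $DC04/$DC05 just hold bytes
# ($4C = JMP absolute, and the handler's low byte); $DC06 supplies the
# high byte at the moment the CPU fetches it. The low byte $40 below is
# a made-up example; the page range $08-$0F follows the Ninja version.

JMP_ABS = 0x4C

def dispatch_target(dc04, dc05, dc06):
    """Address reached when the NMI vector points at $DC04."""
    if dc04 != JMP_ABS:
        raise ValueError("$DC04 must hold the JMP opcode $4C")
    return (dc06 << 8) | dc05  # 6502 operands are little-endian

def handlers_needed(jitter_values=8, nmis_per_charline=3, interlace=False):
    """One handler per jitter value (page) and per NMI within a charline."""
    per_page = nmis_per_charline * (2 if interlace else 1)
    return jitter_values * per_page

def cycles_won(old_cost=22, new_cost=6, interrupts=75):
    return (old_cost - new_cost) * interrupts

if __name__ == "__main__":
    low = 0x40  # hypothetical handler offset, identical in every page
    for dc06 in range(0x08, 0x10):
        print(hex(dispatch_target(JMP_ABS, low, dc06)))  # 0x840 ... 0xf40
    print(handlers_needed(), handlers_needed(interlace=True))  # 24 48
    print(cycles_won())                                        # 1200
```

Because the handler's low byte is the same in every page, a single write to $DC05 selects the next NMI for all eight jitter cases at once; the timer picks the page.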
The routines in the other pages look very similar, of course, just with added cycles for the jitter. The other routines in the same page simply use other values for $D018 and $D011 (and, of course, one has to set them twice to activate the first line of a charline).

These routines should give you back about 40% of the cycles, compared to doing FLI all the time. If your effect does not use the full width, you could put it on the right side of the screen and start the FLI some cycles later. If you just use the right half, you will be at 50%.

Ninja version
-------------

I did an implementation which is ready to use (...please give credits, blabla and such... well, this routine is easy to identify anyway). It implements the aforementioned ideas plus some more. For example, it saves some more cycles by using the same value for $D011 and for the low bytes of the NMI handlers (skipping the lda #next_nmi from above).

Another neat thing is the use of the Y-register inside the NMI-routines instead of the accumulator. As we now just need load and store instructions, this is easily possible. The benefit is that the next opcode after the FLI is now always STY ($8C). So the FLI-bug has light grey as screen-ram color, and grey as color-ram color. That is a color combination you can at least work with a little (did anybody notice the anti-aliased logo next to the julia-routine in "REAL"?). A little bonus is that it opens the upper/lower border for free, so to speak.

You will find two different versions on this disk: one for the 2x2-mode without interlacing, and one for 2x2 with standard interlacing. All these routines need one zeropage-location (default = $02). The 8 pages containing the NMI-routines reside at $0800-$0FFF (the files themselves are a bit shorter). Keep in mind that relocating them also means re-adjusting the NMI-timers, because they must invoke NMIs only when the timer at $DC06 gives the correct high byte for the JMP-instruction!
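The relocation warning can be made concrete with a rough sketch. It assumes PAL timing, a $DC06 timer counting $3E down to $00 once per rasterline, and that an NMI delayed by j cycles (j = 0..7) reads a value j lower when the JMP high byte is fetched. `sample_phase` and `pages_hit` are hypothetical helpers, and $2000 is just an example target address.

```python
# Sketch of the relocation constraint for the 8 NMI pages.
# Assumptions (not from the original source): PAL, 63 cycles per line;
# the position timer at $DC06 counts $3E..$00 once per rasterline; an NMI
# delayed by j cycles (j = 0..7) fetches the JMP high byte j cycles later,
# so it reads a value j lower.

CYCLES_PER_LINE = 63
MAX_JITTER = 7

def position_value(cycle):
    """$DC06 value `cycle` cycles after the timer was reloaded with $3E."""
    return 0x3E - cycle % CYCLES_PER_LINE

def sample_phase(base_page):
    """Cycle (after the $DC06 reload) at which a jitter-free NMI must fetch
    the JMP high byte so that jitters 0..7 land in pages base..base+7."""
    top = base_page + MAX_JITTER
    if base_page < 0 or top > 0x3E:
        raise ValueError("all 8 pages must lie within $00-$3E")
    return 0x3E - top

def pages_hit(base_page):
    phase = sample_phase(base_page)
    return sorted(position_value(phase + j) for j in range(MAX_JITTER + 1))

if __name__ == "__main__":
    print(sample_phase(0x08))                 # 47
    print([hex(p) for p in pages_hit(0x08)])  # 0x8 .. 0xf
    # moving the pages to $2000-$27FF shifts the required phase by:
    print(sample_phase(0x08) - sample_phase(0x20))  # 24
```

Under these assumptions, moving the code up by 24 pages means the NMIs have to arrive 24 cycles earlier within the rasterline, which is exactly the kind of re-adjustment the warning above refers to.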
The code in these 8 pages is already terribly fragmented, which is why I squeezed the init-routine for the 2x2-mode into the gaps in between. So at least you don't lose another 2 pages for that.

To use the 2x2-modes, simply JSR $0CDC and you are done. They don't initialize $D016 and $DD00, that remains your job. They do set $01 to $35, however. In the IRQ-routine for the lower border, at around $0F00, you can find a BIT $1003, which you can easily change either to a music-call or to your own subroutine, in case you need something done once a frame.

For more advanced changes, I strongly recommend using the source-code (to be assembled with "AS"). Even there, I must say, it is pretty easy to spoil things. Think at least twice before making changes other than the options at the beginning! For maximum flexibility you won't get around doing your own version anyway (not that you didn't know that). Still, I hope my source serves as educational material even without comments. Well, you have this text as a guide, and you can write me an email if you have further questions or need a special version of it. You are hereby encouraged to write to me if you have comments or ideas for further improvements!

That should be all for now. I hope this article was a little enriching for you. For completeness: I was not the first to use the timers as part of the opcode (and I never claimed to be). At least Kjer/Horizon used a JMP ($DC03) in the demo "A Load of old Shit". Still, I developed the routine and ideas from scratch by myself, and I am quite proud of it.

Okay then, happy hacking and keep the spirit!