The main difference is that 32 bit assembler is not cursed with the problems
and restrictions of segment arithmetic and is both much clearer and far more
powerful."
08 May 98 | SLH | ~ | hutch1.htm | The suppression and resurrection of assembler programming. | papers | ~ | fra_0116 |
16 May 98 | SLH | ~ | hutch_61.htm | The Bridge: In Pursuit Of Lost Knowledge | papers | ~ | fra_011C |
21 May 98 | SLH | ~ | hutquest.htm | THE QUEST: Building the launch pad | papers | ~ | fra_011F |
29 May 98 | SLH | ~ | hutch28.htm | Software warriors through the warp | papers | ~ | fra_0123 |
05 June 98 | SLH | ~ | hutch_65.htm | The Eye Of The Warrior (a "graphical" windows' paper) | papers | ~ | fra_0128 |
10 June 98 | SLH | ~ | hutchif1.htm | The iron fist (Keeping The Crackers Amused) | papers | ~ | fra_0129 |
15 June 98 | SLH | ~ | hutsting.htm | Applying the sting | papers | ~ | fra_012C |
SLH
> END GAME
>
> Applying The Sting
>
> For the modern software warrior who has done their penance and paid homage
> to the great god of visual garbage, it's time to put aside the trivia of
> endless interface graphics and wield the weapon of war in the way it was
> intended. A minimum size, fast and efficient template, the mastery of
> resources and controls and the easy manipulation of the interface in terms
> of its appearance is a formula that is without peer in the world of software
> design.
>
> With some variation between compilers, this package should compile at around
> 20k depending on how much space is taken up in the resources which is one
> hell of a lot better than 200k or worse that you get from the RAD vendors.
>
> How you proceed with it depends on what type of application you have in
> mind. If you are writing small specialised applications, then you will
> probably add different modules which group related functions together so
> that the project is well laid out and easy enough to extend and maintain.
>
> If however, the application you have in mind is going to be a multi-megabyte
> monster, then you have to think about what is the most efficient way to
> write it. You could in fact write it the same way as a small specialised
> application but even if it is well written, fast and compact code, the load
> time and the memory demand will tend to make it sluggish, particularly if
> the machine it is run on does not have large resources available.
>
> One of the few good things that came out of the early versions of windows
> was dynamic link libraries. When you consider that the early versions of
> windows would run on a 286 with one meg of ram, albeit only just, you
> understand where the need came from to be efficient in the use of system
> resources.
>
> Dynamic link libraries solved one of the problems that was left over from
> DOS application programming, how to get capacity in excess of the available
> memory limitations. The idea of having a separate file that carried a
> collection of useful functions that could be called on demand meant that
> you could have a relatively small executable file that called what it needed
> and unloaded it when it was finished.
>
> In a modern 32 bit application, this equation is still in the leading edge
> of design in that whatever the viable size may be for either the operating
> system or the specific hardware that it is being run under, you can deliver
> more capacity than the immediate size will allow by writing the core of
> your application as a fast minimum size executable and farming out the
> occasional functions or objects to a DLL which is loaded on demand and
> unloaded when finished.
>
> This extends your application's size and capacity to nearly the size of
> available disk space if that is what you need to write. It also allows you
> to write miniature rockets that do the same thing so that they minimise
> resource demand while delivering extended capacity.
>
> What follows is a bare skeleton of the DllMain function in C. There is a
> reasonably broad range of C compilers that will do this job well but you
> will need to chase up the specifics of each compiler to ensure that you get
> it right. The nod is that not only are the M$ and Borland C compilers up to
> scratch to do the job but the Zortech and Symantec C compilers are also very
> grunty in the area of EXE and DLL creation.
>
> For the warrior who likes their code written with the flavour of bubble gum,
> the PowerBasic DLL compiler is an industry standard specialist DLL compiler
> that gives nothing away in terms of performance so you are not at any
> disadvantage at all to warriors of other languages.
> ---------------------------------------------------------------
> BOOL WINAPI DllMain(HINSTANCE hDLLInst,
>                     DWORD     fdwReason,
>                     LPVOID    lpvReserved)
> {
>     switch (fdwReason)
>     {
>         case DLL_PROCESS_ATTACH:    /* a process is loading the DLL */
>             break;
>
>         case DLL_PROCESS_DETACH:    /* a process is unloading the DLL */
>             break;
>
>         case DLL_THREAD_ATTACH:     /* a new thread has started */
>             break;
>
>         case DLL_THREAD_DETACH:     /* a thread is exiting */
>             break;
>     }
>     return TRUE;
> }
> ---------------------------------------------------------------
> The DllMain function is like the WinMain function in that you don't call it,
> windows does when you load the DLL. The second parameter [ fdwReason ] is
> the one that you will process when you are starting or terminating processes
> within your DLL.
>
> It is the processing of the DLL_PROCESS_ATTACH constant that determines
> whether the DLL loads or not. If your nominated processing returns "TRUE",
> then the DLL loads, if it returns "FALSE" the DLL unloads. The return value
> is ignored for the other three constants.
>
> The DLL_PROCESS_DETACH constant is where you perform any actions that you
> want to happen as the DLL unloads. This can be deleting resources, closing
> windows or any other process you like.
>
> One of the tricks when designing a DLL is to put an ordinary system dialog
> box in each one of the four constants that can be processed as it gives you
> an intuitive feel for what is happening when you either load or unload it
> from your application.
>
> When you write your DLL functions in C, you need do no more than "export"
> it so that it is available to a calling application as C is case sensitive.
> A C DLL function skeleton looks like the following,
>
>   DLLEXPORT BOOL WINAPI MyDLLFunc(HWND hWnd, LPSTR lpszText)
>   {
>       /* Write your function code here */
>       return TRUE;
>   }
>
> With basic, you must do a little more work to ensure that the function name
> that occurs in the export list of the compiled DLL is in the form that you
> want. You use ALIAS to do this.
>
>   FUNCTION MyDLLFunc ALIAS "MyDLLFunc"(ByVal hWnd as LONG, _
>                      lpszText as ASCIIZ) EXPORT as LONG
>       ' Write your function code here
>   END FUNCTION
>
> In the application, if you want the DLL at startup, you declare it in your
> header file as follows,
>
>   DECLARE FUNCTION MyDLLFunc LIB "MYDLL.DLL" ALIAS "MyDLLFunc" _
>                    (ByVal hWnd as LONG, lpszText as ASCIIZ) as LONG
>
> In C, you put the following into your header file ensuring that the data
> types match your function call.
>
>   DLLIMPORT BOOL WINAPI MyDLLFunc (HWND, LPSTR);
>
> To check the exports in your compiled DLL, you use dumppe.exe to make sure
> that your function name has compiled the way you want it because if you get
> the case wrong, your app will not be able to find it when it is called.
>
> There are basically two ways that you load a DLL, either at startup as is
> done in Visual Basic or by using the LoadLibrary function. In terms of
> performance, there is little advantage in loading a DLL at startup as it
> increases the load time of the calling app to that of simply writing the
> code in the DLL into the app directly.
>
> The only reason why you bother if you have the option to do both is to use
> a common block of code between a number of different applications that you
> may wish to run at the same time.
>
> Using the LoadLibrary() function and the corresponding FreeLibrary()
> function gives you the capacity to only use the resources occupied by the
> DLL when you need its functions in the calling application.
>
> Encapsulated Applets
> ~~~~~~~~~~~~~~~~~~~~
> One of the smart things you can do with dynamic link libraries is write
> complete working applets within them so that you can call a complete
> "object" from your application. If for example, you write accountancy
> software, you may want a specialised calculator that not only does the
> normal number crunching but other profession specific functions.
>
> One of the trick ways of writing a DLL of this type is to write it as an
> EXE first because they are far easier to debug in that form. There are a
> couple of tricks involved in doing this as you need to be able to pass the
> application's "hInstance" and "hWnd" to it so you write the WinMain function
> with an eye to modifying it later.
>
> When you have the applet up and running as an executable file, you convert
> it to a DLL by first writing a DllMain or LibMain function as we have seen
> above. Then you modify the WinMain function in it so that it is a function
> call that is exported.
>
>   FUNCTION SmartCalc ALIAS "SmartCalc"(ByVal Instance as LONG, _
>                                        ByVal hParent as LONG) _
>                                        EXPORT as LONG
>
> Both Instance and hParent must be declared in the header file for the DLL
> so that they can be "seen" all over the DLL. You "assign" the passed
> Instance parameter to hInstance,
>
>   hInstance = Instance
>
> and your applet then uses the instance handle of the calling application.
>
> If the applet has a simple interface that does not call any dialog boxes,
> you may not need to use the hParent parameter but if you call any system
> dialog boxes you need to use the hParent rather than the applet's "hWnd"
> to avoid an unusual problem.
>
> The dialog boxes will run OK from the applet's hWnd but if you close the
> application without closing the applet first, it will GP fault on exit
> because the DLL applet no longer has a valid instance handle. You solve
> the trash on exit by closing down the applet with a,
>
>   SendMessage hWnd, WM_SYSCOMMAND, SC_CLOSE, 0
>
> called from the DLL_PROCESS_DETACH part of the DllMain or LibMain function
> but it will still GP fault on exit if you have a system dialog box displayed
> when the calling application is closed down. You solve this second problem
> by using the passed hParent for the dialog boxes instead of the applet's
> "hWnd" so that on closing the main application, any of its child windows
> are closed automatically.
>
> The win from re-writing the executable as an applet in a DLL is that you
> can pass far more complex data in a function call than you can pass on a
> command line to an executable and you can also get a return value from it.
>
> With some imagination, there are many things that you can do with dynamic
> link libraries, applets, your own custom controls and of course, functions
> packed in convenient ways for use by your applications.
>
> When you think of processors running at hundreds of megahertz, you would
> think that things should start to happen fast yet so much of what
> you see is not even as fast as it was in DOS or 16 bit windows. Trapped in
> the endless manipulation of interface components, a linear 32 bit processor
> is not given the opportunity to really perform but below the glitzy surface
> is enormous power waiting to be tapped.
>
> The Rocket Trip
> ~~~~~~~~~~~~~~~
> This is where the real fun stuff starts, no more "Mister Nice Guy", no
> compiler writer holding your hot little hand, just no holds barred use
> of the processor's native instructions at speeds that will leave the
> uninitiated breathless. This has the feel of getting out of your Sopwith
> Camel biplane and into the starship Enterprise, winding it up on impulse
> power and then flicking the switch for warp drive.
>
> There is an option available to warriors who are writing in C to write the
> following examples directly in an assembler environment if they have one
> available. You will need to check the literature of either TASM or MASM to
> get the details of how to set up an external module in terms of parameter
> passing and calling conventions but it is an attractive option in that you
> can use the macro capacity of a full assembler for later development of
> more complex code structures.
>
> You don't pay any penalty by writing modules in inline assembler except that
> the macro capacity is not available and this is how the following code is
> written.
>
> Data Types
> ~~~~~~~~~~
> If there is one thing that needs to be driven in with a large hammer, it is
> the need to understand the size of your data types. In 32 bit assembler you
> have three native data types,
>
>   Byte   DB = 00 ------------ hex
>   Word   DW = 00 00 --------- hex
>   Dword  DD = 00 00 00 00 --- hex
>
> You can think of each one as an appropriate size "can" for the right size
> piece of data. Higher level languages construct their data types out of
> these fundamental building blocks. C had at last count 59 data types but
> when you look in the header file, they reduce down to these three sizes.
>
> The Data Manipulation Instructions
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> We start exploring the built in capacity in the processor to move data
> around at high speed with a variation of the [ movs ] instruction, "movsb",
> move string in byte size chunks.
>
> This instruction has to be set up and used in conjunction with other
> instructions to perform its function of copying bytes from one location in
> memory to another. You place the number of items of the correct data size
> in the ecx register, load a pointer to the source block of memory into the
> source index [ esi ], load a pointer to the destination block of memory
> into the destination index [ edi ] and finally put the repeat prefix
> [ rep ] in front of the instruction.
>
>   mov ecx, numBytes         ; number of bytes to copy
>   mov esi, adrSource        ; pointer to source
>   mov edi, adrDestination   ; pointer to destination
>   rep movsb                 ; repeat byte copy while ecx not = zero.
>
> This combination decrements ecx for each byte copied until it is zero where
> it then exits the loop. You then have the contents of the Source copied into
> the Destination. To set this up in a working function, you must allocate
> two blocks of memory, one which has the source data in it and the second
> which is either the same size in bytes or larger. If it is not at least the
> same size, you will get a page fault from overwriting memory that your app
> does not own.
>
> In the following function, you pass a pointer to both blocks of memory and a
> long integer for the length. To re-write this function in C inline asm, you
> will need to pay attention to the passing of parameters to ensure that you
> are passing pointers to both source and destination. In basic, you use
> either the VarPtr() function or the StrPtr() function if your data is
> a dynamic string.
> ---------------------------------------------------------------------------
> FUNCTION MemCopyB(ByVal Source as LONG, _
>                   ByVal Dest   as LONG, _
>                   ByVal ln     as LONG) as LONG
>
>     ' Small mover, [ movsb ]
>     ' ~~~~~~~~~~~~~~~~~~~~~~
>
>     ! cld                ; copy forward in string
>     ! mov ecx, ln        ; copy byte count into ecx
>     ! mov esi, Source    ; copy source pointer into esi
>     ! mov edi, Dest      ; copy destination pointer into edi
>     ! rep movsb          ; repeat while ecx not zero, move string BYTE
>
>     ! sub ln, ecx        ; very limited value
>
>     FUNCTION = ln
>
> END FUNCTION
> ---------------------------------------------------------------------------
> This function, dubbed "small mover", will clock at a data transfer rate of
> about 17 Meg/Sec on a 166 MHz Pentium, more than fast enough for many
> applications moving data in sizes from bytes to kilobytes but for moving
> data in megabytes, we need to use its big brother [ movsd ] move string
> DWORD.
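
For anyone who wants to see what the counter and the two index registers are doing without firing up an assembler, the setup for "small mover" can be modelled in plain C. This is only a rough sketch of the behaviour described above, not the inline assembler itself and nowhere near as fast. The function name copy_bytes_model and the local variables named after the registers are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of "cld" followed by a repeated movsb.
   The locals ecx, esi and edi are ordinary variables that stand in
   for the registers of the same name. */
size_t copy_bytes_model(void *dest, const void *source, size_t count)
{
    const unsigned char *esi = (const unsigned char *)source; /* source index      */
    unsigned char *edi = (unsigned char *)dest;               /* destination index */
    size_t ecx = count;                                       /* counter           */

    assert(dest != NULL && source != NULL);

    while (ecx != 0) {       /* the repeat stops when the counter reaches zero */
        *edi++ = *esi++;     /* movsb: copy one byte, step both pointers on    */
        ecx--;               /* the counter drops by one per byte copied       */
    }
    return count - ecx;      /* same idea as "sub ln, ecx" in MemCopyB         */
}
```

As with the assembler version, the destination block must be at least count bytes long, the model does nothing to protect you from writing past the end of it.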
>
> This is where you take the gloves off and get serious, for only a small
> increase in overhead, you copy four bytes at a time instead of one. You have
> to solve the problem of byte lengths that are not equally divisible by 4 and
> this is done by producing a hybrid function that does the major data
> transfer in 4 byte chunks and cleans up the remaining bytes in one byte
> chunks.
>
> The following function passes the same parameters as the one above.
> ---------------------------------------------------------------------------
> FUNCTION MemCopyD(ByVal Source as LONG, _
>                   ByVal Dest   as LONG, _
>                   ByVal ln     as LONG) as LONG
>
>     ' Big mover, [ movsd ] BURP !
>     ' ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     LOCAL lnth as LONG
>     LOCAL divd as LONG
>     LOCAL rmdr as LONG
>
>     ! cmp ln, 4          ; if under 4 bytes long
>     ! jl tail            ; jump to label tail
>
>     ! mov eax, ln        ; copy length into eax
>     ! push eax           ; place a copy of eax on the stack
>
>     ! shr eax, 2         ; integer divide eax by 4
>     ! shl eax, 2         ; multiply eax by 4, length rounded down to 4s
>
>     ! mov divd, eax      ; copy it into variable
>     ! mov ecx, divd      ; copy variable into ecx
>     ! pop eax            ; retrieve length in eax off the stack
>     ! sub eax, ecx       ; subtract rounded length to get remainder
>     ! mov rmdr, eax      ; copy remainder into variable
>
>     ! cld                ; copy bytes forward
>     ! mov ecx, ln        ; put byte count in ecx
>     ! shr ecx, 2         ; divide by 4 for DWORD data size
>
>     ! mov esi, Source    ; copy source pointer into source index
>     ! mov edi, Dest      ; copy dest pointer into destination index
>     ! rep movsd          ; repeat while ecx not zero, move string DWORD
>
>     ! mov ecx, rmdr      ; put remainder in ecx
>     ! jmp over
>
>   tail:
>     ! mov ecx, ln        ; set counter if less than 4 bytes in length
>     ! mov esi, Source    ; copy source pointer into source index
>     ! mov edi, Dest      ; copy dest pointer into destination index
>
>   over:
>     ! rep movsb          ; copy remaining BYTES from source to dest
>
>     ! sub ln, ecx        ; calculate return value ( little use )
>
>     FUNCTION = ln        ' return bytes copied
>
> END FUNCTION
> ---------------------------------------------------------------------------
> Whereas "small mover" had a data transfer rate of about 17 Meg/Sec, "Big
> mover" clocks at about 42 Meg/Sec on a 166 MHz Pentium, figures that are
> only fantasies in high level languages. On a late model fast machine, you
> will easily see over a hundred Meg/Sec.
>
> Reading byte data is one of the useful functions that is often performed in
> computer software. We explore a variation of the [ scas ] instruction,
> "scasb".
>
> It is another built in loop instruction that has the following logic,
>
>   mov al, Search_Char    ; copy character to search for into al
>   mov ecx, lenString     ; load length of string to search into ecx
>   mov edi, lpStrng       ; copy pointer to the search string into edi
>   repne scasb            ; repeat while not equal, scan string BYTE
>
> This loop structure also decrements ecx until either a match is found
> between the search character and the character in the string where it then
> exits the loop or if no match is found it decrements to zero and exits the
> loop.
>
> Note that the data size of the search character MUST match the [ scas ]
> instruction size, "al" for scasb, "ax" for scasw and "eax" for scasd.
>
> The following function passes 4 parameters, a pointer to the string data,
> the length of the data, the character to scan for and the starting position
> in the string to start scanning from.
> ------------------------------------------------------------------------
> FUNCTION ScanString(ByVal lpStrng as LONG, _
>                     ByVal lnStr   as LONG, _
>                     ByVal Char    as BYTE, _
>                     ByVal sPos    as LONG) as LONG
>
>     ! inc lnStr          ; scan one extra byte so a match on the last
>                          ; char still leaves ecx above zero
>
>     ! mov eax, lnStr     ; copy length into eax
>     ! sub eax, sPos      ; shorten length by start pos
>     ! mov lnStr, eax     ; copy result into length
>
>     ! mov eax, lpStrng   ; set starting offset in string by
>     ! add eax, sPos      ; adding the starting position to it
>     ! mov lpStrng, eax   ; put sum back into variable
>
>     ! cld                ; scan forward in string
>     ! mov al, Char       ; copy "search" character into al
>     ! mov ecx, lnStr     ; set maximum count in ecx
>     ! mov edi, lpStrng   ; copy pointer into destination index
>     ! repne scasb        ; repeat if not equal, scan string BYTE
>
>     ! cmp ecx, 0         ; if no matches found
>     ! je zero            ; jump to zero
>
>     ! sub lnStr, ecx     ; subtract remaining count from string len
>     ! mov eax, lnStr     ; put value in eax
>     ! add eax, sPos      ; add starting pos to it
>     ! mov lnStr, eax     ; put result back into value
>     ! jmp TheEnd
>
>   zero:
>     ! mov lnStr, 0       ; set return value to zero if no match
>
>   TheEnd:
>
>     FUNCTION = lnStr
>
> END FUNCTION
> ------------------------------------------------------------------------
> This function returns a long integer which indicates where the match
> character is in the string, counting from one. It returns a zero if no
> match is found. If you bother to clock it, this one is no slouch either.
>
> The next function uses two separate mnemonics to read, compare and write
> data from one memory block to another. The two mnemonics are,
>
>   1. [ lods ] "Load String" with variations lodsb, lodsw, lodsd
>   2. [ stos ] "Store String" with variations stosb, stosw, stosd
>
> They work in much the same way as the previous mnemonics, you use ecx as
> the counter, you load pointers to data in the source index and the
> destination index and you use loop instructions to decrement the counter
> until it is zero where you exit the loop.
>
> They both use the accumulator register [ al, ax, eax ] in the appropriate
> data size to access the data from the source index and write it to the
> destination index, this is how it is accessible for processing by the
> programmer. [ lodsd ] reads a DWORD size piece of data from the address in
> the source index [ esi ] into eax and [ stosd ] writes a DWORD from eax to
> the address in the destination index [ edi ].
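
The load, process, store pattern described above can be sketched in C for readers who think more easily in that language. This is a rough model under stated assumptions rather than a translation of any function in this paper. The names transform_bytes, byte_fn and increment_byte are made up for the example, and the local variable al simply stands in for the accumulator register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: whatever you want done to each byte
   while it is sitting in the accumulator. */
typedef unsigned char (*byte_fn)(unsigned char);

/* Hypothetical C model of a lodsb / process / stosb loop. */
size_t transform_bytes(unsigned char *dest, const unsigned char *source,
                       size_t count, byte_fn process)
{
    size_t ecx;                            /* stands in for the counter */

    assert(dest != NULL && source != NULL && process != NULL);

    for (ecx = count; ecx != 0; ecx--) {
        unsigned char al = *source++;      /* lodsb: read a byte, step source on */
        al = process(al);                  /* your processing goes here          */
        *dest++ = al;                      /* stosb: write it, step dest on      */
    }
    return count;                          /* number of bytes written            */
}

/* Example processing step: add one to each byte, wrapping at 255. */
unsigned char increment_byte(unsigned char al)
{
    return (unsigned char)(al + 1);
}
```

Passing the processing step in as a function pointer is just a convenience for the sketch, in assembler you write the processing inline between the two instructions.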
>
> What you do with the data in the accumulator register between these two
> instructions depends on what you need to do.
>
> The following function uses the BYTE size instructions lodsb & stosb to
> read a string and convert it to either upper or lower case.
> ------------------------------------------------------------------------
> FUNCTION CaseTxt(ByVal Source as LONG, _
>                  ByVal Dest   as LONG, _
>                  ByVal ln     as LONG, _
>                  ByVal UorL   as LONG) as LONG
>
>     ! cld                ; read & write forward
>     ! mov ecx, ln        ; copy length into counter
>     ! shl ecx, 1         ; double it, dec and loopnz each take 1 off
>
>     ! mov esi, Source    ; move Source pointer into esi
>     ! mov edi, Dest      ; move Dest pointer into edi
>
>     ! cmp UorL, 0        ; zero for lower case
>     ! jne ctLoop2        ; other for upper case
>
>   ctLoop1:
>     ! lodsb              ; read BYTE from esi (source index)
>     ! cmp al, 65         ; bypass addition if below ascii 65
>     ! jl loc1
>     ! cmp al, 90         ; bypass addition if above ascii 90
>     ! jg loc1
>     ! add al, 32         ; add 32 to ascii for upper to lower
>   loc1:
>     ! stosb              ; write byte to edi (destination index)
>     ! dec ecx            ; decrement counter
>     ! loopnz ctLoop1     ; decrement again, loop while ecx not = zero
>
>     ! jmp outCase
>
>   ctLoop2:
>     ! lodsb              ; read BYTE from esi (source index)
>     ! cmp al, 97         ; bypass subtraction if below ascii 97
>     ! jl loc2
>     ! cmp al, 122        ; bypass subtraction if above ascii 122
>     ! jg loc2
>     ! sub al, 32         ; subtract 32 from ascii for lower to upper
>   loc2:
>     ! stosb              ; write byte to edi (destination index)
>     ! dec ecx            ; decrement counter
>     ! loopnz ctLoop2     ; decrement again, loop while ecx not = zero
>
>   outCase:
>
>     FUNCTION = UorL
>
> END FUNCTION
> ------------------------------------------------------------------------
> This function is no slouch either, it clocks at 10 Meg/Sec on a 166 MHz
> Pentium. The basic framework of this function can be used for a multitude
> of string replacement / stripping and similar functions as long as you work
> out a way to calculate the size of the destination memory block as it must
> be large enough to accept what is being written to it.
>
> It is worth noting that these "string" instructions are by no means limited
> to the higher level language data types of string. At this level you only
> have sequences of bytes and as long as you can supply a pointer to the data,
> you can read and write anything in whatever data size (DB DW DD) you find
> convenient.
>
> It is worth taking the time to write functions of the type we have addressed
> in a more convenient form for the particular language that you write in.
>
> If you are using C and you need string type data, you would normally
> allocate a zero terminated string within the function as your destination
> buffer, write your source data to it and pass it as the return value.
>
> This is one area where the old Cadillac is showing signs of age, zero
> terminated strings are leftovers from the days when you live patched
> a core dump in a crashed mainframe. The basic programmer has the option of
> allocating dynamic string data on the fly and this is how it should be done
> in basic with a function of the type we have addressed above.
>
> Pass your source string as a dynamic string, allocate the Destination
> buffer with Space$(len(source$)) and then write your data from the source
> index to destination index as we have done above. Then you pass it as the
> return value like normal.
>
> Depending on how your language handles addresses of allocated memory in a
> function, you may need to use the [ lea ] "load effective address" mnemonic
> to place the address of the destination buffer into the destination index
> [ edi ].
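
The C approach mentioned above, where the function allocates a zero terminated destination buffer, writes to it and passes it back as the return value, might look something like the following. It is a minimal sketch in plain C rather than inline assembler, so it shows the buffer handling and not the speed. The name upper_copy is made up for the example, the buffer comes from malloc() and it is assumed that the caller releases it with free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated upper case copy of a
   zero terminated string, or NULL if the allocation fails.
   The caller is assumed to free() the result. */
char *upper_copy(const char *source)
{
    size_t ln;
    size_t i;
    char *dest;

    assert(source != NULL);

    ln = strlen(source);
    dest = (char *)malloc(ln + 1);        /* room for the data plus the zero */
    if (dest == NULL)
        return NULL;

    for (i = 0; i < ln; i++) {
        char al = source[i];              /* read a byte                      */
        if (al >= 97 && al <= 122)        /* ascii 'a' to 'z' only            */
            al = (char)(al - 32);         /* subtract 32 for lower to upper   */
        dest[i] = al;                     /* write it to the destination      */
    }
    dest[ln] = '\0';                      /* zero terminate                   */
    return dest;
}
```

Because the destination is sized from the source before anything is written, the buffer is always large enough for what is being written to it.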
>
> What we have explored are the 486 level instructions for manipulating data
> in small, fast and convenient ways yet these instructions are the workhorses
> of data movement in current processors and will remain that way for some
> time to come.
>
> When written into your application, functions of this type will suck the
> headlights out of high level languages trying to do the same things while
> compiling at a far smaller size. This is where the programmer finds the holy
> grail.
>
> For the modern software warrior who has come on this journey recovering the
> knowledge that was almost lost to the current generation, this is end game
> where you apply the "sting". While many suffering the RAD packages are
> riding a one way ticket to oblivion, your recovery of these "ancient" arts
> puts you in the driver's seat for the next generation of software.
>
> Intel mnemonic cracking
> ~~~~~~~~~~~~~~~~~~~~~~~
> There is a technique for cracking Intel mnemonics that would not be widely
> known if you didn't have at least a couple of bottles of malt handy. You
> also need an object heavy enough to hold the left button of the mouse down.
> An old Pentax camera does the job fine.
>
> Armed with this "cracking" kit, you can undertake Intel mnemonic cracking
> by following the instructions. Load any one of the three Intel PDF files
> into Acrobat Reader, start dragging downwards with the mouse. As it starts
> to scroll downwards, sit the base of the camera on the left button of the
> mouse and prop up the camera so that it does not move with the two bottles
> of malt.
>
> Once this is going, you can sit back and watch it until it reaches the end
> of the file which takes just about forever but at least you are not cramping
> your hand holding the button down. When it eventually gets to the end which
> seems like eternity, copy it to the clipboard and save it into a text file.
>
> About the only thing that can go wrong with this "cracking" technique is if
> you forget while you are watching it scroll away and reach for one of the
> bottles of malt. This will almost exclusively result in the camera slipping
> off the mouse button and you have to start again.
>
> With some judicious use of the search and replace function in a good, BIG
> text editor, you can get this mess into reasonably readable form without
> having to use that absolutely disgusting Adobe PDF format Acrobat Reader.
>
> Perform this "crack" on the other two Intel files and you have a programming
> manual that will not go out of date quickly. This is a point not to be lost
> after having shovelled through other language manuals over time.
>
> Whereas most languages have changed in almost unintelligible ways over a
> long period, the assembler of 15 years ago which you can still find on the
> net looks very much the same as the current 32 bit assembler. In terms of
> time scale, this is like digging up an ancient archaeological site and
> finding an inscription on the wall that you can read.
>
> The main difference is that 32 bit assembler is not cursed with the problems
> and restrictions of segment arithmetic and is both much clearer and far more
> powerful.
>
> Back Through The Warp
> ~~~~~~~~~~~~~~~~~~~~~
> For this ancient warrior, the warp is starting to waver and I must return
> for a while to do those things that ancient warriors do, rescue the odd
> damsel in distress, lay siege to a castle or two, wind up the ancient chariot
> to go where many have gone before and yes, write some more code.
>
> In your hands is Excalibur, how you wield it will shape the world of
> computing to come and in only a short time in real world terms, you will
> yourselves become one of the ancient warriors.
>
> In recovering these near lost arts, you have re-established the connection
> between the past and the future which had almost been lost in the present.
> The tools of the future are built in the present from the knowledge of the
> past.
>
> Wield your Excalibur well as the ancient warriors often roam the corridors
> of the Internet in search of clever things and their eye will not miss the
> work of another warrior who has maintained their professional skill and
> integrity by refusing to sell out to the wide and easy path that leads to
> oblivion.
>
> END GAME
>