Programming and Reverse-Engineering on the PC - Basic concepts © 1998 Icedragon of MiB

Programming and Reverse-Engineering on the PC
Basic concepts

© 1998 by Icedragon / MiB

Table of Contents


Chapter I	Introduction
Chapter II	Programming / developing on the PC
Chapter III	Reverse-Engineering on the PC
Chapter IV	Some final words...

Chapter I - Introduction

In this text I will also explain some of the basic concepts of program development, because I think it is necessary (and very useful) for you to understand the whole process in order to reverse-engineer existing programs. If you don't agree, or think you are already familiar with these things, this tutorial may not be the right one for you to read. It is written only for those who have had just slight programming and/or reverse-engineering experience on the PC so far, and thus lack some of the general understanding needed for cracking.

Chapter II - Programming / developing on the PC

Generally, we have to distinguish between two levels of programming languages: machine-level and high-level languages. There is only one language a computer can really understand, and it is called machine-language. Machine-language consists only of binary numbers and would be VERY difficult to program in directly, so the Assembler-language (ASM for short) was invented. ASM is nearly the same as machine-language, except that you write real commands and syntax as TEXT and not just binary numbers. Because ASM is merely a readable form of machine-language, it lets you write the fastest programs: every ASM command stands for one machine instruction, so the assembler only swaps the text for the matching numbers and no extra "translation" layer gets in the way. Both machine-language and ASM are considered machine-level languages, in contrast to high-level languages like Basic, C, Pascal etc.

When you write a program, you have to use a language-specific compiler and a linker to translate the source-code into executable machine-language. The compiler takes the source-code and translates it into an object file, which is an intermediate state of the program. Intermediate refers to the fact that this isn't an executable yet, but it's not a program in the source language anymore either. Finally, the linker generates the executable program (which usually has the extension .exe) from one or more object files.
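
To make the relationship between ASM text and machine-language numbers a bit more concrete, here is a minimal sketch in Python. The four opcodes are real single-byte x86 instructions, but the OPCODES table and the assemble() function are my own hypothetical helpers and cover only these few commands; a real assembler also handles operands, labels and a lot more.

```python
# Hypothetical mini-assembler for illustration only.
# The byte values are real single-byte x86 opcodes.
OPCODES = {
    "nop": 0x90,   # do nothing
    "ret": 0xC3,   # near return from a procedure
    "int3": 0xCC,  # breakpoint trap used by debuggers
    "hlt": 0xF4,   # halt the processor
}


def assemble(source: str) -> bytes:
    """Translate one mnemonic per line into its machine-language byte."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split(";")[0].strip().lower()  # drop ASM comments
        if not mnemonic:
            continue  # skip blank lines
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown instruction: {mnemonic!r}")
        out.append(OPCODES[mnemonic])
    return bytes(out)


program = assemble("nop\nnop\nret")
print(program.hex(" "))  # 90 90 c3
```

Each line of text turns into exactly one number, which is why ASM can be seen as just a readable spelling of machine-language.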

Here is the main process for developing a program:

1. writing / editing the source-code
2. compiling the source-code to an object-file
3. linking all relevant object-files to an executable
4. debugging / testing the program

If any error occurs at any point from 1. to 4., you may have to run through the whole process again.

Example:
Because you mistyped a command in your source-code, the compilation aborts with a syntax error. You have to edit the source-code to fix the syntax error, then start compiling again, and so on.
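
The edit / compile / test loop can be sketched in a few lines of Python. This is only a stand-in under a loud assumption: Python's built-in compile() produces bytecode, not an object file, and there is no separate linker step, so the sketch shows just the "syntax error, fix, compile again" part. The try_compile() helper and the variable names are hypothetical.

```python
def try_compile(source: str):
    """Return (code_object, None) on success or (None, message) on a syntax error."""
    try:
        return compile(source, "<example>", "exec"), None
    except SyntaxError as err:
        return None, f"line {err.lineno}: {err.msg}"


broken = "total = 1 +\n"    # mistyped: the second operand is missing
fixed = "total = 1 + 2\n"   # the edited source-code

# First attempt: the compilation aborts and no program is produced.
code, error = try_compile(broken)
assert code is None and error is not None

# After fixing the source-code we run through the process again.
code, error = try_compile(fixed)
assert error is None

# "Testing" step: run the compiled program and check its result.
namespace = {}
exec(code, namespace)
print(namespace["total"])  # 3
```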

Chapter III - Reverse-Engineering on the PC

What is Reverse-Engineering?

First of all, you will ask yourself what the heck "Reverse-Engineering" means... Well, when you develop a program you are "engineering", which means you write the source-code, build the object files, link the executable etc. Thus, when you reverse-engineer you do it exactly the other way round. You want to modify a program, but all you have is the final product of the developing/engineering process, which is the .exe file (plus any data files, online-manuals etc. that belong to the final program version). So in order to modify the existing program without its source-code, you have to "reverse-engineer" it, which means that you generate source-code from the executable (in contrast to generating an executable from source-code). You can do this either by disassembling the exe-file, modifying the resulting source-code and then reassembling it - or, for less complex reverse-engineering tasks, by using a debugger like Soft-Ice to analyze the desired code-part and patching the exe-file afterwards.

Getting the source-code

Now the problem with generating source-code from an executable is that every compiler for a high-level language has its own way of translating its specific commands into machine-language. This would require a different high-level language "disassembler" for every different compiler, and since this kind of disassembler is quite difficult to program, it is seldom used as a solution. But as you should remember from the second chapter, ASM is just a readable form of machine-language. This implies that it's quite easy to convert an executable (machine-language) into ASM source-code, and thus several disassemblers for machine-language to ASM source-code conversion exist. Of course it is a lot more difficult to edit programs in ASM than in a high-level language like C, but on the other hand you can reverse-engineer ANY program with this disassembling method (which is quite a deal, I think). As always, since there are quite a few of these disassemblers, you have to choose the right one for your task. The right choice depends mainly on the environment the target program runs under, so the best choice for Windows programs would be, i.m.h.o., WDASM (I am currently using version 8.xx), which detects calls to Win-APIs, string references and a whole lot more ... For DOS programs I would use the latest versions of SOURCER or IDA.
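
Here is a minimal sketch, in Python, of what a disassembler does at its core: look up each number and print the matching ASM text. Note the assumptions: the MNEMONICS table and the disassemble() function are hypothetical and know only four real single-byte x86 opcodes, and every byte is treated as one complete instruction. Real x86 instructions have variable length, which is exactly what tools like the ones named above take care of.

```python
# Hypothetical toy disassembler for illustration only.
# The byte values are real single-byte x86 opcodes.
MNEMONICS = {
    0x90: "nop",   # do nothing
    0xC3: "ret",   # near return from a procedure
    0xCC: "int3",  # breakpoint trap used by debuggers
    0xF4: "hlt",   # halt the processor
}


def disassemble(code: bytes, base: int = 0) -> list:
    """Return one listing line (address, raw byte, ASM text) per byte."""
    lines = []
    for offset, byte in enumerate(code):
        # Bytes we do not know are listed as plain data ("db").
        text = MNEMONICS.get(byte, f"db 0x{byte:02X}")
        lines.append(f"{base + offset:08X}  {byte:02X}  {text}")
    return lines


for line in disassemble(bytes([0x90, 0xCC, 0xC3]), base=0x1000):
    print(line)
# 00001000  90  nop
# 00001001  CC  int3
# 00001002  C3  ret
```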

Debug & Patch

Like I said before, this method is only useful for small modifications to the target program (like ... ehm ... cracking?). And even then it's sometimes useful or even necessary to have a disassembled listing of the source-code within reach, so you don't get lost in the masses of gloomy ASM-code. If you are an ASM-buff, know your way around debugging and don't mind spending nights in front of a dark screen (and maybe want to get rid of your girlfriend?) then this is JUST THE THING for you! But hell, it's a lot of fun, a big challenge and a fair one as well (sometimes at least). A great debugger for this task is Soft-Ice from NuMega. I wrote a small basic tutorial (MEEP! MEEP! self-promotion detected!) for this precious tool, which is also available on the MiB-HP. Go find Soft-Ice on Caligo's Page (take a look at the Links on our MiB-Homepage); you will find everything you need there (along with tons of other topic-related tuts).

After you have found the desired routine in the target program, having spent hours in front of your debugger, you have to figure out a proper modification in ASM to fix the bug (or whatever you wanted to do to the target) and add this modification to the exe-file. This is done by translating the new commands into machine-language, which should be no problem for you if you are familiar with ASM (just take the opcodes), and inserting the result at the correct location in the exe-file (here you often need a disassembled listing). Note that inserting code changes the file length, which may cause trouble if the target checks its own file length or a checksum, and you have to be careful not to mess up the program flow with wrong addressing too. A much safer method of patching is to overwrite the unwanted code parts with new code. The major disadvantage is that the new code has to match the exact length of the code you want to overwrite.
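
The overwrite method can be sketched like this in Python. Everything here is an assumption for illustration: the patch() helper is hypothetical, the four-byte "image" is made up rather than taken from any real program, and padding a shorter replacement with NOPs (opcode 0x90) is one common way to meet the exact-length requirement. The opcodes themselves are real: 74 is a short JZ and EB is a short unconditional JMP.

```python
NOP = 0x90  # x86 "do nothing" opcode, used as filler


def patch(image: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Overwrite `expected` at `offset` with `replacement`, keeping the length."""
    if len(replacement) > len(expected):
        raise ValueError("replacement is longer than the code it overwrites")
    if image[offset:offset + len(expected)] != expected:
        raise ValueError("bytes at this offset are not the code we expected")
    # Fill the leftover space with NOPs so no following byte changes position.
    padded = replacement + bytes([NOP]) * (len(expected) - len(replacement))
    return image[:offset] + padded + image[offset + len(expected):]


# Made-up code fragment: nop / jz +5 / ret
image = bytes.fromhex("90 74 05 c3")

# Turn the conditional jump (74 05) into an unconditional one (EB 05).
always_jump = patch(image, 1, bytes.fromhex("74 05"), bytes.fromhex("eb 05"))
print(always_jump.hex(" "))  # 90 eb 05 c3

# Or remove the jump completely: both bytes become NOPs.
never_jump = patch(image, 1, bytes.fromhex("74 05"), b"")
print(never_jump.hex(" "))  # 90 90 90 c3
```

Because the patched image has the same length as the original, a file length check still passes and no addresses move; a real checksum over the contents would of course still notice the change.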

Chapter IV - Some final words...

I know I didn't cover some topics, but I was a bit undecided about what I should put in here and what would be better left out. It would be nice to get your feedback, positive or negative, if you find I neglected something important or went way too deep into useless stuff. Maybe then I will be able to do another version of this tut, although it's more an essay than a tutorial.

For comments, complaints or blackmail, drop me a line at: icedragon@thevortex.com
