Reverse Engineering

Reverse Engineering : For Loops, Switch statements and their representation

In this blog, we will be looking at how the loops are represented in lower assembly language. Here we will take some simple to complex programs in Higher level language like C/C++ and compile them to create an executable. We will then disassemble the executable to analyse how different loops are represented or optimized by the compiler. After that we will be able to easily identify the loops used in complex program.

For Loop

First we will take a simple program which takes an integer as input and prints the statement “I like to learn how things work” that many times. The code snippet is given below:

#include 
#include 
int main() { 
 int n; 
 int i=0; 
 printf("Enter n?"); 
 scanf("%d",&n); 
 for(i=0;i<n;i++) { 
 printf("I like to learn how things work\n"); 
 } 
 return 0;
}

Now we will compile the program using any available compiler and load the executable in IDA. The following is the graph generated when the compiled executable is loaded in the IDA.

wpforloop

At first the base pointer is pushed to stack for further usage and the stack pointer is copied to the base pointer. The next statement sub rsp, 10h is used to move the stack pointer downward by 16 to accommodate required variables. Instruction mov [rbp+var_4], 0 is used to initialize 0 to a variable, in our case the variable is ‘i’. The offset of the memory location for format where the string “Enter n?” is stored is loaded to rdi i.e. lea rdi, format. Then the printf method is called to display the format string and a integer variable is read from the buffer by calling scanf method. Then the program makes a jump to the address lOC_74B using jmp instruction. Till now our program asked for a integer value and stored it.

The execution of for loop begins at LOC_74B. The second variable i.e. ‘n’ is moved to the accumulator(EAX). The value of ‘i’ is compared with value of ‘n’ , cmp [rbp+var4], eax. If the value of [rbp+var_4] or ‘i’ is less than the value of [rbp+var_8] or ‘n’ the program makes a jump to the location LOC_73B else the program sets EAX as 0 and terminates.

On LOC_73B, the relative address of s is loaded in rdi. Now rdi points to the string “i like to learn how things work”. Then puts method is called, the rdi is passed as a parameter to the puts method. The value of [rbi+var_4] is incremented by 1. The program iterates until the condition at LOC_74B is false. The cmp instruction in LOC_74B is the condition part of for loop and add instruction in LOC_73B is the increment part of the for loop.

for ( initialization, condition, increment) ;

Nested Loop

Now we will take a look at nested for loop. For that we will consider the following C program :

#include 
int main() {
 int m,n;
 int i=0,j=0;
 printf("Enter m?");
 scanf("%d",&m);
 printf("Enter n?");
 scanf("%d",&n);
 for(i=0; i<m; i++) {
    for(j=0; j<n; j++) {
        printf("I like to learn how things work\n");
    }
 }
 return 0;
}

The graph generated by IDA :

wpforloop2

Here execution of for loop starts at location LOC_790. The value of [rbp+var_C] i.e. ‘m’ in our C code is copied to the accumulator EAX. It is then compared with [rbp+var_4] i.e. ‘i’. If the value of [rbp+var_4] is less than value of [rbp+var_C] the program makes a jump to LOC_76B else the program terminates. On LOC_76B [rbp+var_8] i.e. ‘j’ in our program is assigned to 0 and the program makes a jump to location LOC_784. (Notice we are now on the inner for loop). Now the value of [rbp+var_10] i.e. ‘n’ is moved onto EAX and compared with [rbp+var_8] i.e. ‘j’. If [rbp+var_8] is not less than [rbp+var_10] the program increments the value of [rbp+var_4] i.e. ‘i’ and reiterates from the outer loop i.e LOC_790 else the program jumps to LOC_774 and prints the statement “I like to learn how things work” , increments [rbp+var_8] i.e ‘j’ and reiterates from the inner loop i.e LOC_784.

Switch Statement

Again we will take a simple program which takes an integer as input and determines the input value using switch statement. We will be using only 2 case statements for better understanding.

#include 
using namespace std;
int main()
{
 int x=0;
 printf("Enter value for x?");
 scanf("%d",&x);
 switch (x) {
    case 1: printf("Input is 1");
            break;
    case 2: printf("Input is 2");
            break;
    default: printf("Unknown Input");
 }
 return 0;
}

After compiling and loading the executable in IDA, we get the following graph overview.

wpswitch1

The instructions before cmp eax, 1 in this case are similar to that of the first program. It reads an integer and stores it at [rbp+var_4]. Which means [rbp+var_4] holds the value of ‘x’ in our case. All the case statements are represented in different location blocks LOC_831,LOC_844,LOC_857 as shown in the above snap. The program then compares the value of [rbp+var_4] stored in EAX with 1(since 1 is our first case). If the comparison succeeds the case instructions are executed from their respective location else if comparison fails the EAX is then compared with 2. The jz instruction used jumps to the location LOC_857 if the accumulator is 0. Else it will execute the default block present in location LOC_857.

*case blocks in graph view are placed in sequential manner

Now let us take same program which has many case statements. The code snippet used is given below:

#include 
using namespace std;
int main()
{
 int x=0;
 printf("Enter value for x?");
 scanf("%d",&x);
 switch (x) {
    case 1: printf("Input is 1");
            break;
    case 2: printf("Input is 2");
            break;
    case 3: printf("Input is 3");
            break;
    case 4: printf("Input is 4");
            break;
    default: printf("Unknown Input");
    }
 return 0;
}

The graph view presented by IDA after loading the executable for the above program :

wpswitch2

In the previous example of switch with only two case statement we see the case blocks in graph view were placed in sequential manner. But now when there are many case statements the case location blocks are not sequential any more. The compiler won’t check for the cases sequentially. The compiler will instead implement binary search algorithm to reduce the complexity of the program. In this case the value at [rbp+var_4] is first compared with 2. If satisfied the instruction on location LOC_855 are executed else the value is again compared against 2(cmp eax, 2) and if it is greater than 2 [rbp+var_4] is checked against 3 else if it is smaller than 2 [rbp+var_4] is checked against 1.

We have seen the various representation of loops and switch statements in this article. The while loop is also similar to for loop except the increment takes place in a different block in a while loop. We also looked at how nested for loops are represented in lower assembly languages. We also noticed that if there are less number of case statement in a switch the compiler compiles them in sequence and if there are many case statements compiler makes use of binary search algorithm to optimize the efficiency.

Buy me a coffeeBuy me a coffee

2 thoughts on “Reverse Engineering : For Loops, Switch statements and their representation

  1. Pingback: WOW Blog

Leave a Reply

Your email address will not be published. Required fields are marked *