Getting started with Reverse Engineering binaries Part 2: A closer look

Part 1 introduced the basic idea of what a binary is and how it is mapped in memory. This part continues on that path, by looking at registers and how data is represented at the processor level. The following focuses on Intel architecture.

A Note On Registers

For a program to execute, the processor needs somewhere to hold the operands, addresses and intermediate results it is currently working on. This is where registers come in.

A register is the fastest location available to the CPU for storing and retrieving information.

x86 systems running Intel processors typically come with 16 basic program registers used in general system and application programming.

They are grouped into 4 categories: 8 general purpose registers, 6 segment registers, the EFLAGS register and the EIP register.

General Purpose Registers

These are 8 registers for storing operands and pointers. Their special uses can be listed as:

eax – accumulator for operands and results data
ebx – pointer to data in the DS segment
ecx – counter for string and loop operations
edx – I/O pointer
esi – pointer to data in the segment pointed to by the DS register; source pointer for string operations
edi – pointer to data in the segment pointed to by the ES register; destination pointer for string operations
esp – stack pointer in the SS segment
ebp – pointer to data on the stack in the SS segment

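Parts of these 32-bit registers can also be addressed by name. The lower 16 bits of eax, ebx, ecx and edx are ax, bx, cx and dx, and each of those splits further into a high byte (ah, bh, ch, dh) and a low byte (al, bl, cl, dl). For example, the lower 2 bytes of eax are addressed as ax, the upper byte of ax as ah and the lower byte of ax as al. The remaining four registers expose only their lower 16 bits in 32-bit mode, as si, di, sp and bp.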
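The aliasing described above can be sketched in Python. This is only a model of how the names overlap, not processor code; the function name `sub_registers` and the sample value are made up for illustration.

```python
def sub_registers(eax: int) -> dict:
    """Return the ax, ah and al views of a 32-bit eax value."""
    eax &= 0xFFFFFFFF      # keep only the 32 bits that fit in eax
    ax = eax & 0xFFFF      # lower 2 bytes of eax
    ah = (ax >> 8) & 0xFF  # upper byte of ax
    al = ax & 0xFF         # lower byte of ax
    return {"eax": eax, "ax": ax, "ah": ah, "al": al}


views = sub_registers(0x12345678)
print({name: hex(value) for name, value in views.items()})
# prints {'eax': '0x12345678', 'ax': '0x5678', 'ah': '0x56', 'al': '0x78'}
```

The upper 16 bits of eax (0x1234 here) have no name of their own; they can only be reached through the full register.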

**General-Purpose Register alternate names**

In 64-bit mode, there are 16 general purpose registers with a default operand size of 32 bits. These registers are able to work with either 32-bit or 64-bit operands.

In addition to the 8 listed above, the additional registers available in 64-bit mode are R8D through R15D. If a 64-bit operand size is specified, the general purpose registers are prefixed by R instead of E, i.e. eax becomes rax, and the additional registers are named R8 through R15. They can then be accessed at the byte, word, doubleword and quadword level.

Segment Registers

The six segment registers each hold a 16-bit segment selector, a special pointer that identifies a segment in memory.

CS – Contains segment selector for the code segment, where the current instructions being executed are stored.
DS, ES, FS and GS – Point to four data segments. Having separate data segments allows different types of data structures to be accessed in an efficient and secure manner. For example, there might be one segment for the data structures of the current module, another for data exported from a higher-level module and another for data shared with another program.
SS – Contains the segment selector for the stack segment, where the procedure stack of the currently executing program or task is stored. All stack operations use the SS register to find the stack segment.

EFLAGS Register

In 32-bit mode, this register contains a group of status flags, a control flag, and a group of system flags. When the processor is initialised, the state of the EFLAGS register is 00000002H. There are no instructions that allow the whole register to be examined or modified directly.

Instead, some flags can be modified directly using special-purpose instructions. Some of the bits ( 1, 3, 5, 15, 22 – 31) are reserved and their state should not be depended upon.

To move groups of flags to and from the procedure stack or the EAX register, the LAHF, SAHF, PUSHF, PUSHFD, POPF, and POPFD instructions can be used. Using the processor’s bit manipulation instructions (BT, BTS, BTR, BTC) the flags can be examined and modified after they have been moved to the EAX register or to the procedure stack.
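As a rough illustration of what examining and modifying flags looks like once a copy of EFLAGS sits in a general purpose register, here is a minimal Python sketch. The helpers `bt`, `bts`, `btr` and `btc` are simplified stand-ins named after the instructions: they return a value instead of reporting the tested bit through CF as the real instructions do. The bit positions for CF and ZF are taken from the flags table later in this article.

```python
CF_BIT = 0  # carry flag position
ZF_BIT = 6  # zero flag position
MASK32 = 0xFFFFFFFF


def bt(value: int, bit: int) -> int:
    """Bit test: return the selected bit (0 or 1)."""
    return (value >> bit) & 1


def bts(value: int, bit: int) -> int:
    """Bit test and set: return value with the selected bit set."""
    return (value | (1 << bit)) & MASK32


def btr(value: int, bit: int) -> int:
    """Bit test and reset: return value with the selected bit cleared."""
    return value & ~(1 << bit) & MASK32


def btc(value: int, bit: int) -> int:
    """Bit test and complement: return value with the selected bit flipped."""
    return (value ^ (1 << bit)) & MASK32


eflags_copy = 0x00000002                   # state after processor initialisation
eflags_copy = bts(eflags_copy, ZF_BIT)     # pretend the last result was zero
print(hex(eflags_copy), bt(eflags_copy, ZF_BIT))  # prints 0x42 1
```

Note that this only changes the copy; on a real processor the modified value would still have to be written back, for example with POPFD.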

When suspending a task, for example during multitasking, the processor automatically saves the state of the EFLAGS register in the task state segment (TSS).

When binding to a new task, the processor loads the EFLAGS register with data from the new task’s TSS. The state of this register is also automatically saved in TSS in the event of an interrupt or an exception.

The flags register has the following categories for the various flags:

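Status Flags
- indicate the results of arithmetic instructions
- only the CF flag can be modified directly (using the STC, CLC and CMC instructions)
Control Flag (DF)
- controls the direction in which string instructions process memory
System Flags and IOPL field
- control operating-system or executive operations
- should not be modified by application programs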

The table below lists in detail the flags, their bit positions and a brief description of their functions.

Status Flags	Bit Position	Description
Carry Flag(CF)	0	Set if the arithmetic operation generates a carry or a borrow out of the most significant bit of the result; cleared otherwise. The flag indicates an overflow condition for unsigned-integer arithmetic
Parity Flag(PF)	2	Set if the least-significant byte of the result contains an even number of 1 bits.
Auxiliary Carry Flag (AF)	4	Set if the arithmetic operation generates a carry or a borrow out of bit 3 of the result; otherwise cleared. Used in binary-coded decimal (BCD).
Zero Flag (ZF)	6	Set if the result is zero; otherwise cleared.
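Sign Flag (SF)	7	Set equal to the most significant bit of the result, which is the sign bit of a signed integer. 0 indicates a positive value; 1 a negative value.
Overflow Flag (OF)	11	Set if the integer result is too large a positive number or too small a negative number (excluding the sign bit) to fit in the destination operand; cleared otherwise. The flag indicates an overflow condition for signed-integer arithmetic.
Control Flag	Bit Position	Description
Direction Flag (DF)	10	Controls string instructions (MOVS, CMPS, SCAS, LODS and STOS). When set, string instructions auto-decrement (process from high addresses to low); when clear, they auto-increment. Set and cleared by STD and CLD respectively.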
System Flags	Bit Position	Description
Trap Flag (TF)	8	Set to enable single-step mode for debugging; clear to disable single-step mode.
Interrupt Enable Flag (IF)	9	Controls the response of the processor to maskable interrupt requests. Set to respond to maskable interrupts; cleared to inhibit maskable interrupts.
I/O privilege level field (IOPL)	12 & 13	Indicates the I/O privilege level of the currently running program or task. The current privilege level (CPL) of the currently running program or task must be less than or equal to the I/O privilege level to access the I/O address space. The POPF and IRET instructions can modify this field only when operating at a CPL of 0.
Nested Task Flag (NT)	14	Controls the chaining of interrupted and called tasks. Set when the current task is linked to the previously executed task; cleared when the current task is not linked to another task.
Resume Flag (RF)	16	Controls the processor’s response to debug exceptions.
Virtual-8086 Mode Flag (VM)	17	Set to enable virtual-8086 mode; clear to return to protected mode without virtual-8086 mode semantics.
Alignment Check/ Access Control Flag (AC)	18	If the AM bit is set in the CR0 register, alignment checking of user-mode data accesses is enabled if and only if this flag is 1.
Virtual Interrupt Flag (VIF)	19	Virtual image of the IF flag. Used in conjunction with the VIP flag.
Virtual Interrupt Pending Flag (VIP)	20	Set to indicate that an interrupt is pending; clear when no interrupt is pending. Set and cleared by the software; processor only reads it. Used in conjunction with the VIF flag.
Identification Flag (ID)	21	The ability of a program to set or clear this flag indicates support for the CPUID instruction. CPUID returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers. The instruction’s output is dependent on the contents of the EAX register upon execution.
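
To make the status flag descriptions in the table concrete, the following Python sketch computes the six status flags for an addition. It is a model written for this article, not how the processor is implemented; the function name `add_flags` and the default 8-bit operand size are assumptions chosen to keep the numbers small.

```python
def add_flags(a: int, b: int, bits: int = 8) -> dict:
    """Model the status flags produced by adding two `bits`-wide operands."""
    mask = (1 << bits) - 1   # all ones for the operand size
    sign = 1 << (bits - 1)   # position of the most significant bit
    a &= mask
    b &= mask
    full = a + b             # result before truncation
    result = full & mask     # what fits in the destination operand
    return {
        "result": result,
        # carry out of the most significant bit (unsigned overflow)
        "CF": int(full > mask),
        # even number of 1 bits in the least-significant byte
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
        # carry out of bit 3
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),
        "ZF": int(result == 0),
        # copy of the most significant bit of the result
        "SF": int(result & sign != 0),
        # both operands share a sign that the result does not (signed overflow)
        "OF": int((a ^ result) & (b ^ result) & sign != 0),
    }


print(add_flags(0x7F, 0x01))
# prints {'result': 128, 'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 1, 'OF': 1}
```

The example shows the difference between CF and OF: 0x7F + 0x01 fits in 8 bits as an unsigned value, so CF stays clear, but as a signed value 127 + 1 does not fit, so OF is set. Adding 0xFF and 0x01 gives the opposite picture, with CF and ZF set and OF clear.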

EIP Register

Also known as the instruction pointer, the EIP register contains the offset in the current code segment for the next instruction to be executed.

It is advanced from one instruction boundary to the next in straight-line code or it is moved ahead or backwards by a number of instructions when executing JMP, Jcc, CALL, RET, and IRET instructions.

The EIP register cannot be accessed directly by software. It is controlled implicitly by control-transfer instructions (JMP, Jcc, CALL and RET), interrupts and exceptions.

The only way to read it is to execute a CALL instruction and then read the value of the return instruction pointer from the procedure stack.

Loading the register can only be done indirectly by modifying the value of a return instruction pointer on the procedure stack and executing a return instruction (RET or IRET).
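
The indirect read and write described above can be pictured with a toy model. The class `ToyCpu`, its addresses and the default instruction length of 5 bytes (the size of a near relative CALL encoding) are all illustrative assumptions; a real processor also updates esp and works on memory rather than a Python list.

```python
class ToyCpu:
    """A very small model of EIP and the procedure stack."""

    def __init__(self, eip: int):
        self.eip = eip
        self.stack = []

    def call(self, target: int, instruction_length: int = 5) -> None:
        # CALL pushes the address of the instruction that follows it,
        # then transfers control to the target.
        self.stack.append(self.eip + instruction_length)
        self.eip = target

    def ret(self) -> None:
        # RET pops the return instruction pointer into EIP.
        self.eip = self.stack.pop()


cpu = ToyCpu(eip=0x1000)
cpu.call(0x2000)
print(hex(cpu.stack[-1]))   # reading EIP indirectly: prints 0x1005

cpu.stack[-1] = 0x3000      # overwrite the return instruction pointer
cpu.ret()
print(hex(cpu.eip))         # loading EIP indirectly: prints 0x3000
```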

In 64-bit Mode

The EFLAGS register is extended to 64 bits and is called RFLAGS. The upper 32 bits of RFLAGS are reserved. The lower 32 bits are the same as EFLAGS.

The instruction pointer register is RIP. This register holds the 64-bit offset of the next instruction to be executed.

Processor Instructions

Instructions on the Intel platform have the format:

label: mnemonic argument1, argument2, argument3

label is an identifier followed by a colon
mnemonic is a reserved name for a class of instruction opcodes which have the same function
argument1, argument2 and argument3 are optional; there may be zero to three operands depending on the opcode
operands can be reserved names of registers or names assumed to be assigned to data items declared in another part of the program
when two operands are present in an arithmetic or logical instruction, the right operand is the source and the left operand is the destination, e.g.

loadreg: mov eax, subtotal

Here loadreg is the label, mov is the mnemonic identifier of an opcode, eax is the destination operand and subtotal is the source operand. In AT&T syntax, the source and destination are written in the reverse order.
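
To show how the pieces of the format fit together, here is a small Python sketch that splits a line of Intel-syntax assembly into label, mnemonic and operands. The function name `parse_instruction` is made up for this article, and the parsing is deliberately naive: it assumes a non-empty line, treats a label as a first token ending in a colon, and does not understand comments or bracketed memory operands that contain commas.

```python
def parse_instruction(line: str) -> dict:
    """Split 'label: mnemonic arg1, arg2, arg3' into its parts."""
    text = line.strip()
    label = None

    # A label is an identifier followed by a colon at the start of the line.
    first_token = text.split(None, 1)[0]
    if first_token.endswith(":"):
        label = first_token[:-1]
        text = text[len(first_token):].strip()

    # The mnemonic is the next word; everything after it is the operand list.
    parts = text.split(None, 1)
    mnemonic = parts[0] if parts else None
    operands = []
    if len(parts) > 1:
        operands = [operand.strip() for operand in parts[1].split(",")]

    return {"label": label, "mnemonic": mnemonic, "operands": operands}


parsed = parse_instruction("loadreg: mov eax, subtotal")
print(parsed)
# prints {'label': 'loadreg', 'mnemonic': 'mov', 'operands': ['eax', 'subtotal']}
```

With Intel ordering, `operands[0]` is the destination and `operands[1]` is the source.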

Data Representation

Hexadecimal (base 16) digits are followed by the character H, for example 0F8EH. Binary (base 2) numbers are represented by a string of 1s and 0s, optionally followed by the character B. The appearance of the B after a binary number depends mostly on the need to reduce ambiguity on the number type being represented.
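
A short Python sketch of this notation follows. The function name `parse_literal` is hypothetical, and because a bare string of 1s and 0s is ambiguous, this sketch chooses to reject literals without a suffix rather than guess their base.

```python
def parse_literal(text: str) -> int:
    """Convert a literal such as '0F8EH' or '1010B' to an integer."""
    token = text.strip().upper()
    if token.endswith("H"):
        return int(token[:-1], 16)  # hexadecimal, base 16
    if token.endswith("B"):
        return int(token[:-1], 2)   # binary, base 2
    raise ValueError(f"no H or B suffix, base is ambiguous: {text!r}")


print(parse_literal("0F8EH"))  # prints 3982
print(parse_literal("1010B"))  # prints 10
```

The H suffix is checked first because B is also a valid hexadecimal digit, so a literal such as 0BH must be read as hexadecimal.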

The processor uses byte addressing, therefore the memory is organised and accessed as a sequence of bytes. The range of memory that can be addressed is referred to as an address space.

When a program can use many independent address spaces, called segments, the processor is said to support segmented addressing. For example, a program can keep its code (instructions) and its stack in separate segments, as illustrated in the segment register section of this article. To specify a byte address within a segment, the following notation is used:

Segment-register:Byte-address

For example, to identify the byte at address FF79H in the segment pointed to by the DS register:

DS:FF79H

To identify the address of an instruction in the code segment:

CS:EIP

where the CS register points to the code segment and the EIP register contains the offset of the instruction within that segment.
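
The notation can be read mechanically, as in the Python sketch below. The function name `parse_logical_address` and the sample EIP value are assumptions for illustration; the sketch only separates the segment register from the offset and does not translate the pair into a physical address.

```python
SEGMENT_REGISTERS = {"CS", "DS", "ES", "FS", "GS", "SS"}


def parse_logical_address(text: str, registers: dict = None) -> tuple:
    """Split 'Segment-register:Byte-address' into (segment, offset)."""
    segment, _, offset = text.partition(":")
    segment = segment.strip().upper()
    offset = offset.strip().upper()
    if segment not in SEGMENT_REGISTERS:
        raise ValueError(f"not a segment register: {segment!r}")

    # The offset is either a register name (looked up in `registers`)
    # or a hexadecimal literal with an H suffix.
    if registers and offset in registers:
        return segment, registers[offset]
    if offset.endswith("H"):
        return segment, int(offset[:-1], 16)
    raise ValueError(f"cannot resolve offset: {offset!r}")


print(parse_logical_address("DS:FF79H"))
# prints ('DS', 65401)
print(parse_logical_address("CS:EIP", {"EIP": 0x401000}))
# prints ('CS', 4198400)
```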

Summary

With this information, it is now less intimidating to read the output given by the various reverse engineering tools introduced in Part 1. Part 3 goes on to show how the process of loading and analysing a binary to understand its inner workings may be carried out.