Thursday, 31 May 2012

Aspect of Compilation

A compiler bridges the semantic gap between a PL(programming language) domain and an execution domain.

There are 2 aspects of compiler
Generate code to implement meaning of a source program in the execution domain.
Provide problem solving approach for harms of PL semantics in a source program.
     There are so many features which contribute to the semantic gap between a PL domain and an execution domain.

Data Types

Data structures

Scope rules

Control structure
Data Type : it is the specification of legal values for variables of the type and legal operations on the legal values of the type. Semantics of datatype require a compiler to ensure that variables of a type are assigned or manipulated only through legal operations.
The following tasks are involved in ensuring this:

Checking legality of an operation for the types of its operands. This ensures that a variable is subjected only to legal of its type.

Use type conversion operations to convert values of one type into values of another type wherever necessary and permissible according to the rules of a PL.

Use appropriate instruction sequence of the target machine to implement the operations of a type. Ex :
Float x,y;
    X=y + i;

int i;

Assembly code:

Data Sturcture :
array , structure

Scope rules :
float x,y;
int a,b;

Control Structure :
if condition, switch….case, loop
Memory allocation

Memory allocation involves three important tasks:
Determine the amount of memory required to represent the value of a data item.
Use an appropriate memory allocation model to implement the lifetimes and scopes of data items.
Determine appropriate memory mappings to access the values in a non scalar data item, ex : array.
Memory Binding

A memory binding is an association between the memory address attribute of a data item and the address of a memory area.
Static memory allocation
Dynamic automatic memory allocation
Program controlled allocation
Extended Stack

A stack is a linear data structure. Which follow the concept of LIFO.

In data structure an area of memory is reserved for the stack.

A pointer called the stack base (SB) points to the first word of the stack .

Last element of the stack is called TOP OF Stack (TOS).

To SB and TOS 2 new pointers exits in the model.
A record base pointer(RB) pointing to the first word of the last record in stack.
The first word of each record is a reserved pointer.

Memory allocation & access

Automatic dynamic allocation is implemented using the extended stack model.

SB(stack base)


RB(record base pointer)

Reserve pointer

Each record in the stack has 2 reserved pointers instead of one. Each record accommodates the variables for one activation of a block, hence we call it an activation record (AR). During the execution of a block structured program , a register called the activation record base(ARB)  always points to the start of the TOS record. This record belongs to the block which contains the statement being executed.

The first reserved pointer in a block AR points to the activation record of its dynamic parent. This called Dynamic pointer. The dynamic pointer is used for deallocating an AR.

Access to nonlocal variables is implemented using the second pointer called Static pointer. When AR is created for block its static pointer is set to point to the AR of static ancestor of the block.

Symbol table requirements : In order to implement dynamic allocation and access, a compiler should perform :
Determine the static nesting level of  current block.
Determine the variable designated by the name. this should be done in accordance with the scope rules.
Determine the static nesting level of the block in which variables are defined.
Generate the code.
A simple scheme to implement function 1,2 and 3.
  using extended stack model to organize symbol table.
When the start of block(current block) is encountered during compilation, a new record is pushed on the stack. It contains
Nesting level of current block.
Symbol table for current block.
The reserved pointer of the new record points to the previous record in the stack.
Compilation of expression

In source code we use so many types of expression, in code generation for expression following task are performed :
Determination of an evaluation order for the operators in an expression.
Selection of instruction to be used in the target code.
Use of register and handling of partial results.
     The expression is evaluated according to precedence rule of operators. 

The choice of an instruction to be used in the target code depends on the following :
The type and length of each operand.
The addressability of each operand, where the operand is located and how it can be accessed.

The addressability of an operands indicates where an operand is located and how it can be accessed.

An operand descriptor to maintain the type, length and addressability information for each operand. The choice of an instruction can be made by analyzing an operator and the descriptors of its operands.

A partial result is the value of some sub expression computed while evaluating an expression.

Partial results are maintained in CPU registers. Some of them have to be moved to memory if no of results exceeds the no of available CPU registers. An important issue in code generation is when and how to move partial result is contained in a register. We use a register descriptor to maintain information.

Operand descriptors :
An operand descriptor has the following fields :
Attributes : contains the subfields type, length and miscellaneous information.
Addressability : specifies where the operand is located and how it can be accessed. It has 2 subfields
Addressability code : Takes the values ‘M’ (operand is in memory) and ‘R’ (operand is in register).
Address : Address of a CPU register or memory word.

Register Descriptor

A register descriptor has 2 fields
Status : Contains the code free or occupied to indicate register status.
Operand descriptor #: if status= occupied, this field contains the descriptor # for the operand contined in the register.

Register descriptions are stored in an array called Register descriptor. One register descriptor exists for each CPU register.