Thursday, 31 May 2012

DAVV Indore CET 2012

Program linking algorithm


1.Program_linked_origin = <link origin> from linker command
2.For each object module
a.T_origin = translated origin of the object module.  OM_size = size of the object module.
b.Relocation_factor=program_linked_origint_origin.
c.Read the machine language program in work_area.
d.Read LINKTAB of the object module.
e.for each LINKTAB entry with type = public definition                    name=symbol                                                                      linked_address = translated_address + relocation_factor               Enter(name,linked_address) in NTAB.
f.Enter (object module name, program_linked_origin) in NTAB.
g.Program_linked_origin=program_linked_origin+OM_size.
3.For each object module
a.T_origin = translated origin of the object module.
      Program_linked_origin = load_address from NTAB.
b.For each LINKTAB entry with type = EXT
i.Address_in_work_area = address of work_area + program_linked_origin - <link_origin> + translated address – t_origin.
ii.Search symbol in NTAB and copy its linked address. Add the linked address to the operand address in the word with the address address_in_work_area.

Linking requirement


In c program files are translated separately. Thus only function calls that cross files boundaries and references to global data require linking.
A reference to an external symbol can be resolved only if symbol is declared as a public definition in some object module. This observation forms the basis of program linking.
The linker process all object modules being linked and builds a table of all public definitions and their load time addresses.
Linking for symbol is simply a matter of searching for that symbol in this table and copying its linked address into the word containing the external reference.
A name table (NTAB) is defined for use in program linking. Each entry of the table contains 2 fields.
1.Symbol : symbolic name of public definition or an object module.
2.Linked address : for a public definition, this field contains linked address of the symbol. For an object module, it contains the linked origin of the object module.
Most information in NTAB is derived from LINKTAB entries with type = public definition.

Self Relocating Programs


A self-relocating program is a program which can perform the relocation of its own address sensitive instructions.
It contains two provisions for relocation
1.A table of information concerning the address sensitive instruction exists as a part of program.
2.Code to perform the relocation of address sensitive instructions also exists as a part of the program. This is called the relocating logic.
The start address of the relocating logic is specified as the execution start address of the program.
The relocating logic gains control when the program is loaded in memory for execution.  It uses the load address and the information concerning address sensitive instructions to perform its own relocation.
A self relocating program can execute in any area of the memory.
This is very important in time sharing operating system where the load address of a program is likely to be different for different executions.

Relocation algorithm


Prg_linked_origin=<link origin> from linker cmd.
For each object module
1.T_origin = translated origin of the object module. OM_size= size of object of the object module.
2.Relocation_factor=prg_linked_origin - t_origin.
3.Read the machine language program in work_area.
4.Read RELOCTAB  of the object module.
5.For each entry in RELOCTAB
a.Trans_addr = address in RELOCTAB ENTRY.
b.Address_in_work_area := address of work_area + translated_addresst_origin.
c.Add relocation_factor to the operand address in the word with the address address_in_work_area.
6.Program_linked_origin=program_linked_origin + OM_size.

Linking


linking is the process of binding an external reference to the correct link time address.
An application program consisting of a set of program units. A program unit interacts with another program by using addresses of instructions and data in its own instructions.
To realize such interactions both units of program must contain public definitions and external references.
1.Public definition : a symbol defined in a program unit which may be referenced in other program unit.
2.External reference : a reference to a symbol which is not defined in the program unit containing the reference.
Before the execution of program it necessary that for each program unit, every external reference in program units should be bound to the correct link time address.
An external reference is said to be unresolved until linking is performed for it. It is said to be resolved when its linking is completed.
Entry and Extern statements : The entry statement lists the public definitions of a program unit, it lists those symbols defined in the program unit. Which may be referenced in other program units. The extern statements lists the symbols to which external references are made in the program unit.
Binary Program : a binary program is a machine language program comprising a set of program unit.
1.Program unit has relocated to the area starting at its link origin.
2.Linking has been performed for each external reference in program unit.
To form a binary program from a set of object modules, the programmer invokes the linker using the command
Linker <link origin>, <object module names> [,<execution start address>]
Where <link origin> specifies the memory address to be given to the first word of the binary program. <execution start address> is usually a pair (program unit name, offset in program unit ).
The linker converts this into the linked start address.
This is stored with the binary program for use when the program is to be executed.

Object Module
The object module of a program contains all information necessary to relocate and link the program with other programs. The object module of a program consists of 4 components.
1.Header : the header contains translated origin, size and execution start address of program.
2.Program : this component contains the machine language program corresponding to program.
3.Relocation table (RELOCTAB): this table describes Instruction register requirement of program. Each relocation table contains a single field. It stores Translated address of an address sensitive instruction.
4.Linking Table (LINKTAB) : this table contains information concerning the public definition and external references in program.

Relocation and Linking concepts


Program relocation : in program instruction and data use some address. Program assumes its instructions and data to occupy memory words with specific address.
Such a program is called an address sensitive program. It contains one or more of the following:
1.An address sensitive instruction : an instruction which uses an address.
2.An address constant : a data word which contains an address.
Definition : program relocation is the process of modifying the address used in the address sensitive instructions of a program such that the program can execute correctly from the designated area of memory.
In complete process translation origin and linking are always different. but linking origin and loading origin may be same or different.
If loading origin and linking origin are different this means relocation must be performed by the loader.
Linker always performs relocation. But some loaders do not do this.
If loaders do not perform relocation this means load origin and link origin are equal, and this loader is called absolute loader.
If loaders perform relocation then loaders are called relocation loaders.
Performing relocation : we can calculate the relocation factor by linking origin and translate origin. Suppose translated origin and linked origin of program are t_origin and l_origin respectively. Consider a symbol  symb  in program P. its translation time address be tsymb and link time address be lsymb. The relocation factor of program is defined as
  relocation_factor = l_orgin – t_origin
Relocation factor can be positive, negative or 0.
Here symb  is working as an operand. The translator puts the address tsymb  in the instruction for it.
                     Tsymbt_origin + dsymb    
where dsymb is the offset of symb in program.
lsymb = l_origin + dsymb
lsymb = t_origin +relocation_faction +dsymb
       = t_origin + dsymb + relocation_factior
       = tsymb + relocation_factor

Linker


A Linker is a program that is used to properly combine all the object program files of the s/w and to convert them into the final executable, which is called load module.
It means that a linker takes object program files and fits them together to assemble them into the program final executable form.
A module approach is used to develop reasonable sized software. In this approach the s/w is divided into a no. of smaller program and module.
It is easier to develop, test and debug smaller program.
Steps involved in execution of a program 
1.Translation of the program.
2.Linking of the program with other programs needed for its execution.
3.Relocation of the program to execute from the specific memory area allocated to it.
4.Loading of the program in the memory for the purpose of execution. 
 
Translated, linked and load time addresses
While compiling a program, a translator is given an origin specification for program. This is called the translated origin of program.
The program can specify the origin in a START or ORIGIN statement.
The translator uses the value of the translated origin to perform memory allocation for the variable declared in program.
The execution start address or simple the start address of a program is the address of the instruction from which its execution must begin.
The start address specified by the translator is the translated start address of the program.
The origin of a program may have to be changed by the linker or loader by one of two reasons.
1.The same set of translated address may have been used in different object modules constituting a program.
2.An operating system may require that a program should execute from a specific area of memory. This may require a change in its origin. The change of origin leads to changes in the execution start address and in the address assigned to symbols.
Following terminology is used to refer to the address of a program entity at different times :
1.Translation time address : address assigned by the translator.
2.Linked address : address assigned by the linker.
3.Load time or load address : address assigned by the loader.
The same prefixes translation time, linked and load time are used with the origin and execution start address of a program. Thus
1.Translated origin : address of the origin assumed by the translator.
2.Linked origin : address of the origin assigned by the linker while producing a binary program.
3.Load origin : address of the origin assigned by the loader while loading the program execution.
The linked and load origins may differ from the translated origin of a program.

Overview of interpreter


The interpreter consists of three main components:
1.Symbol Table : the symbol holds information concerning entities in the source program.
2.Data Store : the data store contains values of the data items declared in the program being interpreted. The data store consists of a set of components. A components is an array containing elements of a distinct type.
3.Data manipulation routines : a set of data manipulation routine exist. This set contains a routine for every legal data manipulation action in the source language.

Interpreter


Interpreter is a language translator, it take one statement of a high level language at a time and translate it into a machine instruction which is immediately executed.
Interpreters are easy to write and they do not require large memory space in the computer.
Unlike compiler it does not create an object code & store it.
Compiled programs generally run faster than interpreted programs.
Interpreter take little time to spent analyzing and processing the program.
Interpreter executes byte code line by line, and it converts byte code into Machine code. (Ex. Java)
Interpreter avoids the overheads of compilation. This is the advantage during program development, because a program may be modified.
To make an informed decision in practice we need a quantitative basis for a comparison of compilers and interpreters.
Tc  =average compilation time per statement
T= average execution time per statement
Ti  = average interpretation per statement
Let P is a program. The CPU time required to execute a program using compilation or interpretation is calculated by no. of statements and no. of statements executed in some execution of P.
Let size of program is 200. for a specific set of data, let program P executes as follows : 20 statements are executed for initialization purpose. This is followed by 10 iterations of a loop containing 8 statements, followed by the execution of 20 statements for printing results.
Statement execution = 20 + 10 * 8 + 20 =120
Total execution time using the compilation model
     = 200.Tc + 120.Te
   let Tc = 20.Te
  =206.Tc
Total execution time using the interpretation model = 120.Ti
Use of interpreters :
1.Efficiency in certain environments and simplicity.
2.It is better to use interpretation for a program.
3.In development environment interpreter is very fast.
4.Interpreter is a popular choice for commands to an operating system or an editor.
5.So many software packages use interpreter for interfaces.

PARAMETER PASSING MECHANISM


1.CALL BY VALUE:- In this, values of actual parameters are passed to the called function. these values are assigned to the corresponding formal parameters. A called function may allocate memory to a formal parameter & copy the value of actual parameter into this location at every call  .
1.CALL BY REFERENCE:- In this, the address of actual parameters is passed to the called function. If the parameter is an expression ,its value is computed & stored in a temporary location & the address of temporary location is passed to the called function. If the parameter is an array element, its address is similarly computed at the time of call.
 

Aspect of Compilation


A compiler bridges the semantic gap between a PL(programming language) domain and an execution domain.

There are 2 aspects of compiler
1.
Generate code to implement meaning of a source program in the execution domain.
2.
Provide problem solving approach for harms of PL semantics in a source program.
     There are so many features which contribute to the semantic gap between a PL domain and an execution domain.

Data Types

Data structures

Scope rules

Control structure
1.
Data Type : it is the specification of legal values for variables of the type and legal operations on the legal values of the type. Semantics of datatype require a compiler to ensure that variables of a type are assigned or manipulated only through legal operations.
  
The following tasks are involved in ensuring this:

Checking legality of an operation for the types of its operands. This ensures that a variable is subjected only to legal of its type.

Use type conversion operations to convert values of one type into values of another type wherever necessary and permissible according to the rules of a PL.

Use appropriate instruction sequence of the target machine to implement the operations of a type. Ex :
  
Float x,y;
  Int
i=8,j
  
Y=5;
    X=y + i;

Ex:
    
int i;
    float
a,b;
    a=
b+I;


Assembly code:
    
CONV_R   AREG, I
    
ADD_R     AREG, B
    
MOVEM   AREG, A

Data Sturcture :
   
array , structure

Scope rules :
  
{
  
float x,y;
  
{
  
int a,b;
  b=
x+y
  
}
}

Control Structure :
   
if condition, switch….case, loop
Memory allocation

Memory allocation involves three important tasks:
1.
Determine the amount of memory required to represent the value of a data item.
2.
Use an appropriate memory allocation model to implement the lifetimes and scopes of data items.
3.
Determine appropriate memory mappings to access the values in a non scalar data item, ex : array.
Memory Binding

A memory binding is an association between the memory address attribute of a data item and the address of a memory area.
1.
Static memory allocation
2.
Dynamic automatic memory allocation
3.
Program controlled allocation
Extended Stack

A stack is a linear data structure. Which follow the concept of LIFO.

In data structure an area of memory is reserved for the stack.

A pointer called the stack base (SB) points to the first word of the stack .

Last element of the stack is called TOP OF Stack (TOS).

To SB and TOS 2 new pointers exits in the model.
1.
A record base pointer(RB) pointing to the first word of the last record in stack.
2.
The first word of each record is a reserved pointer.

Memory allocation & access

Automatic dynamic allocation is implemented using the extended stack model.

SB(stack base)

TOS 

RB(record base pointer)

Reserve pointer

Each record in the stack has 2 reserved pointers instead of one. Each record accommodates the variables for one activation of a block, hence we call it an activation record (AR). During the execution of a block structured program , a register called the activation record base(ARB)  always points to the start of the TOS record. This record belongs to the block which contains the statement being executed.

The first reserved pointer in a block AR points to the activation record of its dynamic parent. This called Dynamic pointer. The dynamic pointer is used for deallocating an AR.

Access to nonlocal variables is implemented using the second pointer called Static pointer. When AR is created for block its static pointer is set to point to the AR of static ancestor of the block.

Symbol table requirements : In order to implement dynamic allocation and access, a compiler should perform :
1.
Determine the static nesting level of  current block.
2.
Determine the variable designated by the name. this should be done in accordance with the scope rules.
3.
Determine the static nesting level of the block in which variables are defined.
4.
Generate the code.
A simple scheme to implement function 1,2 and 3.
  using extended stack model to organize symbol table.
When the start of block(current block) is encountered during compilation, a new record is pushed on the stack. It contains
1.
Nesting level of current block.
2.
Symbol table for current block.
The reserved pointer of the new record points to the previous record in the stack.
Compilation of expression

In source code we use so many types of expression, in code generation for expression following task are performed :
1.
Determination of an evaluation order for the operators in an expression.
2.
Selection of instruction to be used in the target code.
3.
Use of register and handling of partial results.
     The expression is evaluated according to precedence rule of operators. 

The choice of an instruction to be used in the target code depends on the following :
1.
The type and length of each operand.
2.
The addressability of each operand, where the operand is located and how it can be accessed.

The addressability of an operands indicates where an operand is located and how it can be accessed.

An operand descriptor to maintain the type, length and addressability information for each operand. The choice of an instruction can be made by analyzing an operator and the descriptors of its operands.

A partial result is the value of some sub expression computed while evaluating an expression.

Partial results are maintained in CPU registers. Some of them have to be moved to memory if no of results exceeds the no of available CPU registers. An important issue in code generation is when and how to move partial result is contained in a register. We use a register descriptor to maintain information.

Operand descriptors :
    
An operand descriptor has the following fields :
1.
Attributes : contains the subfields type, length and miscellaneous information.
2.
Addressability : specifies where the operand is located and how it can be accessed. It has 2 subfields
A.
Addressability code : Takes the values ‘M’ (operand is in memory) and ‘R’ (operand is in register).
B.
Address : Address of a CPU register or memory word.

Register Descriptor

A register descriptor has 2 fields
1.
Status : Contains the code free or occupied to indicate register status.
2.
Operand descriptor #: if status= occupied, this field contains the descriptor # for the operand contined in the register.

Register descriptions are stored in an array called Register descriptor. One register descriptor exists for each CPU register.