Difference between revisions of "The register allocator"

From Free Pascal wiki
(Introduction: small language fixes)
Line 9: Line 9:
Then it calculates the real registers to replace the imaginary ones.
Then it calculates the real registers to replace the imaginary ones.
The fpc register allocator uses the Register coloring for determining the real registers.
The FPC register allocator uses Register coloring to determine the real registers.
Also uses register spilling technique - when there is not enough registers it uses the memory.
It also uses a register spilling technique - when there is not enough registers it uses memory.
== How to use the register allocator''' ==
== How to use the register allocator''' ==

Revision as of 16:20, 19 December 2013

Register allocation in FPC


Main unit for the register allocation is rgobj.pas
Main class for the register allocation is TRgObj located in rgobj.pas

Register allocator provides imaginary registers for the assembler instructions during the code generation. Then it calculates the real registers to replace the imaginary ones.

The FPC register allocator uses Register coloring to determine the real registers. It also uses a register spilling technique - when there is not enough registers it uses memory.

How to use the register allocator

This topic describes how to use the register allocator during the code generation.
Described like a black box with the public methods you can call to get job done.

Creating the Register allocator

Creating register allocator is the first step we make before we can use its functionality.

The low level code generator creates several instances of the TRgObj class. Normally this happens in:


And the instances are normally freed in:


Each TRgObj instance allocates registers of a certain type. For example, one register type is Integer registers. Another one is floating point (FPU) registers. That's why we have a few TRgObj instances, one for every type of register that the cpu supports.

They are created when the code generation of specific routine begins. Code generator works on subroutine level. The register allocator also allocates registers for specific method, procedure, function.

Using registers in the code generation

To allocate a register use the following method:


for the appropriate register allocator instance (depending on the register type) to get register for some assembler instruction. This way an imaginary register is allocated and can be used after that in a specific assembler instruction.

The code generator calls it in the following functions:


Additionally, the following methods:


allocate or free one or multiple real registers (not imaginary ones!) This is usually used for instructions that require a specific register (for example SHL/SHR/SAR on x86, which always uses ECX). This is also used when doing a function call to indicate that certain registers (which depend on the calling convention) may be destroyed by the called function.

After allocating the register we can use it in some assembler instructions. For every instruction that generates, the code generator notifies the register allocator. It passes the instruction, also the imaginary register as parameters to the following method.

         TRgObj.add_reg_instruction(instr, r, cg.executionweight);

But for MOV instruction there is specific method that is used


Also weight is provided to determine which register should be spilled and which is better to use real cpu register.

Generating of real registers

At the end when all the assembler instructions are generated we call

       do_register_allocation(list: TAsmList; headerTai: TAi)

It calculates real registers for the imaginary ones.

Some instructions can use only certain types of registers. So, the register allocator can not choose from all the Cpu registers. For these instructions we notify the register allocator by calling the method:


These interferences are build during the register allocation. Every descendent that wants to add some interferences overrides the method

       procedure TRgObj.add_cpu_interferences(p: TAi); virtual;

We have descendents of TRgObj for the different kind of processors. They call add_edge in add_cpu_interferences depending on the specific CPU architecture. It is used for Arm processors, x86 processors etc.


You can compile your project by using the -sr switch. This will leave the immaginary register names in the generated .s file

Calls hierarchy for the public methods




do_register_allocation - level 1 is ordered by calling
    add_edges_used(1, 2)			get_alias						add_edge
      add_edge												ibitmap.s
    make_work_list				ri_coalesced
    simplify					ri_coalesced
      decrement_degree				ri_coalesced
    coalesce					get_alias		simplifyworklist.add(v);	ibitmap
      conservative				ri_coalesced
      adjacent_ok				ri_coalesced						ibitmap
      add_worklist							simplifyworklist.add(u);
      combine												ibitmap, add_edge
        decrement_degree			ri_coalesced		simplifyworklist.add(m)
        add_edge											ibitmap.s
      freeze_moves(, 2)				get_alias(3)
      freeze_moves(, 2)				get_alias(3)
    assign_colours(, 2)				get_alias
  epilogue_colouring		Destroys the objects used during the coloring - worklist_moves, active_moves, frozen_moves, coalesced_moves, constrained_moves, reginfo.movelist
    clear_interferences											ibitmap.s
    instr_spill_register			get_alias
        add_edges_used(2, 2)			get_alias
	  add_edge											ibitmap.s
    assign_colours(, 2)				get_alias



combine						ri_coalesced(s)
  decrement_degree				ri_coalesced





Class TRgObj


Main class for register allocation in FPC

Public methods

Protected methods

Private methods

Imaginary registers

Storing imaginary registers

TRgObj stores the imaginary registers in list of TRegInfo structure.

Life of imaginary register

Life of imaginary register starts when it is first used by assembler instruction.
When it first appears in an assembler instruction in the assembler list for the specific routine.
And ends when it is last used in an assembler instruction. In the TRegInfo structure for the specific imaginary register there are 2 fields for the life of the register.
live_start: TAi; indicates which is the assembler instruction where the register is first used.
live_end: TAi; indicates which is the assembler instruction where the register is last used.


Grouping the instructions

More abstract look at the CPU instructions shows us that we have mainly few kinds of instructions. - The first group contains instructions that changes some data. This data remains on the same place but it is changed. Also do not need extra data ot perform the change Some instructions like these are inc, dec, CLC, CLD

- Second group are instructions that move date from one place to another. Such as MOV, LAHF, POP, PUSH, LEA

- Third group of instructions are those to compare data. They do not change the data. just check it. Such instructions are TEST, COMPARE

- Fourth group of instructions are those who change the flow of the execution of the instructions such as: JMP, Jcc, JCXZ,

- Fifth group are instructions that changes data based on other data like ADD,

Another grouping of instructions could be by the cpu version where the instruction first appear.

Another grouping of instructions could be based on the registers that can be used in the instruction. Some instructions can use only specific registers for the address - mov ax, [bx + si + 1000] You can choose between for example only these registers - BP, BX, SI и DI Some instructions can use only registers, some addresses, some both. Some registers have subregisters some do not.

Data in the instructions

Alsmost all of the instructions operate on some data. Also uses some data to perform operation. The data can be in the cpu registers, memory, cpu flags.

Programming using the instructions

The instructions are independant one from other.
However when a program is created there are some logical dependencies that programmer introduce.
Programmer wants a few modifications to be performed on some data.
Which can be performed by few instructions executed one after another.
And at the end the needed data will be produced.
But data is not exactly Cpu register, or memory address.
Data is something the the programmer cares about.
For example
we have some starting data that equals 1000. we can put it in cpu register
we want to add to it 200. we can use Asm instruction add
but then we can move the value that we reached in another register.
And then to increment it by 300
at the end we will have the needed value 1500. But we used 2 registers for one data.
Normally we don't have to use 2 registers. We can do it using one.
Cause we have to move the data to second register somewhere in the middle. And this is extra instruction.
However more than one data is parallel processed. And sometimes we can use this extra move.
Just to speed up executions on other data.
This is normally needed when to datas are produced that will calculate third one.

This is not very important for the register allocator. It is more dedicated to the code generator. Code generator could decide where to add extra instruction, to speed other executions. It look like register allocator should not care about this. But it's good to thing about it.

Object Design guidelines

In the current implementation the getting of the registers and the allocation are mixed together. Also they share same structures, same class.
But the data for each of them is used on the completely different phases.
Mixing them togather brings more code on one place.
Also its getting difficult to separate them when you work on them.
They can be easily devided without lost of speed.
Also different types of register allocations can be used. The getting of the registers and setting the weights, also life ... can be on one place.
But at the end different allocators can process this data.
It is difficult for further develop the register allocation cause many phases and logic was brought together on one place.

in the TRgObj.colour_registers method different approach can be performed.
Entering different states can be implemented.
Instead checking the counts of all lists.
It will be faster. Clear code. Easy for debugging. Easy to develop further

Improvements can be made in the weights of the instructions. Or in the way the allocator decides which register to spill and which not.