Chapter 13
Compilers
- Baseline Compiler
- JNI Compiler: the JNI compiler "compiles" native methods by generating code that transitions from Jikes RVM's internal calling/register conventions to the native platform's ABI. It is almost completely platform-dependent.
- Optimizing Compiler
13.1 Baseline Compiler
13.1.1 General Architecture
The goal of the baseline compiler is to efficiently generate code that is "obviously correct." It also needs to be easy to port to a new platform and self-contained (the entire baseline compiler must be included in all Jikes RVM boot images to support dynamically loading other compilers).
Roughly two-thirds of the baseline compiler is machine-independent. The main file is BaselineCompiler and its parent TemplateCompilerFramework. The main platform-dependent file is BaselineCompilerImpl.
Baseline compilation consists of two main steps: GC map computation (discussed below) and code generation. The code generation in the baseline compilers is mostly straightforward, consisting of a single pass through the bytecodes of the method being compiled.
Differences in the hardware architectures lead to slightly different implementation strategies for the baseline compilers. For example, the IA32 baseline compiler does not try to optimize register usage; instead, the bytecode operand stack is held in memory. As a result, even a bytecode that pushes a constant onto the stack creates a memory write in the generated machine code, and the number of memory accesses corresponds directly to the number of bytecodes. In contrast, the PPC baseline compiler does some register allocation of local variables (and should probably do even more register allocation to properly exploit the register set).
TemplateCompilerFramework contains the main code generation switch statement that invokes the appropriate emit<bytecode>_ method of BaselineCompilerImpl.
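As an illustration of this dispatch pattern, the following self-contained toy mimics the structure. It is a hypothetical simplification, not actual Jikes RVM code: the real switch covers every bytecode, and the emit_<bytecode> methods generate machine code rather than trace output.

// Toy illustration of template-based dispatch; not Jikes RVM code.
public class ToyBaselineCompiler {
  private static final int JBC_ICONST_0 = 0x03, JBC_IADD = 0x60;

  public static void compile(byte[] bytecodes) {
    for (int i = 0; i < bytecodes.length; i++) {
      switch (bytecodes[i] & 0xff) {
        case JBC_ICONST_0: emit_iconst(0); break; // push the constant 0
        case JBC_IADD:     emit_iadd();    break; // add top two stack slots
        default: throw new IllegalStateException("unhandled bytecode");
      }
    }
  }

  // The real emit methods produce machine code; these just trace.
  private static void emit_iconst(int c) { System.out.println("emit: push " + c); }
  private static void emit_iadd()        { System.out.println("emit: add"); }

  public static void main(String[] args) {
    compile(new byte[] {0x03, 0x03, 0x60}); // iconst_0, iconst_0, iadd
  }
}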
13.1.2 GC Maps
The baseline compiler computes GC maps by abstractly interpreting the bytecodes to determine which expression-stack slots and local variables contain references at the start of each bytecode. There are additional complications in handling JSRs; see the source code for details. This strategy of computing a single GC map that applies to all the internal GC points of each bytecode slightly constrains code generation: the code generator must ensure that the GC map remains valid at all GC points (including implicit GC points introduced by null-pointer exceptions). It also forces the baseline compiler to report the reference parameters of the various invoke bytecodes as live in the GC map for the call, because the GC map must also cover the internal GC points that occur before the call is actually performed. Note that this is not an issue for the optimizing compiler, which computes a GC map for each machine-code instruction that is a GC point.
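To make the abstract-interpretation idea concrete, here is a self-contained toy that tracks which operand-stack slots hold references at the start of each instruction of a made-up three-opcode machine. It is purely illustrative: the real computation handles the full bytecode set, local variables, control flow, and JSRs.

import java.util.Arrays;

// Toy GC-map computation by abstract interpretation; not Jikes RVM code.
public class ToyGcMaps {
  static final int PUSH_REF = 0, PUSH_INT = 1, POP = 2;

  public static void main(String[] args) {
    int[] code = {PUSH_REF, PUSH_INT, POP, POP};
    boolean[] stackIsRef = new boolean[16]; // abstract operand stack
    int sp = 0;
    for (int pc = 0; pc < code.length; pc++) {
      // The "GC map" at this pc: which live stack slots are references.
      System.out.println("pc " + pc + " refs: "
          + Arrays.toString(Arrays.copyOf(stackIsRef, sp)));
      switch (code[pc]) {
        case PUSH_REF: stackIsRef[sp++] = true;  break;
        case PUSH_INT: stackIsRef[sp++] = false; break;
        case POP:      sp--;                     break;
      }
    }
  }
}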
13.2 Optimizing Compiler
The documentation for the optimizing compiler is organized into the following sections.
- Method Compilation: The fundamental unit for compilation in Jikes RVM is a single method.
- IR: The intermediate representation used by the optimizing compiler.
- BURS: The Bottom-Up Rewrite System (BURS) is used by the optimizing compiler for instruction selection.
- OptTestHarness: A test harness that provides fine-grained control over the compilation of specific classes and methods.
13.2.1 Method Compilation
The fundamental unit for optimization in Jikes RVM is a single method. The optimization of a method consists of a series of compiler phases performed on the method. These phases transform the IR (intermediate representation) from bytecodes through HIR (high-level intermediate representation), LIR (low-level intermediate representation), and MIR (machine intermediate representation) and finally into machine code. Various optimizing transformations are performed at each level of IR.
An object of the class CompilationPlan contains all the information necessary to generate machine code for a method. An instance of this class includes, among other fields, the RVMMethod to be compiled and the array of OptimizationPlanElements which define the compilation steps. The execute method of a CompilationPlan invokes the optimizing compiler to generate machine code for the method, executing the compiler phases listed in the plan's OptimizationPlanElements.
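In outline, a driver might assemble and run a plan along the following lines. This is a sketch only: the createOptimizationPlan entry point and the shape of the CompilationPlan constructor are assumptions inferred from the description above, so verify them against the org.jikesrvm.compilers.opt.driver sources.

// Sketch; signatures are assumed, not authoritative.
OptOptions options = new OptOptions();
OptimizationPlanElement[] elements =
    OptimizationPlanner.createOptimizationPlan(options); // assumed factory
CompilationPlan plan = new CompilationPlan(method, elements, null, options);
plan.execute(); // runs each element's perform() and produces machine code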
The OptimizationPlanner class defines the standard phases used in a compilation. This class contains a static field, called masterPlan, which contains all possible OptimizationPlanElements. The structure of the master plan is a tree. Any element may either be an atomic element (a leaf of the tree), or an aggregate element (an internal node of the tree). The master plan has the following general structure:
- elements which convert bytecodes to HIR
- elements which perform optimization transformations on the HIR
- elements which perform optimization transformations using SSA form
- elements which convert HIR to LIR
- elements which perform optimization transformations on the LIR
- elements which perform optimization transformations using SSA form
- elements which convert LIR to MIR
- elements which perform optimization transformations on MIR
- elements which convert MIR to machine code
A client (compiler driver) constructs a specific optimization plan by including all the OptimizationPlanElements contained in the master plan which are appropriate for this compilation instance. Whether or not an element should be part of a compilation plan is determined by its shouldPerform method. For each atomic element, the values in the OptOptions object are generally used to determine whether the element should be included in the compilation plan. Each aggregate element must be included when any of its component elements must be included.
Each element must have a perform method defined which takes the IR as a parameter. It is expected, but not required, that the perform method will modify the IR. The perform method of an aggregate element will invoke the perform methods of its elements.
Each atomic element is an object of the final class OptimizationPlanAtomicElement. The main work of this class is performed by its phase, an object of type CompilerPhase. The CompilerPhase class is not final; each phase extends it and, in particular, overrides the perform method, which is invoked by its enclosing element's perform method. All the state associated with the element is contained in the CompilerPhase; no state is kept in the element itself.
Every optimization plan consists of a selection of elements from the master plan; thus, two optimization plans associated with different methods will share the same component element objects. Clearly, it is undesirable to share state associated with a particular compilation phase between two different method compilations. To prevent this, the perform method of an atomic element creates a new instance of its phase immediately before calling the phase's perform method. In the case where the phase contains no state, the newExecution method of CompilerPhase can be overridden to return the phase itself rather than a clone, as in the sketch below.
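A minimal stateless phase might look like the following. This is a hypothetical sketch: the method names match those described above, but consult CompilerPhase for the exact signatures.

// Hypothetical stateless phase; not actual Jikes RVM code.
class NoopPhase extends CompilerPhase {
  public String getName() { return "Noop"; }             // name used in tracing
  public void perform(IR ir) { /* inspect or transform the IR here */ }
  // Stateless, so share this instance across compilations instead of cloning.
  public CompilerPhase newExecution(IR ir) { return this; }
}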
13.2.2 IR
The optimizing compiler intermediate representation (IR) is held in an object of type IR and includes a list of instructions. Every instruction is classified into one of the pre-defined instruction formats. Each instruction includes an operator and zero or more operands. Instructions are grouped into basic blocks, which are constrained so that control-flow instructions appear only at their end; a basic block either falls through to another basic block or ends with branch instructions that name a destination basic block. The graph of basic blocks is held in the cfg (control-flow graph) field of the IR.
This section documents basic information about the intermediate representation. For a tutorial-based introduction to the material, it is highly recommended that you read the presentation Jikes RVM Optimizing Compiler Intermediate Code Representation.
IR Operators
The IR operators are defined by the class Operators, which in turn is automatically generated from a template by a driver. The inputs to the driver are two files, both called OperatorList.dat. One resides in $RVM_ROOT/rvm/src-generated/opt-ir and defines machine-independent operators. The other resides in $RVM_ROOT/rvm/src-generated/opt-ir/${arch} and defines machine-dependent operators, where ${arch} is the specific instruction architecture of interest.
Each operator in OperatorList.dat is defined by a five-line record, consisting of:
- SYMBOL: a static symbol to identify the operator
- INSTRUCTION_FORMAT: the instruction format class that accepts this operator.
- TRAITS: a set of characteristics of the operator, composed with a bit-wise or (|) operator. See Operator.java for a list of valid traits.
- IMPLDEFS: set of registers implicitly defined by this operator; usually applies only to machine-dependent operators
- IMPLUSES: set of registers implicitly used by this operator; usually applies only to machine-dependent operators
For example, the entry in OperatorList.dat that defines the integer addition operator looks like the record below.
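This record is reconstructed from the format description above, so the exact trait may differ; consult the file for the authoritative entry. For this machine-independent operator, the IMPLDEFS and IMPLUSES lines are simply left empty.

INT_ADD
Binary
commutative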
The operator for a conditional branch based on the values of two references is defined by a record like the following.
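Again a reconstruction (see the file for the authoritative entry), with empty IMPLDEFS and IMPLUSES lines:

REF_IFCMP
IntIfCmp
branch | conditional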
Additionally, the machine-specific OperatorList.dat file contains another line of information for use by the assembler. See the file for details.
Instruction Format
Every IR instruction fits one of the pre-defined Instruction Formats. The Java package org.jikesrvm.compilers.opt.ir defines roughly 75 architecture-independent instruction formats. For each instruction format, the package includes a class that defines a set of static methods by which optimizing compiler code can access an instruction of that format.
For example, INT_MOVE instructions conform to the Move instruction format. The following code fragment shows code that uses the Operators interface and the Move instruction format:
import org.jikesrvm.compilers.opt.ir.*; // Move, Instruction; operand classes may live in a subpackage depending on the version

class X {
  void foo(Instruction s) {
    if (Move.conforms(s)) { // if this instruction fits the Move format
      RegisterOperand r1 = Move.getResult(s);
      Operand r2 = Move.getVal(s);
      System.out.println("Found a move instruction: " + r1 + " := " + r2);
    } else {
      System.out.println(s + " is not a MOVE");
    }
  }
}
This example shows just a subset of the access functions defined for the Move format. Other static access functions can set each operand (in this case, Result and Val), query each operand for nullness, clear operands, create Move instructions, mutate other instructions into Move instructions, and check the index of a particular operand field in the instruction. See the Javadoc reference for a complete description of the API.
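For instance, building a new move or rewriting an existing instruction into one looks roughly like the sketch below; Move.create and Move.mutate are the generated factory and mutator methods described above, but check the generated Move class for the exact parameter lists.

// Sketch: r is a RegisterOperand and v an Operand already in hand.
Instruction m = Move.create(INT_MOVE, r, v); // build a fresh INT_MOVE
Move.mutate(s, INT_MOVE, r, v);              // rewrite s in place as a move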
Each fixed-length instruction format is defined in the text file $RVM_ROOT/rvm/src-generated/opt-ir/InstructionFormatList.dat. Each record in this file has four lines:
- NAME: the name of the instruction format
- SIZES: the number of operands defined, defined and used, and used
- SIG: a description of each operand, where each description consists of:
  - D/DU/U: is this operand a def, a use, or both?
  - NAME: the unique name to identify the operand
  - TYPE: the type of the operand (a subclass of Operand)
  - [opt]: is this operand optional?
- VARSIG: a description of repeating operands, used for variable-length instructions.
So, for example, the record that defines the Move instruction format is shown below.
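This record is reconstructed from the description that follows; check InstructionFormatList.dat for the authoritative version.

Move
1 0 1
"D Result RegisterOperand" "U Val Operand"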
This specifies that the Move format has two operands, one def and one use. The def is called Result and must be of type RegisterOperand. The use is called Val and must be of type Operand.
A few instruction formats have a variable number of operands. The format for these records is given at the top of InstructionFormatList.dat. For example, the record for the variable-length Call instruction format is:
Call
1 0 3 1 U 4
"D Result RegisterOperand" \
"U Address Operand" "U Method MethodOperand" "U Guard Operand opt"
"Param Operand"
This record defines the Call instruction format. The second line indicates that this format always has at least 4 operands (1 def and 3 uses), plus a variable number of uses of one other type. The trailing 4 on line 2 tells the template generator to generate special constructors for cases of having 1, 2, 3, or 4 of the extra operands. Finally, the record names the Call instruction operands and constrains the types. The final line specifies the name and types of the variable-numbered operands. In this case, a Call instruction has a variable number of (use) operands called Param. Client code can access the ith parameter operand of a Call instruction s by calling Call.getParam(s,i).
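For example, iterating over the parameters of a call could be written as in the sketch below (we assume the generated count accessor is named getNumberOfParams; verify against the generated Call class).

// Sketch: s is an Instruction known to conform to the Call format.
for (int i = 0; i < Call.getNumberOfParams(s); i++) {
  Operand param = Call.getParam(s, i); // the ith Param (use) operand
  System.out.println("param " + i + " of " + s + " is " + param);
}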
A number of instruction formats share operands of the same semantic meaning and name. For convenience in accessing like instruction formats, the template generator supports four common operand access types:
- ResultCarrier: provides access to an operand of type RegisterOperand named Result.
- GuardResultCarrier: provides access to an operand of type RegisterOperand named GuardResult.
- LocationCarrier: provides access to an operand of type LocationOperand named Location.
- GuardCarrier: provides access to an operand of type Operand named Guard.
For example, for any instruction s that carries a Result operand (e.g., the Move, Binary, and Unary formats), client code can call ResultCarrier.conforms(s) and ResultCarrier.getResult(s) to access the Result operand.
Finally, a note on rationale. Religious object-oriented philosophers will cringe at the InstructionFormats. Instead, all this functionality could be implemented more cleanly with a hierarchy of instruction types exploiting (multiple) inheritance. We rejected the class hierarchy approach due to efficiency concerns of frequent virtual/interface method dispatch and type checks. Recent improvements in our interface invocation sequence and dynamic type checking algorithms may alleviate some of this concern.
13.2.3 BURS
The optimizing compiler uses the Bottom-Up Rewrite System (BURS) for instruction selection. BURS is essentially a tree pattern matching system derived from Iburg by David R. Hanson. (See "Engineering a Simple, Efficient Code-Generator Generator" by Fraser, Hanson, and Proebsting, LOPLAS 1(3), Sept. 1992, doi: 10.1145/151640.151642.) The instruction selection rules for each architecture are specified in an architecture-specific file located in $RVM_ROOT/rvm/src-generated/opt-burs/${arch}, where ${arch} is the specific instruction architecture of interest. The rules are used in generating a parser, which transforms the IR.
Each rule is defined by a four-line record, consisting of:
- PRODUCTION: the tree pattern to be matched. The format of each pattern is explained below.
- COST: the cost of matching the pattern, as opposed to skipping it. It is a Java expression that evaluates to an integer.
- FLAGS: The flags for the operation:
- NOFLAGS: this production performs no operation
- EMIT_INSTRUCTION: this production will emit instructions
- LEFT_CHILD_FIRST: visit the child on the left-hand side of the production first
- RIGHT_CHILD_FIRST: visit child on right-hand side of production first
- TEMPLATE: Java code to emit
Each production has a non-terminal, which denotes a value, followed by a colon (":"), followed by a dependence tree that produces that value. For example, the rule for a memory add on the Intel IA32 architecture is expressed in the following way:
stm: INT_STORE(INT_ADD_ACC(INT_LOAD(riv,riv),riv), OTHER_OPERAND(riv,riv))
ADDRESS_EQUAL(P(p), PLL(p), 17)
EMIT_INSTRUCTION
EMIT(MIR_BinaryAcc.mutate(P(p), IA32_ADD, MO_S(P(p), DW), BinaryAcc.getValue(PL(p))));
The production in this rule represents the following tree:

              INT_STORE
             /         \
      INT_ADD_ACC     OTHER_OPERAND
       /       \        /      \
   INT_LOAD    riv    riv      riv
    /    \
  riv    riv
where r is a non-terminal that represents a register or a tree producing a register, riv is a non-terminal that represents a register (or a tree producing one) or an immediate value, and INT_LOAD, INT_ADD_ACC and INT_STORE are operators (terminals). OTHER_OPERAND is just an abstraction to make the tree binary.
There are multiple helper functions that can be used in Java code (both cost expressions and generation templates). In all code sequences the name p is reserved for the current tree node. Some of the helper methods are shortcuts for accessing properties of tree nodes:
- P(p) is used to access the instruction associated with the current (root) node,
- PL(p) is used to access the instruction associated with the left child of the current (root) node (provided it exists),
- PR(p) is used to access the instruction associated with the right child of the current (root) node (provided it exists), similarly, PLL(p), PLR(p), PRL(p) and PRR(p) are used to access the instruction associated with the left child of the left child, right child of the left child, left child of the right child and right child of the right child, respectively, of the current (root) node (provided they exist).
- VL(p) is used to access the integer constant value associated with the left child of the current (root) node
- VR(p) is used to access the integer constant value associated with the right child of the current (root) node
- See the BURS_Common_Helpers class for the definitions of these helper methods.
The rule reads as follows: if the tree shown above is seen, evaluate the cost expression. In this case, the expression calls a helper function that tests whether the addresses in the STORE (P(p)) and the LOAD (PLL(p)) instructions are equal; it returns 17 if they are, and the special value INFINITE if they are not. If the cost is acceptable, the STORE instruction (P(p)) is mutated in place into a machine-dependent add-accumulate instruction (IA32_ADD) that adds a given value to the contents of a given memory location.
The rules file is used to generate a file called ir.brg, which, in turn, is used to produce a file called BURS_STATE.java. Note that if the BURS rules are changed, it is necessary to run ant real-clean in order to recreate the auto-generated Java source code for the BURS rules.
For more information on helper functions look at BURS_Helpers.java. For more information on the BURS algorithm see BURS.java.
Future directions. Whilst jburg allows us to do good instruction selection, there are a number of areas where it is lacking, e.g. vector operations.
We can't write productions for vector operations unless we match an entire tree of operations. For example, if the architecture supported a vector add operation (i.e. SIMD), it would be nice to write a rule of the form sketched below.
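The following production is hypothetical and purely illustrative; it is not valid jburg syntax, precisely because of the tuple restriction described next.

(r, r): ADD(r, r), ADD(r, r)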
Unfortunately, we can't have tuples on the left-hand side of productions, and the comma would represent that matching two coverings is necessary. Leupers has shown how to achieve this result with a modified BURS system.
13.2.4 OptTestHarness
For optimizing compiler development, it is sometimes useful to exercise careful control over which classes are compiled, and with which optimization level. In many cases, a prototype-opt image will suit this process, using the command line option -X:aos:initial_compiler=opt combined with -X:aos:enable_recompilation=false. This configuration invokes the optimizing compiler on each method that is run. The OptTestHarness provides even more control over the optimizing compiler. This driver program allows you to invoke the optimizing compiler as an "application" running on top of the VM. It accepts the following command-line arguments:
- -useBootOptions: use the same OptOptions as the boot image compiler
- -longcommandline <filename>: read commands (one per line) from a file
- +baseline: switch the default compiler to baseline
- -baseline: switch the default compiler to optimizing
- -load <class>: load a class
- -class <class>: load a class and compile all of its methods
- -method <class> <method> [- or <descrip>]: compile the method with the default compiler
- -methodOpt <class> <method> [- or <descrip>]: compile the method with the optimizing compiler
- -methodBase <class> <method> [- or <descrip>]: compile the method with the baseline compiler
- -er <class> <method> [- or <descrip>] {args}: compile the method with the default compiler and execute it
- -performance: show performance results
- -oc: pass an option to the optimizing compiler
Examples. The following invocations illustrate typical uses of the OptTestHarness program. The command lines are reconstructed sketches: the harness's main class is assumed to be org.jikesrvm.tools.oth.OptTestHarness, and the exact option spellings may differ between versions, so check your build.
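rvm org.jikesrvm.tools.oth.OptTestHarness -class Foo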
will invoke the optimizing compiler on all methods of class Foo.
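rvm org.jikesrvm.tools.oth.OptTestHarness -method Foo bar -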
will invoke the optimizing compiler on the first method bar of class Foo it loads.
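rvm org.jikesrvm.tools.oth.OptTestHarness -method Foo bar '(I)V'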
will invoke the optimizing compiler on the method Foo.bar with descriptor (I)V. You can specify any number of -method and -class options on the command line. Any arguments passed to OptTestHarness via -oc will be passed on directly to the optimizing compiler. So:
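rvm org.jikesrvm.tools.oth.OptTestHarness -oc:O1 -oc:print_final_hir=true -method Foo bar -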
will compile Foo.bar at optimization level O1 and print the final HIR.