Chapter 13
Compilers
- Baseline Compiler
- JNI Compiler: the JNI compiler "compiles" native methods by generating code that transitions from Jikes RVM's internal calling/register conventions to the native platform's ABI. It is almost completely platform-dependent.
- Optimizing Compiler
13.1 Baseline Compiler
13.1.1 General Architecture
The goal of the baseline compiler is to efficiently generate code that is "obviously correct." It also needs to be easy to port to a new platform and self-contained (the entire baseline compiler must be included in all Jikes RVM boot images to support dynamically loading other compilers).
Roughly two-thirds of the baseline compiler is machine-independent. The main file is BaselineCompiler and its parent TemplateCompilerFramework. The main platform-dependent file is BaselineCompilerImpl.
Baseline compilation consists of two main steps: GC map computation (discussed below) and code generation. The code generation in the baseline compilers is mostly straightforward, consisting of a single pass through the bytecodes of the method being compiled.
Differences in the hardware architectures lead to slightly different implementation strategies for the baseline compilers. For example, the IA32 baseline compiler does not try to optimize register usage; instead, the bytecode operand stack is held in memory. As a result, even a bytecode that pushes a constant onto the stack creates a memory write in the generated machine code, and the number of memory accesses corresponds directly to the number of bytecodes. In contrast, the PPC baseline compiler does some register allocation of local variables (and should probably do even more register allocation to properly exploit the register set).
TemplateCompilerFramework contains the main code generation switch statement that invokes the appropriate emit<bytecode>_ method of BaselineCompilerImpl.
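As an illustration of this dispatch pattern, the following self-contained toy mimics the structure. It is a hypothetical simplification, not actual Jikes RVM code: the real switch covers every bytecode, and the emit_<bytecode> methods generate machine code rather than trace output.

// Toy illustration of template-based dispatch; not Jikes RVM code.
public class ToyBaselineCompiler {
  private static final int JBC_ICONST_0 = 0x03, JBC_IADD = 0x60;

  public static void compile(byte[] bytecodes) {
    for (int i = 0; i < bytecodes.length; i++) {
      switch (bytecodes[i] & 0xff) {
        case JBC_ICONST_0: emit_iconst(0); break; // push the constant 0
        case JBC_IADD:     emit_iadd();    break; // add top two stack slots
        default: throw new IllegalStateException("unhandled bytecode");
      }
    }
  }

  // The real emit methods produce machine code; these just trace.
  private static void emit_iconst(int c) { System.out.println("emit: push " + c); }
  private static void emit_iadd()        { System.out.println("emit: add"); }

  public static void main(String[] args) {
    compile(new byte[] {0x03, 0x03, 0x60}); // iconst_0, iconst_0, iadd
  }
}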
13.1.2 GC Maps
The baseline compiler computes GC maps by abstractly interpreting the bytecodes to determine which expression-stack slots and local variables contain references at the start of each bytecode. There are additional complications in handling JSRs; see the source code for details. This strategy of computing a single GC map that applies to all the internal GC points of each bytecode slightly constrains code generation: the code generator must ensure that the GC map remains valid at all GC points (including implicit GC points introduced by null-pointer exceptions). It also forces the baseline compiler to report the reference parameters of the various invoke bytecodes as live in the GC map for the call, because the GC map must also cover the internal GC points that occur before the call is actually performed. Note that this is not an issue for the optimizing compiler, which computes a GC map for each machine-code instruction that is a GC point.
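To make the abstract-interpretation idea concrete, here is a self-contained toy that tracks which operand-stack slots hold references at the start of each instruction of a made-up three-opcode machine. It is purely illustrative: the real computation handles the full bytecode set, local variables, control flow, and JSRs.

import java.util.Arrays;

// Toy GC-map computation by abstract interpretation; not Jikes RVM code.
public class ToyGcMaps {
  static final int PUSH_REF = 0, PUSH_INT = 1, POP = 2;

  public static void main(String[] args) {
    int[] code = {PUSH_REF, PUSH_INT, POP, POP};
    boolean[] stackIsRef = new boolean[16]; // abstract operand stack
    int sp = 0;
    for (int pc = 0; pc < code.length; pc++) {
      // The "GC map" at this pc: which live stack slots are references.
      System.out.println("pc " + pc + " refs: "
          + Arrays.toString(Arrays.copyOf(stackIsRef, sp)));
      switch (code[pc]) {
        case PUSH_REF: stackIsRef[sp++] = true;  break;
        case PUSH_INT: stackIsRef[sp++] = false; break;
        case POP:      sp--;                     break;
      }
    }
  }
}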
13.2 Optimizing Compiler
The documentation for the optimizing compiler is organized into the following sections.
- Method Compilation: The fundamental unit for compilation in Jikes RVM is a single method.
- IR: The intermediate representation used by the optimizing compiler.
- BURS: The Bottom-Up Rewrite System (BURS) is used by the optimizing compiler for instruction selection.
- OptTestHarness: A test harness that provides fine-grained control over the compilation of specific classes and methods.
13.2.1 Method Compilation
The fundamental unit for optimization in Jikes RVM is a single method. The optimization of a method consists of a series of compiler phases performed on the method. These phases transform the IR (intermediate representation) from bytecodes through HIR (high-level intermediate representation), LIR (low-level intermediate representation), and MIR (machine intermediate representation) and finally into machine code. Various optimizing transformations are performed at each level of IR.
An object of the class CompilationPlan contains all the information necessary to generate machine code for a method. An instance of this class includes, among other fields, the RVMMethod to be compiled and the array of OptimizationPlanElements which define the compilation steps. The execute method of a CompilationPlan invokes the optimizing compiler to generate machine code for the method, executing the compiler phases listed in the plan's OptimizationPlanElements.
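In outline, a driver might assemble and run a plan along the following lines. This is a sketch only: the createOptimizationPlan entry point and the shape of the CompilationPlan constructor are assumptions inferred from the description above, so verify them against the org.jikesrvm.compilers.opt.driver sources.

// Sketch; signatures are assumed, not authoritative.
OptOptions options = new OptOptions();
OptimizationPlanElement[] elements =
    OptimizationPlanner.createOptimizationPlan(options); // assumed factory
CompilationPlan plan = new CompilationPlan(method, elements, null, options);
plan.execute(); // runs each element's perform() and produces machine code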
The OptimizationPlanner class defines the standard phases used in a compilation. This class contains a static field, called masterPlan, which contains all possible OptimizationPlanElements. The structure of the master plan is a tree. Any element may either be an atomic element (a leaf of the tree), or an aggregate element (an internal node of the tree). The master plan has the following general structure:
- elements which convert bytecodes to HIR
- elements which perform optimization transformations on the HIR
- elements which perform optimization transformations using SSA form
- elements which convert HIR to LIR
- elements which perform optimization transformations on the LIR
- elements which perform optimization transformations using SSA form
- elements which convert LIR to MIR
- elements which perform optimization transformations on MIR
- elements which convert MIR to machine code
A client (compiler driver) constructs a specific optimization plan by including all the OptimizationPlanElements contained in the master plan which are appropriate for this compilation instance. Whether or not an element should be part of a compilation plan is determined by its shouldPerform method. For each atomic element, the values in the OptOptions object are generally used to determine whether the element should be included in the compilation plan. Each aggregate element must be included when any of its component elements must be included.
Each element must have a perform method defined which takes the IR as a parameter. It is expected, but not required, that the perform method will modify the IR. The perform method of an aggregate element will invoke the perform methods of its elements.
Each atomic element is an object of the final class OptimizationPlanAtomicElement. The main work of this class is performed by its phase, an object of type CompilerPhase. The CompilerPhase class is not final; each phase extends it and, in particular, overrides the perform method, which is invoked by its enclosing element's perform method. All the state associated with the element is contained in the CompilerPhase; no state is kept in the element itself.
Every optimization plan consists of a selection of elements from the master plan; thus, two optimization plans associated with different methods will share the same component element objects. Clearly, it is undesirable to share state associated with a particular compilation phase between two different method compilations. To prevent this, the perform method of an atomic element creates a new instance of its phase immediately before calling the phase's perform method. In the case where the phase contains no state, the newExecution method of CompilerPhase can be overridden to return the phase itself rather than a clone, as in the sketch below.
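A minimal stateless phase might look like the following. This is a hypothetical sketch: the method names match those described above, but consult CompilerPhase for the exact signatures.

// Hypothetical stateless phase; not actual Jikes RVM code.
class NoopPhase extends CompilerPhase {
  public String getName() { return "Noop"; }             // name used in tracing
  public void perform(IR ir) { /* inspect or transform the IR here */ }
  // Stateless, so share this instance across compilations instead of cloning.
  public CompilerPhase newExecution(IR ir) { return this; }
}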
13.2.2 IR
The optimizing compiler intermediate representation (IR) is held in an object of type IR and includes a list of instructions. Every instruction is classified into one of the pre-defined instruction formats. Each instruction includes an operator and zero or more operands. Instructions are grouped into basic blocks, which are constrained so that control-flow instructions appear only at their end; a basic block either falls through to another basic block or ends with branch instructions that name a destination basic block. The graph of basic blocks is held in the cfg (control-flow graph) field of the IR.
This section documents basic information about the intermediate representation. For a tutorial-based introduction to the material, it is highly recommended that you read the presentation Jikes RVM Optimizing Compiler Intermediate Code Representation.
IR Operators
The IR operators are defined by the class Operators, which in turn is automatically generated from a template by a driver. The inputs to the driver are two files, both called OperatorList.dat. One resides in $RVM_ROOT/rvm/src-generated/opt-ir and defines machine-independent operators. The other resides in $RVM_ROOT/rvm/src-generated/opt-ir/${arch} and defines machine-dependent operators, where ${arch} is the specific instruction architecture of interest.
Each operator in OperatorList.dat is defined by a five-line record, consisting of:
- SYMBOL: a static symbol to identify the operator
- INSTRUCTION_FORMAT: the instruction format class that accepts this operator.
- TRAITS: a set of characteristics of the operator, composed with a bit-wise or (|) operator. See Operator.java for a list of valid traits.
- IMPLDEFS: set of registers implicitly defined by this operator; usually applies only to machine-dependent operators
- IMPLUSES: set of registers implicitly used by this operator; usually applies only to machine-dependent operators
For example, the entry in OperatorList.dat that defines the integer addition operator looks like the record below.
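This record is reconstructed from the format description above, so the exact trait may differ; consult the file for the authoritative entry. For this machine-independent operator, the IMPLDEFS and IMPLUSES lines are simply left empty.

INT_ADD
Binary
commutative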
The operator for a conditional branch based on the values of two references is defined by a record like the following.
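Again a reconstruction (see the file for the authoritative entry), with empty IMPLDEFS and IMPLUSES lines:

REF_IFCMP
IntIfCmp
branch | conditional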
Additionally, the machine-specific OperatorList.dat file contains another line of information for use by the assembler. See the file for details.
Instruction Format
Every IR instruction fits one of the pre-defined Instruction Formats. The Java package org.jikesrvm.compilers.opt.ir defines roughly 75 architecture-independent instruction formats. For each instruction format, the package includes a class that defines a set of static methods by which optimizing compiler code can access an instruction of that format.
For example, INT_MOVE instructions conform to the Move instruction format. The following code fragment shows code that uses the Operators interface and the Move instruction format:
import org.jikesrvm.compilers.opt.ir.*; // Move, Instruction; operand classes may live in a subpackage depending on the version

class X {
  void foo(Instruction s) {
    if (Move.conforms(s)) { // if this instruction fits the Move format
      RegisterOperand r1 = Move.getResult(s);
      Operand r2 = Move.getVal(s);
      System.out.println("Found a move instruction: " + r1 + " := " + r2);
    } else {
      System.out.println(s + " is not a MOVE");
    }
  }
}
This example shows just a subset of the access functions defined for the Move format. Other static access functions can set each operand (in this case, Result and Val), query each operand for nullness, clear operands, create Move instructions, mutate other instructions into Move instructions, and check the index of a particular operand field in the instruction. See the Javadoc reference for a complete description of the API.
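For instance, building a new move or rewriting an existing instruction into one looks roughly like the sketch below; Move.create and Move.mutate are the generated factory and mutator methods described above, but check the generated Move class for the exact parameter lists.

// Sketch: r is a RegisterOperand and v an Operand already in hand.
Instruction m = Move.create(INT_MOVE, r, v); // build a fresh INT_MOVE
Move.mutate(s, INT_MOVE, r, v);              // rewrite s in place as a move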
Each fixed-length instruction format is defined in the text file $RVM_ROOT/rvm/src-generated/opt-ir/InstructionFormatList.dat. Each record in this file has four lines:
- NAME: the name of the instruction format
- SIZES: the number of operands defined, defined and used, and used
- SIG: a description of each operand, where each description consists of:
  - D/DU/U: is this operand a def, a use, or both?
  - NAME: the unique name to identify the operand
  - TYPE: the type of the operand (a subclass of Operand)
  - [opt]: is this operand optional?
- VARSIG: a description of repeating operands, used for variable-length instructions.
So, for example, the record that defines the Move instruction format is shown below.
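This record is reconstructed from the description that follows; check InstructionFormatList.dat for the authoritative version.

Move
1 0 1
"D Result RegisterOperand" "U Val Operand"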
This specifies that the Move format has two operands, one def and one use. The def is called Result and must be of type RegisterOperand. The use is called Val and must be of type Operand.
A few instruction formats have a variable number of operands. The format for these records is given at the top of InstructionFormatList.dat. For example, the record for the variable-length Call instruction format is:
Call
1 0 3 1 U 4
"D Result RegisterOperand" \
"U Address Operand" "U Method MethodOperand" "U Guard Operand opt"
"Param Operand"
This record defines the Call instruction format. The second line indicates that this format always has at least 4 operands (1 def and 3 uses), plus a variable number of uses of one other type. The trailing 4 on line 2 tells the template generator to generate special constructors for cases of having 1, 2, 3, or 4 of the extra operands. Finally, the record names the Call instruction operands and constrains the types. The final line specifies the name and types of the variable-numbered operands. In this case, a Call instruction has a variable number of (use) operands called Param. Client code can access the ith parameter operand of a Call instruction s by calling Call.getParam(s,i).
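For example, iterating over the parameters of a call could be written as in the sketch below (we assume the generated count accessor is named getNumberOfParams; verify against the generated Call class).

// Sketch: s is an Instruction known to conform to the Call format.
for (int i = 0; i < Call.getNumberOfParams(s); i++) {
  Operand param = Call.getParam(s, i); // the ith Param (use) operand
  System.out.println("param " + i + " of " + s + " is " + param);
}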
A number of instruction formats share operands of the same semantic meaning and name. For convenience in accessing like instruction formats, the template generator supports four common operand access types:
- ResultCarrier: provides access to an operand of type RegisterOperand named Result.
- GuardResultCarrier: provides access to an operand of type RegisterOperand named GuardResult.
- LocationCarrier: provides access to an operand of type LocationOperand named Location.
- GuardCarrier: provides access to an operand of type Operand named Guard.
For example, for any instruction s that carries a Result operand (e.g., the Move, Binary, and Unary formats), client code can call ResultCarrier.conforms(s) and ResultCarrier.getResult(s) to access the Result operand.
Finally, a note on rationale. Religious object-oriented philosophers will cringe at the InstructionFormats. Instead, all this functionality could be implemented more cleanly with a hierarchy of instruction types exploiting (multiple) inheritance. We rejected the class hierarchy approach due to efficiency concerns of frequent virtual/interface method dispatch and type checks. Recent improvements in our interface invocation sequence and dynamic type checking algorithms may alleviate some of this concern.
13.2.3 BURS
The optimizing compiler uses the Bottom-Up Rewrite System (BURS) for instruction selection. BURS is essentially a tree pattern matching system derived from Iburg by David R. Hanson. (See "Engineering a Simple, Efficient Code-Generator Generator" by Fraser, Hanson, and Proebsting, LOPLAS 1(3), Sept. 1992, doi: 10.1145/151640.151642.) The instruction selection rules for each architecture are specified in an architecture-specific file located in $RVM_ROOT/rvm/src-generated/opt-burs/${arch}, where ${arch} is the specific instruction architecture of interest. The rules are used in generating a parser, which transforms the IR.
Each rule is defined by a four-line record, consisting of:
- PRODUCTION: the tree pattern to be matched. The format of each pattern is explained below.
- COST: the cost of matching the pattern, as opposed to skipping it. It is a Java expression that evaluates to an integer.
- FLAGS: The flags for the operation:
- NOFLAGS: this production performs no operation
- EMIT_INSTRUCTION: this production will emit instructions
- LEFT_CHILD_FIRST: visit the child on the left-hand side of the production first
- RIGHT_CHILD_FIRST: visit child on right-hand side of production first
- TEMPLATE: Java code to emit
Each production has a non-terminal, which denotes a value, followed by a colon (":"), followed by a dependence tree that produces that value. For example, the rule for a memory add on the Intel IA32 architecture is expressed in the following way:
stm: INT_STORE(INT_ADD_ACC(INT_LOAD(riv,riv),riv), OTHER_OPERAND(riv,riv))
ADDRESS_EQUAL(P(p), PLL(p), 17)
EMIT_INSTRUCTION
EMIT(MIR_BinaryAcc.mutate(P(p), IA32_ADD, MO_S(P(p), DW), BinaryAcc.getValue(PL(p))));
The production in this rule represents the following tree:

              INT_STORE
             /         \
      INT_ADD_ACC     OTHER_OPERAND
       /       \        /      \
   INT_LOAD    riv    riv      riv
    /    \
  riv    riv
where r is a non-terminal that represents a register or a tree producing a register, riv is a non-terminal that represents a register (or a tree producing one) or an immediate value, and INT_LOAD, INT_ADD_ACC and INT_STORE are operators (terminals). OTHER_OPERAND is just an abstraction to make the tree binary.
There are multiple helper functions that can be used in Java code (both cost expressions and generation templates). In all code sequences the name p is reserved for the current tree node. Some of the helper methods are shortcuts for accessing properties of tree nodes:
- P(p) is used to access the instruction associated with the current (root) node,
- PL(p) is used to access the instruction associated with the left child of the current (root) node (provided it exists),
- PR(p) is used to access the instruction associated with the right child of the current (root) node (provided it exists), similarly, PLL(p), PLR(p), PRL(p) and PRR(p) are used to access the instruction associated with the left child of the left child, right child of the left child, left child of the right child and right child of the right child, respectively, of the current (root) node (provided they exist).
- VL(p) is used to access the integer constant value associated with the left child of the current (root) node
- VR(p) is used to access the integer constant value associated with the right child of the current (root) node
- See the BURS_Common_Helpers class for the definitions of these helper methods.
The rule reads as follows: if the tree shown above is seen, evaluate the cost expression. In this case, the expression calls a helper function that tests whether the addresses in the STORE (P(p)) and the LOAD (PLL(p)) instructions are equal; it returns 17 if they are, and the special value INFINITE if they are not. If the cost is acceptable, the STORE instruction (P(p)) is mutated in place into a machine-dependent add-accumulate instruction (IA32_ADD) that adds a given value to the contents of a given memory location.
The rules file is used to generate a file called ir.brg, which, in turn, is used to produce a file called BURS_STATE.java. Note that if the BURS rules are changed, it is necessary to run ant real-clean in order to recreate the auto-generated Java source code for the BURS rules.
For more information on helper functions look at BURS_Helpers.java. For more information on the BURS algorithm see BURS.java.
Future directions. Whilst jburg allows us to do good instruction selection, there are a number of areas where it is lacking, e.g. vector operations.
We can't write productions for vector operations unless we match an entire tree of operations. For example, if the architecture supported a vector add operation (i.e. SIMD), it would be nice to write a rule of the form sketched below.
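The following production is hypothetical and purely illustrative; it is not valid jburg syntax, precisely because of the tuple restriction described next.

(r, r): ADD(r, r), ADD(r, r)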
Unfortunately, we can't have tuples on the left-hand side of productions, and the comma would represent that matching two coverings is necessary. Leupers has shown how to achieve this result with a modified BURS system.
13.2.4 OptTestHarness
For optimizing compiler development, it is sometimes useful to exercise careful control over which classes are compiled, and with which optimization level. In many cases, a prototype-opt image will suit this process, using the command line option -X:aos:initial_compiler=opt combined with -X:aos:enable_recompilation=false. This configuration invokes the optimizing compiler on each method that is run. The OptTestHarness provides even more control over the optimizing compiler. This driver program allows you to invoke the optimizing compiler as an "application" running on top of the VM. It accepts the following command-line arguments:
- -useBootOptions: use the same OptOptions as the boot image compiler
- -longcommandline <filename>: read commands (one per line) from a file
- +baseline: switch the default compiler to baseline
- -baseline: switch the default compiler to optimizing
- -load <class>: load a class
- -class <class>: load a class and compile all of its methods
- -method <class> <method> [- or <descrip>]: compile the method with the default compiler
- -methodOpt <class> <method> [- or <descrip>]: compile the method with the optimizing compiler
- -methodBase <class> <method> [- or <descrip>]: compile the method with the baseline compiler
- -er <class> <method> [- or <descrip>] {args}: compile the method with the default compiler and execute it
- -performance: show performance results
- -oc: pass an option to the optimizing compiler
Examples. The following invocations illustrate typical uses of the OptTestHarness program. The command lines are reconstructed sketches: the harness's main class is assumed to be org.jikesrvm.tools.oth.OptTestHarness, and the exact option spellings may differ between versions, so check your build.
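rvm org.jikesrvm.tools.oth.OptTestHarness -class Foo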
will invoke the optimizing compiler on all methods of class Foo.
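rvm org.jikesrvm.tools.oth.OptTestHarness -method Foo bar -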
will invoke the optimizing compiler on the first method bar of class Foo it loads.
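rvm org.jikesrvm.tools.oth.OptTestHarness -method Foo bar '(I)V'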
will invoke the optimizing compiler on the method Foo.bar with descriptor (I)V. You can specify any number of -method and -class options on the command line. Any arguments passed to OptTestHarness via -oc will be passed on directly to the optimizing compiler. So:
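rvm org.jikesrvm.tools.oth.OptTestHarness -oc:O1 -oc:print_final_hir=true -method Foo bar -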
will compile Foo.bar at optimization level O1 and print the final HIR.