While platform-independent bytecode provides complete portability between different hardware platforms, a physical CPU still can't execute it. The CPU only knows how to execute its particular flavor of native code.
Throughout this text, we will refer to code that is specific to a certain hardware architecture as native code. For example, x86 assembly language or x86 machine code is native code for the x86 platform. Machine code should be taken to mean code in binary platform-dependent format. Assembly language should be taken to mean machine code in human-readable form.
Thus, the JVM is required to turn the bytecodes into native code for the CPU on which the Java application executes. This can be done in one of two ways (or a combination of both): the JVM can interpret the bytecode, or it can compile the bytecode to native code and execute that.
Naturally, a native code version of a program executes orders of magnitude faster than an interpreted one. The tradeoff is, as we shall see, bookkeeping and compilation time overhead.
The Java Virtual Machine is a stack machine. All bytecode operations, with few exceptions, are computed on an evaluation stack by popping operands from the stack, executing the operation and pushing the result back to the stack. For example, an addition is performed by pushing the two terms to the stack, executing an add instruction that consumes the operands and produces a sum, which is placed on the stack. The party interested in the result of the addition then pops the result.
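The push, operate, pop sequence described above can be sketched in a few lines of Java. This is only an illustration of the stack discipline, not how a real JVM is implemented; the class name `EvalStackSketch`, its method names, and the fixed depth of 16 are assumptions made for the example.

```java
// Illustrative sketch of an evaluation (operand) stack. All names are hypothetical.
class EvalStackSketch {
    private final int[] slots = new int[16]; // assumed fixed depth, enough for the example
    private int top = 0;                     // index of the next free slot

    void push(int value) {
        slots[top++] = value;
    }

    int pop() {
        return slots[--top];
    }

    // Mirrors an integer add: consume two operands, produce one result.
    void add() {
        int right = pop();
        int left = pop();
        push(left + right);
    }

    // Performs an addition the way the text describes it.
    static int addOnStack(int a, int b) {
        EvalStackSketch stack = new EvalStackSketch();
        stack.push(a);      // stack: a
        stack.push(b);      // stack: a, b
        stack.add();        // stack: (a+b)
        return stack.pop(); // stack: (empty), the interested party pops the sum
    }
}
```

Calling `EvalStackSketch.addOnStack(2, 3)` pushes both terms, adds them on the stack, and pops the sum.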
In addition to the stack, the bytecode format specifies up to 65,536 registers or local variables.
An operation in bytecode is encoded by just one byte, so Java supports up to 256 opcodes, most of which are already claimed. Each operation has a unique byte value and a human-readable mnemonic.
The only new bytecode value that has been assigned throughout the history of the Java Virtual Machine specification is 0xba, previously reserved, but about to be used for the new operation invokedynamic. This operation can be used to implement dynamic dispatch when a dynamic language (such as Ruby) has been compiled to Java bytecode. For more information about using Java bytecode for dynamic languages, please refer to Java Specification Request (JSR) 292 on the Internet.
Consider the following example of an add method in Java source code and then in Java bytecode format:
public int add(int a, int b) {
return a + b;
}
public int add(int, int);
Code:
0: iload_1 // stack: a
1: iload_2 // stack: a, b
2: iadd // stack: (a+b)
3: ireturn // stack:
}
The input parameters to the add method, a and b, are passed in local variable slots 1 and 2 (slot 0 in an instance method is reserved for this, according to the JVM specification, and this particular example is an instance method). The first two operations, with opcodes iload_1 and iload_2, push the contents of these local variables onto the evaluation stack. The third operation, iadd, pops the two values from the stack, adds them, and pushes the resulting sum. The fourth and final operation, ireturn, pops the sum from the evaluation stack and terminates the method using the sum as the return value. The bytecode in the previous example has been annotated with the contents of the evaluation stack after each operation has been executed.
Bytecode for a class can be dumped using the javap command with the -c command-line switch. The command javap is part of the JDK.
As we see, Java bytecode is a relatively compact format, the previous method only being four bytes in length (a fraction of the source code mass). Operations are always encoded with one byte for the opcode, followed by an optional number of operands of variable length. Typically, a bytecode instruction complete with operands is just one to three bytes.
Here is another small example, a method that determines if a number is even or not. The bytecode has been annotated with the hexadecimal values corresponding to the opcodes and operand data.
public boolean even(int number) {
return (number & 1) == 0;
}
public boolean even(int);
Code:
0: iload_1 // 0x1b number
1: iconst_1 // 0x04 number, 1
2: iand // 0x7e (number & 1)
3: ifne 10 // 0x9a 0x00 0x07
6: iconst_1 // 0x04 1
7: goto 11 // 0xa7 0x00 0x04
10: iconst_0 // 0x03 0
11: ireturn // 0xac
}
The program pushes its in-parameter, number, and the constant 1 onto the evaluation stack. The values are then popped, ANDed together, and the result is pushed on the stack. The ifne instruction is a conditional branch that pops its operand from the stack and branches if it is not zero. The iconst_0 operation pushes the constant 0 onto the evaluation stack. It has the opcode value 0x03 in bytecode and takes no operands. In a similar fashion, iconst_1 pushes the constant 1. The constants are used for the boolean return value.
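To make the control flow concrete, the following sketch interprets the twelve bytes of the even method by hand. It is a toy, assuming only the seven opcodes that occur in this method, and it is not how a production JVM interprets code. The opcode values are those of the JVM specification; the class name `EvenBytecodeSketch`, the helper names, and the fixed stack depth are hypothetical. Branch offsets are taken relative to the address of the branch instruction itself.

```java
// Toy interpreter for the even(int) bytecode. All names are hypothetical.
class EvenBytecodeSketch {
    // The method body, one int per byte for readability.
    static final int[] EVEN_CODE = {
        0x1b,             //  0: iload_1
        0x04,             //  1: iconst_1
        0x7e,             //  2: iand
        0x9a, 0x00, 0x07, //  3: ifne +7   (target 10)
        0x04,             //  6: iconst_1
        0xa7, 0x00, 0x04, //  7: goto +4   (target 11)
        0x03,             // 10: iconst_0
        0xac              // 11: ireturn
    };

    // Executes the code until ireturn and returns the int on top of the stack.
    static int run(int[] code, int[] locals) {
        int[] stack = new int[8]; // assumed depth, ample for this method
        int sp = 0;               // next free stack slot
        int pc = 0;               // address of the current instruction
        while (true) {
            int opcode = code[pc];
            switch (opcode) {
                case 0x1b: // iload_1: push local variable 1
                    stack[sp++] = locals[1];
                    pc += 1;
                    break;
                case 0x03: // iconst_0: push the constant 0
                    stack[sp++] = 0;
                    pc += 1;
                    break;
                case 0x04: // iconst_1: push the constant 1
                    stack[sp++] = 1;
                    pc += 1;
                    break;
                case 0x7e: { // iand: pop two ints, push their bitwise AND
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left & right;
                    pc += 1;
                    break;
                }
                case 0x9a: { // ifne: pop one int, branch if it is not zero
                    int value = stack[--sp];
                    pc += (value != 0) ? branchOffset(code, pc) : 3;
                    break;
                }
                case 0xa7: // goto: unconditional branch
                    pc += branchOffset(code, pc);
                    break;
                case 0xac: // ireturn: pop the return value and stop
                    return stack[--sp];
                default:
                    throw new IllegalStateException(
                        "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    // Reads the signed, big-endian 16-bit offset that follows a branch opcode.
    static int branchOffset(int[] code, int pc) {
        return (short) ((code[pc + 1] << 8) | code[pc + 2]);
    }

    static boolean even(int number) {
        int[] locals = {0, number}; // slot 0 stands in for `this`
        return run(EVEN_CODE, locals) != 0;
    }
}
```

For an even argument the ifne branch is not taken, so execution falls through to iconst_1 and the goto; for an odd argument it jumps straight to iconst_0.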
Compare and jump instructions, for example ifne (branch if not equal to zero, bytecode 0x9a), generally take two bytes of operand data (enough for a 16-bit jump offset).
For example, if a conditional jump should move the instruction pointer 10,000 bytes forward in the case of a true condition, the operation would be encoded as 0x9a 0x27 0x10 (0x2710 is 10,000 in hexadecimal; all values in bytecode are big-endian).
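The encoding above can be reproduced with a short sketch that writes the opcode followed by the offset, high byte first. The class and method names are hypothetical; only the opcode value 0x9a and the big-endian, signed 16-bit offset come from the text and the JVM specification.

```java
// Sketch of encoding and decoding a 16-bit branch offset. Names are hypothetical.
class BranchEncodingSketch {
    static final int IFNE = 0x9a;

    // Encodes ifne with the given offset as three bytes: opcode, high byte, low byte.
    static byte[] encodeIfne(int offset) {
        if (offset < Short.MIN_VALUE || offset > Short.MAX_VALUE) {
            throw new IllegalArgumentException("offset does not fit in 16 bits: " + offset);
        }
        return new byte[] {(byte) IFNE, (byte) (offset >> 8), (byte) offset};
    }

    // Reassembles the signed offset from the two operand bytes.
    static int decodeOffset(byte[] instruction) {
        return (short) (((instruction[1] & 0xff) << 8) | (instruction[2] & 0xff));
    }
}
```

With this sketch, `encodeIfne(10000)` yields the bytes 0x9a, 0x27, 0x10, and `decodeOffset` turns them back into 10000. Because the offset is signed, backward jumps are expressed as negative values.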
Other more complex constructs such as table switches also exist in bytecode with an entire jump table of offsets following the opcode in the bytecode.
A program requires data as well as code. Data is used for operands. The operand data for a bytecode program can, as we have seen, be kept in the bytecode instruction itself. But this is only true when the data is small enough, or commonly used (such as the constant 0).
Larger chunks of data, such as string constants or large numbers, are stored in a constant pool at the beginning of the .class file. Indexes to the data in the pool are used as operands instead of the actual data itself. If the string aVeryLongFunctionName had to be separately encoded in a compiled method each time it was operated on, bytecode would not be compact at all.
Furthermore, references to other parts of the Java program in the form of method, field, and class metadata are also part of the .class file and stored in the constant pool.
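As a rough illustration of where the constant pool sits, the sketch below reads the first ten bytes of a .class file: the magic number, the minor and major version, and the constant pool count that immediately precedes the pool entries. The layout follows the class file format in the JVM specification; the class name `ClassFileHeaderSketch` and the choice to return the fields as an int array are assumptions made for brevity. Parsing the pool entries themselves is left out.

```java
// Sketch: read the fixed header that precedes the constant pool. Names are hypothetical.
class ClassFileHeaderSketch {
    // Returns {magic, minorVersion, majorVersion, constantPoolCount}.
    static int[] readHeader(byte[] classFile) {
        // ByteBuffer is big-endian by default, matching the class file format.
        java.nio.ByteBuffer in = java.nio.ByteBuffer.wrap(classFile);
        int magic = in.getInt();                        // u4, always 0xCAFEBABE
        int minorVersion = in.getShort() & 0xffff;      // u2
        int majorVersion = in.getShort() & 0xffff;      // u2
        int constantPoolCount = in.getShort() & 0xffff; // u2, pool entries follow
        return new int[] {magic, minorVersion, majorVersion, constantPoolCount};
    }
}
```

Note that the count stored in the file is one larger than the number of pool entries, since index 0 is not used. Instructions then refer to an entry by its index rather than repeating the data.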