The Java Virtual Machine

While platform-independent bytecode provides complete portability between different hardware platforms, a physical CPU still can't execute it. The CPU only knows how to execute its particular flavor of native code.

Note

Throughout this text, we will refer to code that is specific to a certain hardware architecture as native code. For example, x86 assembly language or x86 machine code is native code for the x86 platform. Machine code should be taken to mean code in binary platform-dependent format. Assembly language should be taken to mean machine code in human-readable form.

Thus, the JVM is required to turn the bytecodes into native code for the CPU on which the Java application executes. This can be done in one of the following two ways (or a combination of both):

  • The Java Virtual Machine specification fully describes the JVM as a state machine, so there is no need to actually translate bytecode to native code. The JVM can emulate the entire execution state of the Java program, including emulating each bytecode instruction as a function of the JVM state. This is referred to as bytecode interpretation. The only native code (barring JNI) that executes directly here is the JVM itself.
  • The Java Virtual Machine compiles the bytecode that is to be executed to native code for a particular platform and then calls the native code. When bytecode programs are compiled to native code, this is typically done one method at a time, just before the method in question is to be executed for the first time. This is known as Just-In-Time compilation (JIT).

Naturally, a native code version of a program executes orders of magnitude faster than an interpreted one. The tradeoff is, as we shall see, bookkeeping and compilation time overhead.

Stack machine

The Java Virtual Machine is a stack machine. All bytecode operations, with few exceptions, are computed on an evaluation stack by popping operands from the stack, executing the operation and pushing the result back to the stack. For example, an addition is performed by pushing the two terms to the stack, executing an add instruction that consumes the operands and produces a sum, which is placed on the stack. The party interested in the result of the addition then pops the result.
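The addition described above can be sketched in plain Java. This is only an illustration of the push/pop discipline, not how a JVM is implemented; the class name StackAddSketch and the method add are hypothetical names chosen for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackAddSketch {
    // Hypothetical helper: computes a + b the way a stack machine would.
    static int add(int a, int b) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(a);              // push the first term
        stack.push(b);              // push the second term
        int right = stack.pop();    // the add operation consumes both operands...
        int left = stack.pop();
        stack.push(left + right);   // ...and places the sum on the stack
        return stack.pop();         // the interested party pops the result
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```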

In addition to the stack, the bytecode format specifies up to 65,536 registers or local variables.

An operation in bytecode is encoded by just one byte, so Java supports up to 256 opcodes, from which most available values are claimed. Each operation has a unique byte value and a human-readable mnemonic.

Note

The only new bytecode value that has been assigned throughout the history of the Java Virtual Machine specification is 0xba, which was previously reserved but is about to be used for the new operation invokedynamic. This operation can be used to implement dynamic dispatch when a dynamic language (such as Ruby) has been compiled to Java bytecode. For more information about using Java bytecode for dynamic languages, please refer to Java Specification Request (JSR) 292 on the Internet.

Bytecode format

Consider the following example of an add method in Java source code and then in Java bytecode format:

public int add(int a, int b) {
    return a + b;
}

public int add(int, int);
  Code:
    0: iload_1    // stack: a
    1: iload_2    // stack: a, b
    2: iadd       // stack: (a+b)
    3: ireturn    // stack:

The input parameters to the add method, a and b, are passed in local variable slots 1 and 2 (slot 0 in an instance method is reserved for this, according to the JVM specification, and this particular example is an instance method). The first two operations, with opcodes iload_1 and iload_2, push the contents of these local variables onto the evaluation stack. The third operation, iadd, pops the two values from the stack, adds them, and pushes the resulting sum. The fourth and final operation, ireturn, pops the sum from the evaluation stack and terminates the method using the sum as the return value. The bytecode in the previous example has been annotated with the contents of the evaluation stack after each operation has been executed.
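To make the walkthrough concrete, here is a minimal sketch of an interpreter loop that executes the four bytes of the add method. The opcode values (0x1b, 0x1c, 0x60, 0xac) are taken from the JVM specification; the class AddInterpreterSketch, the run method, and the use of plain int arrays for the locals and the evaluation stack are assumptions made for illustration, and a real JVM is far more involved.

```java
class AddInterpreterSketch {
    // Opcode values as defined by the JVM specification.
    static final int ILOAD_1 = 0x1b;
    static final int ILOAD_2 = 0x1c;
    static final int IADD = 0x60;
    static final int IRETURN = 0xac;

    // The four bytes that make up the body of the add method.
    static final byte[] ADD_CODE = {
        (byte) ILOAD_1, (byte) ILOAD_2, (byte) IADD, (byte) IRETURN
    };

    // Hypothetical interpreter loop; locals[0] stands in for "this".
    static int run(byte[] code, int[] locals) {
        int[] stack = new int[2];   // evaluation stack, deep enough for this method
        int sp = 0;                 // index of the next free stack slot
        int pc = 0;                 // index of the next opcode
        while (true) {
            int opcode = code[pc++] & 0xff;   // Java bytes are signed, so mask
            switch (opcode) {
                case ILOAD_1: stack[sp++] = locals[1]; break;
                case ILOAD_2: stack[sp++] = locals[2]; break;
                case IADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case IRETURN: return stack[--sp];
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Slot 0 is a placeholder for "this"; slots 1 and 2 hold a and b.
        System.out.println(run(ADD_CODE, new int[] {0, 2, 3}));
    }
}
```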

Note

Bytecode for a class can be dumped using the javap command with the -c command-line switch. The command javap is part of the JDK.

Operations and operands

As we see, Java bytecode is a relatively compact format; the previous method is only four bytes in length (a fraction of the source code mass). Operations are always encoded with one byte for the opcode, followed by an optional number of operands of variable length. Typically, a bytecode instruction complete with operands is just one to three bytes.

Here is another small example, a method that determines if a number is even or not. The bytecode has been annotated with the hexadecimal values corresponding to the opcodes and operand data.

public boolean even(int number) {
    return (number & 1) == 0;
}

public boolean even(int);
  Code:
     0: iload_1     // 0x1b            number
     1: iconst_1    // 0x04            number, 1
     2: iand        // 0x7e            (number & 1)
     3: ifne 10     // 0x9a 0x00 0x07
     6: iconst_1    // 0x04            1
     7: goto 11     // 0xa7 0x00 0x04
    10: iconst_0    // 0x03            0
    11: ireturn     // 0xac

The program pushes its input parameter, number, and the constant 1 onto the evaluation stack. The values are then popped, ANDed together, and the result is pushed on the stack. The ifne instruction is a conditional branch that pops its operand from the stack and branches if it is not zero. The iconst_0 operation pushes the constant 0 onto the evaluation stack. It has the opcode value 0x03 in bytecode and takes no operands. In a similar fashion, iconst_1 (opcode 0x04) pushes the constant 1. The constants are used for the boolean return value.

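Compare and jump instructions, for example ifne (branch if not equal to zero, opcode 0x9a), generally take two bytes of operand data (enough for a 16-bit jump offset). The offset is relative to the address of the branch instruction itself, which is why ifne 10 at address 3 is encoded with the offset 0x0007.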
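The following sketch runs the twelve bytes of the even method and shows how the two operand bytes of ifne and goto are combined into a signed 16-bit offset that is added to the address of the branch instruction. The opcode values come from the JVM specification; the class EvenInterpreterSketch and its methods are hypothetical names, and the result is returned as an int because the JVM represents boolean values as ints on the evaluation stack.

```java
class EvenInterpreterSketch {
    // Opcode values as defined by the JVM specification.
    static final int ICONST_0 = 0x03;
    static final int ICONST_1 = 0x04;
    static final int ILOAD_1 = 0x1b;
    static final int IAND = 0x7e;
    static final int IFNE = 0x9a;
    static final int GOTO = 0xa7;
    static final int IRETURN = 0xac;

    // The bytes of the even method, laid out at addresses 0 through 11.
    static final byte[] EVEN_CODE = {
        0x1b,                      //  0: iload_1
        0x04,                      //  1: iconst_1
        0x7e,                      //  2: iand
        (byte) 0x9a, 0x00, 0x07,   //  3: ifne 10  (3 + 7)
        0x04,                      //  6: iconst_1
        (byte) 0xa7, 0x00, 0x04,   //  7: goto 11  (7 + 4)
        0x03,                      // 10: iconst_0
        (byte) 0xac                // 11: ireturn
    };

    // Combines the two big-endian operand bytes after the opcode at pc
    // into a signed 16-bit branch offset.
    static int branchOffset(byte[] code, int pc) {
        return (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
    }

    // Hypothetical interpreter loop; locals[0] stands in for "this".
    static int run(byte[] code, int[] locals) {
        int[] stack = new int[2];
        int sp = 0;
        int pc = 0;
        while (true) {
            int opcode = code[pc] & 0xff;
            switch (opcode) {
                case ILOAD_1:  stack[sp++] = locals[1]; pc += 1; break;
                case ICONST_0: stack[sp++] = 0; pc += 1; break;
                case ICONST_1: stack[sp++] = 1; pc += 1; break;
                case IAND: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a & b;
                    pc += 1;
                    break;
                }
                case IFNE:
                    // Branch if the popped value is not zero, else fall through.
                    pc = (stack[--sp] != 0) ? pc + branchOffset(code, pc) : pc + 3;
                    break;
                case GOTO:
                    pc += branchOffset(code, pc);
                    break;
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(EVEN_CODE, new int[] {0, 4}));   // 1 means true
        System.out.println(run(EVEN_CODE, new int[] {0, 3}));   // 0 means false
    }
}
```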

Note

For example, if a conditional jump should move the instruction pointer 10,000 bytes forward in the case of a true condition, the operation would be encoded as 0x9a 0x27 0x10 (0x2710 is 10,000 in hexadecimal; all values in bytecode are big-endian).
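The encoding in the note can be reproduced with a few lines of Java. This is a small sketch, assuming a hypothetical helper named encodeIfne; it writes the opcode followed by the high and then the low byte of the offset.

```java
class BranchEncodingSketch {
    static final int IFNE = 0x9a;   // opcode value from the JVM specification

    // Hypothetical helper: encodes an ifne instruction with a 16-bit offset,
    // most significant byte first (big-endian).
    static byte[] encodeIfne(int offset) {
        if (offset < Short.MIN_VALUE || offset > Short.MAX_VALUE) {
            throw new IllegalArgumentException("offset does not fit in 16 bits");
        }
        return new byte[] {(byte) IFNE, (byte) (offset >> 8), (byte) offset};
    }

    public static void main(String[] args) {
        byte[] b = encodeIfne(10_000);
        // Prints: 0x9a 0x27 0x10
        System.out.println(String.format("0x%02x 0x%02x 0x%02x",
                b[0] & 0xff, b[1] & 0xff, b[2] & 0xff));
    }
}
```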

Other more complex constructs such as table switches also exist in bytecode with an entire jump table of offsets following the opcode in the bytecode.

The constant pool

A program requires data as well as code. Data is used for operands. The operand data for a bytecode program can, as we have seen, be kept in the bytecode instruction itself. But this is only true when the data is small enough, or commonly used (such as the constant 0).

Larger chunks of data, such as string constants or large numbers, are stored in a constant pool at the beginning of the .class file. Indexes to the data in the pool are used as operands instead of the actual data itself. If the string aVeryLongFunctionName had to be separately encoded in a compiled method each time it was operated on, bytecode would not be compact at all.

Furthermore, references to other parts of the Java program in the form of method, field, and class metadata are also part of the .class file and stored in the constant pool.
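As a rough illustration of where the constant pool sits, the sketch below reads the start of a .class file: a four-byte magic number (0xCAFEBABE), a two-byte minor and a two-byte major version, and then the two-byte constant pool count that precedes the pool entries. The class ClassFileHeaderSketch, the method constantPoolCount, and the hand-built ten-byte header in main are assumptions made for this example; they are not a complete class file parser. Note that, per the class file format, the count is one greater than the number of entries, since valid indexes start at 1.

```java
import java.nio.ByteBuffer;

class ClassFileHeaderSketch {
    // Hypothetical helper: returns the constant_pool_count field of a class file.
    static int constantPoolCount(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile);   // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                  // minor version, ignored here
        buf.getShort();                  // major version, ignored here
        return buf.getShort() & 0xffff;  // unsigned 16-bit count
    }

    public static void main(String[] args) {
        // Hand-built header for illustration: magic, version 52.0, count 16.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00, 0x00, 0x34,
            0x00, 0x10
        };
        System.out.println(constantPoolCount(header));
    }
}
```

Bytecode operands that refer to the pool are indexes into the entries that follow this count.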
