There's nothing remarkable about it. All one has to do is hit the right keys at the right time and the instrument plays itself.
--Johann Sebastian Bach
A program starts as a sequence of characters contained in a file—the source code. Interpreting those characters, according to the rules of a given language, is the job of the compiler, or interpreter. Some characters will represent the names of variables, others will be special keywords used by the language, still others will be operators or “punctuation” characters used to separate the other elements. All of these textual constructs form the lexical elements of the program. These lexical elements must be identified as keywords, comments, literals, variables, operators, or whatever else is appropriate for the given language. In this chapter we look at the basic lexical elements of a Java program, the literal values that can be expressed and the different kinds of variables that can hold those values.
One of the first phases of compilation is the scanning of the lexical elements into tokens. This phase ignores whitespace and comments that appear in the text—so the language must define what form whitespace and comments take. The remaining sequence of characters must then be parsed into tokens.
Most programmers are familiar with source code that is prepared using one of two major families of character representations: ASCII and its variants (including Latin-1) and EBCDIC. Both character sets contain characters used in English and several other Western European languages.
The Java programming language, on the other hand, is written in a 16-bit encoding of Unicode. The Unicode standard originally supported a 16-bit character set, but has expanded to allow for up to 21-bit characters with a maximum value of 0x10ffff. The characters above the value 0x00ffff are termed the supplementary characters. Any particular 21-bit value is termed a code point. To allow all characters to be represented by 16-bit values, Unicode defines an encoding format called UTF-16, and this is how the Java programming language represents text. In UTF-16 all the values between 0x0000 and 0xffff map directly to Unicode characters. The supplementary characters are encoded by a pair of 16-bit values: the first value in the pair comes from the high-surrogates range, and the second comes from the low-surrogates range. Methods that want to work with individual code point values can either accept a UTF-16 encoded char[] of length two, or a single int that holds the code point directly. An individual char in a UTF-16 sequence is termed a code unit.
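The relationship between code points and code units can be checked with the standard Character and String methods. The following is a minimal sketch; the class name CodePointDemo and the choice of U+1D11E (a musical symbol outside the 16-bit range) are illustrative assumptions, not part of the original text.

```java
public class CodePointDemo {
    /** Returns the UTF-16 code units that encode the given code point. */
    public static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int supplementary = 0x1D11E;          // a code point above 0xFFFF
        char[] units = encode(supplementary); // encoded as a surrogate pair

        System.out.println(units.length);                          // number of code units
        System.out.println(Character.isHighSurrogate(units[0]));   // first of the pair
        System.out.println(Character.isLowSurrogate(units[1]));    // second of the pair

        String s = new String(units);
        System.out.println(s.length());                            // counts code units
        System.out.println(s.codePointCount(0, s.length()));       // counts code points
        System.out.println(s.codePointAt(0) == supplementary);     // round trip
    }
}
```

Note that String.length() reports code units, so a string holding one supplementary character has length two.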
The first 256 characters of Unicode are the Latin-1 character set, and most of the first 128 characters of Latin-1 are equivalent to the 7-bit ASCII character set. Current environments read ASCII or Latin-1 files, converting them to Unicode on the fly.[1]
Few existing text editors support Unicode characters, so you can use the escape sequence \uxxxx to encode Unicode characters, where each x is a hexadecimal digit (0–9, and a–f or A–F to represent decimal values 10–15). This sequence can appear anywhere in code, not only in character and string constants but also in identifiers. More than one u may appear at the beginning; thus, the character இ can be written as \u0b87 or \uuu0b87.[2] Also note that if your editor does support Unicode characters (or a subset), you may need to tell your compiler if your source code contains any character that is not part of the default character encoding for your system, for example through a command-line option that names the source character set.
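As a small illustration of escapes appearing both in a string literal and in an identifier, consider the sketch below. The class and method names are hypothetical; the escapes used are the ordinary Latin letters H, i, and a.

```java
public class EscapeDemo {
    /** The two escapes in this literal spell the letters H and i. */
    public static String greeting() {
        return "\u0048\u0069";
    }

    /** The local variable declared with an escape is really named abc. */
    public static int answer() {
        int \u0061bc = 42;
        return abc;
    }

    public static void main(String[] args) {
        System.out.println(greeting());
        System.out.println(answer());
    }
}
```

Because escapes are translated before the source is split into tokens, the compiler sees the identifier written with an escape and the one written plainly as the same name.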
Exercise 7.1: Just for fun, write a “Hello, World” program entirely using Unicode escape sequences.
Comments within source code exist for the convenience of human programmers. They play no part in the generation of code and so are ignored during scanning. There are three kinds of comments:
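| // comment | Characters from // to the end of the line are ignored. |
| /* comment */ | All characters between /* and the next */ are ignored. |
| /** comment */ | All characters between /** and the next */ are ignored. These documentation comments come immediately before identifier declarations and are included in automatically generated documentation. These comments are described in Chapter 19. |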
Comments can include any valid Unicode character, such as yin-yang (\u262f), asterism (\u2042), interrobang (\u203d), won (\u20a9), scruple (\u2108), or a snowman (\u2603).[3]
Comments do not nest. The following tempting code does not compile:
/* Comment this out for now: not implemented
    /* Do some really neat stuff */
    universe.neatStuff();
*/
The first /* starts a comment; the very next */ ends it, leaving the code that follows to be parsed; and the invalid, stand-alone */ is a syntax error. The best way to remove blocks of code from programs is either to put a // at the beginning of each line or use if(false) like this:
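A minimal sketch of the if (false) technique follows. The enclosing class, the counter, and the body of dwim are assumptions made so the fragment compiles on its own.

```java
public class DisabledCode {
    static int calls = 0;

    // Hypothetical stand-in for a dwim method defined elsewhere.
    static void dwim() {
        calls++;
    }

    /** Runs the guarded block and reports how many times dwim was called. */
    public static int run() {
        if (false) {
            // invoke this method when it works
            dwim();
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(run()); // dwim is never invoked
    }
}
```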
This technique requires that the code to be removed is complete enough to compile without error. In this case we assume that the dwim method is defined somewhere.
The tokens of a language are its basic words. A parser breaks source code into tokens and then tries to figure out which statements, identifiers, and so forth make up the code. Whitespace (spaces, tabs, newlines, and form feeds) is not significant except to separate tokens or as the contents of character or string literals. You can take any valid code and replace any amount of intertoken whitespace (whitespace outside strings and characters) with a different amount of whitespace (but not none) without changing the meaning of the program.
Whitespace must be used to separate tokens that would otherwise constitute a single token. For example, in the statement
return 0;
you cannot drop the space between return and 0 because that would create
return0;
consisting of the single identifier return0. Use extra whitespace appropriately to make your code human-readable, even though the parser ignores it. Note that the parser treats comments as whitespace.
The tokenizer is a “greedy” tokenizer. It grabs as many characters as it can to build up the next token, not caring if this creates an invalid sequence of tokens. So because ++ is longer than +, the expression
j = i+++++i; // INVALID
is interpreted as the invalid expression
j = i++ ++ +i; // INVALID
instead of the valid
j = i++ + ++i;
Identifiers, used for names of declared entities such as variables, constants, and labels, must start with a letter, followed by letters, digits, or both. The terms letter and digit are broad in Unicode: If something is considered a letter or digit in a human language, you can probably use it in identifiers. “Letters” can come from Armenian, Korean, Gurmukhi, Georgian, Devanagari, and almost any other script written in the world today. Thus, not only is kitty a valid identifier, but so are names written in any of those scripts.[4] Letters also include any currency symbol (such as $, ¥, and £) and connecting punctuation (such as _).
Any difference in characters within an identifier makes that identifier unique. Case is significant: A, a, á, À, Å, and so on are different identifiers. Characters that look the same, or nearly the same, can be confused. For example, the Latin capital letter n “N” and the Greek capital nu “Ν” look alike but are different characters (\u004e and \u039d, respectively). The only way to avoid confusion is to write each identifier in one language, and thus in one known set of characters, so that programmers trying to type the identifier will know whether you meant the Latin E or the Greek Ε.[5]
Identifiers can be as long as you like, but use some taste. Identifiers that are too long are hard to use correctly and actually obscure your code.
Language keywords cannot be used as identifiers because they have special meaning within the language. The following table lists the keywords (keywords marked with a † are reserved but currently unused):
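| abstract | assert | boolean | break | byte |
| case | catch | char | class | const† |
| continue | default | do | double | else |
| enum | extends | final | finally | float |
| for | goto† | if | implements | import |
| instanceof | int | interface | long | native |
| new | package | private | protected | public |
| return | short | static | strictfp | super |
| switch | synchronized | this | throw | throws |
| transient | try | void | volatile | while |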
Although they appear to be keywords, null, true, and false are formally literals, just like the number 12, so they do not appear in the above table. However, you cannot use null, true, or false as identifiers, just as you cannot use 12 as an identifier. These words can be used as parts of identifiers, as in annulled, construe, and falsehood.
Every expression has a type that determines what values the expression can produce. The type of an expression is determined by the types of values and variables used within that expression. Types are divided into the primitive types and the reference types.
The primitive data types are:
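| boolean | either true or false |
| char | 16-bit Unicode UTF-16 code unit (unsigned) |
| byte | 8-bit signed two's-complement integer |
| short | 16-bit signed two's-complement integer |
| int | 32-bit signed two's-complement integer |
| long | 64-bit signed two's-complement integer |
| float | 32-bit IEEE 754 floating-point number |
| double | 64-bit IEEE 754 floating-point number |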
Each primitive data type has a corresponding class type in the java.lang package. These wrapper classes (Boolean, Character, Byte, Short, Integer, Long, Float, and Double) also define useful constants and methods. For example, most wrapper classes declare constants MIN_VALUE and MAX_VALUE that hold the minimum and maximum values for the associated primitive type.
The Float and Double classes also have NaN, NEGATIVE_INFINITY, and POSITIVE_INFINITY constants. Both also provide an isNaN method that tests whether a floating-point value is “Not a Number”, that is, whether it is the result of a floating-point expression that has no valid result, such as dividing zero by zero. The NaN value can be used to indicate an invalid floating-point value; this is similar to the use of null for object references that do not refer to anything. The wrapper classes are covered in detail in Chapter 8.
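The wrapper constants and the isNaN test can be tried out directly. This is only a sketch; the class name WrapperConstants and the helper isInvalid are hypothetical.

```java
public class WrapperConstants {
    /** Reports whether a value is NaN, using the wrapper class's static test. */
    public static boolean isInvalid(double value) {
        return Double.isNaN(value);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);   // largest int
        System.out.println(Byte.MIN_VALUE);      // smallest byte

        double nan = 0.0 / 0.0;                  // zero divided by zero has no valid result
        System.out.println(isInvalid(nan));      // true
        System.out.println(nan == nan);          // false: NaN is not equal to anything, even itself

        System.out.println(1.0 / 0.0 == Double.POSITIVE_INFINITY);  // true
    }
}
```

Because NaN never compares equal to itself, isNaN is the reliable way to detect it.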
There is no unsigned integer type. If you need to work with unsigned values originating outside your program, they must be stored in a larger signed type. For example, unsigned bytes produced by an analog-to-digital converter can be read into variables of type short.
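One common way to widen an unsigned byte into a short is to mask off the sign extension. The sketch below assumes the raw sample arrives as a Java byte; the class and method names are hypothetical.

```java
public class UnsignedBytes {
    /** Interprets the eight bits of raw as an unsigned value in the range 0 to 255. */
    public static short toUnsigned(byte raw) {
        // The mask discards the sign bits added when raw is promoted to int.
        return (short) (raw & 0xff);
    }

    public static void main(String[] args) {
        byte sample = (byte) 0xF0;               // stored as a negative byte
        System.out.println(sample);              // the signed interpretation
        System.out.println(toUnsigned(sample));  // the unsigned interpretation
    }
}
```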
The reference types are class types, interface types, and array types. Variables of these types can refer to objects of the corresponding type.
Each type has literals, which are the way that constant values of that type are written. The next few subsections describe how literal (unnamed) constants for each type are specified.
The only literal object reference is null. It can be used anywhere a reference is expected. Conventionally, null represents an invalid or uncreated object. It has no class, not even Object, but null can be assigned to any reference variable.
Character literals appear with single quotes: 'Q'. Any valid Unicode character can appear between the quotes. You can use \uxxxx for Unicode characters inside character literals just as you can elsewhere. Certain special characters can be represented by an escape sequence:
| \n | newline |
| \t | tab |
| \b | backspace |
| \r | return |
| \f | form feed |
| \\ | backslash itself |
| \' | single quote |
| \" | double quote |
| \ddd | a char by octal value, where each d is one of 0–7 |
Octal character constants can have three or fewer digits and cannot exceed \377 (\u00ff). For example, the character literal '\12' is the same as '\n'. Supplementary characters cannot be represented in a character literal.
Integer constants are strings of octal, decimal, or hexadecimal digits. The start of a constant declares the number's base: A 0 (zero) starts an octal number (base 8); a 0x or 0X starts a hexadecimal number (base 16); and any other digit starts a decimal number (base 10). All the following numbers have the same value:
29 035 0x1D 0X1d
Integer constants are long if they end in L or l, such as 29L; L is preferred over l because l (lowercase L) can easily be confused with 1 (the digit one). Otherwise, integer constants are assumed to be of type int. If an int literal is directly assigned to a short, and its value is within the valid range for a short, the integer literal is treated as if it were a short literal. A similar allowance is made for integer literals assigned to byte variables. In all other cases you must explicitly cast when assigning int to short or byte (see “Explicit Type Casts” on page 219).
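The rules for integer literals can be seen together in a short sketch. The class and method names are hypothetical, and 70000 is chosen only because it lies outside the short range.

```java
public class IntegerLiterals {
    /** The same value written in decimal, octal, and hexadecimal. */
    public static boolean allEqual() {
        return 29 == 035 && 035 == 0x1D && 0x1D == 0X1d;
    }

    /** Narrowing a non-constant int requires an explicit cast. */
    public static short narrow(int value) {
        return (short) value;
    }

    public static void main(String[] args) {
        long big = 29L;     // long literal; uppercase L avoids confusion with the digit one
        short s = 29;       // int literal within short range: no cast needed
        byte b = 035;       // likewise for byte (octal 35 is decimal 29)

        System.out.println(allEqual());
        System.out.println(big + s + b);
        System.out.println(narrow(70000));  // high-order bits are discarded by the cast
    }
}
```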
Floating-point constants are expressed in either decimal or hexadecimal form. The decimal form consists of a string of decimal digits with an optional decimal point, optionally followed by an exponent: the letter e or E, followed by an optionally signed integer. At least one digit must be present. All these literals denote the same floating-point number:
18. 1.8e1 .18E+2 180.0e-1
The hexadecimal form consists of 0x (or 0X), a string of hexadecimal digits with an optional hexadecimal point, followed by a mandatory binary exponent: the letter p or P, followed by an optionally signed integer. The binary exponent represents scaling by two raised to a power. All these literals also denote the same floating-point number (decimal 18.0):
0x12p0 0x1.2p4 0x.12P+8 0x120p-4
Floating-point constants are of type double unless they are specified with a trailing f or F, which makes them float constants, such as 18.0f. A trailing d or D specifies a double constant. There are two zeros: positive (0.0) and negative (-0.0). Positive and negative zero are considered equal when you use == but produce different results when used in some calculations. For example, when dividing by zero, the expression 1d/0d is +∞, whereas 1d/-0d is -∞. There are no literals to represent either infinity or NaN, only the symbolic constants defined in the Float and Double classes (see Chapter 8).
A double constant cannot be assigned directly to a float variable, even if the value of the double is within the valid float range. The only constants you may directly assign to float variables and fields are float constants.
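The floating-point literal forms, the two zeros, and the float assignment rule can be exercised with a short sketch. The class and method names below are hypothetical.

```java
public class FloatLiterals {
    /** The four decimal forms from the text all denote 18.0. */
    public static boolean decimalFormsAgree() {
        return 18. == 1.8e1 && 1.8e1 == .18E+2 && .18E+2 == 180.0e-1;
    }

    /** The four hexadecimal forms from the text all denote 18.0. */
    public static boolean hexFormsAgree() {
        return 0x12p0 == 18.0 && 0x1.2p4 == 18.0 && 0x.12P+8 == 18.0 && 0x120p-4 == 18.0;
    }

    public static void main(String[] args) {
        System.out.println(decimalFormsAgree());
        System.out.println(hexFormsAgree());

        System.out.println(0.0 == -0.0);   // the two zeros compare equal
        System.out.println(1d / 0d);       // positive infinity
        System.out.println(1d / -0d);      // negative infinity

        float f = 18.0f;                   // float constant: allowed
        float g = (float) 18.0;            // double constant needs an explicit cast
        // float h = 18.0;                 // would not compile
        System.out.println(f == g);
    }
}
```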
String literals appear with double quotes: "along". Any character can be included in string literals, with the exception of newline and " (double quote). Newlines are not allowed in the middle of strings. If you want to embed a newline character in the string, use the escape sequence \n. To embed a double quote use the escape sequence \". A string literal references an object of type String. To learn more about strings, see Chapter 13.
Characters in strings can be specified with the octal digit syntax, but all three octal digits should be used to prevent accidents when an octal value is specified next to a valid octal digit in the string. For example, the string "\0116" is equivalent to "\t6", whereas the string "\116" is equivalent to "N".
Every type (primitive or reference) has an associated instance of class Class that represents that type. These instances are often referred to as the class object for a given type. You can name the class object for a type directly by following the type name with ".class", as in
String.class
java.lang.String.class
java.util.Iterator.class
boolean.class
The first two of these class literals refer to the same instance of class Class because String and java.lang.String are two different names for the same type. The third class literal is a reference to the Class instance for the Iterator interface mentioned on page 129. The last is the Class instance that represents the primitive type boolean.
Since class Class is generic, the actual type of the class literal for a reference type T is Class<T>, while for primitive types it is Class<W> where W is the wrapper class for that primitive type. But note, for example, that boolean.class and Boolean.class are two different objects of type Class<Boolean>. Generic types are discussed in Chapter 11, and the class Class is discussed in Chapter 16.
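These identity relationships between class objects can be confirmed with reference comparisons. The sketch below uses hypothetical class and method names.

```java
public class ClassLiterals {
    /** Both names denote the same type, so they yield the same Class object. */
    public static boolean sameStringClass() {
        return String.class == java.lang.String.class;
    }

    /** The primitive and wrapper class objects share a static type but are distinct. */
    public static boolean primitiveDiffersFromWrapper() {
        Class<Boolean> primitive = boolean.class;
        Class<Boolean> wrapper = Boolean.class;
        return primitive != wrapper;
    }

    public static void main(String[] args) {
        System.out.println(sameStringClass());
        System.out.println(primitiveDiffersFromWrapper());
        System.out.println(boolean.class.isPrimitive());
        System.out.println(Boolean.class.isPrimitive());
    }
}
```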
Exercise 7.2: Write a class that declares a field for each of the primitive numeric types, and try to assign values using the different literal forms. For example, try to assign 3.5f to an int field. Which literals can be used with which type of field? Try changing the magnitude of the values used to see if that affects things.
A variable is a storage location[6] —something that can hold a value—to which a value can be assigned. Variables include fields, local variables in a block of code, and parameters. A variable declaration states the identifier (name), type, and other attributes of a variable. The type part of a declaration specifies which kinds of values and behavior are supported by the declared entity. The other attributes of a variable include annotations and modifiers. Annotations can be applied to any variable declaration and are discussed in Chapter 15.
Fields and local variables are declared in the same way. A declaration is broken into three parts: modifiers, followed by a type, followed by a list of identifiers. Each identifier can optionally have an initializer associated with it to give it an initial value.
There is no difference between variables declared in one declaration or in multiple declarations of the same type. For example:
float x, y;
is the same as
float x;
float y;
Any initializer is expressed as an assignment (with the =
operator) of an expression of the appropriate type. For example:
float x = 3.14f, y = 2.81f;
is the same as the more readable
float x = 3.14f,
      y = 2.81f;
is the same as the preferred
float x = 3.14f;
float y = 2.81f;
Field variables are members of classes, or interfaces, and are declared within the body of that class or interface. Fields can be initialized with an initializer, within an initialization block, or within a constructor, but need not be initialized at all because they have default initial values, as discussed on page 44. Field initialization and the modifiers that can be applied to fields were discussed in Chapter 2.
Local variables can be declared anywhere within a block of statements, not just at the start of the block, and can be of primitive or reference type. As a special case, a local variable declaration is also permitted within the initialization section of a for loop (see “for” on page 236). A local variable must be assigned a value before it is used.[7] There is no default initialization value for local variables because failure to assign a starting value for one is usually a bug. The compiler will refuse to compile code that doesn't ensure that assignment takes place before a variable is used:
int x;        // uninitialized, can't use
int y = 2;
x = y * y;    // now x has a value
int z = x;    // okay, safe to use x
Local variables cease to exist when the flow of control reaches the end of the block in which they were declared—though any referenced object is subject to normal garbage collection rules.
Apart from annotations, the only modifier that can be applied to a local variable is final. This is required when the local variable will be accessed by a local or anonymous inner class (see also the discussion of final variables below).
Parameter variables are the parameters declared in methods, constructors, or catch blocks (see “try, catch, and finally” on page 286). A parameter declaration consists of an optional modifier, a type name, and a single identifier.
Parameters cannot have explicit initializers because they are implicitly initialized with the value of the argument passed when the method or constructor is invoked, or with a reference to the exception object caught in the catch block. Parameter variables cease to exist when the block in which they appear completes.
As with local variables, the only modifiers that can be applied to a parameter are annotations, or the final modifier.
The final modifier declares that the value of the variable is set exactly once and will thereafter always have the same value; it is immutable. Any variable (field, local variable, or parameter) can be declared final. Variables that are final must be initialized before they are used. This is typically done directly in the declaration:
You can defer the initialization of a final field or local variable. Such a final variable is called a blank final. A blank final field must be initialized within an initialization block or constructor (if it's an instance field) while a blank final local variable, like any local variable, must be initialized before it is used.
Blank final fields are useful when the value of the field is determined by a constructor argument:
or when you must calculate the value in something more sophisticated than an initializer expression:
static final int[] numbers = numberList();
static final int maxNumber; // max value in numbers
static {
    int max = numbers[0];
    for (int num : numbers) {
        if (num > max)
            max = num;
    }
    maxNumber = max;
}
static int[] numberList() {
    // ...
}
The compiler will verify that all static final fields are initialized by the end of any static initializer blocks, and that non-static final fields are initialized by the end of all construction paths for an object. A compile-time error will occur if the compiler cannot determine that this happens.
Blank final local variables are useful when the value to be assigned to the variable is conditional on the value of other variables. As with all local variables, the compiler will ensure that a final local variable is initialized before it is used.
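The different ways of initializing a final variable can be sketched as follows. The names BlankFinals, NamedObj, LIMIT, and describe are hypothetical and chosen only for illustration.

```java
public class BlankFinals {
    // Initialized directly in the declaration.
    public static final int LIMIT = 10;

    public static class NamedObj {
        public final String name;      // blank final field

        public NamedObj(String name) {
            this.name = name;          // must be assigned on every construction path
        }
    }

    /** Uses a blank final local whose value depends on another variable. */
    public static String describe(int n) {
        final String label;            // blank final local
        if (n < 0)
            label = "negative";
        else if (n == 0)
            label = "zero";
        else
            label = "positive";
        return label;                  // definitely assigned on every path
    }

    public static void main(String[] args) {
        System.out.println(new NamedObj("example").name);
        System.out.println(describe(-5));
        System.out.println(describe(LIMIT));
    }
}
```

If any branch in describe failed to assign label, or assigned it twice, the compiler would reject the method.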
Local variables and parameters are usually declared final only when they will be accessed by a local or anonymous inner class, though some people advocate always making parameters final, both as a matter of style and to avoid accidentally assigning a value to a parameter when a field or other variable was intended. Issues regarding when you should, and should not, use final on fields were discussed on page 46.
Arrays provide ordered collections of elements. Components of an array can be primitive types or references to objects, including references to other arrays. Arrays themselves are objects and extend Object. The declaration
int[] ia = new int[3];
declares an array named ia that initially refers to an array of three int values.
Array dimensions are omitted in the type declaration of an array variable. The number of components in an array is determined when it is created using new, not when an array variable is declared. An array object's length is fixed at its creation and cannot be changed. Note that it is the length of the array object that is fixed. In the example, a new array of a different size could be assigned to the array variable ia at any time.
You access array elements by their position in the array. The first element of an array has index 0 (zero), and the last element has index length–1. You access an element by using the name of the array and the index enclosed between [ and ]. In our example, the first element of the array is ia[0] and the last element of the array is ia[2]. Every index use is checked to ensure that it is within the proper range for that array, throwing an ArrayIndexOutOfBoundsException if the index is out of bounds.[8] The index expression must be of type int, which limits the maximum size of an array.
The length of an array is available from its length field (which is implicitly public and final). In our example, the following code would loop over the array, printing each value:
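A minimal sketch of such a loop, assuming ia is the three-element int array from the example; the class name ArrayLoop and the render helper are hypothetical, and the output format is just one reasonable choice.

```java
public class ArrayLoop {
    /** Builds one "index: value" line per element, using the array's own length. */
    public static String render(int[] ia) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < ia.length; i++)
            out.append(i).append(": ").append(ia[i]).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        int[] ia = new int[3];        // elements start at the default value, zero
        System.out.print(render(ia));
    }
}
```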
An array with length zero is said to be an empty array. There is a big difference between a null array reference and a reference to an empty array: an empty array is a real object; it simply has no elements. Empty arrays are useful for returning from methods instead of returning null. If a method can return null, then users of the method must explicitly check the return value for null before using it. On the other hand, if the method returns an array that may be empty, no special checking is needed provided the user always uses the array length to check valid indices.
If you prefer, you can put the array brackets after the variable name instead of after the type:
int ia[] = new int[3];
This code is equivalent to the original definition of ia. However, the first style is preferable because it places the type declaration entirely in one place.
The normal modifiers can be applied to array variables, depending on whether the array is a field or local variable. The important thing to remember is that the modifiers apply to the array variable, not to the elements of the array the variable references. An array variable that is declared final means that the array reference cannot be changed after initialization. It does not mean that array elements cannot be changed. There is no way to apply any modifiers (specifically final and volatile) to the elements of an array.
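The distinction between a final array variable and its elements can be seen in a few lines. The class name, field, and method in this sketch are hypothetical.

```java
public class FinalArray {
    static final int[] VALUES = { 1, 2, 3 };

    /** Changes an element of the array that the final variable refers to. */
    public static int overwriteFirst(int replacement) {
        VALUES[0] = replacement;      // allowed: elements are not final
        // VALUES = new int[5];       // would not compile: the reference is final
        return VALUES[0];
    }

    public static void main(String[] args) {
        System.out.println(overwriteFirst(99));
        System.out.println(VALUES.length);   // the array object itself is unchanged in size
    }
}
```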
You can have arrays of arrays. The code to declare and print a two-dimensional matrix, for example, might look like this:
float[][] mat = new float[4][4];
setupMatrix(mat);
for (int y = 0; y < mat.length; y++) {
    for (int x = 0; x < mat[y].length; x++)
        System.out.print(mat[y][x] + " ");
    System.out.println();
}
The first (left-most) dimension of an array must be specified when the array is created. Other dimensions can be left unspecified, to be filled in later. Specifying more than the first dimension is a shorthand for a nested set of new statements. Our new creation could have been written more explicitly as:
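One plausible way to spell out the nested creation, assuming the 4×4 float matrix from the example; the class and method names are hypothetical.

```java
public class NestedNew {
    /** Creates a 4x4 matrix one row at a time instead of with new float[4][4]. */
    public static float[][] makeMatrix() {
        float[][] mat = new float[4][];        // only the first dimension is given
        for (int y = 0; y < mat.length; y++)
            mat[y] = new float[4];             // each row is created separately
        return mat;
    }

    public static void main(String[] args) {
        float[][] mat = makeMatrix();
        System.out.println(mat.length);        // number of rows
        System.out.println(mat[0].length);     // length of the first row
    }
}
```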
One advantage of arrays of arrays is that each nested array can have a different size. You can emulate a 4×4 matrix, but you can also create an array of four int arrays, each of which has a different length sufficient to hold its own data.
When an array is created, each element is set to the default initial value for its type: zero for the numeric types, '\u0000' for char, false for boolean, and null for reference types. When you declare an array of a reference type, you are really declaring an array of variables of that type. Consider the following code:
Attr[] attrs = new Attr[12];
for (int i = 0; i < attrs.length; i++)
    attrs[i] = new Attr(names[i], values[i]);
After the initial new of the array, attrs has a reference to an array of 12 variables that are initialized to null. The Attr objects themselves are created only when the loop is executed.
You can initialize arrays with comma-separated values inside braces following their declaration. The following array declaration creates and initializes an array:
String[] dangers = { "Lions", "Tigers", "Bears" };
The following code gives the same result:
String[] dangers = new String[3];
dangers[0] = "Lions";
dangers[1] = "Tigers";
dangers[2] = "Bears";
When you initialize an array within its declaration, you don't have to explicitly create the array using new; it is done implicitly for you by the system. The length of the array to create is determined by the number of initialization values given. You can use new explicitly if you prefer, but in that case you have to omit the array length, because again it is determined from the initializer list.
This form of array creation expression allows you to create and initialize an array anywhere. For example, you can create and initialize an array when you invoke a method:
An unnamed array created with new in this way is called an anonymous array.
The last value in the initializer list is also allowed to have a comma after it. This is a convenience for multiline initializers so you can reorder, add, or remove values, without having to remember to add a comma to the old last line, or remove it from the new last line.
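The initializer forms described above can be sketched together. The helper totalLength and the particular strings passed to it are hypothetical, chosen only to give the anonymous array something to do.

```java
public class ArrayInitializers {
    /** Hypothetical helper: sums the lengths of the strings it is given. */
    public static int totalLength(String[] strings) {
        int total = 0;
        for (String s : strings)
            total += s.length();
        return total;
    }

    public static void main(String[] args) {
        // Explicit new with an initializer: the length must be omitted.
        String[] dangers = new String[] { "Lions", "Tigers", "Bears" };

        // Anonymous array created and initialized in a method invocation.
        int n = totalLength(new String[] { "one", "two", "many" });

        // Multiline initializer with a trailing comma after the last value.
        String[] multiline = {
            "Lions",
            "Tigers",
            "Bears",
        };

        System.out.println(dangers.length);
        System.out.println(n);
        System.out.println(multiline.length);  // the trailing comma adds no element
    }
}
```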
Arrays of arrays can be initialized by nesting array initializers. Here is a declaration that initializes an array to the top few rows of Pascal's triangle, with each row represented by its own array:
Indices in an array of arrays work from the outermost inward. For example, in the above array, pascalsTriangle[0] refers to the int array that has one element, pascalsTriangle[1] refers to the int array that has two elements, and so forth.
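A nested initializer for the first five rows of Pascal's triangle might look like the following sketch. The variable name pascalsTriangle comes from the text; the enclosing class, the number of rows, and the rowLength helper are assumptions.

```java
public class Pascal {
    // Each inner initializer creates a row with its own length.
    public static final int[][] pascalsTriangle = {
        { 1 },
        { 1, 1 },
        { 1, 2, 1 },
        { 1, 3, 3, 1 },
        { 1, 4, 6, 4, 1 },
    };

    /** The outer index selects a row; that row's length field gives its size. */
    public static int rowLength(int row) {
        return pascalsTriangle[row].length;
    }

    public static void main(String[] args) {
        for (int[] row : pascalsTriangle) {
            for (int value : row)
                System.out.print(value + " ");
            System.out.println();
        }
    }
}
```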
For convenience, the System class provides an arraycopy method that allows you to assign the values from one array into another, instead of looping through each of the array elements. This is described in more detail in “Utility Methods” on page 665.
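As a brief sketch of System.arraycopy, the helper below copies a whole int array into a newly created one. The class and method names are hypothetical.

```java
public class CopyDemo {
    /** Copies every element of src into a new array of the same length. */
    public static int[] copyAll(int[] src) {
        int[] dst = new int[src.length];
        // Arguments: source, source start, destination, destination start, element count.
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] original = { 1, 2, 3 };
        int[] copy = copyAll(original);
        copy[0] = 42;                      // changing the copy leaves the original alone
        System.out.println(original[0]);
        System.out.println(copy[0]);
    }
}
```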
Arrays are implicit extensions of Object. Given a class X, classes Y and Z that extend X, and arrays of each, the class hierarchy looks something like this:
This class relationship allows polymorphism for arrays. You can assign an array to a variable of type Object and cast it back. An array of objects of type Y is usable wherever an array of objects of its supertype X is required. This seems natural but can require a run-time check that is sometimes unexpected. An array of X can contain either Y or Z references, but an array of Y cannot contain references to X or Z objects. The following code would generate an ArrayStoreException at run time on either of its final two lines, which violate this rule:
Y[] yArray = new Y[3];     // a Y array
X[] xArray = yArray;       // valid: Y is assignable to X
xArray[0] = new Y();
xArray[2] = new X();       // INVALID: can't store X in Y[]
xArray[1] = new Z();       // INVALID: can't store Z in Y[]
If xArray were a reference to a real X[] object, it would be valid to store both an X and a Z object into it. But xArray actually refers to a Y[] object so it is not valid to store either an X reference or a Z reference in it. Such assignments are checked at run time if needed to ensure that no improper reference is stored into an array.
Like any other object, arrays are created and are subject to normal garbage collection mechanisms. They inherit all the methods of Object and additionally implement the Cloneable interface (see page 101) and the Serializable interface (see “Object Serialization” on page 549). Since arrays define no methods of their own, but just inherit those of Object, the equals method is always based on identity, not equivalence. The utility methods of the java.util.Arrays class (see “The Arrays Utility Class” on page 607) allow you to compare arrays for equivalence, and to calculate a hash code based on the contents of the array.
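The difference between identity-based equals and the content-based methods of java.util.Arrays can be sketched as follows; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class ArrayEquality {
    /** The equals method inherited from Object compares references only. */
    public static boolean identityEqual(int[] a, int[] b) {
        return a.equals(b);
    }

    /** Arrays.equals compares lengths and then element values. */
    public static boolean contentEqual(int[] a, int[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        int[] a = { 1, 2, 3 };
        int[] b = a.clone();   // a distinct array object with the same contents

        System.out.println(identityEqual(a, b));                       // different objects
        System.out.println(contentEqual(a, b));                        // same contents
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b));  // content-based hash
    }
}
```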
The major limitation on the “object-ness” of arrays is that they cannot be extended to add new methods. The following construct is not valid:
In a sense, arrays behave like final classes.
Exercise 7.3: Write a program that calculates Pascal's triangle to a depth of 12, storing each row of the triangle in an array of the appropriate length and putting each of the row arrays into an array of 12 int arrays. Design your solution so that the results are printed by a method that prints the array of arrays using the lengths of each array, not a constant 12. Now change the code to use a constant other than 12 without modifying your printing method.
Identifiers give names to a range of things within our programs: types, variables, fields, methods, and so forth. When you use a particular name in your program, the compiler has to determine what that name refers to, so that it can decide if you are using the name correctly and so that it can generate the appropriate code. The rules for determining the meaning of a name trade off convenience with complexity. At one extreme the language could require that every name in a program be unique; this makes things simple for the compiler but makes life very inconvenient for the programmer. If names are interpreted based on the context in which they are used, the programmer gets the convenience of reusing names (such as always using the name i for a for loop counter), but the compiler has to be able to determine what each name means, and so does any human being reading the code.
Name management is achieved with two mechanisms. First, the namespace is partitioned to give different namespaces for different kinds of names. Second, scoping is used to control the visibility of names declared in one part of a program to other parts. Different namespaces allow you to give the same name to a method and a field (not that we recommend doing this), and scoping allows you to use the same name for all your for loop counters.
There are six different namespaces:
package names,
type names,
field names,
method names,
local variable names (including parameters), and
labels
When a name is used in a program, its context helps determine what kind of name it is. For example, in the expression x.f = 3, we know that f must be a field: it can't be a package, type, method, or label because we are assigning a value to it, and it can't be a local variable because we are accessing it as a member of x. We know that x must be a type name, or a field, or a local variable that is an object reference; exactly which one is determined by searching the enclosing scope for an appropriate declaration, as you will see.
The use of separate namespaces gives you greater flexibility when writing code (especially when combining code from different sources) but can be abused. Consider this pathological, but perfectly valid, piece of code:
package Reuse;

class Reuse {
    Reuse Reuse(Reuse Reuse) {
        Reuse:
            for (;;) {
                if (Reuse.Reuse(Reuse) == Reuse)
                    break Reuse;
            }
        return Reuse;
    }
}
Every declaration of a name has a scope in which that name can be used. The exact rules differ depending on the kind of name: type name, member name, local variable, and so on. For example, the scope of a parameter in a method is the entire body of that method; the scope of a local variable is the block in which the local variable is declared; the scope of a loop variable declared in the initialization section of a for loop is the rest of that for loop.
A name cannot be used outside its scope; for example, one method in a class cannot refer to the parameter of another method. However, scopes also nest, and an inner scope has access to all names declared in the outer scope before the inner scope is entered. For example, the body of a for loop can access the local variables of the method in which it was declared.
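The scope rules just described can be sketched in a few lines. The class ScopeDemo and its method sumOfSquares are hypothetical names used only for this illustration.

```java
// Hypothetical class for illustration only; not taken from the text.
class ScopeDemo {
    static int sumOfSquares(int limit) {    // limit: in scope for the whole method body
        int total = 0;                      // total: in scope to the end of the method block
        for (int i = 1; i <= limit; i++) {  // i: in scope for the rest of this for loop
            int square = i * i;             // square: in scope only inside the loop body
            total += square;                // the inner scope can see total and limit
        }
        // Neither i nor square is in scope here; using them would not compile.
        return total;
    }
}
```

The loop body is an inner scope, so it can read and update total, which was declared in the enclosing method block before the loop was entered.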
When a name that could be a variable is used, the meaning of the name is determined by searching the current and enclosing scopes for declarations of that name in the different namespaces. The search order is:
1. Local variables declared in the code block, for loop, or as parameters to the catch clause of a try statement. Then local variables declared in any enclosing code block. This applies recursively up to the method containing the block, or until there is no enclosing block (as in the case of an initialization block).
2. If the code is in a method or constructor, the parameters to the method or constructor.
3. A field of the class or interface, including any accessible inherited fields.
4. If the type is a nested type, a variable in the enclosing block or field of the enclosing class. If the type is a static nested type, only static fields of an enclosing type are searched. This search rule is applied successively to any enclosing blocks and types further out.
5. A static field of a class, or interface, specifically declared in a static import statement.
6. A static field of a class, or interface, declared in a static import on demand statement.
For method names a process similar to that for fields is followed, but starting at step 3, searching for methods in the current class or interface. There are special rules for determining how members of a class are accessed, as you'll see in “Member Access” on page 223.
The order of searching determines which declaration will be found. This implies that names declared in outer scopes can be hidden by names declared in inner scopes. And that means, for example, that local variable names can hide class member names, that nested class members can hide enclosing instance members, and that locally declared class members can hide inherited class members—as you have already seen.[9]
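A small sketch can make the search order concrete. The classes Outer and Inner and their members are hypothetical names for this example; it shows a local variable being found before a field, a field of a nested class being found before a field of the enclosing class, and the search falling through to the enclosing class when nothing closer matches.

```java
// Hypothetical classes for illustration only; not taken from the text.
class Outer {
    String name = "Outer field";
    static String kind = "Outer static field";

    class Inner {
        String name = "Inner field";            // hides Outer's name inside Inner

        String lookup(boolean useLocal) {
            if (useLocal) {
                String name = "local variable"; // found first: the local scope
                return name;
            }
            return name;                        // no local in scope: the field of Inner
        }

        String outerName() {
            return Outer.this.name;             // qualified access reaches the hidden field
        }

        String enclosingKind() {
            return kind;                        // not declared in Inner: found in Outer
        }
    }
}
```

The hidden field of Outer is still reachable, but only through the qualified form Outer.this.name; the simple name always resolves to the innermost declaration.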
Hiding is generally bad style because a human reading the code must check all levels of the hierarchy to determine which variable is being used. Yet hiding is permitted in order to make local code robust. If hiding outer variables were not allowed, adding a new field to a class or interface could break existing code in subtypes that used variables of the same name. Scoping is meant as protection for the system as a whole rather than as support for reusing identifier names.
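To see the robustness argument in code, consider the following sketch. The classes Container and Grid are hypothetical; assume that Grid was written first and that the field size was added to Container in a later release.

```java
// Hypothetical classes for illustration only; not taken from the text.
class Container {
    // Assumption for this sketch: this field was added after Grid was written.
    protected int size = 10;
}

class Grid extends Container {
    int area(int width) {
        int size = width * width;   // local variable hides the inherited field
        return size;                // still refers to the local, as it always did
    }
}
```

Because the local variable is found before the inherited field, Grid compiles and behaves the same way both before and after the field is added to Container.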
To avoid confusion, hiding is not permitted in nested scopes within a code block. This means that a local variable in a method cannot have the same name as a parameter of that method; that a for loop variable cannot have the same name as a local variable or parameter; and that once there is a local variable called, say, über, you cannot create a new, different variable with the name über in a nested block.
{
    int über = 0;
    {
        int über = 2; // INVALID: already defined
        // ...
    }
}
However, you can have different (non-nested) for loops in the same block, or different (non-nested) blocks in the same method, that do declare variables with the same name.
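By contrast with the invalid nested declaration, the permitted case looks like the following sketch. The class SiblingScopes and its method are hypothetical names for this example.

```java
// Hypothetical class for illustration only; not taken from the text.
class SiblingScopes {
    static int combine(int[] a, int[] b) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {    // first loop declares i
            total += a[i];
        }
        for (int i = 0; i < b.length; i++) {    // a sibling loop may declare i again
            total += b[i];
        }
        {
            int tmp = total;                    // tmp in the first block
            total = tmp * 2;
        }
        {
            int tmp = total;                    // a different tmp in a sibling block
            total = tmp + 1;
        }
        return total;
    }
}
```

Each i and each tmp goes out of scope before the next one is declared, so no declaration hides another.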
If a name appears in a place where a type name is expected, then the different type scopes must be searched for that name. Type scopes are defined by packages. The search order is as follows:
Again, hiding of type names is possible, but a type can always be explicitly referred to by its fully qualified name, which includes package information, such as java.lang.String. Packages and type imports are discussed in Chapter 18.
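The following sketch shows a hidden type name being recovered through its fully qualified name. The class TypeHiding and its nested class String are hypothetical and exist only for this example; declaring your own type called String is not something we recommend.

```java
// Hypothetical classes for illustration only; not taken from the text.
class TypeHiding {
    // A nested type whose simple name hides java.lang.String inside TypeHiding.
    static class String {
        final int length;

        String(int length) {
            this.length = length;
        }
    }

    static java.lang.String describe() {
        String mine = new String(3);        // simple name: the nested type
        java.lang.String real = "abc";      // fully qualified name: the standard class
        return real + ":" + mine.length;
    }
}
```

Inside TypeHiding the simple name String refers to the nested class, so the standard class has to be written as java.lang.String. Outside TypeHiding the simple name has its usual meaning again.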
In order to make an apple pie from scratch, you must first create the universe.
--Carl Sagan, Cosmos
[1] The Java programming language tracks the Unicode standard. See “Further Reading” on page 755 for reference information. The currently supported Unicode version is listed in the documentation of the Character class.
[2] There is a good reason to allow multiple u's. When translating a Unicode file into an ASCII file, you must translate Unicode characters that are outside the ASCII range into an escape sequence. Thus, you would translate இ into \u0b87. When translating back, you make the reverse substitution. But what if the original Unicode source had not contained இ but had used \u0b87 instead? Then the reverse translation would not result in the original source (to the parser, it would be equivalent, but possibly not to the reader of the code). The solution is to have the translator add an extra u when it encounters an existing \uxxxx, and have the reverse translator remove a u and, if there aren't any left, replace the escape sequence with its equivalent Unicode character.
[3] These characters are , , , , , and , respectively.
[4] These are the word “cat” or “kitty” in English, Serbo-Croatian, Russian, Persian, Tamil, and Japanese, respectively.
[5] One is a Cyrillic letter, the other is ASCII. Guess which is which and win a prize.
[6] Type variables are not storage locations and are excluded from this discussion. They apply only to generic type declarations and are discussed in Chapter 11.
[7] In technical terms there is a concept of a variable being “definitely assigned.” The compiler won't allow the use of a local variable unless it can determine that it has been definitely assigned a value.
[8] The range check can often be optimized away when, for example, it can be proved that a loop index variable is always within range, but you are guaranteed that an index will never be used if it is out of range.
[9] Technically, the term hiding is reserved for this last case—when an inherited member is hidden by a locally declared member—and the other situations are referred to as shadowing. This distinction is not significant for this book so we simply refer to “hiding.”