This chapter is a terse but comprehensive introduction to Java syntax. It is written primarily for readers who are new to the language but have some previous programming experience. Determined novices with no prior programming experience may also find it useful. If you already know Java, you should find it a useful language reference. The chapter includes some comparisons of Java to C and C++ for the benefit of programmers coming from those languages.
This chapter documents the syntax of Java programs by starting at the very lowest level of Java syntax and building from there, covering increasingly higher orders of structure. It covers:
The characters used to write Java programs and the encoding of those characters.
Literal values, identifiers, and other tokens that comprise a Java program.
The data types that Java can manipulate.
The operators used in Java to group individual tokens into larger expressions.
Statements, which group expressions and other statements to form logical chunks of Java code.
Methods, which are named collections of Java statements that can be invoked by other Java code.
Classes, which are collections of methods and fields. Classes are the central program element in Java and form the basis for object-oriented programming. Chapter 3 is devoted entirely to a discussion of classes and objects.
Packages, which are collections of related classes.
Java programs, which consist of one or more interacting classes that may be drawn from one or more packages.
The syntax of most programming languages is complex, and Java is no exception. In general, it is not possible to document all elements of a language without referring to other elements that have not yet been discussed. For example, it is not really possible to explain in a meaningful way the operators and statements supported by Java without referring to objects. But it is also not possible to document objects thoroughly without referring to the operators and statements of the language. The process of learning Java, or any language, is therefore an iterative one.
Before we begin our bottom-up exploration of Java syntax, let’s take a moment for a top-down overview of a Java program. Java programs consist of one or more files, or compilation units, of Java source code. Near the end of the chapter, we describe the structure of a Java file and explain how to compile and run a Java program. Each compilation unit begins with an optional package declaration followed by zero or more import declarations. These declarations specify the namespace within which the compilation unit will define names, and the namespaces from which the compilation unit imports names. We’ll see package and import again later in this chapter in “Packages and the Java Namespace”.
The optional package and import declarations are followed by zero or more reference type definitions. We will meet the full variety of possible reference types in Chapters 3 and 4, but for now, we should note that these are most often either class or interface definitions.
Within the definition of a reference type, we will encounter members such as fields, methods, and constructors. Methods are the most important kind of member. Methods are blocks of Java code composed of statements.
With these basic terms defined, let’s start by approaching a Java program from the bottom up by examining the basic units of syntax—often referred to as lexical tokens.
This section explains the lexical structure of a Java program. It starts with a discussion of the Unicode character set in which Java programs are written. It then covers the tokens that comprise a Java program, explaining comments, identifiers, reserved words, literals, and so on.
Java programs are written using Unicode. You can use Unicode characters anywhere in a Java program, including comments and identifiers such as variable names. Unlike the 7-bit ASCII character set, which is useful only for English, and the 8-bit ISO Latin-1 character set, which is useful only for major Western European languages, the Unicode character set can represent virtually every written language in common use on the planet.
You can also write any Unicode character using only ASCII characters, by means of the special Unicode escape sequence \uxxxx, in other words, a backslash and a lowercase u, followed by four hexadecimal characters. For example, \u0020 is the space character, and \u03c0 is the character π.
Java has invested a large amount of time and engineering effort in ensuring that its Unicode support is first-class. If your business application needs to deal with global users, especially in non-Western markets, then the Java platform is a great choice.
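Unicode escapes are translated before the compiler splits the source into tokens, so they work inside character literals and even inside identifiers. The following is a minimal sketch (the class and variable names are hypothetical) that uses only ASCII in the file, so it compiles regardless of the source encoding:

```java
public class UnicodeEscapeDemo {
    // Both literals are written with Unicode escapes; the compiler sees the actual characters.
    static final char SPACE = '\u0020'; // the space character
    static final char PI = '\u03c0';    // Greek small letter pi

    public static void main(String[] args) {
        // An escape may also appear in an identifier: this variable's name starts with pi.
        double \u03c0Approx = 3.14159;

        System.out.println(SPACE == ' ');        // true
        System.out.println(PI == (char) 0x03C0); // true
        System.out.println(\u03c0Approx);        // 3.14159
    }
}
```

In practice, escapes are mostly useful for characters that are hard to type or display. A file saved in UTF-8 can simply contain the characters directly.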
Java is a case-sensitive language. Its keywords are written in lowercase and must always be used that way. That is, While and WHILE are not the same as the while keyword. Similarly, if you declare a variable named i in your program, you may not refer to it as I.
Java ignores spaces, tabs, newlines, and other whitespace, except when it appears within quoted characters and string literals. Programmers typically use whitespace to format and indent their code for easy readability, and you will see common indentation conventions in the code examples of this book.
Comments are natural-language text intended for human readers of a program. They are ignored by the Java compiler. Java supports three types of comments. The first type is a single-line comment, which begins with the characters // and continues until the end of the current line. For example:
int i = 0;   // Initialize the loop variable
The second kind of comment is a multiline comment. It begins with the characters /* and continues, over any number of lines, until the characters */. Any text between the /* and the */ is ignored by javac. Although this style of comment is typically used for multiline comments, it can also be used for single-line comments. This type of comment cannot be nested (i.e., one /* */ comment cannot appear within another). When writing multiline comments, programmers often use extra * characters to make the comments stand out. Here is a typical multiline comment:
/*
* First, establish a connection to the server.
* If the connection attempt fails, quit right away.
*/
The third type of comment is a special case of the second. If a comment begins with /**, it is regarded as a special doc comment. Like regular multiline comments, doc comments end with */ and cannot be nested. When you write a Java class you expect other programmers to use, use doc comments to embed documentation about the class and each of its methods directly into the source code. A program named javadoc extracts these comments and processes them to create online documentation for your class. A doc comment can contain HTML tags and can use additional syntax understood by javadoc. For example:
/**
* Upload a file to a web server.
*
* @param file The file to upload.
* @return <code>true</code> on success,
* <code>false</code> on failure.
* @author David Flanagan
*/
See Chapter 7 for more information on the doc comment syntax and Chapter 13 for more information on the javadoc program.
Comments may appear between any tokens of a Java program, but may not appear within a token. In particular, comments may not appear within double-quoted string literals. A comment within a string literal simply becomes a literal part of that string.
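A small sketch of both rules (the class name is hypothetical): a comment placed between two tokens is discarded, while comment-like characters inside a string literal are kept as ordinary text.

```java
public class CommentsInStrings {
    // The slash-star characters here are part of the string, not the start of a comment.
    static final String TEXT = "/* not a comment */";

    public static void main(String[] args) {
        int answer = /* a comment between two tokens is fine */ 42;
        System.out.println(TEXT);          // prints the 19 characters of the literal unchanged
        System.out.println(TEXT.length()); // 19
        System.out.println(answer);        // 42
    }
}
```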
The following words are reserved in Java (they are part of the syntax of the language and may not be used to name variables, classes, and so forth):
abstract assert boolean break byte case catch char class const continue default do double else enum extends false final finally float for goto if implements import instanceof int interface long native new null package private protected public return short static strictfp super switch synchronized this throw throws transient true try void volatile while
We’ll meet each of these reserved words again later in this book. Some of them are the names of primitive types and others are the names of Java statements, both of which are discussed later in this chapter. Still others are used to define classes and their members (see Chapter 3).
Note that const and goto are reserved but aren’t actually used in the language, and that interface has an additional variant form, @interface, which is used when defining types known as annotations. Some of the reserved words (notably final and default) have a variety of different meanings depending on context.
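If you need to test a word against this list in a program, the JDK's javax.lang.model.SourceVersion class offers isKeyword(CharSequence), which reports keywords and the literals true, false, and null for the latest source version the running JDK knows about. A minimal sketch, assuming a full JDK (the class and helper names are hypothetical):

```java
import javax.lang.model.SourceVersion;

public class ReservedWords {
    // Delegates to the JDK's own list of keywords and literals for the latest source version.
    static boolean isReserved(String word) {
        return SourceVersion.isKeyword(word);
    }

    public static void main(String[] args) {
        String[] words = {"while", "While", "goto", "const", "null", "myVariable"};
        for (String w : words) {
            System.out.println(w + " -> " + isReserved(w));
        }
        // while -> true, While -> false, goto -> true, const -> true, null -> true, myVariable -> false
    }
}
```

Because the check tracks the latest source version, newer JDKs may report a few words beyond those listed above.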
An identifier is simply a name given to some part of a Java program, such as a class, a method within a class, or a variable declared within a method. Identifiers may be of any length and may contain letters and digits drawn from the entire Unicode character set. An identifier may not begin with a digit. In general, identifiers may not contain punctuation characters. Exceptions include the ASCII underscore (_) and dollar sign ($) as well as other Unicode currency symbols such as £ and ¥.
Currency symbols are intended for use in automatically generated code, such as the synthetic names produced by javac. By avoiding the use of currency symbols in your own identifiers, you don’t have to worry about collisions with automatically generated identifiers.
Formally, the characters allowed at the beginning of and within an identifier are defined by the methods isJavaIdentifierStart() and isJavaIdentifierPart() of the class java.lang.Character.
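Those two Character methods are enough to sketch an identifier checker. This is an illustration under stated assumptions (the class and method names are hypothetical): it walks the string by code point so that characters outside the 16-bit range are handled, and it deliberately does not reject reserved words such as class.

```java
public class IdentifierCheck {
    // True if s is non-empty, its first code point may start an identifier,
    // and every later code point may appear within one.
    // NOTE: reserved words (class, while, ...) are not rejected here.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int k = Character.charCount(first); k < s.length(); ) {
            int cp = s.codePointAt(k);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            k += Character.charCount(cp); // advance by one or two chars
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"i", "x1", "theCurrentTime", "the_current_time", "$price", "1x", "a-b"};
        for (String c : candidates) {
            System.out.println(c + " -> " + isJavaIdentifier(c));
        }
    }
}
```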
The following are examples of legal identifiers:
i
x1
theCurrentTime
the_current_time
獺
Note in particular the example of a non-ASCII identifier, 獺. This is the Kanji character for “otter” and is perfectly legal as a Java identifier. The use of non-ASCII identifiers is unusual in programs written predominantly by Westerners, but is sometimes seen.
Literals are values that appear directly in Java source code. They include integer and floating-point numbers, single characters within single quotes, strings of characters within double quotes, and the reserved words true, false, and null. For example, the following are all literals:
1
1.0
'1'
"one"
true
false
null
The syntax for expressing numeric, character, and string literals is detailed in “Primitive Data Types”.
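To see which type each of the literals above has, one option is to let Java box them into objects and ask for their runtime classes. A minimal sketch (the class name is hypothetical); the boxing exists only for inspection, since 1, 1.0, '1', and true themselves have the primitive types int, double, char, and boolean:

```java
public class LiteralTypes {
    // Each literal is autoboxed into an Object so we can ask for its runtime class.
    static Class<?>[] literalClasses() {
        Object[] literals = {1, 1.0, '1', "one", true, false};
        Class<?>[] classes = new Class<?>[literals.length];
        for (int k = 0; k < literals.length; k++) {
            classes[k] = literals[k].getClass();
        }
        return classes;
    }

    public static void main(String[] args) {
        for (Class<?> c : literalClasses()) {
            System.out.println(c.getSimpleName()); // Integer, Double, Character, String, Boolean, Boolean
        }
        Object nothing = null;               // null has no class; it can be assigned to any reference type
        System.out.println(nothing == null); // true
    }
}
```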
Java also uses a number of punctuation characters as tokens. The Java Language Specification divides these characters (somewhat arbitrarily) into two categories, separators and operators. The twelve separators are:
( ) { } [ ] ... @ :: ; , .
The operators are:
+ - * / % & | ^ << >> >>>
+= -= *= /= %= &= |= ^= <<= >>= >>>=
= == != < <= > >=
! ~ && || ++ -- ? : ->
We’ll see separators throughout the book, and will cover each operator individually in “Expressions and Operators”.
Java supports eight basic data types known as primitive types as described in Table 2-1. The primitive types include a Boolean type, a character type, four integer types, and two floating-point types. The four integer types and the two floating-point types differ in the number of bits that represent them and therefore in the range of numbers they can represent.
Type | Contains | Default | Size | Range |
---|---|---|---|---|
boolean | true or false | false | 1 bit | NA |
char | Unicode character | \u0000 | 16 bits | \u0000 to \uFFFF |
byte | Signed integer | 0 | 8 bits | -128 to 127 |
short | Signed integer | 0 | 16 bits | -32768 to 32767 |
int | Signed integer | 0 | 32 bits | -2147483648 to 2147483647 |
long | Signed integer | 0 | 64 bits | -9223372036854775808 to 9223372036854775807 |
float | IEEE 754 floating point | 0.0 | 32 bits | 1.4E-45 to 3.4028235E+38 |
double | IEEE 754 floating point | 0.0 | 64 bits | 4.9E-324 to 1.7976931348623157E+308 |
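The sizes, ranges, and defaults in Table 2-1 can be checked against the constants in the wrapper classes (Byte, Short, Integer, Long, Float, Double). The sketch below (class and field names are hypothetical) relies on the rule that fields which are never assigned receive their default values. Note that for float and double, MIN_VALUE is the smallest positive value, which is why the table lists positive magnitudes; negative values mirror them.

```java
public class PrimitiveTypes {
    // Static fields that are never assigned take the default values from the table.
    static boolean defaultBoolean;
    static char defaultChar;
    static int defaultInt;
    static double defaultDouble;

    public static void main(String[] args) {
        System.out.println(defaultBoolean);    // false
        System.out.println((int) defaultChar); // 0
        System.out.println(defaultInt);        // 0
        System.out.println(defaultDouble);     // 0.0

        // Size in bits, then minimum and maximum, taken from the wrapper classes
        System.out.println(Byte.SIZE + " " + Byte.MIN_VALUE + " " + Byte.MAX_VALUE);          // 8 -128 127
        System.out.println(Short.SIZE + " " + Short.MIN_VALUE + " " + Short.MAX_VALUE);       // 16 -32768 32767
        System.out.println(Integer.SIZE + " " + Integer.MIN_VALUE + " " + Integer.MAX_VALUE); // 32 -2147483648 2147483647
        System.out.println(Long.SIZE + " " + Long.MIN_VALUE + " " + Long.MAX_VALUE);          // 64 -9223372036854775808 9223372036854775807
        // For float and double, MIN_VALUE is the smallest positive value, not the most negative one
        System.out.println(Float.SIZE + " " + Float.MIN_VALUE + " " + Float.MAX_VALUE);       // 32 1.4E-45 3.4028235E38
        System.out.println(Double.SIZE + " " + Double.MIN_VALUE + " " + Double.MAX_VALUE);    // 64 4.9E-324 1.7976931348623157E308
    }
}
```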
The next section summarizes these primitive data types. In addition to these primitive types, Java supports nonprimitive data types known as reference types, which are introduced in “Reference Types”.
The boolean type represents truth values. This type has only two possible values, representing the two Boolean states: on or off, yes or no, true or false. Java reserves the words true and false to represent these two Boolean values.
Programmers coming to Java from other languages (especially JavaScript) should note that Java is much stricter about its Boolean values than other languages: in particular, a boolean is neither an integral nor an object type, and incompatible values cannot be used in place of a boolean. In other words, you cannot take shortcuts such as the following in Java:
Object o = new Object();
int i = 1;
if (o) {
    while (i) {
        //...
    }
}
Instead, Java forces you to write cleaner code by explicitly stating the comparisons you want:
if (o != null) {
    while (i != 0) {
        // ...
    }
}
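A runnable sketch of the same idea (the class and method names are hypothetical): every condition is an explicit comparison that yields a boolean, and converting between int and boolean is also spelled out, since Java performs neither conversion automatically.

```java
public class ExplicitConditions {
    // The condition must be a boolean expression; o or i on their own would not compile here.
    static boolean isUsable(Object o, int i) {
        return o != null && i != 0;
    }

    public static void main(String[] args) {
        int i = 1;
        boolean flag = (i != 0); // explicit int-to-boolean conversion
        int bit = flag ? 1 : 0;  // explicit boolean-to-int conversion

        System.out.println(isUsable(new Object(), i)); // true
        System.out.println(isUsable(null, i));         // false
        System.out.println(bit);                       // 1
    }
}
```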
The char type represents Unicode characters. Java’s approach to representing characters is somewhat unusual: javac accepts source code (including identifiers) encoded as UTF-8, a variable-width encoding, but internally represents each char as a fixed-width value that is 16 bits wide.
These distinctions do not normally need to concern the developer, however. In most cases, all you need to remember is that to include a character literal in a Java program, you simply place it between single quotes (apostrophes):
char c = 'A';
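The fixed 16-bit width has one visible consequence: a char can hold values only up to 0xFFFF, so a code point beyond that (outside the Basic Multilingual Plane) occupies two char values, a surrogate pair. A minimal sketch (the class and helper names are hypothetical; U+1F600 is just an arbitrary example of such a code point):

```java
public class CharWidth {
    // Number of 16-bit char values needed to store one Unicode code point.
    static int utf16Length(int codePoint) {
        return new String(Character.toChars(codePoint)).length();
    }

    public static void main(String[] args) {
        char c = 'A';
        System.out.println((int) c);                   // 65
        System.out.println(Character.SIZE);            // 16
        System.out.println((int) Character.MAX_VALUE); // 65535, the largest char value
        System.out.println(utf16Length(0x03C0));       // 1 (pi fits in a single char)
        System.out.println(utf16Length(0x1F600));      // 2 (needs a surrogate pair)
    }
}
```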
You can, of course, use any Unicode character as a character literal, and you can use the \u Unicode escape sequence. In addition, Java supports a number of other escape sequences that make it easy both to represent commonly used nonprinting ASCII characters such as newline and to escape certain punctuation characters that have special meaning in Java. For example:
char tab = '\t', nul = '\000', aleph = '\u05D0', slash = '\\';
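One way to confirm which character code each escape denotes is to cast the char to int. This sketch (class and constant names are hypothetical) declares a tab, a NUL written as an octal escape, the Hebrew letter aleph (code 0x05D0), and a backslash, plus two other common escapes:

```java
public class EscapeDemo {
    // Tab, NUL (octal escape), Hebrew aleph (Unicode escape), and backslash
    static final char TAB = '\t', NUL = '\000', ALEPH = '\u05D0', SLASH = '\\';
    // Two more common escapes: newline and single quote
    static final char NEWLINE = '\n', QUOTE = '\'';

    public static void main(String[] args) {
        System.out.println((int) TAB);     // 9
        System.out.println((int) NUL);     // 0
        System.out.println((int) ALEPH);   // 1488
        System.out.println((int) SLASH);   // 92
        System.out.println((int) NEWLINE); // 10
        System.out.println((int) QUOTE);   // 39

        // The same escapes also work inside string literals.
        System.out.println("col1\tcol2 \"quoted\" backslash: \\");
    }
}
```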
Table 2-2 lists the escape characters that can be used in char
literals. These characters can also be used in string literals, which are covered in the next section.