Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.

Notice: This material is excerpted from Special Edition Using Java, ISBN: 0-7897-0604-0. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.

Chapter 8 - Tokens in Depth

by Jay Cross

Tokens are to computer language as words and punctuation are to human language. William of Ockham (a noted 14th-century scholar famous for his support of simplicity) went to great lengths in his Summa Logicae to describe his theory of terms. While it is beyond the scope of this book to explain Ockham fully, the sense of it is that (for example) the word "chair" is not a chair, but rather a symbol: a reader or listener conjures up the thought of a chair when he reads or hears the word.

Using the same analogy, tokens are terms in source languages for computers. If a programmer declares a token "counter" to represent a short integer (a sixteen bit number described later in this chapter), then the compiler recognizes the token "counter" every time it is used in that context as referring to a specific 16 bits of memory somewhere. Any operations performed on "counter" are done with the value contained in those 16 bits; not with the token (the characters c, o, u, n, t, e, r), but with what that token represents to the compiler.

To accurately describe a task to a compiler, a description language needs to have a strict and unambiguous grammar structure. Java's grammar is fairly simple and elegant. You can begin understanding Java by learning about the tokens from which the more complex forms of expression are composed. These include keywords, identifiers, literals, separators, and operators. A Java program may also contain white space and comments that have no meaning to the compiler but are permitted for the sake of making the code's meaning clear to human readers, especially its author(s).

In this chapter you will learn:

What the reserved words in the Java language are.
How to create user-defined names and labels.
The many ways to express values for constants.
What the limits are on the ranges of the types of numbers that may be expressed.
What separators are.
What the various operators do.
Why comments and white space are important, even though the compiler ignores them.

Keywords

There are certain sequences of characters that have special meaning in Java; these sequences are called keywords. Some of them are like verbs, some like adjectives, some like pronouns. Some of them are tokens that are saved for later versions of the language, and one, goto, is a vile oath from ancient procedural tongues that may never be uttered in polite Java.

The following is a list of the 56 keywords you can use in Java. When you know the meanings of all these terms, you will be well on your way to being a Java programmer.

Table 9.1 The 56 Keywords Used in Java

abstract	boolean	break	byte
case	cast	catch	char
class	const	continue	default
do	double	else	extends
final	finally	float	for
future	generic	goto	if
implements	import	inner	instanceof
int	interface	long	native
new	null	operator	outer
package	private	protected	public
rest	return	short	static
super	switch	synchronized	this
throw	throws	transient	try
var	void	volatile	while

The keywords byvalue, cast, const, future, generic, goto, inner, operator, outer, rest, and var are reserved, but have no meaning in Java 1.0. Programmers experienced with other languages such as C, C++, Pascal, or SQL may know what these terms might eventually be used for. For the time being, you won't use these terms, and Java is much simpler and easier to maintain without them.

The tokens true and false are not on this list; technically, they are literal values for boolean variables or constants (boolean and other literals are described in the section on literals later in this chapter). As such, programmers should refrain from using them as identifiers (user defined names or labels).

Because these terms have specific meaning in Java, you can't use them as identifiers for something else, such as variables, constants, class names, and so on. However, they can be used as part of a longer token, for example:

public int abstract_int;

Also, because Java is case sensitive, a programmer who is bent on using one of these words as an identifier of some sort can use an initial uppercase letter. While this is possible, it is a very bad idea in terms of human readability, and it results in wasted man-hours when the code must be cleaned up later. For example:

public short Long;

It can be done, but for the sake of clarity and mankind's future condition, please don't do it.

There are numerous classes defined in the standard packages. While their names are not keywords, reusing these names for your own identifiers may make your meaning unclear to future people working on your application or applet.

Identifiers

Identifiers are terms chosen by the programmer that become tokens representing variables, constants, classes, objects, labels (which are like nouns), and methods (which are like verbs). As noted in the previous section, identifiers cannot be identical to Java keywords.

Identifiers in Java are a sequence of Unicode letters and digits of unlimited length. (Actually, the length may be limited by the maximum file size on the applet or application developer's system. Practically, this would limit an identifier to being less than two billion characters.) The first character of an identifier must be a letter. All subsequent characters must be letters or numerals. They do not need to be Latin letters or digits; they could be from any alphabet that Unicode supports, such as Arabic-Indic, Devanagari, Bengali, Tamil, Thai, or many others. For various historical and practical considerations, the underscore (_) and the dollar sign ($) are considered letters and may be used as any character in an identifier, including the first one.

Two tokens are the same identifier only if they are of equal length and if each character in the first token is exactly the same as its counterpart in the second token. This is case-sensitive and language-sensitive. This means that Latin letters are different from matching Greek letters, and letters with accents are different from letters without.

Most application developers are forever walking the line of compromise between choosing identifiers that are short enough to be quickly and easily typed without error and those that are long enough to be descriptive and easily read. Either way, in a large application it is useful to choose a naming convention that reduces the likelihood of accidental reuse of a particular identifier.

Legal identifiers	Not legal identifiers
HelloWorld	9HelloWorld
counter	count&add
HotJava$	Hot Java
ioc_Queue3	65536
ErnestLawrenceThayersFamousPoemOfJune1888	non-plussed

Table 9.2 Examples of Legal and Illegal Identifiers

In the illegal examples above, the first is forbidden because it begins with a numeral. The second has an illegal character (&) in it. The third also has an inappropriate character: the blank space. The fourth is a literal number (2 to the 16th power) and cannot be used as an identifier. The last one contains yet another bad character, the hyphen or minus sign. Java would try to treat this last case as an expression containing two identifiers and an operation to be performed on them.
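The spelling rules above can be checked mechanically. The following is a minimal sketch that relies on the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods; the class name IdentifierCheck and the helper isLegalIdentifier are invented for this illustration, and the helper deliberately ignores the rule that an identifier may not be a keyword.

```java
class IdentifierCheck {
    // Hypothetical helper: true if s is spelled like a legal identifier.
    // It does NOT reject keywords such as "int" or "goto".
    static boolean isLegalIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The examples from Table 9.2, legal ones first.
        String[] samples = { "HelloWorld", "counter", "HotJava$", "ioc_Queue3",
                             "9HelloWorld", "count&add", "Hot Java",
                             "65536", "non-plussed" };
        for (String s : samples) {
            System.out.println(s + " : " + isLegalIdentifier(s));
        }
    }
}
```

Run against the examples in the table, the helper should accept the first four samples and reject the remaining five.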

Literals

Literals are tokens representing values to be stored in bytes, shorts, ints, longs, floats, doubles, booleans, and chars. In addition, literals are used to represent values to be stored in string types. The following statements contain literals:

int j=0;
long GrainOfSandOnTheBeachNum=1L;
short Mask1=0x007f;
static String FirstName = "Ernest";
static char TibetanNine = '\u0f29';
boolean UniverseWillExpandForever = true;

Clearly, there are several types of literals. In fact, the Java Language Specification gives five major types of literals, some of which have subtypes. The five major types are:

Boolean literals
Character literals
Floating point literals
Integer literals
String literals

The following sections of this chapter give more information about the different types of literals.

Boolean Literals

There are two boolean literals: true and false. There is no null value, and there is no numeric equivalent.

Character Literals

Character literals are enclosed in single quotes. This is true whether the character value is Latin alphanumeric, an escape sequence, or any other Unicode character. A single character may be any printable character except the single quote (') or the backslash (\), which must be written as escape sequences. Some examples of these literals are 'a', 'A', '9', '+', '_', and '~'.

The escape sequence character literals are of the form '\b'. That is, within single quotes, a backslash is followed by one of the following:

another character (b, t, n, f, r, ", ', or \)
a series of one to three octal digits
a u followed by four hexadecimal digits expressing a non-line-terminating Unicode character

The escapes in the first bulleted item above are probably familiar to C and C++ programmers, and anyone else should quickly recognize that the following characters need a special representation:

Escape Literal	Meaning
'\b'	\u0008 backspace
'\t'	\u0009 horizontal tab
'\n'	\u000a linefeed
'\f'	\u000c form feed
'\r'	\u000d carriage return
'\"'	\u0022 double quote
'\''	\u0027 single quote
'\\'	\u005c backslash

Character literals mentioned in the second bulleted item above are called octal escape literals. They can be used to represent any Unicode value from '\u0000' to '\u00ff' (the Latin-1 range, whose lower half is traditional ASCII). In octal (base 8), these values are from \000 to \377. Note that octal digits run from 0 to 7 inclusive. Some examples of these octal literals are:

Octal Literal	Meaning
'\007'	\u0007 bell
'\101'	\u0041 'A'
'\141'	\u0061 'a'
'\071'	\u0039 '9'
'\042'	\u0022 double quote

Character literals of the type in the last bulleted item above are translated very early by javac, before the source text is divided into tokens. As a result, using a Unicode escape to express a line termination character such as carriage return or linefeed results in an end-of-line appearing before the closing single quote mark. The result is a compile-time error. Examples of this type of escape appear as the first six characters of each entry under the "Meaning" heading in the two tables above.

Don't use the \u format to express an end-of-line character. Use the \n or \r escape sequences instead.
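As a small illustration of the three escape forms, the sketch below writes the same character in several ways and compares the results. The class name CharLiteralDemo is made up for this example.

```java
class CharLiteralDemo {
    public static void main(String[] args) {
        char plain = 'A';         // ordinary single character
        char octal = '\101';      // octal escape: 1*64 + 0*8 + 1 = 65
        char unicode = '\u0041';  // Unicode escape: hexadecimal 41 = 65
        char newline = '\n';      // linefeed, written with a simple escape

        System.out.println(plain == octal);    // true
        System.out.println(plain == unicode);  // true
        System.out.println((int) newline);     // 10
        System.out.println((int) '\\');        // 92, the backslash itself
        System.out.println((int) '\'');        // 39, the single quote
    }
}
```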

Floating Point Literals

Floating point literals have several parts. They appear in the following order:

Part	Is It Required?	Examples
Whole Number Part	Not if fractional part is present.	0, 1, 2, ..., 9, 12345
Decimal Point	Not if exponent is present. Must be there if there is a fractional part.	.
Fractional Part	Can't be present if there is no decimal point. Must be there if there is no whole number part.	0, 1, 14159, 718281828, 41421, 9944
Exponent	Only if there is no decimal point.	e23, E-19, E6, e+307, e-1
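The combinations in the table can be tried out directly. This is only a sketch; the class name FloatLiteralDemo and the variable names are invented for the illustration.

```java
class FloatLiteralDemo {
    public static void main(String[] args) {
        double a = 3.14159;     // whole part, decimal point, fractional part
        double b = .718281828;  // no whole part, so the fraction is required
        double c = 12345.;      // whole part and decimal point, no fraction
        double d = 6E6;         // no decimal point, so the exponent is required
        double e = 1.6E-19;     // decimal point and exponent together

        System.out.println(a);
        System.out.println(b);
        System.out.println(c);
        System.out.println(d);
        System.out.println(e);
    }
}
```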

Separators

Separators are single-character tokens, which (as their name implies) are found between other tokens. There are nine separators, which are loosely described below:

( Used both to open a parameter list for a method and to establish a precedence for operations in an expression.
) Used both to close a parameter list for a method and to establish a precedence for operations in an expression.
{ Used to begin a block of statements, or an initialization list.
} Used to close a block of statements, or an initialization list.
[ Precedes an expression used as an array index.
] Follows an expression used as an array index.
; Used both to end an expression statement and to separate the parts of a 'for' statement.
, Used as a list delimiter in many contexts.
. Used both as a decimal point and to separate such things as package name from class name from method or variable name.

Operators

Operators express which operation is to be performed on a given value or values. Here they are described in several related categories.

There are 37 character sequences that are tokens used as operators. (C and C++ users will find most of them very familiar.) There are the five arithmetic operators (+, -, *, /, %), six assignment operators (=, +=, *=, -=, /=, %=), a decrement operator (--), an increment operator (++), four bitwise arithmetic operators (&, |, ^, ~), three bitwise shifting operators (<<, >>, >>>), six bitwise assignment operators (&=, |=, ^=, <<=, >>=, >>>=), six comparison operators (==, !=, <, >, <=, >=), three logical comparison operators (&&, ||, !), and two that act as an if-then-else when used together (?, :).

Arithmetic Operators

The arithmetic operators take two values, integer or floating point, and return a third value whose type can be determined as follows: two integer types (byte, short, int, or long) produce an int or a long (a long if and only if at least one of the operands is a long; a result too large for the type wraps around rather than being widened). Two floating point types produce a floating point type (if either is a double, they produce a double). An integer and a floating point produce a floating point result. Note that the plus-sign operator also acts as the string concatenation operator.

+ addition operator
- subtraction operator
* multiplication operator
/ division operator
% modulus operator (Gives the remainder of a division.)

The following code fragment shows these operators in an integer context. The use of the operators is syntactically the same for floating point numbers.

Listing 9.1  Examples Using Arithmetic Operators
byte j = 60;                   // set the byte j's value to 60
short k = 24;
int l = 30;
long m = 12L;
long result = 0L;

result = j + k;                   // result gets 84: (60 plus 24)
result = result / m;              // result gets 7: (84 divided by 12)
result = j - (2*k + result);      // result gets 5: (60 minus (48 plus 7))
result = k % result;              // result gets 4: (remainder 24 div by 5)
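The result-type rules can be seen in a short sketch like the following. The class and variable names are illustrative only.

```java
class ArithmeticTypes {
    public static void main(String[] args) {
        byte j = 60;
        short k = 24;
        int sum = j + k;           // byte + short is computed as an int
        long wide = sum + 12L;     // an int combined with a long gives a long
        double mixed = sum / 5.0;  // integer with floating point gives a double

        int big = Integer.MAX_VALUE;
        int wrapped = big + 1;     // still an int: the overflow wraps around
        long widened = big + 1L;   // a long operand makes the result a long

        System.out.println(sum);      // 84
        System.out.println(wide);     // 96
        System.out.println(mixed);    // 16.8
        System.out.println(wrapped);  // -2147483648
        System.out.println(widened);  // 2147483648
        System.out.println("sum = " + sum);  // + also concatenates strings
    }
}
```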

Arithmetic Assignment Operators

With the exception of the (direct) assignment operator (=), the arithmetic assignment operators are a little bit of a shortcut. Like the arithmetic operators above, they can be used with both integers and floating point values. With each of these operators, the result is placed in the left operand.

= assignment operator
+= add and assign operator
-= subtract and assign operator
*= multiply and assign operator
/= divide and assign operator
%= modulus and assign operator

The following code fragment shows these operators in an integer context. The use of the operators is syntactically the same for floating point numbers.

Listing 9.2  Examples Using Arithmetic Assignment Operators
byte j = 60;                // set the byte j's value to 60
short k = 24;
int l = 30;
long m = 12L;
long result = 0L;

result += j;                    // result gets 60: (0 plus 60)
result += k;                    // result gets 84: (60 plus 24)
result /= m;                    // result gets 7: (84 divided by 12)
result -= l;                    // result gets -23: (7 minus 30)
result = -result;               // result gets 23: (-(-23))
result %= m;                    // result gets 11: (remainder 23 div by 12)

Increment/Decrement Operators

The increment and decrement operators are used with one integer or floating point operand (they are unary operators). The increment operator (++) adds one to the operand. If the operator appears before the operand, the increment occurs before the value is taken for the expression. If it appears after the operand, the addition occurs after the value is taken. Similarly, the decrement operator (--) subtracts one from the operand, and whether the subtraction happens before or after the value is taken depends in the same way on whether the operator precedes or follows the operand.

++ increment operator
-- decrement operator

The following code fragment shows these operators in an integer context. The use of the operators is syntactically the same for floating point numbers.

Listing 9.3  Examples Using Increment and Decrement Operators
long counter = 1000000; // start at a million.
double fpcounter = 0;      // start with a clean slate.
double fpsum = 0;          // no additions so far.

// compute the sum of all the numbers from one to a million.
while (counter-- > 0) {    // using a million on the first iteration.
     fpsum += ++fpcounter; // fpcounter incremented before use.
}

In the above example, counter is decremented as part of the test to get out of the loop. In the first iteration, it has a value of a million when tested; in the last iteration, it starts with a value of one. Within the loop, fpcounter is incremented before adding it to the fpsum variable, so the very first iteration adds a value of 1, even though fpcounter is initialized to 0. In the last iteration, it has a value of a million. The final value of fpsum is exactly 500,000,500,000.0 (no round-off occurs, because every intermediate sum is a whole number small enough for a double to hold exactly), which would have been more easily, but less instructively, computed using n(n+1)/2.
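The sketch below wraps the loop from Listing 9.3 in a method so that its result can be compared with the closed form n(n+1)/2, and it also shows the prefix and postfix forms side by side. The names IncrementDemo and sumTo are invented for the illustration. Every partial sum here is a whole number far below 2 to the 53rd power, so a double holds each one exactly.

```java
class IncrementDemo {
    // Sums 1..n using the loop from Listing 9.3.
    static double sumTo(long n) {
        long counter = n;
        double fpcounter = 0;
        double fpsum = 0;
        while (counter-- > 0) {    // compare first, then decrement
            fpsum += ++fpcounter;  // increment first, then add
        }
        return fpsum;
    }

    public static void main(String[] args) {
        int i = 5;
        int post = i++;  // post gets 5, then i becomes 6
        int pre = ++i;   // i becomes 7, then pre gets 7
        System.out.println(post + " " + pre + " " + i);  // 5 7 7

        long n = 1000000;
        System.out.println(sumTo(n) == n * (n + 1) / 2.0);  // true
    }
}
```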

Bitwise Arithmetic Operators

Bitwise arithmetic is not complicated, but if you are unfamiliar with it, some of the ideas here may be difficult to learn from a short section in a book. If this topic is important but difficult for you, try reading the section on it in a general computer science book. For starters, let's just say that bitwise arithmetic is used for setting and testing single bits and combinations of individual bits within a variable. Generally, it is not good programming style to do this without a very good reason. Most of these reasons involve communicating with hardware devices or storing information as densely as possible. The following examples use values of type byte because they are the simplest to see. It is assumed that you understand the meaning of hexadecimal numbers. Bitwise arithmetic is defined for the four integer types and for char, but not for the floating point types. (The &, |, and ^ operators also accept boolean operands, in which case they act as logical operators that always evaluate both operands; ~ does not accept a boolean.)

& bitwise arithmetic And operator
| bitwise arithmetic Or operator
^ bitwise arithmetic Xor operator
~ bitwise arithmetic Complement operator

First, a bit of elementary computer science: Ignoring sign for a moment, a byte is composed of 8 bits. Each bit has a value of 1 or 0. You assign values to each of the bits as 128, 64, 32, 16, 8, 4, 2, 1 (that is, 2 raised to the powers 7, 6, 5, 4, 3, 2, 1, and 0). If the low seven bits are set, the byte has the value of 1+2+4+8+16+32+64 = 127. In hexadecimal, we call this 0x7f [7 is 0111 (4+2+1), and f is 1111 (8+4+2+1)].

The four bitwise arithmetic operators are called And, Or, Xor, and Complement. If you And (&) one byte with another and put the result in a third byte, the resulting byte has bits = 1 only when both of the operands had bits in that position = 1. Thus, if 0x7f (0111_1111) is Anded with 0x34 (0011_0100), the result is 0x34 because all the one bits in 0x34 were set to one in the other number. Similarly:

0x4f & 0x22 = 0x02     (0100_1111) & (0010_0010) = (0000_0010)
0x3c & 0xa5 = 0x24     (0011_1100) & (1010_0101) = (0010_0100)

As you can see, the Anding process always results in the same or fewer bits set to one.

If you Or (|) one byte with another and put the result in a third byte, the resulting byte has bits = 1 when either of the operands had a bit in that position = 1. So, using the same operands as before:

0x4f | 0x22 = 0x6f     (0100_1111) | (0010_0010) = (0110_1111)
0x3c | 0xa5 = 0xbd     (0011_1100) | (1010_0101) = (1011_1101)

The process of Oring always results in the same or more bits set to one.

If you Xor (^) (exclusive or) one byte with another and put the result in a third byte, the resulting byte has bits = 1 when the corresponding bit is set in exactly one of the two operand bytes. Thus:

0x4f ^ 0x22 = 0x6d     (0100_1111) ^ (0010_0010) = (0110_1101)
0x3c ^ 0xa5 = 0x99     (0011_1100) ^ (1010_0101) = (1001_1001)

Complementing (~) is a unary bitwise operation. When a byte is complemented, all the bits are inverted. Thus, ~(0x7f) = 0x80.
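The worked examples above can be reproduced in code. The following sketch uses int variables and masks the complement with 0xff, because Java carries out these operations on at least 32 bits; the class name BitwiseDemo is made up for the illustration.

```java
class BitwiseDemo {
    public static void main(String[] args) {
        int a = 0x4f;  // 0100_1111
        int b = 0x22;  // 0010_0010

        System.out.println(Integer.toHexString(a & b));  // 2
        System.out.println(Integer.toHexString(a | b));  // 6f
        System.out.println(Integer.toHexString(a ^ b));  // 6d

        // ~ inverts all 32 bits of an int, so mask with 0xff to see
        // only the low 8 bits discussed in the text.
        System.out.println(Integer.toHexString(~0x7f & 0xff));  // 80

        // Testing a single bit: is the 0x08 bit set in a?
        System.out.println((a & 0x08) != 0);  // true
    }
}
```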

Bitwise Shifting Operators

Bitwise shifting operations move the bits in an integer to the left or right; bits shifted off the end are discarded (they do not wrap around). The bits of the first operand are shifted by the number of positions given in the second operand. In the case of the shift left, it is always a zero that is shifted in at the right; it is the equivalent of multiplying by 2 raised to the power of the second operand. The normal shift right operator propagates the sign bit. This is like dividing by a power of 2. The shift right with zero fill shifts in a zero from the left.

<< bitwise shift left operator
>> bitwise shift right operator
>>> bitwise shift right with zero fill operator

The following examples, which treat each operand as an 8-bit quantity, may be instructive:

0x4f << 1 = 0x9e     (0100_1111) << 1 = (1001_1110)
0x3c << 2 = 0xf0     (0011_1100) << 2 = (1111_0000)
0x4f >> 1 = 0x27     (0100_1111) >> 1 = (0010_0111)
0xf0 >> 2 = 0xfc     (1111_0000) >> 2 = (1111_1100)

0x4f >>> 1 = 0x27     (0100_1111) >>> 1 = (0010_0111)
0xf0 >>> 2 = 0x3c     (1111_0000) >>> 2 = (0011_1100)
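In running code the picture is slightly different from the 8-bit examples, because Java widens a byte to a 32-bit int before shifting it. The sketch below shows the three shift operators on int values and then recovers the 8-bit results from the text by masking with 0xff. The class name ShiftDemo is illustrative.

```java
class ShiftDemo {
    public static void main(String[] args) {
        System.out.println(Integer.toHexString(0x4f << 1));  // 9e
        System.out.println(Integer.toHexString(0x4f >> 1));  // 27

        int negative = 0xfffffff0;          // -16 as a 32-bit int
        System.out.println(negative >> 2);  // -4: the sign bit is copied in
        System.out.println(Integer.toHexString(negative >>> 2));  // 3ffffffc

        // A byte is widened to int before shifting. Masking with 0xff
        // keeps only the low 8 bits, matching the 8-bit examples.
        byte b = (byte) 0xf0;
        System.out.println(Integer.toHexString((b & 0xff) >>> 2));  // 3c
        System.out.println(Integer.toHexString((b >> 2) & 0xff));   // fc
    }
}
```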

Bitwise Assignment Operators

By now it should be fairly obvious what the bitwise assignment operators do. They take a value, do the appropriate bitwise operation with the second operand, and place the result as the contents of the first operand.

&= bitwise assignment And operator
|= bitwise assignment Or operator
^= bitwise assignment Xor operator
<<= bitwise assignment shift left operator
>>= bitwise assignment shift right operator
>>>= bitwise assignment shift right with zero fill operator

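A common use of the bitwise assignment operators is maintaining a set of flag bits in one variable. The sketch below is only an illustration: the flag names READ, WRITE, and EXECUTE and the class FlagsDemo are hypothetical and do not come from any standard package.

```java
class FlagsDemo {
    // Hypothetical flag bits, chosen only for this illustration.
    static final int READ = 0x01;
    static final int WRITE = 0x02;
    static final int EXECUTE = 0x04;

    static int demo() {
        int flags = 0;
        flags |= READ | WRITE;  // set two bits:   0000_0011
        flags &= ~WRITE;        // clear one bit:  0000_0001
        flags ^= EXECUTE;       // toggle one bit: 0000_0101
        flags <<= 4;            // shift left:     0101_0000
        flags >>= 2;            // shift right:    0001_0100
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(demo()));  // 14
    }
}
```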
Comparison Operators

Comparison operators compare two integers or floating point numbers and return a boolean value (true or false) depending on the relationship between the operands.

== equality operator
!= inequality operator
< less than operator
> greater than operator
<= less than or equal operator
>= greater than or equal operator

Variables of type char have their values treated as unsigned 16-bit integers (values 0 to 65535) for use by comparison operators. See the chapter on Types for more detail.

These operators are often used in "if" or "while" statements that require a boolean value to determine whether to execute a block of code. The following code fragment shows the use of one of these operators:

Listing 9.4  An Example Using a Comparison Operator
int big = 100;
int small = 2;

if (big >= small) {
     System.out.println("All is right with the world");
}

Note that the equality operator is two successive equal signs. A single equal sign is the assignment operator and has a very different meaning.
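A few more comparisons, including char values and a mixed integer and floating point pair, are sketched below. The class name ComparisonDemo is made up for the example.

```java
class ComparisonDemo {
    public static void main(String[] args) {
        int big = 100;
        int small = 2;
        System.out.println(big == small);  // false
        System.out.println(big != small);  // true
        System.out.println(big > small);   // true

        // if (big = small) { }  would not compile in Java, because the
        // assignment yields an int and "if" requires a boolean.

        // char values compare as unsigned 16-bit numbers.
        char lower = 'a';                    // 97
        char upper = 'A';                    // 65
        System.out.println(upper < lower);   // true
        System.out.println((int) '\uffff');  // 65535

        // An integer can be compared with a floating point value.
        System.out.println(small < 2.5);     // true
    }
}
```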

Logical Comparison Operators

The logical comparison operators take boolean operands and produce boolean results. The logical Not operator is unary.

&& logical And operator
|| logical Or operator
! logical Not operator

The logical And operator returns true only if both operands are true. The logical Or operator returns false only if both operands are false. The logical Not operator returns true only if the operand is false.

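Oddly, in most computer languages, including Java, there is no dedicated logical Xor operator to accompany && and ||. In Java, however, the ^ operator applied to two boolean operands returns true only if exactly one of them is true, and the inequality operator (!=) gives the same result for booleans.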
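The behavior described above can be printed as a truth table. In this sketch the last column uses the inequality operator on two booleans, which is true when exactly one operand is true and so is one way to get an exclusive-or result. The class name LogicalDemo is invented for the illustration.

```java
class LogicalDemo {
    public static void main(String[] args) {
        boolean[] values = { false, true };
        for (boolean a : values) {
            for (boolean b : values) {
                System.out.println(a + " " + b
                        + " and=" + (a && b)
                        + " or=" + (a || b)
                        + " notA=" + (!a)
                        + " differ=" + (a != b));
            }
        }
    }
}
```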

If-Then-Else Operators

This pair of operators comes from the C language. Enough people use it that it was not expunged in the stripping down of C and C++ to make clean, new Java. It is a shorthand for the if () {} else {} construct. Perhaps this code fragment will make the usage of these operators clear:

Listing 9.5  An Example Using the If-Then-Else Operator
int j = 5;
int k = 10;
long max = 0;

max = k>j ? k : j;      // for easily understood Java code,
                        // use this construct sparingly.

In this example, max is assigned the value k if the boolean expression preceding the question mark (?) is true. It is assigned the value of j if that expression is false. The result in this case is that it always gets the greater of the two.
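To make the correspondence concrete, the sketch below writes the same choice twice, once with the ? and : operators as in Listing 9.5 and once with an ordinary if-else statement. The class and method names are hypothetical.

```java
class ConditionalDemo {
    // Conditional-operator form, as in Listing 9.5.
    static long maxTernary(int j, int k) {
        return k > j ? k : j;
    }

    // Equivalent if-else form.
    static long maxIfElse(int j, int k) {
        long max;
        if (k > j) {
            max = k;
        } else {
            max = j;
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxTernary(5, 10));  // 10
        System.out.println(maxIfElse(5, 10));   // 10
    }
}
```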

White Space

Technically, white space is not a token. White space can be inserted into a Java application's source code without affecting the meaning of the code to the compiler. White space is composed of space characters, horizontal tab characters, new line characters, carriage returns, and form feeds. These characters can be anywhere except within a token.

White space is optional, but because proper use of it has a big impact on readability and consequently maintainability of the source code for an application or applet, its use is highly recommended. Let's take a look at the ever popular HelloWorld App written with minimal use of white space:

public class HelloWorld{public static void main(String args[]){System.out.println("Hello World");}}

Clearly, it is a little harder to ferret out what this application does, or even that you have started at the beginning and ended at the end. Choose a scheme for applying meaningful white space, and follow it. Then you stand a better chance of knowing which close curly brace (}) matches which open curly brace ({).
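For comparison, here is the same program laid out with one common white space scheme (one statement per line, four spaces per nesting level). The scheme is only one possibility. Apart from dropping the public modifier on the class, so that the fragment compiles whatever the source file is called, the tokens are unchanged.

```java
class HelloWorld {
    public static void main(String args[]) {
        System.out.println("Hello World");
    }
}
```

Each closing brace now sits directly below the start of the line that opened its block, so matching pairs can be found at a glance.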

Comments

Java supports three styles of comments. Comments are not tokens, and neither are any of their contents. These comments are referred to here as traditional comments (from the C language tradition), javadoc comments (a minor modification of the traditional style), and C++ style comments for the additional style introduced in C++.

Traditional Comments

The first is the traditional C-style comment that begins with a slash-star (/*) and ends with a star-slash (*/). These can begin and end anywhere except within a string literal, character literal, or another comment.

Comments of this sort can span many lines or be contained in a single line (outside of a token). Comments cannot be nested. If you try to nest them, the opening of the inner one is not detected by the compiler, the closing of the inner one ends the comment, and subsequent text is interpreted as tokens. This usually results in a compile-time error. Two examples of comments of this sort are seen in the following code fragment:

Listing 9.6  An Example With Two Traditional Comments
/* The following is a code fragment
 * that is here only for the purpose 
 * of demonstrating a style of comment.
 */

double pi = 3.141592654  /* close enough for now */ ;

javadoc Comments

The second style of comment in Java is a special case of the first. It begins with a slash-star-star (/**) rather than a slash-star, and it has the properties mentioned above, but the contents of the comment may be used in automatically generated documentation by the javadoc tool. Avoid inadvertent use of this style if you plan to use javadoc.

For more details about javadoc, see Chapter 4.
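As a brief sketch, the following made-up Circle class carries javadoc comments on a field, a constructor, and a method; note the extra star in each opening delimiter. The @param and @return tags are standard javadoc tags, while the class itself exists only for this illustration.

```java
class Circle {
    /** Radius of the circle, in arbitrary units. */
    private final double radius;

    /**
     * Creates a circle.
     *
     * @param radius the radius; assumed to be non-negative
     */
    Circle(double radius) {
        this.radius = radius;
    }

    /**
     * Computes the area of this circle.
     *
     * @return the area, pi times the radius squared
     */
    double area() {
        return Math.PI * radius * radius;
    }
}
```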

C++ Style Comments

The third style of comment begins with a slash-slash (//), and ends when the current source code line ends. These comments are especially useful for describing the intended meaning of the current line of code. The following instructive code fragment demonstrates the use of this style of comment:

Listing 9.7  An Example Using Traditional and C++ Style Comments
boolean Bad = false;                       // no bad character found yet
for (int j = 0;                            // initialize outer loop
j < MAX_ROW;                               // repeat for all rows
j++) {
     for (int k = 0;                       // initialize inner loop
     k < MAX_COL;                          // repeat for all columns
     k++) {
          if (NumeralArray[j][k] > '9') {  // > highest numeric?
               Bad = true;                 // mark bad
          } /* close if > '9' */
          if (NumeralArray[j][k] < '0') {  // < lowest numeric?
               Bad = true;                 // mark bad
          } /* close if < '0' */
     } /* close inner loop */
} /* close outer loop */
