Notice: This material is excerpted from Special Edition Using Java, ISBN: 0-7897-0604-0. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.
by Jay Cross
Tokens are to computer language as words and punctuation are to human language. William of Ockham (a noted 14th-century scholar famous for his support of simplicity) went to great lengths in his Summa Logicae to describe his theory of terms. While it is beyond the scope of this book to explain Ockham fully, the sense of it is that the word "chair" is not a chair but rather a symbol: a reader or listener conjures up the thought of a chair on reading or hearing the word.
Using the same analogy, tokens are terms in source languages for computers. If a programmer declares a token "counter" to represent a short integer (a 16-bit number, described later in this chapter), then every time "counter" is used in that context, the compiler recognizes it as referring to a specific 16 bits of memory. Any operations performed on "counter" are done with the value contained in those 16 bits: not with the token (the characters c, o, u, n, t, e, r), but with what that token represents to the compiler.
To accurately describe a task to a compiler, a description language needs to have a strict and unambiguous grammar structure. Java's grammar is fairly simple and elegant. You can begin understanding Java by learning about the tokens from which the more complex forms of expression are composed. These include keywords, identifiers, literals, separators, and operators. A Java program may also contain white space and comments that have no meaning to the compiler but are permitted for the sake of making the code's meaning clear to human readers, especially its author(s).
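In this chapter you will learn about Java's keywords, identifiers, literals, separators, and operators, and about how white space and comments fit around them.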
There are certain sequences of characters that have special meaning in Java; these sequences are called keywords. Some of them are like verbs, some like adjectives, and some like pronouns. Some are tokens that are saved for later versions of the language, and one, goto, is a vile oath from ancient procedural tongues that may never be uttered in polite Java.
The following is a list of the 56 keywords you can use in Java. When you know the meanings of all these terms, you will be well on your way to being a Java programmer.
Table 9.1 The 56 Keywords Used in Java
abstract | boolean | break | byte |
case | cast | catch | char |
class | const | continue | default |
do | double | else | extends |
final | finally | float | for |
future | generic | goto | if |
implements | import | inner | instanceof |
int | interface | long | native |
new | null | operator | outer |
package | private | protected | public |
rest | return | short | static |
super | switch | synchronized | this |
throw | throws | transient | try |
var | void | volatile | while |
The keywords byvalue, cast, const, future, generic, goto, inner, operator, outer, rest, and var are reserved, but have no meaning in Java 1.0. Programmers experienced with other languages such as C, C++, Pascal, or SQL may know what these terms might eventually be used for. For the time being, you won't use these terms, and Java is much simpler and easier to maintain without them.
The tokens true and false are not on this list; technically, they are literal values for boolean variables or constants (boolean and other literals are described in the section on literals later in this chapter). Like keywords, however, they cannot be used as identifiers (user-defined names or labels).
Because these terms have specific meaning in Java, you can't use them as identifiers for something else, such as variables, constants, class names, and so on. However, they can be used as part of a longer token, for example:
public int abstract_int;
Also, because Java is case-sensitive, a programmer who is bent on using one of these words as an identifier can capitalize its first letter. While this is possible, it is a very bad idea in terms of human readability, and it results in wasted man-hours when the code must be maintained later. For example, this declaration is legal:
public short Long;
It can be done, but for the sake of clarity and mankind's future condition, please don't do it.
There are numerous classes defined in the standard packages. While their names are not keywords, reusing them as identifiers may make your meaning unclear to future people working on your application or applet.
Identifiers are terms chosen by the programmer that become tokens representing variables, constants, classes, objects, labels (which are like nouns), and methods (which are like verbs). As noted in the previous section, identifiers cannot be identical to Java keywords.
Identifiers in Java are a sequence of Unicode letters and digits of unlimited length. (In practice, class, field, and method names must fit in a compiled class file, which limits each such name to 65,535 bytes in the class file's modified UTF-8 encoding.) The first character of an identifier must be a letter. All subsequent characters must be letters or numerals. They do not need to be Latin letters or digits; they could be from any alphabet that Unicode supports, such as Arabic-Indic, Devanagari, Bengali, Tamil, Thai, or many others. For various historical and practical considerations, the underscore (_) and the dollar sign ($) are considered letters and may be used as any character in an identifier, including the first one.
Two tokens are the same identifier only if they are of equal length and if each character in the first token is exactly the same as its counterpart in the second token. This is case-sensitive and language-sensitive. This means that Latin letters are different from matching Greek letters, and letters with accents are different from letters without.
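Because identifier matching is exact, the following sketch compiles even though several of its names look alike; the class and field names are hypothetical. The last field is written with the Unicode escape for the Greek capital letter Alpha, which the compiler treats as an identifier distinct from the Latin letter A.

```java
// Hypothetical class: every field name below is a distinct identifier.
class IdentifierIdentity {
    static int count = 1;
    static int Count = 2;   // differs from count only in case
    static int COUNT = 3;
    static int A = 4;       // Latin capital letter A (U+0041)
    static int \u0391 = 5;  // Greek capital letter Alpha (U+0391), not the same as A

    public static void main(String[] args) {
        System.out.println(count + " " + Count + " " + COUNT); // prints "1 2 3"
        System.out.println(A + " " + \u0391);                  // prints "4 5"
    }
}
```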
Most application developers are forever walking the line of compromise between choosing identifiers that are short enough to be quickly and easily typed without error and those that are long enough to be descriptive and easily read. Either way, in a large application it is useful to choose a naming convention that reduces the likelihood of accidental reuse of a particular identifier.
Table 9.2 Examples of Legal and Illegal Identifiers
Legal identifiers | Not legal identifiers |
---|---|
HelloWorld | 9HelloWorld |
counter | count&add |
HotJava$ | Hot Java |
ioc_Queue3 | 65536 |
ErnestLawrenceThayersFamousPoemOfJune1888 | non-plussed |
Among the illegal examples in Table 9.2, the first is forbidden because it begins with a numeral. The second contains an illegal character (&). The third also has an inappropriate character: the blank space. The fourth is a literal number (2^16) and cannot be used as an identifier. The last one contains yet another bad character, the hyphen or minus sign; Java would try to treat it as an expression that subtracts one identifier from another.
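The identifier rules can be checked mechanically. The following is a minimal sketch, assuming the standard library methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart as the definition of a Java letter or digit. The class name, method name, and reserved-word set (Table 9.1 plus true and false) are illustrative choices, not a complete statement of any Java release's rules. The check looks at one 16-bit char at a time, so characters outside that range are not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical validator for the identifier rules described in this section.
class IdentifierCheck {
    // Reserved words from Table 9.1, plus the literals true and false.
    static final Set<String> RESERVED = new HashSet<String>(Arrays.asList(
        "abstract", "boolean", "break", "byte", "case", "cast", "catch", "char",
        "class", "const", "continue", "default", "do", "double", "else", "extends",
        "final", "finally", "float", "for", "future", "generic", "goto", "if",
        "implements", "import", "inner", "instanceof", "int", "interface", "long", "native",
        "new", "null", "operator", "outer", "package", "private", "protected", "public",
        "rest", "return", "short", "static", "super", "switch", "synchronized", "this",
        "throw", "throws", "transient", "try", "var", "void", "volatile", "while",
        "true", "false"));

    static boolean isLegalIdentifier(String s) {
        if (s.isEmpty() || RESERVED.contains(s)) {
            return false;
        }
        // The first character must be a Java letter; _ and $ count as letters.
        if (!Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        // Later characters may be Java letters or digits. (isJavaIdentifierPart
        // also accepts a few invisible control characters the compiler ignores.)
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"HelloWorld", "counter", "HotJava$", "ioc_Queue3",
            "9HelloWorld", "count&add", "Hot Java", "65536", "non-plussed"};
        for (String s : samples) {
            System.out.println(s + " -> " + isLegalIdentifier(s));
        }
    }
}
```

Running main prints each sample from Table 9.2 followed by true or false.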
Literals are tokens representing values to be stored in bytes, shorts, ints, longs, floats, doubles, booleans, and chars. Literals are also used to represent values to be stored in strings.
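The statements below are a sketch, with hypothetical variable names, showing one literal of each kind in the initializer of a declaration.

```java
// Hypothetical names; each initializer is a literal.
class LiteralExamples {
    static int maxCount = 65536;             // integer literal
    static long distance = 93000000L;        // integer literal with the long suffix L
    static double ratio = 1.5e3;             // floating-point literal (1500.0)
    static boolean done = false;             // boolean literal
    static char initial = 'J';               // character literal
    static String greeting = "Hello World";  // string literal

    public static void main(String[] args) {
        System.out.println(greeting + " " + ratio); // prints "Hello World 1500.0"
    }
}
```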
Clearly, there are several types of literals. In fact, the Java Language Specification gives five major types of literals, some of which have subtypes. The five major types are integer, floating-point, boolean, character, and string literals.
The following five sections of this chapter give more information about the different types of literals.
There are two boolean literals: true and false. A boolean value is never null, and it has no numeric equivalent: unlike C, Java does not treat 0 as false or any other number as true.
Character literals are enclosed in single quotes. This is true whether the character value is a Latin alphanumeric character, an escape sequence, or any other Unicode character. A single character can be any printable character except the single quote (') or backslash (\). Some examples of these literals are 'a', 'A', '9', '+', '_', and '~'.
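The escape sequence character literals are of the form '\b': within single quotes, a backslash followed by one of the following:
- one of the characters b, t, n, f, r, ", ', or \
- an octal number from 0 to 377
- the letter u followed by four hexadecimal digits

The escapes in the first bulleted item above are probably familiar to C and C++ programmers, and anyone else should quickly recognize the need for a special way to represent the following characters:

Escape Literal | Meaning
---|---
'\b' | \u0008 backspace (BS)
'\t' | \u0009 horizontal tab (HT)
'\n' | \u000a line feed (LF)
'\f' | \u000c form feed (FF)
'\r' | \u000d carriage return (CR)
'\"' | \u0022 double quote
'\'' | \u0027 single quote
'\\' | \u005c backslash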
Character literals mentioned in the second bulleted item above are called octal escape literals. They can be used to represent any Unicode value from '\u0000' to '\u00ff' (the ISO Latin-1 range, whose lower half is 7-bit ASCII). In octal (base 8), these values are \000 to \377; octal numerals run from 0 to 7 inclusive. Some examples of these octal literals are:

Octal Literal | Meaning
---|---
'\0' | \u0000 null character (NUL)
'\11' | \u0009 horizontal tab (same as '\t')
'\101' | \u0041 capital letter A
'\377' | \u00ff small letter y with diaeresis
Character literals of the type in the last bulleted item above are interpreted very early by javac, before the source is broken into tokens. As a result, using a Unicode escape to express a line termination character such as carriage return or line feed puts an actual end-of-line before the closing single quote mark, which is a compile-time error. Examples of this type of character literal appear as the first six characters of each entry under the "Meaning" heading above.
Don't use the \u format to express an end-of-line character. Use the \n or \r characters instead.
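To see the three forms side by side, this sketch stores several character literals in constants (the names are hypothetical) and notes each one's numeric value. Following the advice above, it uses a Unicode escape only for an ordinary letter.

```java
// Hypothetical constants showing C-style, octal, and Unicode escapes.
class CharLiterals {
    static final char NEWLINE = '\n';       // C-style escape: line feed, value 10
    static final char QUOTE = '\'';         // escaped single quote, value 39
    static final char BACKSLASH = '\\';     // escaped backslash, value 92
    static final char OCTAL_A = '\101';     // octal 101 = 64 + 1 = 65, the letter A
    static final char MAX_OCTAL = '\377';   // largest octal escape: 255
    static final char UNICODE_A = '\u0041'; // Unicode escape for the letter A

    public static void main(String[] args) {
        System.out.println((int) NEWLINE + " " + (int) QUOTE + " " + (int) BACKSLASH); // 10 39 92
        System.out.println(OCTAL_A == UNICODE_A); // true: both are the letter A
        System.out.println((int) MAX_OCTAL);      // 255
    }
}
```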
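Floating-point literals have several parts. They appear in the following order:

Part | Is it required? | Examples
---|---|---
Whole-number part | Not if the fractional part is present | 0, 1, 2, 12345
Decimal point | Not if an exponent or type suffix is present | .
Fractional part | Not if the whole-number part is present | 0, 5, 14159
Exponent | No | e23, E-19, e+10
Type suffix | No; a literal without one is a double | f, F, d, D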
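The following sketch (constant names hypothetical) shows literals that leave out different optional parts; each comment gives the value the literal denotes.

```java
// Hypothetical constants; each literal omits a different optional part.
class FloatLiterals {
    static final double FULL = 3.25;              // whole-number part, point, fractional part
    static final double NO_WHOLE = .5;            // whole-number part omitted: 0.5
    static final double NO_FRACTION = 2.;         // fractional part omitted: 2.0
    static final double NO_POINT = 1e3;           // an exponent makes the point optional: 1000.0
    static final double NEG_EXPONENT = 25e-2;     // 25 times 10 to the -2: 0.25
    static final float SUFFIX_ONLY = 7f;          // the f suffix alone makes 7 a float literal
    static final double DOUBLE_SUFFIX = 6.02e23d; // d is optional; double is the default

    public static void main(String[] args) {
        System.out.println(NO_WHOLE + " " + NO_FRACTION + " " + NO_POINT); // 0.5 2.0 1000.0
        System.out.println(SUFFIX_ONLY); // 7.0
    }
}
```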
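Separators are single-character tokens that (as their name implies) are found between other tokens. There are nine separators, loosely described here: parentheses ( ) enclose parameter lists, conditions, casts, and grouped expressions; braces { } enclose blocks and class bodies; brackets [ ] declare and index arrays; the semicolon (;) ends statements; the comma (,) separates items in lists such as parameters; and the period (.) separates package, class, and member names.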
Operators express which operation is to be performed on a given value or values. Here they are described in several related categories.
There are 37 character sequences that are tokens used as operators. (C and C++ users will find most of them very familiar.) These are the five arithmetic operators (+, -, *, /, %), six assignment operators (=, +=, *=, -=, /=, %=), a decrement operator (--), an increment operator (++), four bitwise arithmetic operators (&, |, ^, ~), three bitwise shifting operators (<<, >>, >>>), six bitwise assignment operators (&=, |=, ^=, <<=, >>=, >>>=), six comparison operators (==, !=, <, >, <=, >=), three logical comparison operators (&&, ||, !), and two that act as an if-then-else when used together (?, :).
The arithmetic operators take two values, integer or floating point, and return a third value whose type is determined as follows. Two integer operands (byte, short, int, or long) produce an int, or a long if and only if at least one of the operands is a long; an int result that is too large for an int wraps around rather than being promoted to a long. Two floating-point operands produce a floating-point result (a double if either is a double). An integer and a floating-point operand produce a floating-point result. Note that the plus-sign operator also acts as the string concatenation operator.
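These result-type rules can be seen in a short sketch (class and field names hypothetical). Note in particular that adding 1 to the largest int value wraps around to the most negative int instead of producing a long.

```java
// Hypothetical fields illustrating the result types of the arithmetic operators.
class ArithmeticTypes {
    static byte b = 100;
    static short s = 200;
    static int sum = b + s;                        // byte + short is computed as an int: 300
    // byte tooSmall = b + s;                      // would not compile: the int result needs a cast
    static int quotient = 7 / 2;                   // two ints: integer division truncates to 3
    static double mixed = 7 / 2.0;                 // int and double: the result is the double 3.5
    static int wrapped = Integer.MAX_VALUE + 1;    // int overflow wraps to Integer.MIN_VALUE
    static long widened = Integer.MAX_VALUE + 1L;  // a long operand makes the sum a long
    static String text = "sum = " + sum;           // + with a String operand concatenates

    public static void main(String[] args) {
        System.out.println(text + ", " + quotient + ", " + mixed); // sum = 300, 3, 3.5
        System.out.println(wrapped + " " + widened);             // -2147483648 2147483648
    }
}
```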
The following code fragment shows these operators in an integer context. The use of the operators is syntactically the same for floating point numbers.
Listing 9.1 Examples Using Arithmetic Operators
byte j = 60;    // set the byte j's value to 60
short k = 24;
int l = 30;
long m = 12L;
long result = 0L;

result = j + k;               // result gets 84: (60 plus 24)
result = result / m;          // result gets 7: (84 divided by 12)
result = j - (2*k + result);  // result gets 5: (60 minus (48 plus 7))
result = k % result;          // result gets 4: (remainder 24 div by 5)
With the exception of the (direct) assignment operator (=), the arithmetic assignment operators are shortcuts: a += b means a = a + b, with the result cast back to the type of a if necessary, and likewise for the others. Like the arithmetic operators above, they can be used with both integer and floating-point values. With each of these operators, the result is placed in the left operand.
The following code fragment shows these operators in an integer context. The use of the operators is syntactically the same for floating point numbers.
Listing 9.2 Examples Using Arithmetic Assignment Operators
byte j = 60;    // set the byte j's value to 60
short k = 24;
int l = 30;
long m = 12L;
long result = 0L;

result += j;       // result gets 60: (0 plus 60)
result += k;       // result gets 84: (60 plus 24)
result /= m;       // result gets 7: (84 divided by 12)
result -= l;       // result gets -23: (7 minus 30)
result = -result;  // result gets 23: (-(-23))
result %= m;       // result gets 11: (remainder 23 div by 12)
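One detail the listing does not show: a compound assignment casts its result back to the type of the left operand. The sketch below (class and method names hypothetical) shows that this lets b += n compile for a byte b, where b = b + n would not, and that the implicit cast can silently wrap or truncate.

```java
// Hypothetical helpers showing the implicit cast in compound assignment.
class CompoundAssignment {
    static byte addToByte(byte b, int n) {
        b += n;       // same as b = (byte) (b + n); "b = b + n;" would not compile
        return b;
    }

    static int addToInt(int total, double x) {
        total += x;   // same as total = (int) (total + x): the fraction is truncated
        return total;
    }

    public static void main(String[] args) {
        System.out.println(addToByte((byte) 10, 5));   // 15
        System.out.println(addToByte((byte) 120, 10)); // -126: 130 wraps around in 8 bits
        System.out.println(addToInt(0, 2.7));          // 2
    }
}
```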
The increment and decrement operators are used with one integer or floating-point operand (they are unary operators). The increment operator (++) adds one to the operand. If the operator appears before the operand, the increment occurs before the value is taken for the expression. If it appears after the operand, the addition occurs after the value is taken. Similarly, the decrement operator (--) subtracts one from the operand, following the same before-or-after timing rule.
The following code fragment shows these operators in an integer context. The use of the operators is syntactically the same for floating point numbers.
Listing 9.3 Examples Using Increment and Decrement Operators
long counter = 1000000;   // start at a million.
double fpcounter = 0;     // start with a clean slate.
double fpsum = 0;         // no additions so far.

// compute the sum of all the numbers from one to a million.
while (counter-- > 0) {   // using a million on the first iteration.
    fpsum += ++fpcounter; // fpcounter incremented before use.
}
In the above example, counter is decremented as part of the test that ends the loop. On the first iteration the test sees a million; on the last iteration it sees one, and the loop exits when the test sees zero. Within the loop, fpcounter is incremented before it is added to fpsum, so the very first iteration adds 1, even though fpcounter is initialized to 0. In the last iteration, it has a value of a million. The final value of fpsum is exactly 500,000,500,000.0: every partial sum is a whole number far below 2^53, so a double represents each one without round-off error. The same total would have been more easily, but less instructively, computed using n(n+1)/2.
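The prefix and postfix timing rules can also be checked in isolation. This sketch (class and method names hypothetical) records the value each expression produced, followed by the final value of the variable.

```java
// Hypothetical demo of prefix versus postfix timing.
class IncrementTiming {
    static int[] demo() {
        int i = 5;
        int a = i++;  // postfix: a gets 5, then i becomes 6
        int b = ++i;  // prefix: i becomes 7, then b gets 7
        int c = i--;  // postfix: c gets 7, then i becomes 6
        int d = --i;  // prefix: i becomes 5, then d gets 5
        return new int[] {a, b, c, d, i};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo())); // [5, 7, 7, 5, 5]
    }
}
```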
Bitwise arithmetic is not complicated. If you are unfamiliar with it, there may be some new ideas here that will be difficult to learn from a short section in a book. If this is important but difficult for you, you might try reading the section on this subject in a general computer science book. For starters, let's just say that bitwise arithmetic is used for setting and testing single bits and combinations of individual bits within a variable. Generally, it is not good programming style to do this without a very good reason. Most of these reasons involve communicating with hardware devices or storing information as densely as possible. In the following examples, you will be using variables of type byte because they are the simplest to see. It is assumed that you understand the meaning of hexadecimal numbers. Bitwise arithmetic is defined for the four integer types and char, but not for the floating-point types. (The &, |, and ^ operators also accept two boolean operands, where they act as logical operators that always evaluate both sides; ~ does not accept a boolean.)
First, a bit of elementary computer science: ignoring sign for a moment, a byte is composed of 8 bits. Each bit has a value of 1 or 0. You assign values to each of the bits as 128, 64, 32, 16, 8, 4, 2, 1 (2^7, 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0). If the low seven bits are set, the byte has the value 1+2+4+8+16+32+64 = 127. In hexadecimal, we call this 0x7f [7 is 0111 (4+2+1), and f is 1111 (8+4+2+1)].
The four bitwise arithmetic operators are called And, Or, Xor, and Complement. If you And (&) one byte with another and put the result in a third byte, the resulting byte has bits = 1 only when both of the operands had bits in that position = 1. Thus, if 0x7f (0111_1111) is Anded with 0x34 (0011_0100), the result is 0x34 because all the one bits in 0x34 were set to one in the other number. Similarly:
0x4f & 0x22 = 0x02    (0100_1111) & (0010_0010) = (0000_0010)
0x3c & 0xa5 = 0x24    (0011_1100) & (1010_0101) = (0010_0100)
As you can see, Anding always results in the same or fewer bits set to one.
If you Or (|) one byte with another and put the result in a third byte, the resulting byte has bits = 1 when either of the operands had a bit in that position = 1. So, using the same operands as before:
0x4f | 0x22 = 0x6f    (0100_1111) | (0010_0010) = (0110_1111)
0x3c | 0xa5 = 0xbd    (0011_1100) | (1010_0101) = (1011_1101)
The process of Oring always results in the same or more bits set to one.
If you Xor (^) (exclusive or) one byte with another and put the result in a third byte, the resulting byte has bits = 1 when the corresponding bit is set in exactly one of the two operand bytes. Thus:
0x4f ^ 0x22 = 0x6d    (0100_1111) ^ (0010_0010) = (0110_1101)
0x3c ^ 0xa5 = 0x99    (0011_1100) ^ (1010_0101) = (1001_1001)
Complementing (~) is a unary bitwise operation. When a byte is complemented, all of its bits are inverted. Thus, ~(0x7f) = 0x80.
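The byte-sized examples above can be reproduced in Java with one caveat: the operands are promoted to 32-bit ints, so ~0x7f inverts all 32 bits. Masking with 0xff keeps the low eight bits, and casting to byte shows that 0x80 is the byte value -128. The class and the helper name hex are hypothetical.

```java
// Sketch reproducing the And, Or, Xor, and Complement examples.
class BitwiseOps {
    // Hypothetical helper: format a value as at least two hex digits.
    static String hex(int value) {
        return String.format("0x%02x", value);
    }

    public static void main(String[] args) {
        System.out.println(hex(0x4f & 0x22));  // 0x02
        System.out.println(hex(0x3c & 0xa5));  // 0x24
        System.out.println(hex(0x4f | 0x22));  // 0x6f
        System.out.println(hex(0x3c | 0xa5));  // 0xbd
        System.out.println(hex(0x4f ^ 0x22));  // 0x6d
        System.out.println(hex(0x3c ^ 0xa5));  // 0x99
        System.out.println(hex(~0x7f));        // 0xffffff80: all 32 bits were inverted
        System.out.println(hex(~0x7f & 0xff)); // 0x80: only the low 8 bits kept
        System.out.println((byte) ~0x7f);      // -128: 0x80 read as a signed byte
    }
}
```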
Bitwise shifting operations shift the bits in an integer; they do not rotate them, so bits shifted off either end are lost. The bits of the first operand are moved by the number of positions given in the second operand. With the shift left (<<), a zero is always shifted in at the right; this is equivalent to multiplying by 2 raised to the power of the second operand, as long as the result does not overflow. The normal shift right operator (>>) propagates the sign bit, which is like dividing by a power of 2 (rounding toward negative infinity). The shift right with zero fill (>>>) shifts a zero in from the left.
The following examples may be instructive:
0x4f << 1 = 0x9e     (0100_1111) << 1 = (1001_1110)
0x3c << 2 = 0xf0     (0011_1100) << 2 = (1111_0000)
0x4f >> 1 = 0x27     (0100_1111) >> 1 = (0010_0111)
0xf0 >> 2 = 0xfc     (1111_0000) >> 2 = (1111_1100)
0x4f >>> 1 = 0x27    (0100_1111) >>> 1 = (0010_0111)
0xf0 >>> 2 = 0x3c    (1111_0000) >>> 2 = (0011_1100)
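In Java the shift operators also promote a byte operand to int first, which matters for >>>. The sketch below (class and helper names hypothetical) reproduces the first three examples, then shows that shifting the byte 0xf0 with >>> does not produce 0x3c unless the byte is first masked with 0xff, because the sign bit has already been copied into the upper 24 bits of the int.

```java
// Sketch of the shift examples, including the byte >>> pitfall.
class ShiftOps {
    // Hypothetical helper: format a value in hexadecimal.
    static String hex(int value) {
        return String.format("0x%x", value);
    }

    public static void main(String[] args) {
        System.out.println(hex(0x4f << 1));   // 0x9e
        System.out.println(hex(0x3c << 2));   // 0xf0
        System.out.println(hex(0x4f >> 1));   // 0x27

        byte b = (byte) 0xf0;                 // the byte value -16
        System.out.println(b >> 2);           // -4, which is 0xfc as a byte
        System.out.println(hex(b >>> 2));     // 0x3ffffffc: b was sign-extended to 0xfffffff0
        System.out.println((byte) (b >>> 2)); // -4: casting back does not give 0x3c either
        System.out.println(hex((b & 0xff) >>> 2)); // 0x3c: mask first for an 8-bit zero fill
    }
}
```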
By now it should be fairly obvious what the bitwise assignment operators do. They take a value, do the appropriate bitwise operation with the second operand, and place the result as the contents of the first operand.
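A typical use of the bitwise assignment operators, in keeping with the setting and testing of individual bits mentioned earlier, is a set of flags packed into one int. The class, flag names, and values here are hypothetical.

```java
// Hypothetical flag bits, one bit position per flag.
class BitFlags {
    static final int READ = 0x01;
    static final int WRITE = 0x02;
    static final int EXECUTE = 0x04;

    static int demo() {
        int perms = 0;
        perms |= READ | WRITE;  // set two bits: 0x03
        perms &= ~WRITE;        // clear one bit: 0x01
        perms ^= EXECUTE;       // toggle a bit (here, on): 0x05
        return perms;
    }

    static boolean isSet(int perms, int flag) {
        return (perms & flag) != 0;  // test a single bit
    }

    public static void main(String[] args) {
        int perms = demo();
        System.out.println(perms);                 // 5
        System.out.println(isSet(perms, WRITE));   // false
        System.out.println(isSet(perms, EXECUTE)); // true
    }
}
```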
Comparison operators compare two integers or floating-point numbers and return a boolean value (true or false) depending on the relationship between the operands. The equality operators (== and !=) can also compare two boolean values or two object references.
Variables of type char have their values treated as unsigned 16-bit integers (values 0 to 65535) by the comparison operators. See the chapter on Types for more detail.
These operators are often used in "if" or "while" statements that require a boolean value to determine whether to execute a block of code. The following code fragment shows the use of one of these operators:
Listing 9.4 An Example Using a Comparison Operator
int big = 100;
int small = 2;
if (big >= small) {
    System.out.println("All is right with the world");
}
Note that the equality operator is two successive equal signs. A single equal sign is the assignment operator and has a very different meaning.
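Because char values compare as unsigned 16-bit numbers, the comparison operators can test character ranges directly. This sketch (class and method names hypothetical) relies on the digits '0' through '9' having the consecutive codes 48 through 57.

```java
// Hypothetical examples of comparison operators applied to chars and ints.
class CharCompare {
    // True when c is one of the ASCII digits '0' through '9'.
    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println('A' < 'a');         // true: 65 < 97
        System.out.println(isAsciiDigit('7')); // true
        System.out.println(isAsciiDigit('x')); // false
        int n = 5;
        System.out.println(n == 5);            // true: == compares, whereas n = 5 would assign
    }
}
```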
The logical comparison operators take boolean operands and produce boolean results. The logical Not operator is unary.
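The logical And operator (&&) returns true only if both operands are true. The logical Or operator (||) returns false only if both operands are false. The logical Not operator (!) returns true only if the operand is false. And and Or evaluate their right operand only when the left operand has not already determined the result.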
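Unlike && and ||, there is no dedicated logical Xor operator in Java (or in most other languages). However, the ^ operator applied to two boolean operands yields their exclusive or, as does !=.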
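Two behaviors of the logical operators are worth seeing in code. First, && and || are short-circuiting: the right operand is evaluated only when the left one does not already decide the result. Second, ^ applied to two boolean values gives their exclusive or. In this sketch, the class, the counter, and the method names are hypothetical; the counter records how often the right operand actually runs.

```java
// Hypothetical demo of short-circuit evaluation and boolean exclusive or.
class LogicalOps {
    static int calls = 0;

    // Operand that records each evaluation.
    static boolean tracked(boolean value) {
        calls++;
        return value;
    }

    // Returns how many times tracked() ran across three expressions.
    static int shortCircuitDemo() {
        calls = 0;
        boolean a = false && tracked(true); // false already decides &&: tracked() skipped
        boolean b = true || tracked(false); // true already decides ||: tracked() skipped
        boolean c = true && tracked(true);  // result depends on the right side: tracked() runs
        return calls;
    }

    static boolean xor(boolean p, boolean q) {
        return p ^ q;  // true when exactly one operand is true
    }

    public static void main(String[] args) {
        System.out.println(shortCircuitDemo()); // 1
        System.out.println(xor(true, false));   // true
        System.out.println(xor(true, true));    // false
    }
}
```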
The conditional operator (?:) comes from the C language. Enough people use it that it was not expunged in the stripping down of C and C++ to make clean, new Java. It is a compact form of an if () {} else {} construct that chooses between two values. Perhaps this code fragment will make its usage clear:
Listing 9.5 An Example Using the If-Then-Else Operator
int j = 5;
int k = 10;
long max = 0;
max = k>j ? k : j;  // for easily understood Java code,
                    // use this construct sparingly.
In this example, max is assigned the value k if the boolean expression preceding the question mark (?) is true. It is assigned the value of j if that expression is false. The result in this case is that it always gets the greater of the two.
Technically, white space is not a token. White space can be inserted into a Java application's source code without affecting the meaning of the code to the compiler. White space is composed of space characters, horizontal tab characters, new line characters, carriage returns, and form feeds. These characters can appear anywhere except within a token; a space inside a string or character literal is part of the literal's value rather than white space.
White space is optional, but because proper use of it has a big impact on the readability, and consequently the maintainability, of the source code for an application or applet, its use is highly recommended. Let's take a look at the ever-popular HelloWorld application written with minimal use of white space:
public class HelloWorld{public static void main(String args[]){System.out.println("Hello World");}}
Clearly, it is a little harder to ferret out what this application does, or even that you have started at the beginning and ended at the end. Choose a scheme for applying meaningful white space, and follow it. Then you stand a better chance of knowing which close curly brace (}) matches which open curly brace ({).
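For comparison, here is the same program laid out with one common scheme: one declaration or statement per line, each block's contents indented one level, and each closing brace lined up under the start of the line that opened it.

```java
// Declared without "public" so that this sketch compiles in a file of any name.
class HelloWorld {
    public static void main(String args[]) {
        System.out.println("Hello World");
    }
}
```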
Java supports three styles of comments. Comments are not tokens, and neither is anything inside them. These styles are referred to here as traditional comments (from the C language tradition), javadoc comments (a minor modification of traditional comments), and C++ style comments (the single-line style introduced in C++).
The first is the traditional C-style comment that begins with a slash-star (/*) and ends with a star-slash (*/). These can begin and end anywhere except within a string literal, character literal, or another comment.
Comments of this sort can span many lines or be contained in a single line (outside of a token). Comments cannot be nested: if you try to nest them, the compiler does not detect the opening of the inner one, the closing of the inner one ends the whole comment, and the text that follows is interpreted as tokens. This usually results in a compile-time error. Two examples of comments of this sort are seen in the following code fragment:
Listing 9.6 An Example With Two Traditional Comments
/* The following is a code fragment
 * that is here only for the purpose
 * of demonstrating a style of comment.
 */
double pi = 3.141592654 /* close enough for now */ ;
The second style of comment in Java is a special case of the first: it begins with a slash-star-star (/**) and ends with a star-slash (*/). It has the properties mentioned above, but the contents of the comment may be used in automatically generated documentation by the javadoc tool. Avoid inadvertent use of this style if you plan to use javadoc.
For more details about javadoc, see Chapter 4.
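A javadoc comment sits immediately before the class, method, or field it documents. The sketch below is hypothetical, class and method names included; @param and @return are standard javadoc tags that javadoc turns into labeled sections of the generated page.

```java
/**
 * Hypothetical helper class used to illustrate javadoc comments.
 */
class MathHelpers {
    /**
     * Returns the larger of two integers.
     *
     * @param a the first value
     * @param b the second value
     * @return a if it is greater than b; otherwise b
     */
    static int max(int a, int b) {
        return a > b ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(max(5, 10)); // 10
    }
}
```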
The third style of comment begins with a slash-slash (//), and ends when the current source code line ends. These comments are especially useful for describing the intended meaning of the current line of code. The following instructive code fragment demonstrates the use of this style of comment:
Listing 9.7 An Example Using Traditional and C++ Style Comments
final int MAX_ROW = 4;                  // assumed sizes; the original
final int MAX_COL = 8;                  // fragment does not declare these
char[][] NumeralArray = new char[MAX_ROW][MAX_COL]; // assumed input array
boolean Bad = false;                    // declared here: a for header
                                        // cannot declare two types
for (int j = 0;                         // initialize outer loop
     j < MAX_ROW;                       // repeat for all rows
     j++) {
    for (int k = 0;                     // initialize inner loop
         k < MAX_COL;                   // repeat for all columns
         k++) {
        if (NumeralArray[j][k] > '9') { // > highest numeric?
            Bad = true;                 // mark bad
        } /* close if > '9' */
        if (NumeralArray[j][k] < '0') { // < lowest numeric?
            Bad = true;                 // mark bad
        } /* close if < '0' */
    } /* close inner loop */
} /* close outer loop */
Copyright ©1996, Que Corporation