Chapter 2
Language Overview

A complete and operational REXX program can consist of as few as two lines. For example:

/* Comment */
SAY "Hello World! I'm A New REXX Program"

OS/2 uses the same .CMD extension for REXX programs as it does for its native batch files so simply looking at the extension is not enough to tell OS/2 if the file contains a batch file or a REXX program. To overcome this, OS/2 requires that all REXX programs begin with the two characters "/*" without the quotation marks.

This is not as strange a requirement as it first sounds. REXX delimits the beginning of a comment with the "/*" symbols. Everything following those symbols is ignored as a comment by REXX until it encounters a set of "*/" symbols. Comments can span across lines. For example, the following would be a valid beginning for the simple program shown above:

/* NAME: HIWORLD.CMD
PURPOSE: Demonstrate Adding Comments To REXX Programs
VERSION: 1.00
DATE: October 25, 1993
COPYRIGHT: 1994 McGraw Hill */
SAY "Hello World! I'm A New REXX Program"

Even though it now spans five lines, everything between the "/*" and the "*/" symbols is treated as a comment by REXX and ignored. The SAY on the last line is a REXX reserved word. It is the same as the ECHO command in a batch file: it tells REXX to display information on the screen.

Two comments about the above example are in order. First, REXX programs are designed to run under multiple platforms with a minimum of translation between platforms. The requirement that REXX programs begin with a comment is a requirement of OS/2 and not REXX itself. If you receive a REXX program from another platform, it may not begin with a comment. In that case, you will have to add a comment to the top of the program. Second, the requirement to use the .CMD extension is also an OS/2 requirement. The naming conventions for other platforms may be different.

Components Of A REXX Program

A REXX program consists of the following components:

Token

The most basic unit of a REXX program is a token. In its simplest form, a token is a unit of a REXX program that it does not make sense to break down any further. For example, the line:

SAY "Hello World! I'm A New REXX Program"

contains two tokens, the SAY instruction and the character string it displays on the screen. This instruction could also be written as:

SAY"Hello World! I'm A New REXX Program"

with no space between the SAY instruction and the phrase so there are only two tokens.

This instruction can also be written as:

SAY Hello World! I Am A New REXX Program

without the quotation marks and REXX will display the phrase in all uppercase. (The contraction "I'm" is spelled out here because an apostrophe outside quotation marks would begin a literal string that never ends, which is an error.) Now, the space between the SAY instruction and "Hello" is required since REXX would not know what "SAYHello" meant. Without quotation marks, each word is a separate symbol, so the spaces between the words are needed to keep the words apart, although any extra spaces are collapsed to a single space. This line therefore consists of the SAY instruction followed by one symbol for each word of the phrase.
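
The difference between quoted and unquoted text is easy to see by running a few SAY instructions. The following is only a small sketch, and it assumes the symbols Hello and World have not been assigned values when the program starts:

```rexx
/* Quoted and unquoted text on the SAY instruction */
SAY "Hello World"       /* displays: Hello World    */
SAY Hello     World     /* displays: HELLO WORLD    */
World = "Everyone"      /* World is now a variable  */
SAY Hello     World     /* displays: HELLO Everyone */
```

The last line shows why quotation marks are the safer choice: once a symbol has been given a value, REXX displays that value rather than the word itself.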

REXX has four different types of tokens:

  1. Literal Strings. These are phrases that are inside quotation marks. They begin with a single or double quotation mark and continue until a matching quotation mark is reached. A literal string can also be implied without quotation marks, as illustrated just above. These function much as if they had been defined using quotation marks, except that REXX translates them to uppercase. There are three types of literal strings: character (or ASCII), hexadecimal and binary.
  2. Operators. Any group of operator characters counts as a single token, even when they are separated by spaces. Operators are characters like "-", "+" and "*".
  3. Symbols. Any group of characters that is not a literal string is a symbol. Numbers, variable names and reserved words are examples of symbols.
  4. Special Characters. REXX uses a few symbols for special purposes. For example, text to be displayed with the SAY instruction can be split across multiple lines with a comma. These special character symbols count as a token.
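
A short program can show each of these token types at work. This is only an illustrative sketch: the variable name Count is arbitrary, and the hexadecimal and binary lines assume an ASCII system such as OS/2, where character code 65 is the letter A.

```rexx
/* One example of each token type */
SAY "Plain Text"        /* character literal string               */
SAY '41'x               /* hexadecimal literal string: displays A */
SAY '0100 0001'b        /* binary literal string: displays A      */
Count = 10              /* Count and 10 are symbols               */
SAY Count + 2 * 3       /* + and * are operators: displays 16     */
/* The comma below is a special character that continues the clause */
SAY "Part One",
    "Part Two"          /* displays: Part One Part Two            */
```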

Reserved Word

A reserved word is a special token, a word REXX reserves to itself and does not allow you to use. For example, the SAY instruction in the example above is a reserved word. Reserved words are only reserved at the beginning of a clause, so the following would be a valid program line:

SAY Say Do I Know You

The second "Say" is not at the start of a clause so it is just part of the message this REXX program will display on the screen. Since the message is not in quotation marks, REXX will display it in all uppercase. This will be discussed in more detail later.

REXX is much more lenient than other programming languages. You might think that the instruction:

SAY = 123

would use the SAY reserved word to display "= 123" on the screen. However, REXX tries to anticipate what you are trying to do and "do the right thing," so it treats the clause as an assignment and creates a variable called SAY containing 123. This is, of course, very bad programming practice. If you really wanted to display "= 123" on the screen, you could use the instruction:

SAY "= 123"

and everything would work out as you wished.
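
The following sketch puts the two forms side by side. It relies only on the rule described above, that a clause whose second token is an equal sign is an assignment:

```rexx
/* A keyword is only an instruction when the clause is not an assignment */
SAY = 123               /* assignment: the variable SAY now holds 123 */
SAY "= 123"             /* instruction: displays = 123                */
SAY Say                 /* instruction: displays 123                  */
```

The third line works because the first SAY begins the clause and is taken as the instruction, while the second Say is simply the variable assigned on the first line.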

Clauses

A few tokens, like the EXIT instruction, make sense by themselves. However, most tokens must be combined with other tokens to form a complete instruction. When several tokens are combined together to form a complete REXX instruction, that instruction is called a Clause. An entire clause is executed if it is error-free. If a clause contains an error, none of that clause is executed.

In general, a clause consists of one or more tokens and zero or more spaces (which are ignored). A clause may begin with one or more spaces. If a clause begins with spaces, these spaces are ignored. For example, the instructions:

SAY "Hello";
SAY "Hello"

are identical. Clauses end with a semicolon (;) but REXX adds these automatically to the end of most lines so most OS/2 REXX programmers leave them out of their programs.

Under OS/2, clauses are limited to a maximum length of 500 characters. Cowlishaw does not specify a maximum length so this is implementation-specific. For example, the Personal REXX version from Quercus has a maximum length of 1,000.

When REXX encounters a clause, it scans it from left to right and performs a process called tokenization. It is this "tokenized" clause that REXX executes. We will discuss this in more detail later in the chapter.

Since comments are removed from a clause before it is executed, comments can be included anywhere in a statement. For example, the following program segment stores a value to a variable then performs a logic test on that variable:

X = 7
IF X=7 THEN SAY "X Is 7"

The code above would execute identically to this code:

X = 7
IF X=/* Comment */7 THEN SAY "X Is 7"

even though this code has a comment in the middle. They execute the same because tokenization removes the comment "/* Comment */" from the second code segment, leaving it the same as the first segment. Programmers often use this fact to add comments to the end of lines of code that might take a little explanation to understand. However, including comments inside code, as was done in the example above, is very bad programming form that, while it works, makes your programs very hard to follow. The code that follows is a much better approach to documenting specific lines:

X = 7 /* Assign A Value */
IF X=7 THEN SAY "X Is 7" /* Test That Value */

There are four types of clauses:

  1. Label. A label is a symbol followed by a colon, e.g. "END:". A label merely marks a location in a program; the line itself does not execute. This location might be used as the target of a jump, like a GOTO command in a batch file, or it might represent the name of a subroutine. Notice that batch files place the colon at the beginning of the label while REXX programs place it at the end of the label.
  2. Assignment. Any clause that begins with a symbol token followed by an equal sign is an assignment clause. The value of the expression following the equal sign is assigned to the variable named in the token preceding the equal sign. For example:

    X = "Ronny"
    Y = 100

    If the first token is not a legal variable name or is not a variable name at all, then while the clause is still an assignment clause, it contains an error. For example, the following are all invalid assignment clauses:

    1 = "Ronny" /* 1 Is Not A Valid Variable Name */
    "X" = 7 /* "X" Is A Literal, Not A Variable */

    Since an assignment clause begins with a variable name and not a reserved word, the instruction:

    SAY = 123

    is valid since REXX treats SAY as a token containing a valid variable name and not a token containing an instruction.

  3. Instruction. When the clause is not an assignment clause and the first token is a valid REXX instruction (reserved word) then the clause is an instruction clause.
  4. OS/2 Command. When the clause is not an assignment clause and the first token is not a valid REXX instruction then the command is passed to OS/2 for execution. In this fashion, REXX programs can execute other programs as easily as any batch file. The command is fully evaluated by REXX first, so the second command in the following:

    Message = "Hello From A REXX Program"
    ECHO Message

    would be passed to OS/2, but not before the variable Message had been expanded, so the command to OS/2 would be:

    ECHO Hello From A REXX Program
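
The four clause types can be seen together in one short program. This is a sketch rather than a recommended layout: the label name Finish is arbitrary, the DIR command is placed in quotation marks so that REXX passes it to OS/2 unchanged, and SIGNAL is the REXX instruction that jumps to a label.

```rexx
/* The four clause types in one short program */
Name = "Ronny"               /* assignment                         */
SAY "Hello," Name            /* instruction: displays Hello, Ronny */
"DIR *.CMD"                  /* command passed to OS/2             */
SIGNAL Finish                /* instruction: jump to the label     */
SAY "This Line Is Skipped"
Finish:                      /* label                              */
EXIT
```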

As we have seen, a clause is a series of tokens, usually ending at the end of a line. However, there are four different things that can end a clause:

  1. The end of a line, unless the last token on the line is a comma. Since the comma is the continuation punctuation, lines that end with a comma are generally treated as though they continue on the next line.
  2. A semicolon.
  3. The keyword THEN provided the first token is an IF or WHEN instruction. Note that the THEN is treated as a separate clause and not part of the clause it ends. The keywords ELSE and OTHERWISE are also treated as clauses unto themselves when they occur in the appropriate place.
  4. A colon when it is the second token and therefore used to identify a label.

As complex as this sounds, it is generally true that the end of a line marks the end of a clause.
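
Each of the four clause endings can be tried in a few lines. The sketch below uses arbitrary variable and label names (X, Y and Finish):

```rexx
/* Ways a clause can end */
X = 7; Y = 8; SAY X + Y          /* semicolons: three clauses, displays 15 */
IF X = 7 THEN SAY "X Is 7"       /* THEN ends the IF clause                */
/* A trailing comma keeps the clause going on the next line */
SAY "X Plus Y Is",
    X + Y                        /* displays: X Plus Y Is 15               */
Finish: SAY "Done"               /* the colon ends the label clause        */
```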

Statement

We defined a clause as several tokens combined together to form a complete REXX instruction. Another way of looking at this is that a clause is a complete line ending with an actual or implied semicolon. However, not all REXX instructions are complete when made up of just a single line. For example:

X = 7
IF X = 7 THEN
 SAY "X Is 7"
ELSE
 SAY "X Is Not 7"

consists of five lines and five clauses but not five logically independent actions. The first line is clearly an independent action; it assigns a value to a variable. That line would make sense in a program by itself. The two SAY lines would equally make sense in programs by themselves. However, the IF and ELSE lines are part and parcel of one overall logic test. In fact, the two SAY lines are equally part of this logic test. In addition to the IF instruction, the DO and SELECT instructions normally span more than one line of code.

A statement then is the basic unit of code in a REXX program. Unless a statement begins with an IF, DO or SELECT instruction, a statement and clause are identical. When a statement begins with an IF, DO or SELECT instruction, that statement can extend across multiple lines and thus include multiple clauses. Note that in this case a statement can contain other statements inside it. The statement:

IF X = 7 THEN
 SAY "X Is 7"
ELSE
 SAY "X Is Not 7"

contains two SAY statements (and clauses) embedded inside it.
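
The same nesting applies to DO and SELECT. In the sketch below, which uses an arbitrary value for X, a single SELECT statement contains a DO statement, which in turn contains two SAY statements:

```rexx
/* One SELECT statement containing several other statements */
X = 7
SELECT
  WHEN X < 5 THEN
    SAY "X Is Small"
  WHEN X = 7 THEN
    DO
      SAY "X Is 7"
      SAY "That Is All We Know"
    END
  OTHERWISE
    SAY "X Is Something Else"
END
```

With X set to 7, only the two SAY instructions inside the DO group are executed.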

My terminology is somewhat different from what Cowlishaw used in The REXX Language and so it is somewhat different from books based on that work. My terminology is much closer to what Daney used in Programming in REXX, but again not exactly the same. I mention this so that if you read either of these books (they are both excellent) you will be aware of the differences. Happily, except for explaining how REXX works, terminology is not a big deal. In fact, once you finish this chapter, this terminology will not be a concern for you at all.

File

The entire REXX program must be contained in a single file, unlike languages such as C and C++ that allow you to break a large program down into logical components and store each component in a separate file. That is not to say that one REXX program cannot call another REXX program or procedure stored in a separate file; it can. It just means that these two files are treated as separate programs, where one calls the other and (perhaps) passes some information to it and (perhaps) receives some information from it, rather than as two pieces of the same program.
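
As a rough sketch of one program calling another, consider two separate files. The file names MAIN.CMD and SQUARE.CMD are hypothetical, and the example assumes both files are in the current directory so that OS/2 can find the second one. The first file is the calling program:

```rexx
/* MAIN.CMD: calls a second program stored in its own file */
CALL Square 12
SAY "12 Squared Is" RESULT
```

The second file is the program being called:

```rexx
/* SQUARE.CMD: receives one value and hands back its square */
PARSE ARG Number
RETURN Number * Number
```

The only information that passes between the two files is the argument given on the CALL instruction and the value handed back, which REXX places in the special variable RESULT.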

More On Clauses

As we saw above, a clause normally ends at the end of the line. In addition to the end of the line, three other things will terminate a clause:

  1. A semicolon.
  2. The keywords THEN, ELSE or OTHERWISE, when used appropriately.
  3. A colon when used as the second token.

Several conditions will cause a clause to span more than one line, so let's look at this issue in more depth.

A comment begins with the "/*" characters and ends with the "*/" characters, without regard to line endings. Thus, both of the following are valid comments:

/* This is a comment */

/*
This
is
also
a
comment
*/

Comments can also be nested, so the following is also a valid comment:

/* This Is A /*Comment*/ Inside A Comment */

Of course, when used in this context it makes little sense. However, suppose you had the following section of code:

X = X + 1 /* Increment Counter */
SAY "X Is" X /* Display Value */

and you wanted to temporarily "turn off" this section of code. You don't want to remove it since you may want to turn it back on later. The easiest way to do this is to "comment out" the code as follows:

/*
X = X + 1 /* Increment Counter */
SAY "X Is" X /* Display Value */
*/

Even though this results in a nested comment, it is an acceptable way to temporarily turn off the code. This is typically done during debugging. Once the code is operational, this section can be turned back on by removing the extra comment markers.

Nested comments must have matched pairs of beginning and ending comment markers. That is, the entire comment must have the same number of beginning and ending markers and, when counting from start to end, the number of ending comment markers must never exceed the number of beginning comment markers. The following comments are both invalid:

/* This Is */ An */ Invalid Comment /*
/* This Is Also /* An /* Invalid */ Comment */

In the first comment, by the time REXX reaches the ending marker after "An" it has seen two ending comment markers and only one beginning comment marker, so the number of ending markers exceeds the number of beginning markers. The second comment has three beginning markers and only two ending markers.

Many REXX clauses can end up being very long. While the OS/2 Enhanced Editor has no problem working with very long lines, they can be hard to read and debug since you cannot see the entire line on the screen at once. REXX solves this problem by allowing you to end a line with a comma. When the line is tokenized, the comma and the line break are replaced with a single space, and so the two lines are recombined. Thus, the instruction:

SAY "Ronny",
"Richardson",
"Wrote This Book"

would display the text "Ronny Richardson Wrote This Book" all on one line even though the instruction is spread out over three lines.

Three notes about using the comma for line continuation are important. First, a string enclosed in quotation marks must be all on one line, so the instruction:

Message = "You Have Made,
  A Mistake"

is invalid because it splits a string definition across two lines. By contrast, the instruction:

Message = "You Have Made",
  "Another Mistake"

is valid, just like the SAY example above, because each string is complete on its own line. The comma and the line break are replaced by a single space, so Message contains "You Have Made Another Mistake".

Second, as will be discussed in detail in Chapter 7, two strings can be combined using the string concatenation operator "||". So, the instruction:

Message = "You Have Made "||"A Mistake"

would be a valid instruction. Therefore, the instruction

Message = "You Have Made ",
  ||"A Mistake"

is also valid since you are splitting the concatenation operation and not the string definition operation.
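
The three ways of joining strings discussed so far can be compared directly. This short sketch simply displays each result:

```rexx
/* Three ways to join two strings */
SAY "You Have Made" "A Mistake"     /* blank between: You Have Made A Mistake */
SAY "You Have Made"||"A Mistake"    /* no blank added: You Have MadeA Mistake */
SAY "You Have Made ",
    ||"A Mistake"                   /* continued: You Have Made A Mistake     */
```

In the last instruction the only space in the result is the one inside the first string, because spaces next to the "||" operator are ignored.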

Third, you must be careful when using a comma to split a line in the middle of a call to a function or procedure. Assume you had a function called LocateCursor that required a row and column number and then positioned the cursor at that screen position. When written on a single line, you might call this function with the instruction:

CALL LocateCursor 12,20

If you were to split this to two lines using the instruction:

CALL LocateCursor 12,
20

it would not work as intended because the single comma functions as the line continuation character, and it and the line break are replaced by a space. LocateCursor would therefore receive the single argument "12 20" rather than two separate arguments. To split the line in this fashion, you must have one comma to separate the function arguments and a second comma for the line continuation, as shown below:

CALL LocateCursor 12,,
20

Of course, this is not good programming practice.
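
To see what actually reaches a routine in each case, the sketch below replaces LocateCursor with a hypothetical internal routine called ShowArgs. It uses the built-in ARG function, which returns the number of arguments when called with no parameters and the first argument when called as ARG(1).

```rexx
/* Counts the arguments that arrive when a CALL is continued */
CALL ShowArgs 12,20
CALL ShowArgs 12,
  20
CALL ShowArgs 12,,
  20
EXIT

ShowArgs:
  SAY "Arguments:" ARG() "First: [" || ARG(1) || "]"
  RETURN
```

The first and third calls report two arguments with 12 as the first. The second call reports a single argument, 12 20, because its only comma was used up as the continuation character.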

Statement Tokenization

Before we discuss tokenization, we need to look at the way REXX classifies characters. Characters fall into one of four categories:

  1. Symbol Characters. These are characters like A-Z that you can use in a variable name. The first character must be A-Z, a-z, an exclamation point, question mark or underscore. REXX translates lowercase letters to uppercase before using them. The rest of the variable name may also use the numerals 0-9. Periods may also be used in a variable name but they have a special meaning and should be avoided until you are familiar with the rules for forming compound variables. (This is discussed in detail in Chapter 5.)
  2. Operating Characters. These are characters like the plus and minus sign that indicate mathematical operations and logical characters like the greater-than sign.
  3. Punctuation Characters. These are characters like the comma and semicolon that indicate REXX punctuation.
  4. Invalid Characters. If a character is not one of the above three categories, then it is an invalid character and will generate an error message.

When REXX tokenizes a statement, it performs the following:

  1. All leading spaces are stripped off the statement.
  2. If the first character is a "symbol character" then the line begins with either a symbol or number. Every character up to the first non-symbol character is combined together with any letters converted to uppercase.
  3. If the first character is an "operator character" then the line begins with a REXX operation. Spaces around the operator are removed and every character up to the first non-operator character is combined into a single operator.
  4. If the first character is a "special character" then it is treated as a token as special characters are always treated as tokens. Spaces around the special character are removed, except for a space preceding an open parenthesis or after a close parenthesis.
  5. Comments are removed. They can occur in the middle of multi-character operators (a bad idea) but a comment marks the end of a token so they should not be used inside variable names, character strings and the like.
  6. A single or double quotation mark marks the beginning of a string literal. Either quotation mark is acceptable. Any character (except the type of quotation mark used to begin the string) is allowable inside quotation marks. If the same quotation mark used to start the string literal is needed inside it, use that mark twice. For example: SAY 'Don''t Do That!'. A non-doubled quotation mark of the same type that was used to start a string literal terminates the string literal. When a "b" or "x" immediately follows a string literal, it causes the string literal to be treated as either binary or hexadecimal.
  7. A few special cases, such as scientific notation of numbers, are handled on an ad hoc basis.
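
The quotation mark rules in step 6 can be tried with three short instructions. This is a minimal sketch that only displays text:

```rexx
/* Quotation marks inside literal strings */
SAY 'Don''t Do That!'        /* same mark used inside, so it is doubled */
SAY "Don't Do That!"         /* other mark used inside, no doubling     */
SAY "He Said ""Stop"""       /* doubled double quotation marks          */
```

The first two instructions both display Don't Do That! and the third displays He Said "Stop".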

Based on these rules, there are seven types of tokens that REXX recognizes:

  1. Binary Numbers. These are numbers expressed in base two. They are made up of zeros and ones. When storing a binary value to a variable, the ending quotation mark must be immediately followed by the letter b. Blanks may be used inside the string to make it more readable. Except for the first character set, the characters must be grouped in sets of four when spaces are used.
  2. Character String Literals. These are series of any characters enclosed in single or double quotation marks.
  3. Hexadecimal Numbers. These are numbers expressed in base sixteen. They are made up of the digits 0-9 and A-F. Capitalization of the letters does not matter. When storing a hexadecimal value to a variable, the ending quotation mark must be immediately followed by the letter x. Blanks may be used inside the string to make it more readable. Except for the first character, the characters must be grouped in pairs when spaces are used.
  4. Numbers. Numbers are special strings that contain only an optional plus or minus sign, the digits 0-9, zero or one period and zero or one exponential suffix (e) followed by an optional plus or minus sign and one or more digits.
  5. Operators. These are one or more operator characters.
  6. Symbols. A symbol is a stream of one or more symbol characters. If it is the first token, it may be a reserved word, although that is not required. If it is not the first token, it may still be a reserved word, like THEN, that is not required to be at the beginning of a clause. If it is not a reserved word, then it is either a system command or a variable name.
  7. Syntax Symbols. These are symbols, like the colon in a label name and the semicolon, that are treated as tokens by themselves.
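
A few of these token types can be checked with simple arithmetic. The sketch below assumes the default numeric settings and uses the built-in C2D function, which converts a character string to its decimal value, to show that hexadecimal and binary strings can describe the same data:

```rexx
/* Numbers, hexadecimal strings and binary strings */
SAY 12 + 3.5                 /* displays 15.5 */
SAY 1E3 + 1                  /* displays 1001 */
SAY -7 * 2                   /* displays -14  */
SAY C2D('FF'x)               /* displays 255  */
SAY C2D('ff'x)               /* displays 255  */
SAY C2D('1111 1111'b)        /* displays 255  */
```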

Given all the above information, we are now ready to see how REXX pulls together tokens into clauses:

It Runs Faster The Second Time!

The first time you run a new or newly modified program, REXX recognizes that fact. After tokenizing the program, REXX stores the tokenized version in the OS/2 extended attributes. It does this without modifying the original ASCII version. Now, when you run this program again, the pre-tokenized version is loaded and run rather than the ASCII version. Once a REXX program has been executed, this saves the time required for the tokenization process on every later run.

You can see this by creating a small REXX program and using the DIR command before and after running it. If you have a High Performance File System [HPFS] then the size of the extended attributes shows up automatically as the second size number. If you have a file allocation table [FAT] system, then you must use a /N after the DIR command to see the extended attributes. Figure 2-1 shows the DIR command for the program TYPING.CMD. The first time is on an HPFS system. Here, you see the file itself is 3,929 bytes and the extended attributes are 6,979 bytes. The second DIR command is on a FAT system without the /N switch and it only shows the size of the file.

Summary

We saw that the most basic unit of a REXX program is a token. In its simplest form, a token is a unit of a REXX program that it does not make sense to break down any further. We saw that REXX has four different types of tokens: literal strings, operators, symbols and special characters.

We saw that a reserved word (also called a keyword) is a special token, a word REXX reserves to itself and does not allow you to use. We saw that this restriction only applies to the first token in a clause. We also saw that REXX allows you to use keywords as variable names in assignment statements.

We saw that when several tokens are combined together they form a complete REXX instruction, called a clause. We saw that there are four types of clauses: labels, assignments, instructions and OS/2 commands. We saw that there are four things that can end a clause: the end of the line unless it ends with a comma, a semicolon, the keyword "THEN" when following an IF or WHEN instruction and a colon when it is the second token and therefore identifies a label.

We saw that sometimes it takes several clauses combined together to form a complete instruction. We called the complete instruction a statement. We saw that a statement is the basic unit of code in a REXX program. Unless a statement begins with an IF, DO or SELECT instruction, a statement and clause are identical. When a statement begins with an IF, DO or SELECT instruction, that statement can extend across multiple lines and thus include multiple clauses.

We saw that the entire REXX program must be contained in a single file, unlike languages such as C and C++ that allow you to break a large program down into logical components and store each component in a separate file. While a REXX program can call other REXX programs as subroutines or procedures, the interaction between the calling program and the called program is fairly limited.

We saw that a clause can cover more than one line when it is a comment and the comment covers multiple lines or when the statement ends in a comma.

We saw that characters fall into one of four categories: symbol characters, operating characters, punctuation characters and invalid characters.

We saw that when REXX tokenizes a statement, it performs the following steps: stripping off leading spaces, combining leading symbol characters into a symbol or number, combining leading operator characters into operators, treating each leading special character as a separate token, removing comments, treating text inside quotation marks as string literals and treating a few special cases on an ad hoc basis.

We saw that there are seven types of tokens that REXX recognizes: binary numbers, character string literals, hexadecimal numbers, numbers, operators, symbols and syntax symbols. We saw that REXX creates four different types of statements: labels, assignments, keyword instructions and commands for the operating system.

Finally, we saw that the first time a new or modified REXX program is run, REXX stores the tokenized version in the extended attributes so it can load and run the program faster the next time.

 

 

© 2002 by Ronny Richardson, All Rights Reserved