A complete and operational REXX program can consist of as few as two lines. For example:
/* Comment */
SAY "Hello World! I'm A New REXX Program"
OS/2 uses the same .CMD extension for REXX programs as it does for its native batch files so simply looking at the extension is not enough to tell OS/2 if the file contains a batch file or a REXX program. To overcome this, OS/2 requires that all REXX programs begin with the two characters "/*" without the quotation marks.
This is not as strange a requirement as it first sounds. REXX delimits the beginning of a comment with the "/*" symbols. Everything following those symbols is ignored as a comment by REXX until it encounters a set of "*/" symbols. Comments can span across lines. For example, the following would be a valid beginning for the simple program shown above:
/* NAME: HIWORLD.CMD
PURPOSE: Demonstrate Adding Comments To REXX Programs
VERSION: 1.00
DATE: October 25, 1993
COPYRIGHT: 1994 McGraw Hill */
SAY "Hello World! I'm A New REXX Program"
Even though it now spans five lines, everything between the "/*" and the "*/" symbols is treated as a comment by REXX and ignored. The SAY on the last line is a REXX reserved word. It works like the ECHO command in a batch file: it tells REXX to display information on the screen.
Two comments about the above example are in order. First, REXX programs are designed to run under multiple platforms with a minimum of translation between platforms. The requirement that REXX programs begin with a comment is a requirement of OS/2 and not REXX itself. If you receive a REXX program from another platform, it may not begin with a comment. In that case, you will have to add a comment to the top of the program. Second, the requirement to use the .CMD extension is also an OS/2 requirement. The naming conventions for other platforms may be different.
A REXX program consists of the following components:
The most basic unit of a REXX program is a token. In its simplest form, a token is a unit of a REXX program that it does not make sense to break down any further. For example, the line:
SAY "Hello World! I'm A New REXX Program"
contains two tokens, the SAY instruction and the character string it displays on the screen. This instruction could also be written as:
SAY"Hello World! I'm A New REXX Program"
with no space between the SAY instruction and the phrase so there are only two tokens.
This instruction can also be written as:
SAY Hello World! I Am A New REXX Program
without the quotation marks, and REXX will display the phrase in all uppercase. (The example spells out "I Am" because an apostrophe outside quotation marks would be read as the start of an unterminated string.) Now, the space between the SAY instruction and "Hello" is required since REXX would not know what "SAYHello" meant. However, any spaces on this line after the "Hello" are simply for human readability and are not required by REXX, so this line has three tokens: the SAY instruction, the space after the SAY instruction and the phrase "Hello World! I Am A New REXX Program".
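The difference between quoted and unquoted text can be seen in a short sketch. It assumes that no variables named Hello or World have been assigned a value; the variable names Quoted and Unquoted are made up for this illustration.

```rexx
/* Sketch: quoted text keeps its case, unquoted words do not */
Quoted   = "Hello World"     /* A literal string, stored exactly as typed */
Unquoted = Hello World       /* Two unassigned symbols, so REXX uses their uppercase names */
SAY Quoted                   /* Displays the mixed-case text */
SAY Unquoted                 /* Displays the uppercase text */
```

If Hello or World had been given a value earlier in the program, the unquoted line would display that value instead, which is one reason to prefer quotation marks.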
REXX has four different types of tokens: literal strings, operators, symbols and special characters.
A reserved word is a special token, a word REXX reserves to itself and does not allow you to use. For example, the SAY instruction in the example above is a reserved word. Reserved words are only reserved at the beginning of a clause, so the following would be a valid program line:
SAY Say Do I Know You
The second "Say" is not at the start of a clause so it is just part of the message this REXX program will display on the screen. Since the message is not in quotation marks, REXX will display it in all uppercase. This will be discussed in more detail later.
REXX is much more lenient than other programming languages. You might think that the instruction:
SAY = 123
would use the SAY reserved word to display "= 123" on the screen. However, REXX tries to anticipate what you are trying to do and "do the right thing," so it creates a variable called Say containing 123. This is, of course, very bad programming practice. If you really wanted to display "= 123" on the screen, you could use the instruction:
SAY "= 123"
and everything would work out as you wished.
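A minimal sketch of this behavior follows. The variable name Shown is made up for the illustration, and using SAY as a variable name is done here only to demonstrate the rule, not as a recommendation.

```rexx
/* Sketch: SAY used as a variable name (legal, but poor practice) */
SAY = 123            /* The second token is "=", so this is an assignment */
Shown = SAY          /* Copy the value of the variable named SAY */
SAY SAY              /* The first SAY is the instruction, the second is the variable */
SAY "= 123"          /* Quoted, so the text is displayed exactly as written */
```

The third clause works because a keyword is only recognized as an instruction when it is the first token of a clause and is not followed by an equal sign.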
A few tokens, like the EXIT instruction, make sense by themselves. However, most tokens must be combined with other tokens to form a complete instruction. When several tokens are combined together to form a complete REXX instruction, that instruction is called a Clause. An entire clause is executed if it is error-free. If a clause contains an error, none of that clause is executed.
In general, a clause consists of one or more tokens and zero or more spaces (which are ignored). A clause may begin with one or more spaces. If a clause begins with spaces, these spaces are ignored. For example, the instructions:
SAY "Hello";
SAY "Hello"
are identical. Clauses end with a semicolon (;) but REXX adds these automatically to the end of most lines so most OS/2 REXX programmers leave them out of their programs.
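As a small sketch of both points, the following lines place several clauses on one line by writing the semicolons explicitly and then show a clause that begins with spaces. The variable name Count is made up for the example.

```rexx
/* Sketch: explicit semicolons and leading spaces */
Count = 0
Count = Count + 1; Count = Count + 1; SAY "Count Is" Count
      SAY "Leading spaces are ignored"
```

The second line after the comment holds three clauses; written one per line, the semicolons could be left out.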
Under OS/2, clauses are limited to a maximum length of 500 characters. Cowlishaw does not specify a maximum length so this is implementation-specific. For example, the Personal REXX version from Quercus has a maximum length of 1,000.
When REXX encounters a clause, it scans it from left to right and performs a process called tokenization. It is this "tokenized" clause that REXX executes. We will discuss this in more detail later in the chapter.
Since comments are removed from a clause before it is executed, comments can be included anywhere in a statement. For example, the following program segment stores a value to a variable then performs a logic test on that variable:
X = 7
IF X=7 THEN SAY "X Is 7"
The code above would execute identically to this code:
X = 7
IF X=/* Comment */7 THEN SAY "X Is 7"
even though this code has a comment in the middle. They execute the same because tokenization removes the comment "/* Comment */" from the second code segment, leaving it the same as the first segment. Programmers often use this fact to add comments to the end of lines of code that might take a little explanation to understand. However, including comments inside code, as was done in the example above, is very bad programming form: it works, but it makes your programs very hard to follow. The code that follows is a much better approach to documenting specific lines:
X = 7 /* Assign A Value */
IF X=7 THEN SAY "X Is 7" /* Test That Value */
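There are four types of clauses: labels, assignments, instructions and OS/2 commands. In an assignment clause, the first token is a variable name and the second token is an equal sign. For example:
X = "Ronny"
Y = 100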
If the first token is not a legal variable name or is not a variable name at all, then while the clause is still an assignment clause, it contains an error. For example, the following are all invalid assignment clauses:
1 = "Ronny" /* 1 Is Not A Valid Variable Name */
"X" = 7 /* "X" Is A Literal, Not A Variable */
Since an assignment clause begins with a variable name and not a reserved word, the instruction:
SAY = 123
is valid since REXX treats SAY as a token containing a valid variable name and not a token containing an instruction.
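Any clause that is not a label, an assignment or an instruction is treated as a command and passed to OS/2. For example, the last line of:
Message = "Hello From A REXX Program"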
ECHO Message
would be passed to OS/2, but not before the variable Message had been expanded, so the command to OS/2 would be:
ECHO Hello From A REXX Program
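The expansion can be made visible with a short sketch. It assumes that the environment receiving the command has an ECHO command, as OS/2 does, and that no variable named ECHO has been assigned; the variable name Expanded is made up for the illustration.

```rexx
/* Sketch: what OS/2 receives from a command clause */
Message = "Hello From A REXX Program"
Expanded = ECHO Message              /* The same expression the command clause evaluates */
SAY "OS/2 will receive:" Expanded    /* Show the expanded text before it is sent */
ECHO Message                         /* Not a REXX instruction, so it is passed to OS/2 */
```

Because ECHO is an unassigned symbol, it evaluates to its own uppercase name, while Message is replaced by its value before the command is issued.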
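As we have seen, a clause is a series of tokens, usually ending at the end of a line. However, there are four different things that can end a clause: the end of a line (unless the line ends with a comma), a semicolon, the keyword THEN when it follows an IF or WHEN instruction, and a colon when it is the second token and therefore identifies a label.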
As complex as this sounds, it is generally true that the end of a line marks the end of a clause.
We defined a clause as several tokens combined together to form a complete REXX instruction. Another way of looking at this is that a clause is a complete line ending with an actual or implied semicolon. However, not all REXX instructions are complete when made up of just a single line. For example:
X = 7
IF X = 7 THEN
SAY "X Is 7"
ELSE
SAY "X Is Not 7"
consists of five lines and five clauses but not five logically independent actions. The first line is clearly an independent action: it assigns a value to a variable. That line would make sense in a program by itself. The two SAY lines would equally make sense in programs by themselves. However, the IF and ELSE lines are part and parcel of one overall logic test. In fact, the two SAY lines are equally part of this logic test. Like the IF instruction, the DO and SELECT instructions normally span more than one line of code.
A statement then is the basic unit of code in a REXX program. Unless a statement begins with an IF, DO or SELECT instruction, a statement and clause are identical. When a statement begins with an IF, DO or SELECT instruction, that statement can extend across multiple lines and thus include multiple clauses. Note that in this case a statement can contain other statements inside it. The statement:
IF X = 7 THEN
SAY "X Is 7"
ELSE
SAY "X Is Not 7"
contains two SAY statements (and clauses) embedded inside it.
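The same idea applies to SELECT. The sketch below is a single statement made of several clauses, with an assignment embedded in each branch; the variable name Size and the messages are made up for the illustration.

```rexx
/* Sketch: one SELECT statement containing several clauses */
X = 7
SELECT
   WHEN X < 7 THEN Size = "Less Than 7"
   WHEN X = 7 THEN Size = "Exactly 7"
   OTHERWISE Size = "More Than 7"
END
SAY "X Is" Size
```

Everything from SELECT through END forms one statement, even though each WHEN line and the OTHERWISE line carry their own embedded assignment.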
My terminology is somewhat different from what Cowlishaw used in The REXX Language, and so it is somewhat different from books based on that work. My terminology is much closer to what Daney used in Programming in REXX, but again not exactly the same. I mention this so that if you read either of these books (they are both excellent), you will be aware of the differences. Happily, except for explaining how REXX works, terminology is not a big deal. In fact, once you finish this chapter, this terminology will not be a concern for you at all.
The entire REXX program must be contained in a single file, unlike languages such as C and C++ that allow you to break a large program down into logical components and store each component in a separate file. That is not to say that one REXX program cannot call another REXX program or procedure stored in a separate file; it can. It just means that these two files are treated as separate programs, where one calls the other and (perhaps) passes some information to it and (perhaps) receives some information from it, rather than being two pieces of the same program.
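As we saw above, a clause normally ends at the end of the line. In addition to the end of the line, three other things will terminate a clause: a semicolon, the keyword THEN when it follows an IF or WHEN instruction, and a colon that identifies a label.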
Several conditions will cause a clause to span more than one line, so let's look at this issue in more depth.
A comment begins with the "/*" characters and ends with the "*/" characters, without regard to the end of lines. Thus, both of the following are valid comments:
/* This is a comment */
/*
This
is
also
a
comment
*/
Comments can also be nested, so the following is also a valid comment:
/* This Is A /*Comment*/ Inside A Comment */
Of course, when used in this context it makes little sense. However, suppose you had the following section of code:
X = X + 1 /* Increment Counter */
SAY "X Is" X /* Display Value */
and you wanted to temporarily "turn off" this section of code. You don't want to remove it since you may want to turn it back on later. The easiest way to do this is to "comment out" the code as follows:
/*
X = X + 1 /* Increment Counter */
SAY "X Is" X /* Display Value */
*/
Even though this results in a nested comment, it is an acceptable way to temporarily turn off the code. This is typically done during debugging. Once the code is operational, this section can be turned back on by removing the extra comment markers.
Nested comments must have matched pairs of beginning and ending comment markers. That is, the entire comment must have the same number of beginning and ending markers and, when counting from start to end, the number of ending comment markers must never exceed the number of beginning comment markers. The following comments are both invalid:
/* This Is */ An */ Invalid Comment /*
/* This Is Also /* An /* Invalid */ Comment */
In the first comment, the marker after "An" is the second ending comment marker, but only one beginning comment marker has appeared by that point, so the number of ending markers exceeds the number of beginning markers. The second comment has three beginning markers and only two ending markers.
Many REXX clauses can end up being very long. While the OS/2 Enhanced Editor has no problem working with very long lines, they can be hard to read and debug since you cannot see the entire line on the screen at once. REXX solves this problem by allowing you to end a line with a comma. When the line is tokenized, the comma and the line return are replaced with a single space, and so the two lines are recombined. Thus, the instruction:
SAY "Ronny",
"Richardson",
"Wrote This Book"
would display the text "Ronny Richardson Wrote This Book" all on one line even though the instruction is spread out over three lines.
Three notes about using the comma for line continuation are important. First, a string enclosed in quotation marks must be all on one line, so the instruction:
Message = "You Have Made,
A Mistake"
is invalid because it splits a string definition across two lines. Closing the quotation marks before the comma avoids the problem. The instruction:
Message = "You Have Made",
"Another Mistake"
is valid and, like the SAY example above, joins the two strings with a single space.
Second, as will be discussed in detail in Chapter 7, two strings can be combined using the string concatenation operator "||". So, the instruction:
Message = "You Have Made "||"A Mistake"
would be a valid instruction. Therefore, the instruction
Message = "You Have Made ",
||"A Mistake"
is also valid since you are splitting the concatenation operation and not the string definition operation.
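A short sketch shows how the two forms of continuation differ in their result. The variable names Spaced and Joined are made up for the illustration.

```rexx
/* Sketch: continuation with and without the || operator */
Spaced = "Ronny",
         "Richardson"        /* The comma and line end become one space */
Joined = "Ronny",
         ||"Richardson"      /* The || operator joins the strings with no space */
SAY Spaced
SAY Joined
```

In the first assignment, the space left behind by the continuation acts as a concatenation with one blank. In the second, the spaces around the "||" operator are ignored, so the strings are joined directly.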
Third, you must be careful when using a comma to split a line in the middle of a call for a function or procedure. Assume you had a function called LocateCursor that required a row and column number and then positioned the cursor at that screen position. When run on a single line, you might call this function with the instruction:
CALL LocateCursor 12,20
If you were to split this to two lines using the instruction:
CALL LocateCursor 12,
20
it would not work as intended because the single comma functions as the line continuation character, and it and the line feed are replaced by a space, leaving the single argument "12 20". To split the line in this fashion, you must have one comma for the line split character and a second comma to separate the function arguments, as shown below:
CALL LocateCursor 12,,
20
Of course, this is not good programming practice.
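The effect on the argument list can be checked with a small sketch. CountArgs is a hypothetical internal routine written only for this illustration (it stands in for LocateCursor and simply reports how many arguments it received); the variable names OneComma and TwoCommas are also made up.

```rexx
/* Sketch: how a trailing comma changes the argument count of a CALL */
CALL CountArgs 12,
               20
OneComma = RESULT       /* The comma was used up as a continuation */

CALL CountArgs 12,,
               20
TwoCommas = RESULT      /* One comma continues the line, one separates arguments */

SAY "One comma passes" OneComma "argument(s)"
SAY "Two commas pass" TwoCommas "argument(s)"
EXIT

CountArgs:              /* Hypothetical helper, not part of the original text */
   RETURN ARG()         /* ARG() with no arguments returns the argument count */
```

With a single comma, the routine receives one argument containing "12 20"; with two commas, it receives 12 and 20 as separate arguments.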
Before we discuss tokenization, we need to look at the way REXX classifies characters. Characters fall into one of four categories: symbol characters, operator characters, punctuation characters and invalid characters.
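When REXX tokenizes a statement, it performs the following steps: it strips off leading spaces, combines leading symbol characters into a symbol or number, combines leading operator characters into operators, treats each leading special character as a separate token, removes comments, treats text inside quotation marks as string literals and handles a few special cases on an ad hoc basis.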
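Based on these rules, there are seven types of tokens that REXX recognizes: binary numbers, character string literals, hexadecimal numbers, numbers, operators, symbols and syntax symbols.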
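Given all the above information, we are now ready to see how REXX pulls together tokens into clauses. Each clause becomes one of four types of statements: a label, an assignment, a keyword instruction or a command for the operating system.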
The first time you run a new or newly modified program, REXX recognizes that fact. After tokenizing the program, REXX stores the tokenized version in the OS/2 extended attributes. It does this without modifying the original ASCII version. Now, when you run this program again, the pre-tokenized version is loaded and run rather than the ASCII version. Once a REXX program has been executed once, this saves the time required for the tokenization process.
You can see this by creating a small REXX program and using the DIR command before and after running it. If you have a High Performance File System [HPFS], then the size of the extended attributes shows up automatically as the second size number. If you have a file allocation table [FAT] system, then you must use a /N after the DIR command to see the extended attributes. Figure 2-1 shows the DIR command for the program TYPING.CMD. The first DIR command is on an HPFS system. Here, you see the file itself is 3,929 bytes and the extended attributes are 6,979 bytes. The second DIR command is on a FAT system without the /N switch and it only shows the size of the file.
We saw that the most basic unit of a REXX program is a token. In its simplest form, a token is a unit of a REXX program that it does not make sense to break down any further. We saw that REXX has four different types of tokens: literal strings, operators, symbols and special characters.
We saw that a reserved word (also called a keyword) is a special token, a word REXX reserves to itself and does not allow you to use. We saw that this restriction only applies to the first token in a clause. We also saw that REXX allows you to use keywords as variable names in assignment statements.
We saw that when several tokens are combined together they form a complete REXX instruction, called a clause. We saw that there are four types of clauses: labels, assignments, instructions and OS/2 commands. We saw that there are four things that can end a clause: the end of the line unless it ends with a comma, a semicolon, the keyword "THEN" when following an IF or WHEN instruction and a colon when it's the second token and therefore identifies a label.
We saw that sometimes it takes several clauses combined together to form a complete instruction. We called the complete instruction a statement. We saw that a statement is the basic unit of code in a REXX program. Unless a statement begins with an IF, DO or SELECT instruction, a statement and clause are identical. When a statement begins with an IF, DO or SELECT instruction, that statement can extend across multiple lines and thus include multiple clauses.
We saw that the entire REXX program must be contained in a single file, unlike languages such as C and C++ that allow you to break a large program down into logical components and store each component in a separate file. While a REXX program can call other REXX programs as subroutines or procedures, the interaction between the calling and called programs is fairly limited.
We saw that a clause can cover more than one line when it is a comment and the comment covers multiple lines or when the statement ends in a comma.
We saw that characters fall into one of four categories: symbol characters, operator characters, punctuation characters and invalid characters.
We saw that when REXX tokenizes a statement, it performs the following steps: stripping off leading spaces, combining leading symbol characters into a symbol or number, combining leading operator characters into operators, treating each leading special character as a separate token, removing comments, treating text inside quotation marks as string literals and treating a few special cases on an ad hoc basis.
We saw that there are seven types of tokens that REXX recognizes: binary numbers, character string literals, hexadecimal numbers, numbers, operators, symbols and syntax symbols. We saw that REXX creates four different types of statements: labels, assignments, keyword instructions and commands for the operating system.
Finally, we saw that the first time a new or modified REXX program is run, REXX stores the tokenized version in the extended attributes so it can load and run the program faster the next time.
© 2002 by Ronny Richardson, All Rights Reserved