SWAGOLX.EXE (c) 1993 GDSOFT ALL RIGHTS RESERVED
00008 PARSING/TOKENIZING ROUTINES

1  05-28-93  13:54  ALL  SWAG SUPPORT TEAM  PARSENUM.PAS  IMPORT  55

Type
  RW_toKEN = Record
    token_str :String[9];
    token_cod :toKEN_CODE;
  end;

  RW_Type = Array[0..9] of RW_toKEN;
  RWT_PTR = ^RW_Type;

Const
  NULL = '';

  Rw_2 :RW_Type = ((token_str : 'do'; token_cod : tdo),
                   (token_str : 'if'; token_cod : tif),
                   (token_str : 'in'; token_cod : tin),
                   (token_str : 'of'; token_cod : tof),
                   (token_str : 'or'; token_cod : tor),
                   (token_str : 'to'; token_cod : tto),
                   (token_str : NULL; token_cod : NO_toKEN),
                   (token_str : NULL; token_cod : NO_toKEN),
                   (token_str : NULL; token_cod : NO_toKEN),
                   (token_str : NULL; token_cod : NO_toKEN)
                  );

  ...the difference being the explicit declaration of the constant
  record fields. (I'm used to array constants, not record
  constants - I was unaware of the requirement.)

  PARSING NUMBERS

  Now we'll concentrate on parsing integer and real numbers.

  The Pascal definition of a number begins with an UNSIGNED
  integer. An unsigned integer consists of one or more consecutive
  DIGITS. The simplest form of a number token is an unsigned
  integer:

    1 9 120 12654

  A number token can also be an unsigned integer (the whole part)
  followed by a fraction part. A fraction part consists of a
  decimal point followed by an unsigned integer, such as:

    123.45 0.9987564

  These numbers have whole parts 123 and 0 respectively, and
  fraction parts .45 and .9987564 respectively.

  A number token can also be a whole part followed by an EXPONENT
  part. An exponent part consists of an "E" (or "e") followed by
  an unsigned integer.
  An optional exponent sign, + or -, can
  appear between the letter and the first exponent digit.
  Examples:

    134e2 2E99 123e-45 73623E+4

  Finally, a number token can be a whole part followed by a
  fraction part and an exponent part, in that order:

    2.3498E7 0.00034e-66

  I arbitrarily limit the number of digits to 20 and the exponent
  value to the range -37 to +37; the exact limits needed depend on
  how real values are represented on the computer.

  The "get_number" function is likely to be the biggest function
  in your scanner, but it should be relatively straightforward to
  code, in light of what has already been done with the scanner/
  tokenizer module and the definition of a number.

  EXERCISE #1

  Write the get_number function to parse integer and real
  numbers.

  You will need to add the following types and variables to your
  global data segment:

  Type { add "Real"s to list... }

    LITERAL_Type = (Integer_LIT, Real_LIT, String_LIT);

    LITERAL_REC = Record
      Case lType:LITERAL_Type of
        Integer_LIT: (ivalue :Integer);
        Real_LIT   : (rvalue :Real   );
        String_LIT : (svalue :String );
    end;

  Var

    digit_count :Word;
    count_error :Boolean;

-------------- PART 2 ---------------------------------------

  The rest of this post will cover two simple topics - parsing
  strings inside quotes, and parsing comments.

  PARSING COMMENTS {}

  The compiler should ignore the input between two curly braces
  ({}), and the curly braces themselves. My scanner is written so
  the entire comment is replaced by a single blank (" "), although
  you could write the scanner so that comments are
  _totally_ ignored.

  EXERCISE #2:

  Integrate COMMENT detection into the get_Char routine, so that
  your character-fetching routine ignores comments and
  passes a blank when a comment is encountered, skipping the comment
  entirely for the next fetch.

  Make sure that the routine keeps reading until the right curly
  brace is detected, even past the end-of-line.
  If the end-of-file
  is encountered before the right curly brace is found, an
  "unexpected end" error should be generated.

  PARSING STRINGS (QUOTES) ''

  The quote character delimits strings; any character between the
  quotes is ignored by the compiler, except to be stored as a string
  LITERAL. If you wish a ' (quote) to be included in the literal,
  an extra ' must precede it.

  One possible tricky area is the {} (comment) character. You must
  be careful not to inadvertently trigger the comment routine within
  the quote routine while reading a string, otherwise you will
  have a BUG.

  EXERCISE #3:

  Add a quote routine to the get_token routine within your module,
  to fetch strings as a LITERAL IDENTIFIER when the QUOTE
  character is detected.

  The following mods to your types are required:

Const
  Eof_Char = #$7F;

Type
  Char_CODE = (LETTER, DIGIT, QUOTE, SPECIAL, Eof_CODE);

  { The following code initializes the character mapping table: }

Var
  ch :Byte;
begin
  For ch := 0 to 255 do
    Char_table[ch] := SPECIAL;
  For ch := ord('0') to ord('9') do
    Char_table[ch] := DIGIT;

  For ch := ord('A') to ord('Z') do
    Char_table[ch] := LETTER;
  For ch := ord('a') to ord('z') do
    Char_table[ch] := LETTER;

  Char_table[ord(Eof_Char)] := Eof_CODE;

  Char_table[39] := QUOTE;   { 39 = ord('''') }
end;

  ----------------------------------------------------------------

  PLEASE, please let me know what you think about these posts,
  even if your comments are negative - I want to have some feedback on the
  difficulties, and whether or not people are having trouble
  following the material - I _can_ be more thorough at the cost of
  being more verbose - if it's needed!

  If you are having problems with your source code, and want me to
  do a detailed examination of your code, especially if it's
  written in a language other than Pascal, send me email via the
  Internet - to avoid "carpet bombing" the conference with
  undesired material.


  NEXT POST:

  Error codes, and putting your code to the test - our first
  utility (other than
  the lister): a source program Compactor
  (not cruncher).

  FUTURE POSTS:

  - Review and (hopefully) a status report from "students"
  - Symbol table
  - YA utility (cross-referencer)
  - YA utility (source program CRUNCHer)
  - YA utility (source program UNcruncher)
  - Parsing simple expressions
  - Utility: CALC, using infix-to-postfix conversions and stack
    ops.
  - Parsing statements
  - Utility: Pascal syntax checker part I
  - Parsing declarations (Var, Type, etc)
    incl's: much improved (and much more complex) symbol table
  - Utility: Declarations analyzer.
  - Syntax checker part II
  - Parsing Program, Procedure, and Function declarations
    (routines).
  - Syntax checker part III

  - Review and discussion?

2  05-28-93  13:54  ALL  SWAG SUPPORT TEAM  PARSEWRD.PAS  IMPORT  33

Program PARSER;

{The object of this program is to accept a sentence from the user, then to break the
 sentence into its component words and to display each word on a separate line.
}

Uses Crt; {Required by Turbo Pascal}

Const
  maxWord = 15;
  maxsentence = 15;
  space = CHR(32);
  first = 1;

Type
  Strng = Array[1..maxWord] of Char;
  Word = Record
    body : Strng;
    length : Integer
  end;

Var
  sentence : Array[1..maxsentence] of Word;
  row, col, nextcol, count : Integer;
  demarker : Boolean;
  ans : Char;

Procedure SpaceTrap;
{ Ensures that there is only 1 space between words }
begin
  Repeat
    READ(sentence[row].body[first])
  Until sentence[row].body[first] <> space
end;

Procedure StringWrite(Var phrase : Word);
{Writes only the required length of each character string.
This is required when using 32 col.
mode.}
Var
  letter : Integer;
begin
  For letter := first to phrase.length do
    Write(phrase.body[letter])
end; {Procedure StringWrite}

Procedure StringRead;
Var I : Integer;
begin
  {
  Initialize the variables
  }
  count := 1;
  row := first;
  col := first;
  nextcol := col + 1;
  demarker := False;
  For I := first to maxsentence do
    sentence[I].length := 1;
  Write('Type a sentence > ');
  {READLN;} {Clears the buffer of EOLN}
            {Required by HiSoft Pascal}
  While (not EOLN) and (row < maxsentence) do
  begin
    READ(sentence[row].body[col]);
    if sentence[row].body[first] = space then SpaceTrap;
    if sentence[row].body[col] = space then
      demarker := True;
    if (not demarker) and (nextcol < maxWord) then
    begin
      col := col + 1;
      nextcol := nextcol + 1
    end
    else
    begin
      sentence[row].length := col;
      count := count + 1;
      row := row + 1;
      col := first;
      nextcol := col + 1;
      demarker := False
    end; {if...then...else}
    if EOLN then sentence[row].length := col - 1
    {Accounts for the last word entered less the EOLN marker.}
  end {While loop}
end; {Procedure StringRead}

Procedure PrintItOut;
Var
  subsequent : Integer;
begin
  subsequent := first + 1;
  Write('Parsing > ');
  StringWrite(sentence[first]);
  WriteLN;
  if count >= subsequent then
  begin
    For row := subsequent to count do
    begin
      Write(' ');
      StringWrite(sentence[row]);
      WriteLN
    end
  end
end; {Procedure PrintItOut}

Procedure SongandDance;
begin
  {PAGE;} {HiSoft Pascal = Turbo Pascal ClrScr}
  ClrScr;
  WriteLN(' Parser');
  WriteLN;
  WriteLN(' Program By David Solly');
  WriteLN;
  WriteLN(' The Object of this Program');
  WriteLN('is to accept a sentence from');
  WriteLN('the user then to break the');
  WriteLN('sentence down into its');
  WriteLN('Component Words and to display');
  WriteLN('each Word on a separate line.');
  WriteLN;
  WriteLN;
end; {Procedure SongandDance}

begin {Main Program}
  SongandDance;
  StringRead;
  WriteLN;
  PrintItOut;
  WriteLN;
  WriteLN('end of Demonstration.');
  READLN(ans);
end.
{Main Program}

3  08-17-93  08:50  ALL  RYAN THOMPSON  Command Line Parsing  IMPORT  37

===========================================================================
 BBS: Canada Remote Systems
Date: 08-10-93 (01:00)             Number: 33744
From: RYAN THOMPSON                Refer#: NONE
  To: TERRY GRANT @ 912/701        Recvd: NO
Subj: RE: COMMAND LINE PARSING     Conf: (1221) F-PASCAL
---------------------------------------------------------------------------
>>> Quoting message from Terry Grant @ 912/701 to All
>>> Original sent 07 Aug 93 20:36:00 about Command Line Parsing

TG> Hello All!
TG>
TG> After working on this for awhile, I thought maybe someone else could help
TG> me out a little here. All I need this to do is Parse the command line for
TG> seven parameters,
TG>
TG> The BaudRate (/B),
TG> :
TG> and Overlay Size (/O).
TG>
TG> My Main problem here is, it will SEE the command line, But WILL NOT allow
TG> me to use anything AFTER the Switch ? Like /B2400 !

  Sure thing! I once wrote a unit which among other things has some neat
parsing for the command line. Here's a snippet:

{- Top -}

  Function SwitchNum(S : String) : Integer;
    { If a switch character specified exists, return which position }
    { it is in on the command line. Used internally. }
    { UpString (uppercase a string) is not shown in this snippet. }
  Var
    Temp : String;
    X,
    Y : Integer;
  Begin
    Temp:= '';
    X:= ParamCount;
    Y:= 0;
    while (X > 0) and (Y = 0) do begin
      Temp:= ParamStr(X);
      if (Temp[1] = '/') or (Temp[1] = '-') then
        if UpCase(Temp[2]) = UpString(S) then Y:= X;
      Dec(X);
    end;
    SwitchNum:= Y;
  End;


  Function SwitchThere(S : String) : Boolean;
    { Returns TRUE if a switch of the character specified exists. }
  Begin
    If SwitchNum(S) = 0 then SwitchThere:= False
    else SwitchThere:= True;
  End;


  Function SwitchData(S : String) : String;
    { Return the data following a switch: /B2400 returns 2400.
    }
  Var
    Temp : String;
  Begin
    If SwitchNum(S) > 0 then begin
      Temp:= ParamStr(SwitchNum(S));
      Delete(Temp, 1, 2);
    end
    else Temp:= '';
    SwitchData:= Temp;
  End;


  Function Parameter(N : Byte) : String;
    { Returns the Nth command line parameter. Parameters in quotes }
    { are returned with the spaces in between: /D Test "One Two" }
    { Returns >Test< for Parameter(1) and >"One Two< for Parameter(2) }
    { This allows you to, if you like, see what type of quote was }
    { used, for perhaps literal vs. translate to ALL CAPS. }
  Var
    X,
    Count : Byte;
    Parm,
    Temp : String;
  Begin
    X:= 0;
    Count:= 0;
    Parm:= '';
    If ParamCount > 0 then repeat
      Inc(X);
      Temp:= ParamStr(X);
      If (Temp[1] = '"') or (Temp[1] = '''') then begin
        Parm:= Temp;
        If X < ParamCount then repeat
          Inc(X);
          Parm:= Parm + ' ' + ParamStr(X);
        until (Parm[Length(Parm)] = '"') or
              (Parm[Length(Parm)] = '''') or (X = ParamCount);
        Inc(Count);
      end
      else if (Temp[1] <> '/') and (Temp[1] <> '-')
      then begin
        Inc(Count);
        Parm:= Temp;
      end;
    until (X = ParamCount) or (Count = N);
    If Count = N then Parameter:= Parm
    else Parameter:= '';
  End;


  Function Parameters : Byte;
    { Return the number of non-switch parameters on the command line. }
  Var
    X : Byte;
  Begin
    X:= 0;
    If ParamCount > 0 then begin
      Repeat
        Inc(X)
      Until Parameter(X) = '';
      Parameters:= X - 1;
    end
    else Parameters:= 0;
  End;

{- Fin -}

  A few examples:

    If SwitchThere('?') then DisplayHelp;
    If SwitchThere('B') then BaudString:= SwitchData('B');
    If Parameters < 1 then begin WriteLn('Too few parms'); Halt; end;
    For X:= 1 to Parameters do
    begin
      Param[X]:= Parameter(X);
    end;

  Sample command lines:

    TESTPROG /D /F TEST /B2400 "This is a test" /M-

    Parameters returns 2,
    Parameter(1) returns TEST
    Parameter(2) returns "This is a test
    SwitchThere('L') returns False
    SwitchData('M') returns -
    SwitchData('G') returns null.

  I hope this helps you out!
It could be optimized a lot by simply reading
all of the parameters into an array in your initialization code, to eliminate
all of the redundant parsing, but I don't think that parsing time for a few
hundred characters at most is a limiting factor of any sort. ;-)

bye
Ryan

--- Renegade v07-17 Beta

4  09-26-93  09:12  ALL  MARTIN RICHARDSON  Check for CmdLine switch  IMPORT  7

{*****************************************************************************
 * Function ...... IsSwitch()
 * Purpose ....... To test for the presence of a switch on the command line
 * Parameters .... sSwitch    Switch to scan the command line for
 * Returns ....... TRUE if the switch was found
 * Notes ......... Uses functions Command and UpperCase
 * Author ........ Martin Richardson
 * Date .......... September 28, 1992
 *****************************************************************************}
FUNCTION IsSwitch( sSwitch: STRING ): BOOLEAN;
BEGIN
  IsSwitch := (POS( '/'+sSwitch, UpperCase(Command) ) <> 0) OR
              (POS( '-'+sSwitch, UpperCase(Command) ) <> 0);
END;

5  09-26-93  09:22  ALL  MARTIN RICHARDSON  Parse out tokens  IMPORT  16

{*****************************************************************************
 * Function ...... ParseCount()
 * Purpose ....... To count the number of tokens in a string
 * Parameters .... cString    String to count tokens in
 *                 cChar      Token separator
 * Returns ....... Number of tokens in <cString>
 * Notes ......... Uses function StripChar
 * Author ........ Martin Richardson
 * Date .......... September 30, 1992
 *****************************************************************************}
FUNCTION ParseCount( cString: STRING; cChar: CHAR ): INTEGER;
BEGIN
  ParseCount := LENGTH(cString) - LENGTH(StripChar(cString, cChar)) + 1;
END;

{*****************************************************************************
 * Function ...... Parse()
 * Purpose ....... To parse out tokens from a string
 * Parameters ....
 *                 cString    String to parse
 *                 nIndex     Token number to return
 *                 cChar      Token separator
 * Returns ....... Token <nIndex> extracted from <cString>
 * Notes ......... If <nIndex> is greater than the number of tokens in
 *                 <cString> then a null string is returned.
 *                 Uses functions Left, Right, and ParseCount
 * Author ........ Martin Richardson
 * Date .......... September 30, 1992
 *****************************************************************************}
FUNCTION Parse( cString: STRING; nIndex: INTEGER; cChar: CHAR ): STRING;
VAR
  i: INTEGER;
  cResult: STRING;
BEGIN
  IF nIndex > ParseCount( cString, cChar ) THEN
    cResult := ''
  ELSE BEGIN
    cString := cString + cChar;
    FOR i := 1 TO nIndex DO BEGIN
      cResult := Left( cString, POS( cChar, cString ) - 1 );
      cString := Right(cString, LENGTH(cString) - POS(cChar, cString));
    END { Next I };
  END { IF };
  Parse := cResult;
END;

6  10-28-93  11:35  ALL  RYAN THOMPSON  Command Line Parsing  IMPORT  31

{===========================================================================
 BBS: Canada Remote Systems
From: RYAN THOMPSON
Subj: RE: COMMAND LINE PARSING

>>> Quoting from Chet Kress to Frans Van Duinen about Command Line Parsing

CK> FVD>I want to pass to my BP 7 program a few parameters, one of which
CK> FVD>has embedded (or even trailing) blanks. The naive approach of
CK> FVD>PROCFAX PROCFAX.CFG \PCB\MAIN\MSGS58 "FAX MAIL" does not work.
CK> FVD>Currently I pick up FAX and MAIL as two parameters and
CK> FVD>string, but I want to allow multiple embedded/trailing blanks.

  Here's a set of routines to do just what you want.

  Parameters      Returns the number of parameters on the command line.
                  Does not include switches.
  Parameter(n)    Returns the nth parameter, ignoring switches and passing
                  strings in quotes as " or ' followed by the entire string
                  including any embedded spaces.
  SwitchThere(x)  Returns True if the switch specified by the character
                  passed is present on the command line.
  SwitchData(x)   Returns the data following the switch character if the
                  switch character specified is present on the command line.
  SwitchNum(x)    Returns the position on the command line of the switch
                  specified. Skips parameters. }


  Function SwitchNum(S : String) : Integer;
  Var
    Temp : String;
    X,
    Y : Integer;
  Begin
    Temp:= '';
    X:= ParamCount;
    Y:= 0;
    while (X > 0) and (Y = 0) do begin
      Temp:= ParamStr(X);
      if (Temp[1] = '/') or (Temp[1] = '-') then
        if UpCase(Temp[2]) = UpString(S) then Y:= X;
      Dec(X);
    end;
    SwitchNum:= Y;
  End;


  Function SwitchThere(S : String) : Boolean;
  Begin
    SwitchThere:= not (SwitchNum(S) = 0);
  End;


  Function SwitchData(S : String) : String;
  Var
    Temp : String;
  Begin
    If SwitchNum(S) > 0 then begin
      Temp:= ParamStr(SwitchNum(S));
      Delete(Temp, 1, 2);
    end
    else Temp:= '';
    SwitchData:= Temp;
  End;


  Function Parameter(N : Byte) : String;
  Var
    X,
    Count : Byte;
    Parm,
    Temp : String;
  Begin
    X:= 0;
    Count:= 0;
    Parm:= '';
    If ParamCount > 0 then repeat
      Inc(X);
      Temp:= ParamStr(X);
      If (Temp[1] = '"') or (Temp[1] = '''') then begin
        Parm:= Temp;
        If X < ParamCount then repeat
          Inc(X);
          Parm:= Parm + ' ' + ParamStr(X);
        until (Parm[Length(Parm)] = '"') or
              (Parm[Length(Parm)] = '''') or (X = ParamCount);
        Inc(Count);
      end
      else if (Temp[1] <> '/') and (Temp[1] <> '-')
      then begin
        Inc(Count);
        Parm:= Temp;
      end;
    until (X = ParamCount) or (Count = N);
    If Count = N then Parameter:= Parm
    else Parameter:= '';
  End;


  Function Parameters : Byte;
  Var
    X : Byte;
  Begin
    X:= 0;
    If ParamCount > 0 then begin
      Repeat
        Inc(X)
      Until Parameter(X) = '';
      Parameters:= X - 1;
    end
    else Parameters:= 0;
  End;

{
  For example, the command
  line:

TESTPRG /C INPUT.DAT /X67 "first one"

  Parameters returns 2
  Parameter(1) returns INPUT.DAT
  Parameter(2) returns "first one
  SwitchThere('F') returns false
  SwitchData('X') returns 67

  Notice that in quoted parameters, the first quote is returned - this allows
you to check for " vs. ', which you could use as the difference between case
sensitive and non-case-sensitive. A simple Delete(S,1,1) can remove it from
the string for use. }

7  08-24-94  13:49  ALL  FRANK DIACHEYSN  Command Parameters  SWAG9408  7

{
  Coded By Frank Diacheysn Of Gemini Software

  FUNCTION PARAMETERS

  Input......: None
             :
             :
             :
             :

  Output.....: Command Line Used To Execute The Current Program
             :
             :
             :
             :

  Example....: IF POS('/F',PARAMETERS) > 0 THEN
             :   WriteLn('/Full Option Enabled.')
             : ELSE
             :   WriteLn('/Full Option Disabled.');
             :

  Description: Function To Return The Entire Command Line That Was Used To
             : Execute The Current Program
             :
             :
             :

}
FUNCTION PARAMETERS : STRING;
BEGIN
  PARAMETERS := STRING( PTR( PREFIXSEG, $0080 )^ );
END;

8  08-24-94  13:59  ALL  MARK OUELLET  Parsing A String  SWAG9408  26

{
RN> I have a routine in one of my programs that reads a delimited string
RN> from a configuration file, the string is defined such as: ~040~055~099~144
RN> etc. (these are message base area numbers)

RN> In the program a check is done to see if the current area exists or does
RN> not exist in the list via a simple Pos() function.

RN> Works great! but.......

RN> I have been asked to include the capability to include a RANGE of numbers in
RN> this list, this being due to the 255 char limit of a normal string.


RN> So lets assume the list above will look like this:

RN> ~040~055~060-080~099~144

RN> How can I pull out the 060-080 and include all numbers between into the
RN> list or actually, do a check, possibly creating a Set?

RN> OR would I have to create another function/configuration item to do this?

RN> I hope my explanation of what I wish to accomplish can be understood.
RN> <g>

RN> All replies are very welcomed!!

Try this, the code is ugly but it works!
{Written, Tested and Compiled with BP 7.x}

uses crt;

type
  Str3 = string[3];
var
  Area, RangeLo, RangeHi : str3;
  List : String;

function Found(List:string;Area:str3):boolean;
begin
  if Pos(Area, List)>0 then begin
    Found := true;
  end else begin
    {
    Area not found yet, are there ranges??
    }
    if Pos('-', List)>0 then begin
      {
      Yes! Process ranges
      }
      while Pos('-', List) > 0 do begin
        RangeLo := Copy(List, Pos('-', List)-3, 3);
        {
        Area must be BETWEEN Lo and Hi, otherwise it would have
        been found by the first POS check. So if RangeLo is > Area
        there is no need to lose time extracting RangeHi
        }
        if RangeLo<Area then begin
          RangeHi := Copy(List, Pos('-', List)+1, 3);
          if RangeHi > Area then begin
            {
            Lo < Area < Hi, we found a match
            }
            Found := true;
            {
            Kill list to exit while-loop
            }
            List := '';
          end else begin
            {
            Kill this range's DASH, POS only reports the first match
            }
            Delete(List, Pos('-', List), 1);
          end;
        end else begin
          {
          Kill this range's DASH, POS only reports the first match
          }
          Delete(List, Pos('-', List), 1);
        end;
      end;
      {
      Only two possibilities when we get here
      1- List = '' which means a match was found and list was
         cleared to exit the while-loop.
      2- No match was found, in which case List is non-empty.
      }
      if List<>'' then
        Found := false;
    end else begin
      Found := false;
    end;
  end;
end;

var
  X : byte;

begin
  List := '~012~020~033~060-079~081~090~095-123~';
  clrscr;
  for X := 0 to 255 do begin
    Area := chr(48 + (X div 100)) +
            chr(48 + ((X mod 100) div 10)) +
            chr(48 + ((X mod 10)));
    writeln(Area, ' ', List, ' ', Found(List, Area));
    if (not boolean(x mod 24)) and (x>0) then begin
      while not keypressed do;
      while keypressed do readkey;
      clrscr;
    end;
  end;
end.

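RN's message also asks whether the list could be turned into a Set. A minimal sketch of that alternative follows, assuming area numbers stay within 0..255 (the range a Pascal set of Byte can hold). The names AreaSet, StrToArea and ParseAreaList are hypothetical and introduced only for this illustration; they are not part of Mark Ouellet's program.

```pascal
program AreaSetDemo;
{ Sketch only: AreaSet, StrToArea and ParseAreaList are hypothetical
  names. Assumes every area number fits in 0..255. }

type
  AreaSet = set of Byte;

{ Convert a run of digits to an area number; False on bad input. }
function StrToArea(const S: string; var N: Byte): Boolean;
var
  V, Code: Integer;
begin
  Val(S, V, Code);
  if (Code = 0) and (V >= 0) and (V <= 255) then
  begin
    N := V;
    StrToArea := True;
  end
  else
    StrToArea := False;
end;

{ Build a set from a list such as '~040~055~060-080~099~144'. }
procedure ParseAreaList(List: string; var Areas: AreaSet);
var
  Item: string;
  P, Dash: Integer;
  Lo, Hi: Byte;
begin
  Areas := [];
  while List <> '' do
  begin
    { Cut the next '~'-delimited item off the front of the list. }
    P := Pos('~', List);
    if P = 0 then P := Length(List) + 1;
    Item := Copy(List, 1, P - 1);
    Delete(List, 1, P);
    if Item <> '' then
    begin
      Dash := Pos('-', Item);
      if Dash = 0 then
      begin
        { Single area number. }
        if StrToArea(Item, Lo) then
          Areas := Areas + [Lo];
      end
      else if StrToArea(Copy(Item, 1, Dash - 1), Lo) and
              StrToArea(Copy(Item, Dash + 1, Length(Item)), Hi) then
        { Range: both end points are included. }
        Areas := Areas + [Lo..Hi];
    end;
  end;
end;

var
  Areas: AreaSet;

begin
  ParseAreaList('~040~055~060-080~099~144', Areas);
  WriteLn(40 in Areas);   { listed explicitly }
  WriteLn(70 in Areas);   { inside the range 060-080 }
  WriteLn(81 in Areas);   { not in the list }
end.
```

Once the set is built, each lookup is a single `in` test, so the list string is parsed once instead of on every check. The trade-off is the 0..255 ceiling on set elements, which matches the three-digit area numbers used in the example but would not cover larger values.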