Scanner/Tokenizer
The scanner (tokenizer) constructs the stream of tokens that is fed to the parser. Preprocessing is done at this stage: every compiler directive that is read changes the internal state variables of the compiler, and every illegal character found in the input stream causes an error.
For details on how macros work, see: Macro internals
Architecture
The general architecture of the scanner is shown in the following figure:
http://www.pjh2.de/fpc/CompilerInternalsFigure02.png
Several kinds of input can be read from the input stream: a string, handled by readstring; a numeric value, handled by readnumeric; comments; and compiler and preprocessor directives.
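The dispatch described above can be pictured with a minimal sketch that classifies a lexeme by its leading characters. This is not the code from scanner.pas: the type TLexemeKind, the function Classify and the exact set of prefixes are assumptions made for illustration only.

```pascal
program ScanDispatch;
{$mode objfpc}

type
  { Hypothetical classification; the real scanner works on tokens. }
  TLexemeKind = (lkString, lkNumber, lkComment, lkDirective, lkOther);

{ Decides which reader would handle a lexeme, based on how it starts. }
function Classify(const S: string): TLexemeKind;
begin
  Result := lkOther;
  if S = '' then
    Exit;
  case S[1] of
    '''':
      Result := lkString;      { would go to a routine such as readstring }
    '0'..'9', '$', '%':
      Result := lkNumber;      { would go to a routine such as readnumeric }
    '{':
      begin
        { assumption: a brace followed by a dollar sign starts a directive }
        if (Length(S) > 1) and (S[2] = '$') then
          Result := lkDirective
        else
          Result := lkComment;
      end;
  end;
end;

begin
  WriteLn(Classify('''abc''') = lkString);
  WriteLn(Classify('$FF') = lkNumber);
  WriteLn(Classify('{$IFDEF X}') = lkDirective);
  WriteLn(Classify('{ note }') = lkComment);
end.
```

Each line of the demo prints TRUE when the lexeme lands in the expected category.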
Input stream
(last updated for fpc version 1.0.x)
Input data is handled in the same way as all other I/O in the compiler: through a hook (do_openinputfile) which can be overridden in comphook.pas when another I/O method is needed.
The default hook uses a non-buffered DOS stream contained in files.pas.
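The hook mechanism can be sketched with a procedural variable that is swapped at run time. The signature below is hypothetical and much simpler than whatever do_openinputfile really uses in comphook.pas; TOpenInputFunc, DefaultOpenInput and MemoryOpenInput are names invented for this example.

```pascal
program InputHook;
{$mode objfpc}

type
  { Hypothetical hook signature, not the one declared in comphook.pas. }
  TOpenInputFunc = function(const FileName: string): string;

{ Stand-in for the default, disk based implementation. }
function DefaultOpenInput(const FileName: string): string;
begin
  Result := 'disk:' + FileName;
end;

{ Stand-in for a replacement I/O method, for example an in-memory source. }
function MemoryOpenInput(const FileName: string): string;
begin
  Result := 'memory:' + FileName;
end;

var
  OpenInputHook: TOpenInputFunc;

begin
  { The compiler would install its default hook at start-up. }
  OpenInputHook := @DefaultOpenInput;
  WriteLn(OpenInputHook('test.pas'));

  { A host application overrides the hook to change the I/O method. }
  OpenInputHook := @MemoryOpenInput;
  WriteLn(OpenInputHook('test.pas'));
end.
```

The caller never changes; only the routine stored in the hook variable does, which is what makes the I/O method replaceable.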
Preprocessor
(last updated for fpc version 1.0.x)
The scanner resolves all preprocessor directives and passes only the visible parts of the code to the parser (such as those selected by conditional compilation). Compiler switches and directives are also saved in global variables by the preprocessor, so this part is completely independent of the parser.
Conditional compilation (scandir.inc, scanner.pas)
(last updated for fpc version 1.0.x)
Conditional compilation is handled via a preprocessor stack: each directive is pushed onto the stack, and popped when it is resolved. The stack is implemented as a linked list of preprocessor directive items.
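A linked-list stack of this kind could look roughly like the sketch below. The record layout and the names TPreprocItem, Push, Pop and Visible are assumptions for illustration; the real item type in scanner.pas carries more information.

```pascal
program PreprocStack;
{$mode objfpc}

type
  PPreprocItem = ^TPreprocItem;
  { Hypothetical stack item, one per open conditional directive. }
  TPreprocItem = record
    Name: string;        { directive name, for example IFDEF }
    Accept: Boolean;     { True when the enclosed code is visible }
    Next: PPreprocItem;  { item below this one on the stack }
  end;

var
  Top: PPreprocItem;

{ Code is visible when no conditional is open or the innermost one accepts. }
function Visible: Boolean;
begin
  Result := (Top = nil) or Top^.Accept;
end;

{ Pushes a directive; a nested block is only visible if its parent is. }
procedure Push(const AName: string; AAccept: Boolean);
var
  Item: PPreprocItem;
begin
  New(Item);
  Item^.Name := AName;
  Item^.Accept := AAccept and Visible;
  Item^.Next := Top;
  Top := Item;
end;

{ Pops the innermost directive when its closing directive is read. }
procedure Pop;
var
  Item: PPreprocItem;
begin
  if Top = nil then
  begin
    WriteLn('error: closing directive without an opening one');
    Exit;
  end;
  Item := Top;
  Top := Item^.Next;
  Dispose(Item);
end;

begin
  Top := nil;
  WriteLn(Visible);        { nothing open }
  Push('IFDEF', False);    { condition is false }
  WriteLn(Visible);
  Push('IFDEF', True);     { nested inside a hidden block }
  WriteLn(Visible);
  Pop;
  Pop;
  WriteLn(Visible);        { stack is empty again }
end.
```

The demo prints TRUE, FALSE, FALSE, TRUE: the nested block stays hidden because its parent is hidden.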
Compiler switches (scandir.inc, switches.pas)
(last updated for fpc version 1.0.x)
The compiler switches are handled via a lookup table which is linearly searched. Then another lookup table takes care of setting the appropriate bit flags and variables in the switches for this compilation process.
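The two-table scheme can be sketched as follows: one table is searched linearly for the switch letter, and a second table, indexed by the position found, names the flag to set or clear. The switch letters R, I and Q are real Free Pascal switches, but the enumeration, the table names and the routines below are hypothetical and do not mirror switches.pas.

```pascal
program SwitchLookup;
{$mode objfpc}

type
  { Hypothetical flag set; the compiler has its own, larger sets. }
  TLocalSwitch = (lsRangeChecks, lsIOChecks, lsOverflowChecks);
  TLocalSwitches = set of TLocalSwitch;

const
  { First table: the letters that are searched linearly. }
  SwitchLetters: array[0..2] of Char = ('R', 'I', 'Q');
  { Second table: the flag that belongs to each letter. }
  SwitchFlags: array[0..2] of TLocalSwitch =
    (lsRangeChecks, lsIOChecks, lsOverflowChecks);

var
  Current: TLocalSwitches;

{ Linear search; returns the table index or -1 when the letter is unknown. }
function FindSwitch(Letter: Char): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := Low(SwitchLetters) to High(SwitchLetters) do
    if SwitchLetters[I] = UpCase(Letter) then
    begin
      Result := I;
      Exit;
    end;
end;

{ Sets or clears the flag for a switch; returns False for unknown letters. }
function ApplySwitch(Letter: Char; Enable: Boolean): Boolean;
var
  Index: Integer;
begin
  Index := FindSwitch(Letter);
  Result := Index >= 0;
  if not Result then
    Exit;
  if Enable then
    Include(Current, SwitchFlags[Index])
  else
    Exclude(Current, SwitchFlags[Index]);
end;

begin
  Current := [];
  ApplySwitch('R', True);
  WriteLn(lsRangeChecks in Current);
  ApplySwitch('r', False);
  WriteLn(lsRangeChecks in Current);
  WriteLn(ApplySwitch('Z', True));
end.
```

The demo prints TRUE, FALSE, FALSE. A linear search is adequate here because the number of switches is small.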
Scanner interface
(last updated for fpc version 1.0.x)
The parser only receives tokens as its input, where a token is an enumeration value which indicates the type of the token: a reserved word, a special character, an operator, a numeric constant, a string, or an identifier.
Resolution of the string into a token is done via lookup which searches the string table to find the equivalent token. This search is done using a binary search algorithm through the string table.
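As an illustration of the binary search, the sketch below looks up a word in a small sorted keyword table and falls back to an identifier token when it is not found. The table contents, the TToken values and the function LookupToken are invented for this example and are far smaller than the real string table.

```pascal
program KeywordLookup;
{$mode objfpc}

type
  { Hypothetical, heavily reduced token enumeration. }
  TToken = (tkIdentifier, tkBegin, tkEnd, tkIf, tkThen, tkWhile);

  TKeyword = record
    Name: string;
    Token: TToken;
  end;

const
  { The table must be sorted by Name for the binary search to work. }
  Keywords: array[0..4] of TKeyword = (
    (Name: 'BEGIN'; Token: tkBegin),
    (Name: 'END';   Token: tkEnd),
    (Name: 'IF';    Token: tkIf),
    (Name: 'THEN';  Token: tkThen),
    (Name: 'WHILE'; Token: tkWhile)
  );

{ Returns the keyword token for S, or tkIdentifier when S is no keyword. }
function LookupToken(const S: string): TToken;
var
  L, R, M: Integer;
  Key: string;
begin
  Result := tkIdentifier;
  Key := UpCase(S);          { Pascal keywords are case insensitive }
  L := Low(Keywords);
  R := High(Keywords);
  while L <= R do
  begin
    M := (L + R) div 2;
    if Keywords[M].Name = Key then
    begin
      Result := Keywords[M].Token;
      Exit;
    end
    else if Keywords[M].Name < Key then
      L := M + 1             { continue in the upper half }
    else
      R := M - 1;            { continue in the lower half }
  end;
end;

begin
  WriteLn(LookupToken('while') = tkWhile);
  WriteLn(LookupToken('Begin') = tkBegin);
  WriteLn(LookupToken('counter') = tkIdentifier);
end.
```

All three lines of the demo print TRUE. Each probe halves the remaining range, so the number of comparisons grows with the logarithm of the table size.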
In the case of identifiers and constants (including numeric values), the value is returned in the pattern string variable, together with the appropriate token value (numeric values are also returned as non-converted strings, with any special prefix included). In the case of operators and reserved words, only the token itself can be assumed to be preserved; the input string that was read is assumed to be lost.
Therefore the interface with the parser consists of the readtoken() routine and the pattern variable.
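The following toy scanner sketches how a caller might use this interface: it calls ReadToken in a loop and inspects the Token and Pattern globals after each call. Only the names ReadToken, Token, Pattern and TToken come from this page; the token values, the Source and Cursor variables and the scanning rules are assumptions, and the number rule is deliberately crude.

```pascal
program ReadTokenDemo;
{$mode objfpc}

type
  { Hypothetical, heavily reduced token enumeration. }
  TToken = (tkEOF, tkIdent, tkNumber, tkPlus);

var
  Source: string;    { assumed input buffer, not part of the real interface }
  Cursor: Integer;   { index of the next unread character }
  Token: TToken;
  Pattern: string;

{ Sets Token, and Pattern for identifiers and numbers. }
procedure ReadToken;
begin
  Pattern := '';
  while (Cursor <= Length(Source)) and (Source[Cursor] = ' ') do
    Inc(Cursor);
  if Cursor > Length(Source) then
  begin
    Token := tkEOF;
    Exit;
  end;
  case Source[Cursor] of
    'A'..'Z', 'a'..'z', '_':
      begin
        Token := tkIdent;
        while (Cursor <= Length(Source)) and
              (Source[Cursor] in ['A'..'Z', 'a'..'z', '0'..'9', '_']) do
        begin
          Pattern := Pattern + Source[Cursor];
          Inc(Cursor);
        end;
      end;
    '0'..'9', '$':
      begin
        Token := tkNumber;
        { The text is kept unconverted, prefix included. }
        repeat
          Pattern := Pattern + Source[Cursor];
          Inc(Cursor);
        until (Cursor > Length(Source)) or
              not (Source[Cursor] in ['0'..'9', 'A'..'F', 'a'..'f']);
      end;
    '+':
      begin
        Token := tkPlus;   { Pattern carries no meaning for operators }
        Inc(Cursor);
      end;
  else
    begin
      WriteLn('illegal character: ', Source[Cursor]);
      Inc(Cursor);
      Token := tkEOF;      { the sketch simply stops on an error }
    end;
  end;
end;

begin
  Source := 'count + $1F';
  Cursor := 1;
  repeat
    ReadToken;
    WriteLn(Ord(Token), ' "', Pattern, '"');
  until Token = tkEOF;
end.
```

Note that the hexadecimal constant arrives in Pattern as the text $1F; converting it to a value is left to the consumer, as described above.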
Routines
ReadToken
Declaration: procedure ReadToken;
Description: Sets the global variable token to the current token read, and sets the pattern variable appropriately (if required).
Variables
Token
Declaration: var Token: TToken;
Description: Contains the token which was last read by a call to ReadToken.
See also: ReadToken
Pattern
Declaration: var Pattern: String;
Description: Contains the string of the last pattern read by a call to ReadToken.
See also: ReadToken
Assembler parser interface
(last updated for fpc version 1.0.x)
The inline assembler parser is completely separate from the Pascal parser, so its scanning process is also completely independent. The scanner only takes care of the preprocessor part and comments; everything else is passed character by character to the assembler parser via the AsmGetChar() scanner routine.
Routines
AsmGetChar
Declaration: function AsmGetChar: Char;
Description: Returns the next character in the input stream.
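A character feed of this kind might be sketched as below: the routine hands out one character per call and silently skips brace comments, so the consumer never sees them. Only the name and declaration of AsmGetChar come from this page; the Source and Cursor variables, the use of #0 as an end marker and the simplified comment handling (directives are not treated specially) are assumptions.

```pascal
program AsmCharFeed;
{$mode objfpc}

var
  Source: string;    { assumed input buffer }
  Cursor: Integer;   { index of the next unread character }
  C: Char;

{ Returns the next character outside comments; #0 marks the end of input. }
function AsmGetChar: Char;
begin
  { Skip every comment that starts at the current position. }
  while (Cursor <= Length(Source)) and (Source[Cursor] = '{') do
  begin
    while (Cursor <= Length(Source)) and (Source[Cursor] <> '}') do
      Inc(Cursor);
    Inc(Cursor);     { step past the closing brace }
  end;
  if Cursor > Length(Source) then
    Result := #0
  else
  begin
    Result := Source[Cursor];
    Inc(Cursor);
  end;
end;

begin
  Source := 'mov eax,{load}1';
  Cursor := 1;
  { The assembler parser would pull characters one at a time like this. }
  C := AsmGetChar;
  while C <> #0 do
  begin
    Write(C);
    C := AsmGetChar;
  end;
  WriteLn;
end.
```

The demo writes the instruction text with the comment removed, which is all the assembler parser would receive.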