From Free Pascal wiki
Revision as of 14:16, 24 May 2020 by E-ric



The scanner (tokenizer) turns the source text into a stream of tokens that is fed to the parser. Preprocessing is done at this stage: every compiler directive that is read changes the internal state variables of the compiler, and every illegal character found in the input stream causes an error.

For details on how macros work, see: Macro internals


The general architecture of the scanner is shown in the following figure:

Several kinds of input can be read from the input stream: strings, handled by readstring; numeric values, handled by readnumeric; comments; and compiler and preprocessor directives.

Input stream

(last updated for fpc version 1.0.x)

Input data is handled in the same way as all other I/O in the compiler: through a hook (do_openinputfile) that can be overridden in comphook.pas when a different I/O method is needed.

The default hook uses a non-buffered DOS stream contained in files.pas.
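
The hook mechanism can be illustrated with a procedural variable. This is only a minimal sketch: the type TOpenInputFunc, the variable OpenInputHook, and both functions are hypothetical names invented for the example, and the real signature of do_openinputfile in comphook.pas may well differ.

```pascal
program HookSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical hook signature; not the real one from comphook.pas. }
  TOpenInputFunc = function(const FileName: string): string;

{ Stand-in for the default hook: pretends to read from disk. }
function DefaultOpenInput(const FileName: string): string;
begin
  Result := 'disk:' + FileName;
end;

{ Replacement hook: pretends to serve the source from memory. }
function MemoryOpenInput(const FileName: string): string;
begin
  Result := 'memory:' + FileName;
end;

var
  { Hypothetical stand-in for the do_openinputfile hook variable. }
  OpenInputHook: TOpenInputFunc = @DefaultOpenInput;

begin
  WriteLn(OpenInputHook('unit1.pas'));
  OpenInputHook := @MemoryOpenInput;   { override the hook }
  WriteLn(OpenInputHook('unit1.pas'));
end.
```

Because callers only ever go through the variable, swapping the I/O method needs no change in the scanner itself.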


(last updated for fpc version 1.0.x)

The scanner resolves all preprocessor directives and passes only the visible parts of the code to the parser (for example, the parts selected by conditional compilation). Compiler switches and directives are also saved in global variables during preprocessing, so this part is completely independent of the parser.

Conditional compilation (scanner.pas)

(last updated for fpc version 1.0.x)

Conditional compilation is handled via a preprocessor stack: each directive is pushed onto the stack and popped when it is resolved. The stack is implemented as a linked list of preprocessor directive items.
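
A linked-list stack of this kind might look roughly as follows. This is a sketch under assumptions, not the code from scanner.pas: the record TPreprocItem, its fields, and the routines Push, Pop and Visible are all hypothetical names chosen for illustration.

```pascal
program PreprocStackSketch;
{$mode objfpc}{$H+}

type
  PPreprocItem = ^TPreprocItem;
  { Hypothetical stack item; the real record has other fields. }
  TPreprocItem = record
    Name: string;        { directive text, e.g. 'IFDEF DEBUG' }
    Accept: Boolean;     { is the enclosed code visible to the parser? }
    Next: PPreprocItem;  { item below this one on the stack }
  end;

var
  Top: PPreprocItem = nil;

procedure Push(const AName: string; AAccept: Boolean);
var
  Item: PPreprocItem;
begin
  New(Item);
  Item^.Name := AName;
  { nested code is visible only if every enclosing level is visible }
  Item^.Accept := AAccept and ((Top = nil) or Top^.Accept);
  Item^.Next := Top;
  Top := Item;
end;

procedure Pop;
var
  Item: PPreprocItem;
begin
  if Top = nil then
  begin
    WriteLn('error: closing directive without a matching opening one');
    Exit;
  end;
  Item := Top;
  Top := Item^.Next;
  Dispose(Item);
end;

function Visible: Boolean;
begin
  Result := (Top = nil) or Top^.Accept;
end;

begin
  Push('IFDEF DEBUG', False);  { condition is false: block is hidden }
  Push('IFDEF FPC', True);     { true, but nested inside a hidden block }
  WriteLn(Visible);            { still hidden at this point }
  Pop;
  Pop;
  WriteLn(Visible);            { stack is empty: code is visible again }
end.
```

Storing the combined visibility in each item means the scanner only has to look at the top of the stack to decide whether to pass code on to the parser.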

Compiler switches (switches.pas)

(last updated for fpc version 1.0.x)

Compiler switches are looked up in a table that is searched linearly. A second lookup table then takes care of setting the appropriate bit flags and variables for the current compilation.
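
The two-table scheme can be sketched like this. The tables, the TLocalSwitch enumeration and HandleSwitch are hypothetical and much smaller than anything in switches.pas; only the directive letters R, Q and I correspond to real Free Pascal switches.

```pascal
program SwitchLookupSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical flag names, for illustration only. }
  TLocalSwitch = (lsRangeCheck, lsOverflowCheck, lsIOCheck);
  TLocalSwitches = set of TLocalSwitch;

const
  { First table: directive letters, searched linearly. }
  SwitchLetters: array[0..2] of Char = ('R', 'Q', 'I');
  { Second table: the flag each entry controls, in the same order. }
  SwitchFlags: array[0..2] of TLocalSwitch =
    (lsRangeCheck, lsOverflowCheck, lsIOCheck);

var
  Current: TLocalSwitches = [];   { flags for this compilation }

{ Returns False when the letter is not a known switch. }
function HandleSwitch(Letter, State: Char): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := Low(SwitchLetters) to High(SwitchLetters) do
    if SwitchLetters[i] = UpCase(Letter) then
    begin
      if State = '+' then
        Include(Current, SwitchFlags[i])
      else
        Exclude(Current, SwitchFlags[i]);
      Exit(True);
    end;
end;

begin
  HandleSwitch('R', '+');
  HandleSwitch('I', '-');
  WriteLn('range checks on: ', lsRangeCheck in Current);
  WriteLn('known switch Z:  ', HandleSwitch('Z', '+'));
end.
```

A linear search is adequate here because the table is short and a directive is looked up only when one appears in the source.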

Scanner interface

(last updated for fpc version 1.0.x)

The parser receives only tokens as its input. A token is an enumeration value that indicates the type of the token: a reserved word, a special character, an operator, a numeric constant, a string, or an identifier.

A string is resolved into a token by a lookup that searches the string table for the equivalent token. The search uses a binary search algorithm.
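
As an illustration, a binary search over a sorted table of reserved words could be written as below. The table contents, the reduced TToken enumeration and the LookupToken function are assumptions made for this sketch; the compiler's own table and token type are far larger.

```pascal
program TokenLookupSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { Hypothetical, heavily reduced token set. }
  TToken = (tkIdentifier, tkBegin, tkEnd, tkIf, tkThen, tkWhile);

  TTokenEntry = record
    Text: string;
    Token: TToken;
  end;

const
  { Must stay sorted by Text, or the binary search breaks. }
  TokenTable: array[0..4] of TTokenEntry = (
    (Text: 'BEGIN'; Token: tkBegin),
    (Text: 'END';   Token: tkEnd),
    (Text: 'IF';    Token: tkIf),
    (Text: 'THEN';  Token: tkThen),
    (Text: 'WHILE'; Token: tkWhile)
  );

{ Returns the reserved-word token, or tkIdentifier if S is not in the table. }
function LookupToken(const S: string): TToken;
var
  Left, Right, Middle: Integer;
  Key: string;
begin
  Key := UpperCase(S);           { Pascal is case-insensitive }
  Left := Low(TokenTable);
  Right := High(TokenTable);
  while Left <= Right do
  begin
    Middle := (Left + Right) div 2;
    if TokenTable[Middle].Text = Key then
      Exit(TokenTable[Middle].Token)
    else if TokenTable[Middle].Text < Key then
      Left := Middle + 1
    else
      Right := Middle - 1;
  end;
  Result := tkIdentifier;
end;

begin
  WriteLn(LookupToken('while') = tkWhile);
  WriteLn(LookupToken('counter') = tkIdentifier);
end.
```

Anything not found in the table falls through to the identifier token, which matches the split between reserved words and identifiers described in this section.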

For identifiers and constants (including numeric values), the value is returned in the pattern string variable, together with the appropriate token (numeric values are returned as unconverted strings, with any special prefix included). For operators and reserved words, only the token itself can be assumed to be preserved; the input string that was read is assumed to be lost.

The interface with the parser therefore consists of the ReadToken routine and the Pattern variable.



Declaration: procedure ReadToken;
Description: Sets the global variable Token to the token just read and, where required, sets the Pattern variable accordingly.



Declaration: var Token: TToken;
Description: Contains the token that was last read by a call to ReadToken.
See also: ReadToken


Declaration: var Pattern: String;
Description: Contains the string of the last pattern read by a call to ReadToken.
See also: ReadToken
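
How a caller might consume this interface can be shown with a toy scanner. Only the names ReadToken, Token and Pattern come from the declarations above; the token set, the Source and Position variables, and the scanning logic are hypothetical and far simpler than the real scanner.

```pascal
program ReadTokenSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical, heavily reduced token set. }
  TToken = (tkEOF, tkIdentifier, tkNumber, tkPlus);

var
  Source: string = 'count + $1F';  { hypothetical input buffer }
  Position: Integer = 1;
  Token: TToken;                   { last token read }
  Pattern: string;                 { text of the last identifier or constant }

procedure ReadToken;
var
  Start: Integer;
  Digits: set of Char;
begin
  Pattern := '';
  while (Position <= Length(Source)) and (Source[Position] = ' ') do
    Inc(Position);
  if Position > Length(Source) then
  begin
    Token := tkEOF;
    Exit;
  end;
  Start := Position;
  case Source[Position] of
    'A'..'Z', 'a'..'z', '_':
      begin
        while (Position <= Length(Source)) and
              (Source[Position] in ['A'..'Z', 'a'..'z', '0'..'9', '_']) do
          Inc(Position);
        Token := tkIdentifier;
        Pattern := Copy(Source, Start, Position - Start);
      end;
    '0'..'9', '$':
      begin
        if Source[Position] = '$' then
          Digits := ['0'..'9', 'A'..'F', 'a'..'f']
        else
          Digits := ['0'..'9'];
        Inc(Position);
        while (Position <= Length(Source)) and (Source[Position] in Digits) do
          Inc(Position);
        Token := tkNumber;
        { kept as text, prefix included; conversion is left to the caller }
        Pattern := Copy(Source, Start, Position - Start);
      end;
    '+':
      begin
        Inc(Position);
        Token := tkPlus;           { operators carry no Pattern }
      end;
  else
    begin
      WriteLn('illegal character: ', Source[Position]);
      Inc(Position);
      Token := tkEOF;              { this sketch simply stops scanning }
    end;
  end;
end;

begin
  { A parser-style loop: call ReadToken, then inspect Token and Pattern. }
  repeat
    ReadToken;
    WriteLn(Token, ' "', Pattern, '"');
  until Token = tkEOF;
end.
```

Note that the hexadecimal constant is handed over as the unconverted string, prefix included, as described for numeric values above.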

Assembler parser interface

(last updated for fpc version 1.0.x)

The inline assembler parser is completely separate from the Pascal parser, so its scanning process is also completely independent. The scanner only takes care of the preprocessor part and of comments; everything else is passed character by character to the assembler parser via the AsmGetChar() scanner routine.



Declaration: function AsmGetChar: Char;
Description: Returns the next character in the input stream.

Next chapter: The parse tree