\section{Sections}
\subsection{unit}
The unit section specifies the unit name of the generated source file.
The syntax is identical to Object Pascal.
\subsection{uses}
The §uses{...}§ section lists the units to be included in the
interface uses clause of the generated Pascal unit. Every unit
name must be terminated by a semicolon. Repeated units are
included only once.
\begin{verbatim}
uses
{
Classes;
Windows;
}
\end{verbatim}
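As an illustration of the last point, the following sketch lists
§Classes§ twice (§SysUtils§ is only an example unit name chosen
here, not one the tool requires). Because repeated units are
included only once, the generated uses clause should name
§Classes§ a single time.
\begin{verbatim}
uses
{
  Classes;
  SysUtils;
  Classes;
}
\end{verbatim}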
\subsection{const}
The §const{...}§ section specifies items that appear in the
interface const clause of the generated Pascal unit. The
content of this section is copied verbatim into the unit.
\begin{verbatim}
const
{
const1 = 12;
const2 = FOO;
}
\end{verbatim}
\subsection{type}
The §type{...}§ section specifies items that appear in the
interface type clause of the generated Pascal unit. The
content of this section is copied verbatim into the unit.
\begin{verbatim}
type
{
TmyType1 = integer;
TmyType2 = array [0..16] of TmyType1;
}
\end{verbatim}
\subsection{options}
The §options{...}§ section contains options for a given grammar
element. Options can be defined for lexer/parser classes, rules
and subrules.
\subsection{tokens}
To define an ``imaginary'' token (i.e., one that has no
corresponding real input symbol), use the §tokens{...}§ section.
You can also define literals in this section.
\begin{verbatim}
tokens
{
"procedure";
"function";
INTEGER;
}
\end{verbatim}
Strings defined in this way are treated just as if you had referenced them in
the parser. The formal syntax is:
\begin{verbatim}
tokenSpecification
  : "tokens"
    LCURLY
    (tokenItem SEMI)*
    RCURLY
  ;

tokenItem
  : TOKEN
  | STRING
  ;
\end{verbatim}
The §tokens{...}§ section is only valid in lexer grammars.
\subsection{memberdecl}
The §memberdecl{...}§ section contains additional member
declarations for the grammar class. It extends the grammar class
with user-defined members, so there is no need to derive a new
class from the generated class just to implement additional
functionality. The content of this section is copied verbatim
into the class declaration of the generated grammar class.
\begin{verbatim}
memberdecl
{
procedure proc1;
procedure proc2;
}
\end{verbatim}
\subsection{memberdef}
The §memberdef{...}§ section contains the implementation of the
class's additional functionality. The content of this section is
copied verbatim into the implementation part of the generated
unit. This section may also contain the initialization and
finalization clauses.
\begin{verbatim}
memberdef
{
  procedure TmyClass.proc1;
  begin
    ...
  end;

  procedure TmyClass.proc2;
  begin
    ...
  end;
}
\end{verbatim}
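A minimal sketch of a §memberdef{...}§ section that also carries
the initialization and finalization clauses is given below. The
class name §TmyClass§ is taken from the example above; the
comments are placeholders for user code. In Object Pascal these
clauses close the unit, so they are placed after the last method
implementation.
\begin{verbatim}
memberdef
{
  procedure TmyClass.proc1;
  begin
    // body of proc1
  end;

initialization
  // executed once when the generated unit is initialized
finalization
  // executed once when the generated unit is finalized
}
\end{verbatim}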
\subsection{parser}
Parser rules must be associated with a parser class. The parser
class specification precedes the options and rule definitions of
the parser. A grammar file (§.g§) can hold only one class
definition. A parser specification in a grammar file looks like:
\begin{verbatim}
unit myParser;
uses... // optional uses {...} section
const... // optional const {...} section
type... // optional type {...} section
parser TmyParser;
options... // optional options {...} section
memberdecl... // optional memberdecl {...} section
parser rules...
memberdef... // optional memberdef {...} section
\end{verbatim}
In the generated code, the parser class results in an Object
Pascal class, and the rules become member methods of the class.
Note that the content of the §memberdecl{...}§ section is copied
verbatim into the class declaration part of the generated parser
class, while the content of the §memberdef{...}§ section is copied
after the implementation of the member rules. The initialization
and finalization clauses of a Pascal unit can therefore be placed
in the §memberdef{...}§ section.
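Putting the pieces together, a small but complete parser grammar
might look like the sketch below. The rule name §declaration§, the
member §proc1§ and the use of §TmyParser§ as the method qualifier
are illustrative assumptions; the token names follow the
§tokens{...}§ example above.
\begin{verbatim}
unit myParser;

uses
{
  Classes;
}

parser TmyParser;

memberdecl
{
  procedure proc1;
}

declaration
  : "procedure" INTEGER
  | "function" INTEGER
  ;

memberdef
{
  procedure TmyParser.proc1;
  begin
    // user code
  end;
}
\end{verbatim}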
\subsection{lexer}
To perform lexical analysis, you need to specify a lexer class that describes
how to break up the input character stream into a stream of tokens. The syntax
is similar to that of a parser class:
\begin{verbatim}
unit myLexer;
uses... // optional uses {...} section
const... // optional const {...} section
type... // optional type {...} section
lexer TmyLexer;
options... // optional options {...} section
tokens... // optional tokens {...} section
memberdecl... // optional memberdecl {...} section
lexer rules...
memberdef... // optional memberdef {...} section
\end{verbatim}
Lexical rules contained within a lexer class become member methods in the
generated class. A lexer grammar may have a §tokens{...}§ section to specify
imaginary tokens and string literals.
\subsection{rule definitions}
The structure of an input stream of atoms is specified by a set of
mutually referencing rules. Each rule has a name, one or more
alternatives, and any of the following optional attributes: a
scope specifier, a set of arguments, an init-action, a return
value, local variable definitions, and an exception handler. Each
alternative contains a series of elements that specify what to
match and where. The scope is specified with the §private§,
§protected§, or §public§ keyword. A rule has public scope by
default. The basic form of a rule is:
\begin{verbatim}
(scope) rulename
  : alternative_1
  | alternative_2
  ...
  | alternative_n
  ;
\end{verbatim}
Parameters for a rule can be specified in the following form:
\begin{verbatim}
rulename [formal parameters] : ... ;
\end{verbatim}
If the rule returns a value, its type is declared with the
§returns§ keyword:
\begin{verbatim}
rulename returns [typename] : ... ;
\end{verbatim}
where §typename§ is a valid Object Pascal type specifier.
Local variables for a rule can be defined in the §local{...}§ section:
\begin{verbatim}
rule
local
{
foo: integer;
bar: string;
}
\end{verbatim}
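The fragments above can be combined in a single rule. The
following is only a sketch: the rule name §item§, the parameter
§depth§ and the variable §count§ are hypothetical. It also
assumes that formal parameters are written in Object Pascal style
and that the §local{...}§ section follows the parameter list;
neither detail is stated in this section.
\begin{verbatim}
item [depth: integer]
local
{
  count: integer;
}
  : INTEGER
  ;
\end{verbatim}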
Init-actions are specified before the colon. Init-actions differ from normal
actions because they are always executed regardless of guess mode.
\begin{verbatim}
rule
{
init-action
}
: ... ;
\end{verbatim}
\paragraph{Parser rules} apply structure to a stream of tokens, whereas
lexer rules apply structure to a stream of characters. Parser
rules, therefore, must not reference cha\-rac\-ter literals.
Double-quoted strings in parser rules are considered to be token
references. Note: the name of every parser rule must begin with a
lowercase letter.
\paragraph{Lexer rules} defined within a lexer grammar must have a name beginning
with an uppercase letter. These rules implicitly match
cha\-rac\-ters on the input stream instead of tokens on the token
stream. Referenced grammar elements include token references
(implicit lexer rule references), cha\-rac\-ters and strings.
Lexer rules are processed in the same manner as parser rules, and
may also specify arguments and return values. The scope specifier
of a lexer rule has a special meaning. In the generated Object
Pascal unit, the lexer class has a §nextToken§ function, which is
the interface between the lexer and the parser. This function is
synthesized from the public lexer rules only, so non-public lexer
rules do not affect the prediction logic of the lexer. They are
usually helper rules that are referenced from other lexer rules.
If the lexer grammar has no public rule at all, the §nextToken§
function returns EOF to the parser.
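The effect of the scope specifier can be sketched with a small
lexer grammar. The rule names §BOOL§, §TRUE\_TEXT§ and
§FALSE\_TEXT§ are hypothetical, and the placement of the scope
keyword follows the basic rule form shown earlier. Under these
assumptions only §BOOL§ contributes to §nextToken§; the two
private rules are helpers that are reached solely through §BOOL§.
\begin{verbatim}
unit myLexer;

lexer TmyLexer;

public BOOL
  : TRUE_TEXT
  | FALSE_TEXT
  ;

private TRUE_TEXT
  : "true"
  ;

private FALSE_TEXT
  : "false"
  ;
\end{verbatim}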