| Asm::Preproc::Lexer - Lexer generator |
Asm::Preproc::Lexer - Lexer generator
    use Asm::Preproc::Lexer;

    my @tokens = (
        BLANKS  => qr/\s+/,
                   sub {()},
        COMMENT => [qr/\/\*/, qr/\*\//],
                   undef,
        QSTR    => [qr/'/],
                   sub { my($type, $value) = @_;
                         [$type, substr($value, 1, length($value)-2)] },
        QQSTR   => [qr/"/, qr/"/],
        NUM     => qr/\d+/,
        ID      => qr/[a-z]+/,
                   sub { my($type, $value) = @_;
                         [$type, $value] },
        SYM     => qr/(.)/,
                   sub { [$1, $1] },
    );

    my $lex = Asm::Preproc::Lexer->new(@tokens);
    my $lex2 = $lex->clone;

    $lex->from(sub {});     # read Asm::Preproc::Line from iterator
    $lex->from(@lines);     # read Asm::Preproc::Line from list

    my $token = $lex->get;  # isa Asm::Preproc::Token
This module creates a tokenizer based on the specification given to the
new constructor.
The tokenizer reads Asm::Preproc::Line objects and
splits them into Asm::Preproc::Token objects on each
get call. get returns undef on end of input.
Creates a new tokenizer object for the given token specification. Each token is specified by the following elements:
String to identify the token type, unused if the token is discarded (see
BLANKS and COMMENT above).
One of:
A single regular expression to match the token at the current input position.
A list of one regular expression, to match delimited tokens that use the
same delimiter for the start and the end.
The token can span multiple lines.
See QSTR above for an example of multi-line single-quoted strings.
A list of two regular expressions, to match the start
of the token at the current input position, and the end of the token.
The token can span multiple lines.
See COMMENT above for an example of multi-line comments.
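The multi-line behaviour of delimited tokens can be pictured with a small stand-alone sketch. It is not the code that Asm::Preproc::Lexer generates; read_delimited is a hypothetical helper that assumes the token starts at the beginning of the first line and that the lines are plain strings held in an array.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: collect one delimited token
# that may continue over several lines. Lines are consumed from @$lines.
sub read_delimited {
    my ($lines, $start_re, $end_re) = @_;
    my $line = shift @$lines;
    die "no start delimiter\n"
        unless defined $line && $line =~ /\G($start_re)/gc;
    my $token = $1;
    while (1) {
        # Look for the end delimiter from the current position onwards.
        if ($line =~ /\G(.*?$end_re)/gcs) {
            $token .= $1;
            last;
        }
        # Not found: keep the rest of this line and fetch the next one.
        $token .= substr($line, pos($line) // 0);
        $line = shift @$lines;
        die "unterminated token\n" unless defined $line;
    }
    # Return the token and whatever follows it on the last line.
    return ($token, substr($line, pos($line)));
}

my @lines = ("/* first\n", "second */ x\n");
my ($token, $rest) = read_delimited(\@lines, qr/\/\*/, qr/\*\//);
```

For the one-regex form, such as QSTR, the same expression would be passed as both the start and the end delimiter.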
The regular expression is matched where the previous match finished,
and each sub-expression cannot span multiple lines.
Parentheses may be used to capture sub-expressions in $1, $2, etc.
It is an error if some input cannot be recognized by any of the given
regular expressions; the tokenizer then dies with an error message when
it reads that input. This is why the SYM token above uses the
catch-all expression qr/(.)/.
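The matching rules above can be sketched in plain Perl with the \G anchor and the /gc modifiers. This is only an illustration of the idea, not the subroutine the module compiles; tokenize_line and its rule format are hypothetical names introduced for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: tokenize one string with an
# ordered list of [TYPE => qr/.../] rules, always matching at pos().
sub tokenize_line {
    my ($text, @rules) = @_;
    my @tokens;
    pos($text) = 0;
    POSITION:
    while (pos($text) < length($text)) {
        for my $rule (@rules) {
            my ($type, $re) = @$rule;
            # \G anchors the match where the previous one finished;
            # /c keeps pos() unchanged when this rule does not match.
            if ($text =~ /\G($re)/gc) {
                push @tokens, [$type, $1] unless $type eq 'BLANKS';
                next POSITION;
            }
        }
        die sprintf "no token matches at column %d\n", pos($text) + 1;
    }
    return @tokens;
}

my @rules = (
    [BLANKS => qr/\s+/],
    [NUM    => qr/\d+/],
    [ID     => qr/[a-z]+/],
);

# With a catch-all rule at the end, "+" is returned as a SYM token.
my @tokens = tokenize_line("ab + 12", @rules, [SYM => qr/./]);

# Without it, the same input is an error.
my $error = eval { tokenize_line("ab + 12", @rules); 1 } ? '' : $@;
```

Rule order matters in this sketch: the catch-all must come last, otherwise it would shadow the more specific rules.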
The optional code reference is a transform subroutine. It receives
the type and value of the recognized token, and returns one of:
An array ref with two elements [$type, $value],
the new type and value to be
returned in the Asm::Preproc::Token object.
An empty list () to signal that this token shall be discarded.
As an optimization, the transform subroutine code reference may be
set to undef, to signal that the token will be discarded
and that there is no use in accumulating it while matching.
This is useful to discard comments upfront, instead of
collecting the whole comment and then passing it to the transform subroutine
just to be discarded afterwards.
See COMMENT above for an example of usage.
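The three cases for the third element (no transform, a code reference, or an explicit undef) can be told apart while walking the flat specification list. The sketch below shows one way to do that; parse_spec, apply_rule and the mode names are hypothetical and are not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Split a flat token specification into one hash per token.
sub parse_spec {
    my @spec = @_;
    my @rules;
    while (@spec) {
        my %rule = (type => shift @spec, match => shift @spec, mode => 'keep');
        if (@spec && !defined $spec[0]) {            # explicit undef: discard early
            shift @spec;
            $rule{mode} = 'discard';
        }
        elsif (@spec && ref($spec[0]) eq 'CODE') {   # transform subroutine
            $rule{transform} = shift @spec;
            $rule{mode}      = 'transform';
        }
        push @rules, \%rule;
    }
    return @rules;
}

# Return [$type, $value] for a kept token, or an empty list if discarded.
sub apply_rule {
    my ($rule, $value) = @_;
    return () if $rule->{mode} eq 'discard';
    return [$rule->{type}, $value] if $rule->{mode} eq 'keep';
    my @result = $rule->{transform}->($rule->{type}, $value);
    return @result ? $result[0] : ();
}

my @rules = parse_spec(
    BLANKS  => qr/\s+/,               sub { () },
    COMMENT => [qr/\/\*/, qr/\*\//],  undef,
    QQSTR   => [qr/"/, qr/"/],
    NUM     => qr/\d+/,
    ID      => qr/[a-z]+/,            sub { my ($type, $value) = @_;
                                            [$type, uc $value] },
);
```

In this sketch a BLANKS match is still collected and handed to its transform before being dropped, while a COMMENT match never needs to be collected at all, which is the saving the undef form is meant to provide.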
Creates a copy of this tokenizer object without compiling a new lexing subroutine. The copied object has all pending input cleared.
Asm::Preproc::Stream object from which new lines to process are read.
Inserts the given input at the head of the input queue of the tokenizer. The input is either a list of Asm::Preproc::Line objects, or an iterator function that returns an Asm::Preproc::Line object on each call.
The input list and iterator can also return plain scalar strings, which are converted to Asm::Preproc::Line on the fly, but then the information on input file location for error messages will not be available.
The new inserted input is processed before continuing with whatever was already in the queue.
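The head-of-queue behaviour is what makes include-style processing possible: input added while reading is consumed first, and the earlier input resumes afterwards. A minimal stand-alone model of such a queue is sketched below; push_input and next_line are hypothetical names, and plain strings stand in for Asm::Preproc::Line objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tokenizer's input queue, for illustration
# only. Each queue entry is an iterator returning the next line or undef.
my @queue;

# Insert a list of lines, or a single iterator, at the head of the queue.
sub push_input {
    my @input = @_;
    if (@input == 1 && ref($input[0]) eq 'CODE') {
        unshift @queue, $input[0];
    }
    else {
        unshift @queue, sub { shift @input };
    }
    return;
}

# Return the next line, dropping exhausted iterators; undef at end of input.
sub next_line {
    while (@queue) {
        my $line = $queue[0]->();
        return $line if defined $line;
        shift @queue;
    }
    return;
}

push_input("line 1", "line 2");
my @seen = (next_line());                  # "line 1"
push_input("included A", "included B");    # e.g. the contents of an include file
while (defined(my $line = next_line())) {
    push @seen, $line;
}
# @seen is now: "line 1", "included A", "included B", "line 2"
```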
Retrieves the next token from the input stream as an Asm::Preproc::Token object.
Returns undef on end of input.
Dies with an error message indicating the location in the input if the source does not match any of the tokens.
Returns an Asm::Preproc::Stream object that will
return the result of get on each call.