leex — parsetools v2.6 (2024)

Lexical analyzer generator for Erlang

A regular expression based lexical analyzer generator for Erlang, similar to lex or flex.

Note

The leex module was considered experimental when it was introduced.

Default Leex Options

The (host operating system) environment variable ERL_COMPILER_OPTIONS can be used to give default Leex options. Its value must be a valid Erlang term. If the value is a list, it is used as is. If it is not a list, it is put into a list.

The list is appended to any options given to file/2.

The list can be retrieved with compile:env_compiler_options/0.
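A minimal sketch of how the variable is picked up, assuming a hypothetical helper module env_opts. The helper sets ERL_COMPILER_OPTIONS inside the running VM with os:putenv/2, reads it back with compile:env_compiler_options/0, and then restores the previous value. It also illustrates the wrapping rule: a single term such as verbose comes back as [verbose].

```erlang
-module(env_opts).
-export([demo/1]).

%% Hypothetical helper: temporarily set ERL_COMPILER_OPTIONS to Value,
%% read the parsed default options back, then restore the old setting.
demo(Value) ->
    Old = os:getenv("ERL_COMPILER_OPTIONS"),
    true = os:putenv("ERL_COMPILER_OPTIONS", Value),
    Opts = compile:env_compiler_options(),
    case Old of
        false -> true = os:unsetenv("ERL_COMPILER_OPTIONS");
        _ -> true = os:putenv("ERL_COMPILER_OPTIONS", Old)
    end,
    Opts.
```

In practice the variable is usually set in the operating system shell before erl is started, rather than from inside the VM.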

Input File Format

Erlang style comments starting with a % are allowed in scanner files. A definition file has the following format:

<Header>
Definitions.
<Macro Definitions>
Rules.
<Token Rules>
Erlang code.
<Erlang code>

The Definitions., Rules., and Erlang code. headings are mandatory and must start at the beginning of a source line. The <Header>, <Macro Definitions>, and <Erlang code> sections are allowed to be empty, but there must be at least one rule.
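For illustration, a minimal complete definition file in this layout might look as follows. The word token shape is hypothetical, the header is left empty, and the Erlang code. section is present but empty.

```
Definitions.

L = [a-z]
WS = [\s\t\n]

Rules.

{L}+ : {token,{word,TokenLine,TokenChars}}.
{WS}+ : skip_token.

Erlang code.
```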

Macro definitions have the following format:

NAME = VALUE

and there must be spaces around =. Macros can be used in the regular expressions of rules by writing {NAME}.

Note

When macros are expanded in expressions, the macro calls are replaced by the macro value without any form of quoting or enclosing in parentheses.
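Because of this textual expansion, a macro whose value contains an alternation usually needs its own parentheses. In the sketch below (the macro names are illustrative), SIGN is parenthesised so that {SIGN}?{D}+ means an optional sign followed by digits.

```
Definitions.

D = [0-9]
% Parenthesised so that {SIGN}{D}+ expands to (\+|-)[0-9]+.
% Without the parentheses it would expand to \+|-[0-9]+, which matches
% either a lone + or a - followed by digits.
SIGN = (\+|-)

Rules.

{SIGN}?{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.
```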

Rules have the following format:

<Regexp> : <Erlang code>.

The <Regexp> must occur at the start of a line and not include any blanks; use \t and \s to include TAB and SPACE characters in the regular expression. If <Regexp> matches, the corresponding <Erlang code> is evaluated to generate a token. Within the Erlang code, the following predefined variables are available:

  • TokenChars - A list of the characters in the matched token.

  • TokenLen - The number of characters in the matched token.

  • TokenLine - The line number where the token occurred.

  • TokenCol - The column number where the token occurred (column of the first character included in the token).

  • TokenLoc - Token location. Expands to {TokenLine,TokenCol} (even when error_location is set to line).
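As a hedged illustration, a single rule action can combine several of these variables. The word tuple below is an assumed token shape, not something leex prescribes.

```
Rules.

% Hypothetical token: location as {Line,Column}, matched text, and its length.
[a-z]+ : {token,{word,TokenLoc,TokenChars,TokenLen}}.
```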

The code must return:

  • {token,Token} - Return Token to the caller.

  • {end_token,Token} - Return Token; it is the last token in a tokens call.

  • skip_token - Skip this token completely.

  • {error,ErrString} - An error in the token; ErrString is a string describing the error.
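The fragment below sketches one rule for each kind of return value. The token tuples, the error text, and the choice of ; as the end token are hypothetical.

```
Rules.

% An ordinary token.
[0-9]+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.
% Ends a tokens/2,3 call, much like '.' ends an Erlang form.
; : {end_token,{';',TokenLine}}.
% Whitespace is dropped entirely.
[\s\t\n]+ : skip_token.
% A user-defined error for a reserved character.
@ : {error,"the @ character is reserved"}.
```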

It is also possible to push characters back into the input with the following returns:

  • {token,Token,PushBackList}
  • {end_token,Token,PushBackList}
  • {skip_token,PushBackList}

These have the same meanings as the normal returns, but the characters in PushBackList are prepended to the input characters and scanned for the next token. Note that pushing back a newline means that the line numbering will no longer be correct.

Note

Pushing back characters opens up unexpected ways to make the scanner loop!
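One common use of push-back is a one-character lookahead. In this sketch (the call_name and atom token names are made up), a name directly followed by ( is reported as a call_name token and the ( is pushed back, so the last rule still emits it as a separate token. The pushed-back ( cannot match the first rule again, so this particular push-back does not loop.

```
Rules.

% Longest match picks this rule for "foo(": strip the ( and push it back.
[a-z][a-zA-Z0-9_]*\( : {token,{call_name,TokenLine,lists:droplast(TokenChars)},"("}.
[a-z][a-zA-Z0-9_]* : {token,{atom,TokenLine,list_to_atom(TokenChars)}}.
\( : {token,{'(',TokenLine}}.
```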

The following example would match a simple Erlang integer or float and return a token which could be sent to the Erlang parser:

D = [0-9]

{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.

{D}+\.{D}+((E|e)(\+|\-)?{D}+)? : {token,{float,TokenLine,list_to_float(TokenChars)}}.
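The example above can be exercised end to end with a small Erlang module. This is a sketch under a few assumptions: the scanner is named num_lexer (hypothetical), files are written to the current working directory, and a whitespace rule is added so that the input may contain blanks. The module writes the definition file, generates the scanner with leex:file/1, compiles and loads it in memory, and then calls the generated string/1.

```erlang
-module(leex_demo).
-export([run/0]).

%% Definition file for a hypothetical scanner module, num_lexer.
%% Backslashes are doubled because this is an Erlang string literal.
xrl() ->
    <<"Definitions.\n"
      "D = [0-9]\n"
      "WS = [\\s\\t\\n]\n"
      "Rules.\n"
      "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.\n"
      "{D}+\\.{D}+((E|e)(\\+|\\-)?{D}+)? : "
      "{token,{float,TokenLine,list_to_float(TokenChars)}}.\n"
      "{WS}+ : skip_token.\n"
      "Erlang code.\n">>.

run() ->
    %% Write the definition file to the current directory.
    ok = file:write_file("num_lexer.xrl", xrl()),
    %% Generate num_lexer.erl from it.
    {ok, ErlFile} = leex:file("num_lexer.xrl"),
    %% Compile the generated scanner in memory and load it.
    {ok, Mod, Beam} = compile:file(ErlFile, [binary]),
    {module, Mod} = code:load_binary(Mod, ErlFile, Beam),
    %% Scan a sample string with the generated string/1.
    Mod:string("12 3.5").
```

Calling leex_demo:run() should return {ok,[{integer,1,12},{float,1,3.5}],1}, where the final 1 is the end line.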

The code in the Erlang code. section is written into the output file directly after the module declaration and the predefined exports declaration. This makes it possible to add extra exports, define imports, and add other attributes, which are then visible in the whole file.
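For instance, rule actions can call helper functions defined in the Erlang code. section. The keyword table in this sketch is hypothetical.

```
Rules.

[a-z][a-zA-Z0-9_]* : {token,word_token(TokenChars,TokenLine)}.

Erlang code.

% Helpers defined here can be called from any rule action.
word_token("if", Line) -> {'if',Line};
word_token("end", Line) -> {'end',Line};
word_token(Chars, Line) -> {atom,Line,list_to_atom(Chars)}.
```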

Regular Expressions

The regular expressions allowed here are a subset of those found in egrep and in the AWK programming language, as defined in the book The AWK Programming Language by A. V. Aho, B. W. Kernighan, and P. J. Weinberger. They are composed of the following characters:

  • c - Matches the non-metacharacter c.

  • \c - Matches the escape sequence or literal character c.

  • . - Matches any character.

  • ^ - Matches the beginning of a string.

  • $ - Matches the end of a string.

  • [abc...] - Character class, which matches any of the characters abc.... Character ranges are specified by a pair of characters separated by a -.

  • [^abc...] - Negated character class, which matches any character except abc....

  • r1 | r2 - Alternation. It matches either r1 or r2.

  • r1r2 - Concatenation. It matches r1 and then r2.

  • r+ - Matches one or more rs.

  • r* - Matches zero or more rs.

  • r? - Matches zero or one rs.

  • (r) - Grouping. It matches r.

The escape sequences allowed are the same as for Erlang strings:

The following examples define simplified versions of a few Erlang data types:

Atoms      [a-z][0-9a-zA-Z_]*
Variables  [A-Z_][0-9a-zA-Z_]*
Floats     (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?

Note

Anchoring a regular expression with ^ and $ is not implemented in the current version of leex and generates a parse error.
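As rules, the atom and variable expressions above might be written with macros as in the fragment below. The macro names and token tuples are hypothetical.

```
Definitions.

L = [a-z]
U = [A-Z_]
A = [0-9a-zA-Z_]

Rules.

{L}{A}* : {token,{atom,TokenLine,list_to_atom(TokenChars)}}.
{U}{A}* : {token,{var,TokenLine,list_to_atom(TokenChars)}}.
```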

Types

error_info()

The standard error_info/0 structure that is returned from all I/O modules. ErrorDescriptor is formattable by format_error/1.

error_ret()

errors()

leex_ret()

ok_ret()

warnings()

Generated Scanner Exports

string(String)

Equivalent to string(String, 1).

string(String, StartLoc)

Scans String and returns either all the tokens in it or an error tuple.

token(Cont, Chars)

Equivalent to token(Cont, Chars, 1).

token(Cont, Chars, StartLoc)

This is a re-entrant call to try and scan a single token from Chars.

tokens(Cont, Chars)

Equivalent to tokens(Cont, Chars, 1).

tokens(Cont, Chars, StartLoc)

This is a re-entrant call to try and scan tokens from Chars.

Functions

file(FileName)

Equivalent to file(File, []).

file(FileName, Options)

Generates a lexical analyzer from the definition in the input file.

format_error(ErrorDescriptor)

Returns a descriptive string in English of an error reason ErrorDescriptor returned by leex:file/1,2 when there is an error in a regular expression.

error_info() (not exported)

-type error_info() :: {erl_anno:line() | none, module(), ErrorDescriptor :: term()}.

The standard error_info/0 structure that is returned from all I/O modules. ErrorDescriptor is formattable by format_error/1.

error_ret() (not exported)

-type error_ret() :: error | {error, Errors :: errors(), Warnings :: warnings()}.

errors() (not exported)

-type errors() :: [{file:filename(), [error_info()]}].

leex_ret() (not exported)

-type leex_ret() :: ok_ret() | error_ret().

ok_ret() (not exported)

-type ok_ret() :: {ok, Scannerfile :: file:filename()} | {ok, Scannerfile :: file:filename(), warnings()}.

warnings() (not exported)

-type warnings() :: [{file:filename(), [error_info()]}].

string(String)

-spec string(String) -> StringRet when String :: string(), StringRet :: {ok, Tokens, EndLoc} | ErrorInfo, Tokens :: [Token], Token :: term(), ErrorInfo :: {error, error_info(), erl_anno:location()}, EndLoc :: erl_anno:location().

Equivalent to string(String, 1).

string(String, StartLoc)

-spec string(String, StartLoc) -> StringRet when String :: string(), StringRet :: {ok, Tokens, EndLoc} | ErrorInfo, Tokens :: [Token], Token :: term(), ErrorInfo :: {error, error_info(), erl_anno:location()}, StartLoc :: erl_anno:location(), EndLoc :: erl_anno:location().

Scans String and returns either all the tokens in it or an error tuple.

StartLoc and EndLoc are either erl_anno:line() or erl_anno:location(), depending on the error_location option.

Note

It is an error if not all of the characters in String are consumed.
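A caller typically matches on both shapes of the result. The hypothetical helper below does so for any module Mod that exports string/1 with the contract above, such as a leex-generated scanner. On error it renders the descriptor with the format_error/1 of the module named inside the error_info() tuple, so it does not need to know that module in advance.

```erlang
-module(scan_helper).
-export([scan/2]).

%% Scan Text with Mod:string/1 and normalise the result.
%% Mod is assumed to follow the generated string/1 contract.
scan(Mod, Text) ->
    case Mod:string(Text) of
        {ok, Tokens, _EndLoc} ->
            {ok, Tokens};
        {error, {Loc, ErrMod, Descriptor}, _EndLoc} ->
            %% format_error/1 may return a deep character list.
            {error, Loc, lists:flatten(ErrMod:format_error(Descriptor))}
    end.
```

For a generated scanner called, say, my_scanner (a placeholder name), this would be used as scan_helper:scan(my_scanner, "some input").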

token(Cont, Chars)

-spec token(Cont, Chars) -> {more, Cont1} | {done, TokenRet, RestChars} when Cont :: [] | Cont1, Cont1 :: tuple(), Chars :: string() | eof, RestChars :: string() | eof, TokenRet :: {ok, Token, EndLoc} | {eof, EndLoc} | ErrorInfo, ErrorInfo :: {error, error_info(), erl_anno:location()}, Token :: term(), EndLoc :: erl_anno:location().

Equivalent to token(Cont, Chars, 1).

token(Cont, Chars, StartLoc)

-spec token(Cont, Chars, StartLoc) -> {more, Cont1} | {done, TokenRet, RestChars} when Cont :: [] | Cont1, Cont1 :: tuple(), Chars :: string() | eof, RestChars :: string() | eof, TokenRet :: {ok, Token, EndLoc} | {eof, EndLoc} | ErrorInfo, ErrorInfo :: {error, error_info(), erl_anno:location()}, Token :: term(), StartLoc :: erl_anno:location(), EndLoc :: erl_anno:location().

This is a re-entrant call to try and scan a single token from Chars.

If there are enough characters in Chars to either scan a token or detect an error, this will be returned with {done,...}. Otherwise {more,Cont1} will be returned, where Cont1 is used in the next call to token() with more characters to try to scan the token. This is continued until a token has been scanned. Cont is initially [].

It is not designed to be called directly by an application, but is used through the I/O system, where it can typically be called in an application by:

io:request(InFile, {get_until,unicode,Prompt,Module,token,[Loc]}) -> TokenRet

tokens(Cont, Chars)

-spec tokens(Cont, Chars) -> {more, Cont1} | {done, TokensRet, RestChars} when Cont :: [] | Cont1, Cont1 :: tuple(), Chars :: string() | eof, RestChars :: string() | eof, TokensRet :: {ok, Tokens, EndLoc} | {eof, EndLoc} | ErrorInfo, Tokens :: [Token], Token :: term(), ErrorInfo :: {error, error_info(), erl_anno:location()}, EndLoc :: erl_anno:location().

Equivalent to tokens(Cont, Chars, 1).

tokens(Cont, Chars, StartLoc)

-spec tokens(Cont, Chars, StartLoc) -> {more, Cont1} | {done, TokensRet, RestChars} when Cont :: [] | Cont1, Cont1 :: tuple(), Chars :: string() | eof, RestChars :: string() | eof, TokensRet :: {ok, Tokens, EndLoc} | {eof, EndLoc} | ErrorInfo, Tokens :: [Token], Token :: term(), ErrorInfo :: {error, error_info(), erl_anno:location()}, StartLoc :: erl_anno:location(), EndLoc :: erl_anno:location().

This is a re-entrant call to try and scan tokens from Chars.

If there are enough characters in Chars to either scan tokens or detect an error, this will be returned with {done,...}. Otherwise {more,Cont1} will be returned, where Cont1 is used in the next call to tokens() with more characters to try to scan the tokens. This is continued until all tokens have been scanned. Cont is initially [].

This function differs from token in that it continues to scan tokens up to and including an {end_token,Token}, and then returns all the tokens. This is typically used for scanning grammars like Erlang, where there is an explicit end token, '.'. If no end token is found, the whole file will be scanned and returned. If an error occurs, all tokens up to and including the next end token will be skipped.

It is not designed to be called directly by an application, but is used through the I/O system, where it can typically be called in an application by:

io:request(InFile, {get_until,unicode,Prompt,Module,tokens,[Loc]}) -> TokensRet
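The continuation protocol can also be driven by hand, for example when input arrives in pieces. The hypothetical helper below feeds a list of string chunks to Mod:tokens/3, threading the continuation through {more,Cont1} replies, and passes eof once the chunks run out. It is a sketch: Mod stands for any module with the tokens/3 contract above, and chunks left over after the end token are returned unscanned.

```erlang
-module(chunk_scan).
-export([scan_chunks/3]).

%% Feed Chunks (a list of strings) to Mod:tokens/3 until one sequence
%% ending in an end token has been scanned.
%% Returns {TokensRet, RestChars, UnusedChunks}.
scan_chunks(Mod, Chunks, StartLoc) ->
    loop(Mod, [], Chunks, StartLoc).

loop(Mod, Cont, [Chunk | Rest], Loc) ->
    case Mod:tokens(Cont, Chunk, Loc) of
        %% Not enough input yet: keep the continuation and read on.
        {more, Cont1} -> loop(Mod, Cont1, Rest, Loc);
        %% A result (tokens, eof, or an error) is ready.
        {done, Ret, RestChars} -> {Ret, RestChars, Rest}
    end;
loop(Mod, Cont, [], Loc) ->
    %% Input exhausted: pass eof so the scanner can finish.
    case Mod:tokens(Cont, eof, Loc) of
        {done, Ret, RestChars} -> {Ret, RestChars, []};
        {more, _Cont1} -> {error, incomplete_input}
    end.
```

With a generated scanner (placeholder name my_scanner) this would be called as chunk_scan:scan_chunks(my_scanner, Chunks, 1).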

file(FileName)

-spec file(FileName) -> leex_ret() when FileName :: file:filename().

Equivalent to file(File, []).

file(FileName, Options) (since OTP R16B02)

-spec file(FileName, Options) -> leex_ret()
          when FileName :: file:filename(),
               Options :: Option | [Option],
               Option :: {dfa_graph, boolean()}
                       | {includefile, Includefile :: file:filename()}
                       | {report_errors, boolean()}
                       | {report_warnings, boolean()}
                       | {report, boolean()}
                       | {return_errors, boolean()}
                       | {return_warnings, boolean()}
                       | {return, boolean()}
                       | {scannerfile, Scannerfile :: file:filename()}
                       | {verbose, boolean()}
                       | {warnings_as_errors, boolean()}
                       | {deterministic, boolean()}
                       | {error_location, line | column}
                       | {tab_size, pos_integer()}
                       | dfa_graph | report_errors | report_warnings | report
                       | return_errors | return_warnings | return | verbose
                       | warnings_as_errors.

Generates a lexical analyzer from the definition in the input file.

The input file has the extension .xrl. This is added to the filename if it is not given. The resulting module is the Xrl filename without the .xrl extension.

The current options are:

  • dfa_graph - Generates a .dot file which contains a description of the DFA in a format which can be viewed with Graphviz, www.graphviz.com.

  • {includefile,Includefile} - Uses a specific or customised prologue file instead of the default lib/parsetools/include/leexinc.hrl, which is otherwise included.

  • {report_errors, boolean()} - Causes errors to be printed as they occur. Default is true.

  • {report_warnings, boolean()} - Causes warnings to be printed as they occur. Default is true.

  • {report, boolean()} - This is a short form for both report_errors and report_warnings.

  • {return_errors, boolean()} - If this flag is set, {error, Errors, Warnings} is returned when there are errors. Default is false.

  • {return_warnings, boolean()} - If this flag is set, an extra field containing Warnings is added to the tuple returned upon success. Default is false.

  • {return, boolean()} - This is a short form for both return_errors and return_warnings.

  • {scannerfile, Scannerfile} - Scannerfile is the name of the file that will contain the generated Erlang scanner code. The default ("") is to add the extension .erl to FileName stripped of the .xrl extension.

  • {verbose, boolean()} - Outputs information from parsing the input file and generating the internal tables.

  • {warnings_as_errors, boolean()} - Causes warnings to be treated as errors.

  • {deterministic, boolean()} - Causes generated -file() attributes to only include the basename of the file path.

  • {error_location, line | column} - If set to column, the error location will be a {Line,Column} tuple instead of just Line. Also, StartLoc and EndLoc in the string/2, token/3, and tokens/3 functions will be {Line,Column} tuples instead of just Line. Default is line. Note that you can use TokenLoc for the token location independently, even if error_location is set to line.

    A Unicode character is counted as one column per byte used to represent it.

  • {tab_size, pos_integer()} - Sets the width of the \t character (only relevant if error_location is set to column). Default is 8.

Any of the Boolean options can be set to true by stating the name of the option. For example, verbose is equivalent to {verbose, true}.

Leex will add the extension .hrl to the Includefile name and the extension .erl to the Scannerfile name, unless the extension is already there.
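A sketch of calling file/2 with some of these options: return together with {report, false} makes leex hand back errors and warnings as data instead of printing them. The module name leex_build is hypothetical. Each diagnostic is an error_info() tuple, and its descriptor is rendered with the format_error/1 of the module it names.

```erlang
-module(leex_build).
-export([build/1]).

%% Generate a scanner from XrlFile, returning diagnostics as strings
%% instead of printing them.
build(XrlFile) ->
    case leex:file(XrlFile, [return, {report, false}]) of
        {ok, ErlFile, Warnings} ->
            {ok, ErlFile, messages(Warnings)};
        {error, Errors, Warnings} ->
            {error, messages(Errors ++ Warnings)}
    end.

%% Errors and warnings have the shape [{File, [{Loc, Module, Descriptor}]}].
messages(Diagnostics) ->
    [lists:flatten(Mod:format_error(Descriptor))
     || {_File, Infos} <- Diagnostics,
        {_Loc, Mod, Descriptor} <- Infos].
```

Because return is set, success yields a three-element tuple that includes the warnings, matching the {ok, Scannerfile, warnings()} form of ok_ret().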

format_error(ErrorDescriptor)

-spec format_error(ErrorDescriptor) -> io_lib:chars() when ErrorDescriptor :: term().

Returns a descriptive string in English of an error reason ErrorDescriptor returned by leex:file/1,2 when there is an error in a regular expression.
