FSharpCompiler.Lex 1.0.6

Suggested Alternatives

FslexFsyacc

Additional Details

Replace this package with the FslexFsyacc package, which implements all the features of this package and is easier to use.

.NET CLI
dotnet add package FSharpCompiler.Lex --version 1.0.6

Package Manager
NuGet\Install-Package FSharpCompiler.Lex -Version 1.0.6
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference
<PackageReference Include="FSharpCompiler.Lex" Version="1.0.6" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI
paket add FSharpCompiler.Lex --version 1.0.6

Script & Interactive
#r "nuget: FSharpCompiler.Lex, 1.0.6"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

Cake
// Install FSharpCompiler.Lex as a Cake Addin
#addin nuget:?package=FSharpCompiler.Lex&version=1.0.6

// Install FSharpCompiler.Lex as a Cake Tool
#tool nuget:?package=FSharpCompiler.Lex&version=1.0.6

A generator for lexical analyzers. Usage:

Lex.generateDFA: input:string -> DFA<string>

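The input text is a set of grouping patterns written as regular expressions. The function returns the compiled deterministic finite automaton (DFA), which is then used to build the lexical analyzer. The DFA type is defined as follows: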

type DFA<'tag when 'tag: comparison> =
    {
        transitions:Set<uint32*'tag*uint32>
        finalLexemes:(Set<uint32>*Set<uint32>) list
    }

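'tag is the classification data of an input token, usually the tag of a discriminated union, represented as a string. The member transitions is the exhaustive transition table of the DFA: each entry says that, on seeing a tag, the automaton moves from one state to another. For each element of the finalLexemes list, the first component is the set of accepting states; the second component is the set of states that, when lookahead is involved, mark where input is handed back to the input sequence.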
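As a rough illustration of how these two members might be read, the sketch below builds a tiny DFA by hand and queries it. It assumes each transition triple is ordered (source state, tag, target state), which the description above does not spell out. tryNext, isFinal, and sample are hypothetical names for this example, not part of the package.

```fsharp
// Same shape as the DFA record defined above, repeated so the sketch is self-contained.
type DFA<'tag when 'tag: comparison> =
    {
        transitions: Set<uint32 * 'tag * uint32>
        finalLexemes: (Set<uint32> * Set<uint32>) list
    }

// Hypothetical helper: follow one transition, ASSUMING (source, tag, target) order.
let tryNext (dfa: DFA<'tag>) (state: uint32) (tag: 'tag) : uint32 option =
    dfa.transitions
    |> Seq.tryPick (fun (src, t, dst) ->
        if src = state && t = tag then Some dst else None)

// Hypothetical helper: is the state in the accepting set of any finalLexemes entry?
let isFinal (dfa: DFA<'tag>) (state: uint32) : bool =
    dfa.finalLexemes
    |> List.exists (fun (finals, _) -> Set.contains state finals)

// A hand-written automaton that accepts the tag sequence "a" "b" (no lookahead).
let sample : DFA<string> =
    {
        transitions = set [ (0u, "a", 1u); (1u, "b", 2u) ]
        finalLexemes = [ (set [ 2u ], Set.empty) ]
    }
```

With this sample, tryNext sample 0u "a" moves to state 1, and state 2 is the only accepting state.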

We start with the input specification.

Input Specification

The Lex/Flex input file consists of 2 sections, separated by a line with just %% in it:

definitions section
%%
rules section

The first section contains Lex definitions used in the regular expressions. The “%%” marks the beginning of the rules. The second section of the Lex file gives the regular expressions for each token to be recognized.
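The two-section layout can be sketched as a small splitting function. This is only an illustration of the file format, not the package's own parser; splitSections is a hypothetical name, and the sketch does not handle comments.

```fsharp
// Hypothetical helper: split lex input into (definition lines, rule lines)
// at the first line that contains only %%. Blank lines are dropped.
let splitSections (input: string) : string[] * string[] =
    let lines = input.Replace("\r\n", "\n").Split('\n')
    match lines |> Array.tryFindIndex (fun line -> line.Trim() = "%%") with
    | None -> failwith "expected a line containing only %%"
    | Some i ->
        let nonBlank (part: string[]) =
            part |> Array.filter (fun line -> line.Trim() <> "")
        nonBlank (Array.take i lines), nonBlank (Array.skip (i + 1) lines)
```

For example, splitSections "Boolean = true | false\n%%\nnull | {Boolean}" yields one definition line and one rule line.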

1 The Definitions Section

The definitions section contains declarations of simple name definitions to simplify the scanner specification. Name definitions have the form:

name = definition

The “name” is a sequence of alphanumeric and underscore characters (\w+). The definition begins at the first non-whitespace character following the equals sign and continues to the end of the line. The definition can subsequently be referred to using “{name}”, which expands to “(definition)”. For example,

Boolean = true | false
NullableBoolean = null | {Boolean}

NullableBoolean is identical to

NullableBoolean = null | ( true | false )

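A name definition statement is also called a regular definition. It must fit on a single line and cannot span lines. Blank lines are ignored. A regular definition cannot contain trailing context; trailing context can only be used in rules.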
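The expansion of {name} into (definition) can be sketched as a textual substitution. This is a simplified illustration and not how the package necessarily implements it; expand is a hypothetical name, and the sketch assumes the definitions do not refer to each other cyclically.

```fsharp
open System.Text.RegularExpressions

// Hypothetical helper: replace each {name} with its parenthesized definition.
// Definitions may reference other definitions; cycles are ASSUMED not to occur.
let rec expand (definitions: Map<string, string>) (pattern: string) : string =
    let replaceReference (m: Match) : string =
        match Map.tryFind m.Groups.[1].Value definitions with
        | Some body -> "(" + expand definitions body + ")"
        | None -> m.Value // unknown names are left untouched in this sketch
    Regex.Replace(pattern, @"\{(\w+)\}", MatchEvaluator(replaceReference))
```

With Boolean defined as true | false, expanding null | {Boolean} gives null | (true | false), which matches the NullableBoolean example up to whitespace.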

2 The Rules Section

The rules section of the Lex/Flex input contains a series of rules of the form:

pattern

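There is one pattern per line, and a pattern cannot span lines. Blank lines are ignored. If needed, a pattern can be further qualified with trailing context.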

Lex/Flex Patterns

The patterns in the input are written using an extended set of regular expressions. These are:

`x` 

match the token ‘x’

`[xyz]` 

a “token class”; in this case, the pattern matches either an ‘x’, a ‘y’, or a ‘z’

`r*` 

zero or more r’s, where r is any regular expression

`r+` 

one or more r’s

`r?` 

zero or one r’s (that is, “an optional r”)

`{name}` 

the expansion of the “name” definition (see above)

`(r)` 

match an r; parentheses are used to override precedence (see below)

`rs` 

the regular expression r followed by the regular expression s; called “concatenation”

`r|s` 

either an r or an s

`r/s` 

an r, but only if it is followed by an s. The s is not part of the matched sequence. This type of pattern is called “trailing context”.

The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom.

How the Input is Matched

When the generated scanner is run, it analyzes its input looking for sequences that match any of its patterns. If it finds more than one match, it takes the one matching the most tokens (for trailing context rules, this includes the length of the trailing part, even though it will then be returned to the input). If it finds two or more matches of the same length, the rule listed first in the Lex/Flex input file is chosen.

If no match is found, an exception is thrown. You must explicitly cover every case.
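The selection rule above (longest match wins, ties go to the earlier rule, no match raises an exception) can be sketched as follows. The candidate representation and the name chooseMatch are assumptions made for this example; the package's scanner works on the DFA rather than on an explicit candidate list.

```fsharp
// Each candidate is (rule index in the input file, number of tokens matched).
// Hypothetical helper: pick the winning candidate according to the rule above.
let chooseMatch (candidates: (int * int) list) : int * int =
    match candidates with
    | [] -> failwith "no rule matches the input"
    | _ ->
        // Longest match first; among equal lengths, the rule listed first wins.
        candidates
        |> List.minBy (fun (ruleIndex, length) -> (-length, ruleIndex))
```

For instance, among rule 0 matching 2 tokens and rules 1 and 2 each matching 3 tokens, rule 1 is chosen.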

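The shape of the data returned to the program when a match succeeds.

Notes on the lex file:

Identifiers are case-sensitive.

Whitespace matching the regular expression (\s+) is ignored.

Single-line comments are not supported. Multi-line comments (/* .*? */) are supported and ignored; they cannot be nested.

A token literal is a JSON string literal enclosed in double quotes. A token cannot be the empty string "", but it can be any non-empty string. When a token matches the regular expression \w+, the enclosing quotes can be omitted.

Uses of lex:

Merging several elements of bounded depth: for example, an Excel reference such as sheet!x1 consists of three parts, each of which is a leaf node. Similarly, a reference to a regular definition, {id}, consists of three parts, each of which is a leaf node.

Lexical analysis amounts to one bottom-up merge pass over the expression. The operands must be tokens at the lowest level.

Supplying omitted symbols or values: filling in omitted symbols reduces ambiguity and makes parsing easier. For example, JavaScript inserts semicolons at line ends, and the omitted multiplication signs in a polynomial can be filled in. The character concatenation operator of regular expressions is written as the @ operator.

Do not push all the complexity onto the parser. Produce a more structured sequence at the tokenization stage to make parsing easier.

Do not use whitespace as an operator. For example, two adjacent characters ab denote concatenation, so we can insert the concatenation operator between them: a@b.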
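Filling in the implicit concatenation operator can be sketched as a pass over a token list. This is a deliberately simplified illustration: it treats any token starting with a letter or digit as an operand and ignores parentheses and postfix operators. insertConcat is a hypothetical name.

```fsharp
// Hypothetical helper: insert "@" between two adjacent operand tokens.
// Simplification: an operand is any token that starts with a letter or digit.
let insertConcat (tokens: string list) : string list =
    let isOperand (t: string) = t.Length > 0 && System.Char.IsLetterOrDigit t.[0]
    let rec loop acc rest =
        match rest with
        | a :: (b :: _ as tail) when isOperand a && isOperand b ->
            loop ("@" :: a :: acc) tail
        | a :: tail -> loop (a :: acc) tail
        | [] -> List.rev acc
    loop [] tokens
```

Applied to the tokens a, b, |, c, this produces a, @, b, |, c, so the later parsing stage only ever sees explicit operators.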

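The lexical analyzer is a function. It returns the input token sequence regrouped into a sequence of groups. Each group contains one or more input tokens, and each group is itself a lexical token.

Regular expressions describe the arrangement of tokens.

Token is a discriminated union, with a tag and payload data.

The patterns overlap with context-free grammars: they can describe linearly repeating structures, but not unboundedly nested structures.

The input to the lexical analyzer is a stream of tokens, and the output is a stream of patterns. Each pattern matches one or more consecutive tokens and is represented as a subarray of tokens.

A token is a discriminated union. Its tag is used for pattern recognition by the lexical analyzer; the payload data does not affect pattern recognition.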
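To make the tag and payload distinction concrete, here is a minimal sketch. The Token cases and the getTag function are invented for illustration; a real application would define its own union and its own mapping to the string tags used in the lex file.

```fsharp
// A hypothetical token type: each case is a tag, and some cases carry a payload.
type Token =
    | ID of string
    | NUMBER of float
    | PLUS

// Hypothetical helper: map a token to the string tag that patterns refer to.
// The payload is ignored, so ID "x" and ID "y" look identical to the scanner.
let getTag (token: Token) : string =
    match token with
    | ID _ -> "ID"
    | NUMBER _ -> "NUMBER"
    | PLUS -> "+"

// The tag sequence that a pattern such as ID "+" NUMBER would be matched against.
let tags = [ ID "x"; PLUS; NUMBER 1.0 ] |> List.map getTag
```

Here tags is the list "ID", "+", "NUMBER", regardless of which identifier or number the tokens carry.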

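Generating the Scanner

Save the lex input to any text file. Assuming the file path is stored in the variable filePath:

open System.IO

// Read the whole lex specification, then compile it to a DFA.
let text = File.ReadAllText(filePath)
let dfa = Lex.generateDFA text

Lex.generateDFA returns a value of type DFA<'tag>, which is the DFA data that the scanner requires.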

Product Compatible and additional computed target framework versions.
.NET net5.0 was computed.  net5.0-windows was computed.  net6.0 was computed.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 was computed.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 
.NET Core netcoreapp2.0 was computed.  netcoreapp2.1 was computed.  netcoreapp2.2 was computed.  netcoreapp3.0 was computed.  netcoreapp3.1 was computed. 
.NET Standard netstandard2.0 is compatible.  netstandard2.1 was computed. 
.NET Framework net461 was computed.  net462 was computed.  net463 was computed.  net47 was computed.  net471 was computed.  net472 was computed.  net48 was computed.  net481 was computed. 
MonoAndroid monoandroid was computed. 
MonoMac monomac was computed. 
MonoTouch monotouch was computed. 
Tizen tizen40 was computed.  tizen60 was computed. 
Xamarin.iOS xamarinios was computed. 
Xamarin.Mac xamarinmac was computed. 
Xamarin.TVOS xamarintvos was computed. 
Xamarin.WatchOS xamarinwatchos was computed. 


update Dependencies