江天一色: Parsing with Rebol's parse

parse does top-down parsing and is implemented as a Rebol dialect. It can be used in place of regular expressions (regex).

Notes on A Parse Tutorial Sort of (Open sourced Rebol)

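Example	Description
`help sys/*parse-url/rules`	URL parsing rules
`parse input-string [ opt "big" "bird" ]`	opt marks an optional item; the opt rule itself always succeeds
`parse input-string [ "black" space "dog" ]`	space is a predefined word for the space character; newline, tab and others are also available
`split "brown dog" " "`	splits a string; the result is `[ "brown" "dog" ]`
`parse/case "ZZ" [ 2 "Z" ]`	/case makes matching case-sensitive; the default is case-insensitive
`parse {1234567890} [ "123" 5 skip "90" end ]`	`5 skip` skips 5 characters
`parse "bird" [ not "big" "bird" ]`	not succeeds only if its rule does not match, and consumes no input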
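A minimal sketch that runs several rows of the table, assuming Rebol 3 semantics (parse on a string does not skip whitespace, is case-insensitive by default, and returns true only when the whole input is consumed). The trailing comments show what each line should print under those assumptions.

```rebol
REBOL [Title: "Basic parse rules (sketch)"]

; opt: "big" may be absent; opt itself never fails
print parse "bird" [opt "big" "bird"]             ; true
print parse "bigbird" [opt "big" "bird"]          ; true

; space is a predefined word holding the character #" "
print parse "black dog" ["black" space "dog"]     ; true

; matching is case-insensitive unless /case is used
print parse "zz" [2 "Z"]                          ; true
print parse/case "zz" [2 "Z"]                     ; false

; 5 skip consumes exactly five characters ("45678")
print parse "1234567890" ["123" 5 skip "90" end]  ; true

; not succeeds only when its rule fails, and consumes nothing
print parse "bird" [not "big" "bird"]             ; true
print parse "big" [not "big" "bird"]              ; false
```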

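Parsing blocks

When the input is a block rather than a string, parse matches Rebol values, so datatypes such as date! and time! can be used as rules.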

parse [ 12/Dec/2012 2:30pm ] [ date! time! ]

parse [ <div> "Hello" http://rebol.com $1.00 </div> bob@test.com ] [ tag! "Hello" url! money! tag! email! ]

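Charsets (charset)

A charset is a character set stored as a bitset, so matching against it is fast.

The usual set operations work on charsets: union, intersect (intersection), exclude (difference) and complement.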

>> digit: charset [#"0" - #"9"]
>> parse {2069} [4 digit]
== true

You can also add members, for example adding "." to the digit set: digit-dot: insert copy digit "." (union digit charset "." builds the same set).
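A short sketch of the set operations on charsets, assuming Rebol 3. The names hex-letter, low-digit, hex-digit, high-digit, both and digit-dot are made up for illustration.

```rebol
REBOL [Title: "Charset set operations (sketch)"]

digit: charset [#"0" - #"9"]
hex-letter: charset [#"a" - #"f" #"A" - #"F"]
low-digit: charset [#"0" - #"4"]

hex-digit: union digit hex-letter        ; union: 0-9, a-f, A-F
high-digit: exclude digit low-digit      ; difference: 5-9
both: intersect hex-digit digit          ; intersection: 0-9 again
digit-dot: union digit charset "."       ; digits plus "."

print parse "ff80" [some hex-digit]      ; true
print parse "57" [some high-digit]       ; true
print parse "53" [some high-digit]       ; false
print parse "4f" [some both]             ; false
print parse "3.14" [some digit-dot]      ; true
```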

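copy

Note: what copy stores in its word (the first argument) depends on what the rule after it (the second argument) matched. Here skip matches a single character, so some-text gets "1".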

>> parse {123} [copy some-text skip to end]
== true
>> some-text
== "1"

set is used like copy, but stores the first matched value itself instead of a copy of the matched input:

>> parse [ $100 ] [set wallet money!]
== true
>> wallet
== $100
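To make the difference between copy and set concrete, here is a small sketch on a block input, assuming Rebol 3. The variable names are illustrative only.

```rebol
REBOL [Title: "copy vs set on a block (sketch)"]

; copy stores the matched part as a new block, even for one value
parse [1 2 3] [copy got-copy integer! to end]
print mold got-copy      ; [1]

; set stores the matched value itself
parse [1 2 3] [set got-set integer! to end]
print mold got-set       ; 1

; copy can also span several values
parse [1 2 3] [copy first-two 2 integer! to end]
print mold first-two     ; [1 2]
```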

Parenthesized code runs as soon as parsing reaches it, even if a later rule fails:

>> parse {123} [ "1" (print "found 1!") "A" end ]
found 1!
== false

See the sell/cost total-price parsing example in R3 Advanced Parse.
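The sell/cost example itself is not reproduced here. As a rough sketch of the same idea (parse each record, compute, sum), assuming Rebol 3 and a hypothetical record format of sell or cost followed by a quantity and a unit price:

```rebol
REBOL [Title: "Summing sell/cost records (sketch, hypothetical format)"]

; hypothetical format: sell or cost, then quantity, then unit price
records: [sell 3 $10.00 cost 2 $4.00 sell 1 $5.00]

total: $0
qty: price: none
record: [
    'sell set qty integer! set price money! (total: total + (price * qty))
    | 'cost set qty integer! set price money! (total: total - (price * qty))
]

either parse records [some record] [
    print ["total:" total]
][
    print "malformed record"
]
```

Keeping the arithmetic in parens next to each alternative means a malformed record makes the whole parse fail, so the else branch can report it.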

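while

An infinite loop: parse input-string [ while [ any "dog" ] ]

while stops looping when its subrule fails; while itself always succeeds. The loop above never stops because any "dog" also succeeds when it matches nothing, so the subrule never fails.

opt is similar: it always succeeds.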
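A small sketch of while and opt on inputs that do terminate, assuming Rebol 3:

```rebol
REBOL [Title: "while and opt (sketch)"]

; while repeats its subrule until it fails, then succeeds itself
print parse "dogdogdog" [while "dog"]    ; true
print parse "" [while "dog"]             ; true (zero rounds is fine)

; opt matches once or not at all, and never fails
print parse "cat" [opt "dog" "cat"]      ; true

; parse can still fail after while succeeds: "cat" is left over
print parse "dogcat" [while "dog"]       ; false
```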

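break stops the enclosing loop rule (some, any or while) and counts as success. Below, the loop stops at the word end; since 3 4 5 are never consumed, parse returns false.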

parse [ 1 2 end 3 4 5 ] [ some [ integer! | 'end break ] ]

Debugging with `??`

>> parse "dog" [ "d" ?? "o" ?? "g" ]
"o": "og"
"g": "g"
== true

Words other than `|`

word-except-bar matches any word except |; it is built with and:

single-word: [ set item word! ]
word-except-bar: [ and not '| single-word ]
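A usage sketch, assuming Rebol 3 (where and is the look-ahead keyword): collect every word of a block except the bar. The words list and the sample block are illustrative.

```rebol
REBOL [Title: "Collecting words but not | (sketch)"]

item: none
words: copy []
single-word: [set item word!]
word-except-bar: [and not '| single-word]

; the | inside the rule is alternation; '| matches the bar word in the data
parse [alpha beta | gamma] [
    some [word-except-bar (append words item) | '|]
]
print mold words    ; [alpha beta gamma]
```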

Advanced examples

A product income/expense parser: parse each record, compute, and sum
A parser for Rebol/View VID blocks
parse-analysis.r
load-parse-tree.r

Notes on REBOL 3 Concepts: Parsing

parse series rules

When series is a string, it is parsed character by character.

When series is a block, it is parsed value by value.

Parsing nested blocks with into

Example: extract "Ukiah" and 10:30 (the nested block) into the variable info

rule: [
    set date date!
    set info into [string! time!]
]
data: [10-Jan-2000 ["Ukiah" 10:30]]
print parse data rule

print info
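If you want the inner values rather than the whole nested block, you can set them inside into. A sketch assuming Rebol 3, with illustrative variable names:

```rebol
REBOL [Title: "into with inner captures (sketch)"]

data: [10-Jan-2000 ["Ukiah" 10:30]]
rule: [
    set day date!
    ; into descends into the nested block and parses it with its own rule
    into [set place string! set at-time time!]
]
print parse data rule        ; true
print [day place at-time]    ; 10-Jan-2000 Ukiah 10:30
```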

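Matching text: copy text to

to advances the input to the start of the given string (without consuming it)
thru advances the input past the end of the given string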

page: read http://www.rebol.com/
parse page [thru <title> copy text to </title>]
print text
; REBOL Technologies
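The same extraction can be tried offline on a literal string, which also shows the difference between to and thru. This sketch assumes Rebol 3 and uses plain strings instead of tag! values; page, text and with-tag are illustrative names.

```rebol
REBOL [Title: "to vs thru (sketch)"]

page: {<html><title>REBOL Technologies</title></html>}

; thru moves past "<title>"; to stops just before "</title>"
parse page [thru "<title>" copy text to "</title>" to end]
print text                          ; REBOL Technologies

; the same copy with thru also includes the closing tag
parse page [thru "<title>" copy with-tag thru "</title>" to end]
print with-tag                      ; REBOL Technologies</title>
```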

Replacing text

Use change/part to replace the title:

parse page [
    thru <title> begin: to </title> ending:
    (change/part begin "Word Reference Guide" ending)
]

Use change to replace every ? with !

str: "Where is the turkey? Have you seen the turkey?"
parse str [some [to "?" mark: (change mark "!") skip]]
print str
; Where is the turkey! Have you seen the turkey!

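Use remove / insert / :mark to replace "time" with the current time

mark (a plain word) is replaced by the value the variable holds, which is then used as a rule

mark: sets mark to the current input position

:mark moves the input position to the position stored in mark

See chapter 15 - Parsing

First "123" matches and mark points at "4"; the paren then moves mark two positions ahead, to "6"; :mark moves the input there, and "67" matches.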

>> parse {1234567} ["123" mark: (mark: next next mark) :mark "67"]
== true

str: "at this time, I'd like to see the time change"
parse str [
    some [to "time"
        mark:
        (remove/part mark 4  mark: insert mark now/time)
        :mark
    ]
]
print str
; at this 14:42:12, I'd like to see the 14:42:12 change

Appending the positions of matches to a block!

page: read http://www.rebol.com/index.html
tables: make block! 20
parse page [
    any [to "<table" mark: thru ">"
        (append tables index? mark)
    ]
]

foreach table tables [
    print ["table found at index:" table]
]
; table found at index: 836
; table found at index: 2076
; table found at index: 3747

Wrapping the parser in an object

Extract in a loop, appending tags to a block and text to a string

tag-parser: make object! [
    tags: make block! 100
    text: make string! 8000
    html-code: [
        copy tag ["<" thru ">"] (append tags tag) |
        copy txt to "<" (append text txt)
    ]
    parse-tags: func [site [url!]] [
        clear tags clear text
        parse read site [to "<" some html-code]
        foreach tag tags [print tag]
        print text
    ]
]
tag-parser/parse-tags http://www.rebol.com
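The object above reads from the network. For testing, here is a variant that takes a string instead, assuming Rebol 3; string-tag-parser and parse-string are hypothetical names. tag and txt are declared as fields so that copy sets the object's words rather than global ones.

```rebol
REBOL [Title: "Tag parser object on a string (sketch)"]

string-tag-parser: make object! [
    tags: make block! 100
    text: make string! 8000
    tag: txt: none              ; declared here so copy sets object fields
    html-code: [
        copy tag ["<" thru ">"] (append tags tag)
        | copy txt to "<" (append text txt)
    ]
    parse-string: func [html [string!]] [
        clear tags clear text
        parse html [to "<" some html-code]
    ]
]

string-tag-parser/parse-string "<b>bold</b> and <i>it</i>"
foreach tag string-tag-parser/tags [print tag]
print string-tag-parser/text    ; bold and it
```

Note that text after the last tag is dropped, because copy txt to "<" needs a following "<" to succeed.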

Recursive rules

REBOL 3 Concepts: Parsing: Recursive Rules

An implementation of the four basic arithmetic operations: short, clear, and elegant!
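A sketch of such a grammar, in the spirit of that example rather than a copy of it, assuming Rebol 3. It only checks whether a string is a well-formed expression (digits, + - * /, ** and parentheses, no spaces); it does not compute a value.

```rebol
REBOL [Title: "Recursive arithmetic grammar (sketch)"]

digit: charset "0123456789"
; forward references are fine: words are looked up only when parse runs
expr:    [term ["+" | "-"] expr | term]
term:    [factor ["*" | "/"] term | factor]
factor:  [primary "**" factor | primary]
primary: [some digit | "(" expr ")"]

print parse "1+2*(3-4)" expr     ; true
print parse "2**3**2" expr       ; true
print parse "1+" expr            ; false
print parse "(1+2" expr          ; false
```

Each level calls the next-tighter level, so operator precedence falls out of the rule structure rather than needing explicit precedence tables.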

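Repetition counts

none matches nothing: it always succeeds without consuming input
some matches one or more times
any matches zero or more times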

[3 "a" 2 "b"]      ; matches: aaabb
[1 3 "a" "b"]      ; matches: ab aab aaab
[some "a" "b"]     ; matches: ab aab aaab aaaab ...
[any "a" "b"]      ; matches: b ab aab aaab aaaab ...

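A quick check of the counts above, assuming Rebol 3:

```rebol
REBOL [Title: "Repetition counts (sketch)"]

checks: [
    [3 "a" 2 "b"]   "aaabb"    ; true
    [1 3 "a" "b"]   "aaab"     ; true
    [1 3 "a" "b"]   "aaaab"    ; false: at most 3 "a"
    [some "a" "b"]  "b"        ; false: some needs at least one "a"
    [any "a" "b"]   "b"        ; true: any accepts zero
]
foreach [rule sample] checks [
    print [mold rule mold sample parse sample rule]
]
```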

Replacing text: change/remove/insert

parse page [
    thru <title> begin: to </title> ending:
    (change/part begin "Word Reference Guide" ending)
]
parse page [thru <title> copy text to </title>]
print text
; Word Reference Guide


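Splitting strings (split)

In Rebol 2, parse with a string rule (or none) splits the input. By default it splits at whitespace (space, tab, newline and other invisible characters), commas and semicolons.

parse/all does not split at whitespace, but still splits at ; and ,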

parse "here there,everywhere; ok" none
; ["here" "there" "everywhere" "ok"]

parse "707-467-8000" "-"
; ["707" "467" "8000"]

parse/all "Harry, 1011 Main St., Ukiah" ","
; ["Harry" " 1011 Main St." " Ukiah"]

parse "Harry, 1011 Main St., Ukiah" ","
; ["Harry" "1011" "Main" "St." "Ukiah"]

parse "red#blue*green" "#*"
; == ["red" "blue" "green"]
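This splitting mode of parse is Rebol 2 behavior. Assuming Rebol 3, where that mode was removed, split covers the same cases; a charset serves as a multi-character delimiter.

```rebol
REBOL [Title: "split in Rebol 3 (sketch)"]

print mold split "707-467-8000" "-"                   ; ["707" "467" "8000"]
print mold split "Harry, 1011 Main St., Ukiah" ","    ; ["Harry" " 1011 Main St." " Ukiah"]
print mold split "red#blue*green" charset "#*"        ; ["red" "blue" "green"]
```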

Character sets

; complement
spacer: charset reduce [tab newline #" "]
non-space: complement spacer

; union
digit: charset [#"0" - #"9"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
alphanum: union alpha digit
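A sketch of the complement in use, assuming Rebol 3: non-space splits a string into words separated by any mix of spaces, tabs and newlines. words and w are illustrative names.

```rebol
REBOL [Title: "Tokenizing with a complemented charset (sketch)"]

spacer: charset reduce [tab newline #" "]
non-space: complement spacer

words: copy []
; ^- is a tab and ^/ a newline inside Rebol strings
parse "one two^-three^/four" [
    any [copy w some non-space (append words w) | some spacer]
]
print mold words    ; ["one" "two" "three" "four"]
```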

Elements of rules

REBOL 3 Concepts: Parsing: Summary of Parse Operations

A set of summary tables, kept for reference.

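Notes on REBOL Programming/Language Features/Parse/Parse expressions

Rebol's parse is a top-down parser in the TDPL (Top-Down Parsing Language) family

A parse expression is written as a block; when it matches, the input position advances

parse works in two modes:

parsing a string: terminal symbols are characters
parsing a block: terminal symbols are Rebol values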

NONE (empty match)

parse "" [#[none]]
; == true
parse [] [#[none]]
; == true

Character

parse "a" [#"a"]
; == true

Inside a parse rule block, `()` runs code

Example: print "great job" three times

rule: [
    set count integer!
    set str string!
    (loop count [print str])
]
parse [3 "great job"] rule

A word followed by `:` stores the current position; its value is the input from that position to the end

>> parse "123" [ "1" mark: to end ]
== true
>> mark
== "23"

Blocks (grouping)

Sequence binds tighter than alternation: e1 e2 | e3 is the same as [ e1 e2 ] | e3
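A sketch that checks this precedence on three inputs, assuming Rebol 3; flat, grouped and other are illustrative names. If | bound tighter, flat would behave like other and accept "ac".

```rebol
REBOL [Title: "Sequence binds tighter than | (sketch)"]

flat:    ["a" "b" | "c"]        ; read as [["a" "b"] | "c"]
grouped: [["a" "b"] | "c"]
other:   ["a" ["b" | "c"]]      ; a different grammar

foreach sample ["ab" "c" "ac"] [
    print [mold sample parse sample flat parse sample grouped parse sample other]
]
; "ab" true true true
; "c" true true false
; "ac" false false true
```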

Recursive rules

anbn matches n "a" characters followed by n "b" characters, for n of 1 or more:

anbn: [ "a" anbn "b" | "ab" ]
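Trying anbn on a few inputs, assuming Rebol 3:

```rebol
REBOL [Title: "Recursive rule: n a's followed by n b's (sketch)"]

anbn: ["a" anbn "b" | "ab"]

foreach sample ["ab" "aabb" "aaabbb" "aab" "abab" ""] [
    print [mold sample parse sample anbn]
]
; true for "ab" "aabb" "aaabbb"; false for "aab" "abab" ""
```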

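A table of parse idioms

How to write parse expressions more concisely (a key point)

See parseen.r

; two definitions of a that match the same input; the charset form is more compact
a: charset ",;"
a: [ #"," | #";" ]

; [m n b] (b between m and n times) rewritten with fixed counts, fail and :c backtracking
a: [m n b]
a: [(l: min m n k: n - m) l b [k [b | c: fail] | :c]]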

Using local variables

use-rule.r

evaluate.r

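Use change / insert / remove sparingly

because they are slow: each edit modifies the series in place, shifting everything after the edit point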
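One way to avoid editing the input in place is to build a new string while parsing. A sketch assuming Rebol 3; out and part are illustrative names, and for this particular case replace/all would do the same job.

```rebol
REBOL [Title: "Building a new string instead of editing in place (sketch)"]

str: "Where is the turkey? Have you seen the turkey?"
out: make string! length? str
part: none
parse str [
    ; copy each piece before "?", then emit "!" in its place
    any [copy part to "?" (append out any [part ""]) skip (append out "!")]
    ; copy whatever follows the last "?" (may be empty)
    copy part to end (append out any [part ""])
]
print out    ; Where is the turkey! Have you seen the turkey!
```

This makes one pass and appends to a preallocated buffer instead of shifting the tail of str on every edit; whether it is measurably faster depends on the input size. str itself is left unchanged.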


Wednesday, February 12, 2025
