parse 支持自顶向下解析,通过rebol的dialect支持实现。可替代正则(regex)
笔记 A Parse Tutorial Sort of (Open sourced Rebol)
例子 | 说明 |
---|---|
help sys/*parse-url/rules |
url解析 |
parse input-string [ opt "big" "bird" ] |
opt 可选项,总是返回success |
parse input-string [ "black" space "dog" ] |
space 表示空格,此外还有newline、tab等关键字 |
split "brown dog" " " |
拆分字符串,结果为 [ "brown" "dog" ] |
parse/case "ZZ" [ 2 "Z" ] |
加case表示区分大小写,默认不区分 |
parse {1234567890} [ "123" 5 skip "90" end ] |
skip跳过5个字符 |
parse "bird" [ not "big" "bird" ] |
not 不匹配 |
解析block
当解析对象是一个block,不是string时,会启动datatype的parse
parse [ 12/Dec/2012 2:30pm ] [ date! time! ]
parse [ <div> "Hello" http://rebol.com $1.00 </div> bob@test.com ] [ tag! "Hello" url! money! tag! email! ]
字符集 charset
charset 是字符集,属于bitset,所以匹配速度较快
可以针对charset做集合常见操作,例如union 并、intersection 交、exclude 差、complement 补。
>> digit: charset [#"0" - #"9"]
>> parse {2069} [4 digit]
== true
还可以增加内容,例如数字集合加一个.
:digit-dot: insert copy digit "."
copy
注意,copy最终写入第1个参数的内容,取决于第2个参数匹配的情况
>> parse {123} [copy some-text skip to end]
== true
>> some-text
== "1"
set 与 copy 用法类似
>> parse [ $100 ] [set wallet money!]
== true
>> wallet
== $100
检查是否匹配时执行括号内相关代码
>> parse {123} [ "1" (print "found 1!") "A" end ]
found 1!
== false
参考 R3 Advanced Parse 中 sell/cost 费用的 total price解析例子
while
无限循环:parse input-string [ while [ any "dog" ] ]
while 内部的 subrule 匹配fail时,while循环停止。while自身的返回状态总是success
。
OPT 也类似,返回状态总是 success
break 终止当前block匹配
parse [ 1 2 end 3 4 5 ] [ some [ integer! | 'end break ] ]
debug用??
>> parse "dog" [ "d" ?? "o" ?? "g" ]
"o": "og"
"g": "g"
== true
不含|
的word
word-except-bar 不含|
的word,用and
组合实现
single-word: [ set item word! ]
word-except-bar: [ and not '| single-word ]
高级例子
- 产品收支的解析器:根据每条记录:解析,计算,求和
- rebol/view的vid block 解析器
- parse-analysis.r
- load-parse-tree.r
笔记 REBOL 3 Concepts: Parsing
parse series rules
当series是一个string,就按character解析
当series是一个block,就按value解析
嵌套block解析,用into
例子:把 “Ukiah”, 10:30 提取到info变量
rule: [
set date date!
set info into [string! time!]]
]
data: [10-Jan-2000 ["Ukiah" 10:30]]
print parse data rule
print info
匹配文本 copy text to
- to 一直跳到指定的字符串的首部
- thru 一直跳到指定的字符串的尾部
page: read http://www.rebol.com/
parse page [thru <title> copy text to </title>]
print text
REBOL Technologies
替换文本
用change/part修改title字段
parse page [
thru <title> begin: to </title> ending:
(change/part begin "Word Reference Guide" ending)
]
用change把?
全换成!
str: "Where is the turkey? Have you seen the turkey?"
parse str [some [to "?" mark: (change mark "!") skip]]
print str
Where is the turkey! Have you seen the turkey!
用remove / insert / :mark
把 time 换成真正的时间
mark
取出对应的变量值
mark:
把mark置为当前的位置
:mark
表示把 mark指向的内容 插入:mark所标记的位置
先匹配123,mark指到4开头,执行括号内容,mark指到6开头,匹配字符串
>> parse {1234567} ["123" mark: (mark: next next mark) :mark "67"]
== true
str: "at this time, I'd like to see the time change"
parse str [
some [to "time"
mark:
(remove/part mark 4 mark: insert mark now/time)
:mark
]
]
print str
at this 14:42:12, I'd like to see the 14:42:12 change
匹配的内容append到block!
page: read http://www.rebol.com/index.html
tables: make block! 20
parse page [
any [to "<table" mark: thru ">"
(append tables index? mark)
]
]
foreach table tables [
print ["table found at index:" table]
]
; table found at index: 836
; table found at index: 2076
; table found at index: 3747
把匹配操作封装成对象
循环提取,append到数组中
tag-parser: make object! [
tags: make block! 100
text: make string! 8000
html-code: [
copy tag ["<" thru ">"] (append tags tag) |
copy txt to "<" (append text txt)
]
parse-tags: func [site [url!]] [
clear tags clear text
parse read site [to "<" some html-code]
foreach tag tags [print tag]
print text
]
]
tag-parser/parse-tags http://www.rebol.com
递归匹配
REBOL 3 Concepts: Parsing: Recursive Rules
一个四则运算的实现,简短,清晰,漂亮!
匹配次数
- none 是不匹配
- some 是1到多次匹配
- any 是0到多次匹配
[3 "a" 2 "b"]
aaabb
[1 3 "a" "b"]
ab aab aaab
[some "a" "b"]
ab aab aaab aaaab
[any "a" "b"]
b ab aab aaab aaaab
[any "a" "b"]
b ab aab aaab aaaab
替换文本 change/remove/insert
parse page [
thru <title> begin: to </title> ending:
(change/part begin "Word Reference Guide" ending)
]
parse page [thru <title> copy text to </title>]
print text
; Word Reference Guide
str: "Where is the turkey? Have you seen the turkey?"
parse str [some [to "?" mark: (change mark "!") skip]]
print str
; Where is the turkey! Have you seen the turkey!
str: "at this time, I'd like to see the time change"
parse str [
some [to "time"
mark:
(remove/part mark 4 mark: insert mark now/time)
:mark
]
]
print str
; at this 14:42:12, I'd like to see the 14:42:12 change
拆分字符串 split
parse 默认自动拆分空格space、制表符tab、换行newline等等不可见字符、逗号comma、分号semicolon
parse/all 不自动空格等字符,会自动拆分;
与,
parse "here there,everywhere; ok" none
["here" "there" "everywhere" "ok"]
parse "707-467-8000" "-"
["707" "467" "8000"]
parse/all "Harry, 1011 Main St., Ukiah" ","
; ["Harry" " 1011 Main St." " Ukiah"]
parse "Harry, 1011 Main St., Ukiah" ","
; ["Harry" "1011" "Main" "St." "Ukiah"]
parse "red#blue*green" "#*"
; == ["red" "blue" "green"]
字符集合
;补集
spacer: charset reduce [tab newline #" "]
non-space: complement spacer
;并集
digit: charset [#"0" - #"9"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
alphanum: union alpha digit
rules的元素组成
REBOL 3 Concepts: Parsing: Summary of Parse Operations
一堆总结列表,备查
笔记 REBOL Programming/Language Features/Parse/Parse expressions
rebol的parse是自顶向下解析,TDPL
解析表达式写成block,如果匹配,就更新input position
parse 有2种情况:
- 解析字符串,terminal symbols are characters
- 解析block, terminal symbols are Rebol values
NONE 空
parse "" [#[none]]
; == true
parse [] [#[none]]
; == true
Character 字符
parse "a" [#"a"]
; == true
在parse的rule block里可以用()
执行代码
例子:打印 3 行 “great job”
rule: [
set count integer!
set str string!
(loop count [print str])
]
parse [3 "great job"] rule
标志后面加:
取出当前位置到末尾的值
>> parse "123" [ "1" mark: to end ]
== true
>> mark
== "23"
解析block
e1 e2 | e3
相当于 [ e1 e2 ] | e3
递归匹配
anbn: [ "a" anbn "b" | "ab" ]
一张parse idioms表格
怎么写parse expression更简洁,重点
参考 parseen.r
a: charset ",;"
a: [ #"," | #";" ]
a: [m n b]
a: [(l: min m n k: n - m) l b [k [b | c: fail] | :c]]
用到local变量
慎用 change / insert / remove
因为慢
没有评论:
发表评论