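Special characters

In strings that are compiled into regular expressions, some characters and character sequences carry special meanings.

To give such a character back its literal meaning, escape it with 【\】. Each 【\】 escapes only the one character that immediately follows it.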

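Category	Symbol	Meaning
Wildcard	.	any character except \n
Character classes	\d	digit
	\D	non-digit
	\s	whitespace
	\S	non-whitespace
	\w	word character
	\W	non-word character
Escaped characters	\t	tab
	\n	newline
	\r	carriage return
	\f	form feed
	\a	bell (alert)
	\e	Escape (not supported by Python's re: compiling r'\e' raises re.error)
Boundary matchers	^	start of the string, or of each line with re.M
	$	end of the string, or of each line with re.M
	\A	start of the string
	\Z	end of the string
	\b	word boundary
	\B	non-word boundary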

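d = digit, s = space, w = word, b = boundary.
Whitespace characters include space, tab, carriage return, form feed, newline and so on.
Word characters are the digits 0-9, the English letters and the underscore. This is the re.ASCII set; by default, str patterns also count other Unicode letters and digits (such as ñ).
Without re.MULTILINE, 【^】 behaves like 【\A】. 【$】 behaves like 【\Z】, except that 【$】 also matches just before a newline that ends the string.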
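The note on 【$】 and 【\Z】 can be checked directly. A minimal sketch using only the standard re module; the sample strings are made up for illustration:

```python
import re

s = 'abc\n'  # a string that ends with a newline

# $ matches at the very end AND just before a final newline
dollar = re.findall(r'c$', s)
# \Z matches only at the very end, so this 'c' (followed by '\n') is not found
big_z = re.findall(r'c\Z', s)

# With re.MULTILINE, ^ matches after every newline; \A still means "start of string"
caret_m = re.findall(r'^\w', 'ab\ncd', re.M)
a_m = re.findall(r'\A\w', 'ab\ncd', re.M)

print(dollar, big_z, caret_m, a_m)
```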

import re

def test(pattern):
    string = 'App 8.0\n'
    a = re.findall(pattern, string)
    print(a)

test(r'\d')  #['8', '0']
test(r'\D')  #['A', 'p', 'p', ' ', '.', '\n']
test(r'\s')  #[' ', '\n']
test(r'\S')  #['A', 'p', 'p', '8', '.', '0']
test(r'\w')  #['A', 'p', 'p', '8', '0']
test(r'\W')  #[' ', '.', '\n']
test('.')    #['A', 'p', 'p', ' ', '8', '.', '0']

test('^A')     #['A']
test(r'\n$')   #['\n']
test(r'\AA')   #['A']
test(r'\n\Z')  #['\n']
test(r'p\b')   #['p']       a 'p' followed by a word boundary; without the r prefix write 'p\\b', because '\b' in a plain string is the backspace character
test(r'\Bp')   #['p', 'p']  a 'p' with no word boundary on its left
test(r'\Bp\B') #['p']       a 'p' with no word boundary on either side
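Why 【p\b】 needs an extra 【\】 comes down to Python's own string escapes, not to re. A small sketch, reusing the sample string from the test above:

```python
import re

text = 'App 8.0\n'

# In a plain string literal, '\b' is the backspace character (U+0008),
# so the regex engine never receives a word-boundary escape.
is_backspace = '\b' == '\x08'
plain = re.findall('p\b', text)      # looks for 'p' followed by a backspace

# A raw string keeps the backslash, so re receives the two characters \ and b.
raw = re.findall(r'p\b', text)       # the second 'p', which is followed by a space
doubled = re.findall('p\\b', text)   # same pattern, backslash escaped by hand

print(is_backspace, plain, raw, doubled)
```

Writing every pattern as a raw string (r'...') avoids this class of surprise.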

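# Escaping special characters

re.escape(pattern)

Purpose: puts 【\】 in front of every character that has a special meaning in a regular expression. (Python 3.3 to 3.6 instead escaped every character other than ASCII letters, digits and the underscore.)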

>>> a = re.escape('1-2')
>>> print(a)
1\-2
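A typical use of re.escape is searching for user-supplied text literally. A minimal sketch; the helper name `find_literal` and the sample text are hypothetical:

```python
import re

def find_literal(needle, haystack):
    """Hypothetical helper: return every occurrence of needle, treated as plain text."""
    return re.findall(re.escape(needle), haystack)

text = '1+1=2, 11=11'
escaped = re.escape('1+1')            # the '+' gets a backslash in front of it
literal = find_literal('1+1', text)   # finds the literal text '1+1'
unescaped = re.findall('1+1', text)   # here '+' acts as a quantifier: "one or more 1s, then 1"

print(escaped, literal, unescaped)
```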

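# Flags

Default behaviour (flags=0)	Flag that changes it	Short form	Inline form
\w, \W, \b, \B, \d, \D, \s, \S match Unicode characters	re.ASCII	re.A	(?a)
Case-sensitive	re.IGNORECASE	re.I	(?i)
^ and $ match only at the start and end of the whole string	re.MULTILINE	re.M	(?m)
【.】 does not match a newline	re.DOTALL	re.S	(?s)
Whitespace and # in the pattern are matched literally (no free layout or comments)	re.VERBOSE	re.X	(?x)
No debug output	re.DEBUG	-	-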

re.findall(r'\w','España')         #['E', 's', 'p', 'a', 'ñ', 'a']
re.findall(r'\w','España',re.A)    #['E', 's', 'p', 'a', 'a']

re.findall('a','Abc')             #[]
re.findall('a','Abc',re.I)        #['A']

re.findall('^a','abc\nabc')       #['a']
re.findall('^a','abc\nabc',re.M)  #['a', 'a']
re.findall('c$','abc\nabc')       #['c']
re.findall('c$','abc\nabc',re.M)  #['c', 'c']

re.findall('.','abc\n')           #['a', 'b', 'c']
re.findall('.','abc\n', re.S)     #['a', 'b', 'c', '\n']

a = re.findall("""a  # letter 1
               b    # letter 2
               c    # letter 3""", 'www.abc.com')
print(a)  # []
a = re.findall("""a  # letter 1
               b    # letter 2
               c    # letter 3""", 'www.abc.com', re.X)
print(a)  # ['abc']

>>> re.findall('.','abc\n',re.DEBUG)
ANY None

0. INFO 4 0b0 1 1 (to 5)
5: ANY
6. SUCCESS
['a', 'b', 'c']
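Flags can also be combined. A short sketch (sample text invented) showing several flags joined with the bitwise OR operator, and the equivalent inline form placed at the start of the pattern:

```python
import re

text = 'Apple\nbanana\nAvocado'

# Several flags are combined with |
combined = re.findall(r'^a\w*', text, re.I | re.M)
# The same flags written inline at the start of the pattern
inline = re.findall(r'(?im)^a\w*', text)
# No flags: ^ matches only at the start of the string, and 'A' != 'a'
no_flags = re.findall(r'^a\w*', text)

print(combined, inline, no_flags)
```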

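Pattern constructs

Construct	Purpose
|	matches either the expression on its left or the one on its right
[expr]	bracket expression: matches a single character
(expr)	group (subexpression): matches a sequence of characters
{n,m}, {n,}, {,m}	the preceding character or group occurs n to m times
{n}	the preceding character or group occurs exactly n times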

Alternation 【|】

re.findall('a|b','abc') #['a', 'b']
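【|】 separates whole alternatives, not single characters, so its reach has to be limited with a group. A sketch with an invented sample string; (?:...) is the non-capturing group described under "More group syntax":

```python
import re

text = 'gray grey griy'

both = re.findall(r'gray|grey', text)
# | splits the whole pattern, so this means 'gra' OR 'ey', not 'gr(a or e)y'
split_wrong = re.findall(r'gra|ey', text)
# Limit the alternation with a non-capturing group
grouped = re.findall(r'gr(?:a|e)y', text)

print(both, split_wrong, grouped)
```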

Bracket expressions [ ]

Enumeration. [abc]: one of the characters a, b, c
Range. [a-z]: one character from a to z
Negation. [^abc]: one character other than a, b, c

re.findall('[ab]','abc')    #['a', 'b']
re.findall('[a-c]','abc')   #['a', 'b', 'c']
re.findall('[^ab]','abc')   #['c']
re.findall('[^a-b]','abc')  #['c']
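A few more bracket-expression details, sketched with made-up inputs: several ranges can share one set, a trailing 【-】 is literal, and most special characters lose their meaning inside brackets.

```python
import re

text = 'ids: 0x1F, 0xzz, 0xA0b'

# Several ranges in one bracket expression: hexadecimal digits
hex_ids = re.findall(r'0x[0-9a-fA-F]+', text)
# A '-' placed last in the brackets is a literal hyphen, not a range
with_hyphen = re.findall(r'[a-c-]', 'a-d')
# Inside brackets, '.' is a literal dot rather than the wildcard
dots = re.findall(r'[.]', 'a.b.c')

print(hex_ids, with_hyphen, dots)
```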

Groups ( )

Groups versus plain strings

With exactly one group, findall() returns what that group matched instead of the whole match.

re.findall("ab","abab")        #['ab', 'ab']
re.findall("(ab)","abab")      #['ab', 'ab']
re.findall("(a)b","abab")      #['a', 'a']
re.findall("a(b)","abab")      #['b', 'b']

With more than one group, each match is returned as a tuple that holds one element per group.

re.findall("(a)(b)","abab")        #[('a', 'b'), ('a', 'b')]
re.findall("(a)(b)(a)","abab")     #[('a', 'b', 'a')]
re.findall("(a)(b)(a)(b)","abab")  #[('a', 'b', 'a', 'b')]
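Because groups change what findall() returns, it helps to know how to get the whole match back. A sketch with an invented date string, contrasting capturing groups, non-capturing groups and finditer():

```python
import re

text = '2024-01-15 and 2023-12-31'
pattern = r'(\d{4})-(\d{2})-\d{2}'

# Two capturing groups: findall returns only the groups, as tuples
pairs = re.findall(pattern, text)
# Non-capturing groups: findall returns the whole matches
whole = re.findall(r'\d{4}-(?:\d{2})-\d{2}', text)
# finditer gives both: group(0) is the whole match, groups() the parts
both = [(m.group(0), m.groups()) for m in re.finditer(pattern, text)]

print(pairs, whole, both)
```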

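Naming and referencing matches

When a piece of text occurs more than once in a string, what an earlier group matched can be reused directly in the pattern (a backreference).

Syntax	Effect
(exp)	the match is numbered automatically: 1, 2, …
\i	references the match of group i (\1, \2, …)
(?P<name>exp)	names the match 【name】
(?P=name)	references the named match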

Default (numbered) names
re.findall("(a)(b)\\1","abab")  #[('a', 'b')]
re.findall("(a)(b)a","abab")    #[('a', 'b')]
re.findall("(a)(b)\\2","abab")  #[]
re.findall("(a)(b)b","abab")    #[]

Custom names
re.findall("(?P<my_name>a)(b)(?P=my_name)","abab")  #[('a', 'b')]
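A common practical use of backreferences is finding doubled words. A sketch on an invented sentence, using a numbered group, a named group, and a backreference inside the replacement string of re.sub:

```python
import re

text = 'this is is a test test sentence'

# \1 must repeat exactly what group 1 matched
doubled = re.findall(r'\b(\w+) \1\b', text)
# The same with a named group
named = re.findall(r'\b(?P<word>\w+) (?P=word)\b', text)
# Backreferences also work in the replacement string of re.sub
fixed = re.sub(r'\b(\w+) \1\b', r'\1', text)

print(doubled, named, fixed)
```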

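Position-constraining groups (lookarounds)

These constrain what must or must not appear on either side of the match. The constraining content is written in subexpressions and does not appear in the match itself.

【 (?<=exp1) or (?<!exp2) 】 + 【main expression】 + 【 (?=exp3) or (?!exp4) 】

Subexpression	Condition	Position relative to the match
exp1	must appear	left (positive lookbehind)
exp2	must not appear	left (negative lookbehind)
exp3	must appear	right (positive lookahead)
exp4	must not appear	right (negative lookahead)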

def test(pattern):
    a = re.findall(pattern, 'abc')
    print(a)
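test('(?<=a).+?(?=c)') #['b']       'b' has a on its left and c on its right
test('(?<=a).+?(?!c)') #['bc']      'bc' has a on its left and no c on its right
test('(?<!a).+?(?=c)') #['ab']      'ab' has no a on its left and c on its right
test('(?<!a).+?(?!c)') #['a', 'c']  neither 'a' nor 'c' has a on its left or c on its right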
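A more practical lookaround sketch on an invented price list: extract numbers after 【$】 without the symbol, or numbers with neither 【$】 before them nor 【%】 after them. It also shows that Python's re requires a lookbehind to have a fixed width.

```python
import re

text = 'apple $3, pear $12, 7 kg, tax 5%'

# Digits preceded by '$'; the '$' itself is not part of the match
prices = re.findall(r'(?<=\$)\d+', text)
# Whole numbers with no '$' or digit on the left and no digit or '%' on the right
plain = re.findall(r'(?<![$\d])\d+(?![\d%])', text)

# A variable-width lookbehind such as (?<=a+) is rejected at compile time
try:
    re.compile(r'(?<=a+)b')
    fixed_width_error = False
except re.error:
    fixed_width_error = True

print(prices, plain, fixed_width_error)
```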

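More group syntax

Category	Syntax	Effect
Non-capturing group	(?:exp)	matches exp without capturing it, so it cannot be referenced later
Flag group scope	(?aiLmsux)	the whole regular expression (place it at the start of the pattern)
	(?imsx-imsx:exp)	only the current group
Comment	(?#comment)	explanatory text that does not affect the regular expression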
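Each row of the table above in one short sketch (inputs invented; scoped flags such as (?i:...) assume Python 3.6 or later):

```python
import re

# (?:...) groups without capturing, so findall returns whole matches
repeats = re.findall(r'(?:ab)+', 'ababx ab')
# Scoped flag: only 'hello' is case-insensitive; 'World' stays case-sensitive
scoped = re.findall(r'(?i:hello) World', 'HELLO World, hello world')
# (?#...) is a comment and is ignored by the engine
commented = re.findall(r'\d+(?# one or more digits)', 'a1b22')

print(repeats, scoped, commented)
```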

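Quantifiers

re patterns are greedy by default: they match as many characters as possible. To switch to lazy (reluctant) matching, append 【?】 to the quantifier.

Symbol	Number of matches
?	0 or 1
*	0 or more
+	1 or more
{n,m}, {n,}, {,m}	n to m
{n}	exactly n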
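Greedy versus lazy matching is easiest to see on tag-like text. A sketch with an invented HTML fragment (regex is used here only as an illustration, not as an HTML parser):

```python
import re

html = '<b>bold</b><i>italic</i>'

greedy = re.findall(r'<.+>', html)    # runs to the last '>'
lazy = re.findall(r'<.+?>', html)     # stops at the first '>'

# Counted quantifiers are greedy too, and also accept a trailing ?
counted = re.findall(r'\d{2,3}', '1 12 1234')
counted_lazy = re.findall(r'\d{2,3}?', '1234')

print(greedy, lazy, counted, counted_lazy)
```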

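Functions in the re module

Compile the pattern → match it against the string (within a chosen range) → return the match results → process the results

When returning results, the various functions let you choose which groups are used, what form the result takes, and whether you get the matched text or its position.

Compiling a regular expression

re.compile(pattern, flags=0): compiles a string into a regular expression object so it can be reused later.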

>>> a = re.compile('1')  # compile the regular expression
>>> b = a.findall('123') # match a string with it and return the results
>>> print(b)
['1']
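A compiled pattern keeps its flags and can be reused; its methods also accept a start position, which the module-level functions do not. A minimal sketch with invented input:

```python
import re

# Compile once, reuse many times; the flags are fixed at compile time
word = re.compile(r'[a-z]+', re.I)

words = word.findall('Hello, World')
# Pattern.search(string, pos): start looking at index 7
second = word.search('Hello, World', 7).group()

print(words, second, word.pattern)
```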

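Specifying where to match

Parameters: pattern, string, flags=0

Function	Where it matches
match()	only at the beginning of the string
fullmatch()	the whole string must match the regular expression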

a = re.match('1', '12')
b = re.match('1', '21')
print(a)  #<re.Match object; span=(0, 1), match='1'>
print(b)  #None

a = re.fullmatch('1', '12')
b = re.fullmatch('12','12')
print(a)  #None
print(b)  #<re.Match object; span=(0, 2), match='12'>
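match() is easy to confuse with search() and with a leading 【^】. A sketch on invented strings: match() only ever tries index 0, even with re.MULTILINE.

```python
import re

s = 'x12'
m = re.match(r'\d+', s)                  # None: 'x' at index 0 is not a digit
found = re.search(r'\d+', s).group()     # search() scans the whole string

# Even with re.MULTILINE, match() only tries index 0, unlike ^
m_multi = re.match(r'b', 'a\nb', re.M)
caret = re.search(r'^b', 'a\nb', re.M).group()

print(m, found, m_multi, caret)
```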

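Returning match results

Parameters: pattern, string, flags=0

Function	Matches returned	Return type
search()	the first	Match object (None if there is no match)
findall()	all	list of strings (tuples when the pattern has several groups)
finditer()	all	iterator of Match objects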

pattern = '1'
string = '11'
a = re.search(pattern, string)
b = re.findall(pattern, string)
c = re.finditer(pattern, string)
print(a.group())  #1
print(b)          #['1', '1']
for i in c:
    print(i.group())  # prints 1, then 1
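Since search() returns None when nothing matches, its result should be checked before calling group(). finditer() yields a full Match object per hit, so groups and positions are both available. A sketch with invented text:

```python
import re

text = 'a1 b22 c333'

m = re.search(r'\d+', text)
first = (m.group(), m.span()) if m else None   # guard against None
missing = re.search(r'z', text)                # no 'z' in text, so None

rows = []
for m in re.finditer(r'([a-z])(\d+)', text):
    rows.append((m.group(1), m.group(2), m.start()))

print(first, missing, rows)
```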

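Methods of the Match object returned by search() and match()

Returning matched content

Method	Argument	Groups	Return type
__getitem__() (m[g])	g (required)	one given group	string
group()	[group1], …	one or more given groups	string; tuple for several groups
groups()	[default]	all groups	tuple
groupdict()	[default]	all named groups	dict keyed by group name

group(): with no argument or with 0, returns the whole match (group 0). Arguments may be group numbers or, for named groups, group names.
groupdict(): only includes groups that were given a name.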

a = re.search('(a)(b)','abab')
a.__getitem__(0)  #'ab'
a.__getitem__(1)  #'a'
a.__getitem__(2)  #'b'
a.group()         #'ab'
a.group(0)        #'ab'
a.group(1)        #'a'
a.group(2)        #'b'
a.group(1,2)      #('a', 'b')
a.groups()        #('a', 'b')
a.groupdict()     #{}

a = re.search('(?P<name1>a)(b)','abab')
a.groupdict()  #{'name1': 'a'}
a = re.search('(?P<name1>a)(?P<name2>b)','abab')
a.groupdict()  #{'name1': 'a', 'name2': 'b'}
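Named groups can also be read by name, and a group that did not take part in the match yields None unless a default is given. A sketch with invented input (m['key'] indexing assumes Python 3.6 or later):

```python
import re

m = re.search(r'(?P<key>\w+)=(?P<value>\d*)', 'user=42')
key = m['key']              # m[g] is shorthand for m.__getitem__(g)
value = m.group('value')    # group() also accepts group names
whole = m.group(0)          # the whole match

# In (a)|(b) matched against 'b', group 1 does not participate
alt = re.search(r'(a)|(b)', 'b')
plain_groups = alt.groups()
defaulted = alt.groups(default='')

print(key, value, whole, plain_groups, defaulted)
```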

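Returning match positions

Method	Argument	Returns, for the given group
start()	[group]	start position
end()	[group]	end position
span()	[group]	(start, end) tuple

Each method takes at most one argument. With no argument or with 0, it reports the position of the whole match.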

a = re.search('(ab)(cd)','abcd')
a.span()   #(0, 4)
a.span(0)  #(0, 4)
a.span(1)  #(0, 2)
a.span(2)  #(2, 4)
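Positions from span() can be used to slice the original string, and a group that did not participate reports -1. A sketch with invented input:

```python
import re

text = 'name: Alice'
m = re.search(r'name: (\w+)', text)
start, end = m.span(1)
name = text[start:end]          # slicing with the span of group 1

# Group 1 does not participate when (a)|(b) matches 'b'
alt = re.search(r'(a)|(b)', 'b')
unmatched = alt.span(1)

print(start, end, name, unmatched)
```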

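More attributes

Attribute	Meaning
pos	the pos value passed to search() or match(): the index where the engine started looking
endpos	the endpos value passed to search() or match(): the index beyond which the engine does not go
re	the regular expression object that produced the match
string	the string that was matched against
lastindex	integer index of the last matched group (None if no group matched)
lastgroup	name of the last matched group (None if it has no name or no group matched)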
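The attributes above can be inspected on a match produced by a compiled pattern with explicit pos and endpos. A sketch with an invented pattern and string:

```python
import re

pat = re.compile(r'(?P<word>[a-z]+)(\d)')
m = pat.search('xx ab1 cd2', 2, 8)   # search only between indices 2 and 8

print(m.pos, m.endpos)   # the pos and endpos that were passed in
print(m.re is pat)       # the pattern object that produced the match
print(m.string)          # the full string, not just the searched slice
print(m.lastindex)       # group 2 (\d) is the last matched group
print(m.lastgroup)       # None, because group 2 has no name

tail = re.search(r'(\d)(?P<tail>[a-z])', '1x')
print(tail.lastgroup)    # here the last matched group is named
```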

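Processing matches

Substitution: sub

Purpose: replaces each match with repl, which can be a string or a function.

Parameters: pattern, repl, string, count=0, flags=0. count is the maximum number of replacements; the default 0 replaces every match.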
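The three common forms of re.sub side by side: a replacement string with group references, a replacement function that receives each Match object, and a limited count. The sample text is invented:

```python
import re

text = 'price: 3 apples, 12 pears'

# \1, \2 in repl refer to the groups of each match
swapped = re.sub(r'(\d+) (\w+)', r'\2=\1', text)
# A function repl receives each Match object and returns the replacement text
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)
# count limits the number of replacements
first_only = re.sub(r'\d+', '#', text, count=1)

print(swapped, doubled, first_only, sep='\n')
```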

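Splitting: split

Purpose: returns, as a list, the substrings produced by splitting the string at each match.

Parameters: pattern, string, maxsplit=0, flags=0. maxsplit is the maximum number of splits; the default 0 means no limit.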
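A short re.split sketch on invented input: splitting on a set of separators, limiting the splits, and keeping the separators by putting them in a capturing group.

```python
import re

line = 'a, b;c  d'
parts = re.split(r'[,;\s]+', line)
limited = re.split(r'[,;\s]+', line, maxsplit=1)
# A capturing group in the pattern keeps the separators in the result
kept = re.split(r'([,;])', 'a,b;c')

print(parts, limited, kept)
```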

Chapter 10 Built-in Modules: Regular Expressions (re)
