Common Uses of Regular Expressions in Python

网摘文苑, published 2022-12-22 in Xinjiang


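First, a table of regular expression syntax. There is a lot in it, but it gets easier with practice. What follows covers how to use some of the common regular expression functions in Python 3.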

1.re.match()

This function matches only from the beginning of the string, as shown below:

    import re

    text = 'hello 123 world, hello new world'

    result = re.match('hello', text)
    print(result)
    # Output: <re.Match object; span=(0, 5), match='hello'>
    # span is the range covered by the match, here (0, 5);
    # match is the matched text, 'hello'.

    result = re.match('world', text)
    print(result)
    # Output: None  (text that is not at the start cannot be matched)

    res = re.match(r'hello(\s\d\d\d)', text)
    print(res)
    # Output: <re.Match object; span=(0, 9), match='hello 123'>
    # Parentheses () mark a capture group.
    # \s matches a whitespace character, here the space.
    # \d\d\d matches three digits and can also be written \d{3}.

    res2 = re.match(r'(\w{5}\s)123(\s\w{5})', text)
    print(res2)
    # Output: <re.Match object; span=(0, 15), match='hello 123 world'>
    print(res2.group(0))  # group(0) is the whole match: 'hello 123 world'
    print(res2.group(1))  # group(1) is the first group: 'hello '
    print(res2.group(2))  # group(2) is the second group: ' world'

    res = re.match(r'hello(.*)world', text)
    print(res)
    # Output: <re.Match object; span=(0, 32), match='hello 123 world, hello new world'>
    # .* matched everything in between, so the whole text was matched.
    # Compare with the next example.

    res = re.match(r'hello(.*?)world', text)
    print(res)
    # Output: <re.Match object; span=(0, 15), match='hello 123 world'>
    # Here (.*?) only matched within the range 0-15. This is non-greedy
    # matching: match as little as possible. The previous pattern was greedy
    # and matched as much as possible. One more example:

    res = re.match(r'hello (.*)(\d+) world', text)
    print(res.group(1))  # Output: 12
    print(res.group(2))  # Output: 3
    # The + in (\d+) means one or more characters. The preceding (.*) is
    # greedy and takes as much as it can, so it matches the two characters
    # '12' and leaves a single character, '3', for (\d+).

    res = re.match(r'hello (.*?)(\d+) world', text)
    print(res.group(1))  # Output: (empty string, nothing is printed)
    print(res.group(2))  # Output: 123
    # (.*?) is non-greedy and matches as little as it can. Since (\d+) can
    # match all three digits, (.*?) matches nothing at all.

    # Now test with a string that contains a newline.
    text = 'hello 123 world\nhello new world'

    res = re.match(r'hello(.*)new world', text)
    print(res)
    # Output: None
    # Surprise: (.*) does not match everything after all. By default . does
    # not match a newline. Adding the re.S flag fixes that:

    res = re.match(r'hello(.*)new world', text, re.S)
    print(res)
    # Output: <re.Match object; span=(0, 31), match='hello 123 world\nhello new world'>
    # Now the whole text is matched. re.S lets . match newlines.
    # Similarly, re.I makes matching case-insensitive.
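The re.I flag is mentioned above without an example, so here is a minimal sketch of it, alone and combined with re.S. The sample string with capital letters is made up for this illustration; flags are combined with the `|` operator.

```python
import re

# Sample text invented for this sketch: capitalised first line plus a newline.
text = 'Hello 123 World\nhello new world'

# Without flags, the lowercase pattern does not match the capitalised 'Hello'.
no_flag = re.match(r'hello', text)

# re.I (short for re.IGNORECASE) makes the match case-insensitive.
ignore_case = re.match(r'hello', text, re.I)

# Flags can be combined with |: re.S additionally lets '.' cross the newline.
both = re.match(r'hello(.*)new world', text, re.I | re.S)

print(no_flag)              # None
print(ignore_case.group())  # Hello
print(repr(both.group(1)))  # ' 123 World\nhello '
```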

2.re.search()

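If re.match() feels of limited use because it can only match at the start, re.search() solves that problem: it can start matching anywhere in the string and returns the first successful match.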

    import re

    text = 'hello 123 world, hello new world'

    res = re.search('world', text)
    print(res)
    # Output: <re.Match object; span=(10, 15), match='world'>
    # The match can start anywhere; the position of the first 'world' is returned.

    res = re.search(r'[a-z] world', text)
    print(res)
    # Output: <re.Match object; span=(25, 32), match='w world'>
    # [a-z] matched the single letter w. To match several letters,
    # use a count such as [a-z]{3}, which matches three of them.

    res = re.search(r'[a-z]{3} world', text)
    print(res)
    # Output: <re.Match object; span=(23, 32), match='new world'>
    # The result is 'new world'.

    res = re.search(r'[0-9]{3} world', text)
    print(res)
    # Output: <re.Match object; span=(6, 15), match='123 world'>
    # This matched '123 world'.

    # The only difference between re.search() and re.match() is that match
    # can only match at the start while search can match at any position.
    # Everything else works the same way.
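Both functions return None when nothing matches, so calling .group() directly on the result can raise an AttributeError. The sketch below shows one way to guard against that; `first_number` is a hypothetical helper written for this illustration, not part of the re module.

```python
import re

text = 'hello 123 world, hello new world'

def first_number(s):
    """Return the first run of digits in s, or None if there is none."""
    m = re.search(r'\d+', s)
    # A Match object is truthy; None is falsy.
    return m.group() if m else None

print(re.match(r'\d+', text))     # None: the string does not start with a digit
print(first_number(text))         # 123
print(first_number('no digits'))  # None
```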

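3.re.findall()

You might say that re.search() fixes the start-only limitation of re.match() but still returns just one result, which is also of limited use. re.findall() is the stronger version of re.search(): as the name suggests, it finds every match of the expression and returns them all.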

    import re

    text = 'hello 123 world, hello new world'

    res = re.findall('world', text)
    print(res)
    # Output: ['world', 'world']
    # Every result found is returned, as a list. Very handy.

    res = re.findall(r'[a-z0-9]{3}[\s]world', text)
    print(res)
    # Output: ['123 world', 'new world']
    # Usage is the same as the two functions above, except that all
    # matches are returned.
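One thing worth knowing when combining re.findall() with the () groups from the re.match() section: if the pattern contains groups, the list holds the group contents rather than the whole matches. A small sketch on the same text (the variable names are only for illustration):

```python
import re

text = 'hello 123 world, hello new world'

# No groups: each item is the whole match.
whole = re.findall(r'[a-z0-9]{3}\sworld', text)

# One group: each item is only the text captured by that group.
one_group = re.findall(r'([a-z0-9]{3})\sworld', text)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'(hello) ([a-z0-9]{3})', text)

print(whole)       # ['123 world', 'new world']
print(one_group)   # ['123', 'new']
print(two_groups)  # [('hello', '123'), ('hello', 'new')]
```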

4.re.sub()

This function does replacement: it replaces whatever the regular expression matches with other data.

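    import re

    text = 'hello 123 world, hello new world'

    res = re.sub(r'[\d+]', 'x', text)
    print(res)
    # Output: hello xxx world, hello new world
    # Every digit in text has been replaced by x.

    res = re.sub(r'[\d]{3}', 'x', text)
    print(res)
    # Output: hello x world, hello new world
    # Note the difference between the two: [\d+] treats each digit as an
    # individual match (inside [] the + is a literal plus sign, not a
    # quantifier), while [\d]{3} replaces the three digits as one unit.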
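To make the role of the square brackets clearer, here is a short comparison of `\d`, `\d+` and `[\d+]` in re.sub(). The sample string, which includes a literal plus sign, is made up for this sketch.

```python
import re

# Sample text invented for this sketch; note the literal '+' near the end.
text = 'call 123 or 4567, code 1+2'

per_digit = re.sub(r'\d', 'x', text)      # each digit is replaced separately
per_run = re.sub(r'\d+', 'x', text)       # each run of digits is replaced once
char_class = re.sub(r'[\d+]', 'x', text)  # digits and literal '+' are replaced

print(per_digit)   # call xxx or xxxx, code x+x
print(per_run)     # call x or x, code x+x
print(char_class)  # call xxx or xxxx, code xxx
```

Outside brackets, + is a quantifier, so `\d+` collapses a run of digits of any length into a single replacement without needing to know the exact count.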

5.re.compile()

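Finally, a method that seems of rather limited use: it compiles a regular expression into a regular expression object (a what?). See the example below: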

    import re

    text = 'hello 123 world, hello new world'

    pattern = re.compile(r'[\d]{3}')
    res = re.search(pattern, text)
    print(res)
    # Output: <re.Match object; span=(6, 9), match='123'>
    # The idea is to pull the regular expression out so you do not have to
    # write the long pattern every time. But storing it in a plain string
    # does the same job:

    string = r'[\d]{3}'
    res = re.search(string, text)
    print(res)
    # Output: <re.Match object; span=(6, 9), match='123'>
    # Same result, so it looks pointless. However, a compiled pattern can
    # also be used like this:

    res = pattern.search(text)
    print(res)
    # Output: <re.Match object; span=(6, 9), match='123'>
    # This form really is a little simpler...
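Where re.compile() tends to pay off is when the same pattern is applied many times, since the pattern object carries its own search/findall/sub methods. A minimal sketch, assuming a made-up list of lines; the names `three_digits` and `lines` are only for illustration.

```python
import re

# Compile once, then reuse the pattern object for every line.
three_digits = re.compile(r'\d{3}')

# Sample data invented for this sketch.
lines = ['hello 123 world', 'hello new world', 'order 456 shipped']

found = []
for line in lines:
    m = three_digits.search(line)
    if m:  # search() returns None for lines without three digits
        found.append(m.group())

print(found)                                     # ['123', '456']
print(three_digits.sub('x', 'hello 123 world'))  # hello x world
```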