Python Regular Expressions: the re Module
2016-11-10
  
Basic notes on Python regular expressions

I have written about Python regular expressions before, but that was all theory. This time the subject is Python's re module.

Import the re module:

    import re

View the help text:

    print(re.__doc__)

Below is the help text it prints (this is the Python 2 version of the module docstring):

    Support for regular expressions (RE).

    This module provides regular expression matching operations similar to
    those found in Perl.  It supports both 8-bit and Unicode strings; both
    the pattern and the strings being processed can contain null bytes and
    characters outside the US ASCII range.

    Regular expressions can contain both special and ordinary characters.
    Most ordinary characters, like "A", "a", or "0", are the simplest
    regular expressions; they simply match themselves.  You can
    concatenate ordinary characters, so last matches the string 'last'.

    The special characters are:
        "."      Matches any character except a newline.
        "^"      Matches the start of the string.
        "$"      Matches the end of the string or just before the newline at
                 the end of the string.
        "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
                 Greedy means that it will match as many repetitions as possible.
        "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
        "?"      Matches 0 or 1 (greedy) of the preceding RE.
        *?,+?,?? Non-greedy versions of the previous three special characters.
        {m,n}    Matches from m to n repetitions of the preceding RE.
        {m,n}?   Non-greedy version of the above.
        "\\"     Either escapes special characters or signals a special sequence.
        []       Indicates a set of characters.
                 A "^" as the first character indicates a complementing set.
        "|"      A|B, creates an RE that will match either A or B.
        (...)    Matches the RE inside the parentheses.
                 The contents can be retrieved or matched later in the string.
        (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
        (?:...)  Non-grouping version of regular parentheses.
        (?P<name>...) The substring matched by the group is accessible by name.
        (?P=name)     Matches the text matched earlier by the group named name.
        (?#...)  A comment; ignored.
        (?=...)  Matches if ... matches next, but doesn't consume the string.
        (?!...)  Matches if ... doesn't match next.
        (?<=...) Matches if preceded by ... (must be fixed length).
        (?<!...) Matches if not preceded by ... (must be fixed length).
        (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                           the (optional) no pattern otherwise.

    The special sequences consist of "\\" and a character from the list
    below.  If the ordinary character is not on the list, then the
    resulting RE will match the second character.
        \number  Matches the contents of the group of the same number.
        \A       Matches only at the start of the string.
        \Z       Matches only at the end of the string.
        \b       Matches the empty string, but only at the start or end of a word.
        \B       Matches the empty string, but not at the start or end of a word.
        \d       Matches any decimal digit; equivalent to the set [0-9].
        \D       Matches any non-digit character; equivalent to the set [^0-9].
        \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
        \S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
        \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
                 With LOCALE, it will match the set [0-9_] plus characters defined
                 as letters for the current locale.
        \W       Matches the complement of \w.
        \\       Matches a literal backslash.

    This module exports the following functions:
        match    Match a regular expression pattern to the beginning of a string.
        search   Search a string for the presence of a pattern.
        sub      Substitute occurrences of a pattern found in a string.
        subn     Same as sub, but also return the number of substitutions made.
        split    Split a string by the occurrences of a pattern.
        findall  Find all occurrences of a pattern in a string.
        finditer Return an iterator yielding a match object for each match.
        compile  Compile a pattern into a RegexObject.
        purge    Clear the regular expression cache.
        escape   Backslash all non-alphanumerics in a string.

    Some of the functions in this module takes flags as optional parameters:
        I  IGNORECASE  Perform case-insensitive matching.
        L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
        M  MULTILINE   "^" matches the beginning of lines (after a newline)
                       as well as the string.
                       "$" matches the end of lines (before a newline) as well
                       as the end of the string.
        S  DOTALL      "." matches any character at all, including the newline.
        X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
        U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.

    This module also defines an exception 'error'.

The text above covers the basic syntax and some of the functions.
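A few of the special characters listed in the help text are easier to follow in action. The snippet below is a minimal sketch and not part of the help output; the sample strings and variable names are made up for illustration. It contrasts greedy and non-greedy repetition, uses a named group with a backreference, and shows a lookahead.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" extends to the last ">" it can reach.
greedy = re.search(r"<.*>", html).group()
# Non-greedy: ".*?" stops at the first ">".
lazy = re.search(r"<.*?>", html).group()

# Named group plus a backreference to it: find a doubled word.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")

# Lookahead: match "foo" only when "bar" follows, without consuming "bar".
lookahead = re.findall(r"foo(?=bar)", "foobar foobaz")

print(greedy)                 # <b>bold</b> and <i>italic</i>
print(lazy)                   # <b>
print(doubled.group("word"))  # is
print(lookahead)              # ['foo']
```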
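The basic syntax was explained at the link above. The following sections introduce the main functions.

re function reference

match

View the help: help(re.match)

    Help on function match in module re:

    match(pattern, string, flags=0)
        Try to apply the pattern at the start of the string, returning
        a match object, or None if no match was found.

re.match(pattern, string, flags=0)

Purpose: tries to match pattern starting at the first position of string and returns a match object describing the matched text. If the match fails, it returns None. flags is an optional parameter that controls how the regular expression is matched.

Example:

    import re

    # The dots are escaped so that they match a literal "."
    pattern = r'[w]{3}\.[a-z]+\.(com)'
    str1 = "www.baidu.com.net"
    str2 = "http:www.baidu.com.net"

    re1 = re.match(pattern, str1)
    print(re1.group(0))

    re2 = re.match(pattern, str2)
    print(re2)  # None, so re2.group(0) would raise AttributeError

The pattern matches URLs of the form www.xxx.com at the start of the string. The first call prints www.baidu.com. The second returns None, just as the documentation says, because str2 does not start with www. Calling re2.group(0) on that result raises an AttributeError, since None has no group method, so check for None before using the match object.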
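Since re.match returns None on failure, it is common to wrap the call in a small guard. The following is a minimal sketch under that assumption; the helper name leading_site and the constant URL_PATTERN are hypothetical and not part of the re module.

```python
import re

# Same pattern as in the example above: "www.", a lowercase name, ".com".
URL_PATTERN = r"[w]{3}\.[a-z]+\.(com)"


def leading_site(text):
    """Return the www.xxx.com prefix of text, or None if there is none."""
    m = re.match(URL_PATTERN, text)
    if m is None:
        return None
    return m.group(0)


m = re.match(URL_PATTERN, "www.baidu.com.net")
print(m.group(0))   # www.baidu.com (the whole match)
print(m.group(1))   # com (the first parenthesized group)

print(leading_site("www.baidu.com.net"))       # www.baidu.com
print(leading_site("http:www.baidu.com.net"))  # None
```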
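search

View the help: help(re.search)

    Help on function search in module re:

    search(pattern, string, flags=0)
        Scan through string looking for a match to the pattern, returning
        a match object, or None if no match was found.

re.search(pattern, string, flags=0)

Purpose: finds the first substring anywhere in string that satisfies pattern. If there is none, it returns None.

Example:

    import re

    pattern = r'[w]{3}\.[a-z]+\.(com)'
    str1 = "china.www.baidu.com.net"
    str2 = "http:www.baiducom.net"

    re1 = re.search(pattern, str1)
    print(re1.group())

    re2 = re.search(pattern, str2)
    print(re2)  # None, so re2.group() would raise AttributeError

The first call prints www.baidu.com. The second search fails and returns None, so calling re2.group() on it raises an error.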
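The difference between match and search shows up when both are run on the same input. This is a small sketch reusing the pattern and first test string from the example above; the variable names at_start and anywhere are illustrative.

```python
import re

pattern = r"[w]{3}\.[a-z]+\.(com)"
text = "china.www.baidu.com.net"

at_start = re.match(pattern, text)    # only tries position 0
anywhere = re.search(pattern, text)   # scans forward through the string

print(at_start)           # None
print(anywhere.group())   # www.baidu.com
print(anywhere.span())    # (6, 19)
```

The span shows where the match was found: search skipped the six characters of "china." before matching.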
sub

View the help: help(re.sub)

    Help on function sub in module re:

    sub(pattern, repl, string, count=0, flags=0)
        Return the string obtained by replacing the leftmost
        non-overlapping occurrences of the pattern in string by the
        replacement repl.  repl can be either a string or a callable;
        if a string, backslash escapes in it are processed.  If it is
        a callable, it's passed the match object and must return
        a replacement string to be used.

re.sub(pattern, repl, string, count=0, flags=0)

Purpose: replaces the substrings of string that satisfy pattern with repl. count defaults to 0, which replaces every occurrence; a count of 2 replaces only the first two.

Example:

    import re

    pattern = r'[w]{3}\.[a-z]+\.(com)'
    repl = 'www.google.com'
    str3 = "i love www.baidu.com, tom love www.baid.com"

    # count=1: replace only the first occurrence
    re3 = re.sub(pattern, repl, str3, count=1)
    print(re3)

Output: i love www.google.com, tom love www.baid.com
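The help text mentions that repl may contain backslash escapes or be a callable; neither appears in the example above. The sketch below illustrates both. The replacement text and the function name shout are made up for illustration.

```python
import re

text = "i love www.baidu.com, tom love www.baid.com"

# Backreference in repl: \1 is the text captured by the first group.
swapped = re.sub(r"www\.([a-z]+)\.com", r"\1.example.org", text)


# Callable repl: receives the match object, returns the replacement string.
def shout(match):
    return match.group(0).upper()


upper = re.sub(r"www\.[a-z]+\.com", shout, text, count=1)

print(swapped)  # i love baidu.example.org, tom love baid.example.org
print(upper)    # i love WWW.BAIDU.COM, tom love www.baid.com
```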
subn

Much like re.sub, except that it also returns the number of replacements made.

Example:

    import re

    pattern = r'[w]{3}\.[a-z]+\.(com)'
    repl = 'www.google.com'
    str3 = "i love www.baidu.com, tom love www.baid.com"

    re3 = re.subn(pattern, repl, str3, count=2)
    print(re3)

Output: ('i love www.google.com, tom love www.google.com', 2)
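Because subn returns a tuple, the result is usually unpacked, and the count gives a cheap way to tell whether anything changed. A minimal sketch, with illustrative variable names:

```python
import re

pattern = r"[w]{3}\.[a-z]+\.(com)"
text = "i love www.baidu.com, tom love www.baid.com"

# Unpack the (new_string, number_of_substitutions) tuple.
new_text, n = re.subn(pattern, "www.google.com", text)

# When nothing matches, the string comes back unchanged with a count of 0.
unchanged, zero = re.subn(pattern, "www.google.com", "no sites here")

print(new_text)  # i love www.google.com, tom love www.google.com
print(n)         # 2
print(zero)      # 0
```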
split

View the help: help(re.split)

    Help on function split in module re:

    split(pattern, string, maxsplit=0, flags=0)
        Split the source string by the occurrences of the pattern,
        returning a list containing the resulting substrings.

re.split(pattern, string, maxsplit=0, flags=0)

Purpose: splits string wherever pattern matches and returns the pieces in a list. maxsplit is the maximum number of splits to perform; the default of 0 means no limit.

Example:

    import re

    text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
    pattern = ","
    print(re.split(pattern, text))

Output: ['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']
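The example above uses a plain comma, which str.split could handle too. The sketch below shows what re.split adds: a limit via maxsplit, a real pattern as the separator, and capturing groups that keep the separators in the result. The input strings are made up for illustration.

```python
import re

names = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"

# maxsplit=2: split at most twice, the rest stays in the last element.
limited = re.split(",", names, maxsplit=2)

# A pattern as separator: a comma or semicolon plus any following whitespace.
mixed = re.split(r"[,;]\s*", "a, b;c,  d")

# A capturing group in the pattern keeps the separators in the result.
kept = re.split(r"(,)", "a,b")

print(limited)  # ['xiaoming', 'xiaohua', 'xiaoli,xiaoqiang,xiaozhang']
print(mixed)    # ['a', 'b', 'c', 'd']
print(kept)     # ['a', ',', 'b']
```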
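findall

View the help: help(re.findall)

    Help on function findall in module re:

    findall(pattern, string, flags=0)
        Return a list of all non-overlapping matches in the string.

        If one or more groups are present in the pattern, return a
        list of groups; this will be a list of tuples if the pattern
        has more than one group.

        Empty matches are included in the result.

re.findall(pattern, string, flags=0)

Purpose: finds every substring of string that satisfies the regular expression and returns them in a list. If nothing matches, the list is empty.

Example:

    import re

    text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
    pattern = r"\w+"
    print(re.findall(pattern, text))

The result is the same as in the split example, but the idea is different: split cuts the string at the commas, while findall collects the runs of word characters:

['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']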
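The help text notes that the return value changes when the pattern contains groups, which is a frequent source of surprise. A short sketch of the three cases, using made-up input:

```python
import re

text = "www.baidu.com www.google.com"

# No groups: a list of whole matches.
whole = re.findall(r"www\.[a-z]+\.com", text)

# One group: a list of that group's contents only.
one_group = re.findall(r"www\.([a-z]+)\.com", text)

# Several groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(www)\.([a-z]+)\.com", text)

# No match at all: an empty list, not None.
nothing = re.findall(r"\d+", text)

print(whole)       # ['www.baidu.com', 'www.google.com']
print(one_group)   # ['baidu', 'google']
print(two_groups)  # [('www', 'baidu'), ('www', 'google')]
print(nothing)     # []
```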
finditer

Similar to findall: it finds every substring of the string that the regular expression matches, but returns them as an iterator of match objects.

Example:

    import re

    text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
    pattern = r"\w+"
    re4 = re.finditer(pattern, text)
    for i in re4:
        print(i.group())

The result is an iterator, so its items are printed with a for loop:

    >>> for i in re4:
    ...     print(i.group())
    ...
    xiaoming
    xiaohua
    xiaoli
    xiaoqiang
    xiaozhang
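Two things distinguish finditer from findall in practice: each item is a full match object, so positions are available, and the iterator can only be consumed once. A minimal sketch with a shortened input string:

```python
import re

names = "xiaoming,xiaohua,xiaoli"

# Each item is a match object, so start and end offsets are available.
positions = [(m.group(), m.start(), m.end())
             for m in re.finditer(r"\w+", names)]

# The iterator is exhausted after one pass.
it = re.finditer(r"\w+", names)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]

print(positions)    # [('xiaoming', 0, 8), ('xiaohua', 9, 16), ('xiaoli', 17, 23)]
print(first_pass)   # ['xiaoming', 'xiaohua', 'xiaoli']
print(second_pass)  # []
```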
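compile

View the help: help(re.compile)

    Help on function compile in module re:

    compile(pattern, flags=0)
        Compile a regular expression pattern, returning a pattern object.

re.compile(pattern, flags=0)

Purpose: converts the regular expression pattern into a regular expression object.

Example:

    import re

    text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
    pattern = r"\w+"
    patternobj = re.compile(pattern)
    # Call finditer on the compiled object instead of passing the pattern string
    re4 = patternobj.finditer(text)
    for i in re4:
        print(i.group())

The result is the same as in the previous example. compile simply turns the pattern into an object, and the other operations are then performed on that object.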
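A compiled pattern object exposes the same operations as methods, so one object can be reused for several calls, and flags are fixed once at compile time. The sketch below illustrates this; the variable names and input strings are made up.

```python
import re

word = re.compile(r"\w+")

# The same compiled object serves several different operations.
all_words = word.findall("xiaoming,xiaohua")
first_word = word.match("xiaoming,xiaohua").group()

# Flags are given once, at compile time.
site = re.compile(r"www\.[a-z]+\.com", re.IGNORECASE)
found = site.search("Visit WWW.Baidu.COM").group()

print(word.pattern)  # \w+
print(all_words)     # ['xiaoming', 'xiaohua']
print(first_word)    # xiaoming
print(found)         # WWW.Baidu.COM
```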
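purge

View the help: help(re.purge)

    Help on function purge in module re:

    purge()
        Clear the regular expression cache

Purpose: clears the cache of compiled regular expressions.

escape

View the help: help(re.escape)

    Help on function escape in module re:

    escape(pattern)
        Escape all non-alphanumeric characters in pattern.

Purpose: escapes the non-alphanumeric characters in a string, so that the result can be used in a pattern to match that string literally.

Example:

    >>> pattern
    '\\w+'
    >>> re.escape(pattern)
    '\\\\w\\+'

The result is different: the backslash and the + are now escaped, so the escaped pattern matches the literal text \w+ instead of a run of word characters.

flags

    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.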
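The last two sections, escape and flags, can be sketched together. The snippet below is an illustration only, with made-up strings and variable names: it uses re.escape to search for text that contains metacharacters, then shows the effect of IGNORECASE, MULTILINE, DOTALL, and VERBOSE.

```python
import re

# re.escape: make arbitrary text safe to use as a literal pattern.
literal = "1+1=2 (really?)"
haystack = "proof: 1+1=2 (really?)"
unescaped_hit = re.search(literal, haystack)          # "+", "(", "?" act as operators
escaped_hit = re.search(re.escape(literal), haystack)

text = "First line\nsecond LINE"

ignorecase = re.findall(r"line", text, re.IGNORECASE)
single_line = re.findall(r"^\w+", text)               # "^" only at string start
multi_line = re.findall(r"^\w+", text, re.MULTILINE)  # "^" at each line start
no_dotall = re.search(r"line.second", text)           # "." does not match "\n"
dotall = re.search(r"line.second", text, re.DOTALL)

# VERBOSE: whitespace and comments inside the pattern are ignored.
phone = re.compile(r"""
    \d{3}   # prefix
    -       # separator
    \d{4}   # line number
""", re.VERBOSE)

print(unescaped_hit)        # None
print(escaped_hit.group())  # 1+1=2 (really?)
print(ignorecase)           # ['line', 'LINE']
print(single_line)          # ['First']
print(multi_line)           # ['First', 'second']
print(no_dotall)            # None
print(repr(dotall.group())) # 'line\nsecond'
print(phone.search("call 555-1234").group())  # 555-1234
```

Several flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.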
(This article was first collected by 雨亭之东.)