Python regular expressions: a basic description

The earlier post on Python regular expressions covered only theory. This post looks at Python's regular expression module, re.

Import the re module:

    import re

View the help text:

    print(re.__doc__)

The help text it prints is shown below:

    Support for regular expressions (RE).

    This module provides regular expression matching operations similar to
    those found in Perl.  It supports both 8-bit and Unicode strings; both
    the pattern and the strings being processed can contain null bytes and
    characters outside the US ASCII range.

    Regular expressions can contain both special and ordinary characters.
    Most ordinary characters, like "A", "a", or "0", are the simplest
    regular expressions; they simply match themselves.  You can
    concatenate ordinary characters, so last matches the string 'last'.

    The special characters are:
        "."      Matches any character except a newline.
        "^"      Matches the start of the string.
        "$"      Matches the end of the string or just before the newline at
                 the end of the string.
        "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
                 Greedy means that it will match as many repetitions as possible.
        "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
        "?"      Matches 0 or 1 (greedy) of the preceding RE.
        *?,+?,?? Non-greedy versions of the previous three special characters.
        {m,n}    Matches from m to n repetitions of the preceding RE.
        {m,n}?   Non-greedy version of the above.
        "\\"     Either escapes special characters or signals a special sequence.
        []       Indicates a set of characters.
                 A "^" as the first character indicates a complementing set.
        "|"      A|B, creates an RE that will match either A or B.
        (...)    Matches the RE inside the parentheses.
                 The contents can be retrieved or matched later in the string.
        (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
        (?:...)  Non-grouping version of regular parentheses.
        (?P<name>...) The substring matched by the group is accessible by name.
        (?P=name)     Matches the text matched earlier by the group named name.
        (?#...)  A comment; ignored.
        (?=...)  Matches if ... matches next, but doesn't consume the string.
        (?!...)  Matches if ... doesn't match next.
        (?<=...) Matches if preceded by ... (must be fixed length).
        (?<!...) Matches if not preceded by ... (must be fixed length).
        (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                           the (optional) no pattern otherwise.

    The special sequences consist of "\\" and a character from the list below.
    If the ordinary character is not on the list, then the resulting RE
    will match the second character.
        \number  Matches the contents of the group of the same number.
        \A       Matches only at the start of the string.
        \Z       Matches only at the end of the string.
        \b       Matches the empty string, but only at the start or end of a word.
        \B       Matches the empty string, but not at the start or end of a word.
        \d       Matches any decimal digit; equivalent to the set [0-9].
        \D       Matches any non-digit character; equivalent to the set [^0-9].
        \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
        \S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
        \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
                 With LOCALE, it will match the set [0-9_] plus characters defined
                 as letters for the current locale.
        \W       Matches the complement of \w.
        \\       Matches a literal backslash.

    This module exports the following functions:
        match    Match a regular expression pattern to the beginning of a string.
        search   Search a string for the presence of a pattern.
        sub      Substitute occurrences of a pattern found in a string.
        subn     Same as sub, but also return the number of substitutions made.
        split    Split a string by the occurrences of a pattern.
        findall  Find all occurrences of a pattern in a string.
        finditer Return an iterator yielding a match object for each match.
        compile  Compile a pattern into a RegexObject.
        purge    Clear the regular expression cache.
        escape   Backslash all non-alphanumerics in a string.

    Some of the functions in this module takes flags as optional parameters:
        I  IGNORECASE  Perform case-insensitive matching.
        L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
        M  MULTILINE   "^" matches the beginning of lines (after a newline)
                       as well as the string.
                       "$" matches the end of lines (before a newline) as well
                       as the end of the string.
        S  DOTALL      "." matches any character at all, including the newline.
        X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
        U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.

    This module also defines an exception 'error'.

The help text above covers the basic syntax and lists the functions the module provides. The basic syntax was already explained in the earlier post linked above. The rest of this post describes how to use the main functions.

re function reference

match

View the help: help(re.match)

    Help on function match in module re:

    match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning a match
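    object, or None if no match was found.

re.match(pattern, string, flags=0)

Purpose: try to match pattern starting at the first position of string, and return a match object describing the matched text. If the match fails, it returns None. flags is an optional parameter that controls how the regular expression is matched.

Example:

    import re

    pattern = r'[w]{3}\.[a-z]+\.(com)'
    str1 = "www.baidu.com.net"
    str2 = "http:www.baidu.com.net"

    re1 = re.match(pattern, str1)
    print(re1.group(0))            # www.baidu.com

    re2 = re.match(pattern, str2)  # None: str2 does not start with "www."
    if re2 is not None:
        print(re2.group(0))

The pattern matches URLs that begin with www.xxx.com. The first call prints www.baidu.com. The second call returns None, exactly as the documentation says, because str2 does not start with www. The error in the unguarded version comes from calling group(0) on that None, which raises AttributeError, so the result is checked before it is used.

search

View the help: help(re.search)

    Help on function search in module re:

    search(pattern, string, flags=0)
        Scan through string looking for a match to the pattern, returning
        a match object, or None if no match was found.

re.search(pattern, string, flags=0)

Purpose: find the first location in string where pattern matches and return a match object for it. If there is no such location, it returns None.

Example:

    import re

    pattern = r'[w]{3}\.[a-z]+\.(com)'
    str1 = "china.www.baidu.com.net"
    str2 = "http:www.baiducom.net"

    re1 = re.search(pattern, str1)
    print(re1.group())              # www.baidu.com

    re2 = re.search(pattern, str2)  # None: there is no "." before "com"
    if re2 is not None:
        print(re2.group())

The first call prints www.baidu.com. The second search fails and returns None, so calling group() on it without the check would raise AttributeError.

sub

View the help: help(re.sub)

    Help on function sub in module re:

    sub(pattern, repl, string, count=0, flags=0)
        Return the string obtained by replacing the leftmost
        non-overlapping occurrences of the pattern in string by the
        replacement repl.  repl can be either a string or a callable;
        if a string, backslash escapes in it are processed.  If it is
        a callable, it's passed the match object and must return
        a replacement string to be used.

re.sub(pattern, repl, string, count=0, flags=0)

Purpose: replace the substrings of string that match pattern with repl. count defaults to 0, which replaces every match; a count of 2 replaces only the first two.

Example:

    import re

    pattern = r'[w]{3}\.[a-z]+\.(com)'
    repl = 'www.google.com'
    str3 = "i love www.baidu.com, tom love www.baid.com"

    re3 = re.sub(pattern, repl, str3, 1)
    print(re3)

Output:

    i love www.google.com, tom love www.baid.com

subn

Much the same as re.sub, except that it also returns the number of replacements made.

Example:

    import re

    pattern = r'[w]{3}\.[a-z]+\.(com)'
    repl = 'www.google.com'
    str3 = "i love www.baidu.com, tom love www.baid.com"

    re3 = re.subn(pattern, repl, str3, 2)
    print(re3)

Output:

    ('i love www.google.com, tom love www.google.com', 2)

split

View the help: help(re.split)

    Help on function split in module re:

    split(pattern, string, maxsplit=0, flags=0)
        Split the source string by the occurrences of the pattern,
        returning a list containing the resulting substrings.

re.split(pattern, string, maxsplit=0, flags=0)

Purpose: split string wherever pattern matches and return the pieces in a list. maxsplit is the maximum number of splits; the default of 0 means no limit.

Example:

    import re

    text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
    pattern = ","
    print(re.split(pattern, text))

Output:

    ['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']

findall

View the help: help(re.findall)

    Help on function findall in module re:

    findall(pattern, string, flags=0)
        Return a list of all non-overlapping matches in the string.

        If one or more groups are present in the pattern, return a
        list of groups; this will be a list of tuples if the pattern
        has more than one group.

        Empty matches are included in the result.

re.findall(pattern, string, flags=0)

Purpose: find every substring of string that matches the pattern and return them in a list. If nothing matches, the list is empty.

Example:

    import re

    text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
    pattern = r"\w+"
    print(re.findall(pattern, text))

The result is the same as in the split example, but it is reached differently: split cuts the string at each comma, whereas findall collects each run of word characters.

    ['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']

finditer

Similar to findall: it finds every substring that the regular expression matches, but returns them as an iterator of match objects.

Example:

    import re

    text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
    pattern = r"\w+"
    re4 = re.finditer(pattern, text)
    for i in re4:
        print(i.group())

Because the result is an iterator, a for loop is used to print each match:

    >>> for i in re4:
    ...     print(i.group())
    ...
    xiaoming
    xiaohua
    xiaoli
    xiaoqiang
    xiaozhang

compile

View the help: help(re.compile)

    Help on function compile in module re:

    compile(pattern, flags=0)
        Compile a regular expression pattern, returning a pattern object.

re.compile(pattern, flags=0)

Purpose: turn the regular expression pattern into a regular expression object.

Example:

    import re

    text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
    pattern = r"\w+"
    patternobj = re.compile(pattern)
    # Call finditer on the compiled object rather than on the re module.
    re4 = patternobj.finditer(text)
    for i in re4:
        print(i.group())

The result is the same as in the previous example. compile simply turns the pattern into an object, and the other operations are then called on that object.

purge

View the help: help(re.purge)

    Help on function purge in module re:

    purge()
        Clear the regular expression cache

Purpose: clear the cache of compiled regular expressions.

escape

View the help: help(re.escape)

    Help on function escape in module re:

    escape(pattern)
        Escape all non-alphanumeric characters in pattern.

Purpose: put a backslash in front of the non-alphanumeric characters in a string, so that the string can be used inside a pattern and matched literally.

Example:

    >>> pattern
    '\\w+'
    >>> re.escape(pattern)
    '\\\\w\\+'

The result is different: the backslash and the plus sign are each now preceded by a backslash, so the escaped string matches the literal text \w+ instead of being read as "one or more word characters".

flags

    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.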
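The flag descriptions are easier to follow with a concrete case. The following is a minimal sketch written for Python 3, showing what IGNORECASE, MULTILINE, DOTALL and VERBOSE change. The sample strings and variable names are invented for this illustration and do not come from the help text.

```python
import re

# Sample text invented for this illustration: two lines separated by "\n".
text = "Hello\nworld"

# IGNORECASE (re.I): the lowercase pattern matches the capitalised word.
ignorecase = re.search(r"hello", text, re.I).group()

# MULTILINE (re.M): "^" also matches right after each newline.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)

# DOTALL (re.S): "." also matches the newline character.
without_s = re.search(r"Hello.world", text)
with_s = re.search(r"Hello.world", text, re.S)

# VERBOSE (re.X): whitespace and "#" comments inside the pattern are ignored.
verbose = re.compile(r"""
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
""", re.X)
phone = verbose.search("call 555-1234 now").group()

print(ignorecase)             # Hello
print(without_m, with_m)      # ['Hello'] ['Hello', 'world']
print(without_s is None)      # True
print(with_s is not None)     # True
print(phone)                  # 555-1234
```

Flags can also be combined with the | operator, for example re.I | re.M, when more than one behaviour is needed in the same call.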
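The errors seen in the match and search examples come from calling group() on None, and the purpose of escape is clearer when it is compared with an unescaped pattern. The sketch below, written for Python 3, covers both points. first_match is a hypothetical helper named for this illustration only; it is not part of the re module, and the sample strings are assumptions.

```python
import re

def first_match(pattern, text, anchored=False):
    """Return the matched text, or None when the pattern does not match.

    Hypothetical helper for this illustration; not part of the re module.
    """
    finder = re.match if anchored else re.search
    m = finder(pattern, text)
    # Both re.match and re.search return None on failure, so test before group().
    return m.group(0) if m is not None else None

pattern = r"[w]{3}\.[a-z]+\.(com)"

# re.match only succeeds when the pattern matches at the start of the string.
anchored_hit = first_match(pattern, "www.baidu.com.net", anchored=True)
anchored_miss = first_match(pattern, "http:www.baidu.com.net", anchored=True)

# re.search scans the whole string, so the same input now matches.
search_hit = first_match(pattern, "http:www.baidu.com.net")

# Without escaping, "." matches any character, so the first candidate wins.
sample = "wwwxbaiduxcom www.baidu.com"
unescaped_hit = re.search("www.baidu.com", sample).group(0)

# re.escape makes every "." literal, so only the real URL matches.
escaped_hit = re.search(re.escape("www.baidu.com"), sample).group(0)

print(anchored_hit)    # www.baidu.com
print(anchored_miss)   # None
print(search_hit)      # www.baidu.com
print(unescaped_hit)   # wwwxbaiduxcom
print(escaped_hit)     # www.baidu.com
```

Returning None instead of raising keeps the helper close to how re itself reports a failed match; a caller that prefers an exception could raise one at the point where the helper returns None.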