regex - re.sub in Python: verbose mode does not work with replacement pattern?
Is there a way around a limitation of re.sub? Verbose mode is not fully functional in the replacement pattern (which contains a backreference here): it does not eliminate whitespace or comments, yet it does interpret backreferences properly.
import re

ft1 = r"""(?P<test>[0-9]+)"""
ft2 = r"""\g<test>and then: \g<test> #this remains"""

print(re.sub(ft1, ft2, "front 1234 back", flags=re.VERBOSE))  # does not work
# result: front 1234and then: 1234 #this remains back

re.VERBOSE does not apply to the replacement pattern. Is there a work-around? (Something simpler than working with groups after re.match.)
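One possible work-around, not taken from the original post, is to build the replacement from adjacent string literals. Python joins adjacent literals into one string, so each piece can carry an ordinary Python comment and none of the comments or layout whitespace reach re.sub. The names below are illustrative only, and the spacing inside the replacement is an arbitrary choice.

```python
import re

pattern = r"(?P<test>[0-9]+)"

# Adjacent string literals are concatenated by Python itself, so the
# comments below are plain Python comments, not part of the replacement.
replacement = (
    r"\g<test>"      # the matched digits
    r" and then: "   # literal text, spaces included
    r"\g<test>"      # the matched digits again
)

result = re.sub(pattern, replacement, "front 1234 back")
print(result)  # front 1234 and then: 1234 back
```

This keeps the replacement readable without any preprocessing step, at the cost of writing every literal space explicitly inside a quoted piece.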
Here is a way that I have found to "compile" an re replace expression for sub. There are a few constraints: both spaces and newlines have to be written the way spaces are written in a verbose re match expression (in square brackets: [ ] and [\n\n\n]), and the whole replace expression should have a verbose newline at the beginning.
An example: it searches a string and detects a word that is repeated after /ins/ and /del/ (the <ins> and <del> tags), then replaces those occurrences with a single occurrence of the word in front of both.
Both the match and the replace expressions are complex, which is why I want a verbose version of the replace expression.
===========================
import re

test = "<p>le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>"

find = r"""
    <ins>
    (?P<front>[^<]+)          # something was added that matches
    (?P<delim1>[ .!,;:]+)     # get the delimiter
    (?P<back1>[^<]*?)
    </ins>
    [ ]
    <del>
    (?P=front)
    (?P<delim2>[ .!,;:]+)
    (?P<back2>[^<]*?)
    </del>
"""

replace = r"""
    <<<<<\g<front>>>>>        # pop the matching thing out in front
    <ins>
    \g<delim1>
    \g<back1>
    </ins>
    [ ]
    <del>
    \g<delim2>                # put in the delimiters and the back end
    \g<back2>
    </del>
"""

flatreplace = r"""<<<<<\g<front>>>>><ins>\g<delim1>\g<back1></ins> <del>\g<delim2>\g<back2></del>"""

def compilerepl(instring):
    outstring = instring
    # remove whitespace at the front of each line
    outstring = re.sub(r"\n\s+", "\n", outstring)
    # remove whitespace at the end of each line (together with the newline that follows it)
    outstring = re.sub(r"\s+\n", "", outstring)
    # get rid of comments
    outstring = re.sub(r"\s*#[^\n]*\n", "\n", outstring)
    # preserve whitespace written in brackets, and eliminate the brackets
    outstring = re.sub(r"(?<!\[)\[(\s+)\](?!\[)", r"\1", outstring)
    # get rid of newlines that are not in brackets
    outstring = re.sub(r"(?<!\[)(\n)+(?!\])", "", outstring)
    # get rid of the brackets around written-out \n sequences
    outstring = re.sub(r"\[((\\n)+)\]", r"\1", outstring)
    # trim doubled brackets down to single brackets
    outstring = re.sub(r"\[\[(.*?)\]\]", r"[\1]", outstring)
    return outstring

assert flatreplace == compilerepl(replace)

print(test)
print(compilerepl(replace))
print(re.sub(find, compilerepl(replace), test, flags=re.VERBOSE))

# <p>le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>
# <<<<<\g<front>>>>><ins>\g<delim1>\g<back1></ins> <del>\g<delim2>\g<back2></del>
# <p>le petit <<<<<homme>>>><ins> à</ins> <del> en</del> ressorts</p>
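The same idea can also be sketched line by line instead of with a chain of substitutions over the whole string. The helper below is a hypothetical alternative (the name `flatten_template` is not from the post) written for Python 3. It assumes that a `#` at the start of a line or after whitespace begins a comment, and that literal spaces are written in brackets as `[ ]`; it does not handle the `[\n]` or doubled-bracket forms of the original.

```python
import re

def flatten_template(template):
    """Join a multi-line replacement template into one line (a sketch)."""
    pieces = []
    for line in template.splitlines():
        # Drop a trailing comment: '#' at line start or after whitespace.
        line = re.sub(r"(^|\s+)#.*$", "", line)
        # Drop indentation and trailing whitespace.
        line = line.strip()
        # Turn bracketed spaces such as '[ ]' into literal spaces.
        line = re.sub(r"\[( +)\]", r"\1", line)
        pieces.append(line)
    return "".join(pieces)

template = r"""
    <<<<<\g<front>>>>>   # pop the shared word out in front
    <ins>
    \g<delim1>
    \g<back1>
    </ins>
    [ ]                  # a literal space between the two tags
    <del>
    \g<delim2>
    \g<back2>
    </del>
"""

find = r"""
    <ins> (?P<front>[^<]+) (?P<delim1>[ .!,;:]+) (?P<back1>[^<]*?) </ins>
    [ ]
    <del> (?P=front) (?P<delim2>[ .!,;:]+) (?P<back2>[^<]*?) </del>
"""

test = "<p>le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>"

flat = flatten_template(template)
result = re.sub(find, flat, test, flags=re.VERBOSE)
print(flat)
print(result)
```

Because the bracket substitution runs after `strip()`, a line consisting only of `[ ]` survives as a single space. A literal `#` that follows whitespace cannot be expressed in this sketch.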