java - Word not preceded by a regular expression
There are plenty of these questions, but they focus on a word not being preceded by a couple of characters.
In a text file I have TXX and txx, and I need to find those. But I also have base64-encoded pictures.
Meaning I have
"picture":"/9j/4aaqsktxx . . .
So basically TXX and txx can appear randomly inside the base64-encoded pictures.
I used the following regular expression:
(?<!"picture":")(?:(\w|\/|\+)+)(TXX|txx)
and then realized it should be changed into:
(?<!"picture":")(?:(\d|\w|\/|\+|\=)+)(TXX|txx)
But the tester says I'm causing catastrophic backtracking, and without the (?:) (non-capturing group) it still doesn't work: it skips "picture":" and the first character after it, but matches everything else.
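A minimal Java sketch reproducing this behaviour, assuming a shortened, made-up picture value instead of the real data and the (TXX|txx) alternation: the negative look-behind is only tested at the position where a match starts, so once it rejects the position right after "picture":", the engine simply retries one character later and succeeds. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveLookBehindDemo {
    // The question's first attempt. The look-behind checks only the text
    // immediately before the position where the match begins.
    static final Pattern NAIVE =
            Pattern.compile("(?<!\"picture\":\")(?:(\\w|\\/|\\+)+)(TXX|txx)");

    // Returns "start:matchedText" for the first match, or null if none.
    static String firstMatch(String input) {
        Matcher m = NAIVE.matcher(input);
        return m.find() ? m.start() + ":" + m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened picture value.
        String picture = "\"picture\":\"/9j/4aaqsktxx\"";
        // Prints 12:9j/4aaqsktxx. The start right after the opening quote (11)
        // is rejected, so the match just begins one character later.
        System.out.println(firstMatch(picture));
    }
}
```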
Since I cannot put an unbounded quantifier inside a negative look-behind:
(?<!"picture":".+)TXX|txx
how should I form the regular expression so that these match:
"something-txx": "somerandomstring"
a value that is not a picture: "some other stringtxxsome string"
but this doesn't:
"picture":"txxl5l71jgwnxmxamjgot8zpwn24jngtzpyhpbqltviqvatk4zozhy+husj7pgv3ag4nmpj4cblxudzyda5c+5qecmgapz9vlrsbzra+tnns0gjufd+nsa5zho9krf2ncwll7360x2kx8za6dqunqubjoelpvro2dq0gomz8hmycktxxh08vekg84oplczvddqvnxkphob0sn5wly+vdgx1di82kzmxmlaojqzksjdgjz0+urlcji/xysc5gcpettxxguageaienoqqlygg/p8k8vlafcvvez+/sfmmpo74snyxgz+/0yi8qkbqcaqcp4dpg6melrzcqvihfar46l6govdpe69movlmhiph0nyarjttu2e+fqwypkqdsslqker0fkjvr0oe5ap1rqowd+pfuo7hefhbvjcfa8vlk42ycudjlilmd1imrnakepok5bpdyousvnhbmses9xmq+pyrdqrqwd0oj2vh/evlexj5omf7bsqhq2yjea2tq83nndrpehp5ywqemxg4+vppelzior4guageacvvgetxtcibci/ify2y2aa57ewu7ljbaibakbqcb4ep62ec/jywmopbnfeierngknk7e3ywtiyjn5fzpylid5kcv67shtclbt+vzg4vziu93lve8squmsdzpsrdz7jse2tzrs+o/kxc7z5oge/ptb+xows7tctpb4z9nikgf9yu3jesmb0yv422np5ai8eatxx"
Sample input is on: http://pastebin.com/5xjvnqgs (I know Pastebin is bad because pastes expire, but I'm having a problem pasting this amount of text here: the page gets stuck.)
And the results should be:
result1: "some-txx": value
result2: hereistxx: "1235"
result3: "groupdata" : "{data1: sample, txx-value:12312 ,data2: sample2}"
I believe you can use Java's rather useful, "to-some-extent" variable-width look-behind, which accepts bounded quantifiers such as {0,10000}:
(?<!"picture":"[^"]{0,10000})(?i:txx)
You can adjust the 10000 value in case you have longer base64-encoded strings.
Tested on RegexPlanet.
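To show the bounded look-behind at work, here is a small sketch that runs the pattern above over a made-up, JSON-like sample; the sample string and the class and method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookBehindDemo {
    // The answer's pattern: txx in any letter case, unless it is preceded by
    // "picture":" followed by at most 10000 non-quote characters.
    static final Pattern TXX_OUTSIDE_PICTURE =
            Pattern.compile("(?<!\"picture\":\"[^\"]{0,10000})(?i:txx)");

    // Returns the start offset of every txx/TXX that is not inside a picture value.
    static List<Integer> findTxx(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = TXX_OUTSIDE_PICTURE.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical sample: one txx in a key, one TXX in a picture, one TXX in a plain value.
        String sample = "{\"something-txx\": \"somerandomstring\", "
                + "\"picture\":\"/9j/4AAQSkTXXabc+/=\", "
                + "\"note\": \"some other stringTXXsome string\"}";
        // Prints 12 and 97; the TXX inside the picture value is skipped.
        for (int start : findTxx(sample)) {
            System.out.println(start);
        }
    }
}
```

An occurrence inside a picture value is skipped only while it lies within 10000 non-quote characters of the value's opening quote; further in, the look-behind can no longer reach "picture":" and the occurrence is reported, which is why the bound has to be raised for longer images.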
In case you have large images, use the reverse-string trick with a reversed regex (look-aheads can be of undefined, variable size):
string rx = "(?i)\"[^\"]*\"\\s*:\\s*\"[^\"]*xxt[^\"]*\"(?![^\"]*\":\"erutcip\")";
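The reverse-string trick can be wrapped in a small helper, sketched here with the rx pattern above: the input is reversed, the reversed pattern runs with its unbounded look-ahead, and each match is reversed back. The helper, class name and sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReversedLookAheadDemo {
    // The answer's reversed pattern. In reversed text a pair "key": "value"
    // reads "eulav": "yek", so xxt is searched in the quoted string after the
    // colon, and the look-ahead inspects what preceded the pair originally.
    static final Pattern REVERSED = Pattern.compile(
            "(?i)\"[^\"]*\"\\s*:\\s*\"[^\"]*xxt[^\"]*\"(?![^\"]*\":\"erutcip\")");

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Runs the reversed pattern on the reversed input and un-reverses each match.
    // Matches are collected in reverse document order (last pair first).
    static List<String> findPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = REVERSED.matcher(reverse(input));
        while (m.find()) {
            pairs.add(reverse(m.group()));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String sample = "{\"something-txx\": \"somerandomstring\", "
                + "\"picture\":\"/9j/4AAQSkTXXabc\"}";
        // Prints "something-txx": "somerandomstring"
        for (String pair : findPairs(sample)) {
            System.out.println(pair);
        }
    }
}
```

As written, this pattern only finds txx in the quoted string that precedes the colon (the key of a "key": "value" pair), and results come back last-first because the text is scanned in reverse.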
A sample Java program on Ideone:
import java.util.regex.*;

class HelloWorld {
    public static void main(String[] args) {
        // Placeholder: the real input was too long to post here
        // (body limited to 30000 characters; it had 53501).
        String str = "the_huge_string_that_caused_the_issue";
        // Reverse the input so an unbounded look-ahead can stand in for a look-behind.
        str = new StringBuilder(str).reverse().toString();
        String rx = "\"?[^\"]*\"?\\s*\"?[^\"\\n\\r]*(?:xxt|XXT)[^\"\\n\\r]*(?![^\"]*\":\"erutcip\")";
        Pattern ptrn = Pattern.compile(rx);
        Matcher m = ptrn.matcher(str);
        while (m.find()) {
            // Reverse each match back into normal reading order.
            System.out.println(new StringBuilder(m.group(0)).reverse().toString());
        }
        // Same check on one of the strings that must match.
        m = ptrn.matcher(new StringBuilder("\"something-txx\": \"somerandomstring\"").reverse().toString());
        while (m.find()) {
            System.out.println(new StringBuilder(m.group(0)).reverse().toString());
        }
    }
}