java - Word not preceded by a regular expression -


There are plenty of questions like this, but they focus on cases involving only a couple of characters.

In my text file I have TXX and txx, and I need to find those. However, the file also contains base64-encoded pictures.

Meaning I have

"picture":"/9j/4aaqsktxx . . .

Basically, TXX or txx can appear randomly inside the base64-encoded pictures.

I used the following regular expression:

(?<!"picture":")(?:(\w|\/|\+)+)(TXX|txx)

Then I realized it should be changed into:

(?<!"picture":")(?:(\d|\w|\/|\+|\=)+)(TXX|txx)

But it says I'm doing catastrophic backtracking, and without the (?:) (non-capturing group) it still doesn't work. It skips "picture":" and the first character, but takes everything else.
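
One likely contributor to the backtracking warning is that \d and \w overlap inside the repeated alternation, so every digit can be matched in two ways. The sketch below, which is an illustration and not part of the original thread, swaps the alternation for a single character class. The class name CharClassDemo, the helper firstMatch, and the sample strings are made up, and (?i:txx) stands in for the two spellings. It only addresses the backtracking, not the exclusion of pictures.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // One character class instead of (\d|\w|\/|\+|\=)+ so each character
    // can be consumed in exactly one way (no ambiguous alternation).
    static final Pattern TOKEN = Pattern.compile("[\\w/+=]+(?i:txx)");

    // Returns the first match, or null when the input contains none.
    static String firstMatch(String input) {
        Matcher m = TOKEN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("abc/12+txx rest"));
        // A run of digits with no txx: fails without exponential retries.
        System.out.println(firstMatch("0123456789012345678901234567890123456789!"));
    }
}
```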

I cannot put a quantifier inside a negative look-behind, like this:

(?<!"picture":".+)TXX|txx

How should I form the regular expression so that these pass:

"something-txx": "somerandomstring" or a value that is not a picture: "some other stringtxxsome string"

but this doesn't:

"picture":"txxl5l71jgwnxmxamjgot8zpwn24jngtzpyhpbqltviqvatk4zozhy+husj7pgv3ag4nmpj4cblxudzyda5c+5qecmgapz9vlrsbzra+tnns0gjufd+nsa5zho9krf2ncwll7360x2kx8za6dqunqubjoelpvro2dq0gomz8hmycktxxh08vekg84oplczvddqvnxkphob0sn5wly+vdgx1di82kzmxmlaojqzksjdgjz0+urlcji/xysc5gcpettxxguageaienoqqlygg/p8k8vlafcvvez+/sfmmpo74snyxgz+/0yi8qkbqcaqcp4dpg6melrzcqvihfar46l6govdpe69movlmhiph0nyarjttu2e+fqwypkqdsslqker0fkjvr0oe5ap1rqowd+pfuo7hefhbvjcfa8vlk42ycudjlilmd1imrnakepok5bpdyousvnhbmses9xmq+pyrdqrqwd0oj2vh/evlexj5omf7bsqhq2yjea2tq83nndrpehp5ywqemxg4+vppelzior4guageacvvgetxtcibci/ify2y2aa57ewu7ljbaibakbqcb4ep62ec/jywmopbnfeierngknk7e3ywtiyjn5fzpylid5kcv67shtclbt+vzg4vziu93lve8squmsdzpsrdz7jse2tzrs+o/kxc7z5oge/ptb+xows7tctpb4z9nikgf9yu3jesmb0yv422np5ai8eatxx" 

Sample input is at: http://pastebin.com/5xjvnqgs (I know Pastebin is bad because pastes expire, but I'm having a problem pasting this amount of text here; the page gets stuck).

And the results should be:

result1: "some-txx": value

result2: hereistxx: "1235"

result3: "groupdata" : "{data1: sample, txx-value:12312 ,data2: sample2}"

I believe you can use Java's rather useful "to-some-extent" variable-width look-behind, which accepts a quantifier as long as it has a fixed upper bound:

(?<!"picture":"[^"]{0,10000})(?i:txx) 

You can adjust the 10000 value in case you have longer base64-encoded strings.

Tested on RegexPlanet.
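
A minimal sketch of how the bounded look-behind might be applied in Java. Only the pattern comes from the answer; the class name BoundedLookbehindDemo, the helper findOffsets, and the shortened sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedLookbehindDemo {
    // Reject txx when "picture":" plus up to 10000 non-quote characters precede it.
    static final Pattern TXX =
            Pattern.compile("(?<!\"picture\":\"[^\"]{0,10000})(?i:txx)");

    // Collects the start offset of every accepted match.
    static List<Integer> findOffsets(String input) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = TXX.matcher(input);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical input: one ordinary key and one short fake picture value.
        String input = "\"something-txx\": \"somerandomstring\"\n"
                + "\"picture\":\"/9j/4aaqsktxxabcTXXdef\"";
        System.out.println(findOffsets(input)); // prints [11]
    }
}
```

Because [^"] cannot cross the closing quote of the picture value, anything after that value is no longer shielded by the look-behind.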

If you have very large images, use the reverse-string trick with a reversed regex (look-aheads can be of unbounded variable size):

String rx = "(?i)\"[^\"]*\"\\s*:\\s*\"[^\"]*xxt[^\"]*\"(?![^\"]*\":\"erutcip\")";

A sample Java program on Ideone:

import java.util.regex.*;

class HelloWorld {
    public static void main(String[] args) {
        // Placeholder standing in for the large sample input.
        String str = "the_huige_string_that_caused_body limited 30000 characters;you entered 53501_issue";
        // Reverse the input so the look-behind can be written as an unbounded look-ahead.
        str = new StringBuilder(str).reverse().toString();
        String rx = "\"?[^\"]*\"?\\s*\"?[^\"\\n\\r]*(?:XXT|xxt)[^\"\\n\\r]*(?![^\"]*\":\"erutcip\")";
        Pattern ptrn = Pattern.compile(rx);
        Matcher m = ptrn.matcher(str);
        while (m.find()) {
            // Reverse each match back into reading order before printing.
            System.out.println(new StringBuilder(m.group(0)).reverse().toString());
        }
        m = ptrn.matcher(new StringBuilder("\"something-txx\": \"somerandomstring\"").reverse().toString());
        while (m.find()) {
            System.out.println(new StringBuilder(m.group(0)).reverse().toString());
        }
    }
}
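
The reversal idea can also be sketched in a more compact form. This is an illustration under assumptions, not the answer's code: the pattern below is a direct mirror of the look-behind version written for reversed input, and the class name ReverseTrickDemo, the helper findOffsets, and the short sample input are all made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReverseTrickDemo {
    // Assumed mirror pattern: reversed txx that is NOT followed by
    // non-quote characters and then the reversed "picture":" prefix.
    static final Pattern REVERSED =
            Pattern.compile("(?i)xxt(?![^\"]*\":\"erutcip\")");

    // Returns offsets of accepted matches, expressed in the original string.
    static List<Integer> findOffsets(String input) {
        String reversed = new StringBuilder(input).reverse().toString();
        List<Integer> offsets = new ArrayList<>();
        Matcher m = REVERSED.matcher(reversed);
        while (m.find()) {
            // The end of a match in the reversed text is its start in the original.
            offsets.add(input.length() - m.end());
        }
        Collections.sort(offsets);
        return offsets;
    }

    public static void main(String[] args) {
        String input = "\"something-txx\": \"somerandomstring\"\n"
                + "\"picture\":\"/9j/4aaqsktxxabcTXXdef\"";
        System.out.println(findOffsets(input)); // prints [11]
    }
}
```

Since the look-ahead has no length limit, this variant does not need a tuned upper bound such as 10000, at the cost of reversing the whole input once.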
