Python - OR operator inside OR operator - regex


I'm trying to create a compiler in Python, and I'm using the re module to create tokens. The language is similar to assembly.

Almost everything is working, but I'm having trouble with one token. Let me give an example of that token:

    mov [eax], 4
    mov [name],2
    mov eax, [ebx]

Tokens: [eax], [ebx]

I can find what I want using the pattern \[(eax|ebx)\], but I get an error when I use it together with other patterns. I believe it's because of the '|'.

    import re

    scanner = re.compile(r"""
        ;(.)*                        # comment
        |(\[-?[0-9]+\])              # memory_int
        |(\[-?0x[0-9a-fA-F]+\])      # memory_hex
        |(\[(eax|ebx)\])             # memory access registers
        """, re.VERBOSE)

    for match in re.finditer(scanner, lines[i]):
        comment, memory_int, memory_hex, memory_reg = match.groups()

Error:

    ValueError: too many values to unpack (expected 4)

Is there a way to replace the '|' character?

The problem isn't because of the | characters in:

    |(\[(eax|ebx)\])             # memory access registers 

It's because that part of the expression defines two capturing groups, one nested inside the other, so match.groups() returns more values than are being unpacked. For the first line, for example, it returns:

    (None, None, None, '[eax]', 'eax')
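One way to confirm this is to ask the compiled pattern how many capturing groups it defines. The following is a minimal sketch that reuses the scanner from the question; the sample input line is taken from the example above, and the variable names `group_count` and `first_groups` are only illustrative.

```python
import re

scanner = re.compile(r"""
    ;(.)*                        # comment
    |(\[-?[0-9]+\])              # memory_int
    |(\[-?0x[0-9a-fA-F]+\])      # memory_hex
    |(\[(eax|ebx)\])             # memory access registers (outer + nested group)
    """, re.VERBOSE)

# Total number of capturing groups in the pattern: one more than expected
group_count = scanner.groups

# Groups produced by the first match in a sample line
first_groups = scanner.search("mov [eax], 4").groups()

print(group_count)
print(first_groups)
```

Every pair of plain parentheses counts as a group, including the `(.)` in the comment alternative and the `(eax|ebx)` nested inside the last one, which is why the tuple has five entries instead of four.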

One way to avoid the nested group is to instead use:

    |(\[eax\]|\[ebx\])          # memory access registers 

which would result in this being returned:

    (None, None, None, '[eax]')

As @shashank pointed out, you could also use the non-capturing group (?:...) syntax to define the nested possible register value patterns:

    |(\[(?:eax|ebx)\])          # memory access registers 

to achieve the same thing. This approach is advantageous when there is a larger number of possible sub-patterns (and they're more complicated), because otherwise you'd need to spell out the entire pattern in full for each possibility rather than take advantage of any commonality they might have.
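Putting the non-capturing group into the full scanner might look like the sketch below. It makes two assumptions that are not in the original code: the `scan` helper is a hypothetical wrapper added for illustration, and the comment alternative is written as `;(.*)` rather than `;(.)*`, because the original form captures only the last character of the comment.

```python
import re

scanner = re.compile(r"""
    ;(.*)                        # comment (assumed fix: capture the whole text)
    |(\[-?[0-9]+\])              # memory_int
    |(\[-?0x[0-9a-fA-F]+\])      # memory_hex
    |(\[(?:eax|ebx)\])           # memory access registers, nested group not captured
    """, re.VERBOSE)

def scan(line):
    """Hypothetical helper: return one 4-tuple of groups per match in the line."""
    tokens = []
    for match in scanner.finditer(line):
        # Exactly four capturing groups now, so the unpacking succeeds
        comment, memory_int, memory_hex, memory_reg = match.groups()
        tokens.append((comment, memory_int, memory_hex, memory_reg))
    return tokens

print(scan("mov eax, [ebx]"))
print(scan("mov [0x1F], 2"))
print(scan("mov [eax], 4 ; store"))
```

With four capturing groups in the pattern, the four-name unpacking from the question works unchanged, and further registers can be added inside `(?:...)` without altering the group count.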

