python - OR operator inside OR operator - RegEx
I'm trying to create a compiler in Python, and I'm using the re module to create tokens. The language is similar to assembly.
Almost everything is working, but I'm having trouble with one token. Let me give an example of the token:
    mov [eax], 4
    mov [name],2
    mov eax, [ebx]
Tokens: [eax], [ebx]
I can find what I want using the pattern: \[(eax|ebx)\]
But I get an error when I use it together with the other patterns, I believe because of the '|'.
    scanner = re.compile(r"""
        ;(.)*                     # comment
        |(\[-?[0-9]+\])           # memory_int
        |(\[-?0x[0-9a-fA-F]+\])   # memory_hex
        |(\[(eax|ebx)\])          # memory access with registers
        """, re.VERBOSE)

    for match in re.finditer(scanner, lines[i]):
        comment, memory_int, memory_hex, memory_reg = match.groups()
Error:
    ValueError: too many values to unpack (expected 4)
Is there a way to replace the '|' character?
The problem isn't because of the | characters in:
    |(\[(eax|ebx)\])          # memory access with registers
It's because that part of the expression defines two capturing groups, one nested inside the other, so match.groups() returns more values than are being unpacked. For the first line, for example, it returns:
    (None, None, None, '[eax]', 'eax')
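A quick way to check this is to ask the compiled pattern how many capturing groups it has. The sketch below reuses the scanner from the question unchanged; the input string is just the first sample line.

```python
import re

# Scanner from the question, nested capturing group left in place.
scanner = re.compile(r"""
    ;(.)*                     # comment
    |(\[-?[0-9]+\])           # memory_int
    |(\[-?0x[0-9a-fA-F]+\])   # memory_hex
    |(\[(eax|ebx)\])          # memory access with registers
    """, re.VERBOSE)

# Pattern.groups is the number of capturing groups in the pattern.
print(scanner.groups)

match = scanner.search("mov [eax], 4")
# One entry per capturing group, so five values here, not four.
print(match.groups())
```

Every opening parenthesis that doesn't start with (? adds one entry to match.groups(), whether or not it is nested.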
One way to avoid the nested group would be to use this instead:
    |(\[eax\]|\[ebx\])        # memory access with registers
which would result in this being returned:
    (None, None, None, '[eax]')
As @Shashank pointed out, you could also use the non-capturing group (?:...) syntax to define the nested possible register value patterns:
    |(\[(?:eax|ebx)\])        # memory access with registers
to achieve the same thing. This approach is advantageous when there is a larger number of possible sub-patterns (and they're more complicated), because otherwise you'd need to spell out the entire pattern in full for each possibility rather than take advantage of any commonality they might have.
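Putting the non-capturing version back into the full scanner, a minimal sketch might look like this. The lines list and the found list are assumptions made up for the example, using the three sample instructions from the question.

```python
import re

# Same scanner, but the register alternatives use a non-capturing group.
scanner = re.compile(r"""
    ;(.)*                     # comment
    |(\[-?[0-9]+\])           # memory_int
    |(\[-?0x[0-9a-fA-F]+\])   # memory_hex
    |(\[(?:eax|ebx)\])        # memory access with registers
    """, re.VERBOSE)

# Hypothetical input: the sample instructions from the question.
lines = ["mov [eax], 4", "mov [name],2", "mov eax, [ebx]"]

found = []  # illustrative accumulator for the register-access tokens
for line in lines:
    for match in scanner.finditer(line):
        # Four capturing groups now, so the four-way unpack works.
        comment, memory_int, memory_hex, memory_reg = match.groups()
        if memory_reg is not None:
            found.append(memory_reg)

print(found)
```

Here [name] is skipped because none of the alternatives match it.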