Python - OR operator inside OR operator - RegEx
I'm trying to create a compiler in Python, and I'm using the re module to create the tokens. The language is similar to assembly.
Almost everything is working, but I'm having trouble with one token. Let me give some examples of that token:
    mov [eax], 4
    mov [name],2
    mov eax, [ebx]

Tokens: [eax], [ebx]
I can find what I want using the pattern \[(eax|ebx)\], but I get an error when I use it together with the other patterns. I believe it's because of the '|'.
    scanner = re.compile(r"""
        ;(.)*                     # comment
        |(\[-?[0-9]+\])           # memory_int
        |(\[-?0x[0-9a-fA-F]+\])   # memory_hex
        |(\[(eax|ebx)\])          # memory access registers
        """, re.VERBOSE)

    for match in re.finditer(scanner, lines[i]):
        comment, memory_int, memory_hex, memory_reg = match.groups()

Error:

    ValueError: too many values to unpack (expected 4)

Is there a way to replace the '|' character?
The problem isn't because of the | characters in:

    |(\[(eax|ebx)\]) # memory access registers

It's because that part of the expression defines two capturing groups, one nested inside the other, so match.groups() returns more values than are being unpacked. For the first line, for example, it returns:

    (None, None, None, '[eax]', 'eax')

One way to avoid the nested group is to use this instead:

    |(\[eax\]|\[ebx\]) # memory access registers

which results in this being returned:

    (None, None, None, '[eax]')

As @Shashank pointed out, you could also use the non-capturing group (?:...) syntax to define the nested possible register value patterns:

    |(\[(?:eax|ebx)\]) # memory access registers

to achieve the same thing. This approach is advantageous when there is a larger number of possible sub-patterns (and they're more complicated), because otherwise you would need to spell out the entire pattern in full for each possibility rather than take advantage of any commonality they might have.
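
To see the difference between the three variants side by side, here is a minimal sketch that plugs each one into the scanner from the question and collects what match.groups() returns. The helper name scanner_groups and the constants NESTED, SPELLED_OUT and NON_CAPTURING are made up for this illustration; only the patterns themselves come from the question and the answer.

```python
import re

# Three ways of writing the register alternative (names are illustrative only).
NESTED = r"(\[(eax|ebx)\])"            # two capturing groups, one nested in the other
SPELLED_OUT = r"(\[eax\]|\[ebx\])"     # one capturing group, each alternative in full
NON_CAPTURING = r"(\[(?:eax|ebx)\])"   # one capturing group, inner group not captured


def scanner_groups(register_pattern, line):
    """Build the scanner with the given register pattern and
    return match.groups() for every match found in line."""
    scanner = re.compile(r"""
          ;(.)*                     # comment
        | (\[-?[0-9]+\])            # memory_int
        | (\[-?0x[0-9a-fA-F]+\])    # memory_hex
        | """ + register_pattern + r"""   # memory access registers
        """, re.VERBOSE)
    return [match.groups() for match in scanner.finditer(line)]


line = "mov [eax], 4"
for name, pattern in [("nested", NESTED),
                      ("spelled out", SPELLED_OUT),
                      ("non-capturing", NON_CAPTURING)]:
    groups = scanner_groups(pattern, line)
    # The nested variant yields 5 values per match, the other two yield 4.
    print(name, [len(g) for g in groups], groups)
```

With the nested variant each match carries five values, which is what breaks the four-name unpacking in the question; the other two variants return four values, so the original loop works unchanged.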