Python - OR operator inside OR operator - RegEx
I'm trying to create a compiler in Python, and I'm using the re module to create tokens. The language is similar to assembly.
Almost everything is working, but I'm having trouble with one token. Let me give some examples of that token:
    mov [eax], 4
    mov [name],2
    mov eax, [ebx]

Tokens: [eax], [ebx]
I can find what I want using the pattern \[(eax|ebx)\], but I get an error when I use it together with the other patterns. I believe it's because of the '|'.
    scanner = re.compile(r"""
        ;(.)*                        # comment
        |(\[-?[0-9]+\])              # memory_int
        |(\[-?0x[0-9a-fA-F]+\])      # memory_hex
        |(\[(eax|ebx)\])             # memory access registers
        """, re.VERBOSE)

    for match in re.finditer(scanner, lines[i]):
        comment, memory_int, memory_hex, memory_reg = match.groups()

Error:
    ValueError: too many values to unpack (expected 4)

Is there a way to replace the '|' character?
The problem isn't caused by the | characters in:

    |(\[(eax|ebx)\])             # memory access registers

It's because that part of the expression defines two capturing groups, one nested inside the other, so match.groups() returns more values than are being unpacked. For the first line, for example, it returns:

    (None, None, None, '[eax]', 'eax')

One way to avoid the nested group is to use this instead:

    |(\[eax\]|\[ebx\])          # memory access registers

which results in this being returned:

    (None, None, None, '[eax]')

As @shashank pointed out, you could also use the non-capturing group syntax (?:...) to define the nested register patterns:

    |(\[(?:eax|ebx)\])          # memory access registers

to achieve the same thing. This approach is advantageous when there is a larger number of possible sub-patterns (and when they're more complicated), because otherwise you'd need to spell out the entire pattern in full for each possibility rather than take advantage of whatever commonality they have.
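
To see the difference in group counts, here is a minimal sketch that compiles the scanner from the question with the non-capturing variant and unpacks exactly four values. The `sample_lines` list and the `found` variable are made up for illustration; the two small patterns `nested` and `flat` are only there to compare the number of capturing groups.

```python
import re

# Scanner from the question, with the inner register alternation made non-capturing.
scanner = re.compile(r"""
    ;(.)*                        # comment
    |(\[-?[0-9]+\])              # memory_int
    |(\[-?0x[0-9a-fA-F]+\])      # memory_hex
    |(\[(?:eax|ebx)\])           # memory access registers
    """, re.VERBOSE)

# Hypothetical helper patterns, used only to compare group counts.
nested = re.compile(r"(\[(eax|ebx)\])")    # two capturing groups
flat = re.compile(r"(\[(?:eax|ebx)\])")    # one capturing group

# Illustrative input, taken from the example instructions in the question.
sample_lines = ["mov [eax], 4", "mov [name],2", "mov eax, [ebx]"]

found = []
for line in sample_lines:
    for match in scanner.finditer(line):
        # Four capturing groups, so four names unpack cleanly.
        comment, memory_int, memory_hex, memory_reg = match.groups()
        if memory_reg is not None:
            found.append(memory_reg)

print(nested.groups, flat.groups)
print(found)
```

The `groups` attribute of a compiled pattern reports how many capturing groups it contains, which is a quick way to check that the number of names on the left side of the unpacking matches the pattern.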