regex - How to get rid of weird characters in a Python string?
I have lines that contain pesky control characters.
When I read the file and call str.replace() on each line, these control characters don't get replaced. This is what I've tried, but they're still sticking around:

    import io

    with io.open('infile', 'r', encoding='utf8') as fin:
        for line in fin:
            line = (line.replace(u'\u0094', '"')
                        .replace(u'\u0093', '"')
                        .replace(u'\u0092', "'")
                        .replace(u'\u0096', '"')
                        .replace(u'\u0084', '"'))

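Assuming the snippet above is the whole loop, one likely culprit is that `line = line.replace(...)` only rebinds the loop variable: the cleaned string is thrown away on the next iteration because nothing stores or writes it. A minimal sketch that hands each cleaned line back instead; `clean_lines` and the output file name are hypothetical, and the replacement targets are copied unchanged from the question:

```python
import io

def clean_lines(lines):
    """Yield each line with the offending characters replaced (hypothetical helper)."""
    for line in lines:
        # Same replacements as the question's loop; rebinding `line` is fine here
        # because the new value is yielded instead of being dropped
        line = (line.replace(u'\u0094', '"')
                    .replace(u'\u0093', '"')
                    .replace(u'\u0092', "'")
                    .replace(u'\u0096', '"')
                    .replace(u'\u0084', '"'))
        yield line

# Usage with placeholder file names:
# with io.open('infile', 'r', encoding='utf8') as fin, \
#         io.open('outfile', 'w', encoding='utf8') as fout:
#     fout.writelines(clean_lines(fin))

sample = io.StringIO(u'\u0093hi\u0094, it\u0092s me\n')
print(list(clean_lines(sample)))
```

If the replacements still seem to do nothing after this, the characters in the file are probably not the code points being searched for, and printing `repr(line)` shows what is really there.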
How can these characters be replaced? Is there a canonical way to replace them (they are quotation marks and whitespace of various kinds)?
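For the "canonical way" part, a common Python idiom for replacing many single characters in one pass is `str.translate` with a table built by `str.maketrans`. A minimal Python 3 sketch; the ASCII targets are assumptions about what you want, and the comments give what each code point's byte value means in Windows-1252 (cp1252):

```python
# Map each problem code point to an ASCII stand-in (mapping a key to None
# would delete the character instead). These targets are illustrative choices.
TABLE = str.maketrans({
    '\u0084': '"',   # cp1252 byte 0x84: double low-9 quotation mark
    '\u0092': "'",   # cp1252 byte 0x92: right single quotation mark
    '\u0093': '"',   # cp1252 byte 0x93: left double quotation mark
    '\u0094': '"',   # cp1252 byte 0x94: right double quotation mark
    '\u0096': '-',   # cp1252 byte 0x96: en dash
})

def normalize_quotes(text):
    """Replace every character listed in TABLE in a single pass."""
    return text.translate(TABLE)

print(normalize_quotes('\u0093hello\u0094 \u0096 it\u0092s'))
```

Compared with chained `replace()` calls, the table keeps all the mappings in one place and walks the string only once.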
What are these characters, anyway? For example, u'\u0084'?
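As for what they are: U+0080 to U+009F is the C1 control block, so Python classifies these code points as control characters (Unicode category `Cc`). Their values, though, line up with the Windows-1252 (cp1252) bytes for curly quotes and dashes, which suggests (but does not prove) that cp1252 text was decoded as Latin-1 somewhere upstream. A minimal sketch for checking that guess and, if it holds, repairing the text; `explain` and `repair_mojibake` are hypothetical helper names:

```python
import unicodedata

def explain(ch):
    """Describe one suspicious character: code point, Unicode category, cp1252 reading."""
    code = ord(ch)
    info = {'codepoint': 'U+%04X' % code, 'category': unicodedata.category(ch)}
    if 0x80 <= code <= 0x9F:
        # Reinterpret the code point as a single byte and decode it as cp1252;
        # bytes that cp1252 leaves undefined come back as U+FFFD
        info['as_cp1252'] = bytes([code]).decode('cp1252', errors='replace')
    return info

def repair_mojibake(text):
    """Undo cp1252 text that was decoded as Latin-1 (assumes that is what happened)."""
    return text.encode('latin-1').decode('cp1252')

# ascii() prints escapes such as \u201e, so the output is readable on any console
for ch in u'\u0084\u0092\u0093\u0094\u0096':
    print(ascii(explain(ch)))
print(ascii(repair_mojibake(u'\u0093quoted\u0094')))
```

`repair_mojibake` only works when every character is at most U+00FF and none of the bytes are ones cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); otherwise `encode` or `decode` raises. If you control the step that produced the file and its source really was cp1252, decoding it as cp1252 there avoids the problem at the source.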
The last time I had this problem, it happened because I was getting characters outside the ASCII range and had the wrong bounds.
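Since the title mentions regex and the earlier trouble came down to range bounds, here is a minimal sketch of the two character classes most likely wanted, with inclusive bounds spelled out. Which one fits depends on whether legitimate non-ASCII text should survive; both are lossy, since the quotes are deleted rather than turned into ASCII quotes:

```python
import re

# C1 control block: U+0080 through U+009F, both ends inclusive
C1_CONTROLS = re.compile(r'[\x80-\x9f]')
# Anything outside 7-bit ASCII, i.e. every code point above U+007F
NON_ASCII = re.compile(r'[^\x00-\x7f]')

def strip_c1(text):
    """Delete C1 control characters and keep everything else."""
    return C1_CONTROLS.sub('', text)

def strip_non_ascii(text):
    """Delete every non-ASCII character (lossy: accented letters go too)."""
    return NON_ASCII.sub('', text)

print(strip_c1(u'\u0093quoted\u0094 text'))
print(strip_non_ascii(u'caf\xe9 \u201cok\u201d'))
```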