C# - HTML hex to Polish characters
I'm downloading an HTML file with Polish characters and parsing the string with:
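using System.Globalization;
using System.Text;

public static string HexToString(string hex)
{
    var sb = new StringBuilder();
    // Read two hex digits at a time and turn each pair into a number...
    for (int i = 0; i < hex.Length; i += 2)
    {
        string hexdec = hex.Substring(i, 2);
        int number = int.Parse(hexdec, NumberStyles.HexNumber);
        // ...then cast that number straight to a (UTF-16) char.
        char charToAdd = (char)number;
        sb.Append(charToAdd);
    }
    return sb.ToString();
}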
So when I find %21, I pass 21 to HexToString() and get ! back, which is fine. But the character ą is represented as %C4%85, and for that I get Ä instead. I want the ą character.
The problem here is that you are treating the hex codes as if they were UTF-16 (the native format of char), when in fact they are UTF-8.
This is easy to resolve using the UTF-8 encoding.
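To see the difference concretely, here is a small sketch that decodes the two bytes C4 85 both ways: once by casting each byte to char, as HexToString does, and once with Encoding.UTF8. The class and method names are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Text;

public static class Utf16VsUtf8Demo
{
    // Casts each byte straight to char, as HexToString does:
    // every byte becomes its own UTF-16 code unit in the range U+0000..U+00FF.
    public static string BytesAsChars(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (byte b in bytes)
        {
            sb.Append((char)b);
        }
        return sb.ToString();
    }

    // Interprets the bytes as a single UTF-8 sequence.
    public static string BytesAsUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    public static void Main()
    {
        byte[] bytes = { 0xC4, 0x85 };
        string wrong = BytesAsChars(bytes); // "Ä" followed by the control character U+0085
        string right = BytesAsUtf8(bytes);  // "ą" (U+0105)
        Console.WriteLine($"{wrong.Length} chars vs {right.Length} char");
        Console.WriteLine(right == "\u0105");
    }
}
```

The byte-per-char version produces two characters because it never combines the lead byte C4 with its continuation byte 85; UTF-8 decoding does, yielding the single character ą.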
First, let's write a handy StringToByteArray() method:
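using System;
using System.Linq;

public static byte[] StringToByteArray(string hex)
{
    // Take every even index and parse the two hex digits starting there as one byte.
    return Enumerable.Range(0, hex.Length)
                     .Where(x => x % 2 == 0)
                     .Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
                     .ToArray();
}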
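On .NET 5 and later, the framework already provides Convert.FromHexString, which does the same job as StringToByteArray and throws FormatException for odd-length or non-hex input. A minimal sketch, assuming a .NET 5+ target; the wrapper class HexBytes is hypothetical.

```csharp
using System;

public static class HexBytes
{
    // Same result as StringToByteArray, but uses the built-in parser
    // (available from .NET 5). Upper- and lower-case hex digits are both accepted.
    public static byte[] Parse(string hex) => Convert.FromHexString(hex);

    public static void Main()
    {
        byte[] bytes = Parse("c485");
        Console.WriteLine(BitConverter.ToString(bytes)); // C4-85
    }
}
```

On older frameworks the LINQ-based StringToByteArray remains a reasonable choice.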
Now you can convert the hex string to text like so:
string hexStr = "c485"; // or whatever the input hex string is
var bytes = StringToByteArray(hexStr);
string text = Encoding.UTF8.GetString(bytes); // "ą" (needs using System.Text;)
// ...use text
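In the question the hex arrives as %XX escapes mixed with plain text, so the bytes of a multi-byte character have to be collected before decoding. Below is a sketch of that, assuming unescaped characters are ASCII; PercentDecoder.Decode is a hypothetical helper, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class PercentDecoder
{
    // Turns each %XX escape into a raw byte, keeps other characters as bytes,
    // then decodes the whole byte sequence as UTF-8 in one go.
    // Assumption: unescaped characters are ASCII (one byte each).
    public static string Decode(string input)
    {
        var bytes = new List<byte>(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '%' && i + 2 < input.Length
                && byte.TryParse(input.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                 CultureInfo.InvariantCulture, out byte value))
            {
                bytes.Add(value);
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                // Plain character: assumed ASCII, so its code fits in one byte.
                bytes.Add((byte)input[i]);
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Decode("%21")); // !
        Console.WriteLine(Decode("%C4%85") == "\u0105"); // True
        Console.WriteLine(Decode("Za%C5%BC%C3%B3%C5%82%C4%87") == "Za\u017C\u00F3\u0142\u0107"); // True
    }
}
```

Decoding all bytes together is what makes ą work: C4 and 85 must reach GetString in the same call. If the input is genuine URL encoding, the built-in Uri.UnescapeDataString, which decodes %XX escapes as UTF-8 on current .NET, may be the simpler option.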