hadoop - Charset, Accents, Special Characters in Apache Hive
The problem
I’m having quite a few problems with Hive tables that contain special characters (in French) in some of their row values. Basically, every special character (like an accent on a letter or other diacritics) gets transformed into pure gibberish (various weird symbols) when querying the data (via the Hive CLI or other methods). The problem is not with the column names but with the actual row values, i.e. the content.
For example, instead of printing "variat°" or any other special character or accent mark, the result (when using a SELECT statement) is:
variat� cancel
Info and configuration
The Hive table is external, and the CSV file in HDFS is encoded in the ISO-8859-1 charset. Changing the original file's encoding charset doesn’t produce a better result.
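A likely explanation, assuming Hive decodes the row text as UTF-8 (its default for delimited text tables): in ISO-8859-1 the "°" is the single byte 0xB0, which is not a valid UTF-8 sequence on its own, so the decoder substitutes the replacement character "�". A minimal Python sketch reproducing the symptom with the string from the example above:

```python
# Reproduce the symptom: text stored as ISO-8859-1 but read back as UTF-8.
original = "variat\u00b0"  # "variat°"

# Bytes as they would sit in the ISO-8859-1 encoded CSV file ("°" -> 0xB0).
latin1_bytes = original.encode("iso-8859-1")

# Decoding those bytes as UTF-8: 0xB0 is a stray continuation byte,
# so errors="replace" swaps it for U+FFFD, the "�" seen in the query output.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the correct charset recovers the original text.
correct = latin1_bytes.decode("iso-8859-1")

# ascii() escapes non-ASCII characters so the output prints on any terminal.
print(latin1_bytes)    # b'variat\xb0'
print(ascii(misread))  # 'variat\ufffd'
print(ascii(correct))  # 'variat\xb0'
```

In other words, the data itself is intact; only the charset used to read it is wrong.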
I'm using the Hortonworks distribution 2.2 on Red Hat Enterprise Linux 6. The original CSV displays correctly in Linux.
The question
I've looked on the web for similar problems, but it seems no one has encountered this, or at least no one uses anything other than English with Hive :) Some JIRAs have addressed issues with special characters in Hive table column names, but my problem is with the actual content of the rows.
- How can I deal with this problem in Hive?
- Is it not possible to display special characters in Hive?
- Is there a "charset" option for Hive?
Any help is appreciated, I’m stuck. Thanks in advance!
I had a similar issue. Since the source file was small, I used Notepad++ to convert it to UTF-8 encoding.
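For files too large to open comfortably in an editor, the same conversion can be scripted. A minimal Python sketch, assuming the source file is ISO-8859-1 as described in the question; `convert_to_utf8` and the file names are hypothetical. The converted file would then replace the original in the external table's HDFS location (for example with `hdfs dfs -put`).

```python
import shutil
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, src_encoding: str = "iso-8859-1") -> None:
    """Re-encode a text file from src_encoding to UTF-8 (hypothetical helper)."""
    # newline="" disables newline translation, so line endings are kept as-is.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        # Copies in fixed-size chunks, so the whole file is never held in memory.
        shutil.copyfileobj(fin, fout)


# Demo with hypothetical file names inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data_latin1.csv"
    dst = Path(tmp) / "data_utf8.csv"
    src.write_bytes("id;label\n1;variat\u00b0\n".encode("iso-8859-1"))
    convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted)  # b'id;label\n1;variat\xc2\xb0\n'
```

After conversion, "°" is stored as the two-byte UTF-8 sequence 0xC2 0xB0 instead of the single ISO-8859-1 byte 0xB0.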