Python - how to transform a pandas DataFrame for insertion via an executemany() statement?
I have a big pandas DataFrame - 50 columns (headers) and a few hundred thousand rows of data - and I'm looking to transfer this data to a database using the ceODBC module. Previously I was using pyodbc with a simple execute statement in a loop, but that was taking ridiculously long (1000 records per 10 minutes)...
I'm now trying a new module and want to introduce executemany(), although I'm not quite sure what's meant by the sequence of parameters in:

cursor.executemany("""INSERT INTO table.name(a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)""", sequence_of_parameters)
Should it be a constant list working through each header, like

['asdas', '1', '2014-12-01', 'true', 'asdasd', 'asdas', '2', '2014-12-02', 'true', 'asfasd', 'asdfs', '3', '2014-12-03', 'false', 'asdasd']

- where this is an example of three rows - or is there another format that's needed?
As a related question, how can I go about converting a regular pandas DataFrame to this format?

Thanks!
I managed to figure it out in the end. If you have a pandas DataFrame you want to write to a database using the ceODBC module, which is what I used, the code is as follows.

(With all_data as the DataFrame.) Map the DataFrame values to strings and store each row as a tuple in a list of tuples:
for r in all_data.columns.values:
    all_data[r] = all_data[r].map(str)
    all_data[r] = all_data[r].map(str.strip)

tuples = [tuple(x) for x in all_data.values]
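As a minimal sketch of what that conversion produces (with a made-up two-column DataFrame, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for all_data
df = pd.DataFrame({'a': [1, 2], 'b': [' x ', 'y']})

# Same approach: stringify every column, then strip whitespace
for col in df.columns.values:
    df[col] = df[col].map(str)
    df[col] = df[col].map(str.strip)

tuples = [tuple(x) for x in df.values]
# tuples is now [('1', 'x'), ('2', 'y')] - one tuple per row,
# which is exactly the "sequence of parameters" executemany() wants
```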
For the list of tuples, change any null-value signifiers - which have been captured as strings in the conversion above - into a real NULL type that can be passed to the end database. This was an issue for me; it might not be for you.
string_list = ['NaT', 'nan', 'NaN', 'None']

def remove_wrong_nulls(x):
    for r in range(len(x)):
        for i, e in enumerate(tuples):
            for j, k in enumerate(e):
                if k == x[r]:
                    temp = list(tuples[i])
                    temp[j] = None
                    tuples[i] = tuple(temp)

remove_wrong_nulls(string_list)
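The triple loop above rebuilds a tuple every time it finds a match; a quicker sketch of the same replacement is a single comprehension over the tuples (the sample values here are invented):

```python
# Hypothetical rows with null markers captured as strings
tuples = [('asdas', 'nan', '2014-12-01'), ('NaT', '2', 'None')]

# Same marker strings as string_list above
markers = {'NaT', 'nan', 'NaN', 'None'}

# Rebuild each row once, swapping any marker string for a real None
tuples = [tuple(None if v in markers else v for v in t) for t in tuples]
# -> [('asdas', None, '2014-12-01'), (None, '2', None)]
```

This does one pass over the data instead of one pass per marker string.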
Create the connection to the database:
cnxn = ceODBC.connect('DRIVER={SomeODBCDriver};DBCName=XXXXXXXXXXX;UID=XXXXXXX;PWD=XXXXXXX;QUIETMODE=YES;', autocommit=False)
cursor = cnxn.cursor()
Define a function to turn the list of tuples into a new_list, which is a further indexing on the list of tuples in chunks of 1000. This was necessary for me because the SQL query passing the data to the database could not exceed 1 MB.
def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

new_list = chunks(tuples, 1000)
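A quick check of how chunks() behaves (seven made-up rows, batch size 3, rather than the real data and batch size 1000):

```python
def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

rows = [('row%d' % i,) for i in range(7)]
batches = chunks(rows, 3)
# 7 rows split into batches of 3, 3 and 1
sizes = [len(b) for b in batches]
```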
Define the query:
query = """INSERT INTO XXXXXXXXXXXX("XXXXXXXXXX", "XXXXXXXXX", "XXXXXXXXXXX") VALUES (?, ?, ?)"""
Run through the new_list containing the list of tuples in groups of 1000 and perform executemany(). Follow that by committing and closing the connection, and that's it :)
for i in range(len(new_list)):
    cursor.executemany(query, new_list[i])
cnxn.commit()
cnxn.close()
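The whole pattern can be exercised end-to-end without an ODBC driver by substituting sqlite3, which uses the same DB-API qmark paramstyle; the table, columns and rows below are stand-ins, not the real schema:

```python
import sqlite3

# Stand-in rows, including a real None that becomes SQL NULL
tuples = [('a', '1'), ('b', None), ('c', '3')]

def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

cnxn = sqlite3.connect(':memory:')
cursor = cnxn.cursor()
cursor.execute('CREATE TABLE t (x TEXT, y TEXT)')

query = 'INSERT INTO t (x, y) VALUES (?, ?)'
# Batched executemany, exactly as in the answer (batch size shrunk to 2)
for batch in chunks(tuples, 2):
    cursor.executemany(query, batch)
cnxn.commit()

count = cursor.execute('SELECT COUNT(*) FROM t').fetchone()[0]
nulls = cursor.execute('SELECT COUNT(*) FROM t WHERE y IS NULL').fetchone()[0]
cnxn.close()
```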