python - Want to plot Pandas Dataframe as Multiple Histograms with log10 scale x-axis -

June 15, 2015

I have floating point data in a pandas DataFrame. Each column represents a variable (the columns have string names), and each row is a set of values (the rows have integer names, which are not important).

    >>> print data
    0      kppawr23    kppaspyd
    1      3.312387   13.266040
    2      2.775202    0.100000
    3    100.000000  100.000000
    4    100.000000   39.437420
    5     17.017150   33.019040
    ...

I want to plot a histogram of each column. The best result I have achieved is with the hist method of the DataFrame:

data.hist(bins=20)

But I want the x-axis of each histogram to be on a log10 scale, and the bins on a log10 scale too, which is easy enough with bins=np.logspace(-2,2,20).
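To see what such bins look like, here is a minimal sketch, assuming only NumPy. np.logspace(-2,2,20) gives 20 edges (so 19 bins) between 10**-2 and 10**2, and each edge is a constant multiple of the previous one. The name ratios is introduced only for illustration.

```python
import numpy as np

# 20 bin edges spaced evenly in log10, from 10**-2 to 10**2
bins = np.logspace(-2, 2, 20)

# On a log scale the edges are equally spaced, so consecutive edges
# differ by a constant factor rather than by a constant amount.
ratios = bins[1:] / bins[:-1]

print(bins[0], bins[-1])
print(np.allclose(ratios, ratios[0]))
```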

A workaround might be to log10-transform the data before plotting, but the approaches I have tried,

data.apply(math.log10)

and

data.apply(lambda x: math.log10(x))

give me a floating point error:

    "cannot convert the series to {0}".format(str(converter)))
    TypeError: ("cannot convert the series to <type 'float'>", u'occurred at index kppawr23')

You can use

ax.set_xscale('log')

data.hist() returns an array of axes. You'll need to call ax.set_xscale('log') on each axes, ax, to make each of them logarithmically scaled.

For example,

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(2015)

# Random sample data spanning roughly 10**-2 to 10**2
n = 100
arr = np.random.random((n,2)) * np.logspace(-2,2,n)[:, np.newaxis]
data = pd.DataFrame(arr, columns=['kppawr23', 'kppaspyd'])

# Log-spaced bins, then a log-scaled x-axis on every subplot
bins = np.logspace(-2,2,20)
axs = data.hist(bins=bins)
for ax in axs.ravel():
    ax.set_xscale('log')

plt.gcf().tight_layout()
plt.show()

This yields one histogram per column, each with a logarithmically scaled x-axis.

By the way, to take the log of every value in the DataFrame, data, you can use

logdata = np.log10(data)

This works because NumPy ufuncs (such as np.log10) can be applied to pandas DataFrames: they operate elementwise on all the values in the DataFrame.
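As a small illustration, the sketch below applies np.log10 to a tiny DataFrame. The three rows are an assumption, copied from the first rows of the printout in the question so that the block runs on its own.

```python
import numpy as np
import pandas as pd

# Assumed sample values, taken from the first rows shown in the question
data = pd.DataFrame({'kppawr23': [3.312387, 2.775202, 100.0],
                     'kppaspyd': [13.266040, 0.1, 100.0]})

# The ufunc is applied to every value at once
logdata = np.log10(data)

# The result is still a DataFrame with the same index and columns
print(type(logdata).__name__)
print(logdata)
```

Since the result keeps the column names, logdata.hist(bins=20) would be one way to plot the transformed values directly, although the x-axis would then show exponents rather than the original values.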

data.apply(math.log10) did not work because apply tries to pass an entire column (a Series) of values to math.log10, and math.log10 expects a scalar value only.
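The following sketch makes this visible. It records the type of object that apply hands to the function, then catches the resulting error. The sample values and the names seen and error are assumptions added for illustration.

```python
import math
import pandas as pd

# Assumed sample values, taken from the first rows shown in the question
data = pd.DataFrame({'kppawr23': [3.312387, 2.775202, 100.0],
                     'kppaspyd': [13.266040, 0.1, 100.0]})

# apply hands each column to the function as a whole Series
seen = []
data.apply(lambda col: seen.append(type(col).__name__))
print(seen)

# math.log10 needs a single number, so a multi-row Series is rejected
try:
    data.apply(math.log10)
    error = None
except TypeError as exc:
    error = exc
print(error)
```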

data.apply(lambda x: math.log10(x)) fails for the same reason data.apply(math.log10) does. Moreover, if data.apply(func) and data.apply(lambda x: func(x)) were both viable options, the first should be preferred, since the lambda function would make the call a tad slower.

You could use data.apply(np.log10), again since the NumPy ufunc np.log10 can be applied to a Series, but there is no reason to bother doing this when np.log10(data) works.

You could also use data.applymap(math.log10), since applymap calls math.log10 on each value in data one at a time. But that would be far slower than calling the equivalent NumPy function, np.log10, on the entire DataFrame. Still, it is worth knowing about applymap in case you need to call some custom function which is not a ufunc.
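A rough sketch comparing the three working approaches, followed by a custom scalar function. The sample values and the function clipped_log10 are hypothetical. One assumption to flag: recent pandas releases rename DataFrame.applymap to DataFrame.map, so the sketch picks whichever method the installed version provides.

```python
import math
import numpy as np
import pandas as pd

# Assumed sample values, taken from the first rows shown in the question
data = pd.DataFrame({'kppawr23': [3.312387, 2.775202, 100.0],
                     'kppaspyd': [13.266040, 0.1, 100.0]})

# Assumption: use DataFrame.map where it exists, otherwise fall back to applymap
elementwise = getattr(data, 'map', None) or data.applymap

via_ufunc = np.log10(data)             # whole DataFrame at once
via_apply = data.apply(np.log10)       # one column (Series) at a time
via_scalar = elementwise(math.log10)   # one value at a time

def clipped_log10(x):
    """Hypothetical custom scalar function: log10 with inputs floored at 0.01."""
    return math.log10(max(x, 0.01))

custom = elementwise(clipped_log10)
print(custom)
```

All three log10 results should agree; the difference is only in how many Python-level function calls are made along the way.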
