python - Want to plot Pandas DataFrame as multiple histograms with log10 scale x-axis
I have floating point data in a pandas DataFrame. Each column represents a variable (they have string names), and each row a set of values (the rows have integer names, which are not important).

    >>> print data
    0    kppawr23    kppaspyd
    1    3.312387   13.266040
    2    2.775202    0.100000
    3  100.000000  100.000000
    4  100.000000   39.437420
    5   17.017150   33.019040
    ...

I want to plot a histogram of each column. The best result I have achieved is with the hist method of the DataFrame:

    data.hist(bins=20)

but I want the x-axis of each histogram to be on a log10 scale, and the bins on a log10 scale too, which is easy enough with bins=np.logspace(-2,2,20).
A workaround might be to log10-transform the data before plotting, but the approaches I have tried,

    data.apply(math.log10)

and

    data.apply(lambda x: math.log10(x))

give me a floating point error:

        "cannot convert the series to {0}".format(str(converter)))
    TypeError: ("cannot convert the series to <type 'float'>", u'occurred at index kppawr23')
You could use

    ax.set_xscale('log')

data.hist() returns an array of axes. You'll need to call ax.set_xscale('log') on each axes, ax, to make each of them logarithmically scaled.
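For example,

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    np.random.seed(2015)

    n = 100
    arr = np.random.random((n, 2)) * np.logspace(-2, 2, n)[:, np.newaxis]
    data = pd.DataFrame(arr, columns=['kppawr23', 'kppaspyd'])

    # Log-spaced bin edges from 0.01 to 100
    bins = np.logspace(-2, 2, 20)
    axs = data.hist(bins=bins)
    # data.hist returns an array of axes; set the log scale on each one
    for ax in axs.ravel():
        ax.set_xscale('log')

    plt.gcf().tight_layout()
    plt.show()

yields one histogram per column, each with a logarithmic x-axis.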
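As a side check on the log-spaced bins, here is a minimal sketch that needs no plotting backend. It rebuilds the same random sample data as the example and counts values per bin with np.histogram. The names log_steps and counts are illustrative only and do not come from the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(2015)
n = 100
arr = np.random.random((n, 2)) * np.logspace(-2, 2, n)[:, np.newaxis]
data = pd.DataFrame(arr, columns=['kppawr23', 'kppaspyd'])

# 20 edges spanning 10**-2 to 10**2 define 19 bins.
bins = np.logspace(-2, 2, 20)

# The edges are evenly spaced in log10, so the bars have equal width on a log axis.
log_steps = np.diff(np.log10(bins))

# Count the values of each column per bin; values outside [0.01, 100] are not counted.
counts = {col: np.histogram(data[col], bins=bins)[0] for col in data.columns}
for col, c in counts.items():
    print(col, len(c), c.sum())
```

Note that np.logspace(-2, 2, 20) gives 20 bin edges, so the histograms have 19 bins rather than 20. Passing 21 edges would give 20 bins if that matters.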

By the way, to take the log of every value in the DataFrame, data, you could use

    logdata = np.log10(data)

because NumPy ufuncs (such as np.log10) can be applied to pandas DataFrames: they operate elementwise on all the values in the DataFrame.
data.apply(math.log10) did not work because apply tries to pass an entire column (a Series) of values to math.log10. math.log10 expects a scalar value only.

data.apply(lambda x: math.log10(x)) fails for the same reason that data.apply(math.log10) does. Moreover, if data.apply(func) and data.apply(lambda x: func(x)) were both viable options, the first should be preferred, since the lambda function would only make the call a tad slower.
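The failure described above can be reproduced with a small sketch. It uses the first three rows from the question as sample data. The helper fails_with_type_error is a hypothetical name introduced only for this illustration, and the exact error text may differ between pandas versions.

```python
import math
import pandas as pd

# First three rows from the question, used as sample data.
data = pd.DataFrame({'kppawr23': [3.312387, 2.775202, 100.0],
                     'kppaspyd': [13.266040, 0.1, 100.0]})

def fails_with_type_error(func):
    """Return True if data.apply(func) raises TypeError."""
    try:
        data.apply(func)
    except TypeError as err:
        print(type(err).__name__, err)
        return True
    return False

# apply hands each column (a Series) to func, and math.log10 cannot
# convert a multi-element Series to a single float.
direct_fails = fails_with_type_error(math.log10)
lambda_fails = fails_with_type_error(lambda x: math.log10(x))

# Applied to one scalar value, math.log10 works as expected.
scalar_result = math.log10(data['kppawr23'].iloc[2])
```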
You could use data.apply(np.log10), but again, since the NumPy ufunc np.log10 can be applied to a Series, there is no reason to bother doing this when np.log10(data) works.

You could also use data.applymap(math.log10), since applymap calls math.log10 on each value in data one at a time. But this would be far slower than calling the equivalent NumPy function, np.log10, on the entire DataFrame. Still, it is worth knowing about applymap in case you need to call some custom function which is not a ufunc.
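The three working approaches should give the same numbers, which the following sketch checks on the first three rows from the question. One assumption to flag: as far as I know, newer pandas releases (2.1 and later) deprecate DataFrame.applymap in favor of DataFrame.map, so the sketch picks whichever is available. The variable names are illustrative.

```python
import math
import numpy as np
import pandas as pd

# First three rows from the question, used as sample data.
data = pd.DataFrame({'kppawr23': [3.312387, 2.775202, 100.0],
                     'kppaspyd': [13.266040, 0.1, 100.0]})

# 1. Apply the ufunc to the whole DataFrame (the recommended way).
log_whole = np.log10(data)

# 2. Apply the ufunc column by column.
log_by_column = data.apply(np.log10)

# 3. Call math.log10 on each value one at a time.
#    Assumption: use DataFrame.map where it exists, else fall back to applymap.
elementwise = data.map if hasattr(pd.DataFrame, "map") else data.applymap
log_by_value = elementwise(math.log10)

same = np.allclose(log_whole, log_by_column) and np.allclose(log_whole, log_by_value)
print(same)
```

All three results are DataFrames with the original column names and index, so any of them can be passed to hist in place of data.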