python - scikit-learn: query data dimension must match training data dimension
I'm trying to use the code from the scikit-learn site:
http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
I'm using my own data. The problem is that I have a lot more than 2 features. If I try to "expand" the features from 2 to 3 or 4, I get:

"query data dimension must match training data dimension"
Here is my code (imports and the missing `liste`/`a` assignments restored, identifier casing fixed):

    import csv
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap
    from sklearn.cross_validation import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.lda import LDA
    from sklearn.qda import QDA

    def machine():
        # Read tab-separated data, skipping the header row and any row
        # with empty fields.
        liste = []
        with open("test.txt", 'r') as csvr:
            reader = csv.reader(csvr, delimiter='\t')
            for i, row in enumerate(reader):
                if i == 0:
                    pass
                elif '' in row[2:]:
                    pass
                else:
                    liste.append(map(float, row[2:]))
        a = np.array(liste)

        h = .02  # step size in the mesh
        names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Decision Tree",
                 "Random Forest", "AdaBoost", "Naive Bayes", "LDA", "QDA"]
        classifiers = [
            KNeighborsClassifier(1),
            SVC(kernel="linear", C=0.025),
            SVC(gamma=2, C=1),
            DecisionTreeClassifier(max_depth=5),
            RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
            AdaBoostClassifier(),
            GaussianNB(),
            LDA(),
            QDA()]

        X = a[:, :3]          # three features instead of two
        y = np.ravel(a[:, 13])
        linearly_separable = (X, y)
        datasets = [linearly_separable]

        figure = plt.figure(figsize=(27, 9))
        i = 1
        for ds in datasets:
            X, y = ds
            X = StandardScaler().fit_transform(X)
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)
            x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
            y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
            xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                                 np.arange(y_min, y_max, h))

            cm = plt.cm.RdBu
            cm_bright = ListedColormap(['#FF0000', '#0000FF'])
            ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
            ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
            ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                       alpha=0.6)
            ax.set_xlim(xx.min(), xx.max())
            ax.set_ylim(yy.min(), yy.max())
            ax.set_xticks(())
            ax.set_yticks(())
            i += 1

            for name, clf in zip(names, classifiers):
                ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
                print clf.fit(X_train, y_train)
                score = clf.score(X_test, y_test)
                print y.shape, X.shape
                if hasattr(clf, "decision_function"):
                    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
                    print Z
                else:
                    Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
                Z = Z.reshape(xx.shape)
                ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)
                ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
                ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                           alpha=0.6)
                ax.set_xlim(xx.min(), xx.max())
                ax.set_ylim(yy.min(), yy.max())
                ax.set_xticks(())
                ax.set_yticks(())
                ax.set_title(name)
                ax.text(xx.max() - .3, yy.min() + .3,
                        ('%.2f' % score).lstrip('0'),
                        size=15, horizontalalignment='right')
                i += 1

        figure.subplots_adjust(left=.02, right=.98)
        plt.show()
In this case I use 3 features. What am I doing wrong in the code, with the X_train and X_test data? With 2 features everything is OK.
My X value:
    (array([[ 1.,  1.,  0.],
            [ 1.,  0.,  0.],
            [ 1.,  0.,  0.],
            [ 1.,  0.,  0.],
            [ 1.,  1.,  0.],
            [ 1.,  0.,  0.],
            [ 1.,  0.,  0.],
            [ 3.,  3.,  0.],
            [ 1.,  1.,  0.],
            [ 1.,  1.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 4.,  4.,  2.],
            [ 0.,  0.,  0.],
            [ 6.,  3.,  0.],
            [ 5.,  3.,  2.],
            [ 2.,  2.,  0.],
            [ 4.,  4.,  2.],
            [ 2.,  1.,  0.],
            [ 2.,  2.,  0.]]),
     array([ 1.,  1.,  1.,  1.,  0.,  1.,  1.,  0.,  1.,  1.,  0.,  1.,
             1.,  1.,  1.,  1.,  0.,  1.,  1.,  0.,  1.,  0.,  1.,  1.]))
The first array is the X array, and the second array is the y (target) array.
I'm sorry for the bad formatting. Here is the error:
    Traceback (most recent call last):
      File "allm.py", line 144, in <module>
        mainplot(nameplot, 1, 2)
      File "allm.py", line 117, in mainplot
        Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
      File "/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/classification.py", line 191, in predict_proba
        neigh_dist, neigh_ind = self.kneighbors(X)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/base.py", line 332, in kneighbors
        return_distance=return_distance)
      File "binary_tree.pxi", line 1298, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn/neighbors/kd_tree.c:10433)
    ValueError: query data dimension must match training data dimension
And here is the X array before it is put into the dataset "ds":
    [[ 1.  1.  0.]
     [ 1.  0.  0.]
     [ 1.  0.  0.]
     [ 1.  0.  0.]
     [ 1.  1.  0.]
     [ 1.  0.  0.]
     [ 1.  0.  0.]
     [ 3.  3.  0.]
     [ 1.  1.  0.]
     [ 1.  1.  0.]
     [ 0.  0.  0.]
     [ 0.  0.  0.]
     [ 0.  0.  0.]
     [ 0.  0.  0.]
     [ 0.  0.  0.]
     [ 0.  0.  0.]
     [ 4.  4.  2.]
     [ 0.  0.  0.]
     [ 6.  3.  0.]
     [ 5.  3.  2.]
     [ 2.  2.  0.]
     [ 4.  4.  2.]
     [ 2.  1.  0.]
     [ 2.  2.  0.]]
This is happening because clf.predict_proba() requires an array in which each row has the same number of elements as the rows of the training data -- in other words, an input with shape (num_rows, 3).
When you were working with two-dimensional exemplars this worked, because the result of np.c_[xx.ravel(), yy.ravel()] is an array of two-element rows:

    print np.c_[xx.ravel(), yy.ravel()].shape
    (45738, 2)
These exemplars have 2 elements because they're created by np.meshgrid, which the sample code uses to build a set of inputs covering the two-dimensional space that is being plotted. Pass an array of three-element rows to clf.predict_proba, and things should work fine.
If you want to reproduce this specific piece of the sample code, you'll have to create a 3D meshgrid, as described in this question on SO. You'll also have to plot your results in 3D, for which mplot3d would serve as a starting point, though based on the (admittedly brief) look I gave the plotting in the sample code, I suspect that may be more trouble than it's worth. I'm not sure what the 3D analog of those plots would even look like.
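A sketch of that 3D meshgrid idea (the per-axis ranges here are hypothetical; in the asker's code they would come from the min/max of X[:, 0], X[:, 1], and X[:, 2]): np.meshgrid accepts three coordinate vectors, and flattening the result gives the (num_points, 3) rows a 3-feature classifier expects.

```python
import numpy as np

# Coarser step than the 2D example's h = .02, since a 3D grid grows cubically.
h = 0.5
xx, yy, zz = np.meshgrid(np.arange(-1, 1, h),
                         np.arange(-1, 1, h),
                         np.arange(-1, 1, h))

# Flatten into three-element rows, matching the training data dimension,
# so this array could be passed straight to clf.predict_proba.
grid = np.c_[xx.ravel(), yy.ravel(), zz.ravel()]
print(grid.shape)  # (64, 3): 4 points per axis, 4**3 grid points
```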