Weka: Text Classification on an ARFF file
This is a basic question. I am trying to classify text files into 20 different classes.
I therefore have a project structure with folders called train and test. In the train folder I have 20 different folders, and each folder in turn has many files related to a particular class, e.g. weather, atheism, etc.
I have created a train.arff file from the entire train folder. When the data is visualized in Weka, I can see 2 attributes. I have provided the link below:
My doubt is: how can I view the various files under these folders, and how do I remove stopwords and punctuation and apply stemming? How do I go about preprocessing? If any links or resources are available, please suggest them and provide the necessary links.
I found the videos below quite helpful when I first got hands-on with text classification using Weka. You might want to take a look.
- Weka Tutorial 31: Document Classification 1 (Application)
- Weka Tutorial 32: Document Classification 2 (Application)
- Weka Text Classification for First Time & Beginner Users
You might want to use the StringToWordVector filter to see the effect of each word as an attribute; it is described in detail in the first and the last video. Within the filter settings you can give a stopwords list and choose in each run whether to use it or not. The same goes for stemming, which you can change as well. With the documentation and the videos you should understand it easily.
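To get a feel for what this kind of preprocessing does to a document, here is a minimal Python sketch of the general idea: lowercase and tokenize the text (which also drops punctuation), optionally remove stopwords, optionally stem, then turn each document into word counts over a shared vocabulary. This is not Weka's code or API. The `STOPWORDS` set, `toy_stem`, `to_word_counts` and `vectorize` are all hypothetical names made up for illustration, and the stemmer is a crude suffix stripper rather than one of the stemmers Weka offers.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list (an assumption for this sketch only).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}


def toy_stem(word):
    """Crude suffix stripper for illustration; not a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least 3 characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_word_counts(text, use_stopwords=True, use_stemmer=True):
    """Lowercase, keep alphabetic tokens only, then filter and stem on request."""
    tokens = re.findall(r"[a-z]+", text.lower())  # punctuation is dropped here
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemmer:
        tokens = [toy_stem(t) for t in tokens]
    return Counter(tokens)


def vectorize(docs):
    """Turn (text, label) pairs into a sorted vocabulary and one count row per document."""
    counts = [to_word_counts(text) for text, _label in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[w] for w in vocab] for c in counts]
    return vocab, rows


if __name__ == "__main__":
    docs = [
        ("The storms are raining all day.", "weather"),
        ("Arguments and beliefs of atheism.", "atheism"),
    ]
    vocab, rows = vectorize(docs)
    print(vocab)
    for (_, label), row in zip(docs, rows):
        print(label, row)
```

Switching `use_stopwords` or `use_stemmer` off and comparing the resulting vocabulary is a quick way to see how much each step changes the attributes, which is the same experiment you can run in the filter settings between runs.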