Using MATLAB to randomly split an Excel sheet
I have an Excel sheet containing 1838 records and I need to randomly split these records into 3 Excel sheets. I am trying to use MATLAB, but I am quite new to it and have only managed the following code:
[xlsn, xlst, raw] = xlsread('data.xls');
numrows = 1838;
randindex = ceil(3*rand(numrows, 1));
raw1 = raw(:,randindex==1);
raw2 = raw(:,randindex==2);
raw3 = raw(:,randindex==3);
Your general procedure will be to read the spreadsheet into some MATLAB variables, operate on those matrices so that you end up with three thirds, and write each third back out.
So you've got the read covered with xlsread, which results in the two matrices xlsn and xlst. I would suggest using the syntax
[~, ~, raw] = xlsread('data.xls');
In the xlsread help file (you can access it by typing doc xlsread into the command window) it says that the three output arguments hold the numeric cells, the text cells and the whole lot. This is because a MATLAB matrix can only hold one type of value, and a spreadsheet will be expected to have text or numbers. The raw value will hold all of the values, but in a 'cell array' instead, which is a different kind of MATLAB data type.
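To get a feel for what a cell array is before touching the real file, here is a minimal sketch. The contents of raw below are made up for illustration (a hypothetical header row plus two records); they are not taken from the asker's spreadsheet.

```matlab
% Hypothetical stand-in for the 'raw' output of xlsread:
% one header row plus two records, mixing text and numbers.
raw = {'name', 'score'; 'alice', 12; 'bob', 7};

class(raw)     % 'cell' - a cell array, not a numeric matrix
size(raw)      % [3 2] - three rows, two columns
raw{2, 1}      % curly braces return the contents of one cell: 'alice'
raw(2, :)      % parentheses return a 1x2 cell array, i.e. one whole record
```

The point to take away is that parentheses keep things as cells, which is what you want when you later pass a block of records to xlswrite.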
So you will have a cell array called raw. From here you want to do four things:
- work out how many rows you have (I assume each record is a row) using the size function and specifying the appropriate dimension (again, check the help file to see how to do this)
- create an index of random numbers between 1 and 3 inclusive, which you can use as a mask
randindex = ceil(3*rand(numrows, 1));
- apply the mask to your cell array to extract the records matching each index
raw1 = raw(randindex==1, :); % same for the other 2 index values
- write each cell array to its own file
xlswrite('output1.xls', raw1);
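Put together, the steps above might look something like the sketch below. It assumes that data.xls sits in the current folder, that every row of the sheet is one record (no header row), and that the output names output1.xls to output3.xls are acceptable; none of that is confirmed by the question, so adjust as needed.

```matlab
% Sketch only: assumes 'data.xls' exists in the current folder and that
% every row of the sheet is one record (no header row).
[~, ~, raw] = xlsread('data.xls');

numrows = size(raw, 1);                 % first dimension = number of rows
randindex = ceil(3*rand(numrows, 1));   % random group label 1, 2 or 3 per row

for k = 1:3
    part = raw(randindex == k, :);          % rows labelled k, all columns
    outname = sprintf('output%d.xls', k);   % output1.xls, output2.xls, output3.xls
    xlswrite(outname, part);
end
```

Note that the mask goes in the first (row) position of the index, since each record is a row.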
You will probably have to fettle the arguments to get it to work the way you want, so be sure to check the doc functionname page to get the syntax right. Your main concern will be getting the indexing correct - MATLAB indexes row-first whereas spreadsheets tend to be column-first (e.g. cell A2 is column A and row 2, but MATLAB matrix element M(1,2) is the first row and second column of matrix M, i.e. cell B1).
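A small illustration of the row-first rule, using a made-up 3x2 cell array and a made-up mask rather than the real data:

```matlab
% Hypothetical 3x2 cell array standing in for 'raw' (3 records, 2 columns).
raw = {'a', 1; 'b', 2; 'c', 3};
mask = logical([1; 0; 1]);    % keep records 1 and 3

rowsel = raw(mask, :);        % mask in the FIRST position selects rows
size(rowsel)                  % [2 2] - two records, both columns

raw{1, 2}                     % first row, second column (spreadsheet cell B1): 1
```

Putting a mask with one entry per record in the second position instead, as in raw(:, mask), asks for columns; with more records than columns I would expect that to fail with an out-of-range index error rather than return records.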
Update: splitting the file evenly is surprisingly more trouble: because we're using random numbers for the index, it's not guaranteed to split evenly. Instead we can generate a vector of random floats, then pick out the lowest 33% of them to make index 1, the highest 33% to make index 3, and let the rest be 2.
randvec = rand(numrows, 1); % random floats between 0 and 1
pct33 = prctile(randvec, 100/3); % value of the 33rd percentile
pct67 = prctile(randvec, 200/3); % value of the 67th percentile
randindex = ones(numrows, 1);
randindex(randvec>pct33) = 2;
randindex(randvec>pct67) = 3;
It still won't be absolutely even - 1838 isn't a multiple of 3. You can see how many members each group has this way:
numel(find(randindex==1))
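If you want group sizes that differ by at most one, a different technique from the percentile approach is to shuffle the row numbers with randperm and deal the labels out in turn. This is only a sketch of that alternative; it also avoids prctile, which as far as I know comes with the Statistics Toolbox rather than base MATLAB. The value of numrows is the record count given in the question.

```matlab
% Alternative sketch (not the percentile method): shuffle the row numbers,
% then deal group labels 1,2,3,1,2,3,... onto the shuffled rows.
numrows = 1838;                               % record count from the question
order = randperm(numrows);                    % random permutation of 1..numrows
randindex = zeros(numrows, 1);
randindex(order) = mod(0:numrows-1, 3) + 1;   % labels cycle through 1, 2, 3

counts = [sum(randindex == 1), sum(randindex == 2), sum(randindex == 3)]
% 1838 = 3*612 + 2, so the counts are 613, 613 and 612
```

The resulting randindex can be used as a mask in exactly the same way as before, e.g. raw(randindex==1, :).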