Using MATLAB to randomly split an Excel sheet
I have an Excel sheet containing 1838 records, and I need to randomly split these records into 3 Excel sheets. I am trying to use MATLAB, but I am quite new to it, and so far I have managed the following code:
[xlsn, xlst, raw] = xlsread('data.xls');
numrows = 1838;
randindex = ceil(3*rand(numrows, 1));   % random group 1..3 for each row
raw1 = raw(randindex==1, :);            % select rows (records), keep all columns
raw2 = raw(randindex==2, :);
raw3 = raw(randindex==3, :);
Your general procedure will be to read the spreadsheet into some MATLAB variables, operate on those matrices so that you end up with 3 thirds, and then write each third back out.
So you've got the read covered with xlsread, which results in the 2 matrices xlsnum and xlstxt. I would suggest using this syntax instead:
[~, ~, raw] = xlsread('data.xls');
The xlsread help file (which you can access by typing doc xlsread into the command window) says that the 3 output arguments hold the numeric cells, the text cells and the whole lot. This is because a MATLAB matrix can hold only 1 type of value, and a spreadsheet will be expected to have both text and numbers. The raw value holds all of the values in a 'cell array' instead, which is a different kind of MATLAB data type.
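To make the difference concrete, here is a small sketch using a made-up stand-in for raw. The contents are invented for illustration and are not read from data.xls; the point is only that each cell keeps its own type.

```matlab
% Hypothetical stand-in for the third output of xlsread (values are invented).
raw = {'name',  'score';
       'alice', 3.5;
       'bob',   4.0};

% Each cell keeps its own type, which a plain numeric matrix cannot do.
isText   = cellfun(@ischar, raw);      % true where the cell holds text
isNumber = cellfun(@isnumeric, raw);   % true where the cell holds a number

disp(isText);
disp(isNumber);
```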
So you will have a cell array called raw. From here you want to do the following:
- Work out how many rows you have (I assume each record is a row) using the size function and specifying the appropriate dimension (again, check the help file to see how to do this).
- Create an index of random numbers between 1 and 3 inclusive, which you can use as a mask:
randindex = ceil(3*rand(numrows, 1));
- Apply the mask to the rows of your cell array to extract the records matching each index:
raw1 = raw(randindex==1, :); % do the same for the other 2 index values
- Write each cell array to a file:
xlswrite('output1.xls', raw1);
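Put together, the steps above might look like the sketch below. It builds a synthetic two-column cell array in place of the real spreadsheet (an assumption, so that it runs without data.xls), and the file names output2.xls and output3.xls are hypothetical. The read and write calls are left commented out for the same reason.

```matlab
% Stand-in for: [~, ~, raw] = xlsread('data.xls');
% Synthetic 1838-by-2 cell array: a record number and a text field (invented data).
nRecords = 1838;
raw = [num2cell((1:nRecords)'), repmat({'text'}, nRecords, 1)];

% Step 1: count the rows (dimension 1), assuming one record per row.
numrows = size(raw, 1);

% Step 2: one random group label (1, 2 or 3) per row.
randindex = ceil(3*rand(numrows, 1));

% Step 3: use the labels as row masks, keeping every column.
raw1 = raw(randindex==1, :);
raw2 = raw(randindex==2, :);
raw3 = raw(randindex==3, :);

% Step 4: write each part out (uncomment when working with real files).
% xlswrite('output1.xls', raw1);
% xlswrite('output2.xls', raw2);
% xlswrite('output3.xls', raw3);
```

Every row lands in exactly one of the three parts, although the parts will generally not be the same size.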
You will probably have to fettle the arguments to get them to work the way you want, so be sure to check the doc functionname page to get the syntax right. Your main concern will be getting the indexing correct: MATLAB indexes row-first, whereas spreadsheets tend to be column-first (e.g. cell A2 is column A and row 2, but the MATLAB matrix element M(1,2) is the first row and second column of matrix M, i.e. cell B1).
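A quick way to check the row-first convention is to index a tiny cell array whose cells hold their own spreadsheet addresses. The variable names here are hypothetical and chosen only for this illustration.

```matlab
% Hypothetical 2-by-2 sheet; each cell holds its own spreadsheet address.
sheet = {'A1', 'B1';
         'A2', 'B2'};

cellB1 = sheet{1, 2};        % row 1, column 2 -> spreadsheet cell B1
cellA2 = sheet{2, 1};        % row 2, column 1 -> spreadsheet cell A2
secondRecord = sheet(2, :);  % the whole second row, i.e. one record

disp(cellB1);
disp(cellA2);
```

This is why a per-record mask belongs in the first index position, with a colon in the second to keep all columns.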
Update: splitting the file evenly is surprisingly more trouble. Because we're using random numbers for the index, it's not guaranteed to split evenly. Instead, we can generate a vector of random floats, pick out the lowest 33% of them to make index 1, the highest 33% to make index 3, and let the rest be 2.
randvec = rand(numrows, 1);         % floats between 0 and 1
pct33 = prctile(randvec, 100/3);    % value of the 33rd percentile
pct67 = prctile(randvec, 200/3);    % value of the 67th percentile
randindex = ones(numrows, 1);
randindex(randvec>pct33) = 2;
randindex(randvec>pct67) = 3;
It still won't be absolutely even, because 1838 isn't a multiple of 3. You can see how many members each group has this way:
numel(find(randindex==1))
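If prctile is not available (it ships with the Statistics and Machine Learning Toolbox), a different technique gives a split that is as even as possible: shuffle the row numbers with randperm and deal the group labels out in turn. This is an alternative sketch, not the percentile method described above, and the names shuffled and labels are hypothetical.

```matlab
numrows = 1838;

% Random ordering of the row numbers 1..numrows.
shuffled = randperm(numrows);

% Labels dealt out in turn: 1,2,3,1,2,3,...
labels = mod(0:numrows-1, 3) + 1;

% The k-th row in the shuffled order receives the k-th label.
randindex = zeros(numrows, 1);
randindex(shuffled) = labels;

% Group sizes; for 1838 rows this is 613, 613 and 612.
counts = accumarray(randindex, 1);
disp(counts');
```

The resulting randindex can be used as the row mask in exactly the same way as before, e.g. raw(randindex==1, :).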