Problem:
For example, there are 'Name', 'Count' in inventory tables Sample1, Sample2, Sample3. Table Sample1, Sample2, Sample3 share some common variables of 'Names', but with different number of 'Count'. Now we want to combine all three tables into one table T_sum, in which 'Name' is a unique list and in the row of each name have the count for each sample.
Eg. from the many of the following table
Eg. from the many of the following table
Name | Count |
---|---|
Bread | 234 |
Pop | 123 |
... | ... |
to a summary table like the following
Name_all | Sample1-Count | Sample2-Count | ... |
---|---|---|---|
Bread | 234 | 236 | ... |
Pop | 123 | 456 | ... |
... | ... | ... | ... |
Solution #1: use for loop
%% find the list of Names
Name_all = unique([Sample1.Name; Sample2.Name ...]);
%% do a for loop to migrate data from individual tables to a summary table
% preallocate the memory for the summary table
Counts = zeros(size(Name_all, 1),1);
Counts = zeros(size(Name_all, 1),1);
T_sum = table(Name_all, Counts, Counts, ... );
% set table column names
% set table column names
T_sum.Properties.VariableNames(2:end) = ['Sample1-Count', 'Sample2-Count', ...]
% for loop for coping data
% for loop for coping data
for jj = 1: N_samples
for ii = 1:size(Name_all, 1)
% find the index of the Name in Sample{jj}
% find the index of the Name in Sample{jj}
tmpIndex = Sample{jj}.Name == Name_all(ii);
% if there Name_all(ii) is in the list of Sample{jj}.Name, copy Count to T_sum under the column of Samplejj-Count
% if there Name_all(ii) is in the list of Sample{jj}.Name, copy Count to T_sum under the column of Samplejj-Count
if sum(tmpIndex) ==1
T_sum(ii+1,jj) = Sample{jj}.Count(tmpIndex);
end
end
end
This code is good for small set of data (<10 a="" becomes="" br="" comes="" data="" it="" large="" of="" set="" slow.="" to="" very="" when=""> 10>
This code is good for small set of data (<10 a="" becomes="" br="" comes="" data="" it="" large="" of="" set="" slow.="" to="" very="" when=""> 10>
Solution #2: Vectorization (faster)
Same as the codes above for initialization and preallocation. We just need to replace the for-loops to the following codes:
%% use vectorized indexing to replace the for-loops
for jj = 1: N_samples
% use ismember to find the order (locations) for each name in Sample{jj} that appears in Name_all
[tmpIndex, tmploc] = ismember(Name_all, Sample{jj}.Name);
% delete zeros in tmploc
tmploc(~tmploc) = [];
% Copying counts to the Summary table
T_sum(tmp, jj+1) = Sample{jj}.Count(tmploc);
end
%% use vectorized indexing to replace the for-loops
for jj = 1: N_samples
% use ismember to find the order (locations) for each name in Sample{jj} that appears in Name_all
[tmpIndex, tmploc] = ismember(Name_all, Sample{jj}.Name);
% delete zeros in tmploc
tmploc(~tmploc) = [];
% Copying counts to the Summary table
T_sum(tmp, jj+1) = Sample{jj}.Count(tmploc);
end
Now, it is much faster.
Please leave a comment if you have any question.
No comments:
Post a Comment