Finding discrepancies between two spreadsheets in MATLAB
3 ビュー (過去 30 日間)
古いコメントを表示
Good evening! I am currently trying to work on writing a script that will allow me to find discrepancies between two spreadsheets (.xls, .csv, .xlsm).
Currently, the two spreadsheets have thousands of rows but a set number of columns. In spreadsheet 1, there is a column called "Issue Key" that corresponds to another column in spreadsheet 2 that is labeled "Cross-reference." Is there a way to import the two spreadsheets and have MATLAB re-order the tables based on this similarity?
My current code reads the XLS and CVS files into two separate variables data1 and data2. One of the issues I notice is that I don't know how to declare the first row of each variable as the variable name. I'm not exactly sure if I'm even on the right path at this point so I'm very eager to listen to suggestions. I'm very new to this coding thing, aha!
This is my current code:
clc;
clear;
close all;
[baseName, folder] = uigetfile({'*.xls'; '*.csv'}, 'Select the RTC Export:');
filename = fullfile(folder, baseName);
% Reads data in from selected file and loads them into num (only numeric),
% txt (only string) and raw (all data)
[num, txt, raw] = xlsread(filename);
data = [raw(:,1) raw(:,2) raw(:,5) raw(:,11) raw(:,12)];
tabledata = cell2table(data);
[baseName, folder] = uigetfile({'*.xls'; '*.csv'}, 'Select the JIRA Export:');
filename = fullfile(folder, baseName);
[num, txt, raw] = xlsread(filename);
data2 = [raw(:,3) raw(:,5) raw(:,2) raw(:,1)];
tabledata2 = cell2table(data2);
9 件のコメント
採用された回答
Adam Danz
2020 年 1 月 22 日
編集済み: Adam Danz
2020 年 1 月 22 日
This uses Walter's innerjoin suggestion and then sorts the combined table back into the IssueKey order. If you'd rather use the IssueID order, remove the last two lines.
% Read data in as tables
folder = 'C:\......';
baseName1 = 'testtable1.xlsx';
baseName2 = 'testtable2.xlsx';
filename1 = fullfile(folder, baseName1);
filename2 = fullfile(folder, baseName2);
T1 = readtable(filename1); % you'll get a warning that column names were modified
T2 = readtable(filename2); % you'll get a warning that column names were modified
% Look at the tables
head(T1)
head(T2)
% Change CrossReference to IssueID
T2.Properties.VariableNames(ismember(T2.Properties.VariableNames,'CrossReference')) = {'IssueID'};
% Join tables
T = innerjoin(T1,T2);
% or
% T = outerjoin(T1,T2);
% Resort new table based on original order in T1
% assumes that all rows of T1 are in T.
[~, sortIdx] = ismember(T1.IssueKey,T.IssueKey);
T = T(sortIdx,:);
Result
T =
15×6 table
IssueKey Summary IssueID Status DueDate OrderNo_
________ ___________________________ __________ __________ _____________ ________
{'B-1' } {'Bike chain broken.' } 6.5413e+05 {'Open' } {'1/22/2020'} 5492
{'B-2' } {'Bike light broken.' } 3.1622e+05 {'Open' } {'1/22/2020'} 8135
{'B-3' } {'Bike seat torn.' } 8.4162e+05 {'Open' } {'1/22/2020'} 4613
{'B-4' } {'Bike in good condition.'} 9.5322e+05 {'Closed'} {'Closed' } 1256
{'B-5' } {'Bike in good condition.'} 1.2319e+05 {'Closed'} {'Closed' } 1617
{'B-6' } {'Bike in good condition.'} 7.8551e+05 {'Closed'} {'Closed' } 6772
{'B-7' } {'Bike chain broken.' } 3.1549e+05 {'Open' } {'1/29/2020'} 4381
{'B-8' } {'Rusty bike chain.' } 9.6345e+05 {'Open' } {'1/22/2020'} 1269
{'B-9' } {'Handlebar bent.' } 4.5642e+05 {'Open' } {'1/21/2020'} 3028
{'B-10'} {'Bike seat torn.' } 2.3312e+05 {'Open' } {'1/22/2020'} 8684
{'B-11'} {'Bike chain broken.' } 1.7899e+05 {'Open' } {'1/24/2020'} 3432
{'B-12'} {'Bike in good condition.'} 5.1087e+05 {'Closed'} {'Closed' } 1251
{'B-13'} {'Bike in good condition.'} 5.4675e+05 {'Closed'} {'Closed' } 8155
{'B-14'} {'Bike chain broken.' } 1.012e+05 {'Open' } {'1/20/2020'} 9493
{'B-15'} {'Bike seat torn.' } 3.0126e+05 {'Open' } {'1/25/2020'} 5810
4 件のコメント
Adam Danz
2020 年 1 月 22 日
"For example, what if B-12 did not have the description "Bike in good condition"? Would MATLAB be able to account for entries like these?"
You could simulate that to see what would happen.
T1.Summary{12} = '';
If you use outerjoin you'll see an empty indicator for that value. If you use innerjoin you'll see that the entire row is missing.
I'm not sure I undertand your second question. If there was no way to pair the rows of each table, then each row would be considered a unique row.
その他の回答 (0 件)
参考
カテゴリ
Help Center および File Exchange で Tables についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!