MATLAB Answers


Merging table rows, keep all columns

Asked by Marc Elpel on 13 Nov 2019 at 16:41
Latest activity: commented on by Marc Elpel on 14 Nov 2019 at 1:01
I'm trying to combine data from multiple tables into one. (data files attached). Seems like a simple join(), or outerjoin(), but every path has run into issues.
Specifically what I want to do:
  1. Add rows from table 2 to table 1.
  2. Keep all rows in both tables (append rows)
  3. Where column names match, use that column
  4. Where columns are new, add column to table width
  5. Keep column names (outer join is renaming based on source table)
  6. Some table values are empty and should combine as empty values in existing and/or new columns as needed.
Tried so far:
  1. join() - fails due to some empty values
  2. join() with NaN replaced - fails due to some other key value error
  3. outerjoin() with multiple configuration options - all failed.
  4. innerjoin() - does not seem like what I want (throwing out data).
When done combining the attached tables there should be slightly more columns than in the first table, and the number of rows should be the sum of the rows in both tables.
This should be a common issue, so I assume I am missing some simple solution...?
Using Matlab 2016b
Marc

  6 Comments

Marc Elpel on 13 Nov 2019 at 17:56
Here is the code which cycles through the files:
TResults = table();  % start empty so the isempty() check works on the first file
thisPath = uigetdir(pwd);
fnames = dir(fullfile(thisPath, '*.csv'));
for n = 1:numel(fnames)
    sName = fnames(n).name;
    if ~isempty(strfind(sName, 'RESULTS'))
        disp(['Processing Result File: ', sName]);
        T = ParseThisResultFile(thisPath, sName);
        % Replace missing numeric values with 0
        T = fillmissing(T, 'constant', 0, 'DataVariables', @isnumeric);
        if ~isempty(TResults)
            TResults = join(TResults, T);
            %TResults = [TResults;thisResult];
        else
            TResults = T;
        end
    end
end
The function below has been tested to verify it is returning a table with correct columns/data as expected.
function [resultSet] = ParseThisResultFile(thisPath, sName)
    % Format name and get CSV file to local parameter
    thisFName = fullfile(thisPath, sName);
    thisFName = strrep(thisFName, '.CSV', '.csv');
    T = readtable(thisFName, 'Delimiter', ',');
    resultSet = T;
end
Adam Danz
13 Nov 2019 at 18:07
I've read-in your tables and the column names match between both tables. Points 3 and 4 in your question (thanks for the numbering - that makes this easy to discuss) mention column names that do not match. Are there supposed to be column names that do not match?
I should add that upon reading in your table, Matlab had to modify some of the column names to conform to Matlab syntax.
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table.
The original column headers are saved in the VariableDescriptions property.
Set 'PreserveVariableNames' to true to use the original column headers as table variable names.
files = {'RESULTS_SAMP1.CSV', 'RESULTS_SAMP2.CSV'}; %Full paths are always better
T1 = readtable(files{1},'Delimiter',',');
T2 = readtable(files{2},'Delimiter',',');
% Do column names match?
all(ismember(T1.Properties.VariableNames, T2.Properties.VariableNames)) % Yes
all(ismember(T2.Properties.VariableNames, T1.Properties.VariableNames)) % Yes
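When the names do differ, a set difference shows which columns are involved rather than only whether everything matches. A minimal sketch on two small made-up tables (the variable names Run, Load and Operator are hypothetical and not taken from the attached files):

```matlab
% Hypothetical stand-ins for the tables returned by readtable
T1 = table([1;2], [0.5;0.6], 'VariableNames', {'Run','Load'});
T2 = table([3;4], {'x';'y'}, 'VariableNames', {'Run','Operator'});

names1 = T1.Properties.VariableNames;
names2 = T2.Properties.VariableNames;

onlyIn1 = setdiff(names1, names2);    % columns that T2 lacks
onlyIn2 = setdiff(names2, names1);    % columns that T1 lacks
shared  = intersect(names1, names2);  % columns present in both

disp(onlyIn1); disp(onlyIn2); disp(shared);
```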
Marc Elpel on 13 Nov 2019 at 19:10
Tried fixing names first with 'PreserveVariableNames', but this did not work. "No public property PreserveVariableNames exists for class matlab.io.text.DelimitedTextImportOptions." Lesser issue compared to others.
I randomly selected two files and they were giving me merging errors, so I thought those had different columns. Some of my data DOES include differences; we can simulate that by deleting the 3rd column in the first table and the 5th column in the second table (it does not matter which we delete, just making them different). What join command will combine these tables, keeping all rows and adding columns as needed to match the data? In some cases there will be missing columns which should be stuffed with empty cells.
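For reference, one way to get "keep all rows, add columns as needed" without a join is to pad each table with the columns it lacks and then stack the two. The following is only a sketch under stated assumptions: the tables and variable names are made up, every column is assumed to be either numeric or a cell array of character vectors, and same-named columns are assumed to have the same type in both tables.

```matlab
% Hypothetical tables with partly different columns
T1 = table([1;2], [0.5;0.6], 'VariableNames', {'Run','Load'});
T2 = table([3;4], {'x';'y'}, 'VariableNames', {'Run','Operator'});

tabs = {T1, T2};
for i = 1:numel(tabs)
    other = tabs{3 - i};  % the other of the two tables
    toAdd = setdiff(other.Properties.VariableNames, ...
                    tabs{i}.Properties.VariableNames, 'stable');
    for k = 1:numel(toAdd)
        name = toAdd{k};
        if isnumeric(other.(name))
            filler = nan(height(tabs{i}), 1);          % numeric gap -> NaN
        else
            filler = repmat({''}, height(tabs{i}), 1); % text gap -> ''
        end
        tabs{i}.(name) = filler;
    end
end

% Put the columns in the same order, then append the rows
tabs{2} = tabs{2}(:, tabs{1}.Properties.VariableNames);
Tall = [tabs{1}; tabs{2}];
disp(Tall)
```

Because nothing is matched on key values, the row count is always the sum of the two inputs and the original row order is kept.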


1 Answer

Answer by Adam Danz on 13 Nov 2019 at 20:22
Edited by Adam Danz on 13 Nov 2019 at 20:23
Accepted Answer

% Read in the data
files = {'RESULTS_SAMP1.CSV', 'RESULTS_SAMP2.CSV'}; %Full paths are always better
T1 = readtable(files{1},'Delimiter',',');
T2 = readtable(files{2},'Delimiter',',');
% Simulate column-mismatch
T1 = removevars(T1,'SpecimenType'); % remove col 3
T2 = removevars(T2,'Test'); % remove col 5
% Combine the tables with an outer join (with no 'Keys' given, the variables shared by both tables act as keys)
T3 = outerjoin(T1,T2,'MergeKeys', true)

  4 Comments

Marc Elpel on 13 Nov 2019 at 21:54
And another... in the original files they use odd encoding such as shown in this series: 8.58 .868L 5.03 2.20L 5.2, or they may append an X to invalidate a value (not my schema). So some files open with columns as double, others the same column as cells... the merge breaks on these.
If there is a parameter to deal with this let me know. Otherwise, thanks for the help and you answered the original question with the code you provided.
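I'm not aware of a single import parameter that handles this, but a column that arrived as a cell array of text can be converted to numbers after import. This is a rough sketch only: the sample values are copied from the series above plus one made-up X-flagged entry, and it assumes a trailing X means "invalid" (stored as NaN) while any other trailing letters can simply be dropped.

```matlab
% Values as they might arrive when a column is imported as text
raw = {'8.58'; '.868L'; '5.03'; '2.20L'; '5.2X'};

% Flag entries ending in X (assumed to mean the value is invalid)
isInvalid = ~cellfun(@isempty, regexp(raw, 'X$', 'once'));

% Strip any trailing letters, then convert the text to double
cleaned = regexprep(raw, '[A-Za-z]+$', '');
vals = str2double(cleaned);
vals(isInvalid) = NaN;

disp(vals)
```

Applying a conversion like this to every cell-typed column that should be numeric, before merging, gives same-named columns the same type in every table.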
Adam Danz
13 Nov 2019 at 23:12
Glad I could help out.
Just so I understand, the problem you're describing isn't with the merging of tables, it's with importing the tables. Is that correct?
Have you tried importing the tables without using the PreserveVariableNames flag?
Could you attach one of the files causing problems?
Marc Elpel on 14 Nov 2019 at 1:01
The problem is I need to sterilize the data for posting, and as soon as I make any change and save the file it works. There is something hidden in the original CSV files which is corrupting the importing. Unfortunately I cannot upload these files without modification.
I think I tried the PreserveVariableNames flag which was unknown in 2016b. Not using it now.
I'm going to close the thread - thanks for your help!
