Vertically Concatenating Timetables of Different Sizes

Question

John 2023 年 3 月 14 日

0
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1928145-vertically-concatenating-timetables-of-different-sizes

コメント済み: Jan Kappen 2025 年 1 月 17 日

As the title states, I wish to do just that. More specifically, I need to combine n timetables, each with one row, but varying column counts (Dimension counts). There is a nonzero intersection between each timetable, and so there are data being shared between each. However, some may also have data that other’s do not.

The goal here is to vertically concatenate each timetable so that all the data within the intersection are aligned by column/dimension name type. If there are data that are missing and so one timetable is smaller than the other, then that dimension/data type will be populated as a 'NaN' within the smaller timetable. In the example I give below, I say there are three timetables of lengths 1x40, 1x36 and 1x30. I wish to get a resulting timetable that is then 3x40.

Method 1:

Obviously doesn’t work as the # of columns are different.

Method 2:

I tried using a solution on another post that does exactly what I want, but with tables instead of timetables. This did not give the correct result as the output is a 3x215, which I don’t understand why. It also combines the dimension labels with “All” which doesn’t make sense to me as I specify “MergeKeys” as “True” which should leave the dimension labels alone according to the documentation. It comes close though in that it does concatenate the datetimes correctly and populates empty data/units with 'NaN's. I tried specifying the “Key” variables to align to, but because some timetables do not contain these variables and the Matlab function outerjoin wasn’t built to create missing variable columns that auto populate with NaNs when it encounters this, this does not work.

Method 3:

I tried using the synchronize function as this is specifically geared towards timetables. I get the exact same resulting timetable as Method 2. The synchronize documentation and "Combine Timetables and Synchronize Their Data" documentation give examples that do the exact same thing (Just that I have more columns and different dimension names) as my example from what I can see, so I am even more confused by why this does not produce the same result that I would like.

So I think I am just not employing Methods 2,3 correctly or I may have to create another for-loop prior to concatenation that creates the new variables in the smaller timetables and populates them with NaNs using indexing of the largest timetable.

%Create 3 timetables of different sizes and random dimension names with nonzero intersections
varTypes = {'string'};
doub = 'double';
str = 'string';
szList = [40,36,30];
ind = {'A','B','C'};
TimeDim = [datetime('2023-01-01 20:14:58'),datetime('2023-02-21 22:13:04'),datetime('2023-03-11 10:12:58')];
for i = 1:length(szList)
    for j = 1:szList(i)
        varTypes{end+1} = doub;
        varTypes{end+1} = str;
    end
    ArrayList.ind{i} = varTypes;
    varTypes = {'string'};
end
clear i j
for i = 1:length(szList)
    ttSaved.ind{i} = timetable('Size',[1 2*szList(i)+1],'VariableTypes',ArrayList.ind{1,i},'RowTimes',TimeDim(i));
    ttSaved.ind{i} = renamevars(ttSaved.ind{i},'Var1','Exp_ID');
    ttSaved.ind{i}.Exp_ID = i;
end
clear i
%Label each dimenion and populate tables with data
for i = 1:length(ttSaved.ind)
    choice = randperm(szList(i));
    for j = 1:szList(i)
        ttSaved.ind{i} = renamevars(ttSaved.ind{i},sprintf('Var%d',2*j),sprintf('Data_%d',choice(j)));
        ttSaved.ind{i} = renamevars(ttSaved.ind{i},sprintf('Var%d',2*j+1),sprintf('Units_%d',choice(j)));
    end
    Names = ttSaved.ind{i}.Properties.VariableNames;
    for k = 1:szList(i)
        ttSaved.ind{i}.(Names{2*k}) = rand*100; %Add data
        ttSaved.ind{i}.(Names{2*k+1}) = char(choice(k)); %Add Units
    end
end
clear i j k
%Method 1
ttAll = [];
for i = 1:length(ttSaved.ind)
    ttAll = [ttAll ; ttSaved.ind{i}];
end
clear i
%Method 2
ttAll = ttSaved.ind{1};
AllMeas = ttSaved.ind{1}.Properties.VariableNames;
for i = 2:length(ttSaved.ind)
    ttAll = outerjoin(ttAll,ttSaved.ind{i},'Keys',AllMeas,'MergeKeys', true);
end
clear i
%Method 3
ttAll2 = ttSaved.ind{1};
for i = 2:length(ttSaved.ind)
    ttAll2 = synchronize(ttAll2,ttSaved.ind{i});
end
clear i

Note: This post is somewhat of a continuation of a previous post, but since I marked the last as answered and this one is different enough and contains the same data structures to my actual data, I needed create a new question.

3 件のコメント
1 件の古いコメントを表示1 件の古いコメントを非表示

Stephen23 2025 年 1 月 17 日

編集済み: Stephen23 2025 年 1 月 17 日

MATLAB Online で開く

@Jan Kappen: see the user's question (Method 3) as well as my comment here:

https://www.mathworks.com/matlabcentral/answers/1928145-vertically-concatenating-timetables-of-different-sizes#comment_2660210

SYNCHRONIZE renames variables/columns that occur in more than one input timetable, which means that it does not provide the output required by the OP. We can demonstrate this quite easily using the demo data from my comment:

T = array2timetable(rand(3,7), 'RowTimes',datetime(2023,1:3,1));
C = {...
    T(1,[1,3,5,7]),...
    T(2,[2,4,5,6,7]),...
    T(3,[1,2,3])};
C{:}
ans = 1x4 timetable
       Time         Var1       Var3       Var5       Var7  
    ___________    _______    _______    _______    _______

    01-Jan-2023    0.97547    0.25331    0.24463    0.04756
ans = 1x5 timetable
       Time         Var2       Var4      Var5        Var6        Var7  
    ___________    _______    ______    _______    _________    _______

    01-Feb-2023    0.67175    0.5131    0.49674    0.0037637    0.53642
ans = 1x3 timetable
       Time         Var1     Var2      Var3  
    ___________    ______    _____    _______

    01-Mar-2023    0.9864    0.774    0.51707
T = synchronize(C{:})
T = 3x12 timetable
       Time        Var1_1     Var3_1     Var5_1     Var7_1     Var2_2      Var4     Var5_2       Var6       Var7_2     Var1_3    Var2_3    Var3_3 
    ___________    _______    _______    _______    _______    _______    ______    _______    _________    _______    ______    ______    _______

    01-Jan-2023    0.97547    0.25331    0.24463    0.04756        NaN       NaN        NaN          NaN        NaN       NaN      NaN         NaN
    01-Feb-2023        NaN        NaN        NaN        NaN    0.67175    0.5131    0.49674    0.0037637    0.53642       NaN      NaN         NaN
    01-Mar-2023        NaN        NaN        NaN        NaN        NaN       NaN        NaN          NaN        NaN    0.9864    0.774     0.51707

As you can see, all duplicated variable/column names have been renamed, which is not what the OP requested or wanted. SYNCHRONIZE currently does not have an option to disable this renaming (it has no concept of keys or similar). It should be possible to merge those duplicated variables with postprocessing e.g. in a FOR loop and some PATTERN-based name matching and iteratively creating new table columns/variables (but ultimately this effort just replicates the functionality of OUTERJOIN, which my answer already uses).

Jan Kappen 2025 年 1 月 17 日

I see, didn't catch that requirement and for some reasons also not that synchronize was already mentioned in the first post - apologies.

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

Stephen23 2023 年 3 月 14 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1928145-vertically-concatenating-timetables-of-different-sizes#answer_1192410

編集済み: Stephen23 2023 年 3 月 15 日

MATLAB Online で開く

"The goal here is to vertically concatenate each timetable so that all the data within the intersection are aligned by column/dimension name type."

First lets create some fake data, here are three tables in a cell array:

T = array2table(rand(3,7));
C = {...
    T(1,[1,3,5,7]),...
    T(2,[2,4,5,6,7]),...
    T(3,[1,2,3])};
C{:}
ans = 1×4 table
     Var1       Var3        Var5       Var7  
    _______    _______    ________    _______

    0.12972    0.41597    0.090679    0.25583
ans = 1×5 table
     Var2       Var4       Var5       Var6       Var7  
    _______    _______    _______    _______    _______

    0.59592    0.97832    0.13188    0.36775    0.82012
ans = 1×3 table
     Var1       Var2       Var3  
    _______    _______    _______

    0.86471    0.33433    0.68387

Now lets concatenate them together using OUTERJOIN. The trick is to ensure that you provide a key (or keys) that uniquely identifies each row (this could probably be a time, row name, etc). Note how the THISROW variable ensures at least one common key, which is required for joining tables:

T = C{1};
T.ThisRow = 1; % uniquely identify 1st row.
T = T(:,[end,1:end-1]); % ensure THISROW is the 1st column, to ensure the correct row order.
for i = 2:numel(C)
    V = C{i};
    V.ThisRow = i; % uniquely identify ith row.
    T = outerjoin(T,V, 'MergeKeys',true);
end
display(T)
T = 3×8 table
    ThisRow     Var1       Var3        Var5       Var7       Var2       Var4       Var6  
    _______    _______    _______    ________    _______    _______    _______    _______

       1       0.12972    0.41597    0.090679    0.25583        NaN        NaN        NaN
       2           NaN        NaN     0.13188    0.82012    0.59592    0.97832    0.36775
       3       0.86471    0.68387         NaN        NaN    0.33433        NaN        NaN

OPTIONAL, if you want a particular variable order:

[~,X] = sort(T.Properties.VariableNames);
T = T(:,X)
T = 3×8 table
    ThisRow     Var1       Var2       Var3       Var4        Var5       Var6       Var7  
    _______    _______    _______    _______    _______    ________    _______    _______

       1       0.12972        NaN    0.41597        NaN    0.090679        NaN    0.25583
       2           NaN    0.59592        NaN    0.97832     0.13188    0.36775    0.82012
       3       0.86471    0.33433    0.68387        NaN         NaN        NaN        NaN

Use REMOVEVARS if you want to get rid of THISROW.

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

Stephen23 2023 年 3 月 14 日

MATLAB Online で開く

We can use the same approach using timetables, if the date/time values uniquely identify each row:

T = array2timetable(rand(3,7), 'RowTimes',datetime(2023,1:3,1));
C = {...
    T(1,[1,3,5,7]),...
    T(2,[2,4,5,6,7]),...
    T(3,[1,2,3])};
C{:}
ans = 1×4 timetable
       Time         Var1       Var3        Var5       Var7  
    ___________    _______    _______    ________    _______

    01-Jan-2023    0.26111    0.88354    0.021848    0.59192
ans = 1×5 timetable
       Time         Var2       Var4       Var5       Var6       Var7  
    ___________    _______    _______    _______    _______    _______

    01-Feb-2023    0.92165    0.46065    0.40458    0.55965    0.29382
ans = 1×3 timetable
       Time         Var1      Var2       Var3  
    ___________    ______    _______    _______

    01-Mar-2023    0.4521    0.22003    0.95754
T = C{1};
for i = 2:numel(C)
    V = C{i};
    K = intersect(T.Properties.VariableNames,V.Properties.VariableNames);
    K{end+1} = 'Time';
    T = outerjoin(T,V, 'Keys',K, 'MergeKeys',true);
end
display(T)
T = 3×7 timetable
       Time         Var1       Var3        Var5       Var7       Var2       Var4       Var6  
    ___________    _______    _______    ________    _______    _______    _______    _______

    01-Jan-2023    0.26111    0.88354    0.021848    0.59192        NaN        NaN        NaN
    01-Mar-2023     0.4521    0.95754         NaN        NaN    0.22003        NaN        NaN
    01-Feb-2023        NaN        NaN     0.40458    0.29382    0.92165    0.46065    0.55965

Optional, again just for sorting the rows/columns:

[~,X] = sort(T.Properties.VariableNames);
T = sortrows(T(:,X),'Time')
T = 3×7 timetable
       Time         Var1       Var2       Var3       Var4        Var5       Var6       Var7  
    ___________    _______    _______    _______    _______    ________    _______    _______

    01-Jan-2023    0.26111        NaN    0.88354        NaN    0.021848        NaN    0.59192
    01-Feb-2023        NaN    0.92165        NaN    0.46065     0.40458    0.55965    0.29382
    01-Mar-2023     0.4521    0.22003    0.95754        NaN         NaN        NaN        NaN

John 2023 年 3 月 15 日

編集済み: John 2023 年 3 月 15 日

MATLAB Online で開く

Interesting, so you used the intersection of the variable names to define the key variables for outerjoin but also manually adding in 'Time' since they're timetables and so that column is not considered apart of the variable names list by default. This works for my instance and is impemented like this:

ttAll = ttSaved.ind{1};
for i = 2:length(ttSaved.ind)
    Int = intersect(ttAll.Properties.VariableNames,ttSaved.ind{i}.Properties.VariableNames);
    Int{end+1} = 'Time';
    ttAll = outerjoin(ttAll,ttSaved.ind{i}, 'Keys',Int, 'MergeKeys',true);
end
clear i
disp(ttAll)
Sorted = sortrows(ttAll,'Time')

I was close with my method 2, just missing the intersection part. So why did the outerjoin function modify the variable names after the timetables were joined, while in this case, they were unmodified? Thank you for your help and the knowledge provided for both of my questions @Stephen23

Stephen23 2023 年 3 月 15 日

編集済み: Stephen23 2023 年 3 月 15 日

The main difference is actually that I uniquely specified each row. I realized later that using INTERSECT is superfluous (for tables), because by default OUTERJOIN uses all common variable names as keys. So we can simplify the code (see modified answer).

Not sure if that also works for timetables, because of the need(?) to specify the TIME.

"so why did the outerjoin function modify the variable names after the timetables were joined, while in this case, they were unmodified?"

Because OUTERJOIN keeps all variables/columns which are not keys, and makes no attempt to merge them. By specifying keys as being only the variables/column from the first table, you overrode the default behavior of using all common names as keys (or the equivalent using INTERSECT). Because you explicitly told it NOT to use any new variables/columns as keys OUTERJOIN helpfully renamed them and kept them all for you...

Jan Kappen 2025 年 1 月 17 日

MATLAB Online で開く

Based on your proposals I've created a function that also works with timetables (seems like in timetables the RowTimes are always used as keys?):

%% example data
T1 = timetable(seconds([0 1 2]'), randn(3,2), randn(3,3), VariableNames=["DistinctVar1", "VarToBeMerged"])
T1 = 3x2 timetable
    Time        DistinctVar1                 VarToBeMerged          
    _____    ___________________    ________________________________

    0 sec     1.4536    -0.61781       1.211     0.68434     0.48927
    1 sec     1.1174     0.45113    -0.70855      1.2196     -1.2196
    2 sec    -0.3217     0.91949       1.232    -0.73845     0.62089
T2 = timetable(seconds([2 3 4]'), randn(3,2), randn(3,3), VariableNames=["DistinctVar2", "VarToBeMerged"])
T2 = 3x2 timetable
    Time        DistinctVar2                 VarToBeMerged          
    _____    ___________________    ________________________________

    2 sec     2.0859    -0.18494      1.2009     -1.1552      2.2859
    3 sec    0.85863    -0.58515    -0.47341     0.15331     0.26802
    4 sec    -1.6791      1.0913     -0.8469     -2.3893     0.66512
%% merge
Tcat = table_vertcat(T1, T2)
Tcat = 6x3 timetable
    Time        DistinctVar1                 VarToBeMerged                 DistinctVar2    
    _____    ___________________    ________________________________    ___________________

    0 sec     1.4536    -0.61781       1.211     0.68434     0.48927        NaN         NaN
    1 sec     1.1174     0.45113    -0.70855      1.2196     -1.2196        NaN         NaN
    2 sec        NaN         NaN      1.2009     -1.1552      2.2859     2.0859    -0.18494
    2 sec    -0.3217     0.91949       1.232    -0.73845     0.62089        NaN         NaN
    3 sec        NaN         NaN    -0.47341     0.15331     0.26802    0.85863    -0.58515
    4 sec        NaN         NaN     -0.8469     -2.3893     0.66512    -1.6791      1.0913
%% helper
function Tcat = table_vertcat(T)
    arguments (Repeating)
        T tabular
    end
    mustContainSameClass(T) % can't be used as validation function due to "Repeating" argument
    if isscalar(T)
        Tcat = T{1};
        return
    end
    % todo: check for nested tables, empty tables etc
    if istimetable(T{1}) % due to mustContainSameClass all inputs are either tables or timetables
        handleTimetables = true;
        rowTimeVarNames = cellfun(@(T)string(T.Properties.DimensionNames{1}), T);
        assert(all(rowTimeVarNames(1) == rowTimeVarNames), "All RowTime variable names must be identical")
        T = cellfun(@timetable2table, T, UniformOutput=false);
    else
        handleTimetables = false;
    end
    Tcat = T{1};
    Tcat.UniqueIndexForAllVertcatTables = (1:height(Tcat))';
    for k = 2:length(T)
        T{k}.UniqueIndexForAllVertcatTables = (1:height(T{k}))' + Tcat.UniqueIndexForAllVertcatTables(end);
        % does not work nested tables, need to manually split/un-table them
        Tcat = outerjoin(Tcat, T{k}, MergeKeys=true);
    end
    Tcat.UniqueIndexForAllVertcatTables = [];
    if handleTimetables
        Tcat = table2timetable(Tcat, RowTimes=rowTimeVarNames(1));
    end
end
function mustContainSameClass(in)
    arguments
        in cell
    end
    assert(all(class(in{1}) == cellfun(@(in)string(class(in)), in)), "All inputs must be of same class")
end

サインインしてコメントする。

Vertically Concatenating Timetables of Different Sizes

3 件のコメント
1 件の古いコメントを表示1 件の古いコメントを非表示

採用された回答

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

その他の回答 (0 件)

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

Vertically Concatenating Timetables of Different Sizes

3 件のコメント 1 件の古いコメントを表示1 件の古いコメントを非表示

採用された回答

4 件のコメント 2 件の古いコメントを表示2 件の古いコメントを非表示

その他の回答 (0 件)

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

3 件のコメント
1 件の古いコメントを表示1 件の古いコメントを非表示

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示