MATLAB Answers

0

Combine datasets of unequal lengths by matching dates

niki diogou さんによって質問されました 2015 年 11 月 19 日
最新アクティビティ niki diogou さんによって 回答されました 2016 年 1 月 11 日
My question has 3 sub-questions: I have 10 matrices (10 different variables: V1 V2, V3…, V10) that each one has 2 or 3 columns and a few hundreds of rows (different size each one). The first column at every matrix is the date (serial date number), the second (when there) is the depth and the third is a measured value of a variable (temperature, chlorophyll, etc).
1. I want to create a new matrix that will have all variables combined based on first column (date), and the rest of the columns will be the values of the variables that match these dates. In fewer words, I need to concatenate all matrices by matching up the dates from their first column.
2. I need to do the same as in (1) but not using all the dates recorded. Instead use just the dates of a certain variable (for example 'V1') as the matching point and line up the variables according to the dates of ‘V1’.
3. Lastly, I need to combine the data from the variables that have 3 columns when the elements of the first two elements in each row are the same between the different variables. So, the desired matrix here will have the date and depth as the first 2 columns and then the rest of the columns will be the measurements of the different variables at that day and that depth. So I have to concatenate the matrices by matching the elements of the first 2 columns.
I have tried a few things for each case but nothing seemed to work as desired.
I would appreciate your input to any or all of my sub-questions.
Warm regards, Niki

  0 件のコメント

サインイン to comment.

タグ

3 件の回答

Kirby Fears
回答者: Kirby Fears
2015 年 12 月 17 日
編集済み: Kirby Fears
2015 年 12 月 17 日
 採用された回答

niki,
You can do all of this quite easily if you first put your data into tables. Check out array2table documentation to convert your data into tables. The examples are pretty good.
Then you can use the various join operations (join, innerjoin, outerjoin) to obtain the merged tables you're looking for. You can find links to these functions from the tables link above. Pay close attention to the Name,Value pairs you can input to these functions. They will determine how joins are performed.
Hope this helps.

  3 件のコメント

Chris Turnes
2015 年 12 月 18 日
This is true, and a cleaner approach than what I had suggested, but it will only work if at least one of the variables contains the entire set of serial dates that appear across all of the variables. "join" can perform one-to-one or many-to-one joining between tables. If, for example, A has a serial date number 0 and B doesn't, and B has a serial date number 2 and A doesn't, then join will error:
>> T1 = array2table(eye(2))
T1 =
Var1 Var2
____ ____
1 0
0 1
>> T2 = array2table(1+eye(2))
T2 =
Var1 Var2
____ ____
2 1
1 2
>> join(T1, T2, 'Keys', 1)
Error using table/join (line 130)
The key variable for B must contain all values in the key variable for A.
>> join(T2, T1, 'Keys', 1)
Error using table/join (line 130)
The key variable for B must contain all values in the key variable for A.
Kirby Fears
2015 年 12 月 21 日
Tables have three join methods: join, innerjoin, and outerjoin join. Everything niki is trying to do can be done with one of these methods (along with method settings).
In your example, innerjoin would keep only serial date numbers in both tables (excluding 0 and 2) and join accordingly. On the other hand, outerjoin would include both 0 and 2.
Chris Turnes
2015 年 12 月 25 日
Ah! Good point. I forgot that outerjoin and innerjoin existed. That is a much cleaner solution.

サインイン to comment.


Chris Turnes
回答者: Chris Turnes
2015 年 12 月 17 日

There's a little bit of ambiguity here, but this shouldn't be too hard. What happens when there is no corresponding date from V1 in V2? Do the variables from V2 just become NaN?
Assuming that's the case, I think you should look into the unique and ismember functions. Here's a small example of how to use them to do the tasks you want to accomplish:
rng('default')
% Some fake data -- first column is the date index
V1 = [(1:100)', randi(10, 100, 2)];
V2 = [(1:50)', randi(10, 50, 2)];
V3 = [(1:73)', randi(10, 73, 1)];
% Task 1: Join the data by all dates
% Find the set of unique serial dates
uniqueDates = unique([V1(:,1); V2(:,1); V3(:,1)]);
Vjoined = nan(length(uniqueDates), 6);
Vjoined(:, 1) = uniqueDates;
% Repeat this process for each array
[~, idx] = ismember(V1(:,1), uniqueDates);
Vjoined(idx, 2:3) = V1(idx, 2:3);
[~, idx] = ismember(V2(:,1), uniqueDates);
Vjoined(idx, 4:5) = V2(idx, 2:3);
[~, idx] = ismember(V3(:,1), uniqueDates);
Vjoined(idx, 6) = V3(idx, 2);
This gives you your method for 1. For 2., it's pretty much the same, except you don't have to compute all the unique values, you just use the first column of whichever variable you want to match:
% Task 2: Join by the date in V1
Vjoined = V1;
% Repeat this process for V2, V3
Vjoined = nan(size(V1, 1), 6);
Vjoined(:, 1:3) = V1;
[~, idx] = ismember(V2(:,1), Vjoined(:, 1));
Vjoined(idx, 4:5) = V2(idx, 2:3);
[~, idx] = ismember(V3(:,1), Vjoined(:, 1));
Vjoined(idx, 6) = V3(idx, 2);
For the last task, you can the 'rows' option of intersect to figure out the sets of first two columns shared by data with 3 columns. Then, you can use the 'rows' option of ismember (together with some careful use of the output) to join them:
% Task 3: Join V1 and V2 if the first two variables are the same
C = intersect(V1(:, 1:2), V2(:, 1:2), 'rows');
Vjoined = zeros(size(C, 1), 4);
Vjoined(:, 1:2) = C;
[~, idx] = ismember(V1(:, 1:2), C, 'rows');
Vjoined(nonzeros(idx), 3) = V1(logical(idx), 3);
[~, idx] = ismember(V2(:, 1:2), C, 'rows');
Vjoined(nonzeros(idx), 4) = V2(logical(idx), 3);

  0 件のコメント

サインイン to comment.


回答者: niki diogou 2016 年 1 月 11 日

Thank you all for your responses. I found working with tables, and the innerjoin/outerjoin commands very useful! It worked perfectly.
Thank you again all, Niki

  0 件のコメント

サインイン to comment.



Translated by