Duplicate x,y in table; get min/max of variables to new table

1 回表示 (過去 30 日間)
newbie9
newbie9 2019 年 11 月 2 日
コメント済み: newbie9 2019 年 11 月 2 日
I have a large dataset, small subset attached. There are duplicate x,y values. I need a cleaned table with "var1_max, var1_min, var2_max, var2_min" etc. for every duplicate "x,y"
I'm just not sure where to start setting up the problem. Thanks for any pointers to get going. Using Matlab R2018a, no fancy add-ons/packages
file_in = 'example.csv';
data_in = readtable(file_in);
%%table_out.Properties.VariableNames = {'id', 'x', 'y', 'var1_max', 'var1_min', ...
%% 'var2_max', 'var2_min', 'var3_max', 'var3_min', 'var4_max', 'var4_min', ...
%% 'var5_max', 'var5_min', 'var6_max', 'var6_min', 'var7_max', 'var7_min'};
  3 件のコメント
Cris LaPierre
Cris LaPierre 2019 年 11 月 2 日
Not sure I understand completely. I have the same questions a dpb. It looks like you've highlighted groups, but rows 12-14 only have duplicate x values, not x and y.
For the min/max values you want, do you want the overall min/max or just the min/max within the group (those that have the same x,y values)?
For example, rows 2 and 3 above do not have values for var1-3. What values should they have?
Rows 6-10 do have values there. What should the min/max be?
And finally, 12-14 have same x but different y. How should this be handled?
newbie9
newbie9 2019 年 11 月 2 日
編集済み: newbie9 2019 年 11 月 2 日
Please disregard highlight on rows 12-14, it is in error.
I think I should add "var1_mid" ... "var7_mid" to handle cases where x,y are unique. Example in row 1:
  • x = 31.74053 | y = 27.18408 | var1_max = NaN | var1_min = NaN | var1_mid = 90 | var2_max = NaN | var2_min = NaN | var2_mid = 54 | ...etc.
Answers to other questions below:
For the min/max values you want, do you want the overall min/max or just the min/max within the group (those that have the same x,y values)?
  • just the min/max within the group (those that have the same x,y values)
For example, rows 2 and 3 above do not have values for var1-3. What values should they have?
  • NaN
Rows 6-10 do have values there. What should the min/max be?
  • x = 31.79403 | y = 27.26837 | var1_max = 79 | var1_min = 68 | var1_mid = NaN | var2_max = 251 | var2_min = 193 | var2_mid = NaN | ...etc.

サインインしてコメントする。

採用された回答

Cris LaPierre
Cris LaPierre 2019 年 11 月 2 日
One potential solution that is simple to implement is to use groupsummary. You can have your data grouped by x and y, and have it return the within-group min, mean, and max for table variables you specify.
However, you prescribe wanting NaN for mid-value if there are min/max values. This won't do that .It will provide values for all 3 statistics. Also note that group summary will only return one row for each group. You can use the join function if you want to merge the two together.
newData = groupsummary(data_in,{'x','y'},{'min','mean','max'},["var1","var2","var3","var4","var5","var6","var7"])
  1 件のコメント
newbie9
newbie9 2019 年 11 月 2 日
This is perfect, thank you. The groupsummary returns a GroupCount, so I can just use that such that if > 1 then mean = NaN

サインインしてコメントする。

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeTables についてさらに検索

タグ

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by