Using accumarray to organize large CSV
2 ビュー (過去 30 日間)
古いコメントを表示
I have a CSV with 4 columns, containing several thousands of rows of data. I have attached a sample data sheet with a small amount of data as an example.
I'm trying to write a code to import the data in a way that will be easy to create scatter plots/visualize the data, as well as calculate some basic stats (mean, median, std. dev., etc).
I want to group the data based on TWO of the columns: SITE NUMBER and DEPTH
The other two columns, d18O and date are what I need to plot/visualize
Essentially, my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH
I started using accumarray to accomplish this but I am having issues with the arrays, they are not showing up with all of the unique rows of data
T = readtable("data1.csv", 'VariableNamingRule','preserve');
[G,ID] = findgroups(T.("Site number"));
data = accumarray(G, T{:,4},[], @(x){T{x,:}});
0 件のコメント
採用された回答
dpb
2023 年 9 月 22 日
編集済み: dpb
2023 年 9 月 22 日
tT=readtable("data1.csv");
tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use
head(tT)
G=grpstats(tT,{'Site','Depth'},{'mean','median','std'},'DataVars',{'Date','d18O'})
hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O'},'OutputFormat','uniform');
function h=doit(x,y)
figure
h=scatter(x,y,'filled');
xlabel('Date'),ylabel('d18O')
end
2 件のコメント
dpb
2023 年 9 月 22 日
NOTA BENE that you can get much more creative with the user function...
tT=readtable("data1.csv");
tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use
hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O','Site','Depth'},'OutputFormat','uniform');
function h=doit(x,y,s,d)
figure
h=scatter(x,y,'filled');
xlabel('Date'),ylabel('d18O')
legend(compose('Site %3d Depth %3d',s(1),d(1)))
end
その他の回答 (1 件)
Voss
2023 年 9 月 21 日
編集済み: Voss
2023 年 9 月 21 日
"my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH"
Something like this?
T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
gvars = ["Site number","Depth"];
vars = setdiff(T.Properties.VariableNames,gvars);
T_summary = groupsummary(T,gvars,@(x){x});
T_summary = removevars(T_summary,"GroupCount");
T_summary = renamevars(T_summary,"fun1_"+vars,vars)
T_summary.Date{1}
T_summary.d18O{1}
1 件のコメント
Voss
2023 年 9 月 22 日
Or maybe it's more convenient just to sort the table so that rows with the same Site number and Depth are together:
T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
T_sorted = sortrows(T,["Site number","Depth"]);
disp(T_sorted);
参考
カテゴリ
Help Center および File Exchange で Data Distribution Plots についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!