Using accumarray to organize large CSV

Question

0 投票

data1.csv

I have a CSV with 4 columns, containing several thousands of rows of data. I have attached a sample data sheet with a small amount of data as an example.

I'm trying to write a code to import the data in a way that will be easy to create scatter plots/visualize the data, as well as calculate some basic stats (mean, median, std. dev., etc).

I want to group the data based on TWO of the columns: SITE NUMBER and DEPTH

The other two columns, d18O and date are what I need to plot/visualize

Essentially, my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH

I started using accumarray to accomplish this but I am having issues with the arrays, they are not showing up with all of the unique rows of data

T = readtable("data1.csv", 'VariableNamingRule','preserve');
[G,ID] = findgroups(T.("Site number"));                                            
data = accumarray(G, T{:,4},[], @(x){T{x,:}});

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

サインインしてアクティビティをフォロー

Answer 1

dpb 2023 年 9 月 22 日

編集済み: dpb 2023 年 9 月 22 日

MATLAB Online で開く

0 投票

data1.csv

tT=readtable("data1.csv");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
tT.Properties.VariableNames(2)={'Site'};    % shorten to be more convenient to use
head(tT)
    d18O     Site    Date     Depth
    _____    ____    _____    _____

     -1.3    1004    44204     20  
     -1.9    1004    44263     20  
     -1.2    1004    44295     20  
     -1.7    1004    44567     20  
     -1.5    1004    44235     20  
     -9.3    1009    44323     50  
    -7.72    1009    44352     50  
    -7.07    1009    44445     50  
G=grpstats(tT,{'Site','Depth'},{'mean','median','std'},'DataVars',{'Date','d18O'})
G = 6×9 table
               Site    Depth    GroupCount    mean_Date    median_Date    std_Date    mean_d18O    median_d18O    std_d18O
               ____    _____    __________    _________    ___________    ________    _________    ___________    ________

    1004_20    1004     20           5          44313         44263        146.03        -1.52         -1.5       0.28636 
    1004_80    1004     80           6          44404         44489        339.01      -8.5533       -8.695       0.30137 
    1009_50    1009     50          11          44441         44412        158.55      -8.0855        -8.22        1.0068 
    1012_40    1012     40           3          44381         44263        235.02        -5.81        -4.38        2.6254 
    1014_20    1014     20           5          43936         43984        142.16       -6.326        -6.22       0.38546 
    1023_20    1023     20           2          44067         44067        237.59        -4.39        -4.39       0.39598 

rowfun

hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O'},'OutputFormat','uniform');

function h=doit(x,y)

figure

h=scatter(x,y,'filled');

xlabel('Date'),ylabel('d18O')

end

2 件のコメント
なしを表示なしを非表示

Emily 2023 年 9 月 22 日

this produces what I need, thank you!

dpb 2023 年 9 月 22 日

MATLAB Online で開く

data1.csv

NOTA BENE that you can get much more creative with the user function...

tT=readtable("data1.csv");

Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.

tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use

hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O','Site','Depth'},'OutputFormat','uniform');

function h=doit(x,y,s,d)

figure

h=scatter(x,y,'filled');

xlabel('Date'),ylabel('d18O')

legend(compose('Site %3d Depth %3d',s(1),d(1)))

end

サインインしてコメントする。

Answer 2

Voss 2023 年 9 月 21 日

編集済み: Voss 2023 年 9 月 21 日

MATLAB Online で開く

0 投票

data1.csv

"my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH"

Something like this?

T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
gvars = ["Site number","Depth"];
vars = setdiff(T.Properties.VariableNames,gvars);
T_summary = groupsummary(T,gvars,@(x){x});
T_summary = removevars(T_summary,"GroupCount");
T_summary = renamevars(T_summary,"fun1_"+vars,vars)
T_summary = 6×4 table
    Site number    Depth        d18O             Date     
    ___________    _____    _____________    _____________

       1004         20      { 5×1 double}    { 5×1 double}
       1004         80      { 6×1 double}    { 6×1 double}
       1009         50      {11×1 double}    {11×1 double}
       1012         40      { 3×1 double}    { 3×1 double}
       1014         20      { 5×1 double}    { 5×1 double}
       1023         20      { 2×1 double}    { 2×1 double}
T_summary.Date{1}
ans = 5×1
       44204
       44263
       44295
       44567
       44235
T_summary.d18O{1}
ans = 5×1
   -1.3000
   -1.9000
   -1.2000
   -1.7000
   -1.5000

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

Voss 2023 年 9 月 22 日

MATLAB Online で開く

data1.csv

Or maybe it's more convenient just to sort the table so that rows with the same Site number and Depth are together:

T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
T_sorted = sortrows(T,["Site number","Depth"]);
disp(T_sorted);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  

サインインしてコメントする。

Using accumarray to organize large CSV

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

採用された回答

2 件のコメント
なしを表示なしを非表示

その他の回答 (1 件)

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

カテゴリ

タグ

Community Treasure Hunt

Using accumarray to organize large CSV

0 件のコメント -2 件の古いコメントを表示 -2 件の古いコメントを非表示

採用された回答

2 件のコメント なしを表示 なしを非表示

その他の回答 (1 件)

1 件のコメント -1 件の古いコメントを表示 -1 件の古いコメントを非表示

カテゴリ

タグ

参考

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

2 件のコメント
なしを表示なしを非表示

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示