How do I select max columns values based on certain column values?

Question

0 投票

data.xls

Dear Experiences ...

i have table look like the following..

Obs-name      var1    var2   var3 ...... varn      seg_id 
ob1           0.12    0.14   0.17         1.2         1
ob2           1.2     0.2    0.14         0.0         1
ob3           1.5     0.3    1.5          7.2         2
ob4           2.4     4.5    2.2          0.0         3
.......................................etc.

i'm doing the following procedures:

1- i eliminate the first var content .. (obs name), using the following

T = readtable('data.xls');
Tnew=T(:,2:end)
2- calculate averages of all vars values based on segment_ID 
Tavg=varfun(@mean,Tnew,'GroupingVariable','Seg_ID')

- now the tavg table includes all the average values for all vars based on Seg_ID.

Now i need to select the top var values (K) for every Seg_ID (K, where k=5).. then write table includes the Seg_ID , GroupCount , top K vars names look like the following table :

   seg_id(1) Group count (3) var1 name   var4 name    var100 name       
   seg_id(2) group count (5) var2 name   var3 name    var15 name       
   seg_id(3) Group count (10)var1 name   var12 name   varn name         
    ...etc..

attached file include a portion of my data table.

i will thanks any one can give a solution for this issue .. thanks

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

ahmed obaid 2017 年 4 月 4 日

MATLAB Online で開く

mean of a particular var of values which belong to same segment ID?

ahmed obaid 2017 年 4 月 4 日

i have updates my codes and find good solution till select top (k) vars and print their names

サインインしてコメントする。

サインインしてこの質問に回答する。

Follow Question

Answer 1

Guillaume 2017 年 4 月 4 日

編集済み: Guillaume 2017 年 4 月 4 日

MATLAB Online で開く

1 投票

Not really clear on your question. If you're wanting to calculate the mean of the variables, grouped by seg_ID, then:

varfun(@mean, T, 'InputVariables', 2:width(T)-1, 'GroupingVariables', 'seg_ID')

I'm not sure what you want to do after that

edit: maybe you're looking for this function

function varargout = filterrows(rowvalues, varnames)
%function to be used with rowfun, returns the 5 highest column means of the rows together with their names, interlaced.
%requires 'SeparateInputs', false in the rowfun options.
    rowmeans = mean(rowvalues, 1);
    [~, order] = sort(rowmeans, 'descend');
    varargout = [num2cell(rowmeans(order(1:5))); num2cell(varnames(order(1:5)))];  %will be reshaped into a row vector by rowfun
end

which you can use with rowfun:

rowfun(@(rowvalues) filterrows(rowvalues, T.Properties.VariableNames(2:end-1)), ...
       T, ...
       'InputVariables', 2:width(T)-1, ...
       'GroupingVariables', 'seg_ID', ...
       'SeparateInputs', false, ...
       'NumOutputs', 10)

Not sure it's a good idea, though.

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

ahmed obaid 2017 年 4 月 4 日

MATLAB Online で開く

T.xls

may i'm not explain my problem in a proper way, any way consider the following table T:

----------------------------------------------------------

ok now let us calculate the mean of every var values (A...I) based on Seg number (1..4), as in your first line code.

Tavg=varfun(@mean, T, 'InputVariables', 2:width(T)-1, 'GroupingVariables', 'Seg')

now the mean of var values are calculated based on (Seg) number ...

my requirement as follow: - now the vars includes different values over all (Seg) number .. i need to select for every segment (Seg=1,,4) what are the top (max) k vars name (suppose K=5) , as follow :

-consider Seg=1 vars , the top K-var ( based on mean values) in descending order are ( I,B,H,C and D)
-consider Seg=2 vars the top k-vars (based on mean values ) in descending order are (I,H and B.. remaining vars are 0's values so any 2 vars names no problem) and so on for others...

final table (Tfin) includes the following output : 
seg           top (K) var names 
1               I,B,H,C,D
2               I,H,B,A,C
3..etc.

attached portion of my table in simble way

Guillaume 2017 年 4 月 4 日

MATLAB Online で開く

Isn't the output of my rowfun code (all the bit after edit) exactly what you've described?

If it's just the variable names, without the values that you want, then changing it to

function varargout = filterrows(rowvalues, varnames)
%function to be used with rowfun, returns the 5 highest column means of the rows together with their names, interlaced.
%requires 'SeparateInputs', false in the rowfun options.
    rowmeans = mean(rowvalues, 1);
    [~, order] = sort(rowmeans, 'descend');
    varargout = num2cell(varnames(order(1:5)));
end

and calling it with

rowfun(@(rowvalues) filterrows(rowvalues, T.Properties.VariableNames(2:end-1)), ...
     T, ...
     'InputVariables', 2:width(T)-1, ...
     'GroupingVariables', 'seg_ID', ...
     'SeparateInputs', false, ...
     'NumOutputs', 5)

will give you that.

ahmed obaid 2017 年 4 月 4 日

Thank you a lot .... very very smart solution .... thanks.

サインインしてコメントする。

How do I select max columns values based on certain column values?

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

採用された回答

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

その他の回答 (0 件)

カテゴリ

製品

タグ

Community Treasure Hunt

How do I select max columns values based on certain column values?

3 件のコメント 1 件の古いコメントを表示 1 件の古いコメントを非表示

採用された回答

3 件のコメント 1 件の古いコメントを表示 1 件の古いコメントを非表示

その他の回答 (0 件)

カテゴリ

製品

タグ

参考

Community Treasure Hunt

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示