Find and reduce a numeric array with identical columns

Question

0 投票

Dear Sir/Madam,

I would like to ask you the following question:

I have a data file like this

tmp = [...
          121           12         6914       0.5625
          122          -48         6853      0.29688
          119           48         6914      0.17188
          125          -12         6853     0.078125
          125            4         6853       0.4375
          119            5         6832      0.20313
          119            4         6832     0.039063
          119           -4         6832     0.023438]

I would like re-group (or reduce) it with following conditions:

For any row, if column 1 AND column 3 of this row is identical with any column 1 AND column 3 of any other row. Then reduce to one new row with new value of column 2, this new value of column 2 is the sum of original values of column 2. Column 1 is kept the same, Column 4 is not important.

So, for above data, I expect to have the answer:

          119            5         6832      0.20313   %  5+4-4=5
          122          -48         6853      0.29688
          125           -8         6853      0.4375    %  -12+4=-8
          121           12         6914      0.5625
          119           48         6914      0.17188   

What Matlab command to use? I would greatly appreciate it if you left your code and running output.

I am using MATLAB R2014a.

Thank you very much

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

Image Analyst 2018 年 12 月 30 日

I was wondering the same thing. Hopefully the order doesn't matter. I'm sure you could write the code afterwards in such a ways that it didn't matter.

John Smith 2018 年 12 月 30 日

編集済み: John Smith 2018 年 12 月 30 日

In tmp (data file ), the order of the rows were randomly inputed by hand, no order at all.

サインインしてコメントする。

サインインしてこの質問に回答する。

Follow Question

Answer 1

Stephen23 2018 年 12 月 30 日

0 投票

>> [~,X,Y] = unique(tmp(:,[1,3]),'rows');
>> out = tmp(X,:);
>> out(:,2) = accumarray(Y,tmp(:,2),[],@sum)
out =
    119.000000      5.000000   6832.000000      0.023438
    119.000000     48.000000   6914.000000      0.171880
    121.000000     12.000000   6914.000000      0.562500
    122.000000    -48.000000   6853.000000      0.296880
    125.000000     -8.000000   6853.000000      0.437500

7 件のコメント
5 件の古いコメントを表示 5 件の古いコメントを非表示

John Smith 2018 年 12 月 30 日

編集済み: John Smith 2018 年 12 月 30 日

Thank you very much Stephen,

I need further the output file "out" to be re-rewriten as classified by the value of first column and column 3 of "out":

If the values of the third column are not the same. then separated them: according to the value of third column:

out_new(:,:,1) =

119 5 6832 0.20313

0 0 0 0

out_new(:,:,2) =

122 -48 6853 0.29688

125 -8 6853 0.078125

out_new(:,:,3) =

119 48 6914 0.17188

121 12 6914 0.5625

2 then the output rewritten according to the value of first column and aligned all group according to the value of the first column.

3. The value of column 1 should be decreased order in value.

I expect the output file as the following format:

125 -8 6853 0.078125

122 -48 6853 0.29688

121 12 6914 0.5625

119 5 6832 0.20313 48 6914 0.17188

By the way, my code to obtain out_new is (may be not so good:)

length(out(:,1));

[C, ia, ic] = unique(out(:,3),'sorted');

L = length(C)

for j=1 : L

k=0;

for i=1 : length(out(:,1))

if C(j) == out(i,3)

k=k+1;

out_new(k,1,j) = out(i,1) ;

out_new(k,2,j) = out(i,2) ;

out_new(k,3,j) = out(i,3) ;

out_new(k,4,j) = out(i,4) ;

end

format shortG

out_new

Thank you very much

John Smith 2018 年 12 月 31 日

編集済み: John Smith 2019 年 1 月 1 日

Dear Sir/Madam,

Actually, I also need column 3 to be converted to date, then the requried output format is more meaningful: (column 4 is the costs of options)

out(:,3)

6832

6914

6853

datestr(out(:,3))

ans =

14-Sep-0018

05-Dec-0018

05-Oct-0018

then the requried output format should be:

125 -8 05-Oct-0018 0.078125

122 -48 05-Oct-0018 0.29688

121 12 05-Dec-0018 0.5625

119 5 14-Sep_0018 0.20313 48 05-Dec-0018 0.17188

Now you can see, the horizontal axis is time, the vertical axis (column 1) is strik-price of option.

In this way my option trading positions are:

e. g. for 119 strike price, I have two kinds of options: bought 5 options with expiration date 14-September, and bought 48 options with expirarion date 05-Dec.

for 122 strike price option, I have only one kind option: sold 48 options with expiration date 05-Oct.

That is why this kind of output format in important.

Thank you very much for your help.

John Smith 2019 年 1 月 1 日

編集済み: Image Analyst 2019 年 1 月 1 日

MATLAB Online で開く

Dear Stephen,

Thank you very much for your help, and Happy NEW YEAR 2019. Running this:

tmp = [...
	121           12         6914       0.5625
	122          -48         6853      0.29688
	119           48         6914      0.17188
	125          -12         6853     0.078125
	125            4         6853       0.4375
	119            5         6832      0.20313
	119            4         6832     0.039063
	119           -4         6832     0.023438]
[~,X,Y] = unique(tmp(:,[1,3]),'rows');
out = tmp(X,:);
out(:,2) = accumarray(Y,tmp(:,2),[],@sum)
% Convert to character:
baz = num2cell(out(:,2:4)).';
baz(2,:) = cellstr(datestr(out(:,3)));
baz = regexp(sprintf('%3d %s %g\v',baz{:}),'[^\v]+','match');
% Arrange into matrix:
[U,~,X1] = unique(out(:,1));
[V,~,X3] = unique(out(:,3));
C = accumarray([X1,X3],baz,[],@(s)s);
L = cellfun('length',C(:));
C(L==0) = {''};
C = flipud([num2cell(U),C]).'
% Print:
fmt = sprintf('%%-%ds',2+max(L));
fmt = repmat(fmt,1,numel(V));
fmt = ['%4d',fmt,'\n'];
fprintf(fmt,C{:})

I got this error:

Error using accumarray

Second input VAL must be a full numeric, logical, or char vector or scalar.

Error in main_ZN_Dec_7_2018_trade_4 (line 364)

C = accumarray([X1,X3],baz,[],@(s)s)

Stephen23 2019 年 1 月 1 日

編集済み: Stephen23 2019 年 1 月 1 日

Replace the line with accumarray with these three lines:

S = max([X1,X3]);
C = cell(S);
C(sub2ind(S,X1,X3)) = baz;

and a Happy New Year!

John Smith 2019 年 1 月 2 日

編集済み: John Smith 2019 年 1 月 3 日

Dear Stephen,

By changing those three lines, you code works.

Back to the question in the very beginning, I said " Column 4 is not important". but now, I need to treat column 4 the same way as column 2:

tmp =

121 12 6914 0.5625

122 -48 6853 0.29688

119 48 6914 0.17188

125 -12 6853 0.078125

125 4 6853 0.4375

119 5 6832 0.20313

119 4 6832 0.039063

119 -4 6832 0.023438

out =

119 5 6832 0.265631 (%=0.20313+0.039063+0.023438)

119 48 6914 0.17188

121 12 6914 0.5625

122 -48 6853 0.29688

125 -8 6853 0.515625 (%=0.078125+0.4375)

how do you change your three line code:

[~,X,Y] = unique(tmp(:,[1,3]),'rows');

out = tmp(X,:);

out(:,2) = accumarray(Y,tmp(:,2),[],@sum)

I tried to modify the following way, it did not work:

out(:,2) = accumarray(Y,tmp(:,2), tmp(:,4), [],@sum)

However, when I use two lines (two time) then it worked:

out(:,2) = accumarray(Y,tmp(:,2),[],@sum)

out(:,4) = accumarray(Y,tmp(:,4),[],@sum)

Thank you very much.

サインインしてコメントする。

Answer 2

Image Analyst 2018 年 12 月 30 日

編集済み: Image Analyst 2018 年 12 月 30 日

MATLAB Online で開く

0 投票

What about using grpstats(), if you have the Statistics and Machine Learning Toolbox.

tmp = [...
	121           12         6914       0.5625
	122          -48         6853      0.29688
	119           48         6914      0.17188
	125          -12         6853     0.078125
	125            4         6853       0.4375
	119            5         6832      0.20313
	119            4         6832     0.039063
	119           -4         6832     0.023438]
col5 = 10000*tmp(:, 1) + tmp(:, 3)
tmp = [tmp, col5];
% No sum in grpstats, so have to do it twice. 
% Once to get the mean and once to get the count.
outputMean = grpstats(tmp, tmp(:, 5), 'mean')
outputNumel = grpstats(tmp, tmp(:, 5), 'numel')
% Crop off temporary 5th column
output = outputMean(:, 1:4) % Initialize
% Column 2 is the sum = mean * count
output(:, 2) = outputMean(:, 2) .* outputNumel(:, 2)

The output seems to be sorted by the first column though:

output =

119 5 6832 0.088544

119 48 6914 0.17188

121 12 6914 0.5625

122 -48 6853 0.29688

125 -8 6853 0.25781

That might be a problem for you. I'm not sure. Of course column 4 can be cropped off or ignored since you say it's not important.

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

サインインしてコメントする。

Find and reduce a numeric array with identical columns

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

採用された回答

7 件のコメント
5 件の古いコメントを表示 5 件の古いコメントを非表示

その他の回答 (1 件)

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

カテゴリ

製品

リリース

タグ

Community Treasure Hunt

Find and reduce a numeric array with identical columns

3 件のコメント 1 件の古いコメントを表示 1 件の古いコメントを非表示

採用された回答

7 件のコメント 5 件の古いコメントを表示 5 件の古いコメントを非表示

その他の回答 (1 件)

0 件のコメント -2 件の古いコメントを表示 -2 件の古いコメントを非表示

カテゴリ

製品

リリース

タグ

参考

Community Treasure Hunt

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

7 件のコメント
5 件の古いコメントを表示 5 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示