Divide a data set into 4 parts so that the sum of each part is 1/4th of the total

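% Cumulative sum of the sorted data (data is your vector of values)
c = cumsum(sort(data, 'ascend'));
c = c / c(end); % Normalize from 0 to 1
% Index of the first element where the normalized cumulative sum exceeds each threshold
c25 = find(c>0.25, 1, 'first');
c50 = find(c>0.5, 1, 'first');
c75 = find(c>0.75, 1, 'first');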

That's one way that might work, though it works best with lots of data rather than just a few elements like you have.


Nagendra Reddy on 16 Jun 2019

Edited: Nagendra Reddy on 16 Jun 2019

I really can't tell which 4 sets your code is suggesting. Could you please tell me?

If I am not wrong, it is suggesting the following 3 sets:

1, 4, 5, 5, 10

15, 20

22

Image Analyst on 16 Jun 2019

Try this:

data = [10, 5, 1, 20, 5, 22, 4, 15]
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end); % Normalize from 0 to 1
c25 = find(c < 0.25, 1, 'last')
c50 = find(c < 0.5, 1, 'last')
c75 = find(c < 0.75, 1, 'last')
group1 = sortedc(1:c25);
group2 = sortedc(c25+1:c50);
group3 = sortedc(c50+1:c75);
group4 = sortedc(c75+1:end);
sumOfGroup1 = sum(group1)
sumOfGroup2 = sum(group2)
sumOfGroup3 = sum(group3)
sumOfGroup4 = sum(group4)
fprintf('The sum of group 1 is %d = %.5f%%\n', sumOfGroup1, 100 * sumOfGroup1 / sum(sortedc));
fprintf('The sum of group 2 is %d = %.5f%%\n', sumOfGroup2, 100 * sumOfGroup2 / sum(sortedc));
fprintf('The sum of group 3 is %d = %.5f%%\n', sumOfGroup3, 100 * sumOfGroup3 / sum(sortedc));
fprintf('The sum of group 4 is %d = %.5f%%\n', sumOfGroup4, 100 * sumOfGroup4 / sum(sortedc));

You get

group1 =

1 4 5 5

group2 =

10 15

group3 =

20

group4 =

22

The sum of group 1 is 15 = 18.29268%

The sum of group 2 is 25 = 30.48780%

The sum of group 3 is 20 = 24.39024%

The sum of group 4 is 22 = 26.82927%

But for a much larger set, the split comes out much closer to 25% each:

numElements = 100000;
maxValue = 99;
data = randi(maxValue, 1, numElements);
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end); % Normalize from 0 to 1
c25 = find(c < 0.25, 1, 'last')
c50 = find(c < 0.5, 1, 'last')
c75 = find(c < 0.75, 1, 'last')
group1 = sortedc(1:c25);
group2 = sortedc(c25+1:c50);
group3 = sortedc(c50+1:c75);
group4 = sortedc(c75+1:end);
sumOfGroup1 = sum(group1)
sumOfGroup2 = sum(group2)
sumOfGroup3 = sum(group3)
sumOfGroup4 = sum(group4)
fprintf('The sum of group 1 is %d = %.5f%%\n', sumOfGroup1, 100 * sumOfGroup1 / sum(sortedc));
fprintf('The sum of group 2 is %d = %.5f%%\n', sumOfGroup2, 100 * sumOfGroup2 / sum(sortedc));
fprintf('The sum of group 3 is %d = %.5f%%\n', sumOfGroup3, 100 * sumOfGroup3 / sum(sortedc));
fprintf('The sum of group 4 is %d = %.5f%%\n', sumOfGroup4, 100 * sumOfGroup4 / sum(sortedc));

The sum of group 1 is 1250676 = 24.99972%

The sum of group 2 is 1250679 = 24.99978%

The sum of group 3 is 1250651 = 24.99922%

The sum of group 4 is 1250755 = 25.00129%

If the CDF method is not accurate enough for your small groups, then I think the one approach you might take is brute force: try every single permutation (every possible way of assigning the elements to the four groups) and keep the one whose group sums have the smallest average absolute deviation from 25% of the total. I don't have code for that and probably won't write any. I'm assuming you gave a very small set of data just as a simple example and that your actual data is much larger. Good luck.
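For anyone who wants to try the brute-force idea on a tiny data set, here is a minimal sketch. It is not code from the answer above, just one way the idea could look. It assumes the 8-element example data from this thread, and the variable names (labels, bestLabels, bestSums, bestDeviation) are made up for illustration. It tries all 4^n assignments of elements to groups, so it is only practical for roughly a dozen elements or fewer.

```matlab
data = [10, 5, 1, 20, 5, 22, 4, 15];
n = numel(data);
numGroups = 4;
target = sum(data) / numGroups;   % Ideal sum of each group (25% of the total)

bestDeviation = inf;
bestLabels = [];
bestSums = [];
for k = 0 : numGroups^n - 1
    % The base-4 digits of k give a group number (1..4) for each element.
    labels = mod(floor(k ./ numGroups.^(0:n-1)), numGroups) + 1;
    % Sum the elements assigned to each group (unused groups get a sum of 0).
    groupSums = accumarray(labels(:), data(:), [numGroups, 1]);
    % Average absolute deviation of the group sums from the ideal sum.
    deviation = mean(abs(groupSums - target));
    if deviation < bestDeviation
        bestDeviation = deviation;
        bestLabels = labels;
        bestSums = groupSums;
    end
end

for g = 1 : numGroups
    fprintf('Group %d: %s, sum = %d = %.5f%%\n', g, ...
        mat2str(data(bestLabels == g)), bestSums(g), 100 * bestSums(g) / sum(data));
end
```

For this data the ideal group sum is 20.5, and the best achievable group sums are 20, 20, 20 and 22 (24.39% three times and 26.83% once). More than one assignment gives those sums, so which elements land in which group depends on the order the loop visits them.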
