Divide a data set into 4 parts so that the sum of each part is 1/4th of the total
I want to divide a data set into four groups such that the sum of the elements of each group is approximately the same.
For example: [10, 5, 1, 20, 5, 22, 4, 15]
For the above data set, the sum of all the elements = 82.
So I want this data set to be divided into 4 groups such that the sum of the elements of each group is almost the same.
One such possibility is
Set 1: 10, 5, 4, 1
Set 2: 20
Set 3: 22
Set 4: 15, 5
How do I set this up?
Accepted Answer
Image Analyst
16 Jun 2019
Edited: Image Analyst
16 Jun 2019
I'd just sort them, take the CDF (the normalized cumulative sum), and look for the indexes where it crosses 25%, 50%, and 75%:
c = cumsum(sort(data, 'ascend'));
c = c / c(end); % Normalize from 0 to 1
c25 = find(c>0.25, 1, 'first');
c50 = find(c>0.5, 1, 'first');
c75 = find(c>0.75, 1, 'first');
At least that's one way that might work, though it works best for lots of data rather than just a few elements like you have.
4 Comments
Image Analyst
16 Jun 2019
Try this:
data = [10, 5, 1, 20, 5, 22, 4, 15]
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end); % Normalize from 0 to 1
c25 = find(c < 0.25, 1, 'last')
c50 = find(c < 0.5, 1, 'last')
c75 = find(c < 0.75, 1, 'last')
group1 = sortedc(1:c25);
group2 = sortedc(c25+1:c50);
group3 = sortedc(c50+1:c75);
group4 = sortedc(c75+1:end);
sumOfGroup1 = sum(group1)
sumOfGroup2 = sum(group2)
sumOfGroup3 = sum(group3)
sumOfGroup4 = sum(group4)
fprintf('The sum of group 1 is %d = %.5f%%\n', sumOfGroup1, 100 * sumOfGroup1 / sum(sortedc));
fprintf('The sum of group 2 is %d = %.5f%%\n', sumOfGroup2, 100 * sumOfGroup2 / sum(sortedc));
fprintf('The sum of group 3 is %d = %.5f%%\n', sumOfGroup3, 100 * sumOfGroup3 / sum(sortedc));
fprintf('The sum of group 4 is %d = %.5f%%\n', sumOfGroup4, 100 * sumOfGroup4 / sum(sortedc));
You get
group1 =
1 4 5 5
group2 =
10 15
group3 =
20
group4 =
22
The sum of group 1 is 15 = 18.29268%
The sum of group 2 is 25 = 30.48780%
The sum of group 3 is 20 = 24.39024%
The sum of group 4 is 22 = 26.82927%
But for a much larger set, the split comes out much closer to 25% per group:
numElements = 100000;
maxValue = 99;
data = randi(maxValue, 1, numElements);
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end); % Normalize from 0 to 1
c25 = find(c < 0.25, 1, 'last')
c50 = find(c < 0.5, 1, 'last')
c75 = find(c < 0.75, 1, 'last')
group1 = sortedc(1:c25);
group2 = sortedc(c25+1:c50);
group3 = sortedc(c50+1:c75);
group4 = sortedc(c75+1:end);
sumOfGroup1 = sum(group1)
sumOfGroup2 = sum(group2)
sumOfGroup3 = sum(group3)
sumOfGroup4 = sum(group4)
fprintf('The sum of group 1 is %d = %.5f%%\n', sumOfGroup1, 100 * sumOfGroup1 / sum(sortedc));
fprintf('The sum of group 2 is %d = %.5f%%\n', sumOfGroup2, 100 * sumOfGroup2 / sum(sortedc));
fprintf('The sum of group 3 is %d = %.5f%%\n', sumOfGroup3, 100 * sumOfGroup3 / sum(sortedc));
fprintf('The sum of group 4 is %d = %.5f%%\n', sumOfGroup4, 100 * sumOfGroup4 / sum(sortedc));
The sum of group 1 is 1250676 = 24.99972%
The sum of group 2 is 1250679 = 24.99978%
The sum of group 3 is 1250651 = 24.99922%
The sum of group 4 is 1250755 = 25.00129%
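The two scripts above repeat the same steps for each cut point, so here is a minimal sketch of the same CDF split written as a loop over any number of groups. The names k, cuts, edges, groups, and groupSums are illustrative and not part of the code above. It assumes the data is non-negative, so the cumulative sum never decreases; under that assumption nnz(c < j/k) gives the same index as find(c < j/k, 1, 'last'), but returns 0 instead of an empty array when no element falls below the threshold.

```matlab
% Sketch: CDF split into k groups (k and the variable names are illustrative).
data = [10, 5, 1, 20, 5, 22, 4, 15];
k = 4;                                      % number of groups (assumed parameter)

sortedData = sort(data, 'ascend');
c = cumsum(sortedData) / sum(sortedData);   % normalized cumulative sum, 0 to 1

% For each threshold j/k, count the elements whose cumulative fraction is
% below it. Assumes non-negative data so that c is non-decreasing.
cuts = zeros(1, k - 1);
for j = 1 : k - 1
    cuts(j) = nnz(c < j / k);
end
edges = [0, cuts, numel(sortedData)];       % group j is edges(j)+1 : edges(j+1)

groups = cell(1, k);
groupSums = zeros(1, k);
for j = 1 : k
    groups{j} = sortedData(edges(j) + 1 : edges(j + 1));
    groupSums(j) = sum(groups{j});
    fprintf('The sum of group %d is %d = %.5f%%\n', ...
        j, groupSums(j), 100 * groupSums(j) / sum(sortedData));
end
```

For the 8-element example this reproduces the group sums shown earlier (15, 25, 20, 22). A group can still come out empty if a single element accounts for more than 1/k of the total.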
If the CDF method is not accurate enough for your small groups, then I think the one approach you might take is to try every single permutation (every possible way of assigning the elements to the four groups) and keep the one whose group sums have the smallest average absolute deviation from 25% of the total. I don't have code for that and probably won't write any. I'm assuming you gave a very small set of data just as a simple example and that your actual data is much larger. Good luck.
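For anyone who wants to try the exhaustive idea on a small set, here is a rough sketch, not code from the answer above. It walks through every assignment of the elements to the four groups by counting in base numGroups, and scores each one by the mean absolute deviation of the group sums from sum(data)/numGroups, which is equivalent to measuring the deviation from 25%. The variable names (code, labels, bestLabels, bestDeviation) are made up for illustration. The loop runs numGroups^n times, so it is only practical for a handful of elements (65536 assignments for the 8-element example).

```matlab
% Sketch of an exhaustive search; only practical for small data sets,
% because the number of assignments is numGroups^n.
data = [10, 5, 1, 20, 5, 22, 4, 15];
numGroups = 4;
n = numel(data);
target = sum(data) / numGroups;             % ideal sum per group

bestDeviation = Inf;
bestLabels = [];
for code = 0 : numGroups^n - 1
    % Read the base-numGroups digits of code as group labels 1..numGroups.
    labels = mod(floor(code ./ numGroups.^(0 : n - 1)), numGroups) + 1;
    % Sum the elements that share a label; unused groups get a sum of 0.
    groupSums = accumarray(labels(:), data(:), [numGroups, 1]);
    deviation = mean(abs(groupSums - target));
    if deviation < bestDeviation
        bestDeviation = deviation;
        bestLabels = labels;
    end
end

% Report the best assignment found.
for g = 1 : numGroups
    members = data(bestLabels == g);
    fprintf('Group %d: %s (sum = %d)\n', g, mat2str(members), sum(members));
end
```

For the example data the best achievable group sums are 20, 20, 20, and 22, the same sums as the split proposed in the question. Several assignments tie for that score (the groups can be relabeled, and the two 5s can be swapped), and the sketch simply keeps the first one it finds.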