Randomly select n elements from n arrays
I need to randomly select 50 elements from 50 lists to build a new array; each list contains 100 elements. I then apply some post-sorting processing on all possible new arrays to find the indexes of all 50 elements of the best array.
The question is that there will be 100^50 possible combinations for the new array. How can I realize this work in MATLAB? Can we try mapreduce?
This problem feels like I have 50 classes, each class has 100 students, and I need to find the 'best' team of 50 students, one from each class. Each student has a unique score/contribution to the team, so I scan all possible teams and sort them to find the one I want, as well as the student ID in each class.
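For a sense of scale, the number of candidate teams is easy to compute directly (a quick sanity check, not part of any answer below; 100^50 = 1e100 still fits in a double, though not exactly):

```matlab
% Number of candidate teams: one choice out of 100, made independently 50 times.
n_lists = 50;
n_per_list = 100;
n_combinations = n_per_list ^ n_lists;   % = 1e100, i.e. 100^50
log10(n_combinations)                    % = 100
```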
1 Comment
Feng Cheng
4 Aug 2021
Edited: Feng Cheng
4 Aug 2021
Accepted Answer
More Answers (4)
Image Analyst
4 Aug 2021
How do you figure 100^50? You have 50 lists of 100 elements from which you are supposed to select half the elements randomly. So you will have 50 output vectors. You can use randperm()
For one vector, let's call it vec1, you can do
randomIndexes = randperm(length(vec1), 50); % Get 50 random indexes.
% Now extract those into vec1Output:
vec1Output = vec1(randomIndexes);
% Apply "some sorting processing":
vec1Output = sort(vec1Output, 'ascend'); % Or whatever you want to do.
There, just do that also for vec2 through vec50 and you'll have 50 output vectors.
7 Comments
Chunru
4 Aug 2021
randperm generates distinct indexes. The question seems to say that the index can repeat across different lists (so that there are 100^50 possibilities).
Feng Cheng
4 Aug 2021
Walter Roberson
4 Aug 2021
Suppose you have a specific list of 50 students, a proposed team. Then how do you find out how much contribution each particular student makes to the team? You cannot possibly have a lookup table that holds information for each possible combination, so there must be some kind of algorithm.
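One special case worth checking first (an illustration, not the poster's confirmed setup): if the team score were simply the sum of independent per-student contributions, no search over 100^50 teams would be needed at all; a column-wise max solves it directly.

```matlab
% Hypothetical setup: scores(i, k) is the contribution of student i of class k.
scores = randn(100, 50);
[bestContribution, bestStudent] = max(scores);   % best student in each class (per column)
teamScore = sum(bestContribution);               % score of the optimal team
% bestStudent(k) is the index of the selected student in class k.
```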
Feng Cheng
4 Aug 2021
Walter Roberson
4 Aug 2021
Exactly what is computed in findbestS() is important for solving the problem in a reasonable amount of time.
There are strategies such as Dynamic Programming https://en.wikipedia.org/wiki/Dynamic_programming that can make some kind of searches feasible that would otherwise be intractable.
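As a sketch of when Dynamic Programming helps, assume (purely for illustration; the thread never confirms this structure) that the score decomposes into pairwise terms C(i,j,k) between student i of class k and student j of class k+1. A Viterbi-style recursion then finds the optimum in about 50*100^2 operations instead of 100^50:

```matlab
n_classes = 50; n_students = 100;
C = randn(n_students, n_students, n_classes-1);  % hypothetical pairwise scores
best = zeros(n_students, 1);       % best partial score ending at each student of the current class
back = zeros(n_students, n_classes-1);           % backpointers for path reconstruction
for k = 1:n_classes-1
    [m, arg] = max(best + C(:,:,k), [], 1);      % best(i) + C(i,j,k), maximized over i for every j
    best = m.';
    back(:, k) = arg.';
end
[teamScore, id] = max(best);                     % best complete team
team = zeros(1, n_classes);
team(n_classes) = id;
for k = n_classes-1:-1:1
    team(k) = back(team(k+1), k);                % walk the backpointers to recover student IDs
end
```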
Image Analyst
4 Aug 2021
@Chunru, originally, when I answered, it did not say to get the "best" or try multiple sets of random indexes, as the original poster admitted.
Peter O
4 Aug 2021
@Feng Cheng are you able to give us a little bit more information on what the sorting function needs to achieve and what the base list parameters are? Are they students with scores or are they actually just a big parameter grid?
As @Walter Roberson brings up, this is sounding a lot more like an optimization problem than a sorting problem, and DP looks like a good candidate. If the cost function can't be solved through a dynamic programming approach then something like a stochastic optimizer (genetic algorithms, particle swarm, simulated annealing, differential evolution) might also be suited to the task. In either case the algorithms are solving to find the "best" solution without needing to keep the entire possible solution space in memory.
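For instance, ga() from the Global Optimization Toolbox accepts integer constraints, so the 50 student indexes can be searched directly. This is only a sketch: teamScore is a hypothetical user-supplied scoring function, not something defined in the thread.

```matlab
nvars = 50;                          % one integer index (student ID) per class
lb = ones(1, nvars);
ub = 100 * ones(1, nvars);
IntCon = 1:nvars;                    % all decision variables are integers
fitness = @(idx) -teamScore(idx);    % ga minimizes, so negate the score
bestIdx = ga(fitness, nvars, [], [], [], [], lb, ub, [], IntCon);
```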
nele = 100; nlist = 50;
data = randn(nele, nlist);           % data in the lists
idx = randi([1 nele], 1, nlist);     % row index for each list
idx = (0:nele:nele*nlist-1) + idx;   % linear index
x = data(idx);                       % data selected
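If a "reasonably good" team is acceptable, the one-draw code above extends naturally to a random-search baseline that keeps the best of many draws. The sum of the selected elements is used here only as a placeholder score:

```matlab
nele = 100; nlist = 50; ntrials = 1e5;
data = randn(nele, nlist);
bestScore = -inf; bestIdx = [];
base = 0:nele:nele*nlist-1;                % column offsets for linear indexing
for t = 1:ntrials
    idx = randi([1 nele], 1, nlist);       % one random row per list
    s = sum(data(base + idx));             % placeholder score: replace with the real criterion
    if s > bestScore
        bestScore = s;
        bestIdx = idx;                     % remember the winning row indexes
    end
end
```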
7 Comments
Feng Cheng
4 Aug 2021
Feng Cheng
4 Aug 2021
Chunru
4 Aug 2021
The answer is definitely no. Think of something else.
Feng Cheng
4 Aug 2021
Chunru
4 Aug 2021
Try to appreciate how big this number is:
Let's say you have a 10^100-by-50 array of double. The total size in memory is 10^100 × 50 × 8 = 4×10^102 bytes.
Let's say you can save the data on hard disks (not RAM). Assume each hard disk is 4 TB = 4×10^12 bytes. You will need (4×10^102)/(4×10^12) = 10^90 such hard disks. Assume each hard disk weighs 100 g. The total weight will be 10^89 kg. Just compare that with the mass of the Sun, 1.989×10^30 kg: all the hard disks together would be ~5×10^58 times heavier than the Sun.
Does the whole universe weigh more than 5×10^58 Suns?
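These magnitudes are easy to check directly (doubles hold values up to about 1.8e308, so nothing here overflows):

```matlab
bytes = 10^100 * 50 * 8;          % total array size: 4e102 bytes
disks = bytes / 4e12;             % 4 TB per disk -> 1e90 disks
mass  = disks * 0.1;              % 100 g per disk -> 1e89 kg
suns  = mass / 1.989e30;          % compared with the Sun: ~5e58 solar masses
```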
Walter Roberson
4 Aug 2021
Mass of the observable universe is apparently around 10^53 kg.
With a little programming and a for loop you can handle all possible combinations.
If you want to be able to draw repeated elements from the lists (e.g. the new list could include index 10 multiple times), use randi. Otherwise, use randperm.
% Generate some fake lists as a single array, each having 100 entries.
n_lists = 50;
n_entries = 100;
Lists = rand(n_entries, n_lists);
n_pick = 50;
% We know the size of the draw. Preallocate.
NewLists = zeros(n_pick, n_lists);
% Generate the indices to draw at random.
for ix = 1:n_lists
    Indexes = randperm(size(Lists,1), n_pick);
    % or if you want to allow repeats:
    % Indexes = randi(size(Lists,1), n_pick, 1);
    NewLists(1:n_pick, ix) = Lists(Indexes, ix);
end
% Sort as needed here. Since the lists are stored as a single matrix,
% you can handle it with a single call. sort called without additional
% arguments sorts individual columns.
NewLists_sorted = sort(NewLists)
Walter Roberson
4 Aug 2021
Edited: Walter Roberson
4 Aug 2021
Lists = arrayfun(@(idx) randi(99,1,100), 1:50, 'uniform', 0);
one_random_selection = cellfun(@(L) L(randi(100)), Lists)
sort(one_random_selection)
If you want to find all of the unique sort()'s, that would be a different matter entirely.
7 Comments
Feng Cheng
4 Aug 2021
Lists = mat2cell(randperm(50*100), 1, 100*ones(1,50));
one_random_selection = cellfun(@(L) L(randi(100)), Lists)
sort(one_random_selection)
Feng Cheng
4 Aug 2021
Walter Roberson
4 Aug 2021
I can provide you with code that will enumerate every possible team -- I have posted code in the past that can easily be adapted for the purpose.
However...
The Universe is roughly 4.4E+17 seconds old. If you had started your calculation at the time the Universe was born, and there were 4.4E+17 molecules that could be used for computation, and you had managed to complete 4.4E+17 operations every second on each of the molecules, and if there were 4.4E+17 different universes and you were running the same calculation in all of them... and you were to do all of that for 4.4E+17 times as long as the universe has existed...
Then you still would not have finished, by a factor of roughly 1/2 a billion.
So I can provide code, but the Sun is expected to have expanded into a Red Giant and destroyed the Earth before even a tiny fraction of the arrangements could be gone through, even using all the computers in world together.
Walter Roberson
4 Aug 2021
Sigh. Looks like you still want code that produces all 100^50 possibilities.
I give code at https://www.mathworks.com/matlabcentral/answers/623358-get-a-combination-of-unique-paths-for-given-pair-of-numbers#comment_1082638 that will iterate through all possibilities where in each position the set of values to be considered can be different than in the other positions. The code there does not assume that the elements of the different positions are the same data type; there is some (relatively small) optimization available for the case where they are all the same data type.
The code would have no inherent problem iterating through all 100^50 combinations of something like
StudentNumbers = mat2cell(randperm(5000), 1, 100*ones(50,1));
The code will handle it, given enough time.
... the estimated time requirement being on the order of pi*10^86 years, which is the age of the universe raised to roughly the 8th power.
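That estimate is consistent with a rate of about 10^6 combinations evaluated per second (an assumption of this check; the thread does not state the rate used). Working in log10 to stay within double range:

```matlab
log10_teams = 50 * log10(100);                 % 100, i.e. 100^50 = 1e100 teams
log10_secs  = log10_teams - 6;                 % at 1e6 evaluations/sec -> 1e94 seconds
log10_years = log10_secs - log10(3.156e7);     % ~86.5, i.e. about pi*1e86 years
```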
Feng Cheng
4 Aug 2021
Walter Roberson
4 Aug 2021
You are mistaken. I have provided a link to code that can produce EVERY combination from the arrays, which none of the other people have given you. You run the initialization routine that I provided there, then you run the loop that I provided there. Each time, the next combination is returned; you evaluate the combination according to your scoring system, and if it is the best combination so far, you keep a record of it. You keep looping until the code says there are no more possibilities available; at that point you return the best value.
I can set up 8 nested for-loops to save all combination indexes into multiple files on disk and apply mapreduce to solve the problem, although it feels stupid.
My code provides a mechanism to have a single for loop that will evaluate every possibility.
But it is hard to extend that workflow to data that is too big to compute on.
My code extends to any dataset that in itself takes less than approximately 1/4 of your available memory. 5000 items (50 classes of 100 students) is not a problem for my code, given enough time.
Mapreduce
Using mapreduce implies that you want to evaluate all possibilities. My code evaluates all possibilities given enough time.
If mapreduce is even conceptually in your mental toolbox to solve this problem, then my code should be a serious contender.
For problems like this, there are two possibilities:
- That the results from one combination cannot be used to make predictions about the results from other combinations, so all of the combinations need to be tried; in such a case, my code can handle problems too big for mapreduce given enough time; or
- That the results from one combination can be used to make predictions about the results from other combinations; in such a case, mapreduce is not typically a useful strategy, but techniques such as Dynamic Programming could potentially be very effective
You also have to ask whether you need the best combination, or just a "reasonably good" combination. For example, it would be entirely reasonable to ask "If I put a limit of 10^9 combinations evaluated (to keep the computation time limited), then what techniques should I use to get the best combination I am likely to get?" . And typically mapreduce() is not one of the techniques used to get the best combination within a budget.