MATLAB: generate a normal random sample with outliers
Hi! I need help with a problem.
I want to:
- create a random normal sample
- choose one random index between 1 and 20, so that exactly one element in each sample is increased by a factor of 10-12
- find the element at this index and multiply it by 10-12
- then use a bootstrap in MATLAB to evaluate the mean and median of the sample
- store every sample in its own cell
- repeat steps 1-5 for every cell, and finally get all_y and all_stats.
clear
clc
clf
close all
format long
warning('off','all')
location = 17;
scale = 1;
num_samples = 20;
num_bootstraps = 1;
y = cell(1, num_samples);
stats = cell(1, num_samples);
for i = 1:num_samples
    % Generate a random normal sample
    sample = normrnd(location, scale, [1, num_samples]);
    % Choose a random index
    idx = randi([1, num_samples]);
    % Increase the element at the chosen index by 10-12 times
    sample(idx) = sample(idx) * randi([10, 12]);
    y{i} = sample;
    % Perform bootstrap resampling to calculate mean and median
    bootstrap_means = zeros(num_bootstraps, 1);
    bootstrap_medians = zeros(num_bootstraps, 1);
    for j = 1:num_bootstraps
        % Resample with replacement
        resampled_data = randsample(sample, num_samples, true);
        % Calculate mean and median for the resampled data
        bootstrap_means(j) = mean(resampled_data);
        bootstrap_medians(j) = median(resampled_data);
    end
    stats{i} = [bootstrap_means, bootstrap_medians];
end
% Combine all samples into one array
all_y = cat(1, y{:});
% Combine all statistics (mean and median) into one array
all_stats = cat(1, stats{:});
% Combine the original samples, means, and medians into a single dataset
data_3 = [all_y, all_stats];
This code does not work correctly: some samples contain no outlier and some contain more than one. How can I solve this problem?
My attempt at a solution is the code above.
Accepted Answer
Steven Lord
9 Nov 2023
warning('off','all')
If you want to select a different element each iteration (and have enough elements in your vector where that's possible), generate a list of which element will be replaced before entering your loop using the randperm function and then index into that list to determine which element to replace at each iteration.
n = 10;
r = randperm(20, n);
for k = 1:n
fprintf("At iteration k = %d, replace element %d.\n", k, r(k))
end
If you ask for more elements of the random permutation than are available, MATLAB will throw an error.
v = randperm(20, 21);
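Applied to the loop in the question, the idea might look like the sketch below. It leaves out the bootstrap step and uses randn in place of normrnd so that no toolbox is needed; the name outlier_pos is illustrative and not from the original code.

```matlab
location = 17;
scale = 1;
num_samples = 20;

% One distinct outlier position per sample, drawn before the loop
outlier_pos = randperm(num_samples, num_samples);

all_y = zeros(num_samples, num_samples);
for i = 1:num_samples
    % randn-based normal draw (assumption: stands in for normrnd)
    sample = location + scale * randn(1, num_samples);
    % Inflate exactly one element, at the position reserved for this sample
    sample(outlier_pos(i)) = sample(outlier_pos(i)) * randi([10, 12]);
    all_y(i, :) = sample;
end
```

Each row of all_y then holds exactly one inflated element, and no two rows share the same outlier position.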
More Answers (1)
dpb
9 Nov 2023
Moved: dpb
9 Nov 2023
The funny thing about random sampling is that it is, well, random.
You're resampling with replacement, so it is possible to pick the same element more than once. The likelihood of that depends heavily on the size of the sample population; a sample size of 20 is not very many, so it's almost guaranteed that duplicates will occur. On the other hand, it's also possible that the particular index of the outlier isn't going to be in the resampled index vector.
num_samples=20;
N=10;
x=randi([1,num_samples],N,num_samples); % N sample vectors
r=cell2mat(arrayfun(@(i)randsample(x(i,:),num_samples,true),[1:N].','uni',0)); % resampled with replacement
nnz(arrayfun(@(i)numel(unique(x(i,:))),1:N).'<num_samples) % how many had duplicated indices
They all had duplicated indices, so every case had at least one sample left out. OTOH, if N total samples are returned and at least one isn't chosen, then at least one other must have been duplicated. There's a 1-in-N chance that any single draw picks your outlier; by the time you draw N times, the odds are pretty good that it shows up at least once. I'll let you calculate that... :)
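For reference, that calculation can be sketched as follows, assuming a single outlier among n = 20 elements and one bootstrap resample of n draws with replacement. The number of times the outlier is drawn is then binomial with n trials and success probability 1/n; the variable names are illustrative.

```matlab
n = 20;                              % sample size, as in the question
p = 1 / n;                           % chance a single draw picks the outlier

p_none = (1 - p)^n;                  % outlier missing from the resample
p_once = n * p * (1 - p)^(n - 1);    % outlier drawn exactly once
p_many = 1 - p_none - p_once;        % outlier drawn two or more times

fprintf("missing: %.3f, once: %.3f, repeated: %.3f\n", p_none, p_once, p_many)

% Monte Carlo check in base MATLAB: resample indices 1..n and count how
% often index 1 (standing in for the outlier position) is drawn
trials = 1e5;
counts = sum(randi(n, trials, n) == 1, 2);
fprintf("simulated: %.3f, %.3f, %.3f\n", ...
    mean(counts == 0), mean(counts == 1), mean(counts >= 2))
```

The analytic values come out to about 0.358, 0.377 and 0.264, so roughly a third of resamples contain no outlier and about a quarter contain it more than once, which is the behaviour described in the question.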
3 Comments
dpb
9 Nov 2023
Moved: dpb
9 Nov 2023
You already generated a sample with an extreme value; it's not clear why you are resampling if that was the intent.
With the magnitude of the offset you're introducing in comparison to the variance, it is virtually guaranteed that that element will be identified as an outlier by any common test statistic.
It's also possible some other extreme element could fall outside such test statistic and be indicated as such, but the likelihood would be pretty small with those parameters. As the sample variance increases in relation to the introduced bias magnitude, that probability of more than one element being identified as an outlier will increase.
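One way to check this is sketched below. It assumes isoutlier (base MATLAB, R2017a or later) with its default median-based rule as the test statistic; that choice, and the use of randn in place of normrnd, are assumptions and not something specified in this thread.

```matlab
location = 17;
scale = 1;
num_samples = 20;

% randn-based normal draw (assumption: stands in for normrnd)
sample = location + scale * randn(1, num_samples);
idx = randi(num_samples);                      % position of the planted outlier
sample(idx) = sample(idx) * randi([10, 12]);

% Default rule: more than 3 scaled MADs away from the median
flagged = isoutlier(sample);

fprintf("planted at %d, flagged at: %s\n", idx, mat2str(find(flagged)))
```

With these parameters the planted element should be flagged essentially every time; occasionally an ordinary element from the tail of the distribution is flagged as well.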