Random sample: I want 5% of the data for each hour

Rachele Franceschini on 4 Jun 2021
Edited: Scott MacKenzie on 4 Jun 2021
I have a dataset with 19 columns. One column holds the date (day, month, year) and the hour. For each hour, I would like to extract a random 5% of the rows. Naturally, I want to keep all the other columns along with the time column.
Can you help me?
I saw the resample command, but so far I have not managed to make it work.
2 Comments
Scott MacKenzie on 4 Jun 2021
It would help if you post the data -- or, better yet, a subset of the data -- and any code you have written so far.
Rachele Franceschini on 4 Jun 2021
Thank you!
I included only 8 columns (to simplify). I tried retime, randsample, and splitting the data by time, but nothing worked.
Thank you!!!!


Accepted Answer

Scott MacKenzie on 4 Jun 2021
Edited: Scott MacKenzie on 4 Jun 2021
There might be a way to simplify this, but I believe the script below achieves what you are after...
% read all the data into a table
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/642145/Cartel1.xlsx');
% convert the time column (column 3) to datetime and get the hour of each row
dt = datetime(T{:,3});
hr = hour(dt);
% z is nonzero wherever the hour changes from one row to the next
z = diff(hr);
% last row of each hour block; prepend 0 and append the final row
% so that the last hour is also sampled
idx = [0; find(z); height(T)];
% build a vector of new indices, selecting at random 5% of the rows for each hour
idxNew = [];
for i = 2:length(idx)
    nRows = idx(i) - idx(i-1);   % number of rows in this hour block
    n = round(0.05 * nRows);
    % randperm draws n distinct offsets, so no row is picked twice
    idxNew = [idxNew, idx(i-1) + randperm(nRows, n)];
end
% create new table with 5% of the rows for each hour
Tnew = T(idxNew,:);
With this script, your data set is now much smaller. See below. That's the general idea, right?
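If the rows for a given hour are not contiguous, or the data spans several days, grouping by calendar hour may be more robust than looking for places where the hour changes. The following is only a sketch under stated assumptions: it uses a synthetic table, and the variable names Time and Value are placeholders, not the columns of the uploaded file.

```matlab
% Synthetic stand-in for the real table (names Time and Value are assumptions).
rng(0);                                                  % reproducible example
Time  = datetime(2021,6,4,0,0,0) + minutes(0:2:479)';    % 240 rows over 8 hours
Value = rand(numel(Time), 1);
T = table(Time, Value);

% Label each row with the start of its calendar hour, then number the groups.
hourBin = dateshift(T.Time, 'start', 'hour');
G = findgroups(hourBin);

% For each hour, keep 5% of its rows, drawn without replacement.
idxNew = zeros(0, 1);
for g = 1:max(G)
    rows = find(G == g);                     % row numbers belonging to this hour
    k = round(0.05 * numel(rows));           % number of rows to keep
    idxNew = [idxNew; rows(randperm(numel(rows), k)')];
end

Tnew = T(sort(idxNew), :);                   % sorted to preserve time order
```

In this synthetic case each hour has 30 rows, so round(1.5) = 2 rows are kept per hour and Tnew has 16 rows. Hours with fewer than 10 rows would contribute nothing, since round(0.05 * n) is 0 for n below 10.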
2 Comments
Rachele Franceschini on 4 Jun 2021
Yes, that was my idea: to get a smaller data set, because I will then apply other methods (machine learning, etc.).
Wonderful!!!!
Thank you very much!!!!
Scott MacKenzie on 4 Jun 2021
@Rachele Franceschini You're welcome.
BTW, I just fixed a small bug in the answer script. The second index in each range included the first row of the following hour. It's fixed now. Good luck.
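To see why the range boundaries matter, here is a toy check (the hr vector is made up, not taken from the uploaded file). Each block should run from idx(i-1)+1 to idx(i), and every row in it should share the same hour. Appending numel(hr) is an assumption added here so that the final hour also forms a block.

```matlab
hr  = [9 9 9 10 10 11 11 11 11]';          % toy hour column
idx = [0; find(diff(hr)); numel(hr)];      % block ends: 0, 3, 5, 9

for i = 2:numel(idx)
    first = idx(i-1) + 1;                  % first row of this hour
    last  = idx(i);                        % last row of this hour
    % every row in the range must belong to the same hour
    assert(all(hr(first:last) == hr(first)));
    fprintf('hour %d: rows %d to %d (%d rows)\n', ...
        hr(first), first, last, last - first + 1);
end
```

Note that the row count of a block is idx(i) - idx(i-1), which is what the 5% should be computed from.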


More Answers (1)

KSSV on 4 Jun 2021
Let A be your data matrix.
[m,n] = size(A);
p = round(5/100*m);      % number of rows to keep (5% of all rows)
idx = randsample(m,p);   % p row indices drawn without replacement
iwant = A(idx,:)
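Note that this draws 5% of all rows at once, so it does not guarantee 5% from each hour. Also, randsample belongs to the Statistics and Machine Learning Toolbox; if that is not available, randperm from base MATLAB can play the same role. A minimal sketch, with a random placeholder matrix standing in for the real data:

```matlab
rng(1);                        % reproducible example
A = rand(200, 8);              % placeholder data matrix (assumption)
m = size(A, 1);
p = round(5/100 * m);          % number of rows to keep
idx = randperm(m, p);          % p distinct row indices
iwant = A(idx, :);
```

With 200 rows, p is 10, so iwant is a 10-by-8 matrix.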

Release

R2021a