How can I separate data into multiple groups?

Hi,
I have a csv file with more than 50,000 rows (an extract is provided in the attached csv file).
I need to group the data as highlighted in yellow in the attached file. The numbers in each group are either very close to each other (difference of less than 1) or they are multiples of the smallest number in the group (with a tolerance of +/- 0.3).
How can I write code that labels the highlighted groups as 1, 2, 3 and so on? Numbers that don't belong to any group should get the default group number 0.
Thanks for the help in advance.

2 comments

Jan on 3 Mar 2023
CSV-files are text files. There are no colored elements.
Can you import the file already? Then you could start from "I have a vector" or "matrix".
Jayden Yeo on 3 Mar 2023
Hi,
The colour is only to help to explain my question. The csv file (with more data) that I am working on does not have any coloured elements.


Answers (1)

Jan on 3 Mar 2023
Edited: Jan on 3 Mar 2023

data = [2416.015, 127.402, 382.165, 127.425, 127.3387, 127.406, 637.001, 127.405, 2240.913, ...
        2257.54, 241.801, 3064.636, 441.559, 220.805, 220.799, 1204.011, 1547.622, 322.37, ...
        322.43, 6482.511, 558.603, 279.301, 2234.423, 279.307, 279.31, 279.295, 3901.168, ...
        3595.353, 90.315];
% m(k) is true if data(k) is closer than 1 to its predecessor.
% The first element has no predecessor, so it starts as false; with true,
% a group beginning at the very first element would not be counted.
m      = [false, abs(diff(data)) < 1];  % Distance is small
ini    = strfind(m, [0, 1]);            % Index where blocks are starting
p      = zeros(size(data));
p(ini) = 1;
p      = cumsum(p);                     % Count starts
m(ini) = true;                          % The first element of a block belongs to it
result = m .* p;                        % Use m as mask
format long g
disp([data.', result.'])
     2416.015     0
      127.402     0
      382.165     0
      127.425     1
     127.3387     1
      127.406     1
      637.001     0
      127.405     0
     2240.913     0
      2257.54     0
      241.801     0
     3064.636     0
      441.559     0
      220.805     2
      220.799     2
     1204.011     0
     1547.622     0
       322.37     3
       322.43     3
     6482.511     0
      558.603     0
      279.301     0
     2234.423     0
      279.307     4
       279.31     4
      279.295     4
     3901.168     0
     3595.353     0
       90.315     0
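
The vectorised answer above labels runs of neighbouring values that differ by less than 1. As a cross-check, the same idea can be sketched with an explicit loop. The variable names `tol`, `label`, `groupId` and `inGroup` are illustrative assumptions and not part of the original answer; the sketch covers only the "close to each other" rule, not the multiples.

```matlab
% Loop-based sketch of the "difference of less than 1" grouping.
data = [2416.015, 127.402, 382.165, 127.425, 127.3387, 127.406, 637.001, 127.405, 2240.913, ...
        2257.54, 241.801, 3064.636, 441.559, 220.805, 220.799, 1204.011, 1547.622, 322.37, ...
        322.43, 6482.511, 558.603, 279.301, 2234.423, 279.307, 279.31, 279.295, 3901.168, ...
        3595.353, 90.315];
tol     = 1;                  % Assumed closeness threshold from the question
label   = zeros(size(data));  % 0 means "not in any group"
groupId = 0;                  % Number of groups found so far
inGroup = false;              % True while the current run of close values continues
for k = 2:numel(data)
    if abs(data(k) - data(k - 1)) < tol
        if ~inGroup
            % A new run starts: its first member is the previous element
            groupId      = groupId + 1;
            label(k - 1) = groupId;
            inGroup      = true;
        end
        label(k) = groupId;
    else
        inGroup = false;
    end
end
disp([data.', label.'])
```

The loop is slower than the vectorised form on 50,000 rows but should still run quickly, and it makes the start-of-group handling explicit.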

8 comments

Jayden Yeo on 3 Mar 2023
Hi Jan,
Your output does not include the multiple of the smaller number. The output that I am looking for is in the 2nd column of the file attached in this comment. Thanks.
Jan on 3 Mar 2023
Edited: Jan on 3 Mar 2023
You are right, I did not read the question carefully enough. This is trickier and needs a loop. I'll come back to this later.
Jan on 3 Mar 2023
Oh, this is really complicated. Imagine the sequence [5, 8, 10, 2, 5]. Do 8, 10, 2 belong to one group because 8 and 10 are multiples of 2? Even sequential processing is hard: when [8, 10] is examined, they clearly do not form a group, but once a 2 follows, they do. A trailing 1 at the end of the whole sequence will magically absorb many other groups and non-members as well. Example:
[5, 7, 7, 5, 8, 8, 8, 5, 2, 1]
If you process this until the 2, the output looks like:
[0, 1, 1, 0, 2, 2, 2, 0, 0, ?]
and when you reach the 1:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Brrr.
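
The effect described in this comment can be reproduced with a small tolerance test. This is only a sketch: `isMultiple` is a hypothetical helper (not from the thread), and it assumes positive values and that "multiple" means an integer factor of at least 2, with the +/- 0.3 tolerance from the question.

```matlab
% Hypothetical helper: is x a multiple (factor >= 2) of the smaller value s,
% within an absolute tolerance tol?
tol        = 0.3;
isMultiple = @(x, s) (round(x ./ s) >= 2) & (abs(x - s .* round(x ./ s)) <= tol);

seq = [5, 7, 7, 5, 8, 8, 8, 5, 2, 1];

% Compare everything before the 2 against the 2: only the 8s qualify
vsTwo = isMultiple(seq(1:8), seq(9));
disp(vsTwo)

% Compare everything before the trailing 1 against it: every integer qualifies
vsOne = isMultiple(seq(1:9), seq(10));
disp(vsOne)

% Values from the data extract: 382.165 is about 3 times 127.402
disp(isMultiple(382.165, 127.402))
```

Because every integer is a multiple of 1, a global "multiple of the smallest number" rule lets a single small value pull the whole sequence into one group.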
Jayden Yeo on 6 Mar 2023
Hi Jan,
Thanks for your help. Would it be easier if, for multiples of the smaller number, you only needed to consider the numbers directly before and after the smaller number? In your example of
[5, 7, 7, 5, 8, 8, 8, 5, 2, 1], the output would be
[0, 1, 1, 0, 2, 2, 2, 0, 0, 0].
Hope to hear from you soon.
Jan on 6 Mar 2023
Not really: [16, 16, 14, 14, 16, 16, 8, 4, 2]
Which problem do you want to solve actually?
Jayden Yeo on 7 Mar 2023
Hi Jan,
The one in my data.csv file, and you only need to consider multiples for the numbers directly before and after the smaller number. From a quick scan, my full data does not really look like your example data: it does not contain such nice numbers that are exact multiples of the smaller numbers.
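
One possible reading of this "only the number before and after" rule is to link two neighbouring values whenever they are close (difference below 1) or the larger one is a multiple of the smaller one (within +/- 0.3), and then to number each unbroken chain of links. The sketch below follows that reading; it is an assumption, not a confirmed specification, and the names `closeTol`, `multTol`, `link`, `member` and `label` are illustrative. It also assumes all values are positive.

```matlab
data = [2416.015, 127.402, 382.165, 127.425, 127.3387, 127.406, 637.001, 127.405, 2240.913, ...
        2257.54, 241.801, 3064.636, 441.559, 220.805, 220.799, 1204.011, 1547.622, 322.37, ...
        322.43, 6482.511, 558.603, 279.301, 2234.423, 279.307, 279.31, 279.295, 3901.168, ...
        3595.353, 90.315];
closeTol = 1;    % Assumed: "difference of less than 1"
multTol  = 0.3;  % Assumed: "tolerance of +/- 0.3" for multiples

a = data(1:end-1);            % Left element of each neighbouring pair
b = data(2:end);              % Right element of each neighbouring pair
small = min(a, b);
large = max(a, b);
k = round(large ./ small);    % Nearest integer factor between the pair

isClose = abs(a - b) < closeTol;
isMult  = (k >= 2) & (abs(large - k .* small) <= multTol);
link    = isClose | isMult;   % link(i): elements i and i+1 belong together

member  = [link, false] | [false, link];  % Linked to the next or the previous element
isStart = member & ~[false, link];        % Member that is not linked to its predecessor
label   = cumsum(double(isStart)) .* member;

format long g
disp([data.', label.'])
```

With these tolerances the extract is labelled as four groups: rows 2 to 8, 13 to 15, 18 to 19 and 21 to 26. Whether that matches the highlighted groups in the attachment is not verified here. The same rule gives [0, 1, 1, 0, 2, 2, 2, 0, 3, 3] for [5, 7, 7, 5, 8, 8, 8, 5, 2, 1], because 2 is a multiple of its neighbour 1, and [1, 1, 2, 2, 3, 3, 3, 3, 3] for [16, 16, 14, 14, 16, 16, 8, 4, 2], so the chaining concern raised in the thread still applies.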
Jan on 7 Mar 2023
@Jayden Yeo: Yes, I've simplified my example. With the real data, considering the tolerances will increase the complexity even further.
If the description of the process is already this tricky, it is usually a hint that the view of the problem is too indirect or rests on overly complicated assumptions. That is why I ask which real-world problem you want to solve. Maybe there is a simpler way to define the groups.
Jayden Yeo on 8 Mar 2023
@Jan: I have to admit that the problem is tricky, but the data I have is what I have shown as an extract in the csv file. I think your answer above is the best that I have, and thanks a lot for your help. Once the groups are defined, I will remove those groups that are zero, and then plot groups 1, 2, 3...and so on.
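
The follow-up step mentioned here (dropping the rows labelled 0 and plotting groups 1, 2, 3 and so on) could look roughly like the sketch below. It reuses the `data` and `result` values printed in the answer; `keep`, `groupedData` and `groupedId` are illustrative names, and plotting the values against their row index is an assumption, since the thread does not say what the plot should show.

```matlab
data = [2416.015, 127.402, 382.165, 127.425, 127.3387, 127.406, 637.001, 127.405, 2240.913, ...
        2257.54, 241.801, 3064.636, 441.559, 220.805, 220.799, 1204.011, 1547.622, 322.37, ...
        322.43, 6482.511, 558.603, 279.301, 2234.423, 279.307, 279.31, 279.295, 3901.168, ...
        3595.353, 90.315];
% Group numbers as printed in the answer's output
result = [0 0 0 1 1 1 0 0 0 0 0 0 0 2 2 0 0 3 3 0 0 0 0 4 4 4 0 0 0];

keep        = result > 0;     % Logical mask of rows that belong to a group
groupedData = data(keep);     % Values with the ungrouped rows removed
groupedId   = result(keep);   % Matching group numbers

figure
hold on
for g = 1:max(groupedId)
    idx = find(result == g);  % Original row indices of group g
    plot(idx, data(idx), 'o-', 'DisplayName', sprintf('Group %d', g));
end
hold off
legend('show')
xlabel('Row index')
ylabel('Value')
```

Logical indexing keeps `groupedData` and `groupedId` aligned, so either can be written back to a file or passed to further processing together.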



Release: R2015b
Asked: 3 Mar 2023
Commented: 8 Mar 2023
