Combining similarly named variables

Corey McDowell on 29 Jun 2022
Edited: Vatsal on 29 Sep 2023
In a dataset, I have variables that are functionally identical but have slightly different names because they were imported from different machines. One example is:
'chest_abd_pelvis_w_contrast_over_50kg' and 'cap_w_contrast_over_50kg'
For group analysis it is often better to treat these as a single variable. I have been able to merge them one at a time using the regexp-based method shown below:
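% Count how often each protocol name occurs, most frequent first.
protocols = groupcounts(B,"Protocol");
protocols = sortrows(protocols,"GroupCount","descend");
% Flag the protocol names that match either spelling of the over-50 kg protocol.
idx1 = ~cellfun(@isempty,(regexp(protocols.Protocol(:),'(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)')));
% Mark the matching rows of B and give them one common label.
B.idx1 = ismember(B.Protocol,protocols.Protocol(idx1));
B.Protocol(B.idx1) = {'CAP w/ contrast over 50 kg'};
% Drop the helper column(s); table variables are deleted with () indexing.
B(:,(~cellfun(@isempty,(strfind(B.Properties.VariableNames,'idx'))))) = [];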
The minor differences in names come in a variety of forms, so I do not have much hope of grouping all of them at once. However, several of these merges have to be repeated. For the example above, there is also:
'chest_abd_pelvis_w_contrast_21_to_50kg' and 'cap_w_contrast_21_to_50kg'
Is there a way to merge the two over-50 names together and the two 21-to-50 names together simultaneously?

Answers (1)

Vatsal on 21 Sep 2023
Edited: Vatsal on 29 Sep 2023
I understand that your dataset contains variables that are functionally identical but have different names. For group analysis, you want to treat each such set as a single variable, and you want to do the same for a second set of variables at the same time.
If the task is to merge the two over-50 variables with each other and the two 21-to-50 variables with each other, without merging all four into one, then you have to use two different "regexp" patterns: one matches the two over-50 variables and the other matches the two 21-to-50 variables.
The updated code is provided below for reference:
% Count how often each protocol name occurs, most frequent first.
protocols = groupcounts(B, "Protocol");
protocols = sortrows(protocols, "GroupCount", "descend");
% Over 50 kg: flag both spellings and relabel the matching rows.
idx_over_50 = ~cellfun(@isempty, regexp(protocols.Protocol(:), '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)'));
B.idx_over_50 = ismember(B.Protocol, protocols.Protocol(idx_over_50));
B.Protocol(B.idx_over_50) = {'CAP w/ contrast over 50 kg'};
% 21 to 50 kg: same steps with the second pattern.
idx_21_to_50 = ~cellfun(@isempty, regexp(protocols.Protocol(:), '(chest.*abd.*pel.*21.*50|cap.*w.*21.*50)'));
B.idx_21_to_50 = ismember(B.Protocol, protocols.Protocol(idx_21_to_50));
B.Protocol(B.idx_21_to_50) = {'CAP w/ contrast 21 to 50 kg'};
% Drop the helper columns; table variables are deleted with () indexing.
B(:, (~cellfun(@isempty, (strfind(B.Properties.VariableNames, 'idx'))))) = [];
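If more groups need merging, the one-pattern-per-group idea can be written as a loop over pattern/label pairs, so that adding a group means adding one row. This is only a sketch: the five-row table `B` is made-up sample data, and the names `rules` and `original` are introduced here for illustration. The patterns are the same loose ones used above, so they may need tightening for real data.

```matlab
% Hypothetical sample data standing in for the real table B.
B = table({'chest_abd_pelvis_w_contrast_over_50kg'; ...
           'cap_w_contrast_over_50kg'; ...
           'chest_abd_pelvis_w_contrast_21_to_50kg'; ...
           'cap_w_contrast_21_to_50kg'; ...
           'head_wo_contrast'}, 'VariableNames', {'Protocol'});

% Each row pairs a regular expression with the label it should map to.
rules = {
    '(chest.*abd.*pel|cap).*w.*over.*50', 'CAP w/ contrast over 50 kg'
    '(chest.*abd.*pel|cap).*w.*21.*50',   'CAP w/ contrast 21 to 50 kg'
    };

% Match against a copy of the original names so that a label written by
% one rule can never be picked up by a later rule.
original = B.Protocol;

for k = 1:size(rules, 1)
    % With 'once', regexp returns one start index per name, or [] if no match.
    hit = ~cellfun(@isempty, regexp(original, rules{k, 1}, 'once'));
    % Scalar cell on the right-hand side is expanded to every matching row.
    B.Protocol(hit) = rules(k, 2);
end
```

Because the match is done directly on `B.Protocol`, no helper `idx` columns are added to the table and nothing has to be deleted afterwards.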
You can also refer to the MATLAB documentation for "regexp" for more information on its usage and syntax.
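If the two spellings really differ only in the leading 'chest_abd_pelvis' versus 'cap' (an assumption based on the two examples in the question, which may not hold for every name in the real data), a single "regexprep" call could normalize the prefix and merge every weight band at once. A minimal sketch on made-up sample data:

```matlab
% Hypothetical sample data standing in for the real table B.
B = table({'chest_abd_pelvis_w_contrast_over_50kg'; ...
           'cap_w_contrast_over_50kg'; ...
           'chest_abd_pelvis_w_contrast_21_to_50kg'; ...
           'cap_w_contrast_21_to_50kg'; ...
           'head_wo_contrast'}, 'VariableNames', {'Protocol'});

% Rewrite the long prefix to the short one. The rest of each name is left
% untouched, so the over-50 and 21-to-50 names stay distinct from each other.
B.Protocol = regexprep(B.Protocol, '^chest_abd_pelvis', 'cap');

% Recount the groups on the normalized names.
protocols = groupcounts(B, "Protocol");
protocols = sortrows(protocols, "GroupCount", "descend");
```

This keeps the machine-style names (for example 'cap_w_contrast_over_50kg') instead of assigning display labels, so it suits the case where only the grouping matters.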

Release: R2022a