Is it possible to join categorical variables in a table according to group variables?

David on 25 Aug 2020
Commented: David on 25 Aug 2020
I have a table (`A`) containing a string variable of IDs (`x`) and a categorical variable (`y`).
For example:
>> A.x
11×1 string array
"A-00555"
"A-01139"
"B-08811"
"B-00014"
"C-00007"
"C-00007"
"D-00015"
"D-00015"
"E-00048"
"E-00048"
"E-00048"
>> A.y
11×1 categorical array
APPLE
GRAPEFRUIT
COCONUT
APPLE
APPLE
BANANA
APPLE
COCONUT
APPLE
BANANA
KIWI
And I want to generate an array, the same size as A.x, containing a new categorical variable that "joins" all the A.y values belonging to the same ID A.x(i). I may not be explaining this very well...
In the above example the resulting array would be something like this:
>> A.z
11×1 categorical array
APPLE
GRAPEFRUIT
COCONUT
APPLE
APPLE+BANANA
APPLE+BANANA
APPLE+COCONUT
APPLE+COCONUT
APPLE+BANANA+KIWI
APPLE+BANANA+KIWI
APPLE+BANANA+KIWI
Is there an efficient way to accomplish this? Is there a version of groupsummary, or something similar, with a method option that concatenates a categorical variable according to groupvars?
Other info: The table contains a few million unique IDs. All rows of A are unique. There are 30 categorical variables.
Steven Lord on 25 Aug 2020
That categorical array used to define A.y likely doesn't have categories like APPLE+BANANA or APPLE+BANANA+KIWI.
Do you need the result to be a categorical array or would the result being a string array be sufficient for your purposes?
David on 25 Aug 2020
@Steven Lord: You are correct. A.y does not contain those categories.
A string array as an intermediate step could work. I think I could then convert it to a categorical array...
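A minimal sketch of that conversion step. The string values below are typed by hand as a hypothetical stand-in for the joined labels, not output produced by any code in this thread. `categorical` creates one category per distinct string, so combined labels such as APPLE+BANANA become categories of their own.

```matlab
% Hypothetical joined labels, used as the intermediate string array.
s = ["APPLE"; "APPLE+BANANA"; "APPLE+BANANA"; "APPLE+BANANA+KIWI"];

% categorical() turns each distinct string into a category (sorted).
z = categorical(s);

% List the resulting categories.
categories(z)
```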

Accepted Answer

Steven Lord on 25 Aug 2020
I'd use findgroups. First let's define the data.
x = ["A-00555"; "A-01139"; "B-08811"; "B-00014"; "C-00007"; ...
"C-00007"; "D-00015"; "D-00015"; "E-00048"; "E-00048"; "E-00048"];
y = categorical(["APPLE"; "GRAPEFRUIT"; "COCONUT"; "APPLE"; "APPLE"; ...
"BANANA"; "APPLE"; "COCONUT"; "APPLE"; "BANANA"; "KIWI"]);
Now use findgroups to get the group numbers for each element in x.
g = findgroups(x);
Then join the elements of y (converted to string) within each group, putting a + between the elements.
s = splitapply(@(x) join(string(x), "+"), y, g);
Let's see the results as a table.
T = table(x, y, g, s(g))
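One thing to be aware of: join concatenates the values in the order the rows appear within each group, so two IDs with the same set of values in a different row order would get different labels (APPLE+BANANA vs. BANANA+APPLE). In the sample data the order happens to be alphabetical already. If order should not matter, one option (an assumption on my part, not part of the answer above) is to sort each group's strings before joining:

```matlab
% Same sample data as above.
x = ["A-00555"; "A-01139"; "B-08811"; "B-00014"; "C-00007"; ...
     "C-00007"; "D-00015"; "D-00015"; "E-00048"; "E-00048"; "E-00048"];
y = categorical(["APPLE"; "GRAPEFRUIT"; "COCONUT"; "APPLE"; "APPLE"; ...
     "BANANA"; "APPLE"; "COCONUT"; "APPLE"; "BANANA"; "KIWI"]);

% Group number for each ID.
g = findgroups(x);

% Sort each group's values alphabetically before joining, so the
% combined label does not depend on the row order within the group.
s = splitapply(@(v) join(sort(string(v)), "+"), y, g);

% Expand back to one label per row.
z = s(g);
```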
David on 25 Aug 2020
This got me 99.99% of the way there. I just modified the last line like this:
T = table(x, y, g, categorical(s(g)));
Thank you!
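Putting the accepted answer and the categorical conversion together, the same steps can be applied directly to a table like the one in the question. This sketch rebuilds `A` from the sample values (assuming it holds a string variable `x` and a categorical variable `y`) and stores the result as a new categorical variable `A.z`:

```matlab
% Table like the one in the question: string IDs (x) and categorical data (y).
A = table(["A-00555"; "A-01139"; "B-08811"; "B-00014"; "C-00007"; ...
           "C-00007"; "D-00015"; "D-00015"; "E-00048"; "E-00048"; "E-00048"], ...
          categorical(["APPLE"; "GRAPEFRUIT"; "COCONUT"; "APPLE"; "APPLE"; ...
           "BANANA"; "APPLE"; "COCONUT"; "APPLE"; "BANANA"; "KIWI"]), ...
          'VariableNames', {'x', 'y'});

% Group number for every row, based on the ID.
g = findgroups(A.x);

% One "+"-joined string per group, in row order.
s = splitapply(@(v) join(string(v), "+"), A.y, g);

% Expand back to one value per row and store it as categorical.
A.z = categorical(s(g));
```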

Release: R2019a