Aggregation by Multiple Grouping Variables (With Custom Circular Mean Function)

Jon Ericson on 25 Nov 2012
I need a simple way to aggregate a massive dataset by multiple grouping variables, while applying a custom function such as a circular mean (e.g. function "circ_mean" from CircStat2012).
Here is an extremely simplified example of how the data is formatted (in either a dataset array or a cell array; I'd prefer a function that operates on dataset arrays rather than cell arrays, but either solution would be very helpful). In reality there are many more trials. Note that I have 7 grouping variables; sometimes I'd like to group by only 2 of them, sometimes by all 7. If possible, I'd like to avoid converting strings to numerical indices: the dataset is much easier for me to understand if the grouping variables remain nominal or strings. The first row contains the variable names.
--------
grouping variables: subject, sex, cond, error, trialnumber, trialtype, objects
dependent variables: ANGLES1 ANGLES2
--------
subject  sex  cond  error  trialnumber  trialtype  objects      ANGLES1  ANGLES2
1        m    1     0      1            control    clock_chair  90       64
1        m    1     0      2            probe      well_sink    25       32
2        f    1     0      1            control    clock_chair  60       83
2        f    1     1      2            probe      well_sink    12       92
3        m    2     0      1            control    clock_chair  59       87
3        m    2     0      2            probe      well_sink    32       23
4        f    2     0      1            control    clock_chair  50       92
4        f    2     0      2            probe      well_sink    23       54
-----------
An example:
I'd like to get the circ_mean(ANGLES1) for each subject in condition 1, but only if error = 1, trialtype = control, and sex = m. I need to aggregate this data in all kinds of complicated ways to get circular means of angular data, so the function needs to be flexible, and should be able to handle non-numerical grouping variables.
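A minimal sketch of this kind of filter-then-group step, assuming a release with tables (R2013b or later) and findgroups/splitapply (R2015b or later). The name circMeanDeg is a hypothetical stand-in for circ_mean that works in degrees (the CircStat function expects radians), and the column is named err because error is a built-in MATLAB function. The eight sample rows contain no trial with error = 1, trialtype = control and sex = m, so the sketch filters on cond == 1, err == 0 and sex = m instead.

```matlab
% Sample rows from the post, stored in a table.
subject   = [1;1;2;2;3;3;4;4];
sex       = {'m';'m';'f';'f';'m';'m';'f';'f'};
cond      = [1;1;1;1;2;2;2;2];
err       = [0;0;0;1;0;0;0;0];   % "error" column, renamed to avoid the built-in
trialtype = {'control';'probe';'control';'probe';'control';'probe';'control';'probe'};
ANGLES1   = [90;25;60;12;59;32;50;23];
T = table(subject, sex, cond, err, trialtype, ANGLES1);

% Hypothetical stand-in for circ_mean: circular mean of angles in degrees.
circMeanDeg = @(a) atan2d(mean(sind(a)), mean(cosd(a)));

% Keep only the rows that pass the filter; string variables stay strings.
keep = T.cond == 1 & T.err == 0 & strcmp(T.sex, 'm');
S = T(keep, :);

% One group per subject among the remaining rows.
[G, subj] = findgroups(S.subject);
result = table(subj, splitapply(circMeanDeg, S.ANGLES1, G), ...
    'VariableNames', {'subject', 'circmean_ANGLES1'});
disp(result)
```

To group by several variables at once, findgroups accepts more than one input, for example [G, s, t] = findgroups(S.sex, S.trialtype), and the cell arrays of strings can be passed directly without converting them to numeric codes.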
------------
Another example (you can just run this at the prompt):
load('hospital')
figure()
boxplot(hospital.Weight,{hospital.Sex,hospital.Smoker})
% Now I'd like to get the mean (actually, I'd like to use circ_mean from the CircStat2012 package) for each of these 4 subgroups, but it doesn't work. Apparently boxplot will accept multiple grouping variables but a function like mean won't?
mean(hospital.Weight,{hospital.Sex,hospital.Smoker})
---------
Any suggestions?
Thanks!

Answers (2)

Image Analyst on 25 Nov 2012
I think you'd have to apply your function first, then sort or classify (not sure which you want). So can you apply your circ_mean() function to whatever column(s) you want then either sort the columns, or do something like K nearest neighbors, k-means, fuzzy c-means, or whatever, to classify them into groups?

Tom Lane on 28 Nov 2012
For your simpler example with the hospital dataset, you can do this:
grpstats(hospital,{'Sex' 'Smoker'},'mean','DataVars','Weight')
I'm not sure if this does what you want for your other example, but maybe it's a starting point.
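As a possible extension toward the original example, grpstats also accepts a function handle in place of 'mean'. The sketch below assumes the Statistics Toolbox and a release with tables (R2013b or later), rebuilds part of the sample data from the question, and uses a hypothetical column-wise circular mean in degrees, circMeanDeg, as a stand-in for circ_mean (which expects radians).

```matlab
% Part of the sample data from the question, stored in a table.
sex       = {'m';'m';'f';'f';'m';'m';'f';'f'};
trialtype = {'control';'probe';'control';'probe';'control';'probe';'control';'probe'};
ANGLES1   = [90;25;60;12;59;32;50;23];
ANGLES2   = [64;32;83;92;87;23;92;54];
T = table(sex, trialtype, ANGLES1, ANGLES2);

% Hypothetical stand-in for circ_mean: circular mean in degrees,
% taken down each column so it also works on a matrix of data variables.
circMeanDeg = @(a) atan2d(mean(sind(a), 1), mean(cosd(a), 1));

% Group by two string-valued variables and apply the custom function
% to both angle columns.
stats = grpstats(T, {'sex', 'trialtype'}, circMeanDeg, ...
    'DataVars', {'ANGLES1', 'ANGLES2'});
disp(stats)
```

The grouping variables stay as strings throughout, and the list of grouping variables can be shortened or extended per call, which may cover the "sometimes 2, sometimes all 7" case.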
