Strings to variable names
I have a table with 30-ish columns and thousands of rows containing strings and numerical values (small sample attached). I need to plot hundreds of different figures by matching certain criteria and extracting only a portion of the data for each plot, based on those criteria. I would like to do this by passing two variables (matching the .Properties.VariableNames) to a function, which would then do something based on these two input arguments.
I started reading about converting strings to variable names and found the deprecated function genvarname(), which points to matlab.lang.makeValidName(), which takes a string as its argument. But the latter gives me the error:
Unrecognized table variable name 'X'.
The way I'm trying to use it is by reading my data:
results = readtable('raw_data.csv','PreserveVariableNames',true);
then specifying which variable names I want to work with:
X = matlab.lang.makeValidName('Sigma');
Y = matlab.lang.makeValidName('LR');
However, trying to pass X and Y as the arguments to the function, then referencing them as follows:
results.X
is throwing me the error.
If I manually edit this call to results.Sigma or results.LR, everything works just fine. But changing these two variables inside the function would defeat its purpose. I'd like to keep X and Y as generalized variable names within the function and only change the two lines above before I call the function.
I'm also reading on dynamically generated variable names and why eval is a big no-no. I tried the alternative methods mentioned there, such as cell/ND-arrays or Structs, etc. but couldn't get that to work either.
Can someone point me to what the most appropriate method to solve this would be, please?
8 Comments
The end of per isakson's answer has very important advice that you should not ignore.
Those are both irrelevant to your situation.
The problem is nomenclature.
For reasons beyond my understanding, TMW in their infinite wisdom decided to name the columns of a table as "variables", which is the same name used for arrays stored in the MATLAB workspace. Confusion arises when these two totally different concepts are mixed up, as in your question: GENVARNAME and MAKEVALIDNAME apply to names of workspace variables (and a few other situations), but given that your simple table column/variable names are already perfectly valid table column/variable names, neither of them will do anything useful for you. My tutorial page is totally unrelated to your situation, because it deals only with names of variables in the workspace, not names of table columns/variables (which is what you are quite reasonably trying to access).
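A small sketch of the distinction described above. The three-row table is a made-up stand-in for raw_data.csv (its values are assumptions for illustration only); the column names Sigma and LR are the ones from the question.

```matlab
% Hypothetical stand-in for the table read from raw_data.csv.
results = table(["a";"b";"a"], [1;2;3], [10;20;30], ...
    'VariableNames', {'Material','Sigma','LR'});

% makeValidName only sanitizes text. 'Sigma' is already a valid name,
% so it comes back unchanged as the char vector 'Sigma'.
X = matlab.lang.makeValidName('Sigma');

% results.X asks for a column literally named "X"; no such column exists,
% which is what the "Unrecognized table variable name" error reports.
hasColumnX = any(strcmp('X', results.Properties.VariableNames));

% Parentheses tell MATLAB to use the text stored in X as the column name.
sigmaValues = results.(X);
```

Here hasColumnX is false, while results.(X) returns the Sigma column, so the makeValidName call is not doing any of the work.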
The decision by TMW to use "variable" to describe different things made this harder than it needs to be. However, instead of misapplying information/paradigms/advice relating to renaming variables in the workspace (i.e. what have been called "variables" since time immemorial), you should apply the simple techniques for accessing columns/variables of tables.
The real solution is given at the end of per isakson's answer.
Stephen23
27 Aug 2021
Milos Krsmanovic's incorrectly posted "Answer" moved here:
Thank you all for chiming in and offering advice.
I wasn't able to achieve what I want, and reading the answers it might be because I didn't formulate my question properly. Let me provide additional code to try and better explain what I'm trying to do.
% read the tabular data
results = readtable('raw_data.csv','PreserveVariableNames',true);
critMaterial = unique(results.Material); % identify unique values in the Material column
critSigma = unique(results.Sigma); % ditto for Sigma
e = numel(critMaterial); % count how many cells are there in critMaterial
data = cell(1,e); % preallocate data array
% populate data array based on given ad-hoc criteria, in this case...
% I need LR and L
for i = 1:e
data(1,i)=num2cell(mean(results.LR(results.Material==string(critMaterial(1)) & results.Sigma==string(critSigma(1)))));
data(2,i)=num2cell(mean(results.L(results.Material==string(critMaterial(2)) & results.Sigma==string(critMaterial(1)))));
end
Once I have my data array I will plot something based on it, plot it as a table, etc. it doesn't matter.
Now, I would like to keep this for loop inside of a function, so I don't have to copy, paste and edit it hundreds of times.
I am able to pass critMaterial and critSigma arguments to the function to quickly select which two criteria I want to apply. I would like to do the same for results.Properties.VariableNames as well. The reason is, while I have only four columns in my sample csv, in the actual file I have more than 30 of them, all different. In the sample csv file I have three unique materials and two sigmas, in the actual file I have dozens of them.
I understand that assigning variable names on-the-fly is a bad practice. So I'm wondering if my syntax inside of the for loop should change then.
I do not want to change the variable names in the results table as they are being reused so many times. But I would like to pass two arguments X and Y to the function that would do what is identical to:
data(1,i)=num2cell(mean(results.X(results.Material==string(critMaterial(1)) & results.Sigma==string(critSigma(1)))));
data(2,i)=num2cell(mean(results.Y(results.Material==string(critMaterial(2)) & results.Sigma==string(critMaterial(1)))));
where the arguments X and Y would be chosen manually from the results.Properties.VariableNames on per-case basis.
"Now, I would like to keep this for loop inside of a function, so I don't have to copy, paste and edit it hundreds of times."
Copy-and-pasting hundreds of times would be very bad code design.
"I understand that assigning variable names on-the-fly is a bad practice."
Dynamically accessing the names of variables in the workspace is slow, complex, and very inefficient. As I wrote in my earlier comment, this is totally unrelated to accessing the column/variable names of tables (which is what you have).
For some reason you keep mixing these up.
"So I'm wondering if my syntax inside of the for loop should change then."
Nothing in your explanation requires dynamic variable names.
"where the arguments X and Y would be chosen manually from the results.Properties.VariableNames on per-case basis."
Maybe this is the root of your confusion: if you want to manually select pairs of data to plot/analyze, then of course you will have to specify these manually (by your own definition). But there is no reason why you need to do this by copy-and-pasting code: much simpler would be to create a list of those pairs of data and loop over that list.
The advice at the end of dpb's answer is still highly relevant.
Milos Krsmanovic
29 Aug 2021
Walter Roberson
29 Aug 2021
See what Stephen wrote in https://www.mathworks.com/matlabcentral/answers/1438629-strings-to-variable-names#answer_775424
In particular,
Xdata = results.(Xs(k));
That tells MATLAB to take the k'th string in Xs and use it as the name of the field or variable to look up inside results. So with
Xs = ["Sigma", "Xpair2", "Xpair3", .. "XpairN"];
then with k = 1 it would evaluate to results.Sigma, for k = 2 it would evaluate to results.Xpair2, and so on.
No variables needed to be created dynamically.
This facility in MATLAB is known as dynamic field name expansion. And it can be used for assignments too.
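A minimal sketch of both uses, reading and assigning through a name held in a string. The table values and the new column name LRscaled are hypothetical, chosen only so that the snippet runs on its own.

```matlab
% Hypothetical two-column stand-in for the results table.
results = table([1;2;3], [10;20;30], 'VariableNames', {'Sigma','LR'});

Xs = ["Sigma", "LR"];
k = 2;

% Reading: equivalent to results.LR because Xs(k) is "LR".
Xdata = results.(Xs(k));

% Assigning: adds (or overwrites) the column whose name is stored in newName.
newName = "LRscaled";              % hypothetical column name
results.(newName) = 2 * Xdata;
```

No workspace variable is created from text at any point; only the column being looked up changes.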
Milos Krsmanovic
29 Aug 2021
"What I opted out to do at the end is to use the modified solution from @Wan Ji - I created two new columns X and Y, without deleting/overwriting any of the existing ones. Before calling my function I will repopulate X and Y with whatever two columns I want to work with at that instance."
If you only need the data in those two columns, why do you have to add them back into the table? Surely you could just write your function to simply accept those two columns directly... which is also what you asked in your question "However, trying to pass X and Y as the arguments to the function...", so it is not clear to me why you now want to make this more complex than it needs to be (or even what you asked about).
Please show us the code you are trying now, I am sure that this could be simplified.
Milos Krsmanovic
30 Aug 2021
Accepted Answer
More Answers (3)
per isakson
23 Aug 2021
Edited: per isakson
23 Aug 2021
First a little exercise
%%
results = readtable('raw_data.csv');
results(4,:)
%%
results.Properties.VariableNames{'Sigma'} = 'X';
results.Properties.VariableNames{'LR'} = 'Y';
results(4,:)
Like this you can rename the variables in the table (see also @Wan Ji's comment), but AFAIK you cannot create an alias, which I think is what you are asking for.
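If renaming is wanted at all, one way to keep the original table untouched is to rename on a copy. This is only a sketch: it assumes MATLAB R2020a or later, where renamevars is available, and uses a made-up two-row table.

```matlab
% Hypothetical stand-in for the results table.
results = table([1;2], [3;4], 'VariableNames', {'Sigma','LR'});

% renamevars returns a new table; results itself keeps its column names.
renamed = renamevars(results, ["Sigma","LR"], ["X","Y"]);

originalNames = results.Properties.VariableNames;   % still Sigma and LR
renamedNames  = renamed.Properties.VariableNames;   % X and Y
```

This still duplicates the data under new names rather than creating an alias.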
IMO, renaming variables will eventually cause confusion. A better way to achieve "I like to keep X and Y as generalized variable names within the function" is to define functions like
function foo( X, Y )
% do stuff with X and Y
end
and call them like
foo( results.Sigma, results.LR )
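A runnable sketch of the same idea. The anonymous function below is a hypothetical stand-in for foo, and the table values are invented for illustration; the point is only that the function sees plain vectors called X and Y, whatever the columns are named in the table.

```matlab
% Hypothetical stand-in for the results table.
results = table([1;2;3;4], [2;4;6;8], 'VariableNames', {'Sigma','LR'});

% Stand-in for foo: it works on generic inputs X and Y and knows
% nothing about the table or its column names.
foo = @(X, Y) mean(Y ./ X);

% The caller decides which columns play the roles of X and Y.
ratioSigmaLR = foo(results.Sigma, results.LR);
```

Switching to another pair of columns only changes the call, not the function body.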
2 Comments
Milos Krsmanovic
26 Aug 2021
per isakson
27 Aug 2021
"I don't want to rename the columns/property names in the main table" Yes, and that's why I wrote: "[...] but AFAIK you cannot create an alias, which I think is what you are asking for."
Mixing up unrelated topics has made you think that this is much more complex than it really is.
Look at your own code that you wrote in your question:
results.Sigma
results.LR
And then what you wrote after that: "If I manually edit this call to results.Sigma or results.LR, everything works just fine. But changing these two variables inside of the function would defeat its purpose."
So... then don't "change" them inside the function. Simply use strings to select the variables that you want (you specified here that you want to manually select the pairs of data that get plotted/analyzed), just as the MATLAB documentation explains:
Xs = ["Sigma", "Xpair2", "Xpair3", .. "XpairN"];
Ys = ["LR" , "Ypair2", "Ypair3", .. "YpairN"];
for k = 1:numel(Xs)
Xdata = results.(Xs(k));
Ydata = results.(Ys(k));
% ... whatever processing of Xdata and Ydata
end
So you can easily "manually" select and process any pairs of data that you want.
I see absolutely no reason why you need to copy-and-paste code hundreds of times.
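As a rough sketch of how this could look for the grouped means described in the follow-up comment: the table contents, the helper groupMean, and the fixed sigma value "s1" are all assumptions made up for illustration, and Material and Sigma are assumed to hold text, as the string comparisons in the question suggest.

```matlab
% Hypothetical stand-in for the results table.
results = table(["steel";"steel";"brass";"brass"], ["s1";"s1";"s1";"s2"], ...
    [1;3;5;7], [10;30;50;70], ...
    'VariableNames', {'Material','Sigma','LR','L'});

% Hypothetical helper: mean of the column named yName over the rows
% that match one material and one sigma value.
groupMean = @(T, yName, material, sigma) ...
    mean(T.(yName)(T.Material == material & T.Sigma == sigma));

pairs = ["LR", "L"];                      % column names picked by hand
critMaterial = unique(results.Material);  % sorted: "brass", "steel"

out = zeros(numel(pairs), numel(critMaterial));
for r = 1:numel(pairs)
    for c = 1:numel(critMaterial)
        out(r,c) = groupMean(results, pairs(r), critMaterial(c), "s1");
    end
end
```

The column names are ordinary data here, so choosing a different pair means editing the pairs list rather than the loop.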
Image Analyst
23 Aug 2021
It's not just that using "eval() is a big no-no" like you said, it's that the whole concept of writing a program where you don't know the variable names in advance and are creating named variables based on strings or some other run-time input is a bad idea.
So it's bad, period. It's not that eval() is the problem and you just need to find some other workaround or "alternative method" to do the bad thing. It's just not a good idea. I always thought it was obvious and didn't need much explanation, but maybe others have different ideas or don't understand the explanations.
See the FAQ for another discussion.