Strings to variable names

Question

0 votes

raw_data.csv

I have a table with 30-ish columns and thousands of rows containing strings and numerical values (small sample attached). I need to plot hundreds of different figures by matching certain criteria and extracting only a portion of the data for each plot, based on those criteria. I would like to do this by passing two variables (matching the .Properties.VariableNames) to a function which would then do something based on these two input arguments.

I started reading on converting strings to variable names and found the deprecated function genvarname(), which points to matlab.lang.makeValidName(), which takes a string as its argument. But the latter gives me the error:

Unrecognized table variable name 'X'.

The way I'm trying to use it is by reading my data:

results = readtable('raw_data.csv','PreserveVariableNames',true);

then specifying which variable names I want to work with:

X = matlab.lang.makeValidName('Sigma');
Y = matlab.lang.makeValidName('LR');

However, trying to pass X and Y as the arguments to the function, then referencing them as follows:

results.X

is throwing me the error.

If I manually edit this call to results.Sigma or results.LR, everything works just fine. But changing these two variables inside of the function would defeat its purpose. I'd like to keep X and Y as generalized variable names within the function and only change the two above lines before I call the function.

I'm also reading on dynamically generated variable names and why eval is a big no-no. I tried the alternative methods mentioned there, such as cell/ND-arrays or Structs, etc. but couldn't get that to work either.

Can someone point me to what the most appropriate method to solve this would be, please?

8 comments

Stephen23 on 23 Aug 2021

Edited: Stephen23 on 23 Aug 2021

The end of per isakson's answer has very important advice that you should not ignore.

"I'm also reading on dynamically generated variable names and why eval is a big no-no."

Those are both irrelevant to your situation.

The problem is nomenclature.

For reasons beyond my understanding, TMW in their infinite wisdom decided to name the columns of a table as "variables", which is the same name used for arrays stored in the MATLAB workspace. Confusion arises when these two totally different concepts are mixed up, as in your question: GENVARNAME and MAKEVALIDNAME apply to names of workspace variables (and a few other situations), but given that your simple table column/variable names are already perfectly valid table column/variable names, neither of them will do anything useful for you. My tutorial page is totally unrelated to your situation, because it deals only with names of variables in the workspace, not names of table columns/variables (which is what you are quite reasonably trying to access).

The decision by TMW to use "variable" to describe different things made this harder than it needs to be. However, instead of misapplying information/paradigms/advice relating to renaming variables in the workspace (i.e. what have been called "variables" since time immemorial), you should apply the simple techniques for accessing columns/variables of tables:

https://www.mathworks.com/help/matlab/matlab_prog/access-data-in-a-table.html
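
For readers landing here, a minimal sketch of the column-access forms that documentation page covers. The small table is made-up illustrative data (only the PET / Plus / 0.149 / 522 row appears in this thread), and colName is a hypothetical workspace variable holding a column name.

```matlab
% Made-up sample table standing in for raw_data.csv
results = table(["PET";"PET";"PVC"], ["Plus";"Minus";"Plus"], ...
                [0.149; 0.200; 0.310], [522; 480; 610], ...
                'VariableNames', {'Material','Sigma','L','LR'});

colName = "LR";             % column name stored in a workspace variable

a = results.LR;             % literal column name: plain dot indexing
b = results.(colName);      % name taken from a string: dynamic dot indexing
c = results{:, colName};    % same data via brace indexing
d = results(:, colName);    % parentheses return a one-column table, not the data

% results.colName (without parentheses) would look for a column literally
% named 'colName', which is the situation behind the
% "Unrecognized table variable name" error in the question.
```

None of these forms needs genvarname, makeValidName or eval, because the column name is only ever used as data.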

The real solution is given at the end of per isakson's answer.

Stephen23 on 27 Aug 2021

Milos Krsmanovic's incorrectly posted "Answer" moved here:

Thank you all for chiming in and offering advice.

I wasn't able to achieve what I want, and reading the answers it might be because I didn't formulate my question properly. Let me provide additional code to try and better explain what I'm trying to do.

% read the tabular data
results = readtable('raw_data.csv','PreserveVariableNames',true);
critMaterial = unique(results.Material); % identify unique values in the Material column
critSigma = unique(results.Sigma); % ditto for Sigma
e = numel(critMaterial); % count how many cells are there in critMaterial
data = cell(1,e); % preallocate data array
% populate data array based on given ad-hoc criteria, in this case...
% I need LR and L
for i = 1:e
	data(1,i)=num2cell(mean(results.LR(results.Material==string(critMaterial(1)) & results.Sigma==string(critSigma(1)))));
	data(2,i)=num2cell(mean(results.L(results.Material==string(critMaterial(2)) & results.Sigma==string(critMaterial(1)))));   
end

Once I have my data array I will plot something based on it, plot it as a table, etc. it doesn't matter.

Now, I would like to keep this for loop inside of a function, so I don't have to copy, paste and edit it hundreds of times.

I am able to pass critMaterial and critSigma arguments to the function to quickly select which two criteria I want to apply. I would like to do the same for results.Properties.VariableNames as well. The reason is, while I have only four columns in my sample csv, in the actual file I have more than 30 of them, all different. In the sample csv file I have three unique materials and two sigmas, in the actual file I have dozens of them.

I understand that assigning variable names on-the-fly is a bad practice. So I'm wondering if my syntax inside of the for loop should change then.

I do not want to change the variable names in the results table as they are being reused so many times. But I would like to pass two arguments X and Y to the function that would do what is identical to:

data(1,i)=num2cell(mean(results.X(results.Material==string(critMaterial(1)) & results.Sigma==string(critSigma(1)))));
data(2,i)=num2cell(mean(results.Y(results.Material==string(critMaterial(2)) & results.Sigma==string(critMaterial(1))))); 

where the arguments X and Y would be chosen manually from the results.Properties.VariableNames on per-case basis.

Stephen23 on 27 Aug 2021

Edited: Stephen23 on 27 Aug 2021

"Now, I would like to keep this for loop inside of a function, so I don't have to copy, paste and edit it hundreds of times."

Copy-and-pasting hundreds of times would be very bad code design.

"I understand that assigning variable names on-the-fly is a bad practice."

Dynamically accessing the names of variables in the workspace is slow, complex, and very inefficient. As I wrote in my earlier comment, this is totally unrelated to accessing the column/variable names of tables (which is what you have).

For some reason you keep mixing these up.

"So I'm wondering if my syntax inside of the for loop should change then."

Nothing in your explanation requires dynamic variable names.

"where the arguments X and Y would be chosen manually from the results.Properties.VariableNames on per-case basis."

Maybe this is the root of your confusion: if you want to manually select pairs of data to plot/analyze, then of course you will have to specify these manually (by your own definition). But there is no reason why you need to do this by copy-and-pasting code: much simpler would be to create a list of those pairs of data and loop over that list.

The advice at the end of dpb's answer is still highly relevant.

Milos Krsmanovic on 29 Aug 2021

"Nothing in your explanation requires dynamic variable names."

@Stephen,

I appreciate your input, I honestly do. But there seems to be some breakdown in communication.

If I understood this problem fully, I would not be asking the question. I'm asking this question because I have trouble understanding it. From what I'm reading in your replies it seems to me you're quite determined to drive this point home. But I'm not denying it.

As for what I'm trying to do, I cannot quote and reply to everything you said, so I'll try to be constructive and focus only on one crucial thing (the above quote). Let's consider the last piece of code I posted. Let us assume I have the for loop in a separate function file which I call from my main .m script. So in the example above I manually populated the two rows of data cell array with values from columns LR and L, based on multiple criteria as shown in the code sample. Say I passed some two arguments to the function so it knows I'm looking for LR and L. Next, I want to populate data array with values from columns Material and Sigma. Again, I pass two arguments to the function. Next, Sigma and LR. And so on for any combination of 30 columns I actually have. So my question is, how do I write the code inside of that function/for loop so that I can only change the arguments which I'm passing from the main .m script?

So if it's not dynamic renaming, OK I get it. I don't know enough to recognize that. But please, please can we try and meet halfway. I'm not sure if I again made a descriptive, semantic or logical error in my writing, but what I'm trying to say is that I did it the best I could within the limits of my knowledge of the software and the language.

Thanks again.

Milos Krsmanovic on 29 Aug 2021

@Walter Roberson

That is just one more thing I couldn't address because there was so much going on in his posts.

I initially tried a similar approach by passing the given string as a variable name, say: Xs(1), but I was receiving the "Indexing with parentheses '()' must appear as the last operation of a valid indexing expression" error (please note that the syntax in my example is different than his).

As a matter of fact, reading on how to convert a string to a variable name is how I initially ended up in the whole eval(), dynamic variable, etc. rabbit hole. That was the first thread I read, then the ones Jan linked, and only after that I came here to ask the question.

Turns out I hadn't used the outer brackets (Xs(k)) in my first try. I now tried the proposed solution but I'm getting the error "Index exceeds array bounds".

What I opted out to do at the end is to use the modified solution from @Wan Ji - I created two new columns X and Y, without deleting/overwriting any of the existing ones. Before calling my function I will repopulate X and Y with whatever two columns I want to work with at that instance. I understand it's not the most efficient way but it works and I figured out I'm not gonna get any closer.

Thank you as well for your reply, I do appreciate it.

Stephen23 on 30 Aug 2021

Edited: Stephen23 on 30 Aug 2021

"What I opted out to do at the end is to use the modified solution from @Wan Ji - I created two new columns X and Y, without deleting/overwriting any of the existing ones. Before calling my function I will repopulate X and Y with whatever two columns I want to work with at that instance."

If you only need the data in those two columns, why do you have to add them back into the table? Surely you could just write your function to simply accept those two columns directly... which is also what you asked in your question "However, trying to pass X and Y as the arguments to the function...", so it is not clear to me why you now want to make this more complex than it needs to be (or even what you asked about).

Please show us the code you are trying now, I am sure that this could be simplified.

Milos Krsmanovic on 30 Aug 2021

I already explained why, in detail, in my previous comment.

Surely I would write the function to simply accept those two columns directly, which is also what I asked - if I knew how. Which is why I asked the question in the first place.

For the sake of others who might be reading this topic in the future, here is what I ended up with in the end.

Before the function:

results.X = results.LR;
results.Y = results.L;

Here I will change LR and L to any of the other 28 column names/headers for each instance when I'm calling the function.

Inside of the function:

for i = 1:e
	data(1,i)=num2cell(mean(results.X(results.Material==string(critMaterial(1)) & results.Sigma==string(critSigma(1)))));
	data(2,i)=num2cell(mean(results.Y(results.Material==string(critMaterial(2)) & results.Sigma==string(critMaterial(1)))));   
end
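
For future readers, a possible alternative that avoids the temporary X and Y columns, sketched under assumptions: the table is made-up data, xName and yName are hypothetical variables holding the two column names, and groupsummary (available in recent MATLAB releases) is swapped in for the hand-written loop. It computes the mean of both columns for every Material/Sigma combination in one call.

```matlab
% Made-up sample table standing in for raw_data.csv
results = table(["PET";"PET";"PVC";"PVC"], ["Plus";"Minus";"Plus";"Plus"], ...
                [0.149; 0.200; 0.310; 0.330], [522; 480; 610; 630], ...
                'VariableNames', {'Material','Sigma','L','LR'});

xName = "LR";   % change only these two names per call
yName = "L";

% One row per Material/Sigma group; the result contains the grouping
% columns, GroupCount, and one mean_<name> column per data column.
G = groupsummary(results, ["Material","Sigma"], "mean", [xName, yName]);

% Pick one group's mean by building the output column name from the string
pvcPlus = G.("mean_" + xName)(G.Material == "PVC" & G.Sigma == "Plus");
```

The original results table is left untouched, so the same call can be repeated for any other pair of columns.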


Answer 1

Wan Ji on 23 Aug 2021

Edited: Wan Ji on 23 Aug 2021

0 votes

Hi, try the following code:

results{:,'X'} = results{:,'Sigma'};
results(:,'Sigma') = [];
results{:,'Y'} = results{:,'LR'};
results(:,'LR') = [];

3 comments

Milos Krsmanovic on 26 Aug 2021

Thanks for replying. I would like to keep the columns in the main table as they are being reused. Please see the additional example I provided to try and better explain my needs.

Milos Krsmanovic on 29 Aug 2021

I'm finally accepting this answer because it got me the closest to a working solution. The only difference I introduced was to not overwrite the existing columns, but to create two additional columns which I overwrite each time I call the function.

Thanks again.


Answer 2

per isakson on 23 Aug 2021

Edited: per isakson on 23 Aug 2021

1 vote

raw_data.csv

First a little exercise

%%
results = readtable('raw_data.csv');
results(4,:)
ans = 1×4 table
    Material     Sigma        L      LR 
    ________    ________    _____    ___

    {'PET'}     {'Plus'}    0.149    522
%%
results.Properties.VariableNames{'Sigma'} = 'X';
results.Properties.VariableNames{'LR'}    = 'Y';
results(4,:)
ans = 1×4 table
    Material       X          L       Y 
    ________    ________    _____    ___

    {'PET'}     {'Plus'}    0.149    522

Like this you can rename the variables in the table (see also @Wan Ji comment), but AFAIK you cannot create alias, which I think is what you are asking for.

IMO, renaming variables will eventually cause confusion. A better way to achieve "I like to keep X and Y as generalized variable names within the function" is to define functions like

function foo( X, Y )

% do stuff with X and Y

end

and call them like

foo( results.Sigma, results.LR )
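
A minimal sketch of that idea, for illustration only. The table is made-up data, and meanWhere is a hypothetical helper (written as an anonymous function so the snippet runs as a single script) that receives column data rather than column names.

```matlab
% Made-up sample table standing in for raw_data.csv
results = table(["PET";"PET";"PVC"], ["Plus";"Minus";"Plus"], ...
                [0.149; 0.200; 0.310], [522; 480; 610], ...
                'VariableNames', {'Material','Sigma','L','LR'});

% Hypothetical helper: mean of Y over the rows where X equals a given value.
% Inside the helper the inputs are always called X and Y, whatever
% columns the caller passes in.
meanWhere = @(X, Y, value) mean(Y(X == value));

mPlus = meanWhere(results.Sigma,    results.LR, "Plus");  % LR where Sigma is Plus
mPET  = meanWhere(results.Material, results.L,  "PET");   % L where Material is PET
```

Only the call site names the columns, so switching to another pair means editing the call and nothing inside the helper.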

2 comments

Milos Krsmanovic on 26 Aug 2021

Thank you for the answer. I didn't explain well that I don't want to rename the columns/property names in the main table. I provided additional example below to try and better describe my issue.

per isakson on 27 Aug 2021

"I don't want to rename the columns/property names in the main table" Yes and that's why I wrote: "[...] but AFAIK you cannot create alias, which I think is what you are asking for."


Answer 3

Stephen23 on 27 Aug 2021

Edited: Stephen23 on 27 Aug 2021

1 vote

Mixing up unrelated topics has made you think that this is much more complex than it really is.

Look at your own code that you wrote in your question:

results.Sigma
results.LR

And then what you wrote after that: "If I manually edit this call to results.Sigma or results.LR, everything works just fine. But changing these two variables inside of the function would defeat its purpose."

So... then don't "change" them inside the function. Simply use strings to select the variables that you want (you specified here that you want to manually select the pairs of data that get plotted/analyzed), just as the MATLAB documentation explains:

https://www.mathworks.com/help/matlab/matlab_prog/access-data-in-a-table.html

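Xs = ["Sigma", "Xpair2", "Xpair3", "XpairN"]; % placeholder names: list your actual X columns here
Ys = ["LR"   , "Ypair2", "Ypair3", "YpairN"]; % placeholder names: list the matching Y columns here
for k = 1:numel(Xs)
    Xdata = results.(Xs(k)); % dynamic column access using a string
    Ydata = results.(Ys(k));
    % ... whatever processing of Xdata and Ydata
end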

So you can easily "manually" select and process any pairs of data that you want.

I see absolutely no reason why you need to copy-and-paste code hundreds of times.
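
As a rough illustration of how the asker's loop could take the two column names as arguments, here is a sketch under stated assumptions. The table is made-up data, groupMeans is a hypothetical helper (an anonymous function so the snippet runs as one script), and the filtering rule (one mean per unique Material at a fixed Sigma value) is a guess at the intended criteria, not the asker's exact logic.

```matlab
% Made-up sample table standing in for raw_data.csv
results = table(["PET";"PET";"PVC";"PVC"], ["Plus";"Minus";"Plus";"Plus"], ...
                [0.149; 0.200; 0.310; 0.330], [522; 480; 610; 630], ...
                'VariableNames', {'Material','Sigma','L','LR'});

% Hypothetical helper: for column colName, return one mean per unique
% Material, using only the rows whose Sigma equals sigmaValue.
groupMeans = @(T, colName, sigmaValue) arrayfun( ...
    @(m) mean(T.(colName)(T.Material == m & T.Sigma == sigmaValue)), ...
    unique(T.Material));

% Pairs of columns to process; these two lists are the only thing to edit.
Xs = ["LR", "L"];
Ys = ["L",  "LR"];

data = cell(2, numel(Xs));
for k = 1:numel(Xs)
    data{1,k} = groupMeans(results, Xs(k), "Plus");  % means of the X column
    data{2,k} = groupMeans(results, Ys(k), "Plus");  % means of the Y column
end
```

The column names travel through the code as ordinary strings, and results.(name) resolves them at the point of use, so no columns need to be renamed or copied.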


Answer 4

Image Analyst on 23 Aug 2021

0 votes

It's not just that using "eval() is a big no-no" like you said, it's that the whole concept of writing a program where you don't know the variable names in advance and are creating named variables based on strings or some other run-time input is a bad idea.

So it's bad period. It's not that eval() is the problem so you just need to find some other workaround or "alternative method" to do the bad thing. It's just not a good idea. I always thought it was obvious and didn't need much explanation, but maybe others have different ideas or don't understand the explanations.

See the FAQ for another discussion.

https://matlab.fandom.com/wiki/FAQ#How_can_I_create_variables_A1.2C_A2.2C....2C_A10_in_a_loop.3F
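
For context, a small sketch of the kind of alternative usually suggested in place of run-time variable names: keep the names as data and store the values in a container. The names A1 to A3 and the placeholder matrices are made up for illustration.

```matlab
names = ["A1", "A2", "A3"];     % labels that might otherwise become variable names

% Option 1: a cell array indexed by position
byIndex = cell(1, numel(names));
for k = 1:numel(names)
    byIndex{k} = k * ones(2);   % placeholder data
end

% Option 2: a struct whose field names come from the strings
byName = struct();
for k = 1:numel(names)
    byName.(names(k)) = k * ones(2);
end

second = byName.("A2");         % lookup by string, no eval involved
```

Either container can be passed to a function or looped over as a whole, which is not possible with a set of separately named workspace variables.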

1 comment

Milos Krsmanovic on 26 Aug 2021

Fair point, poor choice of words from my end. Where I meant dynamically assigning the variable names I made it sound as if I'm talking about the eval() function per se. I added a small example below to better explain what I'm trying to do.
