Issue with removing space from char from loaded data

Hello,
I am helping a colleague with some data analysis and have come across an issue I cannot find a way of fixing.
After loading the data (using textscan with %s, because for some reason the format of the data doesn't allow %f, %d, etc.) and extracting the data from the cells, I end up with a character array such as the one shown:
1 . 3 3 3
How does one remove the whitespace from this character array so that it can be converted into numeric data and used? I have tried things such as ~isspace, yet this doesn't work; the result of isspace on the above number is
0 1 0 0 0 0 0 0 0 0 0 0 0
Which isn't useful!
If anyone knows a way around this I would be gratefully appreciative! Ideas on why %d, %f, etc. don't work in textscan would also be welcome, as that might solve the problem too...
Many thanks, David

4 Comments

Stephen23 on 3 Nov 2017
@David: please edit your question and upload a sample file by clicking the paperclip button.
David on 3 Nov 2017
Hi,
I've attached the first few lines of one of the files. Note that in practice it's much longer than this (~500 lines), yet that shouldn't be an issue when scaling up a solution!
Stephen23 on 3 Nov 2017
Why are you using %s for loading numeric data? Why not simply load numeric data using a numeric format and get a numeric variable?
David on 3 Nov 2017
For some reason it doesn't work with %d or %f, for example (it gives blank/empty arrays). I'm still trying to figure out why, so if you have any ideas as to why this could be the case I'd be appreciative!


Accepted Answer

Stephen23 on 3 Nov 2017
Edited: Stephen23 on 3 Nov 2017


I don't have any problems importing your sample file (attached) as numeric data. Here are the two ways I tried:
Method one: dlmread:
>> M = dlmread('example_data.txt','\t',1,0)
M =
1.333 16594.953
2.667 16562.166
4 15972.454
5.333 15968.083
6.667 15482.982
8 14435.071
>>
Method two: textscan:
opt = {'HeaderLines',1, 'CollectOutput',true};
[fid,msg] = fopen('example_data.txt','rt');
assert(fid>=3,msg)
C = textscan(fid,'%f%f',opt{:});
fclose(fid);
which gives this data:
>> C{1}
ans =
1.333 16594.953
2.667 16562.166
4 15972.454
5.333 15968.083
6.667 15482.982
8 14435.071

5 Comments

David on 3 Nov 2017
Edited: David on 3 Nov 2017
Hi,
For some reason this yields an "empty" 1x1 cell C (i.e. it just shows []).
I have uploaded the full file in the original post to show that I am getting the error with it.
With the example file your method does work fine, which is odd.
David on 3 Nov 2017
I have done so now in the original post; I made the (in my eyes fair) assumption that the bulk of the data would behave the same as the first section of it, as the format doesn't change.
Stephen23 on 3 Nov 2017
Edited: Stephen23 on 3 Nov 2017
@David: the problem is the file encoding: your original data file is encoded as UCS2 Little Endian (similar to UTF-16), whereas the sample file is saved as a simple ANSI encoding. If you commonly write some non-English language or your PC Locale setting is not English then UCS2 may be the default file encoding.
There is no point in saving such a simple numeric data file as UCS2, so I would recommend that you save the file instead as ANSI: one simple way would be to open it using Notepad++, then change the encoding, and finally save it with the correct encoding. Alternatively you could fopen it as UCS2, but I have no experience with this.
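If re-saving the file is not an option, one possible way to handle the encoding inside MATLAB is to read the raw bytes and decode them explicitly with native2unicode before parsing. This is only a sketch under assumptions, not something run against the original file: it assumes the data really is UTF-16LE with an optional byte order mark, and it writes a small stand-in file (hypothetical name, rows copied from the sample above) so that it is self-contained.

```matlab
% Write a small UTF-16LE stand-in file (assumption: same layout as the real data).
txt = sprintf('X\tY\r\n1.333\t16594.953\r\n2.667\t16562.166\r\n4\t15972.454\r\n');
fname = [tempname '.txt'];                 % hypothetical temporary file name
fid = fopen(fname, 'w');
fwrite(fid, [255 254], 'uint8');           % UTF-16LE byte order mark (FF FE)
fwrite(fid, unicode2native(txt, 'UTF-16LE'), 'uint8');
fclose(fid);

% Read the file as raw bytes, without any character conversion.
[fid, msg] = fopen(fname, 'r');
assert(fid >= 3, msg)
bytes = fread(fid, Inf, 'uint8=>uint8')';  % row vector of bytes
fclose(fid);

% Decode the bytes as UTF-16LE and drop any byte order mark character.
str = native2unicode(bytes, 'UTF-16LE');
str(str == char(65279)) = [];              % U+FEFF, if present

% Parse the decoded text just like the file-based textscan call.
C = textscan(str, '%f%f', 'HeaderLines', 1, 'CollectOutput', true);
M = C{1};                                  % numeric matrix, one row per data line

delete(fname);                             % clean up the stand-in file
```

For the real data, only the part from the second fopen onwards would be needed, with fname set to the actual file.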
The UCS2 file encoding also explains the "space" characters that you see when importing the data as character: the second byte of each character is being interpreted as a new character inside MATLAB, and most likely it becomes a null character or a control character. In any case, this is a good example of the X-Y problem: rather than trying to remove mysterious spaces whose origin you don't understand, you really should just fix the file encoding.
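The effect can be reproduced without the file. The following is a minimal sketch, assuming each ASCII character of '1.333' is followed by a zero byte (as in UTF-16LE) and that every byte ends up as its own character; a real file may also start with a byte order mark, which is ignored here.

```matlab
% Simulate UTF-16LE text '1.333' read as one character per byte.
codes = double('1.333');
bytes = reshape([codes; zeros(size(codes))], 1, []);  % zero after every code
raw   = char(bytes);            % 1x10 char, displays as "1 . 3 3 3"

% The gaps are null characters, which isspace is not expected to flag.
spaceMask = isspace(raw);

% Removing the nulls directly recovers the text and the number.
clean = raw(raw ~= 0);
val   = str2double(clean);
```

This only shows why isspace does not help; fixing the encoding remains the cleaner solution.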
David on 3 Nov 2017
Hi Stephen,
Indeed you are right and that did fix it! Many thanks.
I will now have some friendly words with my colleague on how he should save his data in the future...
Thanks again!
Stephen23 on 3 Nov 2017
@David: I hope that it helped. You can also accept my answer, if these comments helped you.


More Answers (1)

KL on 3 Nov 2017


If you're only importing numeric data from those files, why not just use dlmread, csvread or textread? I'd personally prefer readtable.
It's better to import the data cleanly than to have to deal with the problems of improper imports.
Check these links:

4 Comments

David on 3 Nov 2017
Edited: David on 3 Nov 2017
Hi,
The issue is the formatting of the raw data, which looks like this (I've now uploaded a sample file of the data in the original post):
X Y
1.333 16594.953
2.667 16562.166
4 15972.454
As you can see, the width of the columns keeps changing and there is a header line to ignore (I didn't take the data, I'm just helping to salvage it...); hence using textscan, as it can handle the changing column widths and skip the header line. As mentioned, a numeric data type like %d or %f didn't work on the load for some reason, and things like dlmread find issues with this, at least to my knowledge.
KL on 3 Nov 2017
Edited: KL on 3 Nov 2017
What do you mean by the size of the columns keeps changing? I see perfectly tab-spaced data. About ignoring the header line: if you had read any of the links I gave you, you would have seen that you can simply ignore the first line (if you want to). For example,
filename = 'somename.txt';
delimiter = '\t'; % the sample data is tab-delimited
rowstoIgnore = 1; % skip the header line
columnstoIgnore = 0;
data = dlmread(filename, delimiter, rowstoIgnore, columnstoIgnore);
On the other hand, if you use readtable, it understands that X and Y are variable names and stores them as well. For example,
data = readtable(filename);
Now you can access the data like this:
data.X % or data.Y
David on 3 Nov 2017
Hi,
Using dlmread I get this error:
Mismatch between file and format string. Trouble reading 'Numeric' field from file (row number 1, field number 1) ==> \n
And readtable yields:
Error using readtable (line 143) Cannot interpret data in the file 'PlotValues0002.txt'. Found 2 variable names but 1 data columns. You may need to specify a different format string, delimiter, or number of header lines.
Hence my referring to the columns changing size (i.e. the number of decimal places).
Stephen23 on 3 Nov 2017
"...and things like dlmread find issues with this, at least to my knowledge"
This does not mean you should jump straight to creating some complex work-around using strings. You could have asked about how to import the numeric data first. Only if that proved really difficult should you start to investigate other methods.



Asked: 3 Nov 2017
Commented: 3 Nov 2017
