Issue with removing space from char from loaded data

Question


Hello,

I am helping a colleague with some data analysis and have come across an issue I cannot find a way of fixing:

After loading the data (using textscan with %s, because for some reason the format of the data doesn't allow for %f, %d, etc.) and extracting the data from the cells, each value ends up as a character array such as the one shown:

1 . 3 3 3

How does one remove the white space from this character array so that it can be properly converted into numeric data? I have tried things such as ~isspace, yet this doesn't work; the result of isspace on the above number is

0 1 0 0 0 0 0 0 0 0 0 0 0

Which isn't useful!

If anyone knows a way around this I would be very grateful! The same goes for ideas on why %d, %f, etc. don't work in textscan, as that might also solve the problem...

Many thanks,
David

4 comments

Stephen23, November 3, 2017

Why are you using %s for loading numeric data? Why not simply load numeric data using a numeric format and get a numeric variable?

David, November 3, 2017

For some reason it doesn't work with %d or %f, for example (it gives blank/empty arrays). I'm still trying to figure out why, so if you have ideas as to why this could be the case I'd be grateful!


Answer 1

Stephen23, November 3, 2017

Edited: Stephen23, November 3, 2017

example_data.txt

I don't have any problems importing your sample file (attached) as numeric data. Here are the two ways I tried:

Method one: dlmread:

>> M = dlmread('example_data.txt','\t',1,0)
M =
                 1.333             16594.953
                 2.667             16562.166
                     4             15972.454
                 5.333             15968.083
                 6.667             15482.982
                     8             14435.071
>>

Method two: textscan:

opt = {'HeaderLines',1, 'CollectOutput',true};
[fid,msg] = fopen('example_data.txt','rt');
assert(fid>=3,msg)
C = textscan(fid,'%f%f',opt{:});
fclose(fid);

which gives this data:

>> C{1}
ans =
               1.333             16594.953
               2.667             16562.166
                   4             15972.454
               5.333             15968.083
               6.667             15482.982
                   8             14435.071

5 comments

Stephen23, November 3, 2017

Edited: Stephen23, November 3, 2017

@David: the problem is the file encoding: your original data file is encoded as UCS2 Little Endian (similar to UTF-16), whereas the sample file is saved as a simple ANSI encoding. If you commonly write some non-English language or your PC Locale setting is not English then UCS2 may be the default file encoding.

There is no point in saving such a simple numeric data file as UCS2, so I would recommend that you save the file instead as ANSI: one simple way would be to open it using Notepad++, then change the encoding, and finally save it with the correct encoding. Alternatively you could fopen it as UCS2, but I have no experience with this.
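If re-saving the file is not an option, one possible route is to read the raw bytes and decode them inside MATLAB before parsing. The following is only a minimal sketch under stated assumptions: the file name `ucs2_example.txt`, the header names, and the two data rows are hypothetical stand-ins written to a temporary folder so the example is self-contained, and the encoding is assumed to be UTF-16LE with a byte order mark.

```matlab
% Create a small UTF-16LE test file (hypothetical name and contents, for illustration only).
fname = fullfile(tempdir, 'ucs2_example.txt');
txt = sprintf('Time\tValue\n1.333\t16594.953\n2.667\t16562.166\n');
fid = fopen(fname, 'w');
fwrite(fid, [255 254], 'uint8');                        % UTF-16LE byte order mark
fwrite(fid, unicode2native(txt, 'UTF-16LE'), 'uint8');  % two bytes per character
fclose(fid);

% Read the raw bytes, decode them to characters, then parse the decoded text.
[fid, msg] = fopen(fname, 'r');
assert(fid >= 3, msg)
bytes = fread(fid, [1 Inf], '*uint8');
fclose(fid);
str = native2unicode(bytes, 'UTF-16LE');
str(str == char(65279)) = [];                           % drop the byte order mark if it survived decoding
C = textscan(str, '%f%f', 'HeaderLines', 1, 'CollectOutput', true);
M = C{1};                                               % numeric matrix, one row per data line
```

Decoding the bytes first means textscan only ever sees ordinary characters, so the numeric format specifiers can be used as in the ANSI case.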

The UCS2 file encoding also explains the "space" characters that you see when importing the data as character: the second byte of each character is being interpreted as a new character inside MATLAB, but most likely it is outside the printable ASCII range and becomes a null character or a control character. In any case, this is a good example of the X-Y problem: rather than trying to fix mysterious spaces whose origin you don't understand, you really should just fix the file encoding.
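The effect described above can be reproduced without any file. This is a small illustrative sketch, assuming the value `1.333` was stored as UTF-16LE; the byte values are written out by hand and the variable names are made up for the example.

```matlab
% The text '1.333' stored as UTF-16LE: every ASCII character is followed by a zero byte.
bytes = uint8([49 0 46 0 51 0 51 0 51 0]);

% Treating each byte as its own character interleaves char(0) with the digits.
raw = char(bytes);
nulls = (raw == char(0));        % true at every second position
spaces = isspace(raw);           % all false, because char(0) is not whitespace

% Decoding the bytes as UTF-16LE recovers the intended text.
txt = native2unicode(bytes, 'UTF-16LE');
val = str2double(txt);

% A quick workaround for ASCII-only data is to drop the null characters.
val2 = str2double(raw(~nulls));
```

The null characters typically display as blanks, which is why the array looks like it contains spaces even though isspace does not flag them. Dropping the nulls only works for plain ASCII content, so decoding or re-saving the file remains the cleaner fix.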

David, November 3, 2017

Hi Stephen,

Indeed you are right and that did fix it! Many thanks.

I will now have some friendly words with my colleague on how he should save his data in the future...

Thanks again!

Stephen23, November 3, 2017

@David: I hope that it helped. You can also accept my answer, if these comments helped you.


Answer 2

KL, November 3, 2017

If you're only importing numeric data from those files, why not just use dlmread, csvread, or textread? I'd personally prefer readtable.

It's better to import the data cleanly than to have to deal with the consequences of an improper import.

Check these links:

https://de.mathworks.com/help/matlab/ref/csvread.html

https://de.mathworks.com/help/matlab/ref/dlmread.html

https://de.mathworks.com/help/matlab/ref/readtable.html

4 comments

David, November 3, 2017

Hi,

Using dlmread I get this error:

Mismatch between file and format string. Trouble reading 'Numeric' field from file (row number 1, field number 1) ==> \n

And readtable yields:

Error using readtable (line 143) Cannot interpret data in the file 'PlotValues0002.txt'. Found 2 variable names but 1 data columns. You may need to specify a different format string, delimiter, or number of header lines.

Hence my referring to the columns changing size (i.e. the number of decimal places).

Stephen23, November 3, 2017

"...and things like dlmread find issues with this, at least to my knowledge"

This does not mean you should jump straight to creating some complex work-around using strings. You could have asked about how to import the numeric data first. Only if that proved really difficult should you start to investigate other methods.

http://xyproblem.info/
