textscan doesn't stop at blank space in txt file

Question

Josh Tome on 20 Oct 2022

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/1831643-textscan-doesn-t-stop-at-blank-space-in-txt-file

Edited: dpb on 22 Oct 2022

walking_01.txt

Hi, I'm trying to import data from a txt file using the textscan function. I thought it was supposed to stop at the first blank space it sees, but it seems to be grabbing data beyond the blank space. My Group1 should stop at the first blank space before "Events", but it includes "Events", "100", and "Subject".

This is the code I'm using so far:

[file_list, path_n] = uigetfile('.txt','Select the Files to Process','Multiselect','on');
fidi = fopen(file_list);
Group1 = textscan(fidi, '%s %s %s %f %s %s','HeaderLines',3, 'Delimiter','\t');

Attached is the txt file data:

4 Comments

dpb on 20 Oct 2022

"... I thought it was supposed to stop at the first blank space..."

I don't know what would have given you that idea; the Description section of the textscan documentation says explicitly that

"textscan attempts to match the data in the file to the conversion specifier in formatSpec. The textscan function reapplies formatSpec throughout the entire file and stops when it cannot match formatSpec to the data."

If you knew the number of records in the section, you could use the count argument to limit how many times the format string is applied to the file. Presumably that is not known a priori, and the "100" value doesn't appear to have any relationship to the number of records in the initial section, so that doesn't help.
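For what it's worth, here is a minimal sketch of the count argument mentioned above, assuming the record count were known. The sample text and the variable name nRecords are made up for illustration; this is not the contents of walking_01.txt.

```matlab
% Two tab-delimited records, a blank line, then the start of another section.
% (Made-up sample text, not the contents of walking_01.txt.)
txt = sprintf('PluginGait\tLeft\t116.39\nPluginGait\tRight\t1.30\n\nEvents\n100\n');

nRecords = 2;   % hypothetical: assumed to be known in advance
% The third positional argument limits how many times the format is applied.
C = textscan(txt, '%s %s %f', nRecords, 'Delimiter', '\t');

names  = C{2};  % text column, one cell per record read
values = C{3};  % numeric column, one element per record read
```

With the count in place, textscan stops after the second record and never reaches the "Events" line.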

This is another case where it is probably better either to use fgetl and read and parse each record, or to bring the whole file into memory, locate the sections, and then process them from memory.
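A rough sketch of the fgetl route, under the assumption that a section ends at the first blank line. It writes a small made-up file to a temporary location first so it can run on its own; the file contents and the variable names are illustrative only.

```matlab
% Build a small stand-in file (made-up content, not walking_01.txt).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'A\tB\t1.5\nC\tD\t2.5\n\nEvents\n100\n');
fclose(fid);

% Read record by record until a blank line or end of file.
fid = fopen(fname, 'r');
rows = {};
tline = fgetl(fid);
while ischar(tline) && ~isempty(strtrim(tline))   % fgetl returns -1 at EOF
    % Split on tab; keep empty fields rather than collapsing them.
    rows{end+1,1} = strsplit(tline, char(9), 'CollapseDelimiters', false); %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Each element of rows is then a cell array of the text fields in one record, and numeric fields can be converted afterwards with str2double.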

There have been several questions of this kind here in the last few days that can serve as examples...

dpb on 21 Oct 2022

Edited: dpb on 21 Oct 2022

"...the blank space doesn't match the "%s" specifier (or so I believe),"

Well, that isn't a correct assumption either; a blank is a valid character like any other. However, unless told otherwise with the optional 'Whitespace' named parameter, blanks are considered whitespace and are ignored or treated as delimiters, except inside quoted strings, where they are significant.

Again, the Algorithms section of the textscan doc states:

"When matching data to a text conversion specifier, textscan reads until it finds a delimiter or an end-of-line character."

But the format spec was '%s %s %s %f %s %s', which gets reapplied over and over until it either fails or reaches the end of the file. In this case it found text for the %s fields and a number it could convert for the %f, but the records after that fail to match.

Another alternative to parsing with textscan, when such a break is known to be in the file, is to just accept the error, resynchronize the file pointer to the next expected record, and then carry on with the format string for the next section. This can be tricky if the file doesn't have fixed-length records, as in the example: fgetl will advance to the next end of line, but depending on the file content, what remains may not include all of the next record to be scanned, and backing up to the previous end of record isn't easily supported in stream files. In this particular file, however, the failure occurs in the header line, so that approach works, and you could then read the second group in the same open with textscan as

fidi = fopen(file_list);
% First section: three text fields, one numeric, two text fields
fmt=[repmat('%s',1,3) '%f' repmat('%s',1,2)];
G1=textscan(fidi,fmt,'HeaderLines',3,'Delimiter','\t','collectoutput',1);
% Second section: same layout with one fewer trailing text field
fmt=[repmat('%s',1,3) '%f' '%s'];
fgetl(fidi);         % resynch to BOL next header group
G2=textscan(fidi,fmt,'Delimiter','\t','collectoutput',1);
fclose(fidi);

Personally, I'd still opt for higher-level parsing tools rather than having to turn the output of the above into something useful...

Walter Roberson on 21 Oct 2022

All textscan formats other than %c and %[] skip leading whitespace as defined by the Whitespace option (or by the default list of whitespace characters if no option was passed). And %c is perfectly happy to read a space.

If you need a space to be rejected then you have two possibilities:

pass a Whitespace option that does not include space; or
use %[^ ], taking into account that it would happily gobble a number and return it as a character vector
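As an illustration of the first possibility, here is a small sketch comparing the default whitespace handling with an empty Whitespace list. The sample text is made up, and the behavior described in the comments is my understanding of the documented Whitespace option rather than something checked against the attached file.

```matlab
txt = sprintf(' Walking Speed\t1.3038');   % made-up record; note the leading blank

% Default: a blank counts as whitespace, so the leading one is skipped.
Cdefault = textscan(txt, '%s %f', 'Delimiter', '\t');

% Empty Whitespace list: a blank is treated as an ordinary character.
Ckeep = textscan(txt, '%s %f', 'Delimiter', '\t', 'Whitespace', '');

firstDefault = Cdefault{1}{1};   % text field as read with the defaults
firstKeep    = Ckeep{1}{1};      % text field as read with Whitespace set to ''
```

Comparing firstDefault and firstKeep shows whether the leading blank survived, which is the difference the Whitespace option makes here.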


Answer 1

dpb on 20 Oct 2022

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/1831643-textscan-doesn-t-stop-at-blank-space-in-txt-file#answer_1080038

fn=websave('walking_01.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1163318/walking_01.txt');  % download once, reuse below
opt=detectImportOptions(fn, ...
            'numheaderlines',2, ...
            'readvariablenames',1, ...
            'delimiter','\t', ...
            'expectednumvariables',6, ...
            'missingrule','fill');
opt.VariableTypes(1)={'char'};                  % read Subject as text
tG=readtable(fn,opt);
ix=find(contains(tG.Subject,'Events'),1);       % first row of the "Events" section
tG=tG(1:ix-1,:);                                % keep only the rows before it
[head(tG);tail(tG)]
ans = 16×6 table
       Subject         Context               Name                Value         Units              Description       
    ______________    _________    _________________________    _______    _____________    ________________________

    {'PluginGait'}    {'Left' }    {'Cadence'              }     116.39    {'steps/min'}    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Walking Speed'        }     1.3038    {'m/s'      }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Stride Time'          }      1.031    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Step Time'            }      0.551    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Opposite Foot Off'    }     12.609    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Opposite Foot Contact'}     46.557    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Foot Off'             }     62.076    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Single Support'       }       0.35    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Single Support'       }      0.391    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Double Support'       }      0.309    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Stride Length'        }     1.3652    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Step Length'          }      0.651    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Step Width'           }    0.19855    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Limp Index'           }     1.0249    {0×0 char   }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'GDI'                  }     75.701    {0×0 char   }    {'Gait Deviation Index'}
    {'PluginGait'}    {'Right'}    {'GDI'                  }     72.639    {0×0 char   }    {'Gait Deviation Index'}

Got to thinking -- each of the first two sections would make a great table -- and each can be imported directly, in part. Unfortunately, readtable isn't set up to read from memory...but I thought it worth showing an import options object and what it can do.

"In anger" (as my old Scottish power plant testing engineer friend used to say) I'd still probably first read the whole file in toto, use that to find the sections, and then parse them.

The first two sections are pretty easy; I'm not so sure about the "Devices" section -- the "Moment" section also looks OK, although it appears to be empty in this dataset.

1 Comment

dpb on 20 Oct 2022

Edited: dpb on 22 Oct 2022

SECTIONS={'Gait Cycle','Events','Devices'};
F=readlines(websave('walking_01.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1163318/walking_01.txt'));
ix=find(startsWith(F,SECTIONS))
ix = 3×1
     1
    33
    51

This gives the starting line of each section for internal parsing -- or you can use those locations to limit the ranges that readtable reads from the file itself.
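One possible way to carry on from the section starts, sketched on a small made-up string array rather than the real file: take each start index up to the line before the next start, then drop blank lines. The names iend and blocks are hypothetical.

```matlab
% Stand-in for the output of readlines (made-up lines, not walking_01.txt).
F = ["Gait Cycle"; "a"; "b"; ""; "Events"; "100"; "x"; ""; "Devices"; "y"];
SECTIONS = {'Gait Cycle','Events','Devices'};

ix   = find(startsWith(F, SECTIONS));    % first line of each section
iend = [ix(2:end)-1; numel(F)];          % last line of each section

% One string array per section, in file order.
blocks = arrayfun(@(s,e) F(s:e), ix, iend, 'UniformOutput', false);

% Drop the blank separator lines inside each block.
blocks = cellfun(@(b) b(strlength(b) > 0), blocks, 'UniformOutput', false);
```

Each element of blocks can then be parsed on its own, for example by splitting its lines on tabs.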
