Reading text file word by word

Question

Paolo Binetti on 13 Oct 2018

https://jp.mathworks.com/matlabcentral/answers/423783-reading-text-file-word-by-word

Edited: Paolo Binetti on 13 Oct 2018

dataset_300_8.txt

The input is the attached text file: one long word, a newline, then several equal-length short words separated by whitespace. I would like to read the first word into one variable, then read the remaining words one at a time into a second variable, rather than reading everything into a single string or a single huge cell array. I tried several ways of doing this with fscanf, but all of them failed, and I even got the impression that fscanf does not behave as documented at https://fr.mathworks.com/help/matlab/ref/fscanf.html. I have no clue what I am doing wrong.

fileID = fopen('dataset_300_8.txt');
long_word = fscanf(fileID, '%[$ACGT]'); % is there another way to stop reading at newline? 
short_word = ' ';
while ~isempty(short_word)
  short_word = fscanf(fileID, '%s'); % does not work: shouldn't %s stop as it encounters a white space?
  % short_word = fscanf(fileID, '%10s'); % this also does not work
  % short_word processing code here
end
fclose(fileID);
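
For reference, here is a minimal sketch of one way to get the one-word-at-a-time behaviour asked for above. It is not taken from any of the answers. It assumes that fgetl is acceptable for the first line (it returns the line without the newline) and that the optional size argument of fscanf, set to 1, limits each call to a single whitespace-delimited field, as the fscanf documentation describes. The temporary file and its contents are made up so the snippet runs on its own; words_seen exists only so the result can be inspected.

```matlab
% Build a tiny stand-in for dataset_300_8.txt (contents are invented).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ACGTACGTAC\nACG TTA GGC\n');
fclose(fid);

fileID = fopen(fname, 'r');
long_word = fgetl(fileID);              % first line, newline stripped
words_seen = {};                        % only kept so the result can be checked
short_word = fscanf(fileID, '%s', 1);   % read at most one character field
while ~isempty(short_word)
    words_seen{end+1} = short_word;     % placeholder for the real processing
    short_word = fscanf(fileID, '%s', 1);
end
fclose(fileID);
delete(fname);
```

Only long_word and the current short_word need to be in memory at any time; the loop ends when fscanf reaches the end of the file and returns an empty result.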

4 Comments

madhan ravi on 13 Oct 2018

%10c and the delimiter do the work, as in my answer below.

Paolo Binetti on 13 Oct 2018

Edited: Paolo Binetti on 13 Oct 2018

Thank you but reading all the content of the file in one shot is what I want to avoid. The "file" variable from Madhan's code has 127134 Bytes. What I want is a variable long_word of 2002 Bytes plus a variable short_word of 20 Bytes, into which I want to read each of the short words in the text file one by one. I read one, I use it for computations, then I don't need it anymore, so I read the next one, and so forth. I want to get the job done with 2022 Bytes, rather than 127134 Bytes. My file is just a small sample, but for bigger files memory would be an issue.


Answer 1

jonas on 13 Oct 2018


Edited: jonas on 13 Oct 2018


Try this minor change:

short_word = fscanf(fileID, '%s+');
                               ↑

Edit: After further testing, any literal character placed after the %s gives the same result, because it causes fscanf to stop reading (due to the mismatch). The next call then begins where the previous attempt failed, that is, at the next word.

6 Comments

jonas on 13 Oct 2018

Edited: jonas on 13 Oct 2018

+ just means to continue reading characters until something else is encountered. It is used in textscan so I just assumed it applies here as well.

"A = fscanf(fileID,formatSpec) reads data from an open text file into column vector A and interprets values in the file according to the format specified by formatSpec. The fscanf function reapplies the format throughout the entire file and positions the file pointer at the end-of-file marker. If fscanf cannot match formatSpec to the data, it reads only the portion that matches and stops processing."

So the formatSpec %s reads characters, skips whitespace and, because the format is reapplied until the end of the file, returns a single long character sequence, whereas %c does the same but retains the whitespace.

This means that adding any literal character after %s forces the scan to stop processing, and the file pointer is left where the scan failed due to the mismatch. If you then call fscanf again, it continues reading from where the previous call failed.
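
The behaviour described in this comment can be checked with a small sketch like the one below. It only illustrates the explanation above and is not something posted in the thread; the temporary file and its three words are invented. The first call uses a bare %s, which is reapplied to the end of the file; the next two use %s+, which should stop after one word each because the literal + never matches.

```matlab
% Invented sample data: three short words on one line.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ACG TTA GGC\n');
fclose(fid);

fid = fopen(fname, 'r');
all_at_once = fscanf(fid, '%s');   % format reapplied until end of file
frewind(fid);                      % go back to the start of the file
first  = fscanf(fid, '%s+');       % stops at the mismatch after one word
second = fscanf(fid, '%s+');       % resumes where the previous call stopped
fclose(fid);
delete(fname);
```

Comparing all_at_once with first and second shows the difference: the bare %s concatenates every word with the whitespace dropped, while each %s+ call should hand back a single word.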

Paolo Binetti on 13 Oct 2018

Clear, thank you


Answer 2

Image Analyst on 13 Oct 2018


"I would like to read the first word in a variable, then the other ones in another variable one by one, rather than all into a single string or into a single huge cell array." <--- this is a really bad idea. I'm sure Stephen will soon give you the reasons why.

A better solution is to use fileread() followed by strsplit() to make a single cell array.

str = fileread('dataset_300_8.txt'); % Read the entire file into one character vector.
ca = strsplit(strtrim(str)); % Split on any whitespace (spaces and the newline): one word per cell.

1 Comment

Paolo Binetti on 13 Oct 2018

Really wonder why. OK, what if the combined size of the small words is 100 GB: guess you can't throw all of it in a single variable right? The file is on the hard disk, which can handle big chunks of data, but once you read it into a variable it goes into RAM, which cannot, right? So the idea was to read a piece of data at a time, process it, then read the next one, process it, etc.
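
A possible middle ground between one word per read and the whole file at once is to read a fixed number of words per call. The sketch below is not from the thread. It assumes textscan's optional third argument, which applies the format at most N times, and relies on textscan resuming from the current file position on the next call. The sample file and chunk_size are hypothetical; a real chunk would be far larger.

```matlab
% Invented sample data: one long word, then five short words.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ACGTACGTAC\nACG TTA GGC AAT CCG\n');
fclose(fid);

chunk_size = 2;                 % hypothetical; pick what fits in memory
n_words = 0;                    % only kept so the result can be checked
fileID = fopen(fname, 'r');
long_word = fgetl(fileID);      % first line, newline stripped
while true
    C = textscan(fileID, '%s', chunk_size);  % apply '%s' at most chunk_size times
    chunk = C{1};                            % cell array with up to chunk_size words
    if isempty(chunk)
        break                                % nothing left to read
    end
    n_words = n_words + numel(chunk);        % placeholder for the real processing
end
fclose(fileID);
delete(fname);
```

Memory use is then bounded by chunk_size words plus long_word, while the number of read calls drops by roughly a factor of chunk_size compared with reading a single word per call.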
