I have a text file that I would like to split into an array. Each array cell should be a word, not a sentence or line in the file.

This is what I got so far. But it does not actually solve my problem.
file= fopen('marktwain.txt','r');
string= fread(file, [1, inf], 'char');
fclose(file);
CStr = dataread('file', 'marktwain.txt', '%s', 'delimiter', '\n');
I have little clue where to go from here.

Accepted Answer

Cedric on 17 Mar 2013
Edited: Cedric on 17 Mar 2013
buffer = fileread('marktwain.txt') ;
words = regexp(buffer, '\<\w+', 'match') ;
... and we can discuss the pattern if you want to refine the regexp. For example, you could have "it's" or "John's" count as single words (and not two) using (EDITED)
words = regexp(buffer, '\<[\w'']+', 'match') ;
The final answer, after the discussion below, is:
buffer = fileread('marktwain.txt') ;
words = regexp(buffer, '\<[\w''\-,]+', 'match') ;
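To see how the three patterns in this answer differ, here is a minimal sketch on a short sample string. The sample text is made up for illustration and stands in for the contents of marktwain.txt; the variable names w1, w2, w3 are hypothetical as well.

```matlab
% Hypothetical sample text standing in for the contents of marktwain.txt.
buffer = 'It''s John''s well-known book, isn''t it?';

% Letters, digits and underscore only: apostrophes and dashes split words.
w1 = regexp(buffer, '\<\w+', 'match');

% Apostrophe added to the set: "It's" and "John's" stay whole.
w2 = regexp(buffer, '\<[\w'']+', 'match');

% Apostrophe, dash and comma added: "well-known" stays whole,
% and the comma stays attached to "book,".
w3 = regexp(buffer, '\<[\w''\-,]+', 'match');

disp(w1); disp(w2); disp(w3);
```

With the last pattern the comma becomes part of the token ("book,"), so leave it out of the set if punctuation should not end up in the words.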
  8 Comments
Marco on 17 Mar 2013
Thank you very much, I understand this much better now.
Cedric on 17 Mar 2013
Edited: Cedric on 17 Mar 2013
Do you want the comma to be part of words? If so, you have probably figured out by now that you can match it with
words = regexp(buffer, '\<[\w'',-]+', 'match') ;
Note that inside [] the dash has a special meaning when it sits between two literals (it codes a range, as in A-Z, which means A to Z), so you have to escape it if it doesn't come last within the []:
words = regexp(buffer, '\<[\w''\-,]+', 'match') ;
This is why I put the comma before the dash in the first expression.
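The range behaviour is easier to see on a tiny example. This is just a sketch; the test string and the variable names are made up for illustration.

```matlab
s = 'abc-';   % hypothetical test string

% Dash between two literals: the range a..c, so the dash itself is not matched.
r1 = regexp(s, '[a-c]+', 'match');    % {'abc'}

% Escaped dash: the set contains a, - and c (but not b).
r2 = regexp(s, '[a\-c]+', 'match');   % {'a', 'c-'}

% Dash placed last in the set: also literal, no escape needed.
r3 = regexp(s, '[ac-]+', 'match');    % {'a', 'c-'}
```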


More Answers (2)

Walter Roberson on 17 Mar 2013
file = fopen('marktwain.txt', 'rt');
CStr = textscan(file, '%s');
fclose(file);
Only problem: you have not defined exactly what a "word" is for your purposes, so the above is going to break things up at whitespace.
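As a rough illustration of what splitting at whitespace means, here is a sketch that runs textscan on a character vector instead of a file identifier. The sample text and the variable names are assumptions made for the example.

```matlab
% Hypothetical sample text used in place of the file contents.
txt = 'Hello, world. This is a test!';

% textscan also accepts a character vector as input;
% '%s' reads whitespace-delimited tokens.
C = textscan(txt, '%s');

% textscan returns a 1-by-1 cell; its content is the column
% cell array holding the tokens.
tokens = C{1};

disp(tokens);
```

Punctuation stays attached to the tokens ('Hello,' and 'test!'), which is why the definition of a "word" matters.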
  1 Comment
Marco on 17 Mar 2013
All right; a word is the letters (like the a in apple) between spaces, excluding { , " ; : ? ! etc.



Image Analyst on 17 Mar 2013
Edited: Image Analyst on 17 Mar 2013
For example:
>> allwords('This is what I got so far. But it does not actually solve my problem.')
ans =
'This' 'is' 'what' 'I' 'got' 'so' 'far' 'But' 'it' 'does' 'not' 'actually' 'solve' 'my' 'problem'
  2 Comments
Marco on 17 Mar 2013
I tried allwords, but MATLAB didn't recognize the function. It looks useful; do I have to download it?
Walter Roberson on 17 Mar 2013
Yes, you would have to download it from the link that was given.

