MATLAB Answers

Traversing Text Document Matlab

Asked by xRobot on 17 Nov 2019 at 15:09
Latest activity: edited by Adam Danz on 19 Nov 2019 at 21:39
Please provide guidance on this particular inquiry. All responses are highly valued and will be used to further my knowledge (I am not just looking for a copy-and-paste solution). I am attempting to read a Microsoft Word dictionary into MATLAB. From there I would like to traverse it, extract words of a specific length, say four-letter words, and put them into an array. Then I would like to select random words from the array and put them into a matrix.

  0 Comments


1 Answer

Answer by Adam Danz on 17 Nov 2019 at 16:17
Edited by Adam Danz on 17 Nov 2019 at 16:20

Reading from a Word document
Here's the general approach to reading a Microsoft Word document. It relies on actxserver, so it needs Windows with Word installed.
directory = 'C:\Users\AOC\Documents\MATLAB';
file = 'myDocFile.docx';
% Full path to the MS Word file
filePath = fullfile(directory,file);
% Read MS Word file using actxserver function
word = actxserver('Word.Application');
wdoc = word.Documents.Open(filePath);
% Pull the document body out as text
txt = wdoc.Content.Text;
% Close Word and release the COM object
Quit(word)
delete(word)
The variable txt is a char array containing the text in your document.
Extracting 4-letter words
There are several approaches you could use. This one is fast and doesn't require segmenting the text into words and counting the length of each one. Instead, it uses a regular expression to search for this pattern:
[not preceded by a letter],[4 letters],[not followed by a letter]
The two outer conditions are lookarounds, so the neighboring characters are checked but not included in the match. That way 4-letter words separated by a single space are all found, and no trimming is needed.
% Extract 4-letter words.
s = regexp(txt, '(?<![a-zA-Z])[a-zA-Z]{4}(?![a-zA-Z])', 'match');
s is a 1xn cell array of 4-letter words as character arrays.
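To check how such a pattern behaves before running it on the whole document, it can be tried on a short string. This is only a minimal sketch: the sample sentence and the variable names are made up for illustration, and the pattern asks for exactly four letters with no letter on either side.

```matlab
% Made-up sample text (an assumption for illustration only)
sampleTxt = 'This word list, with some tiny and long entries: four, five.';

% Exactly four letters, not preceded and not followed by another letter
pattern = '(?<![a-zA-Z])[a-zA-Z]{4}(?![a-zA-Z])';
fourLetter = regexp(sampleTxt, pattern, 'match');

% Show the matches; 'and' (3 letters) and 'entries' (7 letters) are skipped
disp(fourLetter)
```

Because the match holds only the letters themselves, punctuation such as the comma after "list" never ends up in the result.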
Randomly select words
You can't put non-numeric values into a numeric matrix, but you can put them into a cell array. The example below chooses n random values from the extracted words.
n = 10;
if n > numel(s)
error('There are only %d words available. You selected %d words.', numel(s), n)
end
randIdx = randi(numel(s),1,n);
randWords = s(randIdx); % Here is your random selection
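Note that randi draws indices with replacement, so the same word can come up more than once. If each word should be picked at most once, randperm is one option. A minimal sketch, assuming s is a cell array of words like the one above (the words and the 2-by-2 layout here are made up for illustration):

```matlab
% Made-up stand-in for the extracted words (assumption for illustration)
s = {'tree','lamp','door','wind','rock','fish'};
n = 4;

% randperm(k, n) returns n unique integers from 1..k in random order
uniqueIdx = randperm(numel(s), n);
uniqueWords = s(uniqueIdx);   % n distinct entries of s

% Arrange the selection as a 2-by-2 cell array
wordGrid = reshape(uniqueWords, 2, 2);
```

With randperm, the check that n does not exceed numel(s) becomes necessary, since it cannot return more unique indices than there are words.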

  5 Comments

xRobot on 19 Nov 2019 at 16:12
I have obtained a .txt file and would like to read it into my script. After extracting the four-letter words, I would like to put them into a string array. Which would be more efficient for reading in the file: fscanf or textscan?
xRobot on 19 Nov 2019 at 16:22
fileID = fopen('mylist.odt','r');
formatSpec = '%s';
words = fscanf(fileID,formatSpec);
I have used the above code to read in the file. It read in as a 1x11102 char. What I would like to do is convert this to a string array.
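With '%s', fscanf skips the white space between words and joins everything into a single char vector, which is why the result is one long 1x11102 char with the word boundaries gone. One possible alternative, assuming the file is plain text, is to read it with fileread and pick out the words afterwards. The sketch below first writes a small made-up file so that it runs on its own; the file name, its contents, and the variable names are placeholders.

```matlab
% Write a small made-up word list so the example is self-contained
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, 'able\nbaker\ncake\ndog\nedge\n');
fclose(fid);

% Read the whole file as one char vector, keeping the white space
txt = fileread(tmpFile);

% Keep only the four-letter words (returned as a cell array of char vectors)
fourLetter = regexp(txt, '(?<![a-zA-Z])[a-zA-Z]{4}(?![a-zA-Z])', 'match');

% Convert the cell array to a string array
words = string(fourLetter);

% Remove the temporary file
delete(tmpFile);
```

Here words is a 1x3 string array, and individual entries can be indexed directly, for example words(2).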
Adam Danz on 19 Nov 2019 at 21:38
