Incomplete reading of MS Word file
At work I have to read some VERY long Word documents (~300 pages) and analyze the text. However, if I use the commands suggested in https://fr.mathworks.com/matlabcentral/answers/348737-how-to-read-ms-word-file-doc-docx :
word = actxserver('Word.Application');
wdoc = word.Documents.Open(filePath);
text = wdoc.Content.text;
wdoc.Close; % close document
word.Quit; % end application
the resulting "text" variable (1x158745 char) only contains ~25% of the document.
How can I read the whole document using this method? I saw that newer releases have dedicated functions/toolboxes for reading Word documents, but I don't have access to them, as my company only provides R2020b and a limited set of toolboxes.
Answers (1)
Oguz Kaan Hancioglu
12 Apr 2023
I haven't tried this on such a large file, but could you try opening the Word document with fopen and reading the whole text with fread(fid, '*char')? Maybe it will work.
1 Comment
Walter Roberson
12 Apr 2023
That will not work in the form stated. .docx files are zip files that contain a directory of mostly XML files.
You can unzip the .docx file and go through the directory and try to extract things from the XML files; the XML files would be text files.
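The unzip approach described above can be sketched roughly as follows, using only base MATLAB functions (unzip, fopen/fread, regexp, strrep). This is a minimal sketch, not a full parser: the function name readDocxText is hypothetical, and it assumes the body text is stored in word/document.xml inside the archive, with the visible text held in <w:t> elements and each paragraph closed by </w:p>.

```matlab
function txt = readDocxText(docxPath)
% READDOCXTEXT  Sketch: pull the body text out of a .docx using base MATLAB.
% The function name is hypothetical. Assumes the main body is stored in
% word/document.xml and that visible text sits in <w:t> elements.

    tmpDir = tempname;                            % unique scratch folder
    mkdir(tmpDir);
    cleaner = onCleanup(@() rmdir(tmpDir, 's'));  % delete the folder on exit

    unzip(docxPath, tmpDir);                      % a .docx is a zip archive

    % Read the main document part as UTF-8 text.
    fid = fopen(fullfile(tmpDir, 'word', 'document.xml'), 'r', 'n', 'UTF-8');
    xml = fread(fid, '*char')';
    fclose(fid);

    % Add a fake text run holding a newline before each paragraph end,
    % so that paragraph breaks survive the extraction below.
    xml = strrep(xml, '</w:p>', ['<w:t>' newline '</w:t></w:p>']);

    % Collect the contents of every <w:t> element, in document order.
    tokens = regexp(xml, '<w:t(?:\s[^>]*)?>([^<]*)</w:t>', 'tokens');
    runs = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
    txt = char([runs{:}]);   % char() keeps the type even if nothing matched

    % Decode the predefined XML entities (&amp; last to avoid double decoding).
    txt = strrep(txt, '&lt;', '<');
    txt = strrep(txt, '&gt;', '>');
    txt = strrep(txt, '&quot;', '"');
    txt = strrep(txt, '&apos;', '''');
    txt = strrep(txt, '&amp;', '&');
end
```

Calling txt = readDocxText(filePath) and checking numel(txt) against the length obtained through actxserver should show whether more of the document comes through this way. Note the limits of the sketch: tabs and manual line breaks (<w:tab/>, <w:br/>) are dropped, and headers, footers and footnotes live in other XML files of the archive, so they are not included.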