- Every word ends with a space
- Every line ending has a carriage return and line feed
How can I get the word count of each line from an extracted PDF file?
Hi, I extracted text from a PDF file with many lines/entries of comments. I want to get the word count of each line, the average word count across all lines, and the number of lines that have only one word. Is this possible? Thanks!
Answers (1)
Kiran Felix Robert
2 Feb 2021
Hi Yao,
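I assume that you have extracted the text from a PDF file and saved it as a string variable. You can convert the string to a character array with convertStringsToChars and then count the words and lines.
Assume that every word ends with a space and that every line ending has a carriage return and line feed.
Using the built-in MATLAB example, the following program gives the total line count and word count for one section of the file (the text between the first occurrence of "II" and the first occurrence of "III").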
str = extractFileText("exampleSonnets.pdf");
ii = strfind(str,"II");
iii = strfind(str,"III");
start = ii(1);
fin = iii(1);
stringText = extractBetween(str,start,fin-1);
B = convertStringsToChars(stringText);
% Define the space and end-of-line characters explicitly rather than
% reading them from fixed positions in the text
SpaceCharacter = ' ';
LineFeedCharacter = newline; % char(10); also terminates CR+LF line endings
lineCount = 0;
wordCount = 0;
i = 1;
while i <= length(B)
    if B(i) == LineFeedCharacter
        lineCount = lineCount + 1; % Total line count
    end
    if B(i) == SpaceCharacter
        wordCount = wordCount + 1; % Total word count (assumes one space per word)
    end
    i = i + 1;
end
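The loop above gives totals only. For the per-line numbers asked about in the question (word count of each line, the average, and the number of one-word lines), a minimal sketch along the following lines might work. It is not part of the original example: the sample text is a made-up stand-in for the extracted PDF text, and the variable names are only illustrative. It assumes words are separated by whitespace and that empty lines should be ignored.

```matlab
% Hypothetical sample text standing in for the output of extractFileText.
str = "one two three" + newline + "single" + newline + "four five" + newline;

% Split the text into lines and drop lines that are empty after trimming.
textLines = strtrim(splitlines(str));
textLines(textLines == "") = [];

% Count runs of non-whitespace characters (words) on each line.
wordsPerLine = arrayfun(@(s) numel(regexp(s, '\S+', 'match')), textLines);

averageWords = mean(wordsPerLine);      % average word count over all lines
oneWordLines = sum(wordsPerLine == 1);  % number of lines with exactly one word
```

Counting matches of '\S+' rather than counting spaces means that repeated spaces or a missing trailing space do not change the result.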
Kiran