I figured it out. The call to read is what advances the datastore's file pointer, so if read throws an error the pointer stays put. I solved the problem by setting 'ReadFcn' to fileparts, getting the file name from that, and then calling extractFileText on that file inside a try,catch block.
How do I skip a file that gives an error when using fileDatastore to loop through a folder of pdfs?
I am mining text from several thousand pdfs in a folder using the Text Analytics Toolbox. I am using fileDatastore to loop through them. Some of the pdfs are encrypted, which gives an error with extractFileText. I have added a try,catch segment to skip those files, but when it catches the error it goes back to try and reads the same file again. The loop never ends. How do I increment the counter so that it will move on past the bad file? Here is part of the code:
fds = fileDatastore('File*.pdf','ReadFcn',@extractFileText);
while hasdata(fds)
    % extract and prepare text
    try % be prepared for error such as locked pdf
        text=read(fds); % this is where error occurs
    catch
        disp('encrypted pdf');
        continue
    end
    text=erasePunctuation(text);
    % etc. (other text-parsing)
    ...
end
Accepted Answer
Allen
12 Jan 2019
1 Comment
Eniola Oluwakoya
28 Jul 2020
Hi, could you shed more light on how you made the read function fileparts?
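One possible reading of the accepted answer, as a minimal sketch rather than Allen's actual code: if 'ReadFcn' is @fileparts, read only splits the path string and never opens the PDF, so it should not fail on an encrypted file and the pointer always advances. The file name is then taken from the second output of read, and extractFileText is called inside the try block. The variable names info, str and err are placeholders chosen here, and the fprintf message is illustrative.

```matlab
% Sketch only: fileparts as ReadFcn means read() never opens the PDF itself.
fds = fileDatastore('File*.pdf', 'ReadFcn', @fileparts);

while hasdata(fds)
    % read() always succeeds here, so the datastore moves on to the next file.
    % info.Filename holds the full path of the file that was just "read".
    [~, info] = read(fds);

    try
        % The call that can fail (e.g. encrypted PDF) now sits outside read().
        str = extractFileText(info.Filename);
    catch err
        fprintf('Skipping %s: %s\n', info.Filename, err.message);
        continue
    end

    str = erasePunctuation(str);
    % etc. (other text parsing)
end
```

Because the failing call is no longer inside read, a bad file costs one skipped iteration instead of an endless retry. The same list of paths is also available as fds.Files, so a plain for loop over that cell array would be an alternative under the same assumptions.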