Extracting data from pdf files

Question

joseph Frank on 19 Apr 2014

https://jp.mathworks.com/matlabcentral/answers/126386-extracting-data-from-pdf-files

Answered: Christopher Creutzig on 27 Apr 2021

Hi,

I have around 300 PDF files with 19 pages each. I want to extract a fraction of a table on page 4 from each of them in order to build a research data set. Is it possible to do so using MATLAB? If so, which toolboxes and functions do I need? I have MATLAB R2013a.

Answer 1

Kristian Gennaci on 21 Apr 2014

Hi Joseph,

Have you tried using this File Exchange submission?

http://www.mathworks.com/matlabcentral/fileexchange/19798-extract-text-from-a-pdf-document

This seems like the most promising solution. Alternatively, if you can convert the tables to an Excel spreadsheet or CSV file, they can then be parsed easily using MATLAB's Excel/CSV functions:

http://www.mathworks.com/help/matlab/spreadsheets.html

http://www.mathworks.com/help/matlab/ref/csvread.html
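
As a rough sketch of the CSV route, assuming each table has already been converted to its own purely numeric CSV file with a single header row, something like the following could collect them. The folder name `converted_tables` and the one-row header are assumptions for illustration, not details from your files.

```matlab
% Sketch only: the folder name and the header layout are assumptions.
csvDir = 'converted_tables';                 % hypothetical folder of converted tables
files  = dir(fullfile(csvDir, '*.csv'));     % one CSV per original PDF
data   = cell(numel(files), 1);

for k = 1:numel(files)
    % csvread(filename, R1, C1) starts reading at zero-based row R1 and
    % column C1, so (1, 0) skips one header row. It reads numeric data only.
    data{k} = csvread(fullfile(csvDir, files(k).name), 1, 0);
end
```

Each cell of `data` then holds one file's numeric matrix, which you can index to keep only the rows and columns you need.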

I'll let you know if I find any other solutions.

Best,

Kristian

Answer 2

Christopher Creutzig on 27 Apr 2021

Just for the record: since R2017b, extractFileText('filename.pdf','Pages',4) from Text Analytics Toolbox gives you the text on ("physical") page 4 of the PDF. You can then extract the parts you need with string operations (extractBetween, regexp, etc.).
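
A minimal sketch of how that could look for a whole folder of files, assuming Text Analytics Toolbox in R2017b or later. The folder name `pdfs`, the markers 'Table 2' and 'Notes:', and the number pattern are all made up for illustration; you would need to adapt them to whatever text actually surrounds the table in your PDFs.

```matlab
% Sketch only: requires Text Analytics Toolbox (R2017b or later).
% Folder name, markers and number pattern are assumptions, not from the real files.
pdfFiles = dir(fullfile('pdfs', '*.pdf'));
values   = cell(numel(pdfFiles), 1);

for k = 1:numel(pdfFiles)
    pdfPath  = fullfile(pdfFiles(k).folder, pdfFiles(k).name);
    pageText = extractFileText(pdfPath, 'Pages', 4);   % text of physical page 4

    % Keep only the text between two (assumed) markers around the table.
    tablePart = extractBetween(pageText, 'Table 2', 'Notes:');
    if isempty(tablePart)
        continue   % markers not found on this page; leave values{k} empty
    end

    % Pull every number out of the first matching fragment.
    tokens    = regexp(tablePart(1), '-?\d+(\.\d+)?', 'match');
    values{k} = str2double(tokens);
end
```

Text extracted from a PDF does not always preserve the table's column layout, so it is worth inspecting `pageText` for a few files before settling on the markers and the pattern.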