How to read a PDF file in MATLAB?
I want to read a PDF file, make some changes to it, and then save the result to Excel. I have tried my best but fail every time. I need your help; any effort will be greatly appreciated. Thanks in advance.
20 Comments
Geoff Hayes
16 Aug 2014
What kind of changes do you want to make to the PDF that you wish to then save to Excel? What is the code that you have written so far?
azizullah khan
16 Aug 2014
I want to capture some data, and I haven't written any code so far. My first step is to read the PDF file. Thanks for the comments.
azizullah khan
25 Aug 2014
Geoff Hayes, thanks for the comments. Please just give me a clue as to how it is possible to read PDF files. I am waiting for your response.
Geoff Hayes
25 Aug 2014
azizullah - I noticed that you looked at Dimitri Shvorob's extract text from PDF on the MATLAB File Exchange, but you had some problems with it. Did you download the two libraries that are needed for this submission, and modify the pdfParseDemo.m file as per the author's instructions?
One of the comments on the above submission indicates that there is a utility called pdftotext that you may be able to call from within the MATLAB code. Have you looked into this?
José-Luis
25 Aug 2014
What is your goal with this? It might be that MATLAB is not the best tool for this.
azizullah khan
25 Aug 2014
Yes, I have done what was required, but pdfParseDemo gives me a problem.
azizullah khan
25 Aug 2014
Thanks, José-Luis. My goal is to capture data from PDF files and save the captured data to Excel.
Geoff Hayes
25 Aug 2014
Is there just one PDF file, or several? What data in particular are you looking for in the PDF: a table of numeric data, some text, or something else?
José-Luis
25 Aug 2014
Why go through MATLAB at all? Use Excel directly. A quick Google search will tell you how to import PDFs into Excel.
azizullah khan
25 Aug 2014
I have thousands of PDF files to get data from, and doing it manually is very difficult. That is why I am using MATLAB. Thanks.
Geoff Hayes
25 Aug 2014
Have you considered using pdftotext, or any other converter (to HTML, for example)? Supposing that you are able to convert the file to text, what would you be looking for in it? Is there just one page of data that you need, or one line from each page, or something else?
You might want to provide an example of a PDF that you wish to extract data from, and indicate which data in the file you want.
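For what it's worth, calling pdftotext from MATLAB over a whole folder could look roughly like the sketch below. It assumes pdftotext (from Xpdf or Poppler) is installed and on the system path. The folder name `pdfs` and the helper `makeCmd` are made up for illustration, and what you search for in the text depends entirely on your files.

```matlab
% Sketch: convert every PDF in a folder to text by shelling out to pdftotext.
% Assumption: pdftotext is installed and reachable from the system path.
pdfDir = 'pdfs';                               % hypothetical folder name
files  = dir(fullfile(pdfDir, '*.pdf'));

% Build the command line; the quotes protect paths that contain spaces.
makeCmd = @(in, out) sprintf('pdftotext -layout "%s" "%s"', in, out);

for k = 1:numel(files)
    inFile    = fullfile(pdfDir, files(k).name);
    [~, base] = fileparts(files(k).name);
    outFile   = fullfile(pdfDir, [base '.txt']);

    [status, msg] = system(makeCmd(inFile, outFile));
    if status ~= 0
        warning('pdftotext failed for %s: %s', inFile, msg);
        continue
    end

    txt = fileread(outFile);                   % whole file as one char vector
    % ... search txt for the fields you need, e.g. with regexp ...
end
```

The `-layout` option asks pdftotext to keep the physical layout of the page, which tends to help when the data sits in columns; drop it if the output looks worse for your files.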
Jan
26 Aug 2014
@azizullah khan: You wrote that pdfParseDemo gives you a problem. Please explain the problem. Your question is much too vague to be answered efficiently.
azizullah khan
26 Aug 2014
Edited: Walter Roberson
25 May 2015
The problem with pdfParseDemo: when I run the code, the following error appears:
??? Java exception occurred:
java.lang.NoClassDefFoundError: org/fontbox/afm/AFMParser
at org.pdfbox.pdmodel.font.PDFont.getAFM(PDFont.java:350)
at org.pdfbox.pdmodel.font.PDFont.getAverageFontWidthFromAFMFile(PDFont.java:313)
at org.pdfbox.pdmodel.font.PDSimpleFont.getAverageFontWidth(PDSimpleFont.java:231)
at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:276)
Error in ==> Untitled at 20
pdfstr = reader.getText(pdfdoc) %#ok
java.lang.Throwable: Warning: You did not close the PDF Document
at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:418)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
at java.lang.ref.Finalizer.access$100(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
java.lang.Throwable: Warning: You did not close the PDF Document
at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:418)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
at java.lang.ref.Finalizer.access$100(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
java.lang.Throwable: Warning: You did not close the PDF Document
at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:418)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
at java.lang.ref.Finalizer.access$100(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
azizullah khan
26 Aug 2014
Geoff Hayes: I have attached the PDF file that I want to read and extract account info and some other data from. Please explain whether this is possible. Thanks.
Geoff Hayes
26 Aug 2014
Azizullah - you did not include an attachment.
As for the error, the AFMParser is part of the FontBox library. Did you add the FontBox jar file path to your Java class path? I looked at the pdfParsedemo.m script, and while it doesn't have a command to do so, you probably should. So if you updated
javaaddpath('M:\My Documents\MATLAB\PDF Exercise\PDFBox-0.7.3\lib\PDFBox-0.7.3.jar')
to the path on your workstation that corresponds to PDFBox-0.7.3.jar (or whatever the jar file is), then you should add an equivalent statement for the FontBox
javaaddpath('whateverYourPathIsTo\FontBox-someVersionIds.jar')
(I don't know what the name of the jar is, so FontBox-someVersionIds.jar is just an example.)
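Putting that together, a minimal sketch might look like the following. Both jar paths and `example.pdf` are placeholders to replace with your own. The `PDDocument.load` call is my assumption about the PDFBox 0.7.3 API, while `PDFTextStripper` and `getText` are taken from the stack trace and the demo line above. Closing the document explicitly should also get rid of the "You did not close the PDF Document" warnings.

```matlab
% Sketch for PDFBox 0.7.3; both jar paths below are placeholders.
pdfboxJar  = 'whateverYourPathIsTo\PDFBox-0.7.3.jar';
fontboxJar = 'whateverYourPathIsTo\FontBox-someVersionIds.jar';
javaaddpath(pdfboxJar);
javaaddpath(fontboxJar);

% exist(..., 'class') returns 8 when MATLAB's JVM can see the class.
hasFontBox = exist('org.fontbox.afm.AFMParser', 'class') == 8;

if ~hasFontBox
    warning('FontBox is not on the Java class path; check fontboxJar.');
else
    pdfFile = 'example.pdf';                   % hypothetical file name
    % Assumed API: static PDDocument.load(String) in org.pdfbox.pdmodel.
    pdfdoc = org.pdfbox.pdmodel.PDDocument.load(pdfFile);
    try
        reader = org.pdfbox.util.PDFTextStripper();
        pdfstr = char(reader.getText(pdfdoc)); % Java String -> char vector
    catch err
        pdfdoc.close();                        % always release the document
        rethrow(err);
    end
    pdfdoc.close();
end
```

If `hasFontBox` comes back false even after `javaaddpath`, running `javaclasspath('-dynamic')` shows which jars MATLAB actually picked up.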
azizullah khan
27 Aug 2014
Yes, I did as required. If there is any way to convert PDF to Excel in MATLAB, kindly share it with me. For example, can we load a PDF into another program with the help of MATLAB, convert the PDF to Excel there, and get the output? Is it possible in MATLAB to operate another program? Thanks.
Geoff Hayes
27 Aug 2014
Unfortunately, this is not something that I have considered and so am not aware of any other means of reading the pdf into MATLAB. You could always try the pdftotext program.
Naftali
15 Jun 2016
Edited: Naftali
15 Jun 2016
I am no expert, but I could not find a way to read a PDF file into MATLAB. People here talk about text, but a PDF is usually a series of pictures. I go to the professional Adobe reader and export the pages of the PDF document, either via File/Save As or via Advanced/Export. This produces a PNG or JPEG file for each page of the document. From there it is easy in MATLAB: loop over the pages with the imread function.
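The loop over exported pages could be sketched like this. The folder name `exported_pages` and the `page_*.png` naming pattern are assumptions; use whatever names your export step produces.

```matlab
% Sketch: read every exported page image into a cell array.
pageDir = 'exported_pages';                    % hypothetical folder name
files   = dir(fullfile(pageDir, 'page_*.png'));% hypothetical naming pattern

pages = cell(numel(files), 1);
for k = 1:numel(files)
    % imread returns the page as a numeric array (height x width x channels).
    pages{k} = imread(fullfile(pageDir, files(k).name));
end
```

Note that dir lists names alphabetically, so page_10.png sorts before page_2.png unless the page numbers are zero-padded.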
Walter Roberson
15 Jun 2016
PDF is effectively a programming language; you need to execute the commands in order to determine what the output is.
Stefanie Schwarz
5 Jan 2021
Following up on Naftali's comment, there is also a way to convert a PDF to an image file in MATLAB. See: https://www.mathworks.com/matlabcentral/answers/709623-how-can-i-convert-a-scanned-pdf-to-an-image-using-matlab
Accepted Answer
Christopher Creutzig
16 Oct 2017
Edited: Walter Roberson
4 Nov 2017
Just for the record, Text Analytics Toolbox (new in R2017b) includes a function extractFileText that will extract text data from PDF (or MS Word) files.
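For the original goal (many PDFs, a few fields, results in Excel), a rough sketch with extractFileText could look like this. It assumes Text Analytics Toolbox is installed. The folder `pdfs`, the output file `accounts.xlsx`, the helper `getAccount`, and especially the "Account No: 12345" pattern are invented for illustration, since the layout of the real files was never posted.

```matlab
% Sketch: pull one field out of each PDF and write the results to Excel.
% Assumption: Text Analytics Toolbox (extractFileText) is available.
pdfDir = 'pdfs';                               % hypothetical folder name
files  = dir(fullfile(pdfDir, '*.pdf'));

% Hypothetical field format, e.g. "Account No: 12345".
pattern    = 'Account\s*No\.?\s*:\s*(\d+)';
getAccount = @(txt) regexp(txt, pattern, 'tokens', 'once');

names    = cell(numel(files), 1);
accounts = cell(numel(files), 1);
for k = 1:numel(files)
    txt = char(extractFileText(fullfile(pdfDir, files(k).name)));
    tok = getAccount(txt);
    names{k} = files(k).name;
    if isempty(tok)
        accounts{k} = '';                      % field not found in this file
    else
        accounts{k} = tok{1};
    end
end

T = table(names, accounts, 'VariableNames', {'File', 'Account'});
if ~isempty(files)
    writetable(T, 'accounts.xlsx');            % one row per PDF
end
```

Text extraction only works for PDFs that contain real text; scanned pages are images and would need OCR first.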