How can I convert a scanned PDF to an image using MATLAB?

MathWorks Support Team on 5 Jan 2021
Edited: MathWorks Support Team on 6 Feb 2023
How can I import a scanned PDF into MATLAB and convert it to image files?
I tried to use extractFileText() from Text Analytics Toolbox, but it only works for native PDFs and not scanned PDFs:
>> extractFileText('example.pdf')
ans =
<missing>

Accepted Answer

MathWorks Support Team on 6 Feb 2023
Edited: MathWorks Support Team on 6 Feb 2023
MATLAB ships with the Apache PDFBox Java library, which can import and render PDF files. The following MATLAB function, PDFtoImg(), imports a scanned PDF and saves each page as a separate PNG file:
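function images = PDFtoImg(pdfFile)
% PDFtoImg  Render each page of a PDF to a PNG file at 300 dpi.
%   images = PDFtoImg(pdfFile) returns a string array of the PNG file paths.
%   pdfFile is a file name relative to the current folder.
import org.apache.pdfbox.*
import java.io.*
filename = string(fullfile(pwd, pdfFile));   % absolute path to the PDF
jFile = File(filename);
document = pdmodel.PDDocument.load(jFile);
pdfRenderer = rendering.PDFRenderer(document);
count = document.getNumberOfPages();
images = strings(1, count);                  % one output path per page
for ii = 1:count
    % PDFBox page indices are zero-based, hence ii-1
    bim = pdfRenderer.renderImageWithDPI(ii-1, 300, rendering.ImageType.RGB);
    images(ii) = filename + "-" + "Page" + ii + ".png";
    tools.imageio.ImageIOUtil.writeImage(bim, images(ii), 300);
end
document.close()
end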
The input variable pdfFile must be a string scalar or a character vector that names a PDF file in the current folder (the function prepends pwd to it). For example,
pdfFile = "example.pdf" % String
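As a usage sketch, assuming the function above is saved as PDFtoImg.m on the MATLAB path and that example.pdf sits in the current folder (both are assumptions made for illustration), a call might look like this:

```matlab
% Assumption: PDFtoImg.m is on the path and example.pdf is in the current folder.
pdfFile = "example.pdf";
images = PDFtoImg(pdfFile);        % paths of the PNG files, one per page

% Read the first rendered page back in to check its size
firstPage = imread(images(1));
fprintf("Wrote %d page image(s); first page is %d x %d pixels.\n", ...
    numel(images), size(firstPage, 2), size(firstPage, 1));
```

The PNG files are written next to the PDF, with names of the form example.pdf-Page1.png, example.pdf-Page2.png, and so on.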
Notes:
1. The function will split the input PDF data into one image for each PDF page. For example, if “example.pdf” contains 13 pages, it will convert the 13 pages to 13 images.
2. For subsequent OCR tasks, it is important to render the PDF pages at 300 dpi or higher:
>> bim = pdfRenderer.renderImageWithDPI(ii-1, 300, rendering.ImageType.RGB);
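For the OCR step mentioned in note 2, one possible follow-up is sketched below. It is not part of the original answer: it assumes Computer Vision Toolbox is installed (for the ocr function) and that example.pdf is in the current folder, and the variable names pageText and allText are chosen only for illustration.

```matlab
% Sketch: recognize text on each page image written by PDFtoImg.
% Assumption: Computer Vision Toolbox is available (it provides ocr).
images = PDFtoImg("example.pdf");
pageText = strings(1, numel(images));
for k = 1:numel(images)
    I = imread(images(k));               % RGB page rendered at 300 dpi
    results = ocr(I);                    % ocrText object for this page
    pageText(k) = string(results.Text);  % recognized text as one string
end
allText = strjoin(pageText, newline);    % text of all pages, in page order
```

Recognition quality depends on the scan itself, so the result is worth checking on a few pages before processing a large document.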
