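I want to take a PDF and extract each page as an image. I have been able to do this with ImageMagick and GhostScript, but the resulting quality is very poor. I have tried many different output options without any luck. The script below should be fairly self-explanatory. It works, but the image quality is really disappointing compared with opening the PDF.
- Is there a way to get high-quality images out of ImageMagick?
- What about other tools? Preferably something programmatic, since working through a large number of PDFs one at a time in a GUI would be awkward.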
# Extract each page from a PDF as a png using ImageMagick
# ImageMagick requires GhostScript for PDF manipulation so have to make sure that is installed
# Current install folder: C:\Program Files\ImageMagick-7.1.1-Q16-HDRI
# The Chocolatey package does not include the 'identify.exe' command
# Path to the PDF file
$pdfFilePath = "C:\0\MyFile.pdf"
# Output directory for images
$outputDirectory = "C:\0"
# Image type to output to (tried jpg, png, tiff etc)
$imageExtension = "jpg"
# Check if running as Admin
if (!([Security.Principal.WindowsPrincipal][Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole] "Administrator")) { Write-Host "Please run this script as an administrator."; exit }
# Check if Chocolatey is installed, if not, install it
if (!(Test-Path "$env:ProgramData\chocolatey")) { Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1')) }
# Check for magick.exe on path; if not installed, install ImageMagick
$imageMagickExePath = Get-ChildItem -Path "C:\Program Files\ImageMagick-*" -Filter "magick.exe" -Recurse | Select-Object -First 1 -ExpandProperty FullName
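if (!(Get-Command "magick.exe" -ea silent) -and ($null -eq $imageMagickExePath)) {
    Write-Host "ImageMagick not found. Installing..."; choco install imagemagick -y
    # Search again so that a fresh install is picked up
    $imageMagickExePath = Get-ChildItem -Path "C:\Program Files\ImageMagick-*" -Filter "magick.exe" -Recurse | Select-Object -First 1 -ExpandProperty FullName
}
if ($null -eq $imageMagickExePath) { Write-Host "Error: magick.exe not found under 'C:\Program Files\ImageMagick-*'"; exit }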
Write-Host "magick.exe found at '$imageMagickExePath'"
# Check for gswin64.exe on path; if not installed, install GhostScript
$gsExePath = Get-ChildItem -Path "C:\Program Files\gs\gs*\bin" -Filter "gswin64.exe" -Recurse | Select-Object -First 1 -ExpandProperty FullName
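if (!(Get-Command "gswin64.exe" -ea silent) -and ($null -eq $gsExePath)) {
    Write-Host "GhostScript not found. Installing..."; choco install ghostscript -y
    # Search again so that a fresh install is picked up
    $gsExePath = Get-ChildItem -Path "C:\Program Files\gs\gs*\bin" -Filter "gswin64.exe" -Recurse | Select-Object -First 1 -ExpandProperty FullName
}
if ($null -eq $gsExePath) { Write-Host "Error: gswin64.exe not found, this is required by ImageMagick for PDF manipulation"; exit }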
Write-Host "gswin64.exe found at '$gsExePath'"
# Add Ghostscript directory to the PATH temporarily so that ImageMagick can use it
$env:Path += ";$($gsExePath | Split-Path -Parent)"
# Create the output directory if it doesn't exist
if (-not (Test-Path $outputDirectory)) { New-Item -ItemType Directory -Force -Path $outputDirectory | Out-Null }
# Build the output file name prefix from the PDF name (whitespace replaced with underscores)
$imageNamePrefix = [System.IO.Path]::GetFileNameWithoutExtension($pdfFilePath)
$imageNamePrefix = $imageNamePrefix -replace '\s+', '_'
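# Use ImageMagick's identify subcommand to get the total number of pages in the PDF
# (called through magick.exe because the Chocolatey package has no standalone identify.exe)
$numberOfPages = (& $imageMagickExePath identify "$pdfFilePath" 2>$null | Measure-Object -Line).Lines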
Write-Host "'$pdfFilePath' has $numberOfPages pages"
# Use ImageMagick's convert command to convert PDF
Start-Process $imageMagickExePath -ArgumentList "convert `"$pdfFilePath`" -density 600 -quality 100 -antialias -resize 300% `"$outputDirectory\$imageNamePrefix-%d.$imageExtension`"" -NoNewWindow -Wait
# Determine the maximum number of digits to normalise all page numbers to that length
$maxDigits = $numberOfPages.ToString().Length
# Normalize page numbers
# ImageMagick numbers the output files from 0, so the last index is $numberOfPages - 1
for ($i = 0; $i -lt $numberOfPages; $i++) {
    $pageNumber = "{0:D$maxDigits}" -f $i
    $oldFileName = Join-Path $outputDirectory "$imageNamePrefix-$i.$imageExtension"
    $newFileName = Join-Path $outputDirectory "$imageNamePrefix-$pageNumber.$imageExtension"
    if ((Test-Path $oldFileName) -and !(Test-Path $newFileName)) { Rename-Item -Path $oldFileName -NewName $newFileName }
}
# Remove the Ghostscript directory from the PATH
$env:Path = $env:Path -replace [regex]::Escape(";"+($gsExePath | Split-Path -Parent))
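Assuming you want 300 DPI PNGs, the simplest approach is a "drop on me" .CMD file, or a link placed in the "SendTo" folder. Either way, a single file is exported to images on the spot, and the approach is easily adapted to a folder of files.
This uses the 2024 64-bit build of the Poppler pdftoppm binary from https://github.com/oschwartz10612/poppler-windows
So this 12-page PDF is exported to the same working folder.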
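The command itself is not reproduced here, so the following is only a rough sketch of what a 300 DPI PNG export with pdftoppm can look like when called from PowerShell. It assumes `pdftoppm.exe` from the Poppler build above is on the PATH; the helper name `Get-PdfToPpmArgs` is made up for illustration, and the sample path is the one from the question.

```powershell
# Hypothetical helper (not part of Poppler): builds the argument list for pdftoppm.
function Get-PdfToPpmArgs {
    param(
        [Parameter(Mandatory)] [string] $PdfPath,
        [int] $Dpi = 300
    )
    # Output prefix next to the PDF; pdftoppm appends the page number and ".png" itself.
    $prefix = $PdfPath -replace '\.pdf$', ''
    # -png selects PNG output, -r sets the resolution in DPI.
    return @('-png', '-r', "$Dpi", $PdfPath, $prefix)
}

$pdf = 'C:\0\MyFile.pdf'   # sample path from the question, adjust as needed
# Only run when both the PDF and pdftoppm are actually available.
if ((Test-Path $pdf) -and (Get-Command pdftoppm -ErrorAction SilentlyContinue)) {
    $ppmArgs = Get-PdfToPpmArgs -PdfPath $pdf
    & pdftoppm @ppmArgs
}
```

Passing the arguments as an array keeps paths with spaces intact without any manual quoting.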
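For more complex usage, extend the CMD file to run over a current working directory that contains multiple files and/or subfolders.
A similar command can be put together for GhostScript.
Per @KenS's comment, the page numbering is simplified to 000# digits by using %%04d.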
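As a hedged illustration of the GhostScript route, here is a minimal PowerShell sketch. It assumes the console build `gswin64c.exe` is on the PATH; the helper name `Get-GhostscriptPngArgs` is hypothetical, and the switches are one reasonable choice for 300 DPI colour PNGs rather than the exact ones used for the comparison in this answer.

```powershell
# Hypothetical helper (not part of Ghostscript): builds the argument list for gswin64c.exe.
function Get-GhostscriptPngArgs {
    param(
        [Parameter(Mandatory)] [string] $PdfPath,
        [int] $Dpi = 300
    )
    # Ghostscript expands %04d to a zero-padded page number (0001, 0002, ...).
    # PowerShell passes % through unchanged; inside a CMD script it would be %%04d.
    $outPattern = ($PdfPath -replace '\.pdf$', '') + '-%04d.png'
    return @(
        '-sDEVICE=png16m',        # 24-bit colour PNG output device
        "-r$Dpi",                 # rendering resolution in DPI
        '-dTextAlphaBits=4',      # anti-alias text
        '-dGraphicsAlphaBits=4',  # anti-alias vector graphics
        '-o', $outPattern,        # output file pattern; -o also implies -dBATCH -dNOPAUSE
        $PdfPath
    )
}

$pdf = 'C:\0\MyFile.pdf'   # sample path from the question, adjust as needed
# Only run when both the PDF and Ghostscript are actually available.
if ((Test-Path $pdf) -and (Get-Command gswin64c.exe -ErrorAction SilentlyContinue)) {
    $gsArgs = Get-GhostscriptPngArgs -PdfPath $pdf
    & gswin64c.exe @gsArgs
}
```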
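The differences in output can be tuned by changing additional switches, but as the two commands above show, GhostScript (lower left) produces more compact files.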
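On the ImageMagick side of the question, one likely cause of the soft output is option order. ImageMagick applies `-density` to images read after it, so in the question's command the PDF is rasterised at the default resolution first and then enlarged by `-resize 300%`. The sketch below places `-density` before the input file and drops the resize. The helper name `Get-MagickPdfArgs` is hypothetical, and `magick.exe` is assumed to be on the PATH together with GhostScript.

```powershell
# Hypothetical helper (not part of ImageMagick): builds the argument list for magick.exe.
function Get-MagickPdfArgs {
    param(
        [Parameter(Mandatory)] [string] $PdfPath,
        [int] $Dpi = 300
    )
    # %04d becomes a zero-padded, zero-based page index in the output file names.
    $outPattern = ($PdfPath -replace '\.pdf$', '') + '-%04d.png'
    # -density comes BEFORE the input so it controls how the PDF is rasterised.
    return @('-density', "$Dpi", $PdfPath, $outPattern)
}

$pdf = 'C:\0\MyFile.pdf'   # sample path from the question, adjust as needed
# Only run when both the PDF and ImageMagick are actually available.
if ((Test-Path $pdf) -and (Get-Command magick.exe -ErrorAction SilentlyContinue)) {
    $magickArgs = Get-MagickPdfArgs -PdfPath $pdf
    & magick.exe @magickArgs
}
```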
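Following up on the original post: in PowerShell, a folder containing several PDFs can be looped over to extract PNGs from each PDF. Set the path to gswin64c.exe and the location of the folder containing the PDFs to match your machine. A note on the numbering used in the CMD above: % has to be doubled inside a CMD script, hence %%04d in a CMD script, whereas plain %04d works in PowerShell.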
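A minimal sketch of such a loop is shown here. The function name `Convert-PdfFolderToPng` is made up for illustration, the folder is the sample one from the question, and `gswin64c.exe` is assumed to be on the PATH; pass a full path instead if it is not.

```powershell
# Hypothetical helper: renders every PDF in a folder to per-page PNGs with Ghostscript.
function Convert-PdfFolderToPng {
    param(
        [Parameter(Mandatory)] [string] $Folder,
        [string] $GsExe = 'gswin64c.exe',
        [int] $Dpi = 300
    )
    foreach ($pdf in Get-ChildItem -Path $Folder -Filter *.pdf -File) {
        # One set of PNGs per PDF, written next to it: <name>-0001.png, <name>-0002.png, ...
        $outPattern = Join-Path $pdf.DirectoryName ($pdf.BaseName + '-%04d.png')
        $gsArgs = @('-sDEVICE=png16m', "-r$Dpi", '-o', $outPattern, $pdf.FullName)
        & $GsExe @gsArgs
    }
}

$pdfFolder = 'C:\0'   # sample folder from the question, adjust as needed
# Only run when both the folder and Ghostscript are actually available.
if ((Test-Path $pdfFolder) -and (Get-Command gswin64c.exe -ErrorAction SilentlyContinue)) {
    Convert-PdfFolderToPng -Folder $pdfFolder
}
```

Adding `-Recurse` to `Get-ChildItem` would extend the same loop to subfolders.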