Question

yankee

Asked: 2023-12-27 15:20:19 +0800 CST

How do I add text to a scanned PDF document (to enable search and copy-and-paste)?

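I have a Microsoft Word document. Someone printed it, signed it (yes, with a pen; very nostalgic) and then scanned it. Obviously, this process turned the document into an image, which makes it hard to search or to copy and paste text from it.

I tried an OCR tool. After running it, the document looks exactly like the scan, and I can search, copy, and paste text. However, checking for OCR errors is tedious, and I don't even know how to correct any errors I find. It also seems completely unnecessary, since I still have the original Word document.

How can I embed or otherwise combine the scanned document and the original Word document, so that you see the scan, but text selection and search behave as in the original Word document?

Solutions based on open-source software that works offline on Linux (pdftk, qpdf, ...) are preferred.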

2 Answers

K J · Answer 1 · 2023-12-28T00:48:05+08:00

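The whole point of the document is that it is a signed copy of the source (otherwise there would be no need to add a signature).

So you need to put the signature back where it should have been signed: add it to the source DocX, just as if it had been signed in Word, and that can then be filed as the true PDF copy. On Linux you will obviously need to use OpenOffice or LibreOffice. Otherwise, you need to add the scan to the DocX's media folder and make very careful additions to the document XML.

That way there is no question that it is the searchable, signed source document, with no concerns about OCR corruption or degradation.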

yankee · Answer 2 · 2023-12-29T01:53:58+08:00

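You can get the desired result with pdftk and its multistamp command.

First export the M$-Word document to a PDF file, document.pdf, and save the signed scan as document_signed.pdf. Then combine the two documents as follows: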

pdftk document.pdf multistamp document_signed.pdf output document_signed_searchable.pdf

This creates a file, document_signed_searchable.pdf, with the features you want.
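
The export step can be scripted as well. The following is a minimal sketch, assuming that LibreOffice's soffice binary and pdftk are both on the PATH. The function name make_searchable and the temporary-directory handling are illustrative and not part of either tool. It converts the Word source to PDF in headless mode and then runs the same multistamp command as above.

```shell
#!/usr/bin/env bash
# Sketch: export the Word source to PDF, then stamp the signed scan over it.
# Assumes soffice (LibreOffice) and pdftk are installed; names are examples.
set -euo pipefail

make_searchable() {
    local source_doc=$1 scan_pdf=$2 out_pdf=$3
    local workdir status=0
    workdir=$(mktemp -d)

    # LibreOffice writes <basename>.pdf into the --outdir directory.
    soffice --headless --convert-to pdf --outdir "$workdir" "$source_doc" >/dev/null
    local text_pdf="$workdir/$(basename "${source_doc%.*}").pdf"

    # The text PDF is the input; each scan page is stamped over the matching page.
    pdftk "$text_pdf" multistamp "$scan_pdf" output "$out_pdf" || status=$?

    rm -rf "$workdir"
    return "$status"
}

# Run only when called with exactly three arguments:
#   <source document> <signed scan PDF> <output PDF>
if [ "$#" -eq 3 ]; then
    make_searchable "$@"
fi
```

Saved under a name such as make_searchable.sh (hypothetical), it would be called with document.docx, document_signed.pdf and document_signed_searchable.pdf as arguments. Since the scan is stamped on top of the exported pages, the text underneath should stay selectable, but the selection lines up with the visible scan only as well as the scan lines up with the exported pages.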

Here is the relevant excerpt from the manual:

background <background PDF filename | - | PROMPT>
    Applies a PDF watermark to the background of a single input PDF.  Pass the background PDF's filename after background like so:
  
    pdftk in.pdf background back.pdf output out.pdf
  
    Pdftk uses only the first page from the background PDF and applies it to every page of the input PDF.  This page is scaled and rotated as needed to fit the input page.  You can use - to pass a background PDF into pdftk via stdin.
  
    If the input PDF does not have a transparent background (such as a PDF created from page scans) then the resulting background won't be visible -- use the stamp operation instead.
  
multibackground <background PDF filename | - | PROMPT>
    Same as the background operation, but applies each page of the background PDF to the corresponding page of the input PDF.  If the input PDF has more pages than the stamp PDF, then the final stamp page is repeated across these remaining pages in the input PDF.
  
stamp <stamp PDF filename | - | PROMPT>
    This behaves just like the background operation except it overlays the stamp PDF page on top of the input PDF document's pages.  This works best if the stamp PDF page has a transparent background.

multistamp <stamp PDF filename | - | PROMPT>
    Same as the stamp operation, but applies each page of the background PDF to the corresponding page of the input PDF.  If the input PDF has more pages than the stamp PDF, then the final stamp page is repeated across these remaining pages in the input PDF.
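
The excerpt notes that when the input PDF has more pages than the stamp PDF, the final stamp page is repeated across the remaining pages. Checking the page counts before stamping can catch that case. This is a minimal sketch, assuming pdftk's dump_data operation reports a NumberOfPages line; the function names page_count and same_page_count are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: check that the exported PDF and the scan have the same page count.
set -euo pipefail

# Print the page count reported by pdftk's dump_data operation.
page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ {print $2}'
}

# Succeed only if both PDFs report the same number of pages.
same_page_count() {
    local a b
    a=$(page_count "$1")
    b=$(page_count "$2")
    if [ "$a" != "$b" ]; then
        echo "page count mismatch: $1 has $a, $2 has $b" >&2
        return 1
    fi
}

# Run only when called with exactly two PDF file names.
if [ "$#" -eq 2 ]; then
    same_page_count "$@"
fi
```

A mismatch usually means a page was skipped or added during scanning, in which case the stamped pages would no longer correspond to the text beneath them.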
