I have a PHP application that uses pdf2htmlEX and HTMLPurifier to convert PDF documents to text. The conversion involves several steps:
1. Upload the book through a web browser
2. Convert the PDF to per-page HTML with pdf2htmlEX
3. Process the resulting pages with HTMLPurifier to produce the text files
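This works for most documents, but for some documents with many pages (more than 230), step 3 fails. While HTMLPurifier is processing the pages, PHP raises the error "PHP Fatal error: Maximum execution time of 0 seconds exceeded". In my configuration, max_execution_time is set to 0. I attached strace to the Apache process, and this is the output just before termination: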
lstat("/tmp/books/3349/html/78.page", {st_mode=S_IFREG|0644, st_size=40165, ...}) = 0
open("/tmp/books/3349/html/78.page", O_RDONLY) = 20
fstat(20, {st_mode=S_IFREG|0644, st_size=40165, ...}) = 0
lseek(20, 0, SEEK_CUR) = 0
fstat(20, {st_mode=S_IFREG|0644, st_size=40165, ...}) = 0
read(20, "<div class=\"pd w1 h1\"><div id=\"p"..., 8192) = 8192
read(20, "AACAsAQAAQFgCAAAgLAEAABCWAAAACEs"..., 8192) = 8192
read(20, "7\"><span class=\"_ _1f\"> </span>F"..., 8192) = 8192
read(20, "class=\"_ _8\"> </span>of<span cla"..., 8192) = 8192
read(20, "/span></div><div class=\"t m1 x7a"..., 8192) = 7397
read(20, "", 8192) = 0
read(20, "", 8192) = 0
close(20) = 0
lstat("/tmp/books/3349/text/78.txt", 0x7fff115a43f0) = -1 ENOENT (No such file or directory)
open("/tmp/books/3349/text/78.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 20
fstat(20, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lseek(20, 0, SEEK_CUR) = 0
write(20, "66 2. TOPOSESa \357\254\201xed space is a"..., 2157) = 2157
close(20) = 0
lstat("/tmp/books/3349/html/79.page", {st_mode=S_IFREG|0644, st_size=48214, ...}) = 0
open("/tmp/books/3349/html/79.page", O_RDONLY) = 20
fstat(20, {st_mode=S_IFREG|0644, st_size=48214, ...}) = 0
lseek(20, 0, SEEK_CUR) = 0
fstat(20, {st_mode=S_IFREG|0644, st_size=48214, ...}) = 0
read(20, "<div class=\"pd w1 h1\"><div id=\"p"..., 8192) = 8192
read(20, "AWAIAACAsAQAAYN5hAoBPSWIEdtXWCAD"..., 8192) = 8192
read(20, "=\"_ _0\"></span>oof<span class=\"f"..., 8192) = 8192
read(20, "c\"></span>).</span></div><div cl"..., 8192) = 8192
read(20, "lass=\"_ _23\"> </span>sho<span cl"..., 8192) = 8192
read(20, "ls0 ws0 r0\">F<span class=\"ff4\"><"..., 8192) = 7254
read(20, "", 8192) = 0
read(20, "", 8192) = 0
close(20) = 0
--- SIGPROF (Profiling timer expired) @ 0 (0) ---
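Since the error message reports a limit of 0 seconds, one thing worth confirming is the value PHP actually sees inside the Apache request (as opposed to the CLI configuration). A minimal sketch of such a check follows; the helper name `describeTimeLimit` is hypothetical and not part of the application, and it only reads settings without changing them:

```php
<?php
// Hypothetical diagnostic helper, not part of the original application.
// It reports the SAPI in use and the time limits as PHP sees them at run time.
function describeTimeLimit()
{
    return array(
        'sapi'               => PHP_SAPI,                      // e.g. "apache2handler" or "cli"
        'max_execution_time' => ini_get('max_execution_time'), // returned as a string
        'max_input_time'     => ini_get('max_input_time'),
    );
}

// Write the values to the PHP error log so they can be compared
// between the two environments.
error_log(json_encode(describeTimeLimit()));
```

Calling this at the start of step 3 on both machines would show whether the effective settings really match.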
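Interestingly, I have two environments with the same system configuration: one in AWS and the other a VM in VirtualBox. Both run Ubuntu 12.04 + Apache 2.2 + PHP 5.4.13 with identical configuration settings, but the problem occurs only on the AWS node. Any ideas?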