AskOverflow.Dev

AskOverflow.Dev Logo AskOverflow.Dev Logo

AskOverflow.Dev Navigation

  • 主页
  • 系统&网络
  • Ubuntu
  • Unix
  • DBA
  • Computer
  • Coding
  • LangChain

Mobile menu

Close
  • 主页
  • 系统&网络
    • 最新
    • 热门
    • 标签
  • Ubuntu
    • 最新
    • 热门
    • 标签
  • Unix
    • 最新
    • 标签
  • DBA
    • 最新
    • 标签
  • Computer
    • 最新
    • 标签
  • Coding
    • 最新
    • 标签
主页 / server / 问题 / 972680
Accepted
Parker
Parker
Asked: 2019-06-25 08:40:45 +0800 CST2019-06-25 08:40:45 +0800 CST 2019-06-25 08:40:45 +0800 CST

上传大文件时偶尔出现 mod_jk 客户端错误

  • 772

注意:此问题末尾有一个 TL;DR。

我有一个夜间备份脚本,它已经可靠地工作了 6 年,两个月前,它开始无法上传到我的存档服务器(只有大约四分之一的尝试成功)。一旦备份文件的大小超过大约 440MB(成功上传需要 20-40 秒才能完成),这种情况就开始发生。其他较小的文件 (>200MB) 每次都成功上传。故障的开始与有问题的服务器的任何配置更改或更新不一致。

以下是我正在上传的文件的详细信息:

-rw-r--r-- 1 backup backup 486948951 Jun 12 13:20 snapshot-20190612.zip

这些文件通过 curl 上传到我的服务器PUT(请参见下面的命令)。源服务器是弗吉尼亚州的 Amazon Linux 实例,目标服务器是俄亥俄州的 Amazon Linux 实例(均位于美国东部地区)。

处理上传的 Web 服务是Apache HTTPD 2.4.39 + mod24_ssl/OpenSSL 1.0.2k -> mod_jk 1.2.42 -> Apache Tomcat 8.5.40 -> Artifactory 5.3.2实例。


我在其中一个错误发生之前和之后捕获了jk_status页面(之前/之后在动画 GIF 中以突出显示差异)。该错误被mod_jk解释为客户端错误。

错误前后的 jk_status


我的 mod/jk workers.properties文件:

worker.list=main_ajp13_worker
# Set properties for main_ajp13_worker (ajp13)
worker.main_ajp13_worker.type=ajp13
worker.main_ajp13_worker.host=localhost
worker.main_ajp13_worker.port=8009

以下是上传失败后的控制台输出(已清除授权详细信息),然后是成功上传:

-bash-4.2$ curl -v -k -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
*   Trying 123.222.100.132...
* TCP_NODELAY set
* Connected to repo.mydomain.org (123.222.100.132) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: OU=Domain Control Validated; OU=PositiveSSL Wildcard; CN=*.mydomain.org
*  start date: May 28 00:00:00 2019 GMT
*  expire date: May 27 23:59:59 2021 GMT
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: repo.mydomain.org
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 486948951
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
* Closing connection 0
curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104

-bash-4.2$ curl -v -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
*   Trying 123.222.100.132...
* TCP_NODELAY set
* Connected to repo.mydomain.org (123.222.100.132) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: OU=Domain Control Validated; OU=PositiveSSL Wildcard; CN=*.mydomain.org
*  start date: May 28 00:00:00 2019 GMT
*  expire date: May 27 23:59:59 2021 GMT
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: repo.mydomain.org
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 486948951
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
* Closing connection 0
curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104

通过第二次(或者通常是第三次或第四次)尝试上传,它最终会成功:

-bash-4.2$ curl -v -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
*   Trying 123.222.100.132...
* TCP_NODELAY set
* Connected to repo.mydomain.org (123.222.100.132) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: OU=Domain Control Validated; OU=PositiveSSL Wildcard; CN=*.mydomain.org
*  start date: May 28 00:00:00 2019 GMT
*  expire date: May 27 23:59:59 2021 GMT
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: repo.mydomain.org
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 486948951
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 201 201
< Date: Wed, 12 Jun 2019 13:22:46 GMT
< Server: Apache
< Strict-Transport-Security: max-age=15768000
< X-Artifactory-Id: 9e49bf90cd4b7700:7ba45cf1:16b1db56e4d:-8000
< Location: https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
< Transfer-Encoding: chunked
< Content-Type: application/vnd.org.jfrog.artifactory.storage.itemcreated+json;charset=ISO-8859-1
<
{
  "repo" : "resource-snapshot",
  "path" : "/snapshot/20190612/snapshot-20190612.zip",
  "created" : "2019-06-12T13:23:05.805Z",
  "createdBy" : "backup",
  "downloadUri" : "https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip",
  "mimeType" : "application/zip",
  "size" : "486948951",
  "checksums" : {
    "sha1" : "6b9525bf9bb572ec7f70d0d474cb6533e0091bd3",
    "md5" : "6da0c444ad1b53dc1cf06784e204cb10"
  },
  "originalChecksums" : {
    "sha1" : "6b9525bf9bb572ec7f70d0d474cb6533e0091bd3",
    "md5" : "6da0c444ad1b53dc1cf06784e204cb10"
  },
  "uri" : "https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip"
* Connection #0 to host repo.mydomain.org left intact
}

服务器日志不是特别有用。以下是上传失败时的相关日志:

/var/log/httpd/ssl_request_log

[24/Jun/2019:12:17:45 +0000] 52.1.100.190 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1" -

/var/log/httpd/repo.mydomain.org-ssl_access_log

140.1.100.101 - - [24/Jun/2019:12:17:45 +0000] "PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1" 400 -

catalina.out

没有日志条目。

/var/log/httpd/mod_jk.log

这些是 mod/jk 日志级别设置为info时的日志:

[Mon Jun 24 12:18:05 2019] [15679:140534117611584] [info] ajp_read_into_msg_buff::jk_ajp_common.c (1553): (main_ajp13_worker) receiving data from client failed. Connection aborted or network problems
[Mon Jun 24 12:18:05 2019] [15679:140534117611584] [info] ajp_process_callback::jk_ajp_common.c (2105): (main_ajp13_worker) Reading from client aborted or client network problems
[Mon Jun 24 12:18:05 2019] [15679:140534117611584] [info] ajp_service::jk_ajp_common.c (2778): (main_ajp13_worker) sending request to tomcat failed (unrecoverable), because of client read error (attempt=1)
[Mon Jun 24 12:18:05 2019] [15679:140534117611584] [info] jk_handler::mod_jk.c (2988): Aborting connection for worker=main_ajp13_worker

这是调试级别的 240 万个 mod_jk 日志条目的相关部分,从 HTTP 100 Continue 开始并显示文件上传的最开始(您可以看到 zip 文件的幻数,50 4B 03 04 ):

[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0160    32 65 32 00 A0 08 00 09 34 39 36 32 37 38 36 30  - 2e2.....49627860
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0170    33 00 00 06 45 78 70 65 63 74 00 00 0C 31 30 30  - 3...Expect...100
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0180    2D 63 6F 6E 74 69 6E 75 65 00 0A 00 0F 41 4A 50  - -continue....AJP
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0190    5F 52 45 4D 4F 54 45 5F 50 4F 52 54 00 00 05 33  - _REMOTE_PORT...3
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01a0    32 39 37 30 00 0A 00 0E 41 4A 50 5F 4C 4F 43 41  - 2970....AJP_LOCA
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01b0    4C 5F 41 44 44 52 00 00 0C 31 37 32 2E 33 31 2E  - L_ADDR...172.31.
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01c0    31 33 2E 39 36 00 0A 00 10 4A 4B 5F 4C 42 5F 41  - 13.96....JK_LB_A
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01d0    43 54 49 56 41 54 49 4F 4E 00 00 03 41 43 54 00  - CTIVATION...ACT.
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01e0    0A 00 0F 53 53 4C 5F 43 4C 49 45 4E 54 5F 43 45  - ...SSL_CLIENT_CE
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01f0    52 54 00 00 0F 53 53 4C 5F 43 4C 49 45 4E 54 5F  - RT...SSL_CLIENT_
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0200    43 45 52 54 00 FF 00 00 00 00 00 00 00 00 00 00  - CERT............
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_send_request::jk_ajp_common.c (1782): (main_ajp13_worker) request body to send 496278603 - request body to resend 0
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_send_request::jk_ajp_common.c (1886): (main_ajp13_worker) sending 8186 bytes of request body
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): sending to ajp13 pos=4 len=8192 max=8192
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0000    12 34 1F FC 1F FA 50 4B 03 04 14 00 00 00 08 00  - .4....PK........
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0010    84 61 D8 4E 73 67 93 AF 7C 0A 00 00 E0 B0 00 00  - .a.Nsg..|.......
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0020    0A 00 1C 00 62 61 63 6B 75 70 2E 6C 6F 67 55 54  - ....backup.logUT
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0030    09 00 03 18 BE 10 5D 90 BD 10 5D 75 78 0B 00 01  - ......]...]ux...
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0040    04 1A 00 00 00 04 1A 00 00 00 ED 5C 4D 6F DC C8  - ...........\Mo..
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0050    11 BD E7 57 34 74 4A 80 D8 80 77 17 8B 45 80 1C  - ...W4tJ...w..E..

.... [lots of encoded data trimmed out here, the next part shows re-filling the 8k buffer] ....

[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): received from ajp13 pos=0 len=3 max=8192E D7 E9 E6 DD BB  - 8...U?.er.>.....
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): 00000   06 1F FA 00 00 00 00 00 00 00 00 00 00 00 00 00  - .................
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): sending to ajp13 pos=4 len=8192 max=8192 2E 40 41 4C 94  - ..$z~YW.?1G.@AL.
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0000    12 34 1F FC 1F FA C9 62 3E 48 C5 70 E9 05 E1 8D  - .4.....b>H.p....

.... [lots of encoded data trimmed out here, the next part shows where re-filling the 8k buffer fails] ....

[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 03e0    2D E5 36 F3 8C 12 8C 51 69 66 21 AE A5 58 BE 6E  - -.6....Qif!..X.n
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 03f0    57 F3 F4 EC CB E9 49 9B 59 44 AB FE 66 6E 67 DC  - W.....I.YD..fng.
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): received from ajp13 pos=0 len=3 max=8192
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): 0000   06 1F FA 00 00 00 00 00 00 00 00 00 00 00 00 00  - ................
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [info] ajp_read_into_msg_buff::jk_ajp_common.c (1553): (main_ajp13_worker) receiving data from client failed. Connection aborted or network problems
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [info] ajp_process_callback::jk_ajp_common.c (2105): (main_ajp13_worker) Reading from client aborted or client network problems
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [info] ajp_service::jk_ajp_common.c (2778): (main_ajp13_worker) sending request to tomcat failed (unrecoverable), because of client read error (attempt=1)
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_reset_endpoint::jk_ajp_common.c (851): (main_ajp13_worker) resetting endpoint with socket 24 (socket shutdown)
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_abort_endpoint::jk_ajp_common.c (821): (main_ajp13_worker) aborting endpoint with socket 24
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] jk_shutdown_socket::jk_connect.c (932): About to shutdown socket 24 [127.0.0.1:38678 -> 127.0.0.1:8009]
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] jk_shutdown_socket::jk_connect.c (1005): shutting down the read side of socket 24 [127.0.0.1:38678 -> 127.0.0.1:8009]
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] jk_shutdown_socket::jk_connect.c (1016): Shutdown socket 24 [127.0.0.1:38678 -> 127.0.0.1:8009] and read 265 lingering bytes in 0 sec.
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_done::jk_ajp_common.c (3287): recycling connection pool for worker main_ajp13_worker and socket -1
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [info] jk_handler::mod_jk.c (2988): Aborting connection for worker=main_ajp13_worker
[Mon Jun 24 14:38:24 2019] [17825:139891516274752] [debug] wc_shutdown::jk_worker.c (390): Shutting down worker main_ajp13_worker
[Mon Jun 24 14:38:24 2019] [17825:139891516274752] [debug] jk_shm_close::jk_shm.c (745): Closed shared memory /var/log/httpd/mod_jk.shm.17823 childs=9
[Mon Jun 24 14:38:24 2019] [17877:139891516274752] [debug] wc_shutdown::jk_worker.c (390): Shutting down worker main_ajp13_worker

查看AJPv3 文档,我们看到了 8k 块的解释:

容器从请求中请求更多数据(如果主体太大而无法放入发送的第一个数据包或请求被分块时)。服务器将发回一个正文数据包,其中包含的数据量是 request_length、最大发送正文大小 (8186 (8 Kbytes - 6)) 和实际剩余要从请求正文发送的字节数中的最小值。如果 body 中没有更多数据(即 servlet 容器试图读取超过 body 的末尾),服务器将发回一个“空”数据包,这是一个有效负载长度为 0 的 body 数据包。( 0x12,0x34,0x00,0x00)

在上述日志中,我统计了重新填充 8k 缓冲区的次数(36,856 次),发现在错误发生之前传输了 301,703,216 字节(287 MB,或 62%)的文件。在反复尝试时,这个百分比变化很大,但通常超过 300MB,有时会成功完成所有 440MB。


我对以上日志的解读

SSL 握手成功,错误发生在 HTTP 100 Continue 之后,所以我怀疑该文件在mod_jk.log中的以下消息之前已部分发送:

  • 从客户端接收数据失败。连接中止或网络问题
  • 由于客户端读取错误(尝试=1),向 tomcat 发送请求失败(不可恢复)

由于catalina.out没有错误,并且 mod_jk 抱怨客户端读取错误,我相信这排除了 Tomcat 本身(以及 Web 应用程序Artifactory)。

我在谷歌搜索中看到的大多数关于上述 mod_jk 消息的评论都表明它们是暂时的网络错误,但我不相信,因为这些是同一区域中 EC2 实例之间的文件传输。我也将此解释为与连接超时没有任何关系,因为(我认为)会产生“连接在 n 毫秒后超时”消息。


更改 mod_jk 和 Tomcat 配置

我尝试过的其他事情(这些变化是累积的,结果没有明显的变化):

  • 将 mod_jk 升级到最新版本 (1.2.46)
  • 将工作包大小从 8,196(默认)增加到 16,384
  • 工作数据包大小从 16,384 增加到 32,768
  • 将 JkShmSize 从 1kB(默认)增加到 1GB(哎呀,值以 kB 为单位)
  • 将 JkShmSize 从 1GB 减少到 1MB
  • 将工作人员 JkShmSize 返回到默认值
  • 将工作包大小返回为默认值
  • 将工作重试次数从 2(默认)增加到 5
  • 将 reply_timeout 从永远(默认)减少到 1000 毫秒(这实际上在成功上传期间导致底层服务出现一些问题)
  • 返回的工人重试默认值
  • 将 reply_timeout 返回到默认值
  • 将 socket_keepalive 从 false(默认)更改为 true
  • 将 socket_keepalive 恢复为默认值
  • 将连接器协议从 org.apache.coyote.http11.Http11NioProtocol 更改为 org.apache.coyote.http11.Http11Nio2Protocol
  • 将 Tomcat 连接超时从 20 秒 (20000) 增加到 60 秒 (60000)
  • 将 Tomcat connectionUploadTimeout 设置为 60 秒 (60000)
  • 将 Tomcat disableUploadTimeout 设置为 false
  • 将 connectionUploadTimeout 和 disableUploadTimeout 返回为默认值
  • 通过 -H "Transfer-Encoding:chunked" 强制分块传输
  • 将worker属性设置为:retries=2,socket_timeout=100,reply_timeout=30000
  • 将工作人员属性恢复为默认值。
  • 在 JkOptions 中添加了 +DisableReuse。
  • 将 JkOptions 恢复为默认值。
  • 为 repo 和 repo-access 虚拟主机创建了单独的工作人员。
  • 删除了非 SSL 虚拟主机条目,只留下两个 SSL 虚拟主机。

注意:分离工作人员至少隔离了工作人员消息,所以我现在知道错误完全发生在 repo web 服务上,而不是 repo-access web 服务上:

[Sat Jun 29 16:16:55 2019] [6174:139690136819776] [info] ajp_read_into_msg_buff::jk_ajp_common.c (1550): (repo) receiving data from client failed. Connection aborted or network problems
[Sat Jun 29 16:16:55 2019] [6174:139690136819776] [info] ajp_process_callback::jk_ajp_common.c (2102): (repo) Reading from client aborted or client network problems
[Sat Jun 29 16:16:55 2019] [6174:139690136819776] [info] ajp_service::jk_ajp_common.c (2775): (repo) sending request to tomcat failed (unrecoverable), because of client read error (attempt=1)
[Sat Jun 29 16:16:55 2019] [6174:139690136819776] [info] jk_handler::mod_jk.c (2984): Aborting connection for worker=repo

更改 HTTP(S) 客户端

我能够使用 wget 而不是 curl 重现该问题,这意味着这不是客户端软件的一些深奥问题:

-bash-4.2$ wget -v --tries=1 --user=backup --password=PASSWORD --header=X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 --header=X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 --method=PUT --body-file=snapshot-20190612.zip https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip

--2019-06-24 19:59:16--  https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
Resolving repo.mydomain.org (repo.mydomain.org)... 123.222.100.132
Connecting to repo.mydomain.org (repo.mydomain.org)|123.222.100.132|:443... connected.
HTTP request sent, awaiting response... 401 401
Authentication selected: Basic realm="Artifactory Realm"
Reusing existing connection to repo.mydomain.org:443.
Giving up.

在本地主机上测试

我已经在存档服务器(https://localhost:443/)上测试了从本地主机上传,这需要将-k标志添加到 curl 以跳过证书主机名验证,并且每次上传都成功(11秒)。但是,添加--limit-rate 20M会导致每次上传都失败(它适用于--limit-rate 25M)。


绕过 Apache HTTPD、mod_jk 和 SSL 有效

如果我绕过 Apache HTTPD、mod_jk 和 SSL 并直接通过端口 8080,上传总是会成功:

[ec2-user@repo ~]$ curl -v -k --limit-rate 1024K -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip http://localhost:8080/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: localhost:8080
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 496278603
> Expect: 100-continue
>
< HTTP/1.1 100
* We are completely uploaded and fine
< HTTP/1.1 201
< Server: Artifactory/5.3.2
...

仅绕过SSL失败

通过没有 SSL 的 Apache HTTPD + mod_jk,我仍然能够复制问题。这意味着该问题与 SSL 无关。

[ec2-user@repo ~]$ curl -v -k --limit-rate 1024K -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip http://localhost/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: localhost:8080
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 496278603
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

绕过 HTTPD 和 mod_jk

我尝试通过端口 8443(仍然使用 SSL)直接上传到 Tomcat 来绕过 Apache HTTPD 和 mod_jk。这些请求每次都成功。这使我相信问题出在 Apache HTTPD 和/或 mod_jk 上,而不是由“瞬态网络错误”或防火墙引起的。


TL;博士

上传失败仅在通过 HTTPD + mod_jk 时发生,而不是在直接与 Tomcat 通信时发生。SSL 已被排除为问题的潜在根源。这不是网络问题,因为从 localhost 上传时该错误是可重现的。该问题似乎仅与传输文件所需的时间有关,与文件大小无关。


我的问题

我应该为 HTTPD 和/或mod_jk、workers.properties甚至Tomcat寻找任何配置选项来消除这些问题吗?

或者,有没有办法告诉mod_jk尝试以容错方式处理这些失败的缓冲区重新填充,以便上传可以成功完成?

apache-2.4
  • 1 1 个回答
  • 1251 Views

1 个回答

  • Voted
  1. Best Answer
    Boobooma
    2019-08-01T06:20:09+08:002019-08-01T06:20:09+08:00

    您是否尝试将 Apache/modSSL 降级到 2.4.38?

    自从我们升级到 2.4.39 后,我们遇到了文件上传问题。

    • 1

相关问题

  • Apache2 - SSL 不工作

  • Apache <VirtualHost> 标签,如果客户端的 IP = x 则不重定向... 怎么做?

  • 我无法启用我的网站?[关闭]

  • SELinux 阻止 Apache 写入文件

Sidebar

Stats

  • 问题 205573
  • 回答 270741
  • 最佳答案 135370
  • 用户 68524
  • 热门
  • 回答
  • Marko Smith

    新安装后 postgres 的默认超级用户用户名/密码是什么?

    • 5 个回答
  • Marko Smith

    SFTP 使用什么端口?

    • 6 个回答
  • Marko Smith

    命令行列出 Windows Active Directory 组中的用户?

    • 9 个回答
  • Marko Smith

    什么是 Pem 文件,它与其他 OpenSSL 生成的密钥文件格式有何不同?

    • 3 个回答
  • Marko Smith

    如何确定bash变量是否为空?

    • 15 个回答
  • Martin Hope
    Tom Feiner 如何按大小对 du -h 输出进行排序 2009-02-26 05:42:42 +0800 CST
  • Martin Hope
    Noah Goodrich 什么是 Pem 文件,它与其他 OpenSSL 生成的密钥文件格式有何不同? 2009-05-19 18:24:42 +0800 CST
  • Martin Hope
    Brent 如何确定bash变量是否为空? 2009-05-13 09:54:48 +0800 CST
  • Martin Hope
    cletus 您如何找到在 Windows 中打开文件的进程? 2009-05-01 16:47:16 +0800 CST

热门标签

linux nginx windows networking ubuntu domain-name-system amazon-web-services active-directory apache-2.4 ssh

Explore

  • 主页
  • 问题
    • 最新
    • 热门
  • 标签
  • 帮助

Footer

AskOverflow.Dev

关于我们

  • 关于我们
  • 联系我们

Legal Stuff

  • Privacy Policy

Language

  • Pt
  • Server
  • Unix

© 2023 AskOverflow.DEV All Rights Reserve