I have an HAProxy 2.8.4 instance that proxies several HTTPS services, on multiple URL paths and multiple different backends, plus a TCP-based PostgreSQL cluster. Here is the full haproxy -vvv output:
HAProxy version 2.8.4-a4ebf9d 2023/11/17 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2028.
Known bugs: http://www.haproxy.org/bugs/bugs-2.8.4.html
Running on: Linux 4.18.0-513.5.1.el8_9.x86_64 #1 SMP Fri Sep 29 05:21:10 EDT 2023 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
OPTIONS = USE_THREAD=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1 USE_PCRE=1 USE_PCRE_JIT=1
DEBUG = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS
Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT +PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=2).
Built with OpenSSL version : OpenSSL 1.1.1k FIPS 25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k FIPS 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.4.4
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.5.0 20210514 (Red Hat 8.5.0-20)
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : none
Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
Here is my configuration, with the unrelated PostgreSQL parts and the other HTTPS URL paths omitted:
global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/haproxy.sock mode 600 level admin
pidfile /var/run/haproxy.pid
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
log global
mode http
option dontlognull
retries 2
timeout connect 4s
timeout client 30m
timeout server 30m
timeout check 5s
# Temporary detailed logging
log-format "Client IP:port = [%ci:%cp], Start Time = [%tr], Frontend Name = [%ft], Backend Name = [%b], Backend Server = [%s], Time to receive full request = [%TR ms], Response time = [%Tr ms], Status Code = [%ST], Bytes Read = [%B], Request = [%{+Q}r], Request Body = [%[capture.req.hdr(0)]]"
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
frontend main
timeout client 86400000
bind :443 ssl crt /etc/haproxy/haproxy.crt
option http-buffer-request
declare capture request len 40000
http-request capture req.body id 0
capture request header origin len 128
# Many other URL mappings and other use_backend directives omitted here
acl url_apigee path_beg -i /apigee-connector
use_backend voda-apigee-conn-be if url_apigee
default_backend deny_be
# Many other backend definitions omitted here
backend voda-apigee-conn-be
balance roundrobin
option httpchk
http-check send meth GET uri /actuator/health
server api1 x.x.x.x:8002 check inter 10s fall 3 rise 2 ssl verify none
server api2 y.y.y.y:8002 check inter 10s fall 3 rise 2 ssl verify none
backend deny_be
http-request deny
When I call the backend directly with cURL, I get an HTTP 200 response, and the backend also performs the expected work internally:
curl -vvv -k -w "@curl-format.txt" -X POST -H "X-API-Key: my-api-key-1" -H "Content-Type: application/json" -d @apigee-conn-email3.json https://x.x.x.x:8002/apigee-connector/outbound-communication
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying x.x.x.x...
* TCP_NODELAY set
* Connected to x.x.x.x (x.x.x.x) port 8002 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: <omitted>
* start date: Dec 8 07:51:33 2023 GMT
* expire date: Dec 7 07:51:32 2025 GMT
* issuer: <omitted>
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> POST /apigee-connector/outbound-communication HTTP/1.1
> Host: x.x.x.x:8002
> User-Agent: curl/7.61.1
> Accept: */*
> X-API-Key: my-api-key-1
> Content-Type: application/json
> Content-Length: 344
>
* upload completely sent off: 344 out of 344 bytes
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200
< Server: nginx
< Date: Wed, 28 Feb 2024 16:38:26 GMT
< Transfer-Encoding: chunked
< Connection: keep-alive
< Expires: 0
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Set-Cookie: JSESSIONID=YWLvqrZAAMHBgbOLK9Q-nz2vrxhs9mKw_Yt92lQ5.cgwcsmfuatapp1; path=/; secure; HttpOnly
< X-XSS-Protection: 1; mode=block
< Pragma: no-cache
< X-Frame-Options: DENY
< X-Content-Type-Options: nosniff
< Strict-Transport-Security: max-age=31536000 ; includeSubDomains
< RequestId: 59d32aad-4294-4811-a385-a4e65d68065f
< Quota-Reset: 1709139600925
< Quota-Allowed: 10000
< Quota-Available: 9998
< Content-Type: application/json
< Transfer-Encoding: chunked
<
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection #0 to host x.x.x.x left intact
{"id":"94440760","attachment":{"id":{"value":"DOC-20240228-173826-WNHPC"}}}
time_namelookup: 0.000076s
time_connect: 0.000414s
time_appconnect: 0.026246s
time_pretransfer: 0.026375s
time_redirect: 0.000000s
time_starttransfer: 2.132503s
----------
time_total: 2.133572s
When I make the same call through HAProxy:
curl -vvv -k -w "@curl-format.txt" -X POST -H "X-API-Key: my-api-key-1" -H "Content-Type: application/json" -d @apigee-conn-email3.json https://z.z.z.z/apigee-connector/outbound-communication
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying z.z.z.z...
* TCP_NODELAY set
* Connected to z.z.z.z (z.z.z.z) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: <omitted>
* start date: Jan 31 20:50:09 2024 GMT
* expire date: Jan 30 20:50:08 2026 GMT
* issuer: <omitted>
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> POST /apigee-connector/outbound-communication HTTP/1.1
> Host: z.z.z.z
> User-Agent: curl/7.61.1
> Accept: */*
> X-API-Key: my-api-key-1
> Content-Type: application/json
> Content-Length: 344
>
* upload completely sent off: 344 out of 344 bytes
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* HTTP 1.0, assume close after body
< HTTP/1.0 502 Bad Gateway
< cache-control: no-cache
< content-type: text/html
<
<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
* TLSv1.3 (IN), TLS alert, [no content] (0):
* TLSv1.3 (IN), TLS alert, close notify (256):
* Closing connection 0
* TLSv1.3 (OUT), TLS alert, [no content] (0):
* TLSv1.3 (OUT), TLS alert, close notify (256):
time_namelookup: 0.000073s
time_connect: 0.001454s
time_appconnect: 0.023841s
time_pretransfer: 0.023926s
time_redirect: 0.000000s
time_starttransfer: 2.589714s
----------
time_total: 2.589899s
HAProxy log:
Feb 28 18:01:38 localhost haproxy[3223599]: Client IP:port = [a.a.a.a:34660], Start Time = [28/Feb/2024:18:01:36.068], Frontend Name = [main~], Backend Name = [voda-apigee-conn-be], Backend Server = [api2], Time to receive full request = [0 ms], Response time = [-1 ms], Status Code = [502], Bytes Read = [208], Request = ["POST https://z.z.z.z/apigee-connector/outbound-communication HTTP/2.0"], Request Body = [<JSON body omitted>]
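I always get 502 Bad Gateway. However, the backend still fully processes the request as expected and reports that it generated a response. At least, I see exactly the same log messages at the application level in both cases.
I noticed that HAProxy switched to HTTP/2, so I tried forcing HTTP/1.1, both with the alpn directive in the HAProxy configuration and with the --http1.1 flag on the cURL command line. The request then goes over HTTP/1.1, but the result is still 502 Bad Gateway.
What could be going wrong here?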
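- - - UPDATE - - -
After AlexD's comment I changed my logging. I basically added every %Tx parameter I could find, each with a human-readable name, because I will never remember what Tr, TR, Th and so on mean: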
log-format "Client IP:port = [%ci:%cp], Start Time = [%tr], Frontend Name = [%ft], Backend Name = [%b], Backend Server = [%s], Active time of the request = [%Ta ms], Time to establish TCP connection to the server = [%Tc ms], SSL handshake time = [%Th ms], Idle time before the HTTP request = [%Ti ms], Time to get the client's request = [%Tq ms], Time to receive full request = [%TR ms], Response time = [%Tr ms], Total session duration time = [%Tt ms], Status Code = [%ST], Bytes Read = [%B], Termination state = [%ts], Request = [%{+Q}r], Request Body = [%[capture.req.hdr(0)]]"
The new log line after this change:
Feb 29 08:46:33 localhost haproxy[3388761]: Client IP:port = [10.215.30.29:37666], Start Time = [29/Feb/2024:08:46:31.067], Frontend Name = [main~], Backend Name = [voda-apigee-conn-be], Backend Server = [api1], Active time of the request = [2092 ms], Time to establish TCP connection to the server = [8 ms], SSL handshake time = [20 ms], Idle time before the HTTP request = [0 ms], Time to get the client's request = [20 ms], Time to receive full request = [0 ms], Response time = [-1 ms], Total session duration time = [2112 ms], Status Code = [502], Bytes Read = [208], Termination state = [PH], Request = ["POST /apigee-connector/outbound-communication HTTP/1.1"], Request Body = [<omitted>]
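This looks like useful information. The termination state is "PH", which seems to mean that the response was blocked during header processing. That is strange: I don't think our backend application returns any invalid headers, as you can see from the response to the direct cURL request above. I tried setting http-response strict-mode off in the frontend, but it did not change the behavior. I also could not find any further information about this "PH" termination state.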
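As a sanity check on the log line above, here is a minimal Python sketch that decodes the two-character termination state and checks one timer relationship. The lookup tables are paraphrased from the "Session state at disconnection" section of the HAProxy configuration manual and cover only a subset of the codes. The function names are hypothetical helpers, not part of any HAProxy tooling.

```python
# Subset of termination-state codes, paraphrased from the HAProxy manual.
# First character: which event ended the session.
FIRST_CHAR = {
    "C": "TCP session unexpectedly aborted by the client",
    "S": "TCP session aborted or explicitly refused by the server",
    "P": "session prematurely aborted by the proxy "
         "(limit, deny rule, or security check on the response)",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal session completion",
}
# Second character: what the proxy was doing when the session ended.
SECOND_CHAR = {
    "R": "waiting for a complete, valid REQUEST from the client",
    "C": "waiting for the CONNECTION to establish on the server",
    "H": "waiting for complete, valid response HEADERS from the server",
    "D": "session was in the DATA phase",
    "-": "normal session completion after end of data transfer",
}


def decode_termination(ts):
    """Return (cause, phase) descriptions for a two-character state."""
    cause = FIRST_CHAR.get(ts[0], "not in this subset")
    phase = SECOND_CHAR.get(ts[1], "not in this subset")
    return cause, phase


# Timers in milliseconds, copied from the log line above.
# A value of -1 means the corresponding event never completed.
timers = {"Ta": 2092, "Tc": 8, "Th": 20, "Ti": 0,
          "Tq": 20, "TR": 0, "Tr": -1, "Tt": 2112}


def tq_is_consistent(t):
    """Tq is documented as Th + Ti + TR, or -1 if any of them is -1."""
    parts = [t["Th"], t["Ti"], t["TR"]]
    if -1 in parts:
        return t["Tq"] == -1
    return t["Tq"] == sum(parts)


def missing_timers(t):
    """Names of the timers that were logged as -1."""
    return sorted(name for name, value in t.items() if value == -1)


cause, phase = decode_termination("PH")
print("cause:", cause)
print("phase:", phase)
print("Tq consistent:", tq_is_consistent(timers))
print("timers logged as -1:", missing_timers(timers))
```

For "PH" this reports that the proxy itself aborted the session while it was waiting for complete, valid response headers from the server. That agrees with Response time = [-1 ms]: Tr is the only timer logged as -1, so HAProxy never accepted a full set of response headers.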
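----- UPDATE 2 -----
I managed to enable the admin socket by running systemctl reload haproxy, though I don't know why that was necessary. After querying "show errors", this is what it shows: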
[admin@ccaas1t-postgres-t1 ~]$ echo "show errors" | sudo socat stdio /run/haproxy/haproxy.sock
Total events captured on [29/Feb/2024:11:24:00.722] : 1
[29/Feb/2024:11:23:53.236] backend voda-apigee-conn-be (#13): invalid response
frontend main (#5), server api1 (#1), event #0, src x.x.x.x:60930
buffer starts at 0 (including 0 out), 15627 free,
len 757, wraps at 16336, error at position 665
H1 connection flags 0x80000000, H1 stream flags 0x00004810
H1 msg state MSG_HDR_L2_LWS(24), H1 msg flags 0x00011654
H1 chunk len 0 bytes, H1 body len 0 bytes :
00000 HTTP/1.1 200 \r\n
00015 Server: nginx\r\n
00030 Date: Thu, 29 Feb 2024 10:23:53 GMT\r\n
00067 Transfer-Encoding: chunked\r\n
00095 Connection: keep-alive\r\n
00119 Expires: 0\r\n
00131 Cache-Control: no-cache, no-store, max-age=0, must-revalidate\r\n
00194 Set-Cookie: JSESSIONID=nGfILjPozmcVahj831vf21a6BXIxgqElGlE8zxqA.cgwcsm
00264+ fuatapp1; path=/; secure; HttpOnly\r\n
00300 X-XSS-Protection: 1; mode=block\r\n
00333 Pragma: no-cache\r\n
00351 X-Frame-Options: DENY\r\n
00374 X-Content-Type-Options: nosniff\r\n
00407 Strict-Transport-Security: max-age=31536000 ; includeSubDomains\r\n
00472 VFHU-RequestId: 20681405-102a-4eed-8eee-00ed8f3c37c8\r\n
00526 VFHU-Quota-Reset: 1709204400244\r\n
00559 VFHU-Quota-Allowed: 10000\r\n
00586 VFHU-Quota-Available: 9999\r\n
00614 Content-Type: application/json\r\n
00646 Transfer-Encoding: chunked\r\n
00674 \r\n
00676 4b\r\n
00680 {"id":"94464170","attachment":{"id":{"value":"DOC-20240229-112353-OTXV
00750+ J"}}}\r\n
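"error at position 665". Position 665 is in the second "Transfer-Encoding: chunked" header (the line starting at offset 646), just after the colon. That looks fine to me: the chunk size is sent as 4b in hexadecimal, which is 75 in decimal, and that is the size of the JSON. Also, we do not build this HTTP response by hand; a Java library does it, so I am fairly sure it should be fine.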
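To double-check the arithmetic, here is a small Python sketch that maps the reported offset back to a line of the dump and compares the chunk size with the JSON length. The start offsets and line contents are copied from the "show errors" output above; `locate` is a hypothetical helper written for this post.

```python
# (start offset, raw text) for the dump lines around the reported error.
DUMP_LINES = [
    (614, "Content-Type: application/json\r\n"),
    (646, "Transfer-Encoding: chunked\r\n"),
    (674, "\r\n"),
    (676, "4b\r\n"),
]

# JSON body exactly as it appears in the dump, starting at offset 680.
BODY = ('{"id":"94464170","attachment":'
        '{"id":{"value":"DOC-20240229-112353-OTXVJ"}}}')


def locate(pos):
    """Return (line start, column, character) for an absolute offset."""
    for start, text in DUMP_LINES:
        if start <= pos < start + len(text):
            column = pos - start
            return start, column, text[column]
    return None


print(locate(665))      # which line and character offset 665 lands on
print(int("4b", 16))    # chunk size announced by the server, in decimal
print(len(BODY))        # actual length of the JSON body
```

The offsets in the dump are zero-based, so each line's start plus its length (including the trailing \r\n) gives the start of the next line. That is also why the buffer length is 757: the body starts at 680, is 75 bytes long, and is followed by one \r\n.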