我最近按照基于 RPM 的 Linux 发行版的说明在 AWS 上部署了 RabbitMQ 服务器。
RabbitMQ 3.8.14
Erlang 23.3.1
这是一个只有一个虚拟主机(“/”)的单节点部署,我主要使用默认配置。我还修改了打开文件的限制,最初为 65536,目前为 150000。虽然使用的文件描述符保持在相对较低的水平。
我有两个连接到 RabitMQ 的应用程序。每个应用程序都使用自己的用户名和密码进行身份验证。两个用户都对虚拟主机和他们使用的主题拥有完全的权限。
我已经设置了 7 个交换和队列。都很耐用。
服务器启动没有任何问题,应用程序运行正常,能够与服务器通信,没有问题并从主题中消费,但它们随机崩溃。
在 rabbitmq 日志文件中,我可以看到以下内容
2021-04-21 02:49:27.342 [info] <0.32135.4> connection <0.32135.4> (10.11.234.236:39453 -> 172.24.22.82:5672): user 'mes' authenticated and granted access to vhost '/'
2021-04-21 02:49:27.374 [info] <0.32138.4> connection <0.32138.4> (10.11.252.54:23576 -> 172.24.22.82:5672): user 'mes' authenticated and granted access to vhost '/'
2021-04-21 03:02:37.689 [error] <0.31757.4> closing AMQP connection <0.31757.4> (10.11.161.117:45741 -> 172.24.22.82:5672):
{writer,send_failed,{error,timeout}}
2021-04-21 03:02:37.690 [info] <0.32596.4> Closing all channels from connection '10.11.161.117:45741 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:02:48.136 [info] <0.32614.4> accepting AMQP connection <0.32614.4> (10.11.161.117:2496 -> 172.24.22.82:5672)
2021-04-21 03:02:48.142 [info] <0.32614.4> connection <0.32614.4> (10.11.161.117:2496 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:03:18.346 [error] <0.32614.4> closing AMQP connection <0.32614.4> (10.11.161.117:2496 -> 172.24.22.82:5672):
{inet_error,enotconn}
2021-04-21 03:03:18.347 [info] <0.32674.4> Closing all channels from connection '10.11.161.117:2496 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:03:30.140 [info] <0.32694.4> accepting AMQP connection <0.32694.4> (10.11.161.117:54985 -> 172.24.22.82:5672)
2021-04-21 03:03:30.144 [info] <0.32694.4> connection <0.32694.4> (10.11.161.117:54985 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:04:00.387 [error] <0.32694.4> closing AMQP connection <0.32694.4> (10.11.161.117:54985 -> 172.24.22.82:5672):
{inet_error,enotconn}
2021-04-21 03:04:00.395 [info] <0.32752.4> Closing all channels from connection '10.11.161.117:54985 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:04:14.035 [info] <0.5.5> accepting AMQP connection <0.5.5> (10.11.161.117:63900 -> 172.24.22.82:5672)
2021-04-21 03:04:14.040 [info] <0.5.5> connection <0.5.5> (10.11.161.117:63900 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:04:44.270 [error] <0.5.5> closing AMQP connection <0.5.5> (10.11.161.117:63900 -> 172.24.22.82:5672):
{inet_error,enotconn}
2021-04-21 03:04:44.271 [info] <0.56.5> Closing all channels from connection '10.11.161.117:63900 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:04:44.316 [error] <0.14.5> ** Generic server <0.14.5> terminating
** Last message in was {'$gen_cast',terminate}
** When Server state == {ch,{conf,running,rabbit_framing_amqp_0_9_1,1,<0.5.5>,<0.12.5>,<0.5.5>,<<"10.11.161.117:63900 -> 172.24.22.82:5672">>,undefined,{user,<<"cron">>,[],[{rabbit_auth_backend_internal,none}]},<<"/">>,<<>>,<0.6.5>,[{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true},{<<"basic.nack">>,bool,true},{<<"publisher_confirms">>,bool,true},{<<"consumer_cancel_notify">>,bool,true}],none,0,134217728,undefined,#{},1000000000},{lstate,<0.13.5>,false},none,5514,{5438,{[{5513,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265917}},{5512,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265916}},{5511,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265915}},{5510,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265914}},{5509,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265913}},{5508,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265912}},{5507,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265911}},{5506,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265910}},{5505,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,1618988684271,{<0.522.0>,265909}},{5504,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmai...">>,...},...],...}},...}
** Reason for termination ==
** noproc
2021-04-21 03:04:44.317 [info] <0.14.5> [{initial_call,{rabbit_channel,init,['Argument__1']}},{pid,<0.14.5>},{registered_name,[]},{error_info,{exit,noproc,[{gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1183}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]}},{ancestors,[<0.11.5>,<0.9.5>,<0.4.5>,<0.3.5>,<0.853.0>,<0.852.0>,<0.851.0>,rabbit_sup,<0.274.0>]},{message_queue_len,38},{messages,[{'$gen_cast',{deliver,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,true,{{resource,<<"/">>,queue,<<"q.etm.mes.tmce_results">>},<0.522.0>,265918,true,{basic_message,{resource,<<"/">>,exchange,<<"x.etm.mes.tmce_results">>},[<<"test">>],{content,60,{'P_basic',<<"application/json">>,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},<<128,0,16,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110>>,rabbit_framing_amqp_0_9_1,[<<"{\"customerId\":5435103,\"campaignRunId\":66836,\"status\":\"Sent\",\"messageKey\":\"66836~5435103\",\"timeSent\":1618988531.764735000}">>]},<<225,81,222,134,43,120,7,4,135,105,190,34,66,200,149,86>>,false}}}},{'EXIT',<0.11.5>,shutdown},{'$gen_cast',{deliver,<<"TmceResultConsumer#void handleTmceResult(Envelope envelope,CampaignEmailResponseResource resource)">>,true,{{resource,<<"/">>,queue,<<"q.etm.mes.tmce_results">>},<0.522.0>,265919,true,{basic_message,{resource,<<"/">>,exchange,<<"x.etm.mes.tmce_results">>},[<<"test">>],{content,60,{'P_basic',<<"application/json">>,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},<<128,0,16,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110>>,rabbit_framing_amqp_0_9_1,[<<"{\"customerId\":8288025,\"campaignRunId\":66836,\"status\":\"Sent\",\"messageKey\":\"66836~8288025\",\"timeSent\":1618988531.764737000}">>]},<<201,223,49,27,101,206,136,61,160,111,163,73,226,167,54,31>>,false}}}},{'$gen_cast',{deliver,<<"TmceResultConsu...">>,...}},...]},...], []
2021-04-21 03:04:44.318 [error] <0.14.5> CRASH REPORT Process <0.14.5> with 0 neighbours exited with reason: no such process or port in call to gen_server2:terminate/3 line 1183
2021-04-21 03:04:44.318 [info] <0.11.5> supervisor: {<0.11.5>,rabbit_channel_sup}, errorContext: shutdown_error, reason: noproc, offender: [{pid,<0.14.5>},{id,channel},{mfargs,{rabbit_channel,start_link,[1,<0.5.5>,<0.12.5>,<0.5.5>,<<"10.11.161.117:63900 -> 172.24.22.82:5672">>,rabbit_framing_amqp_0_9_1,{user,<<"cron">>,[],[{rabbit_auth_backend_internal,none}]},<<"/">>,[{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true},{<<"basic.nack">>,bool,true},{<<"publisher_confirms">>,bool,true},{<<"consumer_cancel_notify">>,bool,true}],<0.6.5>,<0.13.5>]}},{restart_type,intrinsic},{shutdown,70000},{child_type,worker}]
2021-04-21 03:04:44.318 [error] <0.11.5> Supervisor {<0.11.5>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.5.5>, <0.12.5>, <0.5.5>, <<"10.11.161.117:63900 -> 172.24.22.82:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"cron">>,[],[{rabbit_auth_backend_internal,none}]}, <<"/">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.6.5>, <0.13.5>) at <0.14.5> exit with reason noproc in context shutdown_error
2021-04-21 03:04:50.625 [info] <0.73.5> accepting AMQP connection <0.73.5> (10.11.161.117:42412 -> 172.24.22.82:5672)
2021-04-21 03:04:50.630 [info] <0.73.5> connection <0.73.5> (10.11.161.117:42412 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:05:20.862 [error] <0.73.5> closing AMQP connection <0.73.5> (10.11.161.117:42412 -> 172.24.22.82:5672):
{inet_error,enotconn}
2021-04-21 03:05:20.863 [info] <0.131.5> Closing all channels from connection '10.11.161.117:42412 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:05:32.643 [info] <0.149.5> accepting AMQP connection <0.149.5> (10.11.161.117:10578 -> 172.24.22.82:5672)
2021-04-21 03:05:32.647 [info] <0.149.5> connection <0.149.5> (10.11.161.117:10578 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:06:02.926 [error] <0.149.5> closing AMQP connection <0.149.5> (10.11.161.117:10578 -> 172.24.22.82:5672):
{inet_error,enotconn}
2021-04-21 03:06:02.927 [info] <0.206.5> Closing all channels from connection '10.11.161.117:10578 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:06:15.925 [info] <0.228.5> accepting AMQP connection <0.228.5> (10.11.161.117:41072 -> 172.24.22.82:5672)
2021-04-21 03:06:15.929 [info] <0.228.5> connection <0.228.5> (10.11.161.117:41072 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:06:48.903 [error] <0.228.5> closing AMQP connection <0.228.5> (10.11.161.117:41072 -> 172.24.22.82:5672):
{inet_error,enotconn}
2021-04-21 03:06:48.904 [info] <0.279.5> Closing all channels from connection '10.11.161.117:41072 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:06:57.332 [info] <0.305.5> accepting AMQP connection <0.305.5> (10.11.161.117:14359 -> 172.24.22.82:5672)
2021-04-21 03:06:57.337 [info] <0.305.5> connection <0.305.5> (10.11.161.117:14359 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:07:27.574 [error] <0.305.5> closing AMQP connection <0.305.5> (10.11.161.117:14359 -> 172.24.22.82:5672):
{inet_error,enotconn}
2021-04-21 03:07:27.575 [info] <0.357.5> Closing all channels from connection '10.11.161.117:14359 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:07:34.426 [info] <0.375.5> accepting AMQP connection <0.375.5> (10.11.161.117:48428 -> 172.24.22.82:5672)
2021-04-21 03:07:34.432 [info] <0.375.5> connection <0.375.5> (10.11.161.117:48428 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:08:04.683 [error] <0.375.5> closing AMQP connection <0.375.5> (10.11.161.117:48428 -> 172.24.22.82:5672):
{inet_error,enotconn}
2021-04-21 03:08:04.684 [info] <0.432.5> Closing all channels from connection '10.11.161.117:48428 -> 172.24.22.82:5672' because it has been closed
2021-04-21 03:08:13.436 [info] <0.449.5> accepting AMQP connection <0.449.5> (10.11.161.117:23879 -> 172.24.22.82:5672)
2021-04-21 03:08:13.442 [info] <0.449.5> connection <0.449.5> (10.11.161.117:23879 -> 172.24.22.82:5672): user 'cron' authenticated and granted access to vhost '/'
2021-04-21 03:08:43.668 [error] <0.449.5> closing AMQP connection <0.449.5> (10.11.161.117:23879 -> 172.24.22.82:5672):
{inet_error,enotconn}
崩溃后,除非我重新启动 RabbitMQ 服务器,否则应用程序无法恢复并与 RabbitMQ 通信。
此外,当我检查时,rabbitmq-server.service
我可以看到它处于活动状态。
当我检查时也会发生同样的事情rabbitmq-diagnostics is_running
Asking node rabbit@rabbitmq-test1 for its status ...
RabbitMQ on node rabbit@rabbitmq-test1 is fully booted and running
到目前为止,我还没有找到关于何时发生这种情况的模式。关于为什么会发生这种情况的任何想法?
事实证明,这个问题与 rabbitMQ 配置无关。这是由于应用程序和 MongoDB(我们也在使用)之间的通信缓慢导致应用程序和 RabbitMQ 之间的连接超时。
您正在处理错误的 forloop 并且内存不足,因为您正在打开许多通道或连接,试图处理每个交换和每个交换中持有的队列之间的通信。最有可能的。
当您再次启动脚本时,查看您有多少个连接以及您有多少个频道。如果您的应用程序正在接收不遵循任何模式的消息,并且您不正确地循环通过每个交换和每个队列以引导每条消息,那么您将耗尽您的内存而没有模式来复制问题,因为进入您的发布者的消息遵循没有因此,理性模式很难缩小问题所在。从开始到结束,您的连接时间戳似乎在很短的时间内,这表明您正在接收大量消息,并且您的 for 循环正在为每个交换和队列创建不必要的连接,因为您没有处理 forloop 和连接后者正确。