# cat /etc/sysctl.conf
fs.aio-max-nr=99999999
fs.file-max=99999999
kernel.pid_max=4194304
kernel.threads-max=99999999
kernel.sem=32768 1073741824 2000 32768
kernel.shmmni=32768
kernel.msgmni=32768
kernel.msgmax=65536
kernel.msgmnb=65536
vm.max_map_count=1048576
# cat /etc/security/limits.conf
* soft core unlimited
* hard core unlimited
* soft data unlimited
* hard data unlimited
* soft fsize unlimited
* hard fsize unlimited
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 1048576
* hard nofile 1048576
* soft rss unlimited
* hard rss unlimited
* soft stack unlimited
* hard stack unlimited
* soft cpu unlimited
* hard cpu unlimited
* soft nproc unlimited
* hard nproc unlimited
* soft as unlimited
* hard as unlimited
* soft maxlogins unlimited
* hard maxlogins unlimited
* soft maxsyslogins unlimited
* hard maxsyslogins unlimited
* soft locks unlimited
* hard locks unlimited
* soft sigpending unlimited
* hard sigpending unlimited
* soft msgqueue unlimited
* hard msgqueue unlimited
# cat /etc/systemd/logind.conf
[Login]
UserTasksMax=infinity
# free -g
              total        used        free      shared  buff/cache   available
Mem:            117           5          44          62          67          48
Swap:            15           8           7
# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       194G  121G   74G  63% /
# cat /proc/meminfo
MemTotal: 123665416 kB
MemFree: 90979152 kB
MemAvailable: 95376636 kB
Buffers: 72260 kB
Cached: 25964076 kB
SwapCached: 0 kB
Active: 8706568 kB
Inactive: 22983044 kB
Active(anon): 7568968 kB
Inactive(anon): 18871224 kB
Active(file): 1137600 kB
Inactive(file): 4111820 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 16777212 kB
SwapFree: 16777212 kB
Dirty: 20 kB
Writeback: 0 kB
AnonPages: 5653128 kB
Mapped: 185100 kB
Shmem: 20786924 kB
KReclaimable: 281732 kB
Slab: 541000 kB
SReclaimable: 281732 kB
SUnreclaim: 259268 kB
KernelStack: 34384 kB
PageTables: 93216 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 78609920 kB
Committed_AS: 63750908 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 46584 kB
VmallocChunk: 0 kB
Percpu: 18944 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 183484 kB
DirectMap2M: 5058560 kB
DirectMap1G: 122683392 kB
And for the user account used to run the scripts:
$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
However:
./somescript.sh: fork: retry: Resource temporarily unavailable
The server is under moderate load (a load average of around 20 at the moment) and runs many scripts that do a lot of forking (i.e., command substitution like $(somecode) inside many of the scripts). The server (a Google Cloud instance) has 16 cores, 128 GB RAM, a 100 GB tmpfs drive, and 16 GB of swap. The message appears even while CPU, memory, and swap are all below 50% utilization, so it is hard to believe any of these already-high limits is actually being hit. I suspect some other setting is affecting this.

What else can be tuned to avoid the fork: retry: Resource temporarily unavailable problem?
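For completeness, the limits that typically gate fork can be surveyed in one go. A minimal sketch (the cgroup path assumes systemd with a cgroup v1 pids controller and will differ on cgroup v2 systems):

# Sketch: compare the user's live task count against the caps that gate fork.
echo "tasks for $USER:    $(ps --no-headers -L -u "$USER" | wc -l)"
echo "ulimit -u:          $(ulimit -u)"
echo "kernel.pid_max:     $(sysctl -n kernel.pid_max)"
echo "kernel.threads-max: $(sysctl -n kernel.threads-max)"
# systemd's per-user task cap (path assumes a cgroup v1 pids controller)
cat "/sys/fs/cgroup/pids/user.slice/user-$(id -u).slice/pids.max" 2>/dev/null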
After more debugging, I finally found the answer. It seems valuable enough to record here, since others may run into the same problem. It may also be a bug in Ubuntu (to be determined).
My scripts made the following change in various places (inside the scripts):

ulimit -u 20000

Depending on the script/situation, this 20000 number would vary from 2000 to 40000. So what seems to have happened is that once the many processes somehow "maxed out" the total maximum number of open files (1048576), which seems easily done with even a limited number of scripts each multiplied by its respective ulimit setting, the result was that at most around 2000-2200 threads would start.
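As a side note on the mechanism: ulimit -u sets RLIMIT_NPROC, and the kernel checks that limit at fork time against the total number of tasks owned by the user, not just against the children of the script that lowered it. A minimal sketch of how one low in-script value can break forking for the whole account (the 2000 cap and the sleep workload are arbitrary illustration values):

#!/usr/bin/env bash
# RLIMIT_NPROC is enforced against all of this user's tasks at fork time.
ulimit -u 2000                # an in-script cap like the ones described above
have=$(ps --no-headers -L -u "$USER" | wc -l)
echo "user already runs $have tasks; roughly $((2000 - have)) more forks can succeed"
for i in $(seq 1 2100); do
  sleep 60 &                  # each background job adds one task toward the cap
done                          # bash reports "fork: retry: ..." once 2000 is reached
wait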
I removed all of the ulimit -u settings, and now there are no more fork: retry: Resource temporarily unavailable errors, nor any other fork-related errors. htop also now shows more than 2000-2200 threads.
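Limits are per-process and inherited across fork, so they can silently differ from what limits.conf suggests; the effective values of any live process can be read from procfs, e.g. for the current shell:

grep -E 'Max (processes|open files)' /proc/$$/limits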
Granted, my machine now becomes overloaded/unresponsive, but that is a different problem (the server is probably swapping) and a more pleasant one than the fork problem :) (As an interesting side note and reference, https://stackoverflow.com/questions/30757919/the-limit-of-ulimit-hn describes how to raise the maximum number of open files beyond 1048576.)
Setting up a test for this should be easy (a bash script of nested forks, with a ulimit -n ${some_large_value} set inside each forked thread).
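A sketch of that test, under the assumption that each fork level applies an in-script limit the way the original scripts did (the depth, the 20000, and the sleep duration are arbitrary placeholders):

#!/usr/bin/env bash
# Nested-fork reproducer: every level sets a large nofile limit, then forks twice.
recurse() {
  local depth=$1
  ulimit -n 20000 2>/dev/null   # per-fork limit change, as in the original scripts
  if (( depth == 0 )); then sleep 30; return; fi
  recurse $((depth - 1)) &      # each & forks a subshell that repeats the pattern
  recurse $((depth - 1)) &
  wait
}
recurse 10                      # ~2^10 concurrent leaves; watch for fork: retry

Substituting ulimit -u for ulimit -n in the same skeleton exercises the process-count variant described above.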