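Updates: 5 (20171209)
Updates: 5 (20171210)
- mount -t nfs4 [SERVER IP]:/archlinux /mnt works.
- ss -ntp | grep 2049 shows that the CLIENT establishes a connection to the SERVER before systemd starts.
- Can the NFSv4 id mapper only be used together with Kerberos?
Problem
I am trying to set up a diskless node/workstation/system. The operating system (4.13.12-1-ARCH) is installed on the SERVER under /srv/archlinux. After a successful network boot from GRUB into NFSv4, systemd starts but fails at several stages, for example:
- Failed to mount Kernel Configuration File System.
- Failed to mount Kernel Debug File System.
- Failed to mount Huge Pages File System.
- Failed to start Load/Save Random Seed.
- Failed to mount /tmp.
- Failed to start Rebuild Journal Catalog.
- Then it ends with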
Not tainted 4.13.12-1-ARCH #1...
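Or,
- Failed to mount POSIX Message Queue File System.
- Failed to start Remount Root and Kernel File Systems.
- Failed to mount Huge Pages File System.
- Failed to mount Kernel Debug File System.
- Failed to mount Kernel Configuration File System.
- Then it ends with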
Not tainted 4.13.12-1-ARCH #1...
I suspect that these failures are caused by a misconfiguration of NFSv4 or of the local network.
rpc.idmapd
/etc/idmapd.conf
[General]
Verbosity = 7
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Domain = localdomain
[Mapping]
Nobody-User = nobody
Nobody-Group = nobody
[Translation]
Method = nsswitch
/etc/exports
(printed using # exportfs -v)
/srv <world>(rw,sync,wdelay,hide,no_subtree_check,fsid=0,sec=sys,no_root_squash,no_all_squash)
/srv/archlinux <world>(rw,sync,wdelay,hide,no_subtree_check,sec=sys,no_root_squash,no_all_squash)
(Exposed to "world" for debugging purposes)
Running rpc.idmapd -fvvv in a separate tty during boot logs the following:
rpc.idmapd: libnfsidmap: using domain: localdomain
rpc.idmapd: libnfsidmap: Realms list: 'LOCALDOMAIN'
rpc.idmapd: libnfsidmap: processing 'Method' list
rpc.idmapd: libnfsidmap: loaded plugin /usr/lib/libnfsidmap/nsswitch.so for method nsswitch
rpc.idmapd: Expiration time is 600 seconds.
rpc.idmapd: Opened /proc/net/rpc/nfs4.nametoid/channel
rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel
rpc.idmapd: nfsdcb: authbuf=* authtype=user
rpc.idmapd: nfs4_uid_to_name: calling nsswitch->uid_to_name
rpc.idmapd: nfs4_uid_to_name: nsswitch->uid_to_name returned 0
rpc.idmapd: nfs4_uid_to_name: final return value is 0
rpc.idmapd: Server : (user) id "0" -> name "root@localdomain"
If the export uses sec=sys, the log continues as follows:
rpc.idmapd: nfsdcb: authbuf=* authtype=user
rpc.idmapd: nfs4_name_to_uid: calling nsswitch->name_to_uid
rpc.idmapd: nss_getpwnam: name '0' domain 'localdomain': resulting localname '(null)'
rpc.idmapd: nss_getpwnam: name '0' does not map into domain 'localdomain'
rpc.idmapd: nfs4_name_to_uid: nsswitch->name_to_uid returned -22
rpc.idmapd: nfs4_name_to_uid: final return value is -22
rpc.idmapd: Server : (user) name "0" -> id "99"
(stops here)
+(20171209) After making sure that /etc/hostname on the CLIENT is set to client2 (duh), if the export uses sec=none or sec=sys, the log continues as follows:
rpc.idmapd: nfsdcb: authbuf=* authtype=group
rpc.idmapd: nfs4_gid_to_name: calling nsswitch->gid_to_name
rpc.idmapd: nfs4_gid_to_name: nsswitch->gid_to_name returned 0
rpc.idmapd: nfs4_gid_to_name: final return value is 0
rpc.idmapd: Server : (group) id "190" -> name "systemd-journal@localdomain"
rpc.idmapd: nfsdcb: authbuf=* authtype=user
rpc.idmapd: nfs4_name_to_uid: calling nsswitch->name_to_uid
rpc.idmapd: nss_getpwnam: name '0' domain 'localdomain': resulting localname '(null)'
rpc.idmapd: nss_getpwnam: name '0' does not map into domain 'localdomain'
rpc.idmapd: nfs4_name_to_uid: nsswitch->name_to_uid returned -22
rpc.idmapd: nfs4_name_to_uid: final return value is -22
rpc.idmapd: Server : (user) name "0" -> id "99"
(stops here)
If I instead change the method from nsswitch to static (UID mapping in NFS)
/etc/idmapd.conf
...
[Translation]
Method = static
[Static]
root@localdomain = root
Running rpc.idmapd -fvvv in a separate tty during boot logs the following:
rpc.idmapd: libnfsidmap: using domain: localdomain
rpc.idmapd: libnfsidmap: Realms list: 'LOCALDOMAIN'
rpc.idmapd: libnfsidmap: processing 'Method' list
rpc.idmapd: static_getpwnam: name 'root@localdomain' mapped to 'root'
rpc.idmapd: static_getpwnam: group 'root@localdomain' mapped to ' root'
rpc.idmapd: libnfsidmap: loaded plugin /usr/lib/libnfsidmap/static.so for method static
rpc.idmapd: Expiration time is 600 seconds.
rpc.idmapd: Opened /proc/net/rpc/nfs4.nametoid/channel
rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel
rpc.idmapd: nfsdcb: authbuf=* authtype=user
rpc.idmapd: nfs4_uid_to_name: calling static->uid_to_name
rpc.idmapd: nfs4_uid_to_name: static->uid_to_name returned 0
rpc.idmapd: nfs4_uid_to_name: final return value is 0
rpc.idmapd: Server : (user) id "0" -> name "root@localdomain"
If the export uses sec=sys, the log continues as follows:
rpc.idmapd: nfsdcb: authbuf=* authtype=user
rpc.idmapd: nfs4_name_to_uid: calling static->name_to_uid
rpc.idmapd: nfs4_name_to_uid: static->name_to_uid returned -2
rpc.idmapd: nfs4_name_to_uid: final return value is -2
rpc.idmapd: Server : (user) name "0" -> id "99"
(stops here)
If the export uses sec=none, the log continues as follows:
rpc.idmapd: nfsdcb: authbuf=* authtype=group
rpc.idmapd: nfs4_gid_to_name: calling static->gid_to_name
rpc.idmapd: nfs4_gid_to_name: static->gid_to_name returned -2
rpc.idmapd: nfs4_gid_to_name: final return value is -2
rpc.idmapd: Server : (group) id "190" -> name "nobody"
rpc.idmapd: nfsdcb: authbuf=* authtype=user
rpc.idmapd: nfs4_name_to_uid: calling static->name_to_uid
rpc.idmapd: nfs4_name_to_uid: static->name_to_uid returned -2
rpc.idmapd: nfs4_name_to_uid: final return value is -2
rpc.idmapd: Server : (user) name "0" -> id "99"
(stops here)
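Similar questions on user ID mapping:
- NFSv4 user mapping
- NFS user mapping
- Mapping the UID and GID of a local user to a mounted NFS share
- and many more... usually related to the switch from NFSv3 to NFSv4, and rarely to network booting.
Troubleshooting
- No firewall
- No Kerberos, LDAP, etc.
- No SELinux
- The user root exists on both SERVER and CLIENT, with the same password.
SERVER
All other relevant configuration files for NFSv4 that I could identify on the SERVER.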
/etc/nsswitch.conf
passwd: compat mymachines systemd
group: compat mymachines systemd
shadow: compat
publickey: files
hosts: files mymachines resolve [!UNAVAIL=return] dns myhostname
networks: files
protocols: files
services: files
ethers: files
rpc: files
netgroup: files
/etc/nfs.conf
(all settings commented out)
/etc/conf.d/nfs-common.conf
(all settings commented out)
Network configuration
The SERVER hostname is server and it has 3 network devices (nd[1-3]). The gateway is default via 192.168.0.1 nd1.
/etc/hosts
127.0.0.1 localhost.localdomain localhost
::1 ip6.localhost localhost
192.168.0.101 nd1.localdomain server servernd1
192.168.1.101 nd2.localdomain server servernd2
192.168.2.101 nd3.localdomain server servernd2
192.168.1.102 client1.localdomain client1
192.168.2.102 client2.localdomain client2
/etc/resolvconf.conf
name_servers=192.168.0.1
# hostname -f
# nd1.localdomain
# hostname -i
192.168.0.101 192.168.1.101 192.168.2.101
# getent hosts IP -> the corresponding line in /etc/hosts
# getent ahosts HOSTNAME -> the corresponding line in /etc/hosts
# ping -c 3 server.localdomain -> 0% packet loss
# id -u root -> 0
# id -un 0 -> root
Display the system's effective NFSv4 domain name on stdout.
# nfsidmap -d -> localdomain
Display on stdout all keys currently in the keyring used to cache ID mapping results. These keys are visible only to the superuser.
# nfsidmap -l -> nfsidmap: '.id_resolver' keyring was not found.
CLIENT
/etc/hostname +(20171209)
client2
/etc/hosts
(exactly the same as the hosts file on the server)
/etc/resolvconf.conf
name_servers=192.168.0.1
/etc/idmapd.conf
(exactly the same as the idmapd.conf file on the server)
/etc/fstab
# sec=sys or sec=none to correspond to server export settings.
/dev/nfs / nfs rw,hard,rsize=9151,sec=sys,clientaddr=192.168.2.102 0 0
devtmpfs /dev devtmpfs defaults
proc /proc proc defaults
none /run tmpfs defaults
sys /sys sysfs defaults
run /run tmpfs defaults
tmp /tmp tmpfs defaults
The fstab was produced with the help of findmnt -A.
net_nfs4
- +(20171210) NFS versions on SERVER and CLIENT: cat /proc/fs/nfsd/versions -> -2 +3 +4 +4.1 +4.2
- On SERVER and CLIENT: cat /sys/module/nfsd/parameters/nfs4_disable_idmapping -> N
- On the SERVER: echo "options nfsd nfs4_disable_idmapping=0" > /etc/modprobe.d/nfsd.conf
- On the CLIENT, /sys/module/nfs/parameters/nfs4_disable_idmapping does not exist, and I am not sure how to create it manually because /sys is read-only.
- +(20171210) On the CLIENT: echo "options nfs nfs4_disable_idmapping=0" > /etc/modprobe.d/nfs.conf
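The checks in this list could be wrapped in a small helper along the following lines. This is only a sketch: idmapping_state and its optional second argument (a substitute sysfs root, used here only so the function can be exercised without a real /sys) are hypothetical names, not part of nfs-utils. It assumes that a missing parameter file usually means the module is not loaded at that moment, which may or may not be what happened on the CLIENT.

```shell
# Hypothetical helper (not part of nfs-utils).
# $1 = module name (nfs on a client, nfsd on a server)
# $2 = sysfs root; defaults to /sys and is overridable only for testing
idmapping_state() {
    param="${2:-/sys}/module/$1/parameters/nfs4_disable_idmapping"
    if [ -r "$param" ]; then
        # Y: the id mapper is disabled; N: the id mapper is enabled.
        printf '%s: %s\n' "$1" "$(cat "$param")"
    else
        # Assumption: the file is absent because the module is not loaded.
        printf '%s: parameter not present (module not loaded?)\n' "$1"
        return 1
    fi
}
```

Calling idmapping_state nfsd on the SERVER and idmapping_state nfs on the CLIENT would then report both sides in the same format.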
The CLIENT IP is 192.168.2.102/24. The CLIENT network device is connected to SERVER nd2 192.168.2.101/24 (hostname: servernd2).
Network information at boot:
:: running early hook [udev]
starting version 235
:: running hook [udev]
:: Triggering uevents...
:: running hook [net_nfs4]
IP-Config: eth0 hardware address [CLIENT NETWORK DEVICE MAC] mtu 1500 DHCP
hostname client2 IP-Config: eth0 guessed broadcast address 192.168.2.255
IP-Config: eth0 complete (from 192.168.0.101):
address: 192.168.2.102 broadcast: 192.168.2.255 netmask: 255.255.255.0
gateway: 192.168.2.101 dns0 : 192.168.0.1 dns1 : 0.0.0.0
host : client2
domain : localdomain
rootserver: 192.168.0.101 rootpath: /srv/archlinux
filename : /netboot/grub/i386-pc/core.0
NFS-Mount: 192.168.2.101:/archlinux
Waiting 10 seconds for device /dev/nfs ...
(systemd takes over from here)
Why do the NFSv4 errors occur?
Server : (group) id "190" -> name "nobody"
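In NFSv4, things have changed: users are mapped by user name, and the mapping between user names and user IDs is handled by a process called the "ID mapping daemon" (idmapd). In particular, the NFSv4 client and server should use the same domain for the mapping to work properly, otherwise requests will be mapped to the anonymous user/group. -- Trying NFSv4 (on Linux and Solaris) -- March 15, 2012 - 13:03 / bronto
In an ideal world, the user and group of the requesting client would determine the permissions of the data returned. We do not live in an ideal world. Two real-world problems intervene:
- You might not trust the root user of a client with root access to the server's files.
- The same user name on client and server might have different numeric IDs.
Problem 1 is conceptually simple. John Q. Programmer is given a test machine on which he has root access. In no way does that mean that John Q. Programmer should be able to alter root-owned files on the server. Therefore NFS offers root squashing, a feature that maps uid 0 (root) to the anonymous (nfsnobody) uid, which defaults to -2 (65534 as a 16-bit number). -- NFS: Overview and Gotchas -- Copyright (C) 2003 by Steve Litt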
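The "-2 versus 65534" remark in the quote is ordinary two's complement wrap-around of a 16-bit id. A minimal sketch of the arithmetic (to_uid16 is a hypothetical helper name, used only for illustration):

```shell
# Interpret a signed id as an unsigned 16-bit value.
to_uid16() {
    echo $(( $1 & 0xFFFF ))
}

to_uid16 -2    # prints 65534
```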
+(20171209) rpc.idmapd: nss_getpwnam: name '0' domain 'localdomain': resulting localname '(null)'
According to a comment by Steve Dickson (2011-08-12 16:01:55 EDT) on the Red Hat Bugzilla report for Bug 715430:
The [error] statement explains the problem. DNS on the local machine is not set up (or is returning NULL), and the Domain= variable in /etc/idmapd.conf is not set.
nss_getpwnam: name '0' does not map into domain
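On the Debian mailing list, the error is explained in detail in an email exchange about "Kerberos-secured NFSv4" between Jonas Meurer and Christian Seiler (20150722). My summary of the discussion:
When the NFS client sends nss_getpwnam: name '8' domain 'freesources.org': resulting localname '(null)'
The NFS client, under some circumstances, sends only the uid converted to a string instead of a properly translated NFS user name, and the server then rejects that name.
The client should send nss_getpwnam: name 'mail@freesources.org' domain 'freesources.org': resulting localname 'mail'
Here you can see that the owner name transmitted by the NFS client is "mail@freesources.org" (and not just "8"), so it does contain an @; nss_getpwnam sees that the domain name matches and strips it, yielding the user name "mail", which it looks up in /etc/passwd and returns the user id (8 in this case, since it is the same on client and server), and the server is perfectly happy.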
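The decision described in the quoted explanation can be sketched as follows. This is not the libnfsidmap code, only an illustration of the behaviour seen in the logs: a name of the form name@domain with a matching domain is stripped to the local name, while a bare numeric string such as "0" has no domain part and is rejected. map_owner_to_localname is a hypothetical name, and the return value 22 stands in for the -22 (EINVAL) shown by rpc.idmapd; any case-insensitive domain matching the real library may do is not modelled.

```shell
# Hypothetical illustration, not the libnfsidmap implementation.
# $1 = owner string received from the peer
# $2 = local idmapd domain
# Prints the local user name on success, returns 22 otherwise.
map_owner_to_localname() {
    owner=$1
    domain=$2
    case $owner in
        *@"$domain")
            # Domain matches: strip "@domain" and keep the user name.
            name=${owner%"@$domain"}
            printf '%s\n' "$name"
            ;;
        *)
            # No matching "@domain" suffix, e.g. a bare "0" or "8".
            return 22
            ;;
    esac
}

map_owner_to_localname root@localdomain localdomain    # prints root
```

With the owner string "0" and the domain localdomain the function returns 22 without printing anything, which mirrors the "does not map into domain" lines in the logs.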
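So why would the client send the wrong user name? ... Every once in a while the idmapping fails, so the kernel just sends a number. But that number then causes the chown command to fail, because the server does not translate it back.
Short answer: I don't know.
Longer answer: ...
If I understand the longer answer correctly, the problem can arise because the NFS client relies on the "kernel's key cache". For the NFS server this can never be a problem, because the "kernel's key cache" is never used.
Nevertheless,
Since you are just using regular nsswitch via /etc/passwd, nss_getpwnam should never fail in your case, unless you are doing something weird with /etc/passwd at the same time.
The answer also mentions an alternative to idmapd, nfsidmap, although from reading the man page I do not quite understand how it would replace idmapd.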
+(20171209) nss_getpwnam: name '[email protected]' does not map into domain 'localdomain'
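I do not seem to get this error message, but I include the answer from the SUSE Support Knowledgebase (10-DEC-13, modified 12-OCT-17) because its description of the cause and its proposed remedy contrast sharply with the other discussions I found.
NFSv4 handles user identities differently than NFSv3. In v3, an nfs client would simply pass a UID number in chown (and other requests), and the nfs server would accept it (even if the nfs server did not know of an account with that UID number). However, v4 was designed to pass identities in the form user@domain. To function correctly, that normally requires idmapd (the id mapping daemon) to be active on client and server, and each to consider itself part of the same id mapping domain.
chown failures or idmapd errors like those recorded above are typically the result of either:
- the user name being known to the client but not to the server, or
- the idmapd domain name being set differently on the client than on the server.
Therefore, this can be solved by making sure that the nfs server and client are configured with the same idmapd domain name (/etc/idmapd.conf) and that both know the user names/accounts in question.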
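A quick way to compare the domain setting of two idmapd.conf files (for example the SERVER copy and the CLIENT copy living under /srv/archlinux/etc) could look like the sketch below. Both function names are hypothetical, the parsing is deliberately naive (first uncommented Domain = line, no trimming of trailing whitespace, exact string comparison), and it says nothing about whether both sides know the same accounts.

```shell
# Print the value of the first "Domain = ..." line in an idmapd.conf-style file.
idmapd_domain() {
    sed -n 's/^[[:space:]]*Domain[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Succeed only if both files name the same, non-empty domain.
same_idmapd_domain() {
    a=$(idmapd_domain "$1")
    b=$(idmapd_domain "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For instance, same_idmapd_domain /etc/idmapd.conf /srv/archlinux/etc/idmapd.conf && echo "domains match" would be one way to run it on the SERVER, assuming the client root lives at that path as described in the question.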
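However, ensuring that both sides have the same knowledge of user accounts is often not convenient, especially when the nfs server is a filer. The NFS community has recognized that this idmapd feature of NFSv4 is often more trouble than it is worth, so some steps and modifications have been made to allow the NFSv3 behavior to work even under NFSv4.
The proposed remedy is to disable idmapd.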
nfs.nfs4_disable_idmapping=1
+(20171209) Wireshark
Analyzing the Wireshark log, it is quite extensive but begins with something like:
[IP CLIENT] -> [IP SERVER] NFS 226 V4 Call ACCESS FH: [HEX VALUE], [Check: RD LU MD XT DL]
[IP SERVER] -> [IP CLIENT] NFS 238 V4 Reply (Call In 34) ACCESS, [Allowed: RD LU MD XT DL]
[IP CLIENT] -> [IP SERVER] NFS 246 V4 Call LOOKUP DH: [HEX VALUE]/archlinux
where a similar pattern [A HEX VALUE]/[PATH] can be discerned for /sbin, /usr, /bin, /init, /lib, /systemd, /dev, /proc, /sys, /run, /, /lib64.
When the CLIENT requests /ld-linux-x86-64.so.2 the first errors start to appear:
[IP CLIENT] -> [IP SERVER] NFS 342 V4 Call OPEN DH: [HEX VALUE]/ld-linux-x86-64.so.2
[SERVER IP] -> [CLIENT IP] NFS 166 V4 Reply (Call In 124) OPEN Status: NFS4ERR_SYMLINK
The pattern more or less repeats itself with more frequent errors, for example, LOOKUP Status: and OPEN Status: reporting NFS4ERR_NOENT.
Interestingly, it is at the very end of the log that the first and only reference to user permissions is made,
[SERVER IP] -> [CLIENT IP] NFS 182 V4 Reply (Call In 9562) SETATTR Status: NFS4ERR_BADOWNER
RFC
According to
- RFC7530 (Network File System (NFS) Version 4 Protocol, 201503, PROPOSED STANDARD) -- Updated by RFC7931
- RFC5661 (Network File System (NFS) Version 4 Minor Version 1 Protocol, 201001, PROPOSED STANDARD) -- Updated by RFC8178
- RFC7862 (Network File System (NFS) Version 4 Minor Version 2 Protocol, 201611, PROPOSED STANDARD) -- Updated by RFC8178 -- which refers back to [RFC5661].
NFS4ERR_BADOWNER (Error Code 10039)
This error is returned when an owner or owner_group attribute value or the who field of an ACE within an ACL attribute value cannot be translated to a local representation.
The specification discusses this in Section 5.9, "Interpreting owner and owner_group"; however, I am not sure what to cite as relevant.
NFS4ERR_SYMLINK (Error Code 10029)
The current filehandle designates a symbolic link when the current operation does not allow a symbolic link as the target.
NFS4ERR_NOENT (Error Code 2)
This indicates no such file or directory. The file system object referenced by the name specified does not exist.
The error could however be expected ...
The current filehandle is assumed to refer to a regular directory or a named attribute directory. LOOKUPP assigns the filehandle for its parent directory to be the current filehandle. If there is no parent directory, an NFS4ERR_NOENT error must be returned. Therefore, NFS4ERR_NOENT will be returned by the server when the current filehandle is at the root or top of the server's file tree.
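When reading a capture that shows only numeric status values, a small lookup covering the three errors discussed above can help. The sketch below includes only the codes cited in this post plus 0 (NFS4_OK); nfs4_status_name is a hypothetical helper name, and every other code is deliberately reported as unknown rather than guessed.

```shell
# Map the NFSv4 status codes mentioned in this post to their names.
nfs4_status_name() {
    case $1 in
        0)     echo NFS4_OK ;;
        2)     echo NFS4ERR_NOENT ;;
        10029) echo NFS4ERR_SYMLINK ;;
        10039) echo NFS4ERR_BADOWNER ;;
        *)     echo "unknown ($1)" ;;
    esac
}

nfs4_status_name 10039    # prints NFS4ERR_BADOWNER
```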
+(20171210) mount -t nfs4 [SERVER IP]:/archlinux /mnt
On the client computer, using the Archlinux "LiveUSB", I was able to mount the network drive, download the latest kernel (4.14-4-1-ARCH) via the SERVER internet connection, and install Arch Linux to [SERVER IP]:/archlinux.
During the install, rpc.idmapd -fvvv indicated a successful mapping of user names, for example,
rpc.idmapd: Server : (user) id "0" -> name "root@localdomain"
rpc.idmapd: Server : (group) id "99" -> name "nobody@localdomain"
... -> name "tty@localdomain"
... -> name "systemd-journal-upload@localdomain"
... -> name rpc@localdomain
... -> name systemd-journal@localdomain
... -> name utmp@localdomain
The result of genfstab was also different:
[SERVER IP]:/archlinux / nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=[CLIENT IP],local_lock=none,addr=[SERVER IP] 0 0
Nevertheless, after reboot systemd failed again with the same failures as described at the beginning of the post.
+(20171210) Is the remote directory on the server mounted to /new_root?
The mkinitcpio script uses the variable mount_handler to carry an assigned "mounting function", in this case nfs_mount_handler(), to which the "root path" is passed as $1 at a later stage; /new_root.
I am trying to verify that the client has mounted [SERVER IP]:/archlinux to /new_root. On the server, I can only observe that the client has established a connection, but not whether the directory is mounted, or where.
showmount -a server -> All mount points on server: (empty)
ss -ntp | grep 2049 ->
ESTAB 0 0 192.168.2.101:2049 192.168.2.102:809 (random port)
+(20171210) NFS4, sec=sys and id mapper are incompatible?
Reading the doco, it looks like sec=sys and the id mapper can be used to correctly map uid/gid to name where the client and server have different mappings in /etc/passwd and /etc/group. This simply isn't true.
That's because with sec=sys the id mapper doesn't come into play in the authentication part of the nfs protocol, only the file attributes part. With sec=sys authentication, nfs just passes the client uid/gid which is used directly by the server. So permissions checks will be screwed if client and server uid and gid don't align. To confuse things further, when the client creates a new file it is the authentication credentials that are used, so the file gets created at the server with the client's uid/gid. After that nfs uses idmap to get the file attributes, so the uid/gid (which originally came from the client) gets mapped at the server, and you end up seeing the server's name for a client uid/gid. Borkage! On the other hand, if the file was originally created at the server, you will see the correct name at the client, even if the uid/gid differs. But permissions checking will still be broken. -- kimmie -- Posted: Wed Feb 20, 2013 3:14 am Post subject: -- Emphasis in original
From the kernel documentation for kernel parameters
nfs.nfs4_disable_idmapping=
nfsd.nfs4_disable_idmapping=
nfs.nfs4_disable_idmapping=1 and nfsd.nfs4_disable_idmapping=1
Disabling the id mapper (nfsd.nfs4_disable_idmapping=1 and nfs.nfs4_disable_idmapping=1) on the SERVER and CLIENT resulted in systemd starting up to the user login prompt, with only 1 error:
modconf to the mkinitcpio hooks; together with block keyboard in an attempt to deal with the other apparent problem:
The rpc.idmapd -fvvv did not output any messages.
I am able to log in as root using an external USB keyboard, and to read and create files. I have not done any extensive testing, so there could still be problems with this solution.
nfs.nfs4_disable_idmapping=0 and nfsd.nfs4_disable_idmapping=0
It seems that echo "options nfs nfs4_disable_idmapping=0" >> /etc/modprobe.d/nfs.conf (or cat /sys/module/nfsd/parameters/nfs4_disable_idmapping -> N) on the CLIENT did not have any effect.
The CLIENT id mapper was disabled until I explicitly passed the parameter nfs.nfs4_disable_idmapping=0 to the kernel during boot (GRUB).
The rpc.idmapd -fvvv did not output any complaints. On the other hand, it did not print anything else after establishing the first rpc.idmapd: Server : (user) id "0" -> name "root@localdomain" ...
The Wireshark log, however, no longer records an NFS4ERR_BADOWNER.
Nonetheless, all the systemd startup failures persist...
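Since the modprobe.d entry apparently had no effect on the CLIENT while the kernel parameter did, it may be worth confirming from the booted client that the parameter actually arrived on the kernel command line. A minimal sketch, where cmdline_has is a hypothetical helper and the optional second argument exists only so the function can be tried on a sample string instead of /proc/cmdline:

```shell
# Hypothetical helper: look for an exact "name=value" word on a command line.
# $1 = parameter, e.g. nfs.nfs4_disable_idmapping=0
# $2 = command line text; defaults to the contents of /proc/cmdline
cmdline_has() {
    line=${2-$(cat /proc/cmdline)}
    case " $line " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

On the CLIENT, cmdline_has nfs.nfs4_disable_idmapping=0 && echo "passed by GRUB" would then show whether the setting came from the boot loader rather than from /etc/modprobe.d.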
Conclusion
nfs.nfs4_disable_idmapping=0 and nfsd.nfs4_disable_idmapping=0
Save for setting up Kerberos and troubleshooting, I am not sure what to try next. The rpc.idmapd still seems to be unable to map correct permissions, but rpc.idmapd -fvvv no longer outputs any errors...? What to do? The boot errors could perhaps be caused by something else... I dunno...
nfs.nfs4_disable_idmapping=1 and nfsd.nfs4_disable_idmapping=1
Although it works, the approach seems wrong; I am not migrating, and I should be able to set up the system using rpc.idmapd. For now it will have to do; it will probably come back and bite me in the future...