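I have a file that contains non-ASCII, UTF-8 characters. When I use less to view the file, I get a warning saying "may be a binary file. See it anyway?"
But the file is clearly not binary. When I open it anyway, the characters are not rendered correctly. What is making less believe the file is binary?
Also, note that these files have many more lines of plain ASCII text, which I have removed for brevity. This is a semi-minimal example that reproduces the behavior.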
More background:
$ cat broken.log
⋮
⋮ =✓)
$ head broken.log
⋮
⋮ =✓)
$ less broken.log
"broken.log" may be a binary file. See it anyway?
<E2><8B><AE>
<E2><8B><AE> =<E2><9C><93>)
broken.log (END)
$ file broken.log
broken.log: UTF-8 Unicode text
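The escaped bytes in the less output can be cross-checked against the characters that cat prints. The following is a minimal sketch, assuming a POSIX shell and an iconv binary on the PATH; the variable names sample and result are illustrative and not part of the original session. It rebuilds the two lines from the bytes less displayed (E2 8B AE for ⋮ and E2 9C 93 for ✓, written as octal escapes) and asks iconv whether they form valid UTF-8.

```shell
# Rebuild the two sample lines from the bytes shown by less.
# Octal escapes are used because \x escapes are not portable in POSIX printf:
#   E2 8B AE -> \342\213\256  (vertical ellipsis)
#   E2 9C 93 -> \342\234\223  (check mark)
sample=$(printf '\342\213\256\n\342\213\256 =\342\234\223)\n')

# iconv exits with a non-zero status if the input is not valid UTF-8.
if printf '%s\n' "$sample" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    result="valid UTF-8"
else
    result="invalid UTF-8"
fi

echo "$result"
```

If this reports valid UTF-8, the bytes themselves are fine, which would point the investigation at how less chooses its character set rather than at the file contents.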
OS:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
less: I am fairly sure it is version 487-0.1.
Environment:
$ env | grep LANG
LANG=en_US.UTF-8
$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ which less
/usr/bin/less
$ ls -la $(which less)
lrwxrwxrwx 1 root root 9 Jul 20 15:49 /usr/bin/less -> /bin/less
$ ls -la /bin/less
-rwxr-xr-x 1 root root 166664 May 7 2018 /bin/less
$ type -a less
less is /usr/bin/less
less is /bin/less