Question

Jeff Schaller

Asked: 2018-04-07 16:58:53 +0800 CST

How to numerically sort a single line of delimited items?

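I have a line (or many lines) of numbers that are delimited by an arbitrary character. What UNIX tools can I use to sort each line's items numerically, retaining the delimiter?

Examples include:

list of numbers; input: 10 50 23 42; sorted: 10 23 42 50
IP address; input: 10.1.200.42; sorted: 1.10.42.200
CSV; input: 1,100,330,42; sorted: 1,42,100,330
pipe-delimited; input: 400|500|404; sorted: 400|404|500

Since the delimiter is arbitrary, feel free to provide (or extend) an answer using a single-character delimiter of your choosing.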

13 Answers

αғsнιη · Answer 1 · 2018-04-07T17:27:23+08:00

Best Answer

Using the asort() function of gawk (GNU awk):

gawk -v SEP='*' '{ i=0; split($0, arr, SEP); len=asort(arr);
    while ( ++i<=len ){ printf("%s%s", i>1?SEP:"", arr[i]) }; 
        print "" 
}' infile

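Replace the * in SEP='*' with your field delimiter.

You can also use the following command for a single line (because it is better to avoid shell loops for text processing):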

tr '.' '\n' <<<"$aline" | sort -n | paste -sd'.' -

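Replace the dot . with your delimiter.
Add -u to the sort command above to remove duplicates.

Note:
You may need to use sort's -g, --general-numeric-sort option instead of -n, --numeric-sort to handle any class of number (integer, float, scientific, hexadecimal, etc.).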

$ aline='2e-18,6.01e-17,1.4,-4,0xB000,0xB001,23,-3.e+11'
$ tr ',' '\n' <<<"$aline" |sort -g | paste -sd',' -
-3.e+11,-4,2e-18,6.01e-17,1.4,23,0xB000,0xB001

The awk version needs no change; it will still handle these.
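
A rough Python sketch of what a "general numeric" ordering involves, offered as an illustration and not as a description of how sort -g is implemented. The helper names to_number and sort_general are hypothetical, and the fallback to int(text, 0) for prefixed literals such as 0xB000 is an assumption of this sketch:

```python
def to_number(text):
    """Parse decimal, scientific-notation, or prefixed integer literals (sketch)."""
    try:
        return float(text)              # handles 23, 1.4, 2e-18, -3.e+11
    except ValueError:
        return float(int(text, 0))      # assumption: 0x/0o/0b prefixed integers


def sort_general(line, delim):
    """Sort the delimited fields of one line by value, keeping each field's text as-is."""
    return delim.join(sorted(line.split(delim), key=to_number))


print(sort_general("2e-18,6.01e-17,1.4,-4,0xB000,0xB001,23,-3.e+11", ","))
```

For this input the fields come out in the same order as in the sort -g run shown earlier; other inputs (for example locale-specific decimal marks) may be treated differently by the two tools.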

Stephen Harris · Answer 2 · 2018-04-07T17:25:28+08:00

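With perl there is an obvious version: split the data, sort it, and join it back up again.

The delimiter needs to be listed twice (once in the split and once in the join).

For example, for a , (comma):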

perl -lpi -e '$_=join(",",sort {$a <=> $b} split(/,/))'

So:

echo 1,100,330,42 | perl -lpi -e '$_=join(",",sort {$a <=> $b} split(/,/))'
1,42,100,330

Since split takes a regular expression, the delimiter character may need escaping:

echo 10.1.200.42 | perl -lpi -e '$_=join(".",sort {$a <=> $b} split(/\./))'
1.10.42.200
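
To see why the escaping matters, here is a small Python illustration. Python's re.split stands in for Perl's split, which is a substitution made for this sketch, and the variable names are arbitrary. An unescaped dot matches every character, so nothing is left to sort:

```python
import re

address = "10.1.200.42"

# Unescaped: "." matches any character, so every character acts as a delimiter
# and only empty fields remain.
unescaped_fields = re.split(".", address)

# Escaped: only a literal dot separates the fields.
escaped_fields = re.split(re.escape("."), address)

print(unescaped_fields)
print(escaped_fields)
```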

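By using the -a and -F options, the explicit split can be dropped. With the -p loop, as before, set the result to $_, which is then printed automatically: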

perl -F'/\./' -aple '$_=join(".", sort {$a <=> $b} @F)'

muru · Answer 3 · 2018-04-07T18:02:34+08:00

Using Python, with an idea similar to the one in Stephen Harris's answer:

python3 -c 'import sys; c = sys.argv[1]; sys.stdout.writelines(map(lambda x: c.join(sorted(x.strip().split(c), key=int)) + "\n", sys.stdin))' <delimiter>

So something like:

$ cat foo
10.129.3.4
1.1.1.1
4.3.2.1
$ python3 -c 'import sys; c = sys.argv[1]; sys.stdout.writelines(map(lambda x: c.join(sorted(x.strip().split(c), key=int)) + "\n", sys.stdin))' . < foo
3.4.10.129
1.1.1.1
1.2.3.4

Sadly, having to do the I/O manually makes this far less elegant than the Perl version.
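
For readability, the same idea can be unrolled into named functions. This is only a sketch: sort_fields and sort_lines are names chosen here, not part of the answer above, and like the one-liner it assumes every field parses as an integer.

```python
def sort_fields(line, delim):
    """Return one line with its delim-separated integer fields sorted numerically."""
    return delim.join(sorted(line.split(delim), key=int))


def sort_lines(lines, delim):
    """Yield each input line (trailing newline removed) with its fields sorted."""
    for line in lines:
        yield sort_fields(line.rstrip("\n"), delim)


for result in sort_lines(["10.129.3.4\n", "1.1.1.1\n", "4.3.2.1\n"], "."):
    print(result)
```

Passing sys.stdin as lines and sys.argv[1] as delim should give roughly the same behaviour as the command-line version.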

Jeff Schaller · Answer 4 · 2018-04-07T16:58:53+08:00

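`sed` to sort the octets of an IP address

sed has no built-in sort function, but if your data is sufficiently constrained in range (as with IP addresses), you can generate a sed script that manually implements a simple bubble sort. The basic mechanism is to look for adjacent numbers that are out of order. If a pair is out of order, swap it.

The sed script itself contains two search-and-swap commands for each pair of out-of-order numbers: one for the first two pairs of octets (requiring a trailing delimiter to mark the end of the third octet), and a second for the third pair of octets (ending at EOL). If any swap occurs, the program branches back to the top of the script and looks for out-of-order numbers again. Otherwise, it exits.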
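
The swap-and-restart mechanism can be sketched outside sed as follows. This Python version is an illustration under stated assumptions: instead of one fixed rule per number pair, it uses a single regular expression and compares the two captured numbers in code. The function names are invented for this sketch.

```python
import re


def swap_first_unordered(line, delim):
    """Swap the first adjacent pair that is out of order; report whether a swap happened."""
    pair = re.compile(r"(?<!\d)(\d+)" + re.escape(delim) + r"(\d+)(?!\d)")
    pos = 0
    while True:
        match = pair.search(line, pos)
        if match is None:
            return line, False
        left, right = match.group(1), match.group(2)
        if int(left) > int(right):
            swapped = line[:match.start()] + right + delim + left + line[match.end():]
            return swapped, True
        # Continue from the right-hand number so that overlapping pairs are examined.
        pos = match.start(2)


def bubble_sort_line(line, delim="."):
    """Restart from the top after every swap until no out-of-order pair remains."""
    swapped = True
    while swapped:
        line, swapped = swap_first_unordered(line, delim)
    return line


print(bubble_sort_line("10.1.200.42"))
```

Each swap removes one inversion, so the loop terminates once the line is in ascending order, which mirrors the role of the branch back to the top of the sed script.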

The generated script looks, in part, like this:

$ head -n 3 generated.sed
:top
s/255\.254\./254.255./g; s/255\.254$/254.255/
s/255\.253\./253.255./g; s/255\.253$/253.255/

# ... middle of the script omitted ...

$ tail -n 4 generated.sed
s/2\.1\./1.2./g; s/2\.1$/1.2/
s/2\.0\./0.2./g; s/2\.0$/0.2/
s/1\.0\./0.1./g; s/1\.0$/0.1/
ttop

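This approach hard-codes the period as the delimiter, which has to be escaped, as it would otherwise be "special" to the regular expression syntax (it matches any character).

To generate such a sed script, this loop will do: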

#!/bin/bash

echo ':top'

for (( n = 255; n >= 0; n-- )); do
  for (( m = n - 1; m >= 0; m-- )); do
    printf '%s; %s\n' "s/$n\\.$m\\./$m.$n./g" "s/$n\\.$m\$/$m.$n/"
  done
done

echo 'ttop'

Redirect that script's output to another file, say sort-ips.sed.

A sample run could then look like:

ip=$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256))
printf '%s\n' "$ip" | sed -f sort-ips.sed

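The following variation on the generating script uses the word boundary markers \< and \> to get rid of the need for the second substitution. This also cuts the size of the generated script from 1.3 MB to just below 900 KB, along with greatly reducing the run time of sed itself (to about 50%-75% of the original, depending on which sed implementation is used):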

#!/bin/bash

echo ':top'

for (( n = 255; n >= 0; --n )); do
  for (( m = n - 1; m >= 0; --m )); do
      printf '%s\n' "s/\\<$n\\>\\.\\<$m\\>/$m.$n/g"
  done
done

echo 'ttop'
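
Anchoring each number also guards against a rule matching inside a longer number. The Python fragment below illustrates this for a single rule. Python's \b is used as a stand-in for sed's \< and \>, which is an approximation, and apply_rule is a hypothetical helper written for this sketch.

```python
import re


def apply_rule(line, n, m, anchored):
    """Apply one generated-style swap rule for the pair (n, m) to a line."""
    if anchored:
        pattern = r"\b%d\b\.\b%d\b" % (n, m)    # counterpart of s/\<n\>\.\<m\>/m.n/g
        replacement = "%d.%d" % (m, n)
    else:
        pattern = r"%d\.%d\." % (n, m)          # counterpart of s/n\.m\./m.n./g
        replacement = "%d.%d." % (m, n)
    return re.sub(pattern, replacement, line)


print(apply_rule("125.5.1.1", 25, 5, anchored=False))
print(apply_rule("125.5.1.1", 25, 5, anchored=True))
```

The unanchored rule for the pair 25 and 5 rewrites part of 125, while the anchored rule leaves the line alone.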

user232326 · Answer 5 · 2018-04-07T21:19:58+08:00

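Shell

Loading a higher-level language takes time.
For a few lines, the shell itself may be a solution.
We can use the external commands sort and tr. One is quite efficient at sorting lines, and the other is effective at converting a delimiter into newlines: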

#!/bin/bash
shsort(){
           while IFS='' read -r line; do
               echo "$line" | tr "$1" '\n' |
               sort -n   | paste -sd "$1" -
           done <<<"$2"
    }

shsort ' '    '10 50 23 42'
shsort '.'    '10.1.200.42'
shsort ','    '1,100,330,42'
shsort '|'    '400|500|404'
shsort ','    '3 b,2       x,45    f,*,8jk'
shsort '.'    '10.128.33.6
128.17.71.3
44.32.63.1'

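This needs bash only because of the use of <<<. If that is replaced with a here-doc, the solution is valid for POSIX.
This is able to sort fields that contain tabs, spaces, or shell glob characters (*, ?, [). It cannot handle newlines, because each line is sorted separately.

Change <<<"$2" to <"$2" to process a file name, and call it like: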

shsort '.'    infile

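The delimiter is the same for the whole file. If that is a limitation, it could be improved upon.

However, a file with just 6000 lines takes 15 seconds to process. Truly, the shell is not the best tool for processing files.

awk

For more than a few lines (more than a few tens), it is better to use a real programming language. An awk solution could be: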

#!/bin/bash
awksort(){
           gawk -v del="$1" '{
               split($0, fields, del)
               l=asort(fields)
               for(i=1;i<=l;i++){
                   printf( "%s%s" , (i==1)?"":del , fields[i] )
               }
               printf "\n"
           }' <"$2"
         }

awksort '.'    infile

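For the same 6000-line file mentioned above, this takes only 0.2 seconds.

Note that the <"$2" used for files can be changed back to <<<"$2" for lines held in shell variables.

Perl

The fastest solution is perl.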

#!/bin/bash
perlsort(){  perl -lp -e '$_=join("'"$1"'",sort {$a <=> $b} split(/['"$1"']/))' <<<"$2";   }

perlsort ' '    '10 50 23 42'
perlsort '.'    '10.1.200.42'
perlsort ','    '1,100,330,42'
perlsort '|'    '400|500|404'
perlsort ','    '3 b,2       x,45    f,*,8jk'
perlsort '.'    '10.128.33.6
128.17.71.3
44.32.63.1'

To sort a file instead, change <<<"$2" to simply "$2" and add -i to the perl options so that the file is edited "in place":

#!/bin/bash
perlsort(){  perl -lpi -e '$_=join("'"$1"'",sort {$a <=> $b} split(/['"$1"']/))' "$2"; }

perlsort '.' infile; exit

Sergiy Kolodyazhnyy · Answer 6 · 2018-04-07T22:14:34+08:00

A bash script:

#!/usr/bin/env bash

join_by(){ local IFS="$1"; shift; echo "$*"; }

IFS="$1" read -r -a tokens_array <<< "$2"
IFS=$'\n' sorted=($(sort -n <<<"${tokens_array[*]}"))
join_by "$1" "${sorted[@]}"

Example:

$ ./sort_delimited_string.sh "." "192.168.0.1"
0.1.168.192

jkd · Answer 7 · 2018-04-08T03:10:28+08:00

Here is some bash that guesses the delimiter by itself:

#!/bin/bash

delimiter="${1//[[:digit:]]/}"
if echo $delimiter | grep -q "^\(.\)\1\+$"
then
  delimiter="${delimiter:0:1}"
  if [[ -z $(echo $1 | grep "^\([0-9]\+"$delimiter"\([0-9]\+\)*\)\+$") ]]
  then
    echo "You seem to have empty fields between the delimiters."
    exit 1
  fi
  if [[ './\' == *$delimiter* ]]
  then
    n=$( echo $1 | sed "s/\\"$delimiter"/\\n/g" | sort -n | tr '\n' ' ' | sed -e "s/\\s/\\"$delimiter"/g")
  else
    n=$( echo $1 | sed "s/"$delimiter"/\\n/g" | sort -n | tr '\n' ' ' | sed -e "s/\\s/"$delimiter"/g")
  fi
  echo ${n%$delimiter}
  exit 0
else
  echo "The string does not consist of digits separated by one unique delimiter."
  exit 1
fi

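It may be neither very efficient nor very clean, but it works.

Use it like bash my_script.sh "00/00/18/29838/2".

It returns an error when the same delimiter is not used consistently, or when two or more delimiters follow each other.

If the delimiter used is a special character, it is escaped (otherwise sed returns an error).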
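
The delimiter-guessing logic can be sketched in Python as follows. This is a simplified illustration and not a port: unlike the script above, it also accepts a line that contains the delimiter only once, and guess_delimiter and sort_guessing_delimiter are names made up here.

```python
import re


def guess_delimiter(line):
    """Infer the single delimiter character of a line of ASCII-digit fields."""
    leftovers = re.sub(r"[0-9]", "", line)      # whatever is not a digit
    if not leftovers or len(set(leftovers)) != 1:
        raise ValueError("not digits separated by one unique delimiter")
    delim = leftovers[0]
    if any(field == "" for field in line.split(delim)):
        raise ValueError("empty fields between the delimiters")
    return delim


def sort_guessing_delimiter(line):
    """Sort the numeric fields of a line, using the guessed delimiter to split and join."""
    delim = guess_delimiter(line)
    return delim.join(sorted(line.split(delim), key=int))


print(sort_guessing_delimiter("00/00/18/29838/2"))
```

Because the fields are split on a literal character and never passed through a regular expression, no escaping of special delimiters is needed in this version.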

agc · Answer 8 · 2018-04-09T19:52:15+08:00

This answer is based on a misunderstanding of the Q., but in some cases it happens to be correct anyway. If the input is entirely natural numbers, and has only one delimiter per-line, (as with the sample data in the Q.), it works correctly. It'll also handle files with lines that each have their own delimiter, which is a bit more than what was asked for.

This shell function reads from standard input, uses POSIX parameter substitution to find the specific delimiter on each line, (stored in $d), and uses tr to replace $d with a newline \n and sorts that line's data, then restores each line's original delimiters:

sdn() { while read x; do
            d="${x#${x%%[^0-9]*}}"   d="${d%%[0-9]*}"
            x=$(echo -n "$x" | tr "$d" '\n' | sort -g | tr '\n' "$d")
            echo ${x%?}
        done ; }

Applied to the data given in the OP:

printf "%s\n" "10 50 23 42" "10.1.200.42" "1,100,330,42" "400|500|404" | sdn

Output:

10 23 42 50
1.10.42.200
1,42,100,330
400|404|500

Stéphane Chazelas · Answer 9 · 2018-04-16T22:15:55+08:00

For arbitrary delimiters:

perl -lne '
  @list = /\D+|\d+/g;
  @sorted = sort {$a <=> $b} grep /\d/, @list;
  for (@list) {$_ = shift@sorted if /\d/};
  print @list'

On an input like:

5,4,2,3
6|5,2|4
There are 10 numbers in those 3 lines

It gives:

2,3,4,5
2|4,5|6
There are 3 numbers in those 10 lines
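
The same slot-filling idea, sketched in Python for comparison. The function name is hypothetical, and the sketch only recognises ASCII digits:

```python
import re


def sort_numbers_keep_separators(line):
    """Sort the numbers in a line while leaving every separator where it was."""
    # Split the line into alternating runs of digits and non-digits.
    tokens = re.findall(r"[0-9]+|[^0-9]+", line)
    numbers = iter(sorted((t for t in tokens if t[0] in "0123456789"), key=int))
    # Walk the tokens in order: numeric slots receive the next sorted number,
    # separator tokens are copied through unchanged.
    return "".join(next(numbers) if t[0] in "0123456789" else t for t in tokens)


for sample in ("5,4,2,3", "6|5,2|4", "There are 10 numbers in those 3 lines"):
    print(sort_numbers_keep_separators(sample))
```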

Kusalananda · Answer 10 · 2018-04-08T12:04:30+08:00

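The following is a variation on Jeff's answer in the sense that it generates a sed script that performs the bubble sort, but it is sufficiently different to warrant its own answer.

The difference is that instead of generating O(n^2) basic regular expressions, this generates O(n) extended regular expressions. The resulting script is about 15 KB in size. The running time of the sed script is a fraction of a second (generating the script takes a bit longer).

It is restricted to sorting positive integers delimited by dots, but it is not limited in the size of the integers (just increase 255 in the main loop) or in the number of integers. The delimiter may be changed by changing delim='.' in the code.

Getting the regular expressions right took considerable effort, so I will leave describing the details for another day.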

#!/bin/bash

# This function creates an extended regular expression
# that matches a positive number less than the given parameter.
lt_pattern() {
    local n="$1"  # Our number.
    local -a res  # Our result, an array of regular expressions that we
                  # later join into a string.

    for (( i = 1; i < ${#n}; ++i )); do
        d=$(( ${n: -i:1} - 1 )) # The i:th digit of the number, from right to left, minus one.

        if (( d >= 0 )); then
            res+=( "$( printf '%d[0-%d][0-9]{%d}' "${n:0:-i}" "$d" "$(( i - 1 ))" )" )
        fi
    done

    d=${n:0:1} # The first digit of the number.
    if (( d > 1 )); then
        res+=( "$( printf '[1-%d][0-9]{%d}' "$(( d - 1 ))" "$(( ${#n} - 1 ))" )" )
    fi

    if (( n > 9 )); then
        # The number is 10 or larger.
        res+=( "$( printf '[0-9]{1,%d}' "$(( ${#n} - 1 ))" )" )
    fi

    if (( n == 1 )); then
        # The number is 1. The only thing smaller is zero.
        res+=( 0 )
    fi

    # Join our res array of expressions into a '|'-delimited string.
    ( IFS='|'; printf '%s\n' "${res[*]}" )
}

echo ':top'

delim='.'

for (( n = 255; n > 0; --n )); do
    printf 's/\\<%d\\>\\%s\\<(%s)\\>/\\1%s%d/g\n' \
        "$n" "$delim" "$( lt_pattern "$n" )" "$delim" "$n"
done

echo 'ttop'

The script will look something like this:

$ bash generator.sh >script.sed
$ head -n 5 script.sed
:top
s/\<255\>\.\<(25[0-4][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.255/g
s/\<254\>\.\<(25[0-3][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.254/g
s/\<253\>\.\<(25[0-2][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.253/g
s/\<252\>\.\<(25[0-1][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.252/g
$ tail -n 5 script.sed
s/\<4\>\.\<([1-3][0-9]{0})\>/\1.4/g
s/\<3\>\.\<([1-2][0-9]{0})\>/\1.3/g
s/\<2\>\.\<([1-1][0-9]{0})\>/\1.2/g
s/\<1\>\.\<(0)\>/\1.1/g
ttop

The idea behind the generated regular expressions is to pattern match for numbers that are less than each integer; those two numbers would be out of order, and so are swapped. The regular expressions are grouped into several OR options. Pay close attention to the ranges appended to each item: sometimes they are {0}, meaning the immediately preceding item is to be omitted from the match. The regex options, from left to right, match numbers that are smaller than the given number by:

the ones place
the tens place
the hundreds place
(continued as needed, for larger numbers)
or by being smaller in magnitude (number of digits)

To spell out an example, take 101 (with additional spaces for readability):

s/ \<101\> \. \<(10[0-0][0-9]{0} | [0-9]{1,2})\> / \1.101 /g

Here, the first alternation allows the numbers 100 through 100; the second alternation allows 0 through 99.

Another example is 154:

s/ \<154\> \. \<(15[0-3][0-9]{0} | 1[0-4][0-9]{1} | [0-9]{1,2})\> / \1.154 /g

Here the first option allows 150 through 153; the second allows 100 through 149, and the last allows 0 through 99.
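
One way to gain confidence in these patterns is to rebuild them and check them by brute force. The following Python port of lt_pattern is a sketch written for this purpose; the port itself and the helper name matches_less_than_only are not part of the answer. It checks that, for every n from 1 to 255, the pattern matches each positive integer below n and nothing from n up to 999.

```python
import re


def lt_pattern(n):
    """Build the alternation matching positive integers less than n (port of the bash function)."""
    s = str(n)
    alternatives = []
    # Same leading digits, a smaller digit at position i from the right, any digits after it.
    for i in range(1, len(s)):
        d = int(s[-i]) - 1
        if d >= 0:
            alternatives.append("%s[0-%d][0-9]{%d}" % (s[:-i], d, i - 1))
    first = int(s[0])
    if first > 1:
        # Same number of digits, smaller (non-zero) first digit.
        alternatives.append("[1-%d][0-9]{%d}" % (first - 1, len(s) - 1))
    if n > 9:
        # Fewer digits than n.
        alternatives.append("[0-9]{1,%d}" % (len(s) - 1))
    if n == 1:
        alternatives.append("0")
    return "|".join(alternatives)


def matches_less_than_only(n, limit=1000):
    """True if the pattern for n matches k exactly when k < n, for 1 <= k < limit."""
    pattern = re.compile(lt_pattern(n))
    return all(
        (pattern.fullmatch(str(k)) is not None) == (k < n)
        for k in range(1, limit)
    )


print(lt_pattern(154))
print(all(matches_less_than_only(n) for n in range(1, 256)))
```

The check starts at 1 because the answer restricts itself to positive integers; zero is only covered by some of the generated patterns.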

Testing four times in a loop:

for test_run in {1..4}; do
    nums=$(( RANDOM%256 )).$(( RANDOM%256 )).$(( RANDOM%256 )).$(( RANDOM%256 ))
    printf 'nums=%s\n' "$nums"
    sed -E -f script.sed <<<"$nums"
done

Output:

nums=90.19.146.232
19.90.146.232
nums=8.226.70.154
8.70.154.226
nums=1.64.96.143
1.64.96.143
nums=67.6.203.56
6.56.67.203
