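I need to filter bot activity out of a log file.
The solution should display only the records that meet the following conditions:
- A user logs in, changes the password, and logs off within the same second.
- These actions (log in, change password, log off) happen one after another, with no other entries in between.
Sample input data: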
[a lot of data]
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:42 +0200|178.57.66.225|faaaaaa11111| - |user logged in| -
Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|terdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|terdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|terdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed profile| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
Mon, 22 Aug 2016 13:20:42 +0200|178.57.67.225|faaaa0a11111| - |user logged in| -
[a lot of data]
To accomplish the task, I wrote the code below:
awk 'BEGIN { FS=" " } { c[$5]++; l[$5,c[$5]]=$0 } END { for (i in c) { if (c[i] == 3) for (j = 1 ; j <= c[i]; j++) print l[i,j] } }' $1
Usage:
./parse_log.sh logfile.log
Output:
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
But I would like to know what an alternative solution written in Perl or Python (using a minimum of external libraries) would look like.
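This isn't an answer, but it's too big for a comment and needs formatting. To address your comment "Python code is easier to read and understand what it does.": FYI, an awk script with sensible variable names that does what I think your Python script does looks a lot like your Python script, but it is more concise, because for manipulating text awk already does all the common things for you that you have to write code for in Python:
But reading the whole file into memory before processing it is a very inefficient way to approach this problem. You should test and print every time the time changes: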
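Since the question asked for Python, here is a minimal sketch of that streaming idea in Python; it is not the awk script this comment refers to. It assumes the six pipe-separated fields shown in the sample input, and the names BOT_ACTIONS, is_bot_group and filter_bots are made up for illustration. As a further assumption, a group ends when the timestamp, IP address or user name changes, not only when the time changes, so two users acting in the same second are not merged.

```python
from typing import Iterable, Iterator, List

# The three actions that must appear, in this order, on consecutive lines.
BOT_ACTIONS = ["user logged in", "user changed password", "user logged off"]


def is_bot_group(group: List[str]) -> bool:
    """True if the group is exactly: login, password change, logoff."""
    if len(group) != len(BOT_ACTIONS):
        return False
    fields = [line.split("|") for line in group]
    if any(len(f) < 5 for f in fields):
        return False
    return [f[4].strip() for f in fields] == BOT_ACTIONS


def filter_bots(lines: Iterable[str]) -> Iterator[str]:
    """Yield bot-like lines while holding only one small group in memory."""
    group: List[str] = []
    current_key = None
    for line in lines:
        # Assumption: timestamp, IP and user are the first three fields.
        key = tuple(line.split("|")[:3])
        if key != current_key:
            # The key changed: test the finished group, then start a new one.
            if is_bot_group(group):
                yield from group
            group = []
            current_key = key
        group.append(line)
    # The last group in the input still has to be tested.
    if is_bot_group(group):
        yield from group
```

Used as a filter, something like sys.stdout.writelines(filter_bots(sys.stdin)) would print the matching records in input order. Because a fourth entry in the same second (such as "user changed profile") makes the group four lines long, that group is rejected.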
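The solution itself is written in Python 3.
Usage:
parse_log.py ' ' 5 logfile.log
Python code is easier to read and understand what it does.
Compared with any awk solution: it does not use Perl.
External libraries: it uses python3 with only sys, to read the file and to call exit with a meaningful value.
Compared with any sed solution: all solutions avoid storing the entire data set in memory.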
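The script for this answer is not shown, so the following is only a guess at its shape, not the author's code. It assumes the three arguments in the usage line are a field separator, a 1-based field number and a log file, and that, like the awk version in the question, it keeps runs of exactly three consecutive lines sharing that field. The names select_groups and main, and the exit status values, are hypothetical. Besides sys it uses itertools from the standard library.

```python
import sys
from itertools import groupby


def select_groups(lines, separator, field_number, size=3):
    """Yield lines from runs of exactly `size` consecutive lines sharing one field."""
    def key(line):
        # Like awk with FS=" ", a single space means "split on runs of whitespace".
        parts = line.split() if separator == " " else line.split(separator)
        # field_number is 1-based, as in awk; short lines get a key of None.
        return parts[field_number - 1] if len(parts) >= field_number else None

    # groupby only groups adjacent lines, so one run is in memory at a time.
    for value, run in groupby(lines, key=key):
        run = list(run)
        if value is not None and len(run) == size:
            yield from run


def main(argv):
    """Return an exit status (assumed here): 0 success, 1 I/O error, 2 bad usage."""
    usage_ok = (
        len(argv) == 4
        and argv[1] != ""
        and argv[2].isdigit()
        and int(argv[2]) >= 1
    )
    if not usage_ok:
        print(f"usage: {argv[0]} SEPARATOR FIELD_NUMBER LOGFILE", file=sys.stderr)
        return 2
    separator, field_number, path = argv[1], int(argv[2]), argv[3]
    try:
        with open(path, encoding="utf-8") as handle:
            sys.stdout.writelines(select_groups(handle, separator, field_number))
    except OSError as error:
        print(f"{argv[0]}: {error}", file=sys.stderr)
        return 1
    return 0
```

Ending the file with sys.exit(main(sys.argv)) would turn this into a command-line script. Note that this sketch only counts lines per time value; it does not check which actions the three lines contain.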
Here is a Perl solution:
Line 1 lets the shell know that this is a Perl script [1].
Line 2: "The strict pragma disables certain Perl expressions that could behave unexpectedly or are difficult to debug, turning them into errors." [2]
Line 3: "The warnings pragma gives control over what warnings are enabled in which parts of a Perl program." [3]
Line 4 declares local variables [4, 5]: $p2 is the second previous line of input, $p1 is the previous line of input, and $x is used to save the initial portion of the current line of input.
Lines 5-13 form a while compound statement [6].
Line 5: the while loop expression uses the null filehandle <> (diamond operator) [7], allowing the solution to be used as a Unix filter [8]. For each line of input, while will assign the current line of input to the default input general variable $_ [9] and evaluate BLOCK.
Lines 6-10 form an if compound statement using statement modifier syntax [10].
Lines 7-10 form the EXPR portion of the if statement, which is composed of three regular expressions [11], one do BLOCK function [12], and the C-style logical "and" operator && [13].
Line 7 attempts to match the current line of input against the regular expression /(.+)user logged off/. If successful, the initial portion of the current line of input is captured into the global variable $1 [15].
If line 7 was true, line 8 saves $1 to the local variable $x. (Subsequent regular expressions may clobber the value of $1.) The do BLOCK evaluates to the value of the last statement of BLOCK, which will be a non-empty string, which Perl considers true.
If line 8 was true, line 9 attempts to match the previous line of input against the regular expression /\Q$x\Euser changed password/. $x must be escaped within the regular expression using the delimiters \Q and \E, so that its value is treated as a literal string rather than as a pattern.
If line 9 was true, line 10 attempts to match the second previous line of input using the regular expression /\Q$x\Euser logged in/.
If line 10 is true, line 6 prints the second previous line of input, the previous line of input, and the current line of input.
Lines 11-12 update the variables for the second previous line of input and the previous line of input.
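For readers more at home in Python, the three-line sliding window described above can be sketched roughly as follows. This is an illustrative translation of the explanation, not the Perl script itself: the function name bot_triples is made up, re.escape stands in for \Q...\E, and re.match anchors each pattern at the start of the line, which is slightly stricter than the unanchored Perl patterns.

```python
import re
from typing import Iterable, Iterator

# Capture everything before the action text on a "logged off" line.
LOGOFF = re.compile(r"(.+)user logged off")


def bot_triples(lines: Iterable[str]) -> Iterator[str]:
    """Yield each login / password change / logoff triple on consecutive lines."""
    p2 = p1 = ""  # second previous and previous lines of input
    for line in lines:
        match = LOGOFF.match(line)
        if match:
            # The prefix holds timestamp, IP and user; escape it so that
            # characters such as '|', '.' and '+' are matched literally.
            x = re.escape(match.group(1))
            if (re.match(x + "user changed password", p1)
                    and re.match(x + "user logged in", p2)):
                yield from (p2, p1, line)
        # Slide the window forward by one line.
        p2, p1 = p1, line
```

Because the captured prefix includes the timestamp, the three lines are only accepted when they fall in the same second and belong to the same IP address and user. Only three lines are held at any time, so sys.stdout.writelines(bot_triples(sys.stdin)) would work as a filter on large logs.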
The solution produces output in the same order as the input:
References:
[1] https://en.wikipedia.org/wiki/Shebang_(Unix)
[2] https://perldoc.perl.org/strict
[3] https://perldoc.perl.org/warnings
[4] https://perldoc.perl.org/perldata
[5] https://perldoc.perl.org/functions/my
[6] https://perldoc.perl.org/perlsyn#Compound-Statements
[7] https://perldoc.perl.org/perlop#I%2FO-Operators
[8] https://en.wikipedia.org/wiki/Filter_(software)#Unix
[9] https://perldoc.perl.org/perlvar#General-Variables
[10] https://perldoc.perl.org/perlsyn#Statement-Modifiers
[11] https://perldoc.perl.org/perlre
[12] https://perldoc.perl.org/functions/do
[13] https://perldoc.perl.org/perlop#C-style-Logical-And
[15] https://perldoc.perl.org/variables/$%3Cdigits%3E%20($1,%20$2,%20...)
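In Perl, this can be written as a one-liner, though it feels a bit messy:
Or, as a script:
Here is a Python version of my previous script (it reads from standard input):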
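Using Raku (formerly known as Perl_6)
Sample output:
If you're looking for a scripting language to use at the command line, you might want to consider Raku. The code is relatively short, with some niceties, including taking advantage of Raku's built-in hash features.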
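Reading the code above: lines are read in, and each line is split on the | pipe and/or the \s "-" \s? whitespace-dash pattern. This returns four elements per line (zero-indexed = 0..3). Each line is pushed onto the @a array. Then the first column of each line is parsed with DateTime::Parse.new() to return posix seconds, which are pushed onto the @b array.
From these two arrays, the %c hash is created using Raku's [Z=>] Zip-reduction meta-operator. This gives a %c hash with posix seconds as the key and zero-indexed columns 1,2,3 as the value. As elements are appended to the %c hash, their values are appended to the appropriate posix key.
Finally, in the output (put), the values for each posix key are checked to make sure that they contain exactly the three requested strings, and that there are exactly 9 elements (three rows, three columns).
Sample input: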
[Thanks to @bduggan and @sergot for updating Raku's DateTime::Parse module so quickly!]
https://andrewshitov.com/2020/06/06/some-tips-for-working-with-hashes-in-raku/
https://github.com/sergot/datetime-parse
https://raku.org
output
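For comparison with the Python answers in this thread, the hash-keyed-by-posix-seconds approach described in the Raku answer can be approximated in Python with a dictionary. This is only a loose analogue under stated assumptions, not a translation of the Raku code: the names WANTED, parse_line and bot_seconds are invented here, the date is parsed with the standard library's email.utils.parsedate_to_datetime instead of DateTime::Parse, and the check requires the three actions in order instead of counting nine elements.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime

# The three actions that a bot-like second must contain, in this order.
WANTED = ["user logged in", "user changed password", "user logged off"]


def parse_line(line):
    """Split one six-field record into (posix_seconds, [ip, user, action])."""
    date, ip, user, _, action, _ = [part.strip() for part in line.split("|")]
    return int(parsedate_to_datetime(date).timestamp()), [ip, user, action]


def bot_seconds(lines):
    """Return {posix_seconds: rows} for the seconds that look bot-like."""
    by_second = defaultdict(list)
    for line in lines:
        if line.count("|") != 5:
            continue  # skip anything that is not a six-field record
        second, columns = parse_line(line)
        # Rows with the same timestamp accumulate under the same key.
        by_second[second].append(columns)
    return {
        second: rows
        for second, rows in by_second.items()
        if [row[2] for row in rows] == WANTED
    }
```

Like the hash-based approach it mirrors, this sketch keeps every record in memory and groups by timestamp alone, so it does not verify that the three lines were adjacent in the file or that they belong to the same user.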