I have several log files:
adsfs.demo.com_2022-07-11-0000-0001_cn.tgz
adsfs.demo.com_2022-07-11-0000-0002_cn.tgz
adsfs.demo.com_2022-07-11-0000-0003_cn.tgz
adsfs.demo.com_2022-07-11-0000-0004_cn.tgz
adsfs.demo.com_2022-07-11-0000-0005_cn.tgz
...
Each file's content looks like this:
google 16.122.87.76 12.48.167.135 80 adsfs.demo.com [11/Jul/2022:00:45:03 +0800] 1657471503.000 "GET https://adsfs.demo.com/mp/app/feeds/index.js?age=11&name=jock 1.1" 304 - 395 - - 1 "https://dhfs.demo.com/" "Mozilla/5.0 (Linux; U; Android 11; zh-cn; PDVM00 Build/RKQ1.201217.002) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/90.0.4430.61 Mobile Safari/537.36 HeyTapBrowser/40.7.39.5" "16.11.87.76" "-" 1 - 1
My requirement is to get the top URLs, including their query parameters, from the 8th field of each line. The 8th field is this:
"GET https://adsfs.demo.com/mp/app/feeds/index.js?age=11&name=jock 1.1"
The result I want looks like this:
https://adsfs.demo.com/mp/app/feeds/index.js?age=11&name=jock 13549
https://adsfs.demo.com/mp/app/feeds/index.js?age=12&name=jock 12541
https://adsfs.demo.com/mp/app/feeds/index.js?age=13&name=rose 1142
https://adsfs.demo.com/mp/app1/index.css?age=11&name=jock 1074
https://adsfs.demo.com/mp/app2/index.html 874
...
I tried this, but the output looks wrong:
zcat * | awk '{print $10, $17}' | awk '{a[$1]+=$10} END{for(i in a){print i, a[i]}}' | sort -rn -k 2 | head
https://adsfs.demo.com/user 0
https://adsfs.demo.com/union/adlogo/o_1512387525231.png 0
https://adsfs.demo.com/union/adlogo/logo_wo_b.png 0
https://adsfs.demo.com/union/adlogo/logo_w_b.png?aaa=aa.png 0
https://adsfs.demo.com/union/adlogo/logo_w_b.png?aa=1.jpg 0
https://adsfs.demo.com/union/adlogo/logo_w_b.png 0
https://adsfs.demo.com/union/adlogo/gdt_logo.png 0
https://adsfs.demo.com/signin 0
https://adsfs.demo.com/res/v2/feeds/mat_pic/202101/05/1000096829_1609822941972.jpg.short.webp?region=cn-north-1&x-ocs-process=image%252fresize%252cm_fix%252cw_640%252ch_320%252ffallback 0
https://adsfs.demo.com/res/v2/feeds/mat_pic/202101/05/1000096829_1609822941972.jpg.short.webp 0
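One likely reason every count comes out as 0: the first awk prints only two columns, so in the second awk `$10` is empty and nothing is ever added to the sum. The value to add there would be `$2`. Below is a minimal sketch of the same idea collapsed into a single awk pass and run on a few made-up lines instead of the real archives. The helper names `top_urls` and `sample_line` and the sample data are assumptions for illustration only. If a plain hit count is wanted, `sum[$10]++` would do instead of adding field 17.

```shell
#!/bin/sh
# top_urls: hypothetical helper that reads access-log lines on stdin.
# Splitting on whitespace, the URL is field 10 and the value the original
# attempt tried to sum is field 17; both are handled in one awk pass.
top_urls() {
    awk '{ sum[$10] += $17 }
         END { for (url in sum) print url, sum[url] }' |
        sort -k2,2nr |
        head
}

# Made-up sample line in the same layout as the log excerpt (field 17 is 1).
sample_line() {
    printf 'google 16.122.87.76 12.48.167.135 80 adsfs.demo.com [11/Jul/2022:00:45:03 +0800] 1657471503.000 "GET %s 1.1" 304 - 395 - - 1 "-"\n' "$1"
}

{
    sample_line 'https://adsfs.demo.com/mp/app/feeds/index.js?age=11&name=jock'
    sample_line 'https://adsfs.demo.com/mp/app2/index.html'
    sample_line 'https://adsfs.demo.com/mp/app/feeds/index.js?age=11&name=jock'
} | top_urls

# Against the real files the call would be: zcat * | top_urls
```

Using `sort -k2,2nr` restricts the numeric, descending sort to the second column only, so the URL text does not take part in the ordering.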
More complete than my comment: a full awk script and how to call it.
awk script
./topurllogs.awk
Make it executable with this command:
Use it like this:
or with a different MAX value:
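As a rough sketch of what a script like `topurllogs.awk` and its invocation could look like (not necessarily the script referred to above): it assumes the goal is a request count per URL, that the request is the second double-quote-delimited field of each line, and that the limit is passed as an awk variable named `MAX` with a default of 10. The default value, the `/usr/bin/awk` path, the helper names `sample_line` and `sample_log`, and the sample data are all assumptions.

```shell
#!/bin/sh
# Write a hypothetical topurllogs.awk (a sketch, not the original script).
cat > ./topurllogs.awk <<'EOF'
#!/usr/bin/awk -f
# Count requests per URL and print the MAX most requested ones.
BEGIN {
    FS = "\""                            # the request is the 2nd quote-delimited field
    MAX = (MAX == "" ? 10 : MAX + 0)     # assumed default of 10 when MAX is not given
}
{
    split($2, req, " ")                  # req[1]=method, req[2]=URL, req[3]=version
    if (req[2] != "") hits[req[2]]++
}
END {
    cmd = "sort -k2,2nr | head -n " MAX  # sorting is delegated to sort(1)
    for (url in hits) print url, hits[url] | cmd
    close(cmd)
}
EOF

# Make the script executable.
chmod +x ./topurllogs.awk

# Made-up sample lines in the same layout as the log excerpt.
sample_line() {
    printf 'google 16.122.87.76 12.48.167.135 80 adsfs.demo.com [11/Jul/2022:00:45:03 +0800] 1657471503.000 "GET %s 1.1" 304 - 395 - - 1 "-"\n' "$1"
}
sample_log() {
    for i in 1 2 3; do sample_line 'https://adsfs.demo.com/mp/app/feeds/index.js?age=11&name=jock'; done
    for i in 1 2;   do sample_line 'https://adsfs.demo.com/mp/app1/index.css?age=11&name=jock'; done
    sample_line 'https://adsfs.demo.com/mp/app2/index.html'
}

# Default limit (with the real files: zcat * | ./topurllogs.awk)
sample_log | ./topurllogs.awk

# Another MAX value: only the two most requested URLs
sample_log | ./topurllogs.awk -v MAX=2
```

Splitting on double quotes keeps the whole request (`GET URL 1.1`) together as one field, which matches the "8th field" view of the line better than counting whitespace-separated columns.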