我需要提取一堆html文件中的文本(大约500K)要复制的文本看起来像<div class='cls '>text to be copied including some<span>and <p></p></span>and more text</div>
我决心(?:\<div\sclass\=\'cls\s\'\>)(.*)(?=\<\/div\>)
我已经阅读了有关如何使用 grep 执行此操作的其他问题,我认为该命令是
grep -r "/(?:\<div\sclass\=\'cls\s\'\>)(.*)(?=\<\/div\>)/" *.html > output.txt
它不起作用。我究竟做错了什么?
也试过pcregrep -r -regexp="/(?:\<div\sclass\=\'cls\s\'\>)(.*)(?=\<\/div\>)/" --file-list=fl.txt > output.txt
- 它什么都不做pcregrep -r -regexp="/(?:\<div\sclass\=\'cls\s\'\>)(.*)(?=\<\/div\>)/" > output.txt
- 什么都没有
编辑1:尝试以下格式的建议:
grep -f -r "/(?:\<div\sclass\=\'desc\s\'\>)(.*)(?=\<\/div\>)/" *.html >> touch output.txt
grep: -r: No such file or directory
grep -f -r "/(?:\<div\sclass\=\'desc\s\'\>)(.*)(?=\<\/div\>)/" *.html >> output.txt
grep: -r: No such file or directory
grep -f -r "/(?:\<div\sclass\=\'desc\s\'\>)(.*)(?=\<\/div\>)/" *.html >> output.txt
grep: -r: No such file or directory
grep -f "/(?:\<div\sclass\=\'desc\s\'\>)(.*)(?=\<\/div\>)/" file111.html >> touch output.txt
grep: /(?:\<div\sclass\=\'desc\s\'\>)(.*)(?=\<\/div\>)/: No such file or directory
和其他一些排列,仍然没有