AskOverflow.Dev

Accepted
Nor.Z
Asked: 2022-06-10 22:02:07 +0800 CST

Why does Dreamweaver regex replace insert the wrong string? (How to regex-replace content only inside a specific HTML tag?)

Situation

This is the regex and substitution I use for a regex replace in Dreamweaver:

(( and )|( that )|( include )|( includes )|( including ))
$1<br/>\n

Image: regex replace settings in Dreamweaver

2.

After running Replace All on the current document, I found some errors.

For example:

create new Flux and Mono instances

is replaced with (wrong)

create new Flux that <br/>
Mono instances

Image: the actual result

instead of (correct)

create new Flux and <br/>
Mono instances

(regex101 visualization => https://regex101.com/r/hJTqFg/1)
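
For comparison, here is a minimal sketch of what a conforming regex engine does with this pattern. It uses Python rather than Dreamweaver, so it only illustrates the expected result: Python writes the group reference as \1 where Dreamweaver uses $1, and the names PATTERN and break_after are made up for this example.

```python
import re

# The same alternation as in the question; group 1 wraps every alternative.
PATTERN = re.compile(r"(( and )|( that )|( include )|( includes )|( including ))")


def break_after(text: str) -> str:
    """Append <br/> and a newline after each matched word."""
    # \1 re-inserts whatever group 1 actually matched in the input.
    return PATTERN.sub(r"\1<br/>\n", text)


print(break_after("create new Flux and Mono instances"))
```

Because group 1 always holds the text that was matched at that position, the replacement can only ever repeat a word that is already there; it cannot turn " and " into " that ".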

Question

Why does this happen?

How can I avoid it? (Is there a safer way to do the replacement?)

Have you run into this before?

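Notes:

  • A lot of replacements happen in this document (300+), and the document is large.
  • While the replacement is running, I can see Dreamweaver scrolling quickly through the source code,

    rather than applying all replacements at once (which is what other text editors do).

    Is this normal?

    This makes me suspect the problem is related to the replacement speed: perhaps a delay causes Dreamweaver to insert the text of a previous replacement? (But I do not know how Dreamweaver implements this internally.)

  • (Only a few replacements are wrong, but this should not happen at all.)

  • (The replacement is targeted specifically at the HTML tag <p>; I do not think that is the problem.)

  • (More precisely,

    the original text is create new <code class="literal">Flux</code> and <code class="literal">Mono</code> instances

    and not just create new Flux and Mono instances

    (I simplified it for readability, but that should not matter).)

  • Using Dreamweaver 2021

  • The replacement is run on an .xhtml file

  • Besides being replaced with another string from the regex pattern (because of the | in the pattern),

    there are also errors where some characters disappear (although this is rarer),

    for example: blockhound becomes lockhound; 1.0.1.RELEASE becomes .0.1.RELEASE;

    (I applied more than one regex replacement pattern to this document;

    but these two strings certainly should not be matched by any of the patterns I used on it...)

  • I ran another regex replace test using the option (Documents in) Folder ... instead of Current Document,

    so the scrolling effect does not appear (and the process seems faster).

    Even so, the errors still occur.

    So the errors do not seem to be related to the scrolling during Dreamweaver's regex replace.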
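
The claim that the damaged strings should never match can be checked for the one pattern shown in this question. The sketch below uses Python as a stand-in engine and says nothing about the other patterns that were applied to the document, since those are not shown here.

```python
import re

# Only the pattern quoted in the question; the other patterns are unknown.
PATTERN = re.compile(r"(( and )|( that )|( include )|( includes )|( including ))")

for sample in ("blockhound", "1.0.1.RELEASE"):
    # search() returns None when the pattern matches nowhere in the string.
    print(sample, "->", PATTERN.search(sample))
```

Every alternative needs a leading and a trailing space, so neither string can match, which is consistent with the missing characters being a side effect of the tool rather than of this pattern.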



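Below is a simplified example of the above (in case the above contains redundant information)

In short:

0. You have some text

<p>AA and BB</p>

1. If the regex pattern you use contains the or syntax |, for example:

(( and )|( that ))

2. and your substitution contains a capture group reference $1, for example:

$1foobar

3. and you run a regex Replace All in Dreamweaver targeted at a specific tag, for example <p> (I call it the specific-targeting-replacement tag)

4.

  • and there is another tag (one that is not a <p>) above the <p>, say <li> (I call it the non-specific-targeting-replacement tag)

  • and the <li> contains the word that (a word that sits next to and in the regex pattern with the or syntax),

    • (I call the word that the adjacent-replacement word

    • I call the word and the to-be-replaced word)

<ul>
  <li>xxxx that xxxx</li>
</ul>

5. then the text in a <p> tag below that <li> tag will be replaced with the wrong string (the adjacent-replacement word). For example:

It should be replaced with (correct)

<p>AA and foobarBB</p>

but it may be replaced with (wrong)

<p>AA that foobarBB</p>

Image: the file and the steps that reproduce this bug

Test file: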

<!DOCTYPE html>
<html>
<body>
<p>AA and BB</p>
<p>AA and BB</p>
<ul>
  <li>xxxx that xxxx</li>
</ul>
<p>AA and BB</p>
<p>AA and BB</p>
</body>
</html>
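
One way to spot this kind of damage after the fact is to strip the inserted text from the result and compare it with the original, line by line. The following is a rough sketch under two assumptions: the inserted text (foobar here) does not occur in the original document, and the function name find_corrupted_lines and the sample strings are invented for illustration.

```python
from itertools import zip_longest


def find_corrupted_lines(original: str, replaced: str, inserted: str):
    """Return (line_number, original_line, restored_line) for every line that
    still differs once the inserted text is removed from the replaced document."""
    restored = replaced.replace(inserted, "")
    pairs = zip_longest(original.splitlines(), restored.splitlines())
    return [(n, o, r) for n, (o, r) in enumerate(pairs, start=1) if o != r]


ORIGINAL = "<p>AA and BB</p>\n<ul>\n  <li>xxxx that xxxx</li>\n</ul>\n<p>AA and BB</p>\n"
# Hypothetical output showing the bug described above in the last paragraph.
BUGGY = "<p>AA and foobarBB</p>\n<ul>\n  <li>xxxx that xxxx</li>\n</ul>\n<p>AA that foobarBB</p>\n"

for line_number, before, after in find_corrupted_lines(ORIGINAL, BUGGY, "foobar"):
    print(f"line {line_number}: expected {before!r}, got {after!r}")
```

A correct replacement leaves nothing to report, so any line listed by this check points at a place where the tool changed more than it was asked to.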
regex notepad++
2 Answers

  1. Nor.Z
    2022-06-23T08:43:26+08:00

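    This is a bug. There is currently no good solution within Dreamweaver, but there are some workarounds.

    Solution 1 (workaround, very limited)

    When using Dreamweaver, do not use the | (or) syntax or the grouping syntax in your regex pattern.

    You have to replace the terms one by one. (This is a huge limitation on regex.)

    • (However, there seems to be a way to batch replacement queries => link.)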
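
The logic of this workaround can be sketched as follows. This is Python, not Dreamweaver, so it only illustrates the idea of running one literal pattern per pass instead of a single pattern with | and groups; the names WORDS and replace_one_by_one are hypothetical.

```python
import re

# One entry per Find/Replace run; none of them uses `|` or a capture group.
WORDS = [" and ", " that ", " include ", " includes ", " including "]


def replace_one_by_one(text: str, suffix: str = "<br/>\n") -> str:
    """Append `suffix` after every occurrence of each word, one word per pass."""
    for word in WORDS:
        # A function replacement returns the suffix literally,
        # so no escape sequences in it are reinterpreted.
        text = re.sub(re.escape(word), lambda m: m.group(0) + suffix, text)
    return text


print(replace_one_by_one("create new Flux and Mono instances"))
```

The cost is one pass over the document per word, which is why the answer calls this approach very limited.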

    Solution 2 (recommended)

    Use Calibre with a Python function to do the replacement.

    For example:

    import re

    # Callback for Calibre's function-based search and replace:
    # rewrites only the content inside <p> tags.
    # Use the following regex in Calibre's Find field:
    # (?P<tag_opening><p(|\s+[^>]*)>)(?P<content_inside_tag>.*?)(?P<tag_closing><\/p\s*>)
    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        tag_opening = match.group("tag_opening")
        content = match.group("content_inside_tag")
        tag_closing = match.group("tag_closing")

        # \g<0> is the whole inner match, so the matched word itself is kept
        content = re.sub(r"(( and )|( that ))", "\\g<0><br>\n", content)

        return tag_opening + content + tag_closing
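
To try the function outside Calibre, a small harness along these lines may help. It is a sketch with assumptions: run_outside_calibre is a made-up helper, the arguments Calibre would normally supply are replaced by placeholders, and re.DOTALL stands in for however the search would be configured in Calibre.

```python
import re

# Search expression taken from the comment in the answer's code.
FIND = re.compile(
    r"(?P<tag_opening><p(|\s+[^>]*)>)(?P<content_inside_tag>.*?)(?P<tag_closing><\/p\s*>)",
    re.DOTALL,
)


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """Same logic as the Calibre function above, repeated so the sketch is self-contained."""
    inner = re.sub(r"(( and )|( that ))", "\\g<0><br>\n", match.group("content_inside_tag"))
    return match.group("tag_opening") + inner + match.group("tag_closing")


def run_outside_calibre(html: str) -> str:
    """Call replace() once per <p> match, passing placeholders for Calibre's arguments."""
    counter = 0

    def call(match):
        nonlocal counter
        counter += 1
        return replace(match, counter, "test.xhtml", None, None, None, None)

    return FIND.sub(call, html)


print(run_outside_calibre("<p>AA and BB</p><li>xxxx that xxxx</li>"))
```

Only the paragraph content is passed through the inner substitution, so the that inside the <li> is left alone.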
    
    

    Solution 3 (recommended)

    Same as Solution 2, but using Java code.

    package com.ex.main;
    
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    /*
    @to_use: 
    1. put the files that need to be regex replaced in `input folder` 
    2. change the regex patterns in 
       regexReplaceContent_SpecificToATag()
       regexReplaceContent_SpecificToATag_innerMatch()
    3. run 
    4. get the output files from `output folder`
    */
    public class RegexReplaceFile {
    
      //################################################################################################
    
      // loop & read the files in input folder & invoke regex replace for each file & output the replaced file to output folder
      // (files in input folder will not be modified)
      public static void regexReplaceAndReadWriteFilesInFolder(String path_InputFolder, String path_OutputFolder) {
        File dir_InputFolder = new File(path_InputFolder);
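        File[] arr_file_InInputFolder = dir_InputFolder.listFiles();
        // listFiles() returns null when the path is not a readable directory
        if (arr_file_InInputFolder == null || arr_file_InInputFolder.length == 0) {
          System.out.println("Folder is empty or not readable.");
          return;
        }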
        for (File file_curr : arr_file_InInputFolder) {
          if (file_curr.isDirectory()) {
            System.out.println("Directory: " + file_curr.getAbsolutePath());
            // showFiles(file_curr.listFiles()); // recursion (if need to loop inside sub folders)
          } else {
            String path_InputFile = file_curr.getAbsolutePath();
            String path_OutputFile = path_OutputFolder + "/" + file_curr.getName();
            System.out.println("Input  File: " + path_InputFile);
            System.out.println("Output File: " + path_OutputFile);
            // do the regex replace for each file
            regexReplaceAndReadWriteFile(path_InputFile, path_OutputFile);
          }
        }
      }
    
      // ^^ do the regex replace for each file
      public static void regexReplaceAndReadWriteFile(String path_InputFile, String path_OutputFile) {
        // >> read content from input file
        StringBuilder contentBuilder = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(path_InputFile))) {
          String str;
          while ((str = in.readLine()) != null) {
            contentBuilder.append(str);
            contentBuilder.append("\n");
          }
        } catch (IOException e) {
          e.printStackTrace();
        }
        String content = contentBuilder.toString();
        //    System.out.println(content);
    
        // >> regex replace
        String content_re = regexReplaceContent_SpecificToATag(content);
    
        // >> write content to output file
        byte[] byte_Content = content_re.getBytes(StandardCharsets.UTF_8);
        Path file = Paths.get(path_OutputFile);
        try {
          Files.write(file, byte_Content);
        } catch (IOException e) {
          e.printStackTrace();
        }
    
      }
    
      //##########################################
    
      // ## loop replace with `StringBuilder.append & inner replace` 
      public static String regexReplaceContent_SpecificToATag(String content, boolean det_OmitSyso) {
        // >> find content specific to tag <p> 
        final String regexNamedGroup_tagOpening = "tagOpening";
        final String regexNamedGroup_contentInsideTag = "contentInsideTag";
        final String regexNamedGroup_tagClosing = "tagClosing";
    
        String content_SearchOn = content;
        String str_RegexPattern = "(?s)(?<tagOpening><p(|\\s+[^>]*)>)(?<contentInsideTag>.*?)(?<tagClosing></p\\s*>)"; // @to_use-param;
        Pattern pattern = Pattern.compile(str_RegexPattern);
        Matcher matcher = pattern.matcher(content_SearchOn);
    
        // >> for the content in each (found) paragraph <p>, do a replace
        StringBuilder sb_ContentSearchOn = new StringBuilder(content_SearchOn);
        StringBuilder content_Replaced = new StringBuilder();
        int ind_MatchGroupEnd_prev = 0;
        int ind_MatchGroupEnd_curr;
        int ind_MatchGroupStart_curr;
        while (matcher.find()) {
          // 
          ind_MatchGroupStart_curr = matcher.start(regexNamedGroup_contentInsideTag);
          ind_MatchGroupEnd_curr = matcher.end(regexNamedGroup_contentInsideTag);
    
          String content_BeforeMatchGroup = sb_ContentSearchOn.substring(ind_MatchGroupEnd_prev, ind_MatchGroupStart_curr); // prev end to curr start, not start to end
    
          content_Replaced.append(content_BeforeMatchGroup);
    
          // ^^ for the content in each (found) paragraph <p>, do a replace -- [inner match]
          String content_SearchOn_innerMatch__TheMatchGroup = matcher.group(regexNamedGroup_contentInsideTag);
          String content_Replaced_innerMatch = regexReplaceContent_SpecificToATag_innerMatch(content_SearchOn_innerMatch__TheMatchGroup);
    
          content_Replaced.append(content_Replaced_innerMatch);
    
          // 
          ind_MatchGroupEnd_prev = ind_MatchGroupEnd_curr;
        }
    
        // append the content after the last match group
        String content_AfterLastMatchGroup = sb_ContentSearchOn.substring(ind_MatchGroupEnd_prev, sb_ContentSearchOn.length());
        content_Replaced.append(content_AfterLastMatchGroup);
    
        // >>
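        if (det_OmitSyso) {
          // guard against content shorter than 300 characters
          int previewLength = Math.min(300, content_Replaced.length());
          System.out.println(content_Replaced.substring(0, previewLength) + "\n[... omitted]\n");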
        } else {
          System.out.println(content_Replaced);
        }
        return content_Replaced.toString();
      }
    
      public static String regexReplaceContent_SpecificToATag(String content) {
        boolean det_OmitSyso = true;
        return regexReplaceContent_SpecificToATag(content, det_OmitSyso); // 
      }
    
      // ^^ for the content in each (found) paragraph <p>, do a replace -- [inner match]
      private static String regexReplaceContent_SpecificToATag_innerMatch(String content_SearchOn_innerMatch__TheMatchGroup) {
        String str_RegexPattern_innerMatch;
        String str_Substitution_innerMatch;
        String content_Replaced_innerMatch = content_SearchOn_innerMatch__TheMatchGroup;
    
        // 
        //    str_RegexPattern_innerMatch = "( and then )";
        //    str_Substitution_innerMatch = " and then <br>\n";
        //    content_Replaced_innerMatch = content_Replaced_innerMatch.replaceAll(str_RegexPattern_innerMatch, str_Substitution_innerMatch);
        //
        str_RegexPattern_innerMatch = "(( and )|( that ))";
        str_Substitution_innerMatch = "$0<br>\n";
        content_Replaced_innerMatch = content_Replaced_innerMatch.replaceAll(str_RegexPattern_innerMatch, str_Substitution_innerMatch);
        //
        //    str_RegexPattern_innerMatch = "((\\. )|(, ))";
        //    str_Substitution_innerMatch = "$1<br>\n";
        //    content_Replaced_innerMatch = content_Replaced_innerMatch.replaceAll(str_RegexPattern_innerMatch, str_Substitution_innerMatch);
        //
        //    str_RegexPattern_innerMatch = "( by )";
        //    str_Substitution_innerMatch = " <br>\n++ by ";
        //    content_Replaced_innerMatch = content_Replaced_innerMatch.replaceAll(str_RegexPattern_innerMatch, str_Substitution_innerMatch);
    
        return content_Replaced_innerMatch;
      }
    
      //################################################################################################
    
      static final String content_TESTING = "    <p>These days if you are an android developer, you might hear some hype about RxJava. \n"
                                            + "RxJava is library which can help you get rid of all you complex write-only code that deals with asynchronous events. Once you start using it in your project – you will use it everywhere.</p>\n"
                                            + "\n"
                                            + "<p>The main pitfall here is steep learning curve. If you have never used RxJava before, it will be hard or confusing to take full advantage of it for the first time. The whole way you think about writing code is a little different.\n"
                                            + "Such learning curve creates problems for massive RxJava adoption in most projects.</p>\n"
                                            + "\n"
                                            + "<p>Of course there are a lot of tutorials and code examples around that explain how to use RxJava. \n"
                                            + "Developer interested in learning and using RxJava can first visit  the official <a href=\"https://github.com/ReactiveX/RxJava/wiki\">Wiki</a> that contains great explanation of what Observable is, how it’s related to Iterable and Future. Another useful resource is <a href=\"https://github.com/ReactiveX/RxJava/wiki/How-To-Use-RxJava\">How To Use RxJava</a> page which shows code examples of how to emit items and <code class=\"highlighter-rouge\">println</code> them.</p>\n"
                                            + "\n"
                                            + "<p>But what one really wants to know, is what problem RxJava will solve and how it will help organize async code without actually learning what Observable is.</p>\n"
                                            + "\n"
                                            + "<p>My goal here is to show some “prequel” to read before the official documentation in order to better understand the problems that RxJava tries to solve.\n"
                                            + "This article is positioned as a small walk-through on how to reorganize messy Async code by ourselfs to see how we can implement basic principles of RxJava without actually using RxJava.</p>\n"
                                            + "\n"
                                            + "<p>So If you are still curious let’s get started!</p>\n"
                                            + "\n"
                                            + "<h2 id=\"cat-app\">Cat App</h2>\n"
                                            + "<p>So let’s create a <em>real world</em> example. So we know that cats are the engine of technology progress, so let’s build \n"
                                            + "a typical app for downloading cat pictures.</p>\n"
                                            + "\n"
                                            + "<h4 id=\"so-here-is-the-task\">So here is the task:</h4>\n"
                                            + "<blockquote>\n"
                                            + "  <p>We have a webservice that provides api to search the whole internet for\n"
                                            + "images of cats by given query. Every image will contain cuteness\n"
                                            + "parameter - integer value that describes how cute is that picture. Our\n"
                                            + "task will be download a list of cats, choose the most cutest, and save\n"
                                            + "it to local storage.</p>\n"
                                            + "</blockquote>\n"
                                            + "\n"
                                            + "<p>We will focus only on downloading, processing and saving cats data.</p>\n"
                                            + "\n"
                                            + "<p>So let’s start:</p>\n"
                                            + "";
    
      public static void main(String[] args) throws Exception {
        // >> @to_use @M2 directly input text
        boolean det_OmitSyso = false;
        regexReplaceContent_SpecificToATag(content_TESTING, det_OmitSyso);
    
        //    // >> @to_use @M1 input text from folder
        //    regexReplaceAndReadWriteFilesInFolder("G:/wsp/eclipse/RegexReplaceFile_AT_tool_AT_NT/src/main/resources/input__RegexReplace",
        //                                          "G:/wsp/eclipse/RegexReplaceFile_AT_tool_AT_NT/src/main/resources/output_RegexReplace");
    
      }
    
    }
    
    
  2. Best Answer
    Toto
    2022-06-27T04:12:24+08:00

    Here is a solution using Notepad++:

    • Ctrl+H
    • Find what: (?:<p>|\G)(?:(?!</p>).)*?\b(?:and|that|includes?|including)\b\K(?=.*?</p>)
    • Replace with: foobar
    • CHECK Wrap around
    • CHECK Regular expression
    • CHECK . matches newline
    • Replace all

    Explanation:

    解释:

    (?:         # non capture group
        <p>         # opening tag
      |           # OR
        \G          # restart from last match position
    )           # end group
            # Tempered Greedy Token
    (?:         # non capture group
        (?!</p>)    # negative lookahead, make sure we haven't </p> just after
        .           # any character
    )*?         # end group, may appear 0 or more times, not greedy
    \b          # word boundary
    (?:         # non capture   group
        and         # literally
      |           # OR
        that        # literally
      |           # OR
        includes?   # literally include OR includes
      |           # OR
        including   # literally
    )           # end group
    \b          # word boundary
    \K          # reset operator, forget all we have seen until this position
    (?=         # positive lookahead, make sure we have after:
        .*?         # 0 or more any character, not greedy
        </p>        # closing tag
    )           # end lookahead
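
The tempered greedy token is the part of this pattern that keeps a match from running past the closing tag. The sketch below isolates just that piece in Python. Python's built-in re module supports neither \G nor \K, so this is not a port of the Notepad++ expression; TEMPERED and the sample text are made up for the illustration.

```python
import re

# (?:(?!</p>).)* consumes one character at a time, but only while the
# text at the current position is not the start of "</p>".
TEMPERED = re.compile(r"<p>((?:(?!</p>).)*)</p>", re.DOTALL)

text = "<p>AA and BB</p>\n<ul>\n  <li>xxxx that xxxx</li>\n</ul>\n<p>CC that DD</p>"

# Each result is the content of exactly one paragraph; the <li> is never included.
print(TEMPERED.findall(text))
```

Even with a greedy quantifier and dot matching newlines, each match ends at the first </p>, which is what lets the full expression confine its replacements to paragraph content.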
    

    Screenshot (before): [image]

    Screenshot (after): [image]
