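For a given string (usually a paragraph), I want to replace certain words/phrases, but ignore them if they happen to be wrapped in a tag in some way. The matching also needs to be case-insensitive.
Take this as an example: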
You can find a link here <a href="#">link</a> and a lot of things in different styles. Public platform can appear in bold: <b>public platform</b>, and we also have italics here too: <i>italics</i>. While I like soft pillows I am picky about soft <i>pillows</i>. While I want to find fox, I din't want foxes to show up. The text "shiny fruits" is in a span tag: one of the <span>shiny fruits</span>.
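Suppose I want to replace these words:
link: 2 occurrences. The first is plain text (match), the second is inside an A tag (ignore).
public platform: the first is plain text (match, case-insensitive), the second is inside a B tag (ignore).
soft pillows: 1 plain-text match.
fox: 1 plain-text match. Only whole words count.
fruits: the first is plain text (match), the second is inside a span tag (ignore).
For background: I am searching for phrase matches (not individual words) and linking each match to a relevant page.
I want to avoid nested HTML (no links inside bold tags, or vice versa) and other malformed output (for example: the <a href="#">phrase <b>goes</a> here</b>).
I have tried several approaches, such as searching a sanitized copy of the text with the HTML removed. That tells me a match exists, but it leaves me with a whole new problem: mapping the match back to its position in the original content.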
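One way to sidestep the mapping problem, offered as a sketch rather than a tested solution: do not strip the tags at all. Split the markup into alternating text and tag tokens, track how many elements are open around each text token, and replace only in text that sits outside every element. The function name linkTopLevelText and the /page link target are hypothetical. The sketch also assumes validly paired tags, void elements written self-closed (<br/>), and no < or > inside attribute values or comments.

```php
<?php
// Hypothetical helper, not from the original thread.
function linkTopLevelText(string $html, array $phrases): string
{
    $quoted = array_map(fn($p) => preg_quote($p, '/'), $phrases);
    $phrasePattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes hold text, odd indexes hold tags.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $depth = 0; // number of currently open elements
    $out = '';
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            if ($token[1] === '/') {
                $depth = max(0, $depth - 1); // closing tag
            } elseif ($token[1] !== '!' && substr($token, -2) !== '/>') {
                $depth++; // opening tag (declarations and self-closed tags are skipped)
            }
            $out .= $token;
        } elseif ($depth === 0) {
            // Text outside every element: safe to link.
            $out .= preg_replace($phrasePattern, '<a href="/page">$0</a>', $token);
        } else {
            $out .= $token; // text inside an element: copied unchanged
        }
    }
    return $out;
}

$html = 'A link here <a href="#">link</a>, soft pillows and soft <i>pillows</i>.';
echo linkTopLevelText($html, ['link', 'soft pillows']), "\n";
// A <a href="/page">link</a> here <a href="#">link</a>, <a href="/page">soft pillows</a> and soft <i>pillows</i>.
```

Because the tags stay in the token list, nothing has to be mapped back: the output is just the tokens joined in their original order.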
I came across a mention of regex negative lookahead and, after racking my brain, arrived at this regex (it assumes your HTML tags are VALIDLY paired):
    // the function is deliberately a bit ugly, to show how the pattern comes together
    function replaceTextOutsideTags($sourceText = null, $toReplace = 'inner text', $dummyText = '(REPLACED TEXT HERE)')
    {
        $string = $sourceText ?? "Inner text You can find a link here link and a lot of things in different styles. "
            . "Public platform can appear in bold: public platform, and we also have italics here too: italics. "
            . "While I like soft pillows I am picky about soft pillows. "
            . "While I want to find fox, I din't want foxes to show up. "
            . "The text \"shiny fruits\" is in a span tag: one of the shiny fruits. "
            . "The inner text like this inner inner text here to test too, event inner text omg thats sad... or not ";

        // it would be nice to use [[:punct:]], but regex treats < and > as punctuation marks too
        $punctuation = "\.,!\?:;\\|\/=\"#";

        // this part might need additional attention, but you get the point
        $stringPart = "\b$toReplace\b";
        $excludeSequence = "(?![\w\n\s>$punctuation]*?";
        // assumption: the tails "<\/" and ">" were lost from the post and are
        // reconstructed here from the step-by-step explanation that follows
        $excludeOutside = "$excludeSequence<\/)"; // note the closing )
        $excludeTag = "$excludeSequence>)"; // note the closing )

        $pattern = "/" . $stringPart . $excludeOutside . $excludeTag . "/im";

        return preg_replace($pattern, $dummyText, $string);
    }

Example output with the default parameters:

    """
    (REPLACED TEXT HERE)
     You can find a link here link and a lot
     of things in different styles. Public platform can appear in bold:
     public platform, and we also have italics here too: italics.
     While I like soft pillows I am picky about soft pillows.
     While I want to find fox, I din't want foxes to show up.
     The text "shiny fruits" is in a span tag: one of the shiny fruits.
     The (REPLACED TEXT HERE) like this inner inner text here to test too, event (REPLACED TEXT HERE)
     omg thats sad... or not
    """

Now step by step:
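$stringPart wraps the search text in \b word boundaries, so only whole words match (if we are looking for pillowS, we do not want pillow).
If the match is followed by any sequence of \w word characters, \s whitespace or \n newlines and the allowed punctuation, and that sequence ends with the start of a closing tag (</), we do not want the match. This is where the negative lookahead (?![\w\n\s>$punctuation]*?<\/) comes in. We can be sure the sequence never runs into a new tag, because < is not part of the described character set ($excludeOutside variable).
The $excludeTag variable is basically the same as $excludeOutside, but covers the case where $toReplace can be an HTML tag name itself, for example a.
Note that this code cannot handle text that itself contains < or >, and using these symbols may cause unexpected behavior.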
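To see the two lookaheads at work on the tagged example from the question, here is a minimal sketch. It is not the answer's exact code: the function name linkOutsideTags and the /pages/ URL scheme are hypothetical, and the lookahead tails (<\/ and >) are assumed from the explanation above. All phrases go into one alternation and are replaced in a single pass, so a link inserted for one phrase is never rescanned for another.

```php
<?php
// Hypothetical helper: wraps each phrase found outside paired tags in a link.
function linkOutsideTags(string $html, array $phrases, &$count = null): string
{
    // Characters the lookaheads may skip over; "<" is deliberately absent.
    $allowed = '[\w\s>.,!?:;|\/="#]*?';

    $quoted = array_map(fn($p) => preg_quote($p, '/'), $phrases);

    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b'
        . '(?!' . $allowed . '<\/)' // next "<" starts a closing tag: we are inside an element
        . '(?!' . $allowed . '>)'   // a ">" comes before any "<": we are inside the tag itself
        . '/i';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Assumed URL scheme, for illustration only.
            $slug = str_replace(' ', '-', strtolower($m[0]));
            return '<a href="/pages/' . $slug . '">' . $m[0] . '</a>';
        },
        $html,
        -1,
        $count
    );
}

$html = <<<'HTML'
You can find a link here <a href="#">link</a> and a lot of things in different styles.
Public platform can appear in bold: <b>public platform</b>, and we also have italics here too: <i>italics</i>.
While I like soft pillows I am picky about soft <i>pillows</i>.
While I want to find fox, I din't want foxes to show up.
The text "shiny fruits" is in a span tag: one of the <span>shiny fruits</span>.
HTML;

$phrases = ['link', 'public platform', 'soft pillows', 'fox', 'shiny fruits'];
$linked = linkOutsideTags($html, $phrases, $count);

echo $linked, "\n";
echo "replacements: $count\n"; // replacements: 5
```

The callback reuses the matched text ($m[0]), so "Public platform" keeps its capital letter inside the generated link. The same limitation applies as above: plain text that is directly followed by the closing tag of an enclosing element is treated as being inside that element and is skipped.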