正则表达式 lookaround

    技术2025-06-15  23

     

    正则表达式

     

    Lookaround

    正则表达式中两个有用的结构: 向前查找 & 向后查找 (lookahead & lookbehind), 统称为 lookaround.

    lookaround 去判断一个模式在字符串中是否存在, 不管匹配是否成功, 都不返回, 因此 lookaround 也称为断言(assertions).

    lookaround 就像字符串的开始结束, 单词边界锚点(^ $ /b)一样, 是 "zero-width assertions".

     

    Lookahead

     

    Positive lookahead

    my(?=here) 会匹配 myhere 成功,匹配其中的 my , (?=here) 用来判断 my 之后是否有 here ,有就匹配成功, 否则匹配失败; 匹配成功后 here 不返回;

    my(?=here)$ 匹配 myhere 失败, 因为 (?=here) 只是判断 my 之后是否有 here, 他只是一种判定, 并不匹配, 因此 myehre 中 here

    不能与 $ 匹配, 因此 my(?=here)$ 匹配 myhere 失败;

     

    Negative lookahead

    my(?!here) 若字符串中有 my, 且 my 后跟的不是 here, 则匹配成功

     

     

    Lookbebind

     

    Positive lookbehind

    (?<=my)here 若字符串中有 here, 且之前是 my, 则匹配成功

     

    Negative lookbind

    (?<!my)here 若字符串中有 here, 且之前不是 my, 则匹配成功

     

    JavaScript 中只有 lookahead, 没有 lookbehind

    'myhere'.match(/my(?=here)/); // 匹配成功, 得到 my 'myhere'.match(/my(?=here)$/); // 匹配失败 

     

    PHP 中有 lookahead 和 lookbehind

    $str = 'myhere'; $positive_lookahead = '/my(?=here)/'; $negative_lookahead = '/my(?!here)/'; $positive_lookbehind = '/(?<=my)here/'; $negative_lookbehind = '/(?<!my)here/'; preg_match( $positive_lookahead , $str, $matches1); // 匹配成功 preg_match( $negative_lookahead , $str, $matches2); // 匹配失败 preg_match( $positive_lookbehind, $str, $matches3); // 匹配成功 preg_match( $negative_lookbehind, $str, $matches4); // 匹配失败  

     

    Lookaround 和 non-capturing group 的区别:

    1, lookaround 是判断, 不匹配字符(不消耗字符);他只是判断字符是否存在, lookaround 之后的正则会当 lookaround 不存在接着匹配(匹配成功的情况)

    2, non-capturing group 匹配, 但是不编号

    // 利用 lookaround 在字符串中去掉重复单词 var str= 'a aa a a-b a-a a-b '; var reg = /(?:^|/s)(/S+)(?=/s+(?:/S+/s+)*/1(?:/s|$))/g; console.log(str.replace(reg,"")); // 如果正则中的 lookahead 换成 non-capturing group 会达不到去重的效果, 因为 non-capturing group 会消耗字符。 // // 'a aa a a-b a-a a-b '.match(/(?:^|/s)(/S+)(?=/s+(?:/S+/s+)*/1(?:/s|$))/g); // ["a", " a-b"] // 'a aa a a-b a-a a-b '.match(/(?:^|/s)(/S+)(?=/s+(?:/S+/s+)*/1(?:/s|$))/g); // ["a aa a "] // // 正则来自(有改动) // http://www.cnblogs.com/rubylouvre/archive/2011/01/27/1946397.html 

    non-capturing group 与 capturing group 的异同:

    1, 两者都是分组

    2, non-capturing group 分组不编号

    3, capturing group 分组编号

    4, non-capturing group 用在需要对一些字符组合构成一个分组(一个整体), 但是不需要从匹配结果中单独获取这个分组(non-capturing group 不编号); non-capturing group 性能会比 capturing group 好

     

     

     

    正则表达式的效率:

    1, Eliminated unnecessary use of capturing group.

    2, Eliminated unnecessary use of alternation.

    3, Eliminated inefficient use of lazy quantifier.

    4, Implemented "unrolling-the-loop" efficiency technique.

    5, A successful match is significantly faster.

    6, Added non-capture group and beginning of string anchor.

    7, A successful non-match is significantly faster.

    8, Added non-capture group and word boundary anchor.

    9, Linefeed added to regex to allow table to fit on screen.

    10, Negligible performance improvement.

    11, Reduced code size.

    12, Added "i" ignore-case modifier.

    13, The old and new regexes can match different text.

    14, Removed unnecessary backslash escapes.

    15, Removed unnecessary callback function.

    16, Removed unnecessary local variables.

    17, Added non-capture group and exposed literal starting text.

    18, Literal RegExp object should be cached.

     

     

    参考:

    Lookaround

    正则表达式的效率

     

    最新回复(0)