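1: Building

Boost's regular expression library has to be compiled before it can be used. (If you do not need all of Boost, do not build all of it; that takes several hours. I recommend building only the libraries you need.) The older Boost 1.33 seems to have problems when compiled with VC8. Download Boost 1.34.1, open the "Visual Studio 2005 Command Prompt", change to boost_1_34_1/libs/regex/build, and run:

    nmake -f vc8.mak

The generated files are placed under vc80.

2: Learning regular expressions

deelx_zh.rar is good study material for regular expressions. While I am at it, I also recommend http://www.regexlab.com/. The site owner and I once exchanged letters about my article "The Principle and Implementation of UDP NAT Traversal in P2P (with source code)". His regex library was well received on CodeProject.

3: A simple example

    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>

    std::string regstr = "a+";
    boost::regex expression(regstr);
    std::string testString = "aaa";

    // Matches one or more 'a'
    if (boost::regex_match(testString, expression))
    {
        std::cout << "Match" << std::endl;
    }
    else
    {
        std::cout << "Not Match" << std::endl;
    }

The later snippets assume the same headers.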
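To make the behavior of the simple example concrete: regex_match succeeds only when the pattern accounts for the entire input, not just part of it. The sketch below uses std::regex (C++11) as a stand-in for boost::regex so that it compiles without Boost; the function name is_all_a is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the WHOLE input consists of one or more 'a'.
// std::regex is used here as a stand-in for boost::regex.
bool is_all_a(const std::string& input)
{
    static const std::regex expression("a+");
    return std::regex_match(input, expression);
}
```

Under these assumptions, is_all_a("aaa") is true, while is_all_a("aaab") and is_all_a("") are both false, because the trailing 'b' (or the missing 'a') prevents a full match.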
4: Learning regex_match through examples

(1) We often need to check whether a string is a valid IP address. A valid address has the form

    xxx.xxx.xxx.xxx

where each xxx is an integer no greater than 255. Finding strings of this shape with a regular expression is easy; checking that xxx does not exceed 255 is harder, because regular expressions work on text, not numbers.

Let us first handle a single number xxx and find an expression that guarantees it does not exceed 255.

Case 1: x, a single digit 0~9, written \d
Case 2: xx, two digits 00~99, written \d\d
Case 3: xxx, which splits into two sub-cases. One is 1xx, written 1\d\d. The other is 2xx, which splits again into 2[0-4]\d (200~249) and 25[0-5] (250~255).

Combined, the expression

    1?\d{1,2}|2[0-4]\d|25[0-5]

identifies a digit string that is no greater than 255. Now we only need to repeat it four times, separated by literal dots:

    (1?\d{1,2}|2[0-4]\d|25[0-5])\.(1?\d{1,2}|2[0-4]\d|25[0-5])\.(1?\d{1,2}|2[0-4]\d|25[0-5])\.(1?\d{1,2}|2[0-4]\d|25[0-5])

It is rather long. I tried to shorten it with the sub-expression back-references that Boost supports, but without success; I would welcome advice from anyone who knows Boost's regular expressions well:

    (1?\d{1,2}|2[0-4]\d|25[0-5])\.\1$\.\1$\.\1$

(See back-references in http://www.boost.org/libs/regex/doc/syntax_perl.html. A back-reference such as \1 matches only the exact text that the first group captured, not the same pattern again, which is not what we need here.)

Example:

    std::string regstr = "(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\.(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\.(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\.(1?\\d{1,2}|2[0-4]\\d|25[0-5])";
    boost::regex expression(regstr);
    std::string testString = "192.168.4.1";
    if (boost::regex_match(testString, expression))
    {
        std::cout << "This is ip address" << std::endl;
    }
    else
    {
        std::cout << "This is not ip address" << std::endl;
    }

(2) Let us look at two other prototypes of regex_match:

    template <class ST, class SA, class Allocator, class charT, class traits>
    bool regex_match(const basic_string<charT, ST, SA>& s,
                     match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m,
                     const basic_regex<charT, traits>& e,
                     match_flag_type flags = match_default);

    template <class BidirectionalIterator, class Allocator, class charT, class traits>
    bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                     match_results<BidirectionalIterator, Allocator>& m,
                     const basic_regex<charT, traits>& e,
                     match_flag_type flags = match_default);

Note the parameter m. If the function returns false, m is undefined. If it returns true, m is defined as follows:

    Element               Value
    --------------------  ------------------------------------------------------------
    m.size()              e.mark_count()
    m.empty()             false
    m.prefix().first      first
    m.prefix().last       first
    m.prefix().matched    false
    m.suffix().first      last
    m.suffix().last       last
    m.suffix().matched    false
    m[0].first            first
    m[0].second           last
    m[0].matched          true if a full match was found, and false if it was a partial match (found as a result of the match_partial flag being set).
    m[n].first            For all integers n < m.size(), the start of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last.
    m[n].second           For all integers n < m.size(), the end of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last.
    m[n].matched          For all integers n < m.size(), true if sub-expression n participated in the match, false otherwise.
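The properties listed above can be checked with a short sketch. It uses std::regex as a stand-in for boost::regex so that it compiles without Boost, and the function names capture_groups and full_match_properties_hold are made up for illustration. One difference to keep in mind: with std::regex, m.size() is 1 + e.mark_count(), which is what the sketch checks; that may not be the same as the m.size() row quoted above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of groups 1..n when the whole input
// matches, or an empty vector when it does not. A group that did not take
// part in the match contributes an empty string.
std::vector<std::string> capture_groups(const std::string& input, const std::regex& re)
{
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_match(input, m, re))
    {
        for (std::size_t i = 1; i < m.size(); ++i)
        {
            groups.push_back(m[i].matched ? m[i].str() : std::string());
        }
    }
    return groups;
}

// Hypothetical helper: after a successful regex_match, checks that m[0] spans
// the whole input and that prefix and suffix are unmatched.
bool full_match_properties_hold(const std::string& input, const std::regex& re)
{
    std::smatch m;
    if (!std::regex_match(input, m, re))
    {
        return false;
    }
    return !m.empty()
        && m.size() == re.mark_count() + 1   // std::regex convention
        && m[0].matched
        && m[0].first == input.begin()
        && m[0].second == input.end()
        && !m.prefix().matched
        && !m.suffix().matched;
}
```

For instance, with the pattern (\d+)\.(\d+) and the input "12.34", capture_groups would return "12" and "34", and full_match_properties_hold would return true.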
Example:

    std::string regstr = "(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\.(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\.(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\.(1?\\d{1,2}|2[0-4]\\d|25[0-5])";
    boost::regex expression(regstr);
    std::string testString = "192.168.4.1";
    boost::smatch what;
    if (boost::regex_match(testString, what, expression))
    {
        std::cout << "This is ip address" << std::endl;
        // what[0] is the whole match; what[1]..what[4] are the four numbers
        for (int i = 1; i <= 4; i++)
        {
            std::string msg(what[i].first, what[i].second);
            std::cout << i << ":" << msg.c_str() << std::endl;
        }
    }
    else
    {
        std::cout << "This is not ip address" << std::endl;
    }

This example prints each number of the IP address separately:

    This is ip address
    1:192
    2:168
    3:4
    4:1

5: Learning regex_search

regex_search is much like regex_match, except that it does not require the whole input to match; a partial match (that is, a find) is enough. A simple example:
    std::string regstr = "(\\d+)";
    boost::regex expression(regstr);
    std::string testString = "192.168.4.1";
    if (boost::regex_search(testString, expression))
    {
        std::cout << "Have digit" << std::endl;
    }

The example above checks whether the given string contains any digits. Now another example, which prints every number in the string:
    std::string regstr = "(\\d+)";
    boost::regex expression(regstr);
    std::string testString = "192.168.4.1";
    boost::smatch what;
    std::string::const_iterator start = testString.begin();
    std::string::const_iterator end = testString.end();
    while (boost::regex_search(start, end, what, expression))
    {
        std::cout << "Have digit:";
        std::string msg(what[1].first, what[1].second);
        std::cout << msg.c_str() << std::endl;
        // Continue searching right after the end of the current match
        start = what[0].second;
    }

Output:

    Have digit:192
    Have digit:168
    Have digit:4
    Have digit:1
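The manual loop above, which restarts the search at the end of each match, can also be written with a regex iterator. The sketch below uses std::sregex_iterator as a stand-in so that it compiles without Boost (Boost.Regex provides a corresponding boost::sregex_iterator); the function name find_all_numbers is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits found in the text,
// in order of appearance.
std::vector<std::string> find_all_numbers(const std::string& text)
{
    static const std::regex digits("\\d+");
    std::vector<std::string> found;
    // A default-constructed iterator marks the end of the sequence of matches.
    for (std::sregex_iterator it(text.begin(), text.end(), digits), last; it != last; ++it)
    {
        found.push_back(it->str());
    }
    return found;
}
```

Called on "192.168.4.1", this would collect "192", "168", "4" and "1", the same values the while loop prints.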
6: On greedy repetition

Let us start with an example:

    std::string regstr = "(.*)(age)(.*)(\\d{2})";
    boost::regex expression(regstr);
    std::string testString = "My age is 28 His age is 27";
    boost::smatch what;
    std::string::const_iterator start = testString.begin();
    std::string::const_iterator end = testString.end();
    while (boost::regex_search(start, end, what, expression))
    {
        std::string name(what[1].first, what[1].second);
        std::string age(what[4].first, what[4].second);
        std::cout << "Name:" << name.c_str() << std::endl;
        std::cout << "Age:" << age.c_str() << std::endl;
        start = what[0].second;
    }

We hoped it would print each person's name followed by their age, but the result is disappointing:

    Name:My age is 28 His
    Age:27

The cause is a side effect of repetition operators such as "+" and "*": they consume as much input as possible, which makes them "greedy". In other words, (.*) matches the longest string that still lets the overall match succeed, not the shortest one.

To make a repetition operator non-greedy, append "?" to it:

    std::string regstr = "(.*?)(age)(.*?)(\\d{2})";
    boost::regex expression(regstr);
    std::string testString = "My age is 28 His age is 27";
    boost::smatch what;
    std::string::const_iterator start = testString.begin();
    std::string::const_iterator end = testString.end();
    while (boost::regex_search(start, end, what, expression))
    {
        std::string name(what[1].first, what[1].second);
        std::string age(what[4].first, what[4].second);
        std::cout << "Name:" << name.c_str() << std::endl;
        std::cout << "Age:" << age.c_str() << std::endl;
        start = what[0].second;
    }

Output:

    Name:My
    Age:28
    Name: His
    Age:27

7: Learning regex_replace

Here is a regular expression I wrote to strip unwanted characters (spaces, carriage returns, tabs) from the left side of a string. Note that \s already covers \r, \n and \t, so listing them in the character class is redundant but harmless.

    std::string testString = "\r\n Hello World ! GoodBye World\r\n";
    std::string TrimLeft = "([\\s\\r\\n\\t]*)(\\w*.*)";
    boost::regex expression(TrimLeft);
    // Keep only the second group, dropping the leading whitespace in group 1
    testString = boost::regex_replace(testString, expression, "$2");
    std::cout << "TrimLeft:" << testString << std::endl;

Output:

    TrimLeft:Hello World ! GoodBye World

Original article: http://www.kuqin.com/cpluspluslib/20070912/1033.html
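As a compact check of the greedy versus non-greedy behavior described in section 6, the following sketch returns only the first capture group of the first match. It uses std::regex as a stand-in for boost::regex so that it compiles without Boost; the function name first_capture is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by group 1 of the first
// match found in `text`, or an empty string if nothing matches.
std::string first_capture(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re) && m.size() > 1)
    {
        return m[1].str();
    }
    return std::string();
}
```

With the text "My age is 28 His age is 27", the greedy pattern (.*)age captures everything up to the last "age", while the non-greedy pattern (.*?)age stops at the first one and captures only "My ".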