Using tokenizer to extract the words from a string

    Tech  2022-05-20

    #include <boost/tokenizer.hpp>

     

    template <
        class TokenizerFunc = char_delimiters_separator<char>,
        class Iterator = std::string::const_iterator,
        class Type = std::string
    >
    class tokenizer

     

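    // Requires <iostream>, <string> and <boost/tokenizer.hpp>.
    std::string s = "this/is-a,word.";
    boost::tokenizer<> tok(s);  // default TokenizerFunc
    for (boost::tokenizer<>::iterator beg = tok.begin(); beg != tok.end(); ++beg)
        std::cout << *beg << " ";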

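    With its default template arguments, tokenizer treats whitespace and punctuation as delimiters and discards them; the remaining runs of characters (letters and digits) are returned as tokens. This snippet therefore prints: this is a word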

     

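    If some of those symbols should not be treated as delimiters, use tokenizer together with char_separator, which lets you specify exactly which characters are dropped as delimiters.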

     

    example:

    #include <cstdlib>   // EXIT_SUCCESS
    #include <iostream>
    #include <string>
    #include <boost/tokenizer.hpp>

    int main()
    {
        std::string str = ";;Hello|world||-foo--bar;yow;baz|";
        typedef boost::tokenizer<boost::char_separator<char> > tokenizer;

        // '-', ';' and '|' are delimiters: they are dropped from the output.
        boost::char_separator<char> sep("-;|");
        tokenizer tokens(str, sep);

        for (tokenizer::iterator tok_iter = tokens.begin();
             tok_iter != tokens.end(); ++tok_iter)
            std::cout << "<" << *tok_iter << "> ";
        std::cout << "\n";
        return EXIT_SUCCESS;
    }

     

    The following version keeps empty tokens, and also returns '|' as a token of its own instead of dropping it:

    #include <cstdlib>   // EXIT_SUCCESS
    #include <iostream>
    #include <string>
    #include <boost/tokenizer.hpp>

    int main()
    {
        std::string str = ";;Hello|world||-foo--bar;yow;baz|";
        typedef boost::tokenizer<boost::char_separator<char> > tokenizer;

        // '-' and ';' are dropped, '|' is kept as a token, and empty
        // tokens between adjacent delimiters are preserved.
        boost::char_separator<char> sep("-;", "|", boost::keep_empty_tokens);
        tokenizer tokens(str, sep);

        for (tokenizer::iterator tok_iter = tokens.begin();
             tok_iter != tokens.end(); ++tok_iter)
            std::cout << "<" << *tok_iter << "> ";
        std::cout << "\n";
        return EXIT_SUCCESS;
    }

     

