Using tokenizer to extract the words from a string

Tech, 2022-05-20

#include <boost/tokenizer.hpp>

template <
    class TokenizerFunc = char_delimiters_separator<char>,
    class Iterator = std::string::const_iterator,
    class Type = std::string
>
class tokenizer;

std::string s = "this/is-a,word.";
boost::tokenizer<> tok(s);
for (boost::tokenizer<>::iterator beg = tok.begin(); beg != tok.end(); ++beg)
    std::cout << *beg << " ";

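With its default template arguments, tokenizer treats whitespace and punctuation as delimiters and discards them, so only the words in between are returned. This program prints: this is a word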

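If you want to control which characters count as delimiters, rather than dropping all whitespace and punctuation, use tokenizer together with char_separator and pass it the characters to drop.

Example: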

#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main()
{
    std::string str = ";;Hello|world||-foo--bar;yow;baz|";
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    // '-', ';' and '|' are dropped delimiters; empty tokens are skipped.
    boost::char_separator<char> sep("-;|");
    tokenizer tokens(str, sep);
    for (tokenizer::iterator tok_iter = tokens.begin(); tok_iter != tokens.end(); ++tok_iter)
        std::cout << "<" << *tok_iter << "> ";
    std::cout << "\n";
    return EXIT_SUCCESS;
}

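The following shows how to keep empty tokens. Here "-" and ";" are dropped delimiters, while "|" is a kept delimiter that is returned as a token of its own: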

#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main()
{
    std::string str = ";;Hello|world||-foo--bar;yow;baz|";
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    // 1st argument: dropped delimiters; 2nd: kept delimiters;
    // 3rd: emit empty tokens between adjacent delimiters.
    boost::char_separator<char> sep("-;", "|", boost::keep_empty_tokens);
    tokenizer tokens(str, sep);
    for (tokenizer::iterator tok_iter = tokens.begin(); tok_iter != tokens.end(); ++tok_iter)
        std::cout << "<" << *tok_iter << "> ";
    std::cout << "\n";
    return EXIT_SUCCESS;
}
