Just use the StringToWordVector filter in batch mode ("-b") from the command line. Then you can supply a second input/output pair of files ("-r" and "-s"), which will be processed with the same filter setup as the first pair ("-i" and "-o"). In other words, -i/-o is for your training file and -r/-s is for your test file. Because the dictionary is built from the training file only, both output files end up with the same attributes.

java -Xmx1024m weka.filters.unsupervised.attribute.StringToWordVector -C -R 2 -W 10000 -I -N 1 -L -tokenizer weka.core.tokenizers.AlphabeticTokenizer -M 2 -b -i ../blog07_train.arff -o ../blog07_train_vec.arff -r ../blog07_test.arff -s ../blog07_test_vec.arff
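
To illustrate why batch mode matters, here is a toy Python sketch of the idea. It is not Weka's implementation, and the helper names (fit_vocabulary, to_vector) and the sample documents are made up for the example. It only shows the principle: the word list is learned from the training documents, and the test documents are then mapped onto that same word list.

```python
from collections import Counter

def fit_vocabulary(train_docs, min_count=1):
    """Learn the word list from the training documents only (hypothetical helper)."""
    counts = Counter(word for doc in train_docs for word in doc.lower().split())
    return sorted(w for w, c in counts.items() if c >= min_count)

def to_vector(doc, vocabulary):
    """Map one document onto the fixed word list; words not in it are dropped."""
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocabulary]

# Made-up stand-ins for the training and test files.
train = ["good blog post", "bad blog"]
test = ["good new blog"]

vocab = fit_vocabulary(train)                        # built from training data only
train_vecs = [to_vector(d, vocab) for d in train]
test_vecs = [to_vector(d, vocab) for d in test]      # same columns as train_vecs

print(vocab)
print(test_vecs)
```

The word "new" occurs only in the test document, so it gets no column. Running the filter separately on each file would instead produce two different attribute sets, and a classifier trained on one could not be evaluated on the other.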