Continuing our example, we set up our analysis of the low birth weight data as follows. We first choose the J48 algorithm as the learning scheme. For its options we set it to use binary splits, enable SaveInstanceData, and leave the rest at their default values. We will return to the SaveInstanceData option (available in all of the tree classifier algorithms) when examining the visualization tools available in Weka. For the sampling/test mode, we choose the default 10-fold cross-validation, and lastly we choose ‘weight’ as the class variable.
At this point the model parameters have been set up and we can build the model by clicking the Start button. Once the model has been created we can examine it by reviewing the generated output.
The Explorer generates both graphical and text-based output for each model it builds. Each model, together with its performance results, is stored in the Result List at the bottom left of the window. Every time the Explorer builds a model it is added to the Result List, so past models can be recalled simply by clicking the desired entry in the list. It is also possible to save a chosen model and reload it at a later time.
Depending on the learning scheme, some components of the output will differ while others are the same regardless of the scheme. We will now step through the results for the low birth weight example and explain each component of the text-based output. The full text output is given in Reference Section 3.4.
Text Output: Run information
The first line of the run information section identifies the learning scheme chosen and its parameters. The parameters (both default and modified) are shown in short form. In our example, the learning scheme was ‘weka.classifiers.trees.J48’, i.e. the J48 algorithm. Its parameters are shown as ‘-C 0.25 -B -M 2’, which state that the confidence factor for pruning is 0.25, that binary splits are used, and that the minimum number of instances in a leaf is 2.
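For readers who prefer scripting to the GUI, the same J48 configuration can be reproduced through Weka's Java API. The following is a minimal sketch (the class name is ours and not part of the Explorer output), assuming a standard Weka 3 installation on the classpath:

import weka.classifiers.trees.J48;
import weka.core.Utils;

public class ConfigureJ48 {
    public static void main(String[] args) throws Exception {
        // Mirror the Explorer settings: confidence factor 0.25, binary splits,
        // a minimum of 2 instances per leaf, and keep the training instances
        // with the tree (SaveInstanceData) for later visualization.
        J48 j48 = new J48();
        j48.setConfidenceFactor(0.25f);
        j48.setBinarySplits(true);
        j48.setMinNumObj(2);
        j48.setSaveInstanceData(true);

        // Equivalently, the short-form parameter string from the run
        // information section can be passed directly.
        J48 j48FromOptions = new J48();
        j48FromOptions.setOptions(Utils.splitOptions("-C 0.25 -B -M 2"));

        // Prints the options back in short form, e.g. "-C 0.25 -B -M 2".
        System.out.println(Utils.joinOptions(j48FromOptions.getOptions()));
    }
}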
The second line shows information about the relation. Relations in Weka correspond to data sets. The name of the relation contains the name of the data file used to build it, together with the names of any filters that have been applied to it. In our example, the data file was ‘birth’, and two filters (both unsupervised) were applied to it. The first filter drops the unneeded attributes 1 and 9 (id, bwt) and the second discretizes attributes 3-7 (smoke, ptl, ht, ui, ftv).
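The two filters recorded in the relation name can likewise be re-applied programmatically. The sketch below assumes the raw data is available as an ARFF file named birth.arff (a hypothetical name) and uses the unsupervised Remove and Discretize filters, which appear to be the ones named in the relation:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;
import weka.filters.unsupervised.attribute.Remove;

public class PrepareBirthData {
    public static void main(String[] args) throws Exception {
        // Load the raw data; the file name is hypothetical.
        Instances raw = new DataSource("birth.arff").getDataSet();

        // First filter: drop the unneeded attributes 1 and 9 (id, bwt).
        Remove remove = new Remove();
        remove.setAttributeIndices("1,9");
        remove.setInputFormat(raw);
        Instances reduced = Filter.useFilter(raw, remove);

        // Second filter: discretize attributes 3-7 of the reduced data
        // (smoke, ptl, ht, ui, ftv).
        Discretize discretize = new Discretize();
        discretize.setAttributeIndices("3-7");
        discretize.setInputFormat(reduced);
        Instances data = Filter.useFilter(reduced, discretize);

        // 'weight' is the class variable.
        data.setClassIndex(data.attribute("weight").index());

        System.out.println(data.numInstances() + " instances, "
                + data.numAttributes() + " attributes");
    }
}

Note that Remove is applied first, leaving age, lwt, smoke, ptl, ht, ui, ftv and weight, so the indices 3-7 passed to Discretize refer to smoke through ftv, as in the relation name.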
The next part shows the number of instances in the relation, followed by the number of attributes and then the list of attributes themselves. In our example, the number of instances was 180 and the number of attributes was 8. The full attribute list is: age, lwt, smoke, ptl, ht, ui, ftv, weight.
The last part shows the type of testing that was employed; in our example it was 10-fold cross-validation.
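The same evaluation can be run outside the Explorer using the Evaluation class; its crossValidateModel method performs stratified cross-validation, matching the Explorer's behaviour. In the sketch below the ARFF file name is hypothetical, and the random seed of 1 is assumed to match the Explorer's default:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidateJ48 {
    public static void main(String[] args) throws Exception {
        // Assumes the preprocessed relation has been saved to an ARFF file
        // (hypothetical name) and that 'weight' is the class attribute.
        Instances data = new DataSource("birth-preprocessed.arff").getDataSet();
        data.setClassIndex(data.attribute("weight").index());

        J48 j48 = new J48();
        j48.setOptions(Utils.splitOptions("-C 0.25 -B -M 2"));

        // Stratified 10-fold cross-validation, as in the Explorer.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(j48, data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString());
        System.out.println(eval.toMatrixString());
    }
}

The three print statements produce the summary, the detailed accuracy by class and the confusion matrix discussed later in this section.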
Text Output: Classifier model (full training set)
This portion of the text output has been extracted and is displayed below:
=== Classifier model (full training set) ===
J48 pruned tree
------------------
ptl = 1
| ftv = 2: low (2.0)
| ftv != 2
| | age <= 31.0: normal (17.0/3.0)
| | age > 31.0: low (3.0)
ptl != 1: low (158.0/36.0)
Number of Leaves : 4
Size of the tree : 7
Time taken to build model: 0.02 seconds
It displays information about the model generated using the full training set. The output says ‘full training set’ because we used cross-validation: the models built on the individual folds are used only for evaluation, and the model displayed here is the final one, built using all of the dataset. When a tree model is used, a text rendering of the generated tree is shown, followed by the number of leaves and the overall tree size (above).
This display is not as easy to read as a graphical one, but Weka also provides a graphical view, which will be shown in the next section when we examine the visualization tools available in the Explorer. In the example above, the generated tree is displayed in text form and has 4 leaves and a tree size of 7.
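If the model is built programmatically, the same text tree, leaf count and tree size can be obtained directly from the classifier object, and J48 can also emit the tree in DOT format for graphical rendering. A sketch, with a hypothetical file name as before:

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class InspectTree {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("birth-preprocessed.arff").getDataSet();
        data.setClassIndex(data.attribute("weight").index());

        J48 j48 = new J48();
        j48.setOptions(Utils.splitOptions("-C 0.25 -B -M 2"));
        j48.buildClassifier(data);   // the model built on the full training set

        System.out.println(j48);                              // text rendering of the tree
        System.out.println("Leaves: " + j48.measureNumLeaves());
        System.out.println("Size:   " + j48.measureTreeSize());

        // DOT source describing the same tree, for graphical rendering.
        System.out.println(j48.graph());
    }
}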
Text Output: Predictions on test data
This section displays the predictions made on the test data. The table shown is an extract of the full table given in the Reference Section. We used 10-fold cross-validation on a total of 180 instances, so each of the 10 models was tested on a fold of 18 instances, giving 180 test predictions in all. The table shows how each of the 10 models performed on the 18 instances used to test it (see the table below).
=== Predictions on test data ===
inst#   actual      predicted   error   probability distribution
  1     1:normal    1:normal            *0.768   0.232
 11     1:normal    1:normal            *0.768   0.232
 13     1:normal    1:normal            *0.768   0.232
 14     2:low       1:normal      +     *0.768   0.232
 15     2:low       2:low                0.2    *0.8
 16     2:low       1:normal      +     *0.768   0.232
 17     2:low       1:normal      +     *0.768   0.232
 18     2:low       2:low                0.2    *0.8
  8     1:normal    2:low         +      0.214  *0.786
For each instance in the test data the following is displayed: the instance number, the actual classification and the predicted classification. If the two differ, the error is indicated by a ‘+’ in the error column. Finally, the probability distribution over the classes is shown in the last column, with the probability of the predicted class marked with a ‘*’.
The table above shows that instance #1 was correctly predicted as normal weight and that the prediction rule used has a general success rate of 76.8%. On the other hand, instance #14 was incorrectly predicted as normal weight and therefore has a ‘+’ in the error column. Instance #15 was correctly predicted as low weight, and in general this prediction has a success rate of 80%. All of these instances were taken from the first model created in the cross-validation. The last one, instance #8, was taken from the second model; it was incorrectly predicted as low weight, and the prediction rule used has a general success rate of 78.6%.
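The fold structure described above can be made explicit in code. The sketch below randomizes and stratifies the data, builds one J48 model per fold, and prints each test instance's actual class, predicted class and probability distribution. Because the exact fold assignment depends on the random seed, the individual numbers will not match the extract above exactly:

import java.util.Random;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class FoldPredictions {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("birth-preprocessed.arff").getDataSet();
        data.setClassIndex(data.attribute("weight").index());

        // Randomize and stratify so that each of the 10 folds of 18 instances
        // preserves the class proportions.
        Instances randomized = new Instances(data);
        randomized.randomize(new Random(1));
        randomized.stratify(10);

        for (int fold = 0; fold < 10; fold++) {
            Instances train = randomized.trainCV(10, fold);
            Instances test = randomized.testCV(10, fold);   // 18 instances per fold

            J48 j48 = new J48();
            j48.setOptions(Utils.splitOptions("-C 0.25 -B -M 2"));
            j48.buildClassifier(train);

            for (int i = 0; i < test.numInstances(); i++) {
                Instance inst = test.instance(i);
                // Distribution indices follow the class attribute's value
                // order, here: index 0 = normal, index 1 = low.
                double[] dist = j48.distributionForInstance(inst);
                int predicted = (int) j48.classifyInstance(inst);
                System.out.printf("fold %d, inst %d: actual=%s predicted=%s P(normal)=%.3f P(low)=%.3f%n",
                        fold + 1, i + 1,
                        inst.stringValue(inst.classIndex()),
                        test.classAttribute().value(predicted),
                        dist[0], dist[1]);
            }
        }
    }
}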
Text Output: Type of sampling
The next section reports the type of sampling that was employed. In our example we used stratified cross-validation, so it is indicated as such.
=== Stratified cross-validation ===
=== Summary ===
Text Output: Confusion Matrix
A confusion matrix is an easy way of describing the results of the experiment. The best way to describe it is by example. Returning to the birth weight example, the following table was generated.
=== Confusion Matrix ===
   a   b   <-- classified as
 123   7 |  a = normal
  39  11 |  b = low
The columns represent the predictions, and the rows represent the actual class. The matrix shows that 123 instances were correctly predicted as normal birth weight; these cases are also known as “True Positives”. It also shows that 11 instances were correctly predicted as low weight; these cases are known as “True Negatives”. Correct predictions always lie on the diagonal of the table.
On the other hand, 39 instances were predicted as normal birth weight when they were in fact low weight; these cases are also known as “False Positives”. Lastly, 7 instances were incorrectly predicted as low weight when they were in fact normal; these are the “False Negatives”. The positive/negative terminology is only useful when the class variable has two levels, with one level designated positive and the other negative.
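The same four quantities can be read off programmatically. Continuing from the cross-validation sketch earlier, where eval holds the evaluation results, the following fragment retrieves the matrix as a numeric array:

// confusionMatrix() returns the table above as a double[row][column] array,
// with rows as actual classes and columns as predicted classes
// (index 0 = normal, index 1 = low).
double[][] cm = eval.confusionMatrix();
double tp = cm[0][0];   // actual normal, predicted normal (true positives)  = 123
double fn = cm[0][1];   // actual normal, predicted low    (false negatives) =   7
double fp = cm[1][0];   // actual low,    predicted normal (false positives) =  39
double tn = cm[1][1];   // actual low,    predicted low    (true negatives)  =  11
System.out.printf("TP=%.0f FN=%.0f FP=%.0f TN=%.0f%n", tp, fn, fp, tn);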
It can be seen here that the model was not very good at predicting low birth weight cases. This is most likely because most of the cases were in fact normal weight, so the tree became more sensitive to that class. In general, one should adjust misclassification costs and threshold levels so that sufficient accuracy and sensitivity in the class of interest is obtained.
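One simple way to experiment with the threshold, without any cost-sensitive machinery, is to classify on the predicted probability of the ‘low’ class directly. The fragment below assumes j48, data and a test instance inst as in the fold-by-fold sketch above; the threshold of 0.3 is purely illustrative:

// Lower the decision threshold for the 'low' class: predict 'low' whenever
// its estimated probability exceeds 0.3 rather than the usual 0.5.
double lowThreshold = 0.3;
double[] dist = j48.distributionForInstance(inst);
int lowIndex = data.classAttribute().indexOfValue("low");
String prediction = dist[lowIndex] >= lowThreshold ? "low" : "normal";
System.out.println("predicted: " + prediction);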
Text Output: Detailed Accuracy by Class
This portion of the text output, with the results from the birth weight example, has been extracted and is displayed below:
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
 0.946     0.78      0.759       0.946    0.842      normal
 0.22      0.054     0.611       0.22     0.324      low
The first two columns are the TP Rate (True Positive Rate) and the FP Rate (False Positive Rate). For the first row, where ‘weight = normal’, the TP Rate is the ratio of normal weight cases predicted correctly to the total number of normal weight cases. There were 123 instances correctly predicted as normal weight out of 130 normal weight instances in all, so the TP Rate = 123/130 = 0.946. The FP Rate is then the ratio of low weight cases incorrectly predicted as normal weight to the total number of low weight cases. 39 low weight instances were predicted as normal weight and there were 50 low weight cases in all, so the FP Rate = 39/50 = 0.78.
The next two columns are terms from information retrieval theory. When one searches for relevant documents, it is often not possible to reach them directly: a search will typically yield many results, a large number of which are irrelevant, and in practice only a portion of the results is examined at a time. In such circumstances the terms recall and precision become important. Recall is the ratio of relevant documents found in the search results to the total number of relevant documents, so higher recall at a given cutoff means that relevant documents are found after examining fewer results; a recall of 30% at 10%, for example, means that 30% of the relevant documents were found with only 10% of the results examined. Precision is the proportion of relevant documents among the results returned, so a precision of 0.75 means that 75% of the returned documents were relevant. Lastly, the F-measure combines recall and precision into a single measure of performance:

F-measure = 2 × Precision × Recall / (Precision + Recall)

For the ‘normal’ class above, this gives 2 × 0.759 × 0.946 / (0.759 + 0.946) = 0.842, matching the F-Measure column.
In the context described above, these measures are important when studying the performance of a model in the domain of information search and retrieval. In our birth weight example they are less meaningful: recall here simply corresponds to the TP Rate, since we always examine 100% of the test sample, and precision is simply the proportion of cases predicted as a given weight class that actually belong to it.
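All of the figures in the Detailed Accuracy By Class table can also be queried from the Evaluation object. Continuing from the cross-validation sketch, where eval holds the results and data the preprocessed instances:

// Per-class statistics matching the 'Detailed Accuracy By Class' table
// (class index 0 = normal, 1 = low).
for (int c = 0; c < data.numClasses(); c++) {
    System.out.printf("%-6s TP rate=%.3f FP rate=%.3f precision=%.3f recall=%.3f F=%.3f%n",
            data.classAttribute().value(c),
            eval.truePositiveRate(c),
            eval.falsePositiveRate(c),
            eval.precision(c),
            eval.recall(c),
            eval.fMeasure(c));
}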
Text Output: Entropy Evaluation Measures
Below is the portion of the output containing the entropy evaluation measures.
Entropy Evaluation Measures
K&B Relative Info Score 1886.5968 %
K&B Information Score 16.1514 bits 0.0897 bits/instance
Class complexity | order 0 153.4377 bits 0.8524 bits/instance
Class complexity | scheme 2291.0152 bits 12.7279 bits/instance
Complexity improvement (Sf) -2137.5776 bits -11.8754 bits/instance
Text Output: Summary
The output below is the final segment of the text output produced by the explorer.
=== Summary ===
Correctly Classified Instances 134 74.4444 %
Incorrectly Classified Instances 46 25.5556 %
Kappa statistic 0.2069
Mean absolute error 0.3632
Root mean squared error 0.44
Relative absolute error 90.2396 %
Root relative squared error 98.237 %
Total Number of Instances 180
In our case, since the class variable is nominal, the first two lines are the most useful. The first line shows the number and percentage of cases that the classifier predicted correctly; the second shows the number and percentage it predicted incorrectly.
The third line shows the kappa statistic, which measures the agreement of the predictions with the actual class after correcting for chance agreement. This statistic is not very informative here: it can take low values even when, as in the case above, the raw level of agreement is high. In general, the kappa statistic is only appropriate for testing whether agreement exceeds chance levels, i.e. that the predictions and the actual classes are correlated. Since classifiers are designed and intended to be correct in their predictions, this is not a very demanding test: even a weak classifier will usually show some correlation between its predictions and the actual classes.
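As a check on the value reported above, kappa can be reproduced by hand from the confusion matrix: the observed agreement is compared with the agreement expected by chance, which is computed from the row and column totals. A small sketch in plain Java arithmetic, with no Weka calls needed:

// Reproducing the kappa statistic from the confusion matrix reported above.
double[][] cm = { { 123, 7 }, { 39, 11 } };   // rows = actual, columns = predicted
double n = 180;
double observed = (cm[0][0] + cm[1][1]) / n;                        // 134/180 = 0.7444
double expected = ((cm[0][0] + cm[0][1]) * (cm[0][0] + cm[1][0])    // chance agreement from
                +  (cm[1][0] + cm[1][1]) * (cm[0][1] + cm[1][1]))   // row and column totals
                / (n * n);                                          // = 0.6778
double kappa = (observed - expected) / (1 - expected);              // = 0.2069, as reported
System.out.printf("kappa = %.4f%n", kappa);

The same value is available directly from the Evaluation object via eval.kappa().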