[Back]
Decision Tree

Decision Tree

 

A decision tree is an arrangement of tests that prescribes an appropriate test at every step in an analysis.

 

In general, decision trees represent a disjunction of conjunctions of constraints on the attribute-values of instances. Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions.

More specifically, decision trees classify instances by sorting them down the tree from the root node to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute.

An instance is classified by starting at the root node of the decision tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute. This process is then repeated at the node on this branch and so on until a leaf node is reached (see figure Decision Tree).

 

Diagram

·         Each non-leaf node is connected to a test that splits its set of possible answers into subsets corresponding to different test results.

·         Each branch carries a particular test result's subset to another node.

·         Each node is connected to a set of possible answers.

 

 

A decision tree creates a model as either a graphical tree or a set of text rules that can predict (classify) a given situation. A decision tree is a model that is both predictive and descriptive. It is called a decision tree because the resulting model is presented in the form of a tree structure. The visual presentation makes the decision tree model very easy to understand and assimilate. As a result, the decision tree has become a very popular data mining technique.

 

For example decision trees graphically display the relationships found in data, i.e. If Income = low and Years on job < 5 Then Credit risk = Poor. Decision tree algorithms are very similar to rule induction algorithms, which produce rule sets without a decision tree.

 

 

Problems suited for Decision Trees:

 

Decision trees are best suited for problems with the following characteristics:

 

·         Decision trees are most commonly used for classification (predicting what group a case belongs to), but can also be used for regression (predicting a specific value).

 

·         When instances are represented by attribute-value pairs.

·         Instances are described by a fixed set of attributes (e.g., temperature) and their values (e.g., hot).

·         The easiest situation for decision tree learning occurs when each attribute takes on a small number of disjoint possible values (e.g., hot, mild, cold).

·         Extensions to the basic algorithm allow handling real-valued attributes as well (e.g., a floating point temperature).

 

·         When the target function has discrete output values.

·         A decision tree assigns a classification to each example.

·         Simplest case exists when there are only two possible classes (Boolean classification).

·         Decision tree methods can also be easily extended to learning functions with more than two possible output values.

·         A more substantial extension allows learning target functions with real-valued outputs, although the application of decision trees in this setting is less common.

 

·         When disjunctive descriptions may be required.

·         Decision trees naturally represent disjunctive expressions.

 

·         When the training data may contain errors.

·         Decision tree learning methods are robust to errors - both errors in classifications of the training examples and errors in the attribute values that describe these examples.

 

·         The training data may contain missing attribute values. Decision tree methods can be used even when some training examples have unknown values

 

·         When you need to make a decision based on several factors: (i.e. Insurance underwriter determining the cost of damage, determining bankruptcy risk, determining stock portfolio risk)

 

 

[Back]