How to calculate the threshold value for numeric attributes in Quinlan's C4.5 algorithm?

I am trying to find out how the C4.5 algorithm determines the threshold value for numeric attributes. I have researched this but still cannot understand it; in most places I found this information:

The training samples are first sorted on the values of the attribute Y under consideration. There are only a finite number of these values, so we denote them in sorted order as {v1, v2, ..., vm}. Any threshold lying between vi and vi+1 has the same effect: it divides the cases into those whose value of Y lies in {v1, v2, ..., vi} and those whose value lies in {vi+1, vi+2, ..., vm}. Thus there are only m-1 possible splits on Y, all of which must be examined systematically to find the optimal one.
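To make the counting argument concrete, here is a minimal sketch that enumerates the m-1 ways of cutting a sorted list of distinct values into two groups. The function name `candidate_splits` and the sample values are only illustrative; this is not code from C4.5 itself.

```python
def candidate_splits(values):
    """Return every way to cut the sorted distinct values into a low and a high group."""
    distinct = sorted(set(values))  # {v1, v2, ..., vm}
    # Cutting after position i gives {v1..vi} and {vi+1..vm}; there are m-1 such cuts.
    return [(distinct[:i], distinct[i:]) for i in range(1, len(distinct))]


# Sample input: five cases with four distinct values (m = 4).
splits = candidate_splits([70, 70, 85, 90, 95])
for low, high in splits:
    print(low, "|", high)
```

With four distinct values this yields three candidate splits, matching m-1.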

It is usual to choose the midpoint of each interval, (vi + vi+1) / 2, as the representative threshold. C4.5 instead chooses the smaller value vi of each interval {vi, vi+1} as the threshold, rather than the midpoint itself.
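The two conventions described in this passage can be compared side by side. The sketch below follows the passage as quoted, not the actual C4.5 source code; `representative_thresholds` is a hypothetical helper name and the input values are only sample data.

```python
def representative_thresholds(values):
    """Return (midpoints, lower_values) for each interval between adjacent distinct values."""
    distinct = sorted(set(values))
    pairs = list(zip(distinct, distinct[1:]))        # (vi, vi+1) for i = 1..m-1
    midpoints = [(lo + hi) / 2 for lo, hi in pairs]  # the usual convention
    lower_values = [lo for lo, _ in pairs]           # the convention attributed to C4.5 above
    return midpoints, lower_values


midpoints, lower_values = representative_thresholds([70, 85, 90, 95])
print(midpoints)     # [77.5, 87.5, 92.5]
print(lower_values)  # [70, 85, 90]
```

Both lists describe the same three partitions of the data; only the number reported as the threshold differs.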

I am studying the Play / Don't Play example (table of values) and do not understand how the number 75 (tree generated) is obtained for the humidity attribute when the outlook is sunny, because the humidity values in the sunny case are {70,85,90,95}.

Does anybody know?

+5
3 answers

As your generated tree shows, the attributes are considered in order. Your value of 75 belongs to the outlook = sunny branch. If you filter your data by outlook = sunny, you get the following table.

outlook temperature humidity    windy   play
sunny   69           70         FALSE   yes
sunny   75           70         TRUE    yes
sunny   85           85         FALSE   no
sunny   80           90         TRUE    no
sunny   72           95         FALSE   no

As you can see, the threshold for humidity under this condition is "< 75".

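J4.8 is a successor of the ID3 algorithm. It uses entropy and information gain to choose the best split. According to Wikipedia: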

The attribute with the smallest entropy 
is used to split the set on this iteration. 
The higher the entropy, 
the higher the potential to improve the classification here.
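The entropy mentioned in the quote can be computed directly from the class labels. This is a minimal sketch of the standard Shannon entropy formula applied to the play column of the sunny rows in the table above; the function and variable names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


# Play labels of the five sunny rows: two "yes", three "no".
sunny_play = ["yes", "yes", "no", "no", "no"]
print(round(entropy(sunny_play), 3))  # 0.971
```

A subset containing only one class has entropy 0, so a split that separates the "yes" rows from the "no" rows removes all of this uncertainty.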
+4

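I am not sure about J48, but assuming it is based on C4.5, it computes the gain for all possible splits (i.e., based on the possible values of the attribute). For each split it computes the information gain and chooses the split with the highest gain. In the case of {70,85,90,95}, it would compute the information gain for {70 | 85,90,95}, {70,85 | 90,95} and {70,85,90 | 95} and choose the best one.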
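The comparison of the three candidate splits can be sketched as follows, using the humidity and play columns of the sunny rows from the table in the first answer. This uses plain information gain and is an illustration, not the J48 or C4.5 implementation (which, for example, can also use gain ratio); `best_split` is a hypothetical helper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_split(values, labels):
    """Return (best_threshold, gains), testing 'value <= threshold' for each candidate."""
    base = entropy(labels)
    gains = {}
    for threshold in sorted(set(values))[:-1]:  # the m-1 candidate cut points
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gains[threshold] = base - weighted      # information gain of this split
    return max(gains, key=gains.get), gains


humidity = [70, 70, 85, 90, 95]
play = ["yes", "yes", "no", "no", "no"]
best, gains = best_split(humidity, play)
for threshold, gain in gains.items():
    print(threshold, round(gain, 3))
print("best cut is after", best)
```

On this data the cut between 70 and 85 has the highest gain because it separates the two classes completely. Any threshold in that interval, including 75, produces the same partition.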

Quinlan's book on C4.5 is a good starting point (https://goo.gl/J2SsPf). See page 25 in particular.

+2
