
effector.space_partitioning.Best(min_heterogeneity_decrease_pcg=0.1, heter_small_enough=0.001, max_depth=2, min_samples_leaf=10, numerical_features_grid_size=20, search_partitions_when_categorical=False)

Bases: Base

Choose the Cart algorithm: a greedy algorithm that, at each level, selects the best available split.
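The greedy step can be sketched as follows. This is a minimal illustration of one level of the search on a single region, not effector's implementation: plain variance of a per-instance quantity `y` stands in for effector's heterogeneity measure, and the helpers `weighted_heterogeneity` and `best_split` are hypothetical names.

```python
import numpy as np

def weighted_heterogeneity(y, masks):
    """Size-weighted variance of y over the subregions selected by boolean masks.

    Assumption: plain variance stands in for effector's heterogeneity measure.
    """
    n = len(y)
    return sum(m.sum() / n * y[m].var() for m in masks if m.any())

def best_split(X, y, candidates):
    """Greedily pick the (feature, threshold) pair with the lowest heterogeneity.

    `candidates` maps a column index to the thresholds to try for that column.
    Returns (feature, threshold, heterogeneity); feature is None if no candidate
    beats the heterogeneity of the unsplit region.
    """
    best = (None, None, y.var())  # start from the unsplit region
    for j, thresholds in candidates.items():
        for t in thresholds:
            left = X[:, j] < t  # binary split: below vs. at-or-above t
            h = weighted_heterogeneity(y, [left, ~left])
            if h < best[2]:
                best = (j, t, h)
    return best

# toy data: y depends only on the sign of column 0
X = np.column_stack([np.linspace(-1, 1, 100), np.tile([0.0, 1.0], 50)])
y = np.where(X[:, 0] > 0, 1.0, -1.0)
feature, threshold, heter = best_split(X, y, {0: [-0.5, 0.0, 0.5], 1: [0.5]})
```

Here the split on column 0 at 0.0 separates the two values of `y` perfectly, so it is selected with zero heterogeneity; the split on column 1 does not reduce the heterogeneity at all.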

Parameters:

min_heterogeneity_decrease_pcg (float, default 0.1)

Minimum relative decrease in heterogeneity required to accept a split, given as a fraction (0.1 means 10%).

Example
  • 0.1: if the heterogeneity before any split is 1, the heterogeneity after the first split must be at most 0.9 for the split to be accepted. Otherwise, no split is accepted.
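The acceptance rule in the example can be written as a one-line check. This sketch assumes the decrease is measured relative to the heterogeneity just before the split under consideration; `split_accepted` is a hypothetical name, not part of effector.

```python
def split_accepted(heter_before, heter_after, min_heterogeneity_decrease_pcg=0.1):
    # accept only if heterogeneity drops by at least the given fraction
    return heter_after <= (1 - min_heterogeneity_decrease_pcg) * heter_before

print(split_accepted(1.0, 0.85))  # True: a 15% decrease
print(split_accepted(1.0, 0.95))  # False: only a 5% decrease
```

Comparing against a scaled threshold, rather than dividing the observed decrease by `heter_before`, also avoids a division by zero when the heterogeneity is already 0.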
heter_small_enough (float, default 0.001)

When the heterogeneity falls below this value, no further splits are performed.

Default is 0.001

A value of 0.001 is small enough for most cases. Keep this value small, so that splitting stops only once the heterogeneity is already negligible and further splits would be unnecessary.

Custom value

If you know a priori which heterogeneity value is small enough for your problem, you can set this parameter to a value higher than the default; splitting will then stop earlier.
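A minimal sketch of the stopping rule; `small_enough` is a hypothetical helper, and the heterogeneity values are made up for illustration.

```python
def small_enough(heterogeneity, heter_small_enough=0.001):
    # True means the region is already homogeneous enough: stop splitting it
    return heterogeneity < heter_small_enough

print(small_enough(0.0005))                         # True with the default threshold
print(small_enough(0.02))                           # False: keep searching for splits
print(small_enough(0.02, heter_small_enough=0.05))  # True: a higher threshold stops earlier
```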
max_depth (int, default 2)

Maximum number of split levels to perform.

Default is 2

Two split levels already create up to 4 subregions, i.e., up to 4 regional plots per feature, which is usually enough. Higher values increase the number of subregions and plots, which may become too many for the user to analyze.
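Assuming every accepted split divides a subregion in two, as the example of 2 levels yielding 4 subregions implies, the number of subregions is bounded by `2 ** max_depth`. The hypothetical helper below tabulates that bound.

```python
def max_subregions(max_depth):
    # each split level can at most double the number of subregions (binary splits assumed)
    return 2 ** max_depth

for depth in range(1, 5):
    print(f"max_depth={depth}: up to {max_subregions(depth)} subregions per feature")
```

Going from the default depth of 2 to a depth of 4 raises the bound from 4 to 16 regional plots per feature.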
min_samples_leaf (int, default 10)

Minimum number of instances per subregion.

Default is 10

A subregion with fewer than 10 instances may not be representative enough to be analyzed.
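The constraint can be checked for each candidate split before its heterogeneity is computed. This sketch assumes a binary split into `x < threshold` and `x >= threshold`; `valid_split` is a hypothetical helper.

```python
import numpy as np

def valid_split(x, threshold, min_samples_leaf=10):
    left = np.count_nonzero(x < threshold)  # instances in the lower subregion
    right = x.size - left                   # instances in the upper subregion
    return left >= min_samples_leaf and right >= min_samples_leaf

x = np.arange(100)
print(valid_split(x, 50))  # True: 50 instances on each side
print(valid_split(x, 5))   # False: only 5 instances below the threshold
```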
numerical_features_grid_size (int, default 20)

Number of candidate split positions for numerical features.

Default is 20

For each numerical feature, the algorithm creates a grid of 20 equally spaced candidate positions between the feature's minimum and maximum values.
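The grid can be sketched with `np.linspace`, which includes both endpoints; whether effector's grid includes the endpoints is an assumption here, and `candidate_positions` is a hypothetical helper.

```python
import numpy as np

def candidate_positions(x, numerical_features_grid_size=20):
    # equally spaced values from the feature's minimum to its maximum, inclusive
    return np.linspace(x.min(), x.max(), numerical_features_grid_size)

x = np.array([0.0, 3.0, 10.0])
grid = candidate_positions(x, 6)  # six positions: 0, 2, 4, 6, 8, 10
```

A larger grid gives finer candidate positions at the cost of more heterogeneity evaluations per feature.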
search_partitions_when_categorical (bool, default False)

Whether to search for partitions when the feature of interest is categorical.

Refers to a categorical feature of interest

This argument controls whether partitions are searched when the feature of interest is categorical. If the feature of interest is numerical, the algorithm always searches for partitions and also considers categorical features for conditioning.

Default is False

Computing the heterogeneity for a categorical feature of interest is difficult, so by default the algorithm does not search for partitions in that case.
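The rule can be sketched as a function that lists which features may be used to split the space for a given feature of interest. `conditioning_features` and the `feature_types` mapping are hypothetical; the sketch also assumes that, once the search is enabled, all other features are candidates regardless of type.

```python
def conditioning_features(feature, feature_types, search_partitions_when_categorical=False):
    """Return the features that may be used to partition the space for `feature`.

    `feature_types` maps each feature name to "numerical" or "categorical".
    """
    if feature_types[feature] == "categorical" and not search_partitions_when_categorical:
        return []  # no partitions are searched for this feature of interest
    # otherwise, every other feature (numerical or categorical) is a candidate
    return [f for f in feature_types if f != feature]

types = {"x1": "numerical", "x2": "categorical", "x3": "numerical"}
print(conditioning_features("x1", types))        # ['x2', 'x3']
print(conditioning_features("x2", types))        # []
print(conditioning_features("x2", types, True))  # ['x1', 'x3']
```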
Source code in effector/space_partitioning.py
def __init__(
        self,
        min_heterogeneity_decrease_pcg: float = 0.1,
        heter_small_enough: float = 0.001,
        max_depth: int = 2,
        min_samples_leaf: int = 10,
        numerical_features_grid_size: int = 20,
        search_partitions_when_categorical: bool = False,
):
    """Choose the algorithm `Cart`.
    The algorithm is a greedy algorithm that finds the best split for each level in a greedy fashion.


    Args:
        min_heterogeneity_decrease_pcg: Minimum percentage of heterogeneity decrease to accept a split.

            ??? Example "Example"
                - `0.1`: if the heterogeneity before any split is 1, the heterogeneity after the first split must be at most 0.9 to be accepted. Otherwise, no split will be accepted.

        heter_small_enough: When heterogeneity is smaller than this value, no more splits are performed.

            ??? Note "Default is `0.001`"
                Value 0.001 is small enough for most cases.
                It is advisable to set this value to a small number to avoid unnecessary splits.

            ??? Note "Custom value"
                If you know a priori that a specific heterogeneity value is small enough,
                you can set this parameter to a higher value than the default.

        max_depth: Maximum number of splits to perform

            ??? Note "Default is `2`"
                2 splits already create 4 subregions, i.e. 4 regional plots per feature, which are already enough.
                Setting this value to a higher number will increase the number of subregions and plots, which may be too much for the user to analyze.

        min_samples_leaf: Minimum number of instances per subregion

            ??? Note "Default is `10`"
                If a subregion has less than 10 instances, it may not be representative enough to be analyzed.

        numerical_features_grid_size: Number of candidate split positions for numerical features

            ??? Note "Default is `20`"
                For numerical features, the algorithm will create a grid of 20 equally spaced values between the minimum and maximum values of the feature.

        search_partitions_when_categorical: Whether to search for partitions when the feature is categorical

            ??? warning "refers to a categorical feature of interest"
                This argument asks whether to search for partitions when the feature of interest is categorical.
                If the feature of interest is numerical, the algorithm will always search for partitions and will consider
                categorical features for conditioning.

            ??? Note "Default is `False`"
                It is difficult to compute the heterogeneity for categorical features, so by default, the algorithm will not search for partitions when the feature of interest is categorical.

    """
    # setters
    self.min_points_per_subregion = min_samples_leaf
    self.nof_candidate_splits_for_numerical = numerical_features_grid_size
    self.max_split_levels = max_depth
    self.heter_pcg_drop_thres = min_heterogeneity_decrease_pcg
    self.heter_small_enough = heter_small_enough
    self.split_categorical_features = search_partitions_when_categorical

    super().__init__("Cart")