
Regional Effects API

Summary

All methods share a similar interface:

```python
effector.RegionalPDP(data=X, model=predict)
effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian)
effector.RegionalShapDP(data=X, model=predict)
effector.RegionalALE(data=X, model=predict)
effector.DerPDP(data=X, model=predict, model_jac=jacobian)
```

They all provide four methods:

  • .fit()
  • .summary()
  • .plot()
  • .eval()
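To ground what these methods detect, here is a minimal synthetic setup (an illustration with assumed data, not effector code; the names `X` and `predict` mirror the snippets above). The effect of `x_0` flips sign with `x_1`, so globally the effect of `x_0` averages out, while within each subregion (`x_1 <= 0`, `x_1 > 0`) it is a clean linear effect:

```python
import numpy as np

# A toy dataset with regional structure (assumption for illustration):
# the effect of x_0 flips sign with x_1.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))

def predict(x):
    return np.where(x[:, 1] <= 0, x[:, 0], -x[:, 0])

y = predict(X)
left = X[:, 1] <= 0

# globally, the positive and negative effects of x_0 cancel out...
corr_global = np.corrcoef(X[:, 0], y)[0, 1]
# ...but inside the x_1 <= 0 subregion the effect is perfectly linear
corr_left = np.corrcoef(X[left, 0], y[left])[0, 1]
```

A regional method is designed to discover exactly such an `x_1 <= 0` split automatically, replacing one heterogeneous global effect with two homogeneous regional ones.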

.fit(features, centering, **method_specific_args)

Fits the regional effect method to the data.

This is the place for customization

The `.fit()` step can be omitted if the default settings work for you. For more control over the fitting process, pass additional arguments to `.fit()`. See the examples below:

Usage

```python
features = [0, 1]

# customize the space-partitioning process
space_partitioner = effector.space_partitioning.Regions(
    heter_pcg_drop_thres=0.3,  # percentage drop threshold (default: 0.1)
    max_split_levels=1,        # maximum number of split levels (default: 2)
)

regional_method = effector.RegionalPDP(data=X, model=predict)
regional_method.fit(
    features,
    space_partitioner=space_partitioner,
    centering=True,  # center the effect (default: False)
)

regional_method = effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian)
regional_method.fit(features, space_partitioner=space_partitioner)

regional_method = effector.RegionalShapDP(data=X, model=predict)
regional_method.fit(features, space_partitioner=space_partitioner)

regional_method = effector.RegionalALE(data=X, model=predict)
regional_method.fit(features, space_partitioner=space_partitioner)

regional_method = effector.DerPDP(data=X, model=predict, model_jac=jacobian)
regional_method.fit(features, space_partitioner=space_partitioner)
```
.summary(feature)

Prints a summary of the partition tree found for feature `feature`.

Usage

```python
effector.RegionalPDP(data=X, model=predict).summary(0)
effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian).summary(0)
effector.RegionalShapDP(data=X, model=predict).summary(0)
effector.RegionalALE(data=X, model=predict).summary(0)
effector.DerPDP(data=X, model=predict, model_jac=jacobian).summary(0)
```
Output
effector.RegionalPDP(data=X, model=predict).summary(0)
 Feature 0 - Full partition tree:
 Node id: 0, name: x_0, heter: 34.79 || nof_instances:  1000 || weight: 1.00
         Node id: 1, name: x_0 | x_1 <= 0.0, heter: 0.09 || nof_instances:  1000 || weight: 1.00
         Node id: 2, name: x_0 | x_1  > 0.0, heter: 0.09 || nof_instances:  1000 || weight: 1.00
 --------------------------------------------------
 Feature 0 - Statistics per tree level:
 Level 0, heter: 34.79
    Level 1, heter: 0.18 || heter drop : 34.61 (units), 99.48% (pcg)
effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian).summary(0)
 Feature 0 - Full partition tree:
 Node id: 0, name: x_0, heter: 93.45 || nof_instances:  1000 || weight: 1.00
         Node id: 1, name: x_0 | x_1 <= 0.0, heter: 0.00 || nof_instances:  1000 || weight: 1.00
         Node id: 2, name: x_0 | x_1  > 0.0, heter: 0.00 || nof_instances:  1000 || weight: 1.00
 --------------------------------------------------
 Feature 0 - Statistics per tree level:
 Level 0, heter: 93.45
         Level 1, heter: 0.00 || heter drop : 93.45 (units), 100.00% (pcg)
effector.RegionalShapDP(data=X, model=predict).summary(0)
Feature 0 - Full partition tree:
Node id: 0, name: x_0, heter: 8.33 || nof_instances:  1000 || weight: 1.00
        Node id: 1, name: x_0 | x_1 <= 0.0, heter: 0.00 || nof_instances:  1000 || weight: 1.00
        Node id: 2, name: x_0 | x_1  > 0.0, heter: 0.00 || nof_instances:  1000 || weight: 1.00
--------------------------------------------------
Feature 0 - Statistics per tree level:
Level 0, heter: 8.33
        Level 1, heter: 0.00 || heter drop : 8.33 (units), 99.94% (pcg)
effector.RegionalALE(data=X, model=predict).summary(0)
 Feature 0 - Full partition tree:
 Node id: 0, name: x_0, heter: 114.57 || nof_instances:  1000 || weight: 1.00
         Node id: 1, name: x_0 | x_1 <= 0.0, heter: 16.48 || nof_instances:  1000 || weight: 1.00
         Node id: 2, name: x_0 | x_1  > 0.0, heter: 17.41 || nof_instances:  1000 || weight: 1.00
 --------------------------------------------------
 Feature 0 - Statistics per tree level:
 Level 0, heter: 114.57
         Level 1, heter: 33.89 || heter drop : 80.68 (units), 70.42% (pcg)
effector.DerPDP(data=X, model=predict, model_jac=jacobian).summary(0)
 Feature 0 - Full partition tree:
 Node id: 0, name: x_0, heter: 100.00 || nof_instances:  1000 || weight: 1.00
         Node id: 1, name: x_0 | x_1 <= 0.0, heter: 0.00 || nof_instances:  1000 || weight: 1.00
         Node id: 2, name: x_0 | x_1  > 0.0, heter: 0.00 || nof_instances:  1000 || weight: 1.00
 --------------------------------------------------
 Feature 0 - Statistics per tree level:
 Level 0, heter: 100.00
         Level 1, heter: 0.00 || heter drop : 100.00 (units), 100.00% (pcg)
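The `heter drop` line in these summaries can be reproduced by hand. Assuming (from the numbers reported, where both child weights are 1.00) that a level's heterogeneity is the weighted sum of its nodes' heterogeneities, the drop is reported both in units and as a percentage of the parent level:

```python
# Reproducing the RegionalPDP per-level statistics above by hand
# (assumed semantics, inferred from the reported numbers).
parent_heter = 34.79                        # Level 0
child_heters = [0.09, 0.09]                 # Level 1 nodes
level_heter = sum(child_heters)             # 0.18, as reported
drop_units = parent_heter - level_heter     # 34.61 (units)
drop_pcg = 100 * drop_units / parent_heter  # 99.48 (pcg)
print(f"heter drop : {drop_units:.2f} (units), {drop_pcg:.2f}% (pcg)")
```

The larger the percentage drop, the more the split explains away the heterogeneity of the parent effect; `heter_pcg_drop_thres` in `.fit()` sets the minimum drop required to accept a split.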
.plot(feature, node_idx)

Plots the regional effect of feature `feature` at node `node_idx`.

Usage

```python
regional_effect = effector.RegionalPDP(data=X, model=predict)
[regional_effect.plot(0, node_idx) for node_idx in [1, 2]]

regional_effect = effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian)
[regional_effect.plot(0, node_idx) for node_idx in [1, 2]]

regional_effect = effector.RegionalShapDP(data=X, model=predict)
[regional_effect.plot(0, node_idx) for node_idx in [1, 2]]

regional_effect = effector.RegionalALE(data=X, model=predict)
[regional_effect.plot(0, node_idx) for node_idx in [1, 2]]

regional_effect = effector.DerPDP(data=X, model=predict, model_jac=jacobian)
[regional_effect.plot(0, node_idx) for node_idx in [1, 2]]
```
Output

Each method produces a pair of plots: node_idx=1 shows \(x_1\) when \(x_2 \leq 0\), and node_idx=2 shows \(x_1\) when \(x_2 > 0\). (Plot images omitted.)
.eval(feature, node_idx, xs)

Evaluate the regional effect at a specific grid of points.

Usage

```python
# Example input
feature = 0
node_idx = 1
xs = np.linspace(-1, 1, 100)

regional_effect = effector.RegionalPDP(data=X, model=predict)
y, het = regional_effect.eval(feature, node_idx, xs, heterogeneity=True)

regional_effect = effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian)
y, het = regional_effect.eval(feature, node_idx, xs, heterogeneity=True)

regional_effect = effector.RegionalShapDP(data=X, model=predict)
y, het = regional_effect.eval(feature, node_idx, xs, heterogeneity=True)

regional_effect = effector.RegionalALE(data=X, model=predict)
y, het = regional_effect.eval(feature, node_idx, xs, heterogeneity=True)

regional_effect = effector.DerPDP(data=X, model=predict, model_jac=jacobian)
y, het = regional_effect.eval(feature, node_idx, xs, heterogeneity=True)
```

Note that `heterogeneity=True` is required for the two-value unpacking; with the default `heterogeneity=False`, `.eval()` returns only `y`.

API

Constructor for the RegionalEffect class.

Methods:

  • eval: 👉 Evaluate the regional effect for a given feature and node.
  • summary: Summarize the partition tree for the selected features.

Source code in effector/regional_effect.py
def __init__(
    self,
    method_name: str,
    data: np.ndarray,
    model: Callable,
    model_jac: Optional[Callable] = None,
    data_effect: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 10_000,
    axis_limits: Optional[np.ndarray] = None,
    feature_types: Optional[List] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
) -> None:
    """
    Constructor for the RegionalEffect class.
    """
    assert data.ndim == 2

    self.method_name = method_name.lower()
    self.model = model
    self.model_jac = model_jac

    self.dim = data.shape[1]

    # data preprocessing (i): if axis_limits passed manually,
    # keep only the points within,
    # otherwise, compute the axis limits from the data
    if axis_limits is not None:
        assert axis_limits.shape == (2, self.dim)
        assert np.all(axis_limits[0, :] <= axis_limits[1, :])

        # drop points outside of limits
        accept_indices = helpers.indices_within_limits(data, axis_limits)
        data = data[accept_indices, :]
        data_effect = data_effect[accept_indices, :] if data_effect is not None else None
    else:
        axis_limits = helpers.axis_limits_from_data(data)
    self.axis_limits: np.ndarray = axis_limits


    # data preprocessing (ii): select nof_instances from the remaining data
    self.nof_instances, self.indices = helpers.prep_nof_instances(nof_instances, data.shape[0])
    data = data[self.indices, :]
    data_effect = data_effect[self.indices, :] if data_effect is not None else None

    # store the data
    self.data: np.ndarray = data
    self.data_effect: Optional[np.ndarray] = data_effect

    # set feature types
    self.cat_limit = cat_limit
    feature_types = (
        utils.get_feature_types(data, cat_limit)
        if feature_types is None
        else feature_types
    )
    self.feature_types: list = feature_types

    # set feature names
    feature_names: list[str] = (
        helpers.get_feature_names(axis_limits.shape[1])
        if feature_names is None
        else feature_names
    )
    self.feature_names: list = feature_names

    # set target name
    self.target_name = "y" if target_name is None else target_name

    # state variables
    self.is_fitted: np.ndarray = np.ones([self.dim]) < 0

    # parameters used when fitting the regional effect
    # self.method_args: typing.Dict = {}
    self.kwargs_subregion_detection: typing.Dict = {} # subregion specific arguments
    self.kwargs_fitting: typing.Dict = {} # fitting specific arguments

    # dictionary with all the information required for plotting or evaluating the regional effects
    self.partitioners: typing.Dict[str, Best] = {}
    # self.tree_full: typing.Dict[str, Tree] = {}
    self.tree: typing.Dict[str, Tree] = {}

eval(feature, node_idx, xs, heterogeneity=False, centering=True)

👉 Evaluate the regional effect for a given feature and node.

Example usage

```python
axis_limits = ...  # define the axis limits
xs = np.linspace(axis_limits[0], axis_limits[1], 100)
effector.RegionalPDP(data=X, model=predict).eval(0, 0, xs, centering=True)
effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian).eval(0, 0, xs, centering=True)
effector.RegionalALE(data=X, model=predict).eval(0, 0, xs, centering=True)
effector.RegionalShapDP(data=X, model=predict).eval(0, 0, xs, centering=True)
effector.DerPDP(data=X, model=predict, model_jac=jacobian).eval(0, 0, xs, centering=False)
```

This is a common method for all regional effect methods, so use the arguments carefully.

  • centering=True is a good option for most methods, but not for all.
    • DerPDP, use centering=False
    • [RegionalPDP, RegionalShapDP], it depends on you 😎
    • [RegionalALE, RegionalRHALE], use centering=True

The heterogeneity argument changes the return value of the function.

  • If heterogeneity=False, the function returns y
  • If heterogeneity=True, the function returns a tuple (y, std)

Parameters:

  • feature (int, required): index of the feature
  • node_idx (int, required): index of the node
  • xs (ndarray, required): horizontal grid of points to evaluate on
  • heterogeneity (bool, default False): whether to return the heterogeneity
    • if heterogeneity=False, the function returns y, a numpy array of the mean effect at grid points xs
    • if heterogeneity=True, the function returns (y, std), where y is the mean effect and std is the standard deviation of the mean effect at grid points xs
  • centering (Union[bool, str], default True): whether to center the regional effect
    • if centering is False, the regional effect is not centered
    • if centering is True or "zero_integral", the regional effect is centered around the y axis
    • if centering is "zero_start", the regional effect starts from y=0

Returns:

  • Union[ndarray, Tuple[ndarray, ndarray]]: the mean effect y if heterogeneity=False (default), or a tuple (y, std) otherwise
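The two centering modes can be illustrated on a toy curve (plain numpy, not effector internals; the semantics below are an assumption based on the option descriptions above):

```python
import numpy as np

# Toy illustration of the centering options.
xs = np.linspace(-1, 1, 101)
effect = xs ** 2                        # an uncentered effect curve

# centering=True / "zero_integral": the curve has zero mean over the grid
zero_integral = effect - effect.mean()

# centering="zero_start": the curve starts from y = 0
zero_start = effect - effect[0]
```

Both modes shift the curve vertically without changing its shape; they only differ in which reference point is pinned to zero.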

Source code in effector/regional_effect.py
def eval(
        self,
        feature: int,
        node_idx: int,
        xs: np.ndarray,
        heterogeneity: bool = False,
        centering: Union[bool, str] = True,
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
    """
    :point_right: Evaluate the regional effect for a given feature and node.

    ??? Example "Example usage"

        ```python
        axis_limits = ... # define the axis limits
        xs = np.linspace(axis_limits[0], axis_limits[1], 100)
        ```

        === "PDP"

            ```python
            effector.RegionalPDP(data=X, model=predict).eval(0, 0, xs, centering=True)
            ```

        === "RHALE"

            ```python
            effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian).eval(0, 0, xs, centering=True)
            ```

        === "ALE"

            ```python
            effector.RegionalALE(data=X, model=predict).eval(0, 0, xs, centering=True)
            ```

        === "ShapDP"

             ```python
                effector.RegionalShapDP(data=X, model=predict).eval(0, 0, xs, centering=True)
             ```

        === "DerPDP"

             ```python
                effector.DerPDP(data=X, model=predict, model_jac=jacobian).eval(0, 0, xs, centering=False)
             ```


    !!! note "This is a common method for all regional effect methods, so use the arguments carefully."

        - `centering=True` is a good option for most methods, but not for all.
            - `DerPDP`, use `centering=False`
            - `[RegionalPDP, RegionalShapDP]`, it depends on you :sunglasses:
            - `[RegionalALE, RegionalRHALE]`, use `centering=True`

    !!! note "The `heterogeneity` argument changes the return value of the function."

        - If `heterogeneity=False`, the function returns `y`
        - If `heterogeneity=True`, the function returns a tuple `(y, std)`

    Args:
        feature: index of the feature
        node_idx: index of the node
        xs: horizontal grid of points to evaluate on
        heterogeneity: whether to return the heterogeneity.

              - if `heterogeneity=False`, the function returns `y`, a numpy array of the mean effect at grid points `xs`
              - If `heterogeneity=True`, the function returns `(y, std)` where `y` is the mean effect and `std` is the standard deviation of the mean effect at grid points `xs`

        centering: whether to center the regional effect. The following options are available:

            - If `centering` is `False`, the regional effect is not centered
            - If `centering` is `True` or `zero_integral`, the regional effect is centered around the `y` axis.
            - If `centering` is `zero_start`, the regional effect starts from `y=0`.

    Returns:
        the mean effect `y`, if `heterogeneity=False` (default) or a tuple `(y, std)` otherwise

    """
    self.refit(feature)
    centering = helpers.prep_centering(centering)

    kwargs = copy.deepcopy(self.kwargs_fitting)
    kwargs['centering'] = centering

    # build a feature-effect object restricted to the data of the selected node
    fe_method = self._create_fe_object(feature, node_idx, None)
    fe_method.fit(features=feature, **kwargs)
    return fe_method.eval(feature, xs, heterogeneity, centering)

summary(features, scale_x_list=None)

Summarize the partition tree for the selected features.

Example usage

```python
effector.RegionalPDP(data=X, model=predict).summary(0)
effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian).summary(0)
effector.RegionalALE(data=X, model=predict).summary(0)
effector.RegionalShapDP(data=X, model=predict).summary(0)
effector.DerPDP(data=X, model=predict, model_jac=jacobian).summary(0)
```

Example output

```
Feature 0 - Full partition tree:
     Node id: 0, name: x_0, heter: 34.79 || nof_instances:  1000 || weight: 1.00
             Node id: 1, name: x_0 | x_1 <= 0.0, heter: 0.09 || nof_instances:  1000 || weight: 1.00
             Node id: 2, name: x_0 | x_1  > 0.0, heter: 0.09 || nof_instances:  1000 || weight: 1.00
     --------------------------------------------------
     Feature 0 - Statistics per tree level:
     Level 0, heter: 34.79
        Level 1, heter: 0.18 || heter drop : 34.61 (units), 99.48% (pcg)
```

Parameters:

  • features (List[int], required): indices of the features to summarize
  • scale_x_list (Optional[List], default None): list of scaling factors for each feature
    • None, for no scaling
    • [{"mean": 0, "std": 1}, {"mean": 3, "std": 0.1}], to manually scale the features
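As an illustration of how scale_x_list affects the printed splits (assumed semantics: each feature was standardized before fitting, and the summary maps split points back to original units as value * std + mean):

```python
# Hypothetical example: feature x_1 was standardized with mean=3, std=0.1,
# so the split "x_1 <= 0.0" (in standardized units) corresponds to 3.0
# in the original units.
scale_x_list = [{"mean": 0, "std": 1}, {"mean": 3, "std": 0.1}]

split_scaled = 0.0
s = scale_x_list[1]
split_original = split_scaled * s["std"] + s["mean"]  # 3.0
```

This only changes how the tree is displayed; the fitted partition itself is unaffected.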
Source code in effector/regional_effect.py
def summary(self, features: List[int], scale_x_list: Optional[List] = None):
    """Summarize the partition tree for the selected features.

    ???+ Example "Example usage"

        === "PDP"

            ```python
            effector.RegionalPDP(data=X, model=predict).summary(0)
            ```

        === "RHALE"

            ```python
            effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian).summary(0)
            ```

        === "ALE"

            ```python
            effector.RegionalALE(data=X, model=predict).summary(0)
            ```

        === "ShapDP"

             ```python
                effector.RegionalShapDP(data=X, model=predict).summary(0)
             ```

        === "DerPDP"

             ```python
                effector.DerPDP(data=X, model=predict, model_jac=jacobian).summary(0)
             ```

    ???+ Example "Example output"

        ```python
        Feature 0 - Full partition tree:
             Node id: 0, name: x_0, heter: 34.79 || nof_instances:  1000 || weight: 1.00
                     Node id: 1, name: x_0 | x_1 <= 0.0, heter: 0.09 || nof_instances:  1000 || weight: 1.00
                     Node id: 2, name: x_0 | x_1  > 0.0, heter: 0.09 || nof_instances:  1000 || weight: 1.00
             --------------------------------------------------
             Feature 0 - Statistics per tree level:
             Level 0, heter: 34.79
                Level 1, heter: 0.18 || heter drop : 34.61 (units), 99.48% (pcg)
        ```

    Args:
        features: indices of the features to summarize
        scale_x_list: list of scaling factors for each feature

            - `None`, for no scaling
            - `[{"mean": 0, "std": 1}, {"mean": 3, "std": 0.1}]`, to manually scale the features

    """
    features = helpers.prep_features(features, self.dim)

    for feat in features:
        self.refit(feat)

        feat_str = "feature_{}".format(feat)
        tree_dict = self.tree[feat_str]

        print("\n")
        print("Feature {} - Full partition tree:".format(feat))

        if tree_dict is None:
            print("No splits found for feature {}".format(feat))
        else:
            tree_dict.show_full_tree(scale_x_list=scale_x_list)

        print("-" * 50)
        print("Feature {} - Statistics per tree level:".format(feat))

        if tree_dict is None:
            print("No splits found for feature {}".format(feat))
        else:
            tree_dict.show_level_stats()
        print("\n")

effector.regional_effect_ale.RegionalALE(data, model, nof_instances=100000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (callable, required): the black-box model, a Callable with signature x -> y where:
    • x: ndarray of shape (N, D)
    • y: ndarray of shape (N)

  • axis_limits (Union[None, ndarray], default None): feature effect limits along each axis
    • None, infers them from data (min and max of each feature)
    • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually:
    • they help to discard outliers and improve the quality of the fit
    • axis_limits define the .plot method's x-axis limits; manual specification leads to better visualizations

    Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 100000): max instances to use
    • "all", uses all data
    • int, randomly selects that many instances from data

    100_000 (the default) is a good choice; RegionalALE can handle large datasets. 😎

  • feature_types (Union[list, None], default None): the feature types
    • None, infers them from data; a feature with fewer than cat_limit unique values is considered categorical
    • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Union[int, None], default 10): the threshold on unique values below which a feature is considered categorical
    • if feature_types is manually specified, this parameter is ignored

  • feature_names (Union[list, None], default None): the names of the features
    • None, defaults to ["x_0", "x_1", ...]
    • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Union[str, None], default None): the name of the target variable
    • None, keeps the default name "y"
    • "price", to manually specify the name of the target variable

Methods:

  • fit: Find subregions by minimizing the ALE-based heterogeneity.
  • plot
Source code in effector/regional_effect_ale.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    nof_instances: typing.Union[int, str] = 100_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(2, D)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`100_000` (default) is a good choice; RegionalALE can handle large datasets. :sunglasses:"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    self.global_bin_limits = {}
    self.global_data_effect = {}
    super(RegionalALE, self).__init__(
        "ale",
        data,
        model,
        None,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features, candidate_conditioning_features='all', space_partitioner='best', binning_method='fixed', points_for_mean_heterogeneity=30)

Find subregions by minimizing the ALE-based heterogeneity.

Parameters:

  • features (Union[int, str, list], required): for which features to search for subregions
    • use "all", for all features, e.g. features="all"
    • use an int, for a single feature, e.g. features=0
    • use a list, for multiple features, e.g. features=[0, 1, 2]
  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features
  • space_partitioner (Union[str, Best], default "best"): the space partitioner to use
  • binning_method (Union[str, Fixed], default "fixed"): must be the Fixed binning method
    • "fixed" computes the ALE plot with the default values: 20 bins with at least 0 points per bin
    • to change the parameters of the method, pass an instance of effector.binning_methods.Fixed with the desired parameters, e.g. Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)
  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity
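A sketch of what the Fixed binning method implies (assumed semantics, based on the description above: nof_bins equal-width bins between the feature's axis limits; plain numpy, not effector internals):

```python
import numpy as np

# 20 equal-width bins over hypothetical axis limits [-1, 1] for a feature
lo, hi, nof_bins = -1.0, 1.0, 20
bin_limits = np.linspace(lo, hi, nof_bins + 1)  # 21 edges define 20 bins
bin_width = (hi - lo) / nof_bins                # 0.1
```

The local (per-bin) effects are then averaged within each bin; min_points_per_bin controls how sparse a bin may be before it is rejected.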
Source code in effector/regional_effect_ale.py
def fit(
    self,
    features: typing.Union[int, str, list],
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union[str, effector.space_partitioning.Best] = "best",
    binning_method: typing.Union[str, ap.Fixed] = "fixed",
    points_for_mean_heterogeneity: int = 30
):
    """
    Find subregions by minimizing the ALE-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features
        space_partitioner: the space partitioner to use

        binning_method: must be the Fixed binning method

            - If set to `"fixed"`, the ALE plot will be computed with the  default values, which are
            `20` bins with at least `0` points per bin
            - If you want to change the parameters of the method, you pass an instance of the
            class `effector.binning_methods.Fixed` with the desired parameters.
            For example: `Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)`

        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
    """
    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # fit global method
        global_ale = ALE(self.data, self.model, nof_instances="all", axis_limits=self.axis_limits)
        global_ale.fit(features=feat, binning_method=binning_method, centering=False)
        self.global_data_effect["feature_" + str(feat)] = global_ale.data_effect_ale["feature_" + str(feat)]
        self.global_bin_limits["feature_" + str(feat)] = global_ale.bin_limits["feature_" + str(feat)]

        # create heterogeneity function
        heter = self._create_heterogeneity_function(feat, space_partitioner.min_points_per_subregion, points_for_mean_heterogeneity)

        # fit feature
        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # the first three arguments (features, candidate_conditioning_features,
    # space_partitioner) drive the subregion detection
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fitting-specific arguments (for ALE, only the binning method)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["binning_method"]}

plot(feature, node_idx, heterogeneity=True, centering=True, scale_x_list=None, scale_y=None, y_limits=None, dy_limits=None)

Source code in effector/regional_effect_ale.py
def plot(
    self,
    feature,
    node_idx,
    heterogeneity=True,
    centering=True,
    scale_x_list=None,
    scale_y=None,
    y_limits=None,
    dy_limits=None,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_ale.RegionalRHALE(data, model, model_jac=None, data_effect=None, nof_instances=100000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

Name Type Description Default
data ndarray

the design matrix, ndarray of shape (N,D)

required
model Callable

the black-box model, Callable with signature x -> y where:

  • x: ndarray of shape (N, D)
  • y: ndarray of shape (N)
required
model_jac Optional[Callable]

the black-box model's Jacobian, Callable with signature x -> dy_dx where:

  • x: ndarray of shape (N, D)
  • dy_dx: ndarray of shape (N, D)
None
data_effect Optional[ndarray]

The jacobian of the model on the data

  • None, infers the Jacobian internally using model_jac(data) or numerically
  • np.ndarray, to provide the Jacobian directly

When possible, provide the Jacobian directly

Computing the jacobian on the whole dataset can be memory demanding. If you have the jacobian already computed, provide it directly to the constructor.

None
axis_limits Optional[ndarray]

Feature effect limits along each axis

  • None, infers them from data (min and max of each feature)
  • array of shape (D, 2), manually specify the limits for each feature.

When possible, specify the axis limits manually

  • they help to discard outliers and improve the quality of the fit
  • axis_limits define the .plot method's x-axis limits; manual specification leads to better visualizations

Their shape is (2, D), not (D, 2)

axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
None
nof_instances Union[int, str]

Max instances to use

  • "all", uses all data
  • int, randomly selects int instances from data

100_000 (default) is a good choice. RHALE can handle large datasets 😎 😎

100000
feature_types Optional[List]

The feature types.

  • None, infers them from data; if the number of unique values is less than cat_limit, it is considered categorical.
  • ['cat', 'cont', ...], manually specify the types of the features
None
cat_limit Optional[int]

The minimum number of unique values for a feature to be considered categorical

  • if feature_types is manually specified, this parameter is ignored
10
feature_names Optional[List]

The names of the features

  • None, defaults to: ["x_0", "x_1", ...]
  • ["age", "weight", ...] to manually specify the names of the features
None
target_name Optional[str]

The name of the target variable

  • None, to keep the default name: "y"
  • "price", to manually specify the name of the target variable
None

Methods:

Name Description
fit

Find subregions by minimizing the RHALE-based heterogeneity.

plot
Source code in effector/regional_effect_ale.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    model_jac: Optional[Callable] = None,
    data_effect: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 100_000,
    axis_limits: Optional[np.ndarray] = None,
    feature_types: Optional[List] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        model_jac: the black-box model's Jacobian, `Callable` with signature `x -> dy_dx` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `dy_dx`: `ndarray` of shape `(N, D)`

        data_effect: The jacobian of the `model` on the `data`

            - `None`, infers the Jacobian internally using `model_jac(data)` or numerically
            - `np.ndarray`, to provide the Jacobian directly

            !!! tip "When possible, provide the Jacobian directly"

                Computing the jacobian on the whole dataset can be memory demanding.
                If you have the jacobian already computed, provide it directly to the constructor.

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(2, D)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`100_000` (default) is a good choice. RHALE can handle large datasets :sunglasses: :sunglasses:"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalRHALE, self).__init__(
        "rhale",
        data,
        model,
        model_jac,
        data_effect,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', binning_method='greedy', points_for_mean_heterogeneity=30)

Find subregions by minimizing the RHALE-based heterogeneity.

Parameters:

Name Type Description Default
features Union[int, str, list]

for which features to search for subregions

  • use "all", for all features, e.g. features="all"
  • use an int, for a single feature, e.g. features=0
  • use a list, for multiple features, e.g. features=[0, 1, 2]
'all'
candidate_conditioning_features Union[str, list]

list of features to consider as conditioning features

'all'
space_partitioner Union[str, Best]

the space partitioner to use

'best'
binning_method str

the binning method to use.

  • Use "greedy" for the Greedy binning solution with the default parameters; for custom parameters, initialize a binning_methods.Greedy object
  • Use "dp" for the Dynamic Programming binning solution with the default parameters; for custom parameters, initialize a binning_methods.DynamicProgramming object
  • Use "fixed" for the Fixed binning solution with the default parameters; for custom parameters, initialize a binning_methods.Fixed object
'greedy'
points_for_mean_heterogeneity int

number of equidistant points along the feature axis used for computing the mean heterogeneity

30
Source code in effector/regional_effect_ale.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union[str, list] = "all",
    space_partitioner: typing.Union[str, effector.space_partitioning.Best] = "best",
    binning_method: typing.Union[str, ap.Fixed, ap.DynamicProgramming, ap.Greedy,] = "greedy",
    points_for_mean_heterogeneity: int = 30,
):
    """
    Find subregions by minimizing the RHALE-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features
        space_partitioner: the space partitioner to use
        binning_method (str): the binning method to use.

            - Use `"greedy"` for the Greedy binning solution with the default parameters;
              for custom parameters, initialize a `binning_methods.Greedy` object
            - Use `"dp"` for the Dynamic Programming binning solution with the default parameters;
              for custom parameters, initialize a `binning_methods.DynamicProgramming` object
            - Use `"fixed"` for the Fixed binning solution with the default parameters;
              for custom parameters, initialize a `binning_methods.Fixed` object

        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
    """
    if self.data_effect is None:
        self.compile()

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # find global axis limits
        heter = self._create_heterogeneity_function(
            feat, binning_method, space_partitioner.min_points_per_subregion, points_for_mean_heterogeneity
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # the first 3 arguments are the region-splitting arguments
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fitting arguments
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["binning_method"]}

plot(feature, node_idx, heterogeneity=True, centering=True, scale_x_list=None, scale_y=None, y_limits=None, dy_limits=None)

Source code in effector/regional_effect_ale.py
def plot(
    self,
    feature,
    node_idx,
    heterogeneity=True,
    centering=True,
    scale_x_list=None,
    scale_y=None,
    y_limits=None,
    dy_limits=None,
):

    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_pdp.RegionalPDP(data, model, nof_instances=10000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalPDPBase

Initialize the Regional Effect method.

Parameters:

Name Type Description Default
data ndarray

the design matrix, ndarray of shape (N,D)

required
model callable

the black-box model, Callable with signature f(x) -> y where:

  • x: ndarray of shape (N, D)
  • y: ndarray of shape (N)
required
axis_limits Union[None, ndarray]

Feature effect limits along each axis

  • None, infers them from data (min and max of each feature)
  • array of shape (2, D), manually specify the limits for each feature.

When possible, specify the axis limits manually

  • they help to discard outliers and improve the quality of the fit
  • axis_limits define the .plot method's x-axis limits; manual specification leads to better visualizations

Their shape is (2, D), not (D, 2)

axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
None
nof_instances Union[int, str]

Max instances to use

  • "all", uses all data
  • int, randomly selects int instances from data

10_000 (default) is a good balance between speed and accuracy

10000
feature_types Union[list, None]

The feature types.

  • None, infers them from data; if the number of unique values is less than cat_limit, it is considered categorical.
  • ['cat', 'cont', ...], manually specify the types of the features
None
cat_limit Union[int, None]

The minimum number of unique values for a feature to be considered categorical

  • if feature_types is manually specified, this parameter is ignored
10
feature_names Union[list, None]

The names of the features

  • None, defaults to: ["x_0", "x_1", ...]
  • ["age", "weight", ...] to manually specify the names of the features
None
target_name Union[str, None]

The name of the target variable

  • None, to keep the default name: "y"
  • "price", to manually specify the name of the target variable
None

Methods:

Name Description
fit

Find subregions by minimizing the PDP-based heterogeneity.

plot
Source code in effector/regional_effect_pdp.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    nof_instances: typing.Union[int, str] = 10_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `f(x) -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(2, D)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`10_000` (default) is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalPDP, self).__init__(
        "pdp",
        data,
        model,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', centering=False, points_for_centering=30, points_for_mean_heterogeneity=30, use_vectorized=True)

Find subregions by minimizing the PDP-based heterogeneity.

Parameters:

Name Type Description Default
features Union[int, str, list]

for which features to search for subregions

  • use "all", for all features, e.g. features="all"
  • use an int, for a single feature, e.g. features=0
  • use a list, for multiple features, e.g. features=[0, 1, 2]
'all'
candidate_conditioning_features Union[str, list]

list of features to consider as conditioning features

  • use "all", for all features, e.g. candidate_conditioning_features="all"
  • use a list, for multiple features, e.g. candidate_conditioning_features=[0, 1, 2]
  • for each feature in features, the algorithm considers splits conditioned on each feature in the candidate_conditioning_features list
'all'
space_partitioner Union[str, None]

the method to use for partitioning the space

'best'
centering Union[bool, str]

whether to center the PDP and ICE curves, before computing the heterogeneity

  • If centering is False, the PDP is not centered
  • If centering is True or zero_integral, the PDP is centered around the y axis.
  • If centering is zero_start, the PDP starts from y=0.
False
points_for_centering int

number of equidistant points along the feature axis used for centering ICE plots

30
points_for_mean_heterogeneity int

number of equidistant points along the feature axis used for computing the mean heterogeneity

30
use_vectorized bool

whether to use vectorized operations for the PDP and ICE curves

True
Source code in effector/regional_effect_pdp.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union[str, list] = "all",
    space_partitioner: typing.Union[str, None] = "best",
    centering: typing.Union[bool, str] = False,
    points_for_centering: int = 30,
    points_for_mean_heterogeneity: int = 30,
    use_vectorized: bool = True,
):
    """
    Find subregions by minimizing the PDP-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features

            - use `"all"`, for all features, e.g. `candidate_conditioning_features="all"`
            - use a `list`, for multiple features, e.g. `candidate_conditioning_features=[0, 1, 2]`
            - for each feature in `features`, the algorithm considers splits
              conditioned on each feature in the `candidate_conditioning_features` list

        space_partitioner: the method to use for partitioning the space
        centering: whether to center the PDP and ICE curves, before computing the heterogeneity

            - If `centering` is `False`, the PDP is not centered
            - If `centering` is `True` or `zero_integral`, the PDP is centered around the `y` axis.
            - If `centering` is `zero_start`, the PDP starts from `y=0`.

        points_for_centering: number of equidistant points along the feature axis used for centering ICE plots
        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
        use_vectorized: whether to use vectorized operations for the PDP and ICE curves


    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # define the global method
        if self.method_name == "pdp":
            pdp = PDP(self.data, self.model, self.axis_limits, nof_instances="all")
        else:
            pdp = DerPDP(self.data, self.model, self.model_jac, self.axis_limits, nof_instances="all")

        pdp.fit(
            features=feat,
            centering=centering,
            points_for_centering=points_for_centering,
            use_vectorized=use_vectorized,
        )

        xx = np.linspace(self.axis_limits[:, feat][0], self.axis_limits[:, feat][1], points_for_mean_heterogeneity)
        y_ice = pdp.eval(
                feature=feat,
                xs=xx,
                heterogeneity=True,
                use_vectorized=use_vectorized,
                return_all=True
            )
        self.y_ice["feature_" + str(feat)] = y_ice.T

        heter = self._create_heterogeneity_function(
            foi=feat,
            min_points=space_partitioner.min_points_per_subregion,
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # the first 3 arguments are the region-splitting arguments
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}
    self.kwargs_subregion_detection["points_for_mean_heterogeneity"] = points_for_mean_heterogeneity

    # centering, points_for_centering, use_vectorized
    self.kwargs_fitting = {k:v for k,v in all_arguments.items() if k in ["centering", "points_for_centering", "use_vectorized"]}

plot(feature, node_idx, heterogeneity='ice', centering=False, nof_points=30, scale_x_list=None, scale_y=None, nof_ice=100, show_avg_output=False, y_limits=None, use_vectorized=True)

Source code in effector/regional_effect_pdp.py
def plot(
    self,
    feature: int,
    node_idx: int,
    heterogeneity: typing.Union[bool, str] = "ice",
    centering: typing.Union[bool, str] = False,
    nof_points: int = 30,
    scale_x_list: typing.Union[None, list] = None,
    scale_y: typing.Union[None, list] = None,
    nof_ice: int = 100,
    show_avg_output: bool = False,
    y_limits: typing.Union[None, list] = None,
    use_vectorized: bool = True,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_pdp.RegionalDerPDP(data, model, model_jac=None, nof_instances=10000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalPDPBase

Initialize the Regional Effect method.

Parameters:

Name Type Description Default
data ndarray

the design matrix, ndarray of shape (N,D)

required
model callable

the black-box model, Callable with signature x -> y where:

  • x: ndarray of shape (N, D)
  • y: ndarray of shape (N)
required
model_jac Optional[callable]

the black-box model's Jacobian, Callable with signature x -> dy_dx where:

  • x: ndarray of shape (N, D)
  • dy_dx: ndarray of shape (N, D)
None
axis_limits Union[None, ndarray]

Feature effect limits along each axis

  • None, infers them from data (min and max of each feature)
  • array of shape (2, D), manually specify the limits for each feature.

When possible, specify the axis limits manually

  • they help to discard outliers and improve the quality of the fit
  • axis_limits define the .plot method's x-axis limits; manual specification leads to better visualizations

Their shape is (2, D), not (D, 2)

axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
None
nof_instances Union[int, str]

Max instances to use

  • "all", uses all data
  • int, randomly selects int instances from data

10_000 (default) is a good balance between speed and accuracy

10000
feature_types Union[list, None]

The feature types.

  • None, infers them from data; if the number of unique values is less than cat_limit, it is considered categorical.
  • ['cat', 'cont', ...], manually specify the types of the features
None
cat_limit Union[int, None]

The minimum number of unique values for a feature to be considered categorical

  • if feature_types is manually specified, this parameter is ignored
10
feature_names Union[list, None]

The names of the features

  • None, defaults to: ["x_0", "x_1", ...]
  • ["age", "weight", ...] to manually specify the names of the features
None
target_name Union[str, None]

The name of the target variable

  • None, to keep the default name: "y"
  • "price", to manually specify the name of the target variable
None

Methods:

Name Description
fit

Find subregions by minimizing the PDP-based heterogeneity.

plot
Source code in effector/regional_effect_pdp.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    model_jac: typing.Optional[callable] = None,
    nof_instances: typing.Union[int, str] = 10_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        model_jac: the black-box model's Jacobian, `Callable` with signature `x -> dy_dx` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `dy_dx`: `ndarray` of shape `(N, D)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(2, D)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`10_000` (default) is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalDerPDP, self).__init__(
        "d-pdp",
        data,
        model,
        model_jac,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', centering=False, points_for_centering=30, points_for_mean_heterogeneity=30, use_vectorized=True)

Find subregions by minimizing the PDP-based heterogeneity.

Parameters:

Name Type Description Default
features Union[int, str, list]

for which features to search for subregions

  • use "all", for all features, e.g. features="all"
  • use an int, for a single feature, e.g. features=0
  • use a list, for multiple features, e.g. features=[0, 1, 2]
'all'
candidate_conditioning_features Union[str, list]

list of features to consider as conditioning features

  • use "all", for all features, e.g. candidate_conditioning_features="all"
  • use a list, for multiple features, e.g. candidate_conditioning_features=[0, 1, 2]
  • for each feature in features, the algorithm considers splits conditioned on each feature in the candidate_conditioning_features list
'all'
space_partitioner Union[str, None]

the method to use for partitioning the space

'best'
centering Union[bool, str]

whether to center the PDP and ICE curves, before computing the heterogeneity

  • If centering is False, the PDP is not centered
  • If centering is True or zero_integral, the PDP is centered around the y axis.
  • If centering is zero_start, the PDP starts from y=0.
False
points_for_centering int

number of equidistant points along the feature axis used for centering ICE plots

30
points_for_mean_heterogeneity int

number of equidistant points along the feature axis used for computing the mean heterogeneity

30
use_vectorized bool

whether to use vectorized operations for the PDP and ICE curves

True
Source code in effector/regional_effect_pdp.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union[str, list] = "all",
    space_partitioner: typing.Union[str, None] = "best",
    centering: typing.Union[bool, str] = False,
    points_for_centering: int = 30,
    points_for_mean_heterogeneity: int = 30,
    use_vectorized: bool = True,
):
    """
    Find subregions by minimizing the PDP-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features

            - use `"all"`, for all features, e.g. `candidate_conditioning_features="all"`
            - use a `list`, for multiple features, e.g. `candidate_conditioning_features=[0, 1, 2]`
            - for each feature in `features`, the algorithm considers splits
              conditioned on each feature in the `candidate_conditioning_features` list

        space_partitioner: the method to use for partitioning the space
        centering: whether to center the PDP and ICE curves, before computing the heterogeneity

            - If `centering` is `False`, the PDP is not centered
            - If `centering` is `True` or `zero_integral`, the PDP is centered around the `y` axis.
            - If `centering` is `zero_start`, the PDP starts from `y=0`.

        points_for_centering: number of equidistant points along the feature axis used for centering ICE plots
        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
        use_vectorized: whether to use vectorized operations for the PDP and ICE curves


    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # define the global method
        if self.method_name == "pdp":
            pdp = PDP(self.data, self.model, self.axis_limits, nof_instances="all")
        else:
            pdp = DerPDP(self.data, self.model, self.model_jac, self.axis_limits, nof_instances="all")

        pdp.fit(
            features=feat,
            centering=centering,
            points_for_centering=points_for_centering,
            use_vectorized=use_vectorized,
        )

        xx = np.linspace(self.axis_limits[:, feat][0], self.axis_limits[:, feat][1], points_for_mean_heterogeneity)
        y_ice = pdp.eval(
                feature=feat,
                xs=xx,
                heterogeneity=True,
                use_vectorized=use_vectorized,
                return_all=True
            )
        self.y_ice["feature_" + str(feat)] = y_ice.T

        heter = self._create_heterogeneity_function(
            foi = feat,
            min_points=space_partitioner.min_points_per_subregion,
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # subregion-detection arguments: features, candidate_conditioning_features, space_partitioner
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}
    self.kwargs_subregion_detection["points_for_mean_heterogeneity"] = points_for_mean_heterogeneity

    # centering, points_for_centering, use_vectorized
    self.kwargs_fitting = {k:v for k,v in all_arguments.items() if k in ["centering", "points_for_centering", "use_vectorized"]}
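The `centering` options described in the docstring above can be illustrated with a minimal numpy sketch. This is an illustration of the definitions only (not effector's internal code), using a toy effect curve:

```python
import numpy as np

# a toy (un-centered) effect curve evaluated on an equidistant grid
xx = np.linspace(0, 1, 100)
curve = 2.0 * xx + 1.0

# centering=True (alias "zero_integral"): subtract the mean, so the
# centered curve integrates to (approximately) zero over the feature axis
zero_integral = curve - curve.mean()

# centering="zero_start": subtract the first value, so the curve starts at y=0
zero_start = curve - curve[0]
```

Both modes shift the curve vertically without changing its shape; they only differ in which reference point is mapped to `y=0`.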

plot(feature, node_idx=0, heterogeneity='ice', centering=False, nof_points=30, scale_x_list=None, scale_y=None, nof_ice=100, show_avg_output=False, dy_limits=None, use_vectorized=True)

Source code in effector/regional_effect_pdp.py
def plot(
    self,
    feature: int,
    node_idx: int = 0,
    heterogeneity: typing.Union[bool, str] = "ice",
    centering: typing.Union[bool, str] = False,
    nof_points: int = 30,
    scale_x_list: typing.Union[None, list] = None,
    scale_y: typing.Union[None, list] = None,
    nof_ice: int = 100,
    show_avg_output: bool = False,
    dy_limits: typing.Union[None, list] = None,
    use_vectorized: bool = True,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_shap.RegionalShapDP(data, model, axis_limits=None, nof_instances=1000, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

- `data` (`ndarray`, required): the design matrix, of shape `(N, D)`

- `model` (`Callable`, required): the black-box model, with signature `f(x) -> y` where:

    - `x`: `ndarray` of shape `(N, D)`
    - `y`: `ndarray` of shape `(N)`

- `axis_limits` (`Optional[ndarray]`, default: `None`): feature effect limits along each axis

    - `None`, infers them from `data` (`min` and `max` of each feature)
    - array of shape `(2, D)`, manually specify the limits for each feature, e.g. `axis_limits = np.array([[0, 1, -1], [1, 2, 3]])`

    When possible, specify the axis limits manually:

    - they help to discard outliers and improve the quality of the fit
    - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

    Note that their shape is `(2, D)`, not `(D, 2)`.

- `nof_instances` (`Union[int, str]`, default: `1000`): max instances to use

    - `"all"`, uses all `data`
    - `int`, randomly selects that many instances from `data`
    - the default, `1_000`, is a good balance between speed and accuracy

- `feature_types` (`Optional[List[str]]`, default: `None`): the feature types

    - `None`, infers them from `data`: a feature with fewer than `cat_limit` unique values is considered categorical
    - `['cat', 'cont', ...]`, manually specify the types of the features

- `cat_limit` (`Optional[int]`, default: `10`): the threshold of unique values below which a feature is considered categorical

    - if `feature_types` is manually specified, this parameter is ignored

- `feature_names` (`Optional[List[str]]`, default: `None`): the names of the features

    - `None`, defaults to `["x_0", "x_1", ...]`
    - `["age", "weight", ...]`, manually specify the names of the features

- `target_name` (`Optional[str]`, default: `None`): the name of the target variable

    - `None`, keeps the default name `"y"`
    - `"price"`, manually specify the name of the target variable
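The shapes the constructor expects can be sketched in plain numpy; `predict` below is a hypothetical stand-in for a real black-box model:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.uniform(0, 1, size=(N, D))  # design matrix of shape (N, D)

def predict(x):
    # hypothetical black-box model with signature f(x) -> y, y of shape (N,)
    return x[:, 0] + x[:, 1] * x[:, 2]

# axis_limits has shape (2, D): row 0 holds per-feature minima, row 1 maxima
axis_limits = np.stack([X.min(axis=0), X.max(axis=0)])
```

Building `axis_limits` from known domain bounds rather than from the observed `min`/`max` is usually preferable, as it avoids stretching the plots to accommodate outliers.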

Methods:

Name Description
fit

Fit the regional SHAP.

plot

Plot the regional SHAP.

Source code in effector/regional_effect_shap.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    axis_limits: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 1_000,
    feature_types: Optional[List[str]] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List[str]] = None,
    target_name: Optional[str] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `f(x) -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(2, D)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`1_000` (default), is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The threshold of unique values below which a feature is considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """
    self.global_shap_values = None
    super(RegionalShapDP, self).__init__(
        "shap",
        data,
        model,
        None,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )
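The feature-type inference rule described in the docstring (fewer than `cat_limit` unique values means categorical) can be sketched as follows. This is an illustration of the rule, not effector's internal implementation:

```python
import numpy as np

def infer_feature_types(data, cat_limit=10):
    # a feature with fewer than cat_limit unique values is treated as
    # categorical ("cat"), otherwise as continuous ("cont")
    return [
        "cat" if np.unique(data[:, j]).size < cat_limit else "cont"
        for j in range(data.shape[1])
    ]

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 3, 500),   # 3 unique values   -> "cat"
    rng.uniform(0, 1, 500),    # ~500 unique values -> "cont"
])
```

Passing `feature_types=['cat', 'cont']` explicitly bypasses this inference entirely, which is safer when a continuous feature happens to take few distinct values in the sample.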

fit(features, candidate_conditioning_features='all', space_partitioner='best', binning_method='greedy')

Fit the regional SHAP.

Parameters:

- `features` (`Union[int, str, list]`, required): the features to fit

    - if set to `"all"`, all the features are fitted

- `candidate_conditioning_features` (`Union[str, list]`, default: `"all"`): list of features to consider as conditioning features for the candidate splits

    - if set to `"all"`, all the features are considered as conditioning features

- `space_partitioner` (`Union[str, Best]`, default: `"best"`): the space partitioner to use

    - if set to `"greedy"`, the greedy space partitioner is used

- `binning_method` (`Union[str, Greedy, Fixed]`, default: `"greedy"`): the binning method to use
Source code in effector/regional_effect_shap.py
def fit(
    self,
    features: typing.Union[int, str, list],
    candidate_conditioning_features: typing.Union[str, list] = "all",
    space_partitioner: typing.Union[str, effector.space_partitioning.Best] = "best",
    binning_method: Union[str, ap.Greedy, ap.Fixed] = "greedy",
):
    """
    Fit the regional SHAP.

    Args:
        features: the features to fit.
            - If set to "all", all the features will be fitted.

        candidate_conditioning_features: list of features to consider as conditioning features for the candidate splits
            - If set to "all", all the features will be considered as conditioning features.

        space_partitioner: the space partitioner to use
            - If set to "greedy", the greedy space partitioner will be used.

        binning_method: the binning method to use
    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)

    for feat in tqdm(features):
        # assert global SHAP values are available
        if self.global_shap_values is None:
            global_shap_dp = effector.ShapDP(self.data, self.model, self.axis_limits, "all")
            global_shap_dp.fit(feat, centering=False, binning_method=binning_method)
            self.global_shap_values = global_shap_dp.shap_values

        heter = self._create_heterogeneity_function(feat, space_partitioner.min_points_per_subregion, binning_method)

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # subregion-detection arguments: features, candidate_conditioning_features, space_partitioner
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fit kwargs
    self.kwargs_fitting = {"binning_method": binning_method}

plot(feature, node_idx, heterogeneity='shap_values', centering=True, nof_points=30, scale_x_list=None, scale_y=None, nof_shap_values='all', show_avg_output=False, y_limits=None, only_shap_values=False)

Plot the regional SHAP.

Parameters:

- `feature` (required): the feature to plot
- `node_idx` (required): the index of the node to plot
- `heterogeneity` (default: `"shap_values"`): whether to plot the heterogeneity
- `centering` (default: `True`): whether to center the SHAP values
- `nof_points` (default: `30`): number of points to plot
- `scale_x_list` (default: `None`): the list of scaling factors for the feature names
- `scale_y` (default: `None`): the scaling factor for the SHAP values
- `nof_shap_values` (default: `"all"`): number of SHAP values to plot
- `show_avg_output` (default: `False`): whether to show the average output
- `y_limits` (default: `None`): the limits of the y-axis
- `only_shap_values` (default: `False`): whether to plot only the SHAP values
Source code in effector/regional_effect_shap.py
def plot(self,
         feature,
         node_idx,
         heterogeneity="shap_values",
         centering=True,
         nof_points=30,
         scale_x_list=None,
         scale_y=None,
         nof_shap_values='all',
         show_avg_output=False,
         y_limits=None,
         only_shap_values=False
):
    """
    Plot the regional SHAP.

    Args:
        feature: the feature to plot
        node_idx: the index of the node to plot
        heterogeneity: whether to plot the heterogeneity
        centering: whether to center the SHAP values
        nof_points: number of points to plot
        scale_x_list: the list of scaling factors for the feature names
        scale_y: the scaling factor for the SHAP values
        nof_shap_values: number of SHAP values to plot
        show_avg_output: whether to show the average output
        y_limits: the limits of the y-axis
        only_shap_values: whether to plot only the SHAP values
    """
    kwargs = locals()
    kwargs.pop("self")
    return self._plot(kwargs)