API: Regional Effect Methods

Summary

All regional effect methods have a similar interface and workflow:

  1. create an instance of the regional effect method you want to use
  2. (optional) .fit() to customize the method
  3. .summary() to print the partition tree found for each feature
  4. .plot() to plot the regional effect of a feature at a specific node
  5. .eval() to evaluate the regional effect of a feature at a specific node, on a specific grid of points
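
Putting the workflow together, a minimal end-to-end sketch (the data and model below are toy placeholders; any callable with signature (N, D) -> (N,) works):

    import numpy as np
    import effector

    X = np.random.uniform(-1, 1, (1000, 2))      # toy design matrix
    predict = lambda x: x[:, 0] * (x[:, 1] > 0)  # toy black-box model with an interaction

    r_method = effector.RegionalPDP(data=X, model=predict)  # 1. create
    r_method.fit(features=[0])                              # 2. (optional) fit
    r_method.summary(features=[0])                          # 3. print the partition tree
    r_method.plot(feature=0, node_idx=1)                    # 4. plot node 1 (assumes a split was found)
    y = r_method.eval(0, 1, xs=np.linspace(-1, 1, 100))     # 5. evaluate node 1 on a grid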

Usage

# set up the input
X = ... # input data
predict = ... # model to be explained
jacobian = ... # jacobian of the model

  1. Create an instance of the regional effect method you want to use:

    effector.RegionalPDP(data=X, model=predict)

    effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian)

    effector.RegionalShapDP(data=X, model=predict)

    effector.RegionalALE(data=X, model=predict)

    effector.RegionalDerPDP(data=X, model=predict, model_jac=jacobian)
    
  2. Customize the regional effect method (optional):

    .fit(features, **method_specific_args)

    This is the place for customization

    The .fit() step can be omitted if you are ok with the default settings; you can directly call the .summary(), .plot(), or .eval() methods. However, if you want more control over the fitting process, you can pass additional arguments to the .fit() method. Check the Usage section below and the method-specific documentation for more information.

    Usage
    # customize the space partitioning algorithm
    space_partitioner = effector.space_partitioning.Best(
        heter_pcg_drop_thres=0.3,  # percentage drop threshold (default: 0.1)
        max_split_levels=1,  # maximum number of split levels (default: 2)
    )
    r_method.fit(
        features=[0, 1], # list of features to be analyzed
        space_partitioner=space_partitioner # space partitioning algorithm (default: effector.space_partitioning.Best)
    )
    
  3. Print the partition tree found for each feature in features:

    .summary(features)

    Usage
    features = [...] # list of features to be analyzed
    r_method.summary(features)
    
    Example Output
    Feature 3 - Full partition tree:
    🌳 Full Tree Structure:
    ───────────────────────
    
    hr 🔹 [id: 0 | heter: 0.43 | inst: 3476 | w: 1.00]
        workingday = 0.00 🔹 [id: 1 | heter: 0.36 | inst: 1129 | w: 0.32]
            temp ≤ 6.50 🔹 [id: 3 | heter: 0.17 | inst: 568 | w: 0.16]
            temp > 6.50 🔹 [id: 4 | heter: 0.21 | inst: 561 | w: 0.16]
        workingday ≠ 0.00 🔹 [id: 2 | heter: 0.28 | inst: 2347 | w: 0.68]
            temp ≤ 6.50 🔹 [id: 5 | heter: 0.19 | inst: 953 | w: 0.27]
            temp > 6.50 🔹 [id: 6 | heter: 0.20 | inst: 1394 | w: 0.40]
    --------------------------------------------------
    Feature 3 - Statistics per tree level:
    🌳 Tree Summary:
    ─────────────────
    Level 0🔹heter: 0.43
        Level 1🔹heter: 0.31 | 🔻0.12 (28.15%)
            Level 2🔹heter: 0.19 | 🔻0.11 (37.10%)
    
  4. Plot the regional effect of a feature at a specific node:

    .plot(feature, node_idx)

    Usage
    feature = ...
    node_idx = ...
    r_method.plot(feature, node_idx, **plot_specific_args)
    
    Output
    (the same pair of plots is produced for each regional effect method)

    node_idx=1: \(x_1\) when \(x_2 \leq 0\)    node_idx=2: \(x_1\) when \(x_2 > 0\)
    r_method.plot(0, 1)                        r_method.plot(0, 2)
    [regional effect plots at node 1 and node 2]
  5. Evaluate the regional effect of a feature at a specific node, on a specific grid of points:

    .eval(feature, node_idx, xs)

    Usage
    # Example input
    feature = ... # feature to be analyzed
    node_idx = ... # node index
    xs = ... # grid of points to evaluate the regional effect on, e.g., np.linspace(0, 1, 100)

    y, het = r_method.eval(feature, node_idx, xs, heterogeneity=True)
    

API

Constructor for the RegionalEffect class.

Methods:

  • eval: 👉 Evaluate the regional effect for a given feature and node.
  • summary: 👉 Summarize the partition tree for the selected features.

Source code in effector/regional_effect.py
def __init__(
    self,
    method_name: str,
    data: np.ndarray,
    model: Callable,
    model_jac: Optional[Callable] = None,
    data_effect: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 10_000,
    axis_limits: Optional[np.ndarray] = None,
    feature_types: Optional[List] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
) -> None:
    """
    Constructor for the RegionalEffect class.
    """
    assert data.ndim == 2

    self.method_name = method_name.lower()
    self.model = model
    self.model_jac = model_jac

    self.dim = data.shape[1]

    # data preprocessing (i): if axis_limits passed manually,
    # keep only the points within,
    # otherwise, compute the axis limits from the data
    if axis_limits is not None:
        assert axis_limits.shape == (2, self.dim)
        assert np.all(axis_limits[0, :] <= axis_limits[1, :])

        # drop points outside of limits
        accept_indices = helpers.indices_within_limits(data, axis_limits)
        data = data[accept_indices, :]
        data_effect = data_effect[accept_indices, :] if data_effect is not None else None
    else:
        axis_limits = helpers.axis_limits_from_data(data)
    self.axis_limits: np.ndarray = axis_limits


    # data preprocessing (ii): select nof_instances from the remaining data
    self.nof_instances, self.indices = helpers.prep_nof_instances(nof_instances, data.shape[0])
    data = data[self.indices, :]
    data_effect = data_effect[self.indices, :] if data_effect is not None else None

    # store the data
    self.data: np.ndarray = data
    self.data_effect: Optional[np.ndarray] = data_effect

    # set feature types
    self.cat_limit = cat_limit
    feature_types = (
        utils.get_feature_types(data, cat_limit)
        if feature_types is None
        else feature_types
    )
    self.feature_types: list = feature_types

    # set feature names
    feature_names: list[str] = (
        helpers.get_feature_names(axis_limits.shape[1])
        if feature_names is None
        else feature_names
    )
    self.feature_names: list = feature_names

    # set target name
    self.target_name = "y" if target_name is None else target_name

    # state variables
    self.is_fitted: np.ndarray = np.ones([self.dim]) < 0

    # parameters used when fitting the regional effect
    # self.method_args: typing.Dict = {}
    self.kwargs_subregion_detection: typing.Dict = {} # subregion specific arguments
    self.kwargs_fitting: typing.Dict = {} # fitting specific arguments

    # dictionary with all the information required for plotting or evaluating the regional effects
    self.partitioners: typing.Dict[str, Best] = {}
    # self.tree_full: typing.Dict[str, Tree] = {}
    self.tree: typing.Dict[str, Tree] = {}

eval(feature, node_idx, xs, heterogeneity=False, centering=True)

👉 Evaluate the regional effect for a given feature and node.

This is a common method for all regional effect methods, so use the arguments carefully.

  • centering=True is a good option for most methods, but not for all.
    • DerPDP, use centering=False
    • [RegionalPDP, RegionalShapDP], it depends on you 😎
    • [RegionalALE, RegionalRHALE], use centering=True

The heterogeneity argument changes the return value of the function.

  • If heterogeneity=False, the function returns y
  • If heterogeneity=True, the function returns a tuple (y, std)

Parameters:

  • feature (int, required): index of the feature

  • node_idx (int, required): index of the node

  • xs (ndarray, required): horizontal grid of points to evaluate on

  • heterogeneity (bool, default False): whether to return the heterogeneity
      • if heterogeneity=False, the function returns y, a numpy array of the mean effect at grid points xs
      • if heterogeneity=True, the function returns (y, std), where y is the mean effect and std is the standard deviation of the mean effect at grid points xs

  • centering (Union[bool, str], default True): whether to center the regional effect. The following options are available:
      • If centering is False, the regional effect is not centered
      • If centering is True or "zero_integral", the regional effect is centered around the y axis
      • If centering is "zero_start", the regional effect starts from y=0

Returns:

  • Union[ndarray, Tuple[ndarray, ndarray]]: the mean effect y, if heterogeneity=False (default), or a tuple (y, std) otherwise
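
For instance, a minimal sketch of the two return forms (assuming r_method is any fitted regional effect object, evaluated on feature 0 at node 1):

    import numpy as np

    xs = np.linspace(0, 1, 100)  # horizontal grid along the feature axis
    y = r_method.eval(0, 1, xs)                           # heterogeneity=False (default): mean effect only
    y, std = r_method.eval(0, 1, xs, heterogeneity=True)  # mean effect and its standard deviation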

Source code in effector/regional_effect.py
def eval(
        self,
        feature: int,
        node_idx: int,
        xs: np.ndarray,
        heterogeneity: bool = False,
        centering: Union[bool, str] = True,
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
    """
    :point_right: Evaluate the regional effect for a given feature and node.

    !!! note "This is a common method for all regional effect methods, so use the arguments carefully."

        - `centering=True` is a good option for most methods, but not for all.
            - `DerPDP`, use `centering=False`
            - `[RegionalPDP, RegionalShapDP]`, it depends on you :sunglasses:
            - `[RegionalALE, RegionalRHALE]`, use `centering=True`

    !!! note "The `heterogeneity` argument changes the return value of the function."

        - If `heterogeneity=False`, the function returns `y`
        - If `heterogeneity=True`, the function returns a tuple `(y, std)`

    Args:
        feature: index of the feature
        node_idx: index of the node
        xs: horizontal grid of points to evaluate on
        heterogeneity: whether to return the heterogeneity.

              - if `heterogeneity=False`, the function returns `y`, a numpy array of the mean effect at grid points `xs`
              - If `heterogeneity=True`, the function returns `(y, std)` where `y` is the mean effect and `std` is the standard deviation of the mean effect at grid points `xs`

        centering: whether to center the regional effect. The following options are available:

            - If `centering` is `False`, the regional effect is not centered
            - If `centering` is `True` or `zero_integral`, the regional effect is centered around the `y` axis.
            - If `centering` is `zero_start`, the regional effect starts from `y=0`.

    Returns:
        the mean effect `y`, if `heterogeneity=False` (default) or a tuple `(y, std)` otherwise

    """
    self.refit(feature)
    centering = helpers.prep_centering(centering)

    kwargs = copy.deepcopy(self.kwargs_fitting)
    kwargs['centering'] = centering

    # select only the tree of the given node out of all
    fe_method = self._create_fe_object(feature, node_idx, None)
    fe_method.fit(features=feature, **kwargs)
    return fe_method.eval(feature, xs, heterogeneity, centering)

summary(features, scale_x_list=None)

👉 Summarize the partition tree for the selected features.

Example output
Feature 3 - Full partition tree:
🌳 Full Tree Structure:
───────────────────────
hr 🔹 [id: 0 | heter: 0.43 | inst: 3476 | w: 1.00]
    workingday = 0.00 🔹 [id: 1 | heter: 0.36 | inst: 1129 | w: 0.32]
        temp ≤ 6.50 🔹 [id: 3 | heter: 0.17 | inst: 568 | w: 0.16]
        temp > 6.50 🔹 [id: 4 | heter: 0.21 | inst: 561 | w: 0.16]
    workingday ≠ 0.00 🔹 [id: 2 | heter: 0.28 | inst: 2347 | w: 0.68]
        temp ≤ 6.50 🔹 [id: 5 | heter: 0.19 | inst: 953 | w: 0.27]
        temp > 6.50 🔹 [id: 6 | heter: 0.20 | inst: 1394 | w: 0.40]
--------------------------------------------------
Feature 3 - Statistics per tree level:
🌳 Tree Summary:
─────────────────
Level 0🔹heter: 0.43
    Level 1🔹heter: 0.31 | 🔻0.12 (28.15%)
        Level 2🔹heter: 0.19 | 🔻0.11 (37.10%)

Parameters:

  • features (List[int], required): indices of the features to summarize

  • scale_x_list (Optional[List], default None): list of scaling factors for each feature
      • None, for no scaling
      • [{"mean": 0, "std": 1}, {"mean": 3, "std": 0.1}], to manually scale the features
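
For example, a sketch of summarizing two features with manual scaling (the mean/std values are illustrative and assume the features were standardized before training):

    scale_x_list = [
        {"mean": 0, "std": 1},    # feature 0: no rescaling
        {"mean": 3, "std": 0.1},  # feature 1: map standardized values back to original units
    ]
    r_method.summary(features=[0, 1], scale_x_list=scale_x_list)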
Source code in effector/regional_effect.py
def summary(self, features: List[int], scale_x_list: Optional[List] = None):
    """:point_right: Summarize the partition tree for the selected features.

    ???+ Example "Example output"

        ```python
        Feature 3 - Full partition tree:
        🌳 Full Tree Structure:
        ───────────────────────
        hr 🔹 [id: 0 | heter: 0.43 | inst: 3476 | w: 1.00]
            workingday = 0.00 🔹 [id: 1 | heter: 0.36 | inst: 1129 | w: 0.32]
                temp ≤ 6.50 🔹 [id: 3 | heter: 0.17 | inst: 568 | w: 0.16]
                temp > 6.50 🔹 [id: 4 | heter: 0.21 | inst: 561 | w: 0.16]
            workingday ≠ 0.00 🔹 [id: 2 | heter: 0.28 | inst: 2347 | w: 0.68]
                temp ≤ 6.50 🔹 [id: 5 | heter: 0.19 | inst: 953 | w: 0.27]
                temp > 6.50 🔹 [id: 6 | heter: 0.20 | inst: 1394 | w: 0.40]
        --------------------------------------------------
        Feature 3 - Statistics per tree level:
        🌳 Tree Summary:
        ─────────────────
        Level 0🔹heter: 0.43
            Level 1🔹heter: 0.31 | 🔻0.12 (28.15%)
                Level 2🔹heter: 0.19 | 🔻0.11 (37.10%)
        ```

    Args:
        features: indices of the features to summarize
        scale_x_list: list of scaling factors for each feature

            - `None`, for no scaling
            - `[{"mean": 0, "std": 1}, {"mean": 3, "std": 0.1}]`, to manually scale the features

    """
    features = helpers.prep_features(features, self.dim)

    for feat in features:
        self.refit(feat)

        feat_str = "feature_{}".format(feat)
        tree_dict = self.tree[feat_str]

        print("\n")
        print("Feature {} - Full partition tree:".format(feat))

        if tree_dict is None:
            print("No splits found for feature {}".format(feat))
        else:
            tree_dict.show_full_tree(scale_x_list=scale_x_list)

        print("-" * 50)
        print("Feature {} - Statistics per tree level:".format(feat))

        if tree_dict is None:
            print("No splits found for feature {}".format(feat))
        else:
            tree_dict.show_level_stats()
        print("\n")

effector.regional_effect_ale.RegionalALE(data, model, nof_instances=100000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (callable, required): the black-box model, Callable with signature x -> y where:
      • x: ndarray of shape (N, D)
      • y: ndarray of shape (N)

  • axis_limits (Union[None, ndarray], default None): feature effect limits along each axis
      • None, infers them from data (min and max of each feature)
      • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually: they help to discard outliers and improve the quality of the fit, and they define the .plot method's x-axis limits, so manual specification leads to better visualizations. Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 100000): max instances to use
      • "all", uses all data
      • int, randomly selects int instances from data

    100_000 (default) is a good choice; RegionalALE can handle large datasets. 😎

  • feature_types (Union[list, None], default None): the feature types
      • None, infers them from data; if the number of unique values is less than cat_limit, the feature is considered categorical
      • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Union[int, None], default 10): the threshold on unique values below which a feature is considered categorical
      • if feature_types is manually specified, this parameter is ignored

  • feature_names (Union[list, None], default None): the names of the features
      • None, defaults to ["x_0", "x_1", ...]
      • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Union[str, None], default None): the name of the target variable
      • None, to keep the default name "y"
      • "price", to manually specify the name of the target variable
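
For illustration, a minimal construction sketch (X and predict are placeholders as in the Usage section; the limits and names are illustrative for a D=3 dataset):

    import numpy as np
    import effector

    X = ...        # ndarray of shape (N, 3)
    predict = ...  # callable: (N, 3) -> (N,)

    r_ale = effector.RegionalALE(
        data=X,
        model=predict,
        axis_limits=np.array([[0, 1, -1], [1, 2, 3]]),  # shape (2, D)
        feature_names=["age", "weight", "height"],      # illustrative names
        target_name="price",
    )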

Methods:

  • fit: Find subregions by minimizing the ALE-based heterogeneity.
  • plot
Source code in effector/regional_effect_ale.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    nof_instances: typing.Union[int, str] = 100_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(D, 2)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`100_000` (default) is a good choice; RegionalALE can handle large datasets. :sunglasses:"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    self.global_bin_limits = {}
    self.global_data_effect = {}
    super(RegionalALE, self).__init__(
        "ale",
        data,
        model,
        None,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features, candidate_conditioning_features='all', space_partitioner='best', binning_method='fixed', points_for_mean_heterogeneity=30)

Find subregions by minimizing the ALE-based heterogeneity.

Parameters:

  • features (Union[int, str, list], required): for which features to search for subregions
      • use "all", for all features, e.g. features="all"
      • use an int, for a single feature, e.g. features=0
      • use a list, for multiple features, e.g. features=[0, 1, 2]

  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features

  • space_partitioner (Union[str, Best], default "best"): the space partitioner to use

  • binning_method (Union[str, Fixed], default "fixed"): must be the Fixed binning method
      • If set to "fixed", the ALE plot will be computed with the default values, which are 20 bins with at least 0 points per bin
      • To change the parameters of the method, pass an instance of the class effector.binning_methods.Fixed with the desired parameters, e.g. Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)

  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity
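
For example, a sketch of fitting with a customized Fixed binning (class and parameters as named above; assumes r_ale from the construction sketch):

    binning = effector.binning_methods.Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)
    r_ale.fit(
        features=[0, 1],
        candidate_conditioning_features="all",
        binning_method=binning,
        points_for_mean_heterogeneity=30,
    )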
Source code in effector/regional_effect_ale.py
def fit(
    self,
    features: typing.Union[int, str, list],
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union[str, effector.space_partitioning.Best] = "best",
    binning_method: typing.Union[str, ap.Fixed] = "fixed",
    points_for_mean_heterogeneity: int = 30
):
    """
    Find subregions by minimizing the ALE-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features
        space_partitioner: the space partitioner to use

        binning_method: must be the Fixed binning method

            - If set to `"fixed"`, the ALE plot will be computed with the  default values, which are
            `20` bins with at least `0` points per bin
            - If you want to change the parameters of the method, you pass an instance of the
            class `effector.binning_methods.Fixed` with the desired parameters.
            For example: `Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)`

        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
    """
    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # fit global method
        global_ale = ALE(self.data, self.model, nof_instances="all", axis_limits=self.axis_limits)
        global_ale.fit(features=feat, binning_method=binning_method, centering=False)
        self.global_data_effect["feature_" + str(feat)] = global_ale.data_effect_ale["feature_" + str(feat)]
        self.global_bin_limits["feature_" + str(feat)] = global_ale.bin_limits["feature_" + str(feat)]

        # create heterogeneity function
        heter = self._create_heterogeneity_function(feat, space_partitioner.min_points_per_subregion, points_for_mean_heterogeneity)

        # fit feature
        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # region-splitting arguments are the first three (features, candidate_conditioning_features, space_partitioner)
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fitting arguments (binning_method)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["binning_method"]}

plot(feature, node_idx, heterogeneity=True, centering=True, scale_x_list=None, scale_y=None, y_limits=None, dy_limits=None)

Source code in effector/regional_effect_ale.py
def plot(
    self,
    feature,
    node_idx,
    heterogeneity=True,
    centering=True,
    scale_x_list=None,
    scale_y=None,
    y_limits=None,
    dy_limits=None,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_ale.RegionalRHALE(data, model, model_jac=None, data_effect=None, nof_instances=100000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (Callable, required): the black-box model, Callable with signature x -> y where:
      • x: ndarray of shape (N, D)
      • y: ndarray of shape (N)

  • model_jac (Optional[Callable], default None): the black-box model's Jacobian, Callable with signature x -> dy_dx where:
      • x: ndarray of shape (N, D)
      • dy_dx: ndarray of shape (N, D)

  • data_effect (Optional[ndarray], default None): the Jacobian of the model on the data
      • None, infers the Jacobian internally using model_jac(data) or numerically
      • np.ndarray, to provide the Jacobian directly

    When possible, provide the Jacobian directly: computing it on the whole dataset can be memory demanding, so if you have it already computed, pass it to the constructor.

  • axis_limits (Optional[ndarray], default None): feature effect limits along each axis
      • None, infers them from data (min and max of each feature)
      • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually: they help to discard outliers and improve the quality of the fit, and they define the .plot method's x-axis limits, so manual specification leads to better visualizations. Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 100000): max instances to use
      • "all", uses all data
      • int, randomly selects int instances from data

    100_000 (default) is a good choice; RHALE can handle large datasets. 😎

  • feature_types (Optional[List], default None): the feature types
      • None, infers them from data; if the number of unique values is less than cat_limit, the feature is considered categorical
      • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Optional[int], default 10): the threshold on unique values below which a feature is considered categorical
      • if feature_types is manually specified, this parameter is ignored

  • feature_names (Optional[List], default None): the names of the features
      • None, defaults to ["x_0", "x_1", ...]
      • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Optional[str], default None): the name of the target variable
      • None, to keep the default name "y"
      • "price", to manually specify the name of the target variable
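
A minimal sketch contrasting the two ways of supplying derivatives (X, predict, and jacobian are placeholders as in the Usage section):

    X = ...         # ndarray of shape (N, D)
    predict = ...   # callable: (N, D) -> (N,)
    jacobian = ...  # callable: (N, D) -> (N, D)

    # either let the method compute the Jacobian on demand ...
    r_rhale = effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian)

    # ... or precompute it once and pass it directly (cheaper if the instance is refit)
    dy_dx = jacobian(X)
    r_rhale = effector.RegionalRHALE(data=X, model=predict, data_effect=dy_dx)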

Methods:

  • fit: Find subregions by minimizing the RHALE-based heterogeneity.
  • plot
Source code in effector/regional_effect_ale.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    model_jac: Optional[Callable] = None,
    data_effect: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 100_000,
    axis_limits: Optional[np.ndarray] = None,
    feature_types: Optional[List] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        model_jac: the black-box model's Jacobian, `Callable` with signature `x -> dy_dx` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `dy_dx`: `ndarray` of shape `(N, D)`

        data_effect: The jacobian of the `model` on the `data`

            - `None`, infers the Jacobian internally using `model_jac(data)` or numerically
            - `np.ndarray`, to provide the Jacobian directly

            !!! tip "When possible, provide the Jacobian directly"

                Computing the jacobian on the whole dataset can be memory demanding.
                If you have the jacobian already computed, provide it directly to the constructor.

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(D, 2)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`100_000` (default), is a good choice. RHALE can handle large datasets :sunglasses: :sunglasses: "

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalRHALE, self).__init__(
        "rhale",
        data,
        model,
        model_jac,
        data_effect,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', binning_method='greedy', points_for_mean_heterogeneity=30)

Find subregions by minimizing the RHALE-based heterogeneity.

Parameters:

  • features (Union[int, str, list], default "all"): for which features to search for subregions
      • use "all", for all features, e.g. features="all"
      • use an int, for a single feature, e.g. features=0
      • use a list, for multiple features, e.g. features=[0, 1, 2]

  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features

  • space_partitioner (Union[str, Best], default "best"): the space partitioner to use

  • binning_method (str, default "greedy"): the binning method to use
      • use "greedy" for the Greedy binning solution with the default parameters; for custom parameters, initialize a binning_methods.Greedy object
      • use "dp" for a Dynamic Programming binning solution with the default parameters; for custom parameters, initialize a binning_methods.DynamicProgramming object
      • use "fixed" for a Fixed binning solution with the default parameters; for custom parameters, initialize a binning_methods.Fixed object

  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity
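
For example, a sketch selecting the dynamic-programming binning (assumes r_rhale from the construction sketch above):

    r_rhale.fit(
        features="all",
        binning_method="dp",  # or "greedy" (default), "fixed", or a binning_methods object
        points_for_mean_heterogeneity=30,
    )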
Source code in effector/regional_effect_ale.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union[str, list] = "all",
    space_partitioner: typing.Union[str, effector.space_partitioning.Best] = "best",
    binning_method: typing.Union[str, ap.Fixed, ap.DynamicProgramming, ap.Greedy,] = "greedy",
    points_for_mean_heterogeneity: int = 30,
):
    """
    Find subregions by minimizing the RHALE-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features
        space_partitioner: the space partitioner to use
        binning_method (str): the binning method to use.

            - Use `"greedy"` for using the Greedy binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.Greedy` object
            - Use `"dp"` for using a Dynamic Programming binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.DynamicProgramming` object
            - Use `"fixed"` for using a Fixed binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.Fixed` object

        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
    """
    if self.data_effect is None:
        self.compile()

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # find global axis limits
        heter = self._create_heterogeneity_function(
            feat, binning_method, space_partitioner.min_points_per_subregion, points_for_mean_heterogeneity
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # region-splitting arguments are the first three (features, candidate_conditioning_features, space_partitioner)
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fitting arguments (binning_method)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["binning_method"]}

plot(feature, node_idx, heterogeneity=True, centering=True, scale_x_list=None, scale_y=None, y_limits=None, dy_limits=None)

Source code in effector/regional_effect_ale.py
def plot(
    self,
    feature,
    node_idx,
    heterogeneity=True,
    centering=True,
    scale_x_list=None,
    scale_y=None,
    y_limits=None,
    dy_limits=None,
):

    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_pdp.RegionalPDP(data, model, nof_instances=10000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalPDPBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (callable, required): the black-box model, Callable with signature f(x) -> y where:
      • x: ndarray of shape (N, D)
      • y: ndarray of shape (N)

  • axis_limits (Union[None, ndarray], default None): feature effect limits along each axis
      • None, infers them from data (min and max of each feature)
      • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually: they help to discard outliers and improve the quality of the fit, and they define the .plot method's x-axis limits, so manual specification leads to better visualizations. Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 10000): max instances to use
      • "all", uses all data
      • int, randomly selects int instances from data

    10_000 (default) is a good balance between speed and accuracy.

  • feature_types (Union[list, None], default None): the feature types
      • None, infers them from data; if the number of unique values is less than cat_limit, the feature is considered categorical
      • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Union[int, None], default 10): the threshold on unique values below which a feature is considered categorical
      • if feature_types is manually specified, this parameter is ignored

  • feature_names (Union[list, None], default None): the names of the features
      • None, defaults to ["x_0", "x_1", ...]
      • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Union[str, None], default None): the name of the target variable
      • None, to keep the default name "y"
      • "price", to manually specify the name of the target variable
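
A minimal construction sketch (X and predict are placeholders as in the Usage section):

    r_pdp = effector.RegionalPDP(
        data=X,                # ndarray of shape (N, D)
        model=predict,         # callable: (N, D) -> (N,)
        nof_instances=10_000,  # the default; lower it for faster, rougher estimates
    )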

Methods:

  • fit: Find subregions by minimizing the PDP-based heterogeneity.
  • plot
Source code in effector/regional_effect_pdp.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    nof_instances: typing.Union[int, str] = 10_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `f(x) -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(D, 2)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`10_000` (default), is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalPDP, self).__init__(
        "pdp",
        data,
        model,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', points_for_centering=30, points_for_mean_heterogeneity=30, use_vectorized=True)

Find subregions by minimizing the PDP-based heterogeneity.

Parameters:

  • features (Union[int, str, list], default "all"): for which features to search for subregions
      • use "all", for all features, e.g. features="all"
      • use an int, for a single feature, e.g. features=0
      • use a list, for multiple features, e.g. features=[0, 1, 2]

  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features
      • use "all", for all features, e.g. candidate_conditioning_features="all"
      • use a list, for multiple features, e.g. candidate_conditioning_features=[0, 1, 2]
      • for each feature in the features list, the algorithm will consider applying a split conditioned on each feature in the candidate_conditioning_features list

  • space_partitioner (Union[str, None], default "best"): the method to use for partitioning the space

  • points_for_centering (int, default 30): number of equidistant points along the feature axis used for centering ICE plots

  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity

  • use_vectorized (bool, default True): whether to use vectorized operations for the PDP and ICE curves
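
For example, a sketch restricting the conditioning features (assumes r_pdp from the construction sketch above):

    r_pdp.fit(
        features=[0],
        candidate_conditioning_features=[1, 2],  # only consider splits conditioned on features 1 and 2
        points_for_centering=30,
        points_for_mean_heterogeneity=30,
        use_vectorized=True,
    )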
Source code in effector/regional_effect_pdp.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union[str, None] = "best",
    points_for_centering: int = 30,
    points_for_mean_heterogeneity: int = 30,
    use_vectorized: bool = True,
):
    """
    Find subregions by minimizing the PDP-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features

            - use `"all"`, for all features, e.g. `candidate_conditioning_features="all"`
            - use a `list`, for multiple features, e.g. `candidate_conditioning_features=[0, 1, 2]`
            - it means that for each feature in the `feature` list, the algorithm will consider applying a split
            conditioned on each feature in the `candidate_conditioning_features` list

        space_partitioner: the method to use for partitioning the space
        points_for_centering: number of equidistant points along the feature axis used for centering ICE plots
        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
        use_vectorized: whether to use vectorized operations for the PDP and ICE curves


    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # define the global method
        pdp = PDP(self.data, self.model, self.axis_limits, nof_instances="all")

        pdp.fit(
            features=feat,
            centering=True,
            points_for_centering=points_for_centering,
            use_vectorized=use_vectorized,
        )

        xx = np.linspace(self.axis_limits[:, feat][0], self.axis_limits[:, feat][1], points_for_mean_heterogeneity)
        y_ice = pdp.eval(
                feature=feat,
                xs=xx,
                heterogeneity=True,
                centering=True,
                use_vectorized=use_vectorized,
                return_all=True
            )
        self.y_ice["feature_" + str(feat)] = y_ice.T

        heter = self._create_heterogeneity_function(
            foi = feat,
            min_points=space_partitioner.min_points_per_subregion,
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # region-splitting arguments are the first three (features, candidate_conditioning_features, space_partitioner)
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}
    self.kwargs_subregion_detection["points_for_mean_heterogeneity"] = points_for_mean_heterogeneity

    # fitting arguments (centering, points_for_centering, use_vectorized)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["centering", "points_for_centering", "use_vectorized"]}

plot(feature, node_idx, heterogeneity='ice', centering=False, nof_points=30, scale_x_list=None, scale_y=None, nof_ice=100, show_avg_output=False, y_limits=None, use_vectorized=True)

Source code in effector/regional_effect_pdp.py
def plot(
    self,
    feature: int,
    node_idx: int,
    heterogeneity: typing.Union[bool, str] = "ice",
    centering: typing.Union[bool, str] = False,
    nof_points: int = 30,
    scale_x_list: typing.Union[None, list] = None,
    scale_y: typing.Union[None, list] = None,
    nof_ice: int = 100,
    show_avg_output: bool = False,
    y_limits: typing.Union[None, list] = None,
    use_vectorized: bool = True,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_pdp.RegionalDerPDP(data, model, model_jac=None, nof_instances=10000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalPDPBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (callable, required): the black-box model, Callable with signature x -> y where:
      • x: ndarray of shape (N, D)
      • y: ndarray of shape (N)

  • model_jac (Optional[callable], default None): the black-box model's Jacobian, Callable with signature x -> dy_dx where:
      • x: ndarray of shape (N, D)
      • dy_dx: ndarray of shape (N, D)

  • axis_limits (Union[None, ndarray], default None): feature effect limits along each axis
      • None, infers them from data (min and max of each feature)
      • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually: they help to discard outliers and improve the quality of the fit, and they define the .plot method's x-axis limits, so manual specification leads to better visualizations. Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 10000): max instances to use
      • "all", uses all data
      • int, randomly selects int instances from data

    10_000 (default) is a good balance between speed and accuracy.

  • feature_types (Union[list, None], default None): the feature types
      • None, infers them from data; if the number of unique values is less than cat_limit, the feature is considered categorical
      • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Union[int, None], default 10): the threshold on unique values below which a feature is considered categorical
      • if feature_types is manually specified, this parameter is ignored

  • feature_names (Union[list, None], default None): the names of the features
      • None, defaults to ["x_0", "x_1", ...]
      • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Union[str, None], default None): the name of the target variable
      • None, to keep the default name "y"
      • "price", to manually specify the name of the target variable
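
A minimal construction sketch (X, predict, and jacobian are placeholders as in the Usage section):

    r_dpdp = effector.RegionalDerPDP(
        data=X,
        model=predict,
        model_jac=jacobian,  # optional Jacobian, signature (N, D) -> (N, D)
    )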

Methods:

  • fit: Find subregions by minimizing the PDP-based heterogeneity.
  • plot
Source code in effector/regional_effect_pdp.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    model_jac: typing.Optional[callable] = None,
    nof_instances: typing.Union[int, str] = 10_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        model_jac: the black-box model's Jacobian, `Callable` with signature `x -> dy_dx` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `dy_dx`: `ndarray` of shape `(N, D)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(D, 2)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`10_000` (default), is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalDerPDP, self).__init__(
        "d-pdp",
        data,
        model,
        model_jac,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', points_for_mean_heterogeneity=30, use_vectorized=True)

Find subregions by minimizing the PDP-based heterogeneity.

Parameters:

  • features (Union[int, str, list], default "all"): for which features to search for subregions
      • use "all", for all features, e.g. features="all"
      • use an int, for a single feature, e.g. features=0
      • use a list, for multiple features, e.g. features=[0, 1, 2]

  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features
      • use "all", for all features, e.g. candidate_conditioning_features="all"
      • use a list, for multiple features, e.g. candidate_conditioning_features=[0, 1, 2]
      • for each feature in the features list, the algorithm will consider applying a split conditioned on each feature in the candidate_conditioning_features list

  • space_partitioner (Union[str, None], default "best"): the method to use for partitioning the space

  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity

  • use_vectorized (bool, default True): whether to use vectorized operations for the PDP and ICE curves
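
For example (assumes r_dpdp from the construction sketch above; recall the earlier note that DerPDP is typically used with centering=False):

    r_dpdp.fit(features="all", points_for_mean_heterogeneity=30)
    r_dpdp.plot(feature=0, node_idx=0, centering=False)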
Source code in effector/regional_effect_pdp.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union[str, None] = "best",
    points_for_mean_heterogeneity: int = 30,
    use_vectorized: bool = True,
):
    """
    Find subregions by minimizing the PDP-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features

            - use `"all"`, for all features, e.g. `candidate_conditioning_features="all"`
            - use a `list`, for multiple features, e.g. `candidate_conditioning_features=[0, 1, 2]`
            - it means that for each feature in the `feature` list, the algorithm will consider applying a split
            conditioned on each feature in the `candidate_conditioning_features` list

        space_partitioner: the method to use for partitioning the space
        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
        use_vectorized: whether to use vectorized operations for the PDP and ICE curves


    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # define the global method
        pdp = DerPDP(self.data, self.model, self.model_jac, self.axis_limits, nof_instances="all")

        pdp.fit(
            features=feat,
            centering=False,
            use_vectorized=use_vectorized,
        )

        xx = np.linspace(self.axis_limits[:, feat][0], self.axis_limits[:, feat][1], points_for_mean_heterogeneity)
        y_ice = pdp.eval(
            feature=feat,
            xs=xx,
            heterogeneity=True,
            centering=False,
            use_vectorized=use_vectorized,
            return_all=True,
        )
        self.y_ice["feature_" + str(feat)] = y_ice.T

        heter = self._create_heterogeneity_function(
            foi = feat,
            min_points=space_partitioner.min_points_per_subregion,
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # the region-splitting arguments are the first 3 in the signature:
    # features, candidate_conditioning_features, space_partitioner
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}
    self.kwargs_subregion_detection["points_for_mean_heterogeneity"] = points_for_mean_heterogeneity

    # fitting arguments (of these, only use_vectorized is exposed by this method's signature)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["centering", "points_for_centering", "use_vectorized"]}
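After `.fit()` returns, the arguments used for subregion detection are stored on the instance, as the last lines of the source above show. A quick way to inspect them (the exact contents depend on how `.fit()` was called):

```python
r_method.fit(features=0)
print(r_method.kwargs_subregion_detection)
# e.g. {'features': [0], 'candidate_conditioning_features': 'all',
#       'space_partitioner': <...>, 'points_for_mean_heterogeneity': 30}
```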

plot(feature, node_idx=0, heterogeneity='ice', centering=False, nof_points=30, scale_x_list=None, scale_y=None, nof_ice=100, show_avg_output=False, dy_limits=None, use_vectorized=True)

Source code in effector/regional_effect_pdp.py
def plot(
    self,
    feature: int,
    node_idx: int = 0,
    heterogeneity: typing.Union[bool, str] = "ice",
    centering: typing.Union[bool, str] = False,
    nof_points: int = 30,
    scale_x_list: typing.Union[None, list] = None,
    scale_y: typing.Union[None, list] = None,
    nof_ice: int = 100,
    show_avg_output: bool = False,
    dy_limits: typing.Union[None, list] = None,
    use_vectorized: bool = True,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)
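A plotting sketch for the regional derivative-PDP (node indices refer to the partition tree printed by `.summary()`; the indices below are illustrative):

```python
# root node: the global effect of feature 0
r_method.plot(feature=0, node_idx=0, heterogeneity="ice")

# the two subregions found for feature 0, if a split was accepted
r_method.plot(feature=0, node_idx=1, heterogeneity="ice")
r_method.plot(feature=0, node_idx=2, heterogeneity="ice")
```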

effector.regional_effect_shap.RegionalShapDP(data, model, axis_limits=None, nof_instances=1000, feature_types=None, cat_limit=10, feature_names=None, target_name=None, backend='shap')

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

- `data` (`ndarray`, required): the design matrix, an `ndarray` of shape `(N, D)`
- `model` (`Callable`, required): the black-box model, a `Callable` with signature `f(x) -> y` where:
  - `x`: `ndarray` of shape `(N, D)`
  - `y`: `ndarray` of shape `(N,)`
- `axis_limits` (`Optional[ndarray]`, default `None`): feature effect limits along each axis
  - `None`: infers them from `data` (`min` and `max` of each feature)
  - array of shape `(2, D)`: manually specify the limits for each feature

  When possible, specify the axis limits manually:

  - they help to discard outliers and improve the quality of the fit
  - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

  Note that their shape is `(2, D)`, not `(D, 2)`:

  ```python
  axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
  ```

- `nof_instances` (`Union[int, str]`, default `1000`): max instances to use
  - `"all"`: uses all of `data`
  - `int`: randomly selects that many instances from `data`
  - the default, `1_000`, is a good balance between speed and accuracy
- `feature_types` (`Optional[List[str]]`, default `None`): the feature types
  - `None`: infers them from `data`; a feature with fewer unique values than `cat_limit` is considered categorical
  - `['cat', 'cont', ...]`: manually specify the types of the features
- `cat_limit` (`Optional[int]`, default `10`): the threshold on unique values below which a feature is considered categorical
  - ignored if `feature_types` is manually specified
- `feature_names` (`Optional[List[str]]`, default `None`): the names of the features
  - `None`: defaults to `["x_0", "x_1", ...]`
  - `["age", "weight", ...]`: manually specify the names of the features
- `target_name` (`Optional[str]`, default `None`): the name of the target variable
  - `None`: keeps the default name, `"y"`
  - `"price"`: manually specify the name of the target variable
- `backend` (`str`, default `"shap"`): the package used to compute SHAP values
  - `"shap"`: use the `shap` package (default)
  - `"shapiq"`: use the `shapiq` package

A constructor sketch follows the list.
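A constructor sketch (toy `X` and `predict` in the spirit of the Usage section; the `axis_limits` values are illustrative):

```python
import numpy as np
import effector

X = np.random.uniform(-1, 1, size=(500, 2))
predict = lambda x: x[:, 0] * (x[:, 1] > 0)

r_shap = effector.RegionalShapDP(
    data=X,
    model=predict,
    axis_limits=np.array([[-1.0, -1.0], [1.0, 1.0]]),  # shape (2, D): row 0 = mins, row 1 = maxs
    nof_instances=1_000,
    backend="shap",  # or "shapiq"
)
```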

Methods:

- `fit`: Fit the regional SHAP.
- `plot`: Plot the regional SHAP.

Source code in effector/regional_effect_shap.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    axis_limits: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 1_000,
    feature_types: Optional[List[str]] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List[str]] = None,
    target_name: Optional[str] = None,
    backend: str = "shap",
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `f(x) -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(2, D)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`1_000` (default), is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The threshold on unique values below which a feature is considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable

        backend: Package to compute SHAP values

            - use `"shap"` for the `shap` package (default)
            - use `"shapiq"` for the `shapiq` package
    """
    self.global_shap_values = None
    self.backend = backend
    super(RegionalShapDP, self).__init__(
        "shap",
        data,
        model,
        None,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features, candidate_conditioning_features='all', space_partitioner='best', binning_method='greedy', budget=512, shap_explainer_kwargs=None, shap_explanation_kwargs=None)

Fit the regional SHAP.

Parameters:

- `features` (`Union[int, str, list]`, required): the features to fit
  - if set to `"all"`, all the features will be fitted
- `candidate_conditioning_features` (`Union[str, list]`, default `"all"`): the features to consider as conditioning features for the candidate splits
  - if set to `"all"`, all the features will be considered as conditioning features
- `space_partitioner` (`Union[str, Best]`, default `"best"`): the space partitioner to use
- `binning_method` (`Union[str, Greedy, Fixed]`, default `"greedy"`): the binning method to use
- `budget` (`int`, default `512`): budget to use for the SHAP approximation
  - increasing the budget improves the approximation at the cost of slower computation
  - decreasing the budget speeds up computation at the cost of approximation error
- `shap_explainer_kwargs` (`Optional[dict]`, default `None`): keyword arguments passed to the `shap.Explainer` or `shapiq.Explainer` class, depending on the backend
- `shap_explanation_kwargs` (`Optional[dict]`, default `None`): keyword arguments passed to the explainer call that computes the SHAP values

Code behind the scene

Before customizing `shap_explainer_kwargs` or `shap_explanation_kwargs`, check the code that runs behind the scenes:

```python
explainer_kwargs = explainer_kwargs.copy() if explainer_kwargs else {}
explanation_kwargs = explanation_kwargs.copy() if explanation_kwargs else {}
if self.backend == "shap":
    explainer_defaults = {"masker": data}
    explanation_defaults = {"max_evals": budget}
elif self.backend == "shapiq":
    explainer_defaults = {
        "data": data,
        "index": "SV",
        "max_order": 1,
        "approximator": "permutation",
        "imputer": "marginal",
    }
    explanation_defaults = {"budget": budget}
else:
    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
explainer_kwargs = {**explainer_defaults, **explainer_kwargs}  # user args override defaults
explanation_kwargs = {**explanation_defaults, **explanation_kwargs}  # user args override defaults

if self.backend == "shap":
    explainer = shap.Explainer(model, **explainer_kwargs)
    explanation = explainer(data, **explanation_kwargs)
    self.shap_values = explanation.values
elif self.backend == "shapiq":
    explainer = shapiq.Explainer(model, **explainer_kwargs)
    explanations = explainer.explain_X(data, **explanation_kwargs)
    self.shap_values = np.stack([ex.get_n_order_values(1) for ex in explanations])
else:
    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
```

Be careful with custom arguments

For customizing `shap_explainer_kwargs` and `shap_explanation_kwargs`, check the official documentation of the shap and shapiq packages. A short customization sketch follows this block.
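As a sketch of such customization with the `shap` backend (reusing `r_shap` and `X` from the constructor sketch above; `shap.maskers.Independent` is part of the public `shap` API, but the right masker is model- and data-dependent):

```python
import shap

r_shap.fit(
    features=0,
    budget=256,  # coarser but faster SHAP approximation
    # override the default masker shown in the code above
    shap_explainer_kwargs={"masker": shap.maskers.Independent(X, max_samples=100)},
)
```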
Source code in effector/regional_effect_shap.py
def fit(
    self,
    features: typing.Union[int, str, list],
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union["str", effector.space_partitioning.Best] = "best",
    binning_method: Union[str, ap.Greedy, ap.Fixed] = "greedy",
    budget: int = 512,
    shap_explainer_kwargs: Optional[dict] = None,
    shap_explanation_kwargs: Optional[dict] = None,
):
    """
    Fit the regional SHAP.

    Args:
        features: the features to fit.
            - If set to "all", all the features will be fitted.

        candidate_conditioning_features: list of features to consider as conditioning features for the candidate splits
            - If set to "all", all the features will be considered as conditioning features.

        space_partitioner: the space partitioner to use
            - If set to "greedy", the greedy space partitioner will be used.

        binning_method: the binning method to use

        budget: Budget to use for the approximation. Defaults to 512.
            - Increasing the budget improves the approximation at the cost of slower computation.
            - Decrease the budget for faster computation at the cost of approximation error.

        shap_explainer_kwargs: the keyword arguments to be passed to the `shap.Explainer` or `shapiq.Explainer` class, depending on the backend.

            ??? note "Code behind the scene"
                Check the code that is running behind the scene before customizing `shap_explainer_kwargs`.

                ```python
                explainer_kwargs = explainer_kwargs.copy() if explainer_kwargs else {}
                explanation_kwargs = explanation_kwargs.copy() if explanation_kwargs else {}
                if self.backend == "shap":
                    explainer_defaults = {"masker": data}
                    explanation_defaults = {"max_evals": budget}
                elif self.backend == "shapiq":
                    explainer_defaults = {
                        "data": data,
                        "index": "SV",
                        "max_order": 1,
                        "approximator": "permutation",
                        "imputer": "marginal",
                    }
                    explanation_defaults = {"budget": budget}
                else:
                    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
                explainer_kwargs = {**explainer_defaults, **explainer_kwargs}  # User args override defaults
                explanation_kwargs = {**explanation_defaults, **explanation_kwargs}  # User args override defaults

                if self.backend == "shap":
                    explainer = shap.Explainer(model, **explainer_kwargs)
                    explanation = explainer(data, **explanation_kwargs)
                    self.shap_values = explanation.values
                elif self.backend == "shapiq":
                    explainer = shapiq.Explainer(model, **explainer_kwargs)
                    explanations = explainer.explain_X(data, **explanation_kwargs)
                    self.shap_values = np.stack([ex.get_n_order_values(1) for ex in explanations])
                else:
                    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
                ```

            ??? warning "Be careful with custom arguments"

                For customizing `shap_explainer_kwargs` and `shap_explanation_kwargs` args,
                check the official documentation of [`shap`](https://shap.readthedocs.io/en/latest/) and [`shapiq`](https://shapiq.readthedocs.io/en/latest/) packages.

        shap_explanation_kwargs: the keyword arguments to be passed to the `shap` or `shapiq` Explainer to compute the SHAP values.

            ??? note "Code behind the scene"

                Check the code that is running behind the scene before customizing `shap_explanation_kwargs`.

                ```python
                explainer_kwargs = explainer_kwargs.copy() if explainer_kwargs else {}
                explanation_kwargs = explanation_kwargs.copy() if explanation_kwargs else {}
                if self.backend == "shap":
                    explainer_defaults = {"masker": data}
                    explanation_defaults = {"max_evals": budget}
                elif self.backend == "shapiq":
                    explainer_defaults = {
                        "data": data,
                        "index": "SV",
                        "max_order": 1,
                        "approximator": "permutation",
                        "imputer": "marginal",
                    }
                    explanation_defaults = {"budget": budget}
                else:
                    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
                explainer_kwargs = {**explainer_defaults, **explainer_kwargs}  # User args override defaults
                explanation_kwargs = {**explanation_defaults, **explanation_kwargs}  # User args override defaults

                if self.backend == "shap":
                    explainer = shap.Explainer(model, **explainer_kwargs)
                    explanation = explainer(data, **explanation_kwargs)
                    self.shap_values = explanation.values
                elif self.backend == "shapiq":
                    explainer = shapiq.Explainer(model, **explainer_kwargs)
                    explanations = explainer.explain_X(data, **explanation_kwargs)
                    self.shap_values = np.stack([ex.get_n_order_values(1) for ex in explanations])
                else:
                    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
                ```

            ??? warning "Be careful with custom arguments"

                For customizing `shap_explainer_kwargs` and `shap_explanation_kwargs` args,
                check the official documentation of [`shap`](https://shap.readthedocs.io/en/latest/) and [`shapiq`](https://shapiq.readthedocs.io/en/latest/) packages.

    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)

    for feat in tqdm(features):
        # assert global SHAP values are available
        if self.global_shap_values is None:
            global_shap_dp = effector.ShapDP(self.data, self.model, self.axis_limits, "all", backend=self.backend)
            global_shap_dp.fit(
                feat,
                centering=False,
                binning_method=binning_method,
                budget=budget,
                shap_explainer_kwargs=shap_explainer_kwargs,
                shap_explanation_kwargs=shap_explanation_kwargs
            )
            self.global_shap_values = global_shap_dp.shap_values

        heter = self._create_heterogeneity_function(feat, space_partitioner.min_points_per_subregion, binning_method)

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # region splitting arguments are the first 3 arguments
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fit kwargs
    self.kwargs_fitting = {
        "binning_method": binning_method,
        "budget": budget,
        "shap_explainer_kwargs": shap_explainer_kwargs,
        "shap_explanation_kwargs": shap_explanation_kwargs
    }

plot(feature, node_idx, heterogeneity='shap_values', centering=True, nof_points=30, scale_x_list=None, scale_y=None, nof_shap_values='all', show_avg_output=False, y_limits=None, only_shap_values=False)

Plot the regional SHAP.

Parameters:

- `feature` (required): the feature to plot
- `node_idx` (required): the index of the node to plot
- `heterogeneity` (default `"shap_values"`): whether to plot the heterogeneity (the individual SHAP values)
- `centering` (default `True`): whether to center the SHAP values
- `nof_points` (default `30`): number of points to plot
- `scale_x_list` (default `None`): the list of scaling factors for the features
- `scale_y` (default `None`): the scaling factor for the SHAP values
- `nof_shap_values` (default `"all"`): number of SHAP values to plot
- `show_avg_output` (default `False`): whether to show the average output
- `y_limits` (default `None`): the limits of the y-axis
- `only_shap_values` (default `False`): whether to plot only the SHAP values

A plotting sketch follows the list.
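A plotting sketch (node indices follow the partition tree printed by `.summary()`; values below are illustrative):

```python
r_shap.summary(features=[0])  # inspect the tree first to pick a node_idx
r_shap.plot(feature=0, node_idx=1, centering=True, nof_shap_values=200)
r_shap.plot(feature=0, node_idx=2, only_shap_values=True)
```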
Source code in effector/regional_effect_shap.py
def plot(self,
         feature,
         node_idx,
         heterogeneity="shap_values",
         centering=True,
         nof_points=30,
         scale_x_list=None,
         scale_y=None,
         nof_shap_values='all',
         show_avg_output=False,
         y_limits=None,
         only_shap_values=False
):
    """
    Plot the regional SHAP.

    Args:
        feature: the feature to plot
        node_idx: the index of the node to plot
        heterogeneity: whether to plot the heterogeneity
        centering: whether to center the SHAP values
        nof_points: number of points to plot
        scale_x_list: the list of scaling factors for the features
        scale_y: the scaling factor for the SHAP values
        nof_shap_values: number of SHAP values to plot
        show_avg_output: whether to show the average output
        y_limits: the limits of the y-axis
        only_shap_values: whether to plot only the SHAP values
    """
    kwargs = locals()
    kwargs.pop("self")
    return self._plot(kwargs)