API: Regional Effect Methods

Summary

All regional effect methods have a similar interface and workflow:

  1. create an instance of the regional effect method you want to use
  2. (optional) .fit() to customize the method
  3. .summary() to print the partition tree found for each feature
  4. .plot() to plot the regional effect of a feature at a specific node
  5. .eval() to evaluate the regional effect of a feature at a specific node, on a specific grid of points
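
Putting the workflow together, a minimal end-to-end sketch (the data and model below are toy placeholders; any callable with signature (N, D) -> (N,) works):

    import numpy as np
    import effector

    X = np.random.uniform(-1, 1, (1000, 2))      # toy design matrix
    predict = lambda x: x[:, 0] * (x[:, 1] > 0)  # toy black-box model with an interaction

    r_method = effector.RegionalPDP(data=X, model=predict)  # 1. create
    r_method.fit(features=[0])                              # 2. (optional) fit
    r_method.summary(features=[0])                          # 3. print the partition tree
    r_method.plot(feature=0, node_idx=1)                    # 4. plot node 1 (assumes a split was found)
    y = r_method.eval(0, 1, xs=np.linspace(-1, 1, 100))     # 5. evaluate node 1 on a grid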

Usage

# set up the input
X = ... # input data
predict = ... # model to be explained
jacobian = ... # jacobian of the model

  1. Create an instance of the regional effect method you want to use:

    effector.RegionalPDP(data=X, model=predict)

    effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian)

    effector.RegionalShapDP(data=X, model=predict)

    effector.RegionalALE(data=X, model=predict)

    effector.RegionalDerPDP(data=X, model=predict, model_jac=jacobian)
    
  2. Customize the regional effect method (optional):

    .fit(features, **method_specific_args)

    This is the place for customization

    The .fit() step can be omitted if you are ok with the default settings; you can directly call the .summary(), .plot(), or .eval() methods. However, if you want more control over the fitting process, you can pass additional arguments to the .fit() method. Check the Usage section below and the method-specific documentation for more information.

    Usage
    # customize the space partitioning algorithm
    space_partitioner = effector.space_partitioning.Best(
        heter_pcg_drop_thres=0.3,  # percentage drop threshold (default: 0.1)
        max_split_levels=1,  # maximum number of split levels (default: 2)
    )
    r_method.fit(
        features=[0, 1], # list of features to be analyzed
        space_partitioner=space_partitioner # space partitioning algorithm (default: effector.space_partitioning.Best)
    )
    
  3. Print the partition tree found for each feature in features:

    .summary(features)

    Usage
    features = [...] # list of features to be analyzed
    r_method.summary(features)
    
    Example Output
    Feature 3 - Full partition tree:
    🌳 Full Tree Structure:
    ───────────────────────
    
    hr 🔹 [id: 0 | heter: 0.43 | inst: 3476 | w: 1.00]
        workingday = 0.00 🔹 [id: 1 | heter: 0.36 | inst: 1129 | w: 0.32]
            temp ≤ 6.50 🔹 [id: 3 | heter: 0.17 | inst: 568 | w: 0.16]
            temp > 6.50 🔹 [id: 4 | heter: 0.21 | inst: 561 | w: 0.16]
        workingday ≠ 0.00 🔹 [id: 2 | heter: 0.28 | inst: 2347 | w: 0.68]
            temp ≤ 6.50 🔹 [id: 5 | heter: 0.19 | inst: 953 | w: 0.27]
            temp > 6.50 🔹 [id: 6 | heter: 0.20 | inst: 1394 | w: 0.40]
    --------------------------------------------------
    Feature 3 - Statistics per tree level:
    🌳 Tree Summary:
    ─────────────────
    Level 0🔹heter: 0.43
        Level 1🔹heter: 0.31 | 🔻0.12 (28.15%)
            Level 2🔹heter: 0.19 | 🔻0.11 (37.10%)
    
  4. Plot the regional effect of a feature at a specific node:

    .plot(feature, node_idx)

    Usage
    feature = ...
    node_idx = ...
    r_method.plot(feature, node_idx, **plot_specific_args)
    
    Output
    (the same pair of plots is produced for each regional effect method)

    node_idx=1: \(x_1\) when \(x_2 \leq 0\)    node_idx=2: \(x_1\) when \(x_2 > 0\)
    r_method.plot(0, 1)                        r_method.plot(0, 2)
    [regional effect plots at node 1 and node 2]
  5. Evaluate the regional effect of a feature at a specific node, on a specific grid of points:

    .eval(feature, node_idx, xs)

    Usage
    # Example input
    feature = ... # feature to be analyzed
    node_idx = ... # node index
    xs = ... # grid of points to evaluate the regional effect on, e.g., np.linspace(0, 1, 100)

    y, het = r_method.eval(feature, node_idx, xs, heterogeneity=True)
    

API

Constructor for the RegionalEffect class.

Methods:

  • eval: 👉 Evaluate the regional effect for a given feature and node.
  • summary: 👉 Summarize the partition tree for the selected features.

Source code in effector/regional_effect.py
def __init__(
    self,
    method_name: str,
    data: np.ndarray,
    model: Callable,
    model_jac: Optional[Callable] = None,
    data_effect: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 10_000,
    axis_limits: Optional[np.ndarray] = None,
    feature_types: Optional[List] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
) -> None:
    """
    Constructor for the RegionalEffect class.
    """
    assert data.ndim == 2

    self.method_name = method_name.lower()
    self.model = model
    self.model_jac = model_jac

    self.dim = data.shape[1]

    # data preprocessing (i): if axis_limits passed manually,
    # keep only the points within,
    # otherwise, compute the axis limits from the data
    if axis_limits is not None:
        assert axis_limits.shape == (2, self.dim)
        assert np.all(axis_limits[0, :] <= axis_limits[1, :])

        # drop points outside of limits
        accept_indices = helpers.indices_within_limits(data, axis_limits)
        data = data[accept_indices, :]
        data_effect = data_effect[accept_indices, :] if data_effect is not None else None
    else:
        axis_limits = helpers.axis_limits_from_data(data)
    self.axis_limits: np.ndarray = axis_limits


    # data preprocessing (ii): select nof_instances from the remaining data
    self.nof_instances, self.indices = helpers.prep_nof_instances(nof_instances, data.shape[0])
    data = data[self.indices, :]
    data_effect = data_effect[self.indices, :] if data_effect is not None else None

    # store the data
    self.data: np.ndarray = data
    self.data_effect: Optional[np.ndarray] = data_effect

    # set feature types
    self.cat_limit = cat_limit
    feature_types = (
        utils.get_feature_types(data, cat_limit)
        if feature_types is None
        else feature_types
    )
    self.feature_types: list = feature_types

    # set feature names
    feature_names: list[str] = (
        helpers.get_feature_names(axis_limits.shape[1])
        if feature_names is None
        else feature_names
    )
    self.feature_names: list = feature_names

    # set target name
    self.target_name = "y" if target_name is None else target_name

    # state variables
    self.is_fitted: np.ndarray = np.ones([self.dim]) < 0

    # parameters used when fitting the regional effect
    # self.method_args: typing.Dict = {}
    self.kwargs_subregion_detection: typing.Dict = {} # subregion specific arguments
    self.kwargs_fitting: typing.Dict = {} # fitting specific arguments

    # dictionary with all the information required for plotting or evaluating the regional effects
    self.partitioners: typing.Dict[str, Best] = {}
    # self.tree_full: typing.Dict[str, Tree] = {}
    self.tree: typing.Dict[str, Tree] = {}

eval(feature, node_idx, xs, heterogeneity=False, centering=True)

👉 Evaluate the regional effect for a given feature and node.

This is a common method for all regional effect methods, so use the arguments carefully.

  • centering=True is a good option for most methods, but not for all.
    • DerPDP, use centering=False
    • [RegionalPDP, RegionalShapDP], it depends on you 😎
    • [RegionalALE, RegionalRHALE], use centering=True

The heterogeneity argument changes the return value of the function.

  • If heterogeneity=False, the function returns y
  • If heterogeneity=True, the function returns a tuple (y, std)

Parameters:

  • feature (int, required): index of the feature

  • node_idx (int, required): index of the node

  • xs (ndarray, required): horizontal grid of points to evaluate on

  • heterogeneity (bool, default False): whether to return the heterogeneity
      • if heterogeneity=False, the function returns y, a numpy array of the mean effect at grid points xs
      • if heterogeneity=True, the function returns (y, std), where y is the mean effect and std is the standard deviation of the mean effect at grid points xs

  • centering (Union[bool, str], default True): whether to center the regional effect. The following options are available:
      • If centering is False, the regional effect is not centered
      • If centering is True or "zero_integral", the regional effect is centered around the y axis
      • If centering is "zero_start", the regional effect starts from y=0

Returns:

  • Union[ndarray, Tuple[ndarray, ndarray]]: the mean effect y, if heterogeneity=False (default), or a tuple (y, std) otherwise
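
For instance, a minimal sketch of the two return forms (assuming r_method is any fitted regional effect object, evaluated on feature 0 at node 1):

    import numpy as np

    xs = np.linspace(0, 1, 100)  # horizontal grid along the feature axis
    y = r_method.eval(0, 1, xs)                           # heterogeneity=False (default): mean effect only
    y, std = r_method.eval(0, 1, xs, heterogeneity=True)  # mean effect and its standard deviation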

Source code in effector/regional_effect.py
def eval(
        self,
        feature: int,
        node_idx: int,
        xs: np.ndarray,
        heterogeneity: bool = False,
        centering: Union[bool, str] = True,
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
    """
    :point_right: Evaluate the regional effect for a given feature and node.

    !!! note "This is a common method for all regional effect methods, so use the arguments carefully."

        - `centering=True` is a good option for most methods, but not for all.
            - `DerPDP`, use `centering=False`
            - `[RegionalPDP, RegionalShapDP]`, it depends on you :sunglasses:
            - `[RegionalALE, RegionalRHALE]`, use `centering=True`

    !!! note "The `heterogeneity` argument changes the return value of the function."

        - If `heterogeneity=False`, the function returns `y`
        - If `heterogeneity=True`, the function returns a tuple `(y, std)`

    Args:
        feature: index of the feature
        node_idx: index of the node
        xs: horizontal grid of points to evaluate on
        heterogeneity: whether to return the heterogeneity.

              - if `heterogeneity=False`, the function returns `y`, a numpy array of the mean effect at grid points `xs`
              - If `heterogeneity=True`, the function returns `(y, std)` where `y` is the mean effect and `std` is the standard deviation of the mean effect at grid points `xs`

        centering: whether to center the regional effect. The following options are available:

            - If `centering` is `False`, the regional effect is not centered
            - If `centering` is `True` or `zero_integral`, the regional effect is centered around the `y` axis.
            - If `centering` is `zero_start`, the regional effect starts from `y=0`.

    Returns:
        the mean effect `y`, if `heterogeneity=False` (default) or a tuple `(y, std)` otherwise

    """
    self.refit(feature)
    centering = helpers.prep_centering(centering)

    kwargs = copy.deepcopy(self.kwargs_fitting)
    kwargs['centering'] = centering

    # select only the tree of the given node out of all
    fe_method = self._create_fe_object(feature, node_idx, None)
    fe_method.fit(features=feature, **kwargs)
    return fe_method.eval(feature, xs, heterogeneity, centering)

summary(features, scale_x_list=None)

👉 Summarize the partition tree for the selected features.

Example output
Feature 3 - Full partition tree:
🌳 Full Tree Structure:
───────────────────────
hr 🔹 [id: 0 | heter: 0.43 | inst: 3476 | w: 1.00]
    workingday = 0.00 🔹 [id: 1 | heter: 0.36 | inst: 1129 | w: 0.32]
        temp ≤ 6.50 🔹 [id: 3 | heter: 0.17 | inst: 568 | w: 0.16]
        temp > 6.50 🔹 [id: 4 | heter: 0.21 | inst: 561 | w: 0.16]
    workingday ≠ 0.00 🔹 [id: 2 | heter: 0.28 | inst: 2347 | w: 0.68]
        temp ≤ 6.50 🔹 [id: 5 | heter: 0.19 | inst: 953 | w: 0.27]
        temp > 6.50 🔹 [id: 6 | heter: 0.20 | inst: 1394 | w: 0.40]
--------------------------------------------------
Feature 3 - Statistics per tree level:
🌳 Tree Summary:
─────────────────
Level 0🔹heter: 0.43
    Level 1🔹heter: 0.31 | 🔻0.12 (28.15%)
        Level 2🔹heter: 0.19 | 🔻0.11 (37.10%)

Parameters:

  • features (List[int], required): indices of the features to summarize

  • scale_x_list (Optional[List], default None): list of scaling factors for each feature
      • None, for no scaling
      • [{"mean": 0, "std": 1}, {"mean": 3, "std": 0.1}], to manually scale the features
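
For example, a sketch of summarizing two features with manual scaling (the mean/std values are illustrative and assume the features were standardized before training):

    scale_x_list = [
        {"mean": 0, "std": 1},    # feature 0: no rescaling
        {"mean": 3, "std": 0.1},  # feature 1: map standardized values back to original units
    ]
    r_method.summary(features=[0, 1], scale_x_list=scale_x_list)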
Source code in effector/regional_effect.py
def summary(self, features: List[int], scale_x_list: Optional[List] = None):
    """:point_right: Summarize the partition tree for the selected features.

    ???+ Example "Example output"

        ```python
        Feature 3 - Full partition tree:
        🌳 Full Tree Structure:
        ───────────────────────
        hr 🔹 [id: 0 | heter: 0.43 | inst: 3476 | w: 1.00]
            workingday = 0.00 🔹 [id: 1 | heter: 0.36 | inst: 1129 | w: 0.32]
                temp ≤ 6.50 🔹 [id: 3 | heter: 0.17 | inst: 568 | w: 0.16]
                temp > 6.50 🔹 [id: 4 | heter: 0.21 | inst: 561 | w: 0.16]
            workingday ≠ 0.00 🔹 [id: 2 | heter: 0.28 | inst: 2347 | w: 0.68]
                temp ≤ 6.50 🔹 [id: 5 | heter: 0.19 | inst: 953 | w: 0.27]
                temp > 6.50 🔹 [id: 6 | heter: 0.20 | inst: 1394 | w: 0.40]
        --------------------------------------------------
        Feature 3 - Statistics per tree level:
        🌳 Tree Summary:
        ─────────────────
        Level 0🔹heter: 0.43
            Level 1🔹heter: 0.31 | 🔻0.12 (28.15%)
                Level 2🔹heter: 0.19 | 🔻0.11 (37.10%)
        ```

    Args:
        features: indices of the features to summarize
        scale_x_list: list of scaling factors for each feature

            - `None`, for no scaling
            - `[{"mean": 0, "std": 1}, {"mean": 3, "std": 0.1}]`, to manually scale the features

    """
    features = helpers.prep_features(features, self.dim)

    for feat in features:
        self.refit(feat)

        feat_str = "feature_{}".format(feat)
        tree_dict = self.tree[feat_str]

        print("\n")
        print("Feature {} - Full partition tree:".format(feat))

        if tree_dict is None:
            print("No splits found for feature {}".format(feat))
        else:
            tree_dict.show_full_tree(scale_x_list=scale_x_list)

        print("-" * 50)
        print("Feature {} - Statistics per tree level:".format(feat))

        if tree_dict is None:
            print("No splits found for feature {}".format(feat))
        else:
            tree_dict.show_level_stats()
        print("\n")

effector.regional_effect_ale.RegionalALE(data, model, nof_instances=100000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (callable, required): the black-box model, Callable with signature x -> y where:
      • x: ndarray of shape (N, D)
      • y: ndarray of shape (N)

  • axis_limits (Union[None, ndarray], default None): feature effect limits along each axis
      • None, infers them from data (min and max of each feature)
      • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually: they help to discard outliers and improve the quality of the fit, and they define the .plot method's x-axis limits, so manual specification leads to better visualizations. Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 100000): max instances to use
      • "all", uses all data
      • int, randomly selects int instances from data

    100_000 (default) is a good choice; RegionalALE can handle large datasets. 😎

  • feature_types (Union[list, None], default None): the feature types
      • None, infers them from data; if the number of unique values is less than cat_limit, the feature is considered categorical
      • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Union[int, None], default 10): the threshold on unique values below which a feature is considered categorical
      • if feature_types is manually specified, this parameter is ignored

  • feature_names (Union[list, None], default None): the names of the features
      • None, defaults to ["x_0", "x_1", ...]
      • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Union[str, None], default None): the name of the target variable
      • None, to keep the default name "y"
      • "price", to manually specify the name of the target variable
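
For illustration, a minimal construction sketch (X and predict are placeholders as in the Usage section; the limits and names are illustrative for a D=3 dataset):

    import numpy as np
    import effector

    X = ...        # ndarray of shape (N, 3)
    predict = ...  # callable: (N, 3) -> (N,)

    r_ale = effector.RegionalALE(
        data=X,
        model=predict,
        axis_limits=np.array([[0, 1, -1], [1, 2, 3]]),  # shape (2, D)
        feature_names=["age", "weight", "height"],      # illustrative names
        target_name="price",
    )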

Methods:

  • fit: Find subregions by minimizing the ALE-based heterogeneity.
  • plot
Source code in effector/regional_effect_ale.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    nof_instances: typing.Union[int, str] = 100_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(D, 2)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`100_000` (default) is a good choice; RegionalALE can handle large datasets. :sunglasses:"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    self.global_bin_limits = {}
    self.global_data_effect = {}
    super(RegionalALE, self).__init__(
        "ale",
        data,
        model,
        None,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features, candidate_conditioning_features='all', space_partitioner='best', binning_method='fixed', points_for_mean_heterogeneity=30)

Find subregions by minimizing the ALE-based heterogeneity.

Parameters:

  • features (Union[int, str, list], required): for which features to search for subregions
      • use "all", for all features, e.g. features="all"
      • use an int, for a single feature, e.g. features=0
      • use a list, for multiple features, e.g. features=[0, 1, 2]

  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features

  • space_partitioner (Union[str, Best], default "best"): the space partitioner to use

  • binning_method (Union[str, Fixed], default "fixed"): must be the Fixed binning method
      • If set to "fixed", the ALE plot will be computed with the default values, which are 20 bins with at least 0 points per bin
      • To change the parameters of the method, pass an instance of the class effector.binning_methods.Fixed with the desired parameters, e.g. Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)

  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity
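
For example, a sketch of fitting with a customized Fixed binning (class and parameters as named above; assumes r_ale from the construction sketch):

    binning = effector.binning_methods.Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)
    r_ale.fit(
        features=[0, 1],
        candidate_conditioning_features="all",
        binning_method=binning,
        points_for_mean_heterogeneity=30,
    )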
Source code in effector/regional_effect_ale.py
def fit(
    self,
    features: typing.Union[int, str, list],
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union[str, effector.space_partitioning.Best] = "best",
    binning_method: typing.Union[str, ap.Fixed] = "fixed",
    points_for_mean_heterogeneity: int = 30
):
    """
    Find subregions by minimizing the ALE-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features
        space_partitioner: the space partitioner to use

        binning_method: must be the Fixed binning method

            - If set to `"fixed"`, the ALE plot will be computed with the  default values, which are
            `20` bins with at least `0` points per bin
            - If you want to change the parameters of the method, you pass an instance of the
            class `effector.binning_methods.Fixed` with the desired parameters.
            For example: `Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)`

        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
    """
    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # fit global method
        global_ale = ALE(self.data, self.model, nof_instances="all", axis_limits=self.axis_limits)
        global_ale.fit(features=feat, binning_method=binning_method, centering=False)
        self.global_data_effect["feature_" + str(feat)] = global_ale.data_effect_ale["feature_" + str(feat)]
        self.global_bin_limits["feature_" + str(feat)] = global_ale.bin_limits["feature_" + str(feat)]

        # create heterogeneity function
        heter = self._create_heterogeneity_function(feat, space_partitioner.min_points_per_subregion, points_for_mean_heterogeneity)

        # fit feature
        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # region-splitting arguments are the first three (features, candidate_conditioning_features, space_partitioner)
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fitting arguments (binning_method)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["binning_method"]}

plot(feature, node_idx, heterogeneity=True, centering=True, scale_x_list=None, scale_y=None, y_limits=None, dy_limits=None)

Source code in effector/regional_effect_ale.py
def plot(
    self,
    feature,
    node_idx,
    heterogeneity=True,
    centering=True,
    scale_x_list=None,
    scale_y=None,
    y_limits=None,
    dy_limits=None,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_ale.RegionalRHALE(data, model, model_jac=None, data_effect=None, nof_instances=100000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (Callable, required): the black-box model, Callable with signature x -> y where:
      • x: ndarray of shape (N, D)
      • y: ndarray of shape (N)

  • model_jac (Optional[Callable], default None): the black-box model's Jacobian, Callable with signature x -> dy_dx where:
      • x: ndarray of shape (N, D)
      • dy_dx: ndarray of shape (N, D)

  • data_effect (Optional[ndarray], default None): the Jacobian of the model on the data
      • None, infers the Jacobian internally using model_jac(data) or numerically
      • np.ndarray, to provide the Jacobian directly

    When possible, provide the Jacobian directly: computing it on the whole dataset can be memory demanding, so if you have it already computed, pass it to the constructor.

  • axis_limits (Optional[ndarray], default None): feature effect limits along each axis
      • None, infers them from data (min and max of each feature)
      • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually: they help to discard outliers and improve the quality of the fit, and they define the .plot method's x-axis limits, so manual specification leads to better visualizations. Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 100000): max instances to use
      • "all", uses all data
      • int, randomly selects int instances from data

    100_000 (default) is a good choice; RHALE can handle large datasets. 😎

  • feature_types (Optional[List], default None): the feature types
      • None, infers them from data; if the number of unique values is less than cat_limit, the feature is considered categorical
      • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Optional[int], default 10): the threshold on unique values below which a feature is considered categorical
      • if feature_types is manually specified, this parameter is ignored

  • feature_names (Optional[List], default None): the names of the features
      • None, defaults to ["x_0", "x_1", ...]
      • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Optional[str], default None): the name of the target variable
      • None, to keep the default name "y"
      • "price", to manually specify the name of the target variable
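
A minimal sketch contrasting the two ways of supplying derivatives (X, predict, and jacobian are placeholders as in the Usage section):

    X = ...         # ndarray of shape (N, D)
    predict = ...   # callable: (N, D) -> (N,)
    jacobian = ...  # callable: (N, D) -> (N, D)

    # either let the method compute the Jacobian on demand ...
    r_rhale = effector.RegionalRHALE(data=X, model=predict, model_jac=jacobian)

    # ... or precompute it once and pass it directly (cheaper if the instance is refit)
    dy_dx = jacobian(X)
    r_rhale = effector.RegionalRHALE(data=X, model=predict, data_effect=dy_dx)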

Methods:

  • fit: Find subregions by minimizing the RHALE-based heterogeneity.
  • plot
Source code in effector/regional_effect_ale.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    model_jac: Optional[Callable] = None,
    data_effect: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 100_000,
    axis_limits: Optional[np.ndarray] = None,
    feature_types: Optional[List] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        model_jac: the black-box model's Jacobian, `Callable` with signature `x -> dy_dx` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `dy_dx`: `ndarray` of shape `(N, D)`

        data_effect: The jacobian of the `model` on the `data`

            - `None`, infers the Jacobian internally using `model_jac(data)` or numerically
            - `np.ndarray`, to provide the Jacobian directly

            !!! tip "When possible, provide the Jacobian directly"

                Computing the jacobian on the whole dataset can be memory demanding.
                If you have the jacobian already computed, provide it directly to the constructor.

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(D, 2)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`100_000` (default), is a good choice. RHALE can handle large datasets :sunglasses: :sunglasses: "

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalRHALE, self).__init__(
        "rhale",
        data,
        model,
        model_jac,
        data_effect,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', binning_method='greedy', points_for_mean_heterogeneity=30)

Find subregions by minimizing the RHALE-based heterogeneity.

Parameters:

  • features (Union[int, str, list], default "all"): for which features to search for subregions
      • use "all", for all features, e.g. features="all"
      • use an int, for a single feature, e.g. features=0
      • use a list, for multiple features, e.g. features=[0, 1, 2]

  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features

  • space_partitioner (Union[str, Best], default "best"): the space partitioner to use

  • binning_method (str, default "greedy"): the binning method to use
      • use "greedy" for the Greedy binning solution with the default parameters; for custom parameters, initialize a binning_methods.Greedy object
      • use "dp" for a Dynamic Programming binning solution with the default parameters; for custom parameters, initialize a binning_methods.DynamicProgramming object
      • use "fixed" for a Fixed binning solution with the default parameters; for custom parameters, initialize a binning_methods.Fixed object

  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity
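
For example, a sketch selecting the dynamic-programming binning (assumes r_rhale from the construction sketch above):

    r_rhale.fit(
        features="all",
        binning_method="dp",  # or "greedy" (default), "fixed", or a binning_methods object
        points_for_mean_heterogeneity=30,
    )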
Source code in effector/regional_effect_ale.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union[str, list] = "all",
    space_partitioner: typing.Union[str, effector.space_partitioning.Best] = "best",
    binning_method: typing.Union[str, ap.Fixed, ap.DynamicProgramming, ap.Greedy,] = "greedy",
    points_for_mean_heterogeneity: int = 30,
):
    """
    Find subregions by minimizing the RHALE-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features
        space_partitioner: the space partitioner to use
        binning_method (str): the binning method to use.

            - Use `"greedy"` for using the Greedy binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.Greedy` object
            - Use `"dp"` for using a Dynamic Programming binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.DynamicProgramming` object
            - Use `"fixed"` for using a Fixed binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.Fixed` object

        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
    """
    if self.data_effect is None:
        self.compile()

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # find global axis limits
        heter = self._create_heterogeneity_function(
            feat, binning_method, space_partitioner.min_points_per_subregion, points_for_mean_heterogeneity
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # region-splitting arguments are the first three (features, candidate_conditioning_features, space_partitioner)
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fitting arguments (binning_method)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["binning_method"]}

plot(feature, node_idx, heterogeneity=True, centering=True, scale_x_list=None, scale_y=None, y_limits=None, dy_limits=None)

Source code in effector/regional_effect_ale.py
def plot(
    self,
    feature,
    node_idx,
    heterogeneity=True,
    centering=True,
    scale_x_list=None,
    scale_y=None,
    y_limits=None,
    dy_limits=None,
):

    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_pdp.RegionalPDP(data, model, nof_instances=10000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalPDPBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (callable, required): the black-box model, Callable with signature f(x) -> y where:
      • x: ndarray of shape (N, D)
      • y: ndarray of shape (N)

  • axis_limits (Union[None, ndarray], default None): feature effect limits along each axis
      • None, infers them from data (min and max of each feature)
      • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually: they help to discard outliers and improve the quality of the fit, and they define the .plot method's x-axis limits, so manual specification leads to better visualizations. Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 10000): max instances to use
      • "all", uses all data
      • int, randomly selects int instances from data

    10_000 (default) is a good balance between speed and accuracy.

  • feature_types (Union[list, None], default None): the feature types
      • None, infers them from data; if the number of unique values is less than cat_limit, the feature is considered categorical
      • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Union[int, None], default 10): the threshold on unique values below which a feature is considered categorical
      • if feature_types is manually specified, this parameter is ignored

  • feature_names (Union[list, None], default None): the names of the features
      • None, defaults to ["x_0", "x_1", ...]
      • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Union[str, None], default None): the name of the target variable
      • None, to keep the default name "y"
      • "price", to manually specify the name of the target variable
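
A minimal construction sketch (X and predict are placeholders as in the Usage section):

    r_pdp = effector.RegionalPDP(
        data=X,                # ndarray of shape (N, D)
        model=predict,         # callable: (N, D) -> (N,)
        nof_instances=10_000,  # the default; lower it for faster, rougher estimates
    )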

Methods:

  • fit: Find subregions by minimizing the PDP-based heterogeneity.
  • plot
Source code in effector/regional_effect_pdp.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    nof_instances: typing.Union[int, str] = 10_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `f(x) -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(D, 2)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`10_000` (default), is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalPDP, self).__init__(
        "pdp",
        data,
        model,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', points_for_centering=30, points_for_mean_heterogeneity=30, use_vectorized=True)

Find subregions by minimizing the PDP-based heterogeneity.

Parameters:

  • features (Union[int, str, list], default "all"): for which features to search for subregions
      • use "all", for all features, e.g. features="all"
      • use an int, for a single feature, e.g. features=0
      • use a list, for multiple features, e.g. features=[0, 1, 2]

  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features
      • use "all", for all features, e.g. candidate_conditioning_features="all"
      • use a list, for multiple features, e.g. candidate_conditioning_features=[0, 1, 2]
      • for each feature in the features list, the algorithm will consider applying a split conditioned on each feature in the candidate_conditioning_features list

  • space_partitioner (Union[str, None], default "best"): the method to use for partitioning the space

  • points_for_centering (int, default 30): number of equidistant points along the feature axis used for centering ICE plots

  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity

  • use_vectorized (bool, default True): whether to use vectorized operations for the PDP and ICE curves
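
For example, a sketch restricting the conditioning features (assumes r_pdp from the construction sketch above):

    r_pdp.fit(
        features=[0],
        candidate_conditioning_features=[1, 2],  # only consider splits conditioned on features 1 and 2
        points_for_centering=30,
        points_for_mean_heterogeneity=30,
        use_vectorized=True,
    )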
Source code in effector/regional_effect_pdp.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union[str, None] = "best",
    points_for_centering: int = 30,
    points_for_mean_heterogeneity: int = 30,
    use_vectorized: bool = True,
):
    """
    Find subregions by minimizing the PDP-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features

            - use `"all"`, for all features, e.g. `candidate_conditioning_features="all"`
            - use a `list`, for multiple features, e.g. `candidate_conditioning_features=[0, 1, 2]`
            - it means that for each feature in the `feature` list, the algorithm will consider applying a split
            conditioned on each feature in the `candidate_conditioning_features` list

        space_partitioner: the method to use for partitioning the space
        points_for_centering: number of equidistant points along the feature axis used for centering ICE plots
        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
        use_vectorized: whether to use vectorized operations for the PDP and ICE curves


    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # define the global method
        pdp = PDP(self.data, self.model, self.axis_limits, nof_instances="all")

        pdp.fit(
            features=feat,
            centering=True,
            points_for_centering=points_for_centering,
            use_vectorized=use_vectorized,
        )

        xx = np.linspace(self.axis_limits[:, feat][0], self.axis_limits[:, feat][1], points_for_mean_heterogeneity)
        y_ice = pdp.eval(
                feature=feat,
                xs=xx,
                heterogeneity=True,
                centering=True,
                use_vectorized=use_vectorized,
                return_all=True
            )
        self.y_ice["feature_" + str(feat)] = y_ice.T

        heter = self._create_heterogeneity_function(
            foi = feat,
            min_points=space_partitioner.min_points_per_subregion,
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # region-splitting arguments are the first three (features, candidate_conditioning_features, space_partitioner)
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}
    self.kwargs_subregion_detection["points_for_mean_heterogeneity"] = points_for_mean_heterogeneity

    # fitting arguments (centering, points_for_centering, use_vectorized)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["centering", "points_for_centering", "use_vectorized"]}

plot(feature, node_idx, heterogeneity='ice', centering=False, nof_points=30, scale_x_list=None, scale_y=None, nof_ice=100, show_avg_output=False, y_limits=None, use_vectorized=True)

Source code in effector/regional_effect_pdp.py
def plot(
    self,
    feature: int,
    node_idx: int,
    heterogeneity: typing.Union[bool, str] = "ice",
    centering: typing.Union[bool, str] = False,
    nof_points: int = 30,
    scale_x_list: typing.Union[None, list] = None,
    scale_y: typing.Union[None, list] = None,
    nof_ice: int = 100,
    show_avg_output: bool = False,
    y_limits: typing.Union[None, list] = None,
    use_vectorized: bool = True,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)

effector.regional_effect_pdp.RegionalDerPDP(data, model, model_jac=None, nof_instances=10000, axis_limits=None, feature_types=None, cat_limit=10, feature_names=None, target_name=None)

Bases: RegionalPDPBase

Initialize the Regional Effect method.

Parameters:

  • data (ndarray, required): the design matrix, ndarray of shape (N, D)

  • model (callable, required): the black-box model, Callable with signature x -> y where:
      • x: ndarray of shape (N, D)
      • y: ndarray of shape (N)

  • model_jac (Optional[callable], default None): the black-box model's Jacobian, Callable with signature x -> dy_dx where:
      • x: ndarray of shape (N, D)
      • dy_dx: ndarray of shape (N, D)

  • axis_limits (Union[None, ndarray], default None): feature effect limits along each axis
      • None, infers them from data (min and max of each feature)
      • array of shape (2, D), manually specify the limits for each feature

    When possible, specify the axis limits manually: they help to discard outliers and improve the quality of the fit, and they define the .plot method's x-axis limits, so manual specification leads to better visualizations. Note that their shape is (2, D), not (D, 2):

    axis_limits = np.array([[0, 1, -1], [1, 2, 3]])

  • nof_instances (Union[int, str], default 10000): max instances to use
      • "all", uses all data
      • int, randomly selects int instances from data

    10_000 (default) is a good balance between speed and accuracy.

  • feature_types (Union[list, None], default None): the feature types
      • None, infers them from data; if the number of unique values is less than cat_limit, the feature is considered categorical
      • ['cat', 'cont', ...], manually specify the types of the features

  • cat_limit (Union[int, None], default 10): the threshold on unique values below which a feature is considered categorical
      • if feature_types is manually specified, this parameter is ignored

  • feature_names (Union[list, None], default None): the names of the features
      • None, defaults to ["x_0", "x_1", ...]
      • ["age", "weight", ...], to manually specify the names of the features

  • target_name (Union[str, None], default None): the name of the target variable
      • None, to keep the default name "y"
      • "price", to manually specify the name of the target variable
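
A minimal construction sketch (X, predict, and jacobian are placeholders as in the Usage section):

    r_dpdp = effector.RegionalDerPDP(
        data=X,
        model=predict,
        model_jac=jacobian,  # optional Jacobian, signature (N, D) -> (N, D)
    )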

Methods:

  • fit: Find subregions by minimizing the PDP-based heterogeneity.
  • plot
Source code in effector/regional_effect_pdp.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    model_jac: typing.Optional[callable] = None,
    nof_instances: typing.Union[int, str] = 10_000,
    axis_limits: typing.Union[None, np.ndarray] = None,
    feature_types: typing.Union[list, None] = None,
    cat_limit: typing.Union[int, None] = 10,
    feature_names: typing.Union[list, None] = None,
    target_name: typing.Union[str, None] = None,
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `x -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        model_jac: the black-box model's Jacobian, `Callable` with signature `x -> dy_dx` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `dy_dx`: `ndarray` of shape `(N, D)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(D, 2)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`10_000` (default), is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The minimum number of unique values for a feature to be considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable
    """

    super(RegionalDerPDP, self).__init__(
        "d-pdp",
        data,
        model,
        model_jac,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features='all', candidate_conditioning_features='all', space_partitioner='best', points_for_mean_heterogeneity=30, use_vectorized=True)

Find subregions by minimizing the PDP-based heterogeneity.

Parameters:

  • features (Union[int, str, list], default "all"): for which features to search for subregions
      • use "all", for all features, e.g. features="all"
      • use an int, for a single feature, e.g. features=0
      • use a list, for multiple features, e.g. features=[0, 1, 2]

  • candidate_conditioning_features (Union[str, list], default "all"): list of features to consider as conditioning features
      • use "all", for all features, e.g. candidate_conditioning_features="all"
      • use a list, for multiple features, e.g. candidate_conditioning_features=[0, 1, 2]
      • for each feature in the features list, the algorithm will consider applying a split conditioned on each feature in the candidate_conditioning_features list

  • space_partitioner (Union[str, None], default "best"): the method to use for partitioning the space

  • points_for_mean_heterogeneity (int, default 30): number of equidistant points along the feature axis used for computing the mean heterogeneity

  • use_vectorized (bool, default True): whether to use vectorized operations for the PDP and ICE curves
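
For example (assumes r_dpdp from the construction sketch above; recall the earlier note that DerPDP is typically used with centering=False):

    r_dpdp.fit(features="all", points_for_mean_heterogeneity=30)
    r_dpdp.plot(feature=0, node_idx=0, centering=False)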
Source code in effector/regional_effect_pdp.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union[str, None] = "best",
    points_for_mean_heterogeneity: int = 30,
    use_vectorized: bool = True,
):
    """
    Find subregions by minimizing the PDP-based heterogeneity.

    Args:
        features: for which features to search for subregions

            - use `"all"`, for all features, e.g. `features="all"`
            - use an `int`, for a single feature, e.g. `features=0`
            - use a `list`, for multiple features, e.g. `features=[0, 1, 2]`

        candidate_conditioning_features: list of features to consider as conditioning features

            - use `"all"`, for all features, e.g. `candidate_conditioning_features="all"`
            - use a `list`, for multiple features, e.g. `candidate_conditioning_features=[0, 1, 2]`
            - it means that for each feature in the `feature` list, the algorithm will consider applying a split
            conditioned on each feature in the `candidate_conditioning_features` list

        space_partitioner: the method to use for partitioning the space
        points_for_mean_heterogeneity: number of equidistant points along the feature axis used for computing the mean heterogeneity
        use_vectorized: whether to use vectorized operations for the PDP and ICE curves


    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)
    for feat in tqdm(features):
        # define the global method
        pdp = DerPDP(self.data, self.model, self.model_jac, self.axis_limits, nof_instances="all")

        pdp.fit(
            features=feat,
            centering=False,
            use_vectorized=use_vectorized,
        )

        xx = np.linspace(self.axis_limits[:, feat][0], self.axis_limits[:, feat][1], points_for_mean_heterogeneity)
        y_ice = pdp.eval(
            feature=feat,
            xs=xx,
            heterogeneity=True,
            centering=False,
            use_vectorized=use_vectorized,
            return_all=True,
        )
        self.y_ice["feature_" + str(feat)] = y_ice.T

        heter = self._create_heterogeneity_function(
            foi = feat,
            min_points=space_partitioner.min_points_per_subregion,
        )

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # the region-splitting arguments are the first 3 in the signature:
    # features, candidate_conditioning_features, space_partitioner
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}
    self.kwargs_subregion_detection["points_for_mean_heterogeneity"] = points_for_mean_heterogeneity

    # fitting arguments (of these, only use_vectorized is exposed by this method's signature)
    self.kwargs_fitting = {k: v for k, v in all_arguments.items() if k in ["centering", "points_for_centering", "use_vectorized"]}
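After `.fit()` returns, the arguments used for subregion detection are stored on the instance, as the last lines of the source above show. A quick way to inspect them (the exact contents depend on how `.fit()` was called):

```python
r_method.fit(features=0)
print(r_method.kwargs_subregion_detection)
# e.g. {'features': [0], 'candidate_conditioning_features': 'all',
#       'space_partitioner': <...>, 'points_for_mean_heterogeneity': 30}
```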

plot(feature, node_idx=0, heterogeneity='ice', centering=False, nof_points=30, scale_x_list=None, scale_y=None, nof_ice=100, show_avg_output=False, dy_limits=None, use_vectorized=True)

Source code in effector/regional_effect_pdp.py
def plot(
    self,
    feature: int,
    node_idx: int = 0,
    heterogeneity: typing.Union[bool, str] = "ice",
    centering: typing.Union[bool, str] = False,
    nof_points: int = 30,
    scale_x_list: typing.Union[None, list] = None,
    scale_y: typing.Union[None, list] = None,
    nof_ice: int = 100,
    show_avg_output: bool = False,
    dy_limits: typing.Union[None, list] = None,
    use_vectorized: bool = True,
):
    kwargs = locals()
    kwargs.pop("self")
    self._plot(kwargs)
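A plotting sketch for the regional derivative-PDP (node indices refer to the partition tree printed by `.summary()`; the indices below are illustrative):

```python
# root node: the global effect of feature 0
r_method.plot(feature=0, node_idx=0, heterogeneity="ice")

# the two subregions found for feature 0, if a split was accepted
r_method.plot(feature=0, node_idx=1, heterogeneity="ice")
r_method.plot(feature=0, node_idx=2, heterogeneity="ice")
```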

effector.regional_effect_shap.RegionalShapDP(data, model, axis_limits=None, nof_instances=1000, feature_types=None, cat_limit=10, feature_names=None, target_name=None, backend='shap')

Bases: RegionalEffectBase

Initialize the Regional Effect method.

Parameters:

- `data` (`ndarray`, required): the design matrix, an `ndarray` of shape `(N, D)`
- `model` (`Callable`, required): the black-box model, a `Callable` with signature `f(x) -> y` where:
  - `x`: `ndarray` of shape `(N, D)`
  - `y`: `ndarray` of shape `(N,)`
- `axis_limits` (`Optional[ndarray]`, default `None`): feature effect limits along each axis
  - `None`: infers them from `data` (`min` and `max` of each feature)
  - array of shape `(2, D)`: manually specify the limits for each feature

  When possible, specify the axis limits manually:

  - they help to discard outliers and improve the quality of the fit
  - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

  Note that their shape is `(2, D)`, not `(D, 2)`:

  ```python
  axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
  ```

- `nof_instances` (`Union[int, str]`, default `1000`): max instances to use
  - `"all"`: uses all of `data`
  - `int`: randomly selects that many instances from `data`
  - the default, `1_000`, is a good balance between speed and accuracy
- `feature_types` (`Optional[List[str]]`, default `None`): the feature types
  - `None`: infers them from `data`; a feature with fewer unique values than `cat_limit` is considered categorical
  - `['cat', 'cont', ...]`: manually specify the types of the features
- `cat_limit` (`Optional[int]`, default `10`): the threshold on unique values below which a feature is considered categorical
  - ignored if `feature_types` is manually specified
- `feature_names` (`Optional[List[str]]`, default `None`): the names of the features
  - `None`: defaults to `["x_0", "x_1", ...]`
  - `["age", "weight", ...]`: manually specify the names of the features
- `target_name` (`Optional[str]`, default `None`): the name of the target variable
  - `None`: keeps the default name, `"y"`
  - `"price"`: manually specify the name of the target variable
- `backend` (`str`, default `"shap"`): the package used to compute SHAP values
  - `"shap"`: use the `shap` package (default)
  - `"shapiq"`: use the `shapiq` package

A constructor sketch follows the list.
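A constructor sketch (toy `X` and `predict` in the spirit of the Usage section; the `axis_limits` values are illustrative):

```python
import numpy as np
import effector

X = np.random.uniform(-1, 1, size=(500, 2))
predict = lambda x: x[:, 0] * (x[:, 1] > 0)

r_shap = effector.RegionalShapDP(
    data=X,
    model=predict,
    axis_limits=np.array([[-1.0, -1.0], [1.0, 1.0]]),  # shape (2, D): row 0 = mins, row 1 = maxs
    nof_instances=1_000,
    backend="shap",  # or "shapiq"
)
```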

Methods:

- `fit`: Fit the regional SHAP.
- `plot`: Plot the regional SHAP.

Source code in effector/regional_effect_shap.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    axis_limits: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 1_000,
    feature_types: Optional[List[str]] = None,
    cat_limit: Optional[int] = 10,
    feature_names: Optional[List[str]] = None,
    target_name: Optional[str] = None,
    backend: str = "shap",
):
    """
    Initialize the Regional Effect method.

    Args:
        data: the design matrix, `ndarray` of shape `(N,D)`
        model: the black-box model, `Callable` with signature `f(x) -> y` where:

            - `x`: `ndarray` of shape `(N, D)`
            - `y`: `ndarray` of shape `(N)`

        axis_limits: Feature effect limits along each axis

            - `None`, infers them from `data` (`min` and `max` of each feature)
            - `array` of shape `(2, D)`, manually specify the limits for each feature.

            !!! tip "When possible, specify the axis limits manually"

                - they help to discard outliers and improve the quality of the fit
                - `axis_limits` define the `.plot` method's x-axis limits; manual specification leads to better visualizations

            !!! tip "Their shape is `(2, D)`, not `(D, 2)`"

                ```python
                axis_limits = np.array([[0, 1, -1], [1, 2, 3]])
                ```

        nof_instances: Max instances to use

            - `"all"`, uses all `data`
            - `int`, randomly selects `int` instances from `data`

            !!! tip "`1_000` (default), is a good balance between speed and accuracy"

        feature_types: The feature types.

            - `None`, infers them from data; if the number of unique values is less than `cat_limit`, it is considered categorical.
            - `['cat', 'cont', ...]`, manually specify the types of the features

        cat_limit: The threshold on unique values below which a feature is considered categorical

            - if `feature_types` is manually specified, this parameter is ignored

        feature_names: The names of the features

            - `None`, defaults to: `["x_0", "x_1", ...]`
            - `["age", "weight", ...]` to manually specify the names of the features

        target_name: The name of the target variable

            - `None`, to keep the default name: `"y"`
            - `"price"`, to manually specify the name of the target variable

        backend: Package to compute SHAP values

            - use `"shap"` for the `shap` package (default)
            - use `"shapiq"` for the `shapiq` package
    """
    self.global_shap_values = None
    self.backend = backend
    super(RegionalShapDP, self).__init__(
        "shap",
        data,
        model,
        None,
        None,
        nof_instances,
        axis_limits,
        feature_types,
        cat_limit,
        feature_names,
        target_name,
    )

fit(features, candidate_conditioning_features='all', space_partitioner='best', binning_method='greedy', budget=512, shap_explainer_kwargs=None, shap_explanation_kwargs=None)

Fit the regional SHAP.

Parameters:

- `features` (`Union[int, str, list]`, required): the features to fit
  - if set to `"all"`, all the features will be fitted
- `candidate_conditioning_features` (`Union[str, list]`, default `"all"`): the features to consider as conditioning features for the candidate splits
  - if set to `"all"`, all the features will be considered as conditioning features
- `space_partitioner` (`Union[str, Best]`, default `"best"`): the space partitioner to use
- `binning_method` (`Union[str, Greedy, Fixed]`, default `"greedy"`): the binning method to use
- `budget` (`int`, default `512`): budget to use for the SHAP approximation
  - increasing the budget improves the approximation at the cost of slower computation
  - decreasing the budget speeds up computation at the cost of approximation error
- `shap_explainer_kwargs` (`Optional[dict]`, default `None`): keyword arguments passed to the `shap.Explainer` or `shapiq.Explainer` class, depending on the backend
- `shap_explanation_kwargs` (`Optional[dict]`, default `None`): keyword arguments passed to the explainer call that computes the SHAP values

Code behind the scene

Before customizing `shap_explainer_kwargs` or `shap_explanation_kwargs`, check the code that runs behind the scenes:

```python
explainer_kwargs = explainer_kwargs.copy() if explainer_kwargs else {}
explanation_kwargs = explanation_kwargs.copy() if explanation_kwargs else {}
if self.backend == "shap":
    explainer_defaults = {"masker": data}
    explanation_defaults = {"max_evals": budget}
elif self.backend == "shapiq":
    explainer_defaults = {
        "data": data,
        "index": "SV",
        "max_order": 1,
        "approximator": "permutation",
        "imputer": "marginal",
    }
    explanation_defaults = {"budget": budget}
else:
    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
explainer_kwargs = {**explainer_defaults, **explainer_kwargs}  # user args override defaults
explanation_kwargs = {**explanation_defaults, **explanation_kwargs}  # user args override defaults

if self.backend == "shap":
    explainer = shap.Explainer(model, **explainer_kwargs)
    explanation = explainer(data, **explanation_kwargs)
    self.shap_values = explanation.values
elif self.backend == "shapiq":
    explainer = shapiq.Explainer(model, **explainer_kwargs)
    explanations = explainer.explain_X(data, **explanation_kwargs)
    self.shap_values = np.stack([ex.get_n_order_values(1) for ex in explanations])
else:
    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
```

Be careful with custom arguments

For customizing `shap_explainer_kwargs` and `shap_explanation_kwargs`, check the official documentation of the shap and shapiq packages. A short customization sketch follows this block.
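As a sketch of such customization with the `shap` backend (reusing `r_shap` and `X` from the constructor sketch above; `shap.maskers.Independent` is part of the public `shap` API, but the right masker is model- and data-dependent):

```python
import shap

r_shap.fit(
    features=0,
    budget=256,  # coarser but faster SHAP approximation
    # override the default masker shown in the code above
    shap_explainer_kwargs={"masker": shap.maskers.Independent(X, max_samples=100)},
)
```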
Source code in effector/regional_effect_shap.py
def fit(
    self,
    features: typing.Union[int, str, list],
    candidate_conditioning_features: typing.Union["str", list] = "all",
    space_partitioner: typing.Union["str", effector.space_partitioning.Best] = "best",
    binning_method: Union[str, ap.Greedy, ap.Fixed] = "greedy",
    budget: int = 512,
    shap_explainer_kwargs: Optional[dict] = None,
    shap_explanation_kwargs: Optional[dict] = None,
):
    """
    Fit the regional SHAP.

    Args:
        features: the features to fit.
            - If set to "all", all the features will be fitted.

        candidate_conditioning_features: list of features to consider as conditioning features for the candidate splits
            - If set to "all", all the features will be considered as conditioning features.

        space_partitioner: the space partitioner to use
            - If set to "greedy", the greedy space partitioner will be used.

        binning_method: the binning method to use

        budget: Budget to use for the approximation. Defaults to 512.
            - Increasing the budget improves the approximation at the cost of slower computation.
            - Decrease the budget for faster computation at the cost of approximation error.

        shap_explainer_kwargs: the keyword arguments to be passed to the `shap.Explainer` or `shapiq.Explainer` class, depending on the backend.

            ??? note "Code behind the scene"
                Check the code that is running behind the scene before customizing `shap_explainer_kwargs`.

                ```python
                explainer_kwargs = explainer_kwargs.copy() if explainer_kwargs else {}
                explanation_kwargs = explanation_kwargs.copy() if explanation_kwargs else {}
                if self.backend == "shap":
                    explainer_defaults = {"masker": data}
                    explanation_defaults = {"max_evals": budget}
                elif self.backend == "shapiq":
                    explainer_defaults = {
                        "data": data,
                        "index": "SV",
                        "max_order": 1,
                        "approximator": "permutation",
                        "imputer": "marginal",
                    }
                    explanation_defaults = {"budget": budget}
                else:
                    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
                explainer_kwargs = {**explainer_defaults, **explainer_kwargs}  # User args override defaults
                explanation_kwargs = {**explanation_defaults, **explanation_kwargs}  # User args override defaults

                if self.backend == "shap":
                    explainer = shap.Explainer(model, **explainer_kwargs)
                    explanation = explainer(data, **explanation_kwargs)
                    self.shap_values = explanation.values
                elif self.backend == "shapiq":
                    explainer = shapiq.Explainer(model, **explainer_kwargs)
                    explanations = explainer.explain_X(data, **explanation_kwargs)
                    self.shap_values = np.stack([ex.get_n_order_values(1) for ex in explanations])
                else:
                    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
                ```

            ??? warning "Be careful with custom arguments"

                For customizing `shap_explainer_kwargs` and `shap_explanation_kwargs` args,
                check the official documentation of [`shap`](https://shap.readthedocs.io/en/latest/) and [`shapiq`](https://shapiq.readthedocs.io/en/latest/) packages.

        shap_explanation_kwargs: the keyword arguments to be passed to the `shap` or `shapiq` Explainer to compute the SHAP values.

            ??? note "Code behind the scene"

                Check the code that is running behind the scene before customizing `shap_explanation_kwargs`.

                ```python
                explainer_kwargs = explainer_kwargs.copy() if explainer_kwargs else {}
                explanation_kwargs = explanation_kwargs.copy() if explanation_kwargs else {}
                if self.backend == "shap":
                    explainer_defaults = {"masker": data}
                    explanation_defaults = {"max_evals": budget}
                elif self.backend == "shapiq":
                    explainer_defaults = {
                        "data": data,
                        "index": "SV",
                        "max_order": 1,
                        "approximator": "permutation",
                        "imputer": "marginal",
                    }
                    explanation_defaults = {"budget": budget}
                else:
                    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
                explainer_kwargs = {**explainer_defaults, **explainer_kwargs}  # User args override defaults
                explanation_kwargs = {**explanation_defaults, **explanation_kwargs}  # User args override defaults

                if self.backend == "shap":
                    explainer = shap.Explainer(model, **explainer_kwargs)
                    explanation = explainer(data, **explanation_kwargs)
                    self.shap_values = explanation.values
                elif self.backend == "shapiq":
                    explainer = shapiq.Explainer(model, **explainer_kwargs)
                    explanations = explainer.explain_X(data, **explanation_kwargs)
                    self.shap_values = np.stack([ex.get_n_order_values(1) for ex in explanations])
                else:
                    raise ValueError("`backend` should be either 'shap' or 'shapiq'")
                ```

            ??? warning "Be careful with custom arguments"

                For customizing `shap_explainer_kwargs` and `shap_explanation_kwargs` args,
                check the official documentation of [`shap`](https://shap.readthedocs.io/en/latest/) and [`shapiq`](https://shapiq.readthedocs.io/en/latest/) packages.

    """

    if isinstance(space_partitioner, str):
        space_partitioner = effector.space_partitioning.return_default(space_partitioner)

    assert space_partitioner.min_points_per_subregion >= 2, "min_points_per_subregion must be >= 2"
    features = helpers.prep_features(features, self.dim)

    for feat in tqdm(features):
        # assert global SHAP values are available
        if self.global_shap_values is None:
            global_shap_dp = effector.ShapDP(self.data, self.model, self.axis_limits, "all", backend=self.backend)
            global_shap_dp.fit(
                feat,
                centering=False,
                binning_method=binning_method,
                budget=budget,
                shap_explainer_kwargs=shap_explainer_kwargs,
                shap_explanation_kwargs=shap_explanation_kwargs
            )
            self.global_shap_values = global_shap_dp.shap_values

        heter = self._create_heterogeneity_function(feat, space_partitioner.min_points_per_subregion, binning_method)

        self._fit_feature(
            feat,
            heter,
            space_partitioner,
            candidate_conditioning_features,
        )

    all_arguments = locals()
    all_arguments.pop("self")

    # region splitting arguments are the first 3 arguments
    self.kwargs_subregion_detection = {k: all_arguments[k] for k in list(all_arguments.keys())[:3]}

    # fit kwargs
    self.kwargs_fitting = {
        "binning_method": binning_method,
        "budget": budget,
        "shap_explainer_kwargs": shap_explainer_kwargs,
        "shap_explanation_kwargs": shap_explanation_kwargs
    }

plot(feature, node_idx, heterogeneity='shap_values', centering=True, nof_points=30, scale_x_list=None, scale_y=None, nof_shap_values='all', show_avg_output=False, y_limits=None, only_shap_values=False)

Plot the regional SHAP.

Parameters:

- `feature` (required): the feature to plot
- `node_idx` (required): the index of the node to plot
- `heterogeneity` (default `"shap_values"`): whether to plot the heterogeneity (the individual SHAP values)
- `centering` (default `True`): whether to center the SHAP values
- `nof_points` (default `30`): number of points to plot
- `scale_x_list` (default `None`): the list of scaling factors for the features
- `scale_y` (default `None`): the scaling factor for the SHAP values
- `nof_shap_values` (default `"all"`): number of SHAP values to plot
- `show_avg_output` (default `False`): whether to show the average output
- `y_limits` (default `None`): the limits of the y-axis
- `only_shap_values` (default `False`): whether to plot only the SHAP values

A plotting sketch follows the list.
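A plotting sketch (node indices follow the partition tree printed by `.summary()`; values below are illustrative):

```python
r_shap.summary(features=[0])  # inspect the tree first to pick a node_idx
r_shap.plot(feature=0, node_idx=1, centering=True, nof_shap_values=200)
r_shap.plot(feature=0, node_idx=2, only_shap_values=True)
```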
Source code in effector/regional_effect_shap.py
def plot(self,
         feature,
         node_idx,
         heterogeneity="shap_values",
         centering=True,
         nof_points=30,
         scale_x_list=None,
         scale_y=None,
         nof_shap_values='all',
         show_avg_output=False,
         y_limits=None,
         only_shap_values=False
):
    """
    Plot the regional SHAP.

    Args:
        feature: the feature to plot
        node_idx: the index of the node to plot
        heterogeneity: whether to plot the heterogeneity
        centering: whether to center the SHAP values
        nof_points: number of points to plot
        scale_x_list: the list of scaling factors for the features
        scale_y: the scaling factor for the SHAP values
        nof_shap_values: number of SHAP values to plot
        show_avg_output: whether to show the average output
        y_limits: the limits of the y-axis
        only_shap_values: whether to plot only the SHAP values
    """
    kwargs = locals()
    kwargs.pop("self")
    return self._plot(kwargs)