Api global

effector.global_effect_ale.ALE(data, model, nof_instances=10000, axis_limits=None, feature_names=None, target_name=None)

Bases: ALEBase

Constructor for the ALE plot.

Definition

ALE is defined as: $$ \hat{f}^{ALE}(x_s) = TODO $$

The heterogeneity is: $$ TODO $$

The std of the bin-effects is: $$ TODO $$

Notes
  • The required parameters are data and model. The rest are optional.

Parameters:

Name Type Description Default
data ndarray

the design matrix

  • shape: (N,D)
required
model callable

the black-box model. Must be a Callable with:

  • input: ndarray of shape (N, D)
  • output: ndarray of shape (N, )
required
nof_instances Union[int, str]

the number of instances to use for the explanation

  • use an int, to specify the number of instances
  • use "all", to use all the instances
10000
axis_limits Optional[ndarray]

The limits of the feature effect plot along each axis

  • use a ndarray of shape (2, D), to specify them manually
  • use None, to be inferred from the data
None
feature_names Optional[List]

The names of the features

  • use a list of str, to specify the name manually. For example: ["age", "weight", ...]
  • use None, to keep the default names: ["x_0", "x_1", ...]
None
target_name Optional[str]

The name of the target variable

  • use a str, to specify its name manually. For example: "price"
  • use None, to keep the default name: "y"
None
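
As a rough illustration of the input contract in the table above, the following sketch builds a design matrix, a stand-in model and manual axis limits with the documented shapes. The model function and the random data are hypothetical and exist only for this example; reading row 0 of `axis_limits` as the lower bounds and row 1 as the upper bounds follows the check `axis_limits[0, feature] < axis_limits[1, feature]` in the `eval` source.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 3

# Design matrix of shape (N, D).
data = rng.uniform(-1.0, 1.0, size=(N, D))


def model(x: np.ndarray) -> np.ndarray:
    """Hypothetical black-box model: maps (N, D) inputs to (N,) outputs."""
    return x[:, 0] ** 2 + x[:, 0] * x[:, 1] + np.sin(x[:, 2])


# Manual axis limits of shape (2, D):
# row 0 holds the lower bound of each feature, row 1 the upper bound.
axis_limits = np.stack([data.min(axis=0), data.max(axis=0)])

# Optional names; these match the documented defaults.
feature_names = ["x_0", "x_1", "x_2"]
target_name = "y"

# With effector installed, these inputs fit the documented signature:
# ale = effector.global_effect_ale.ALE(
#     data, model, nof_instances="all", axis_limits=axis_limits,
#     feature_names=feature_names, target_name=target_name)
```

Passing `axis_limits=None` instead leaves the limits to be inferred from the data, as described in the table.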

Methods:

Name Description
fit

Fit the ALE plot.

eval

Evaluate the (RH)ALE feature effect of feature feature at points xs.

plot

Plot the (RH)ALE feature effect of feature feature.

Source code in effector/global_effect_ale.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    nof_instances: Union[int, str] = 10_000,
    axis_limits: Optional[np.ndarray] = None,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
):
    """
    Constructor for the ALE plot.

    Definition:
        ALE is defined as:
        $$
        \hat{f}^{ALE}(x_s) = TODO
        $$

        The heterogeneity is:
        $$
        TODO
        $$

        The std of the bin-effects is:
        $$
        TODO
        $$

    Notes:
        - The required parameters are `data` and `model`. The rest are optional.

    Args:
        data: the design matrix

            - shape: `(N,D)`
        model: the black-box model. Must be a `Callable` with:

            - input: `ndarray` of shape `(N, D)`
            - output: `ndarray` of shape `(N, )`

        nof_instances: the number of instances to use for the explanation

            - use an `int`, to specify the number of instances
            - use `"all"`, to use all the instances

        axis_limits: The limits of the feature effect plot along each axis

            - use a `ndarray` of shape `(2, D)`, to specify them manually
            - use `None`, to be inferred from the data

        feature_names: The names of the features

            - use a `list` of `str`, to specify the name manually. For example: `["age", "weight", ...]`
            - use `None`, to keep the default names: `["x_0", "x_1", ...]`

        target_name: The name of the target variable

            - use a `str`, to specify its name manually. For example: `"price"`
            - use `None`, to keep the default name: `"y"`
    """
    self.bin_limits = {}
    self.data_effect_ale = {}
    super(ALE, self).__init__(
        data,
        model,
        None,
        None,
        nof_instances,
        axis_limits,
        feature_names,
        target_name,
        "ALE",
    )

fit(features='all', binning_method='fixed', centering=True, points_for_centering=30)

Fit the ALE plot.

Parameters:

Name Type Description Default
features Union[int, str, list]

the features to fit. If set to "all", all the features will be fitted.

'all'
binning_method Union[str, Fixed]
  • If set to "fixed", the ALE plot is computed with the default values: 20 bins, at least 10 points per bin, and a feature is treated as categorical if it has fewer than 15 unique values.
  • If you want to change the parameters of the method, you pass an instance of the class effector.binning_methods.Fixed with the desired parameters. For example: Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)
'fixed'
centering Union[bool, str]

whether to compute the normalization constant for centering the plot:

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
True
points_for_centering int

the number of points to use for centering the plot. Default is 30.

30
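
To make the "fixed" binning option more concrete, here is a minimal sketch of the generic first-order ALE estimator on equal-width bins. It is not effector's implementation: the function name `ale_fixed_bins` is hypothetical, the minimum-points-per-bin and categorical-feature rules described above are ignored, and empty bins simply contribute zero effect.

```python
import numpy as np


def ale_fixed_bins(data, model, feature, nof_bins=20):
    """Uncentered first-order ALE on equal-width bins (generic sketch)."""
    x = data[:, feature]
    edges = np.linspace(x.min(), x.max(), nof_bins + 1)
    # Bin index of every instance, in 0..nof_bins-1
    # (np.digitize on the inner edges puts the maximum in the last bin).
    idx = np.digitize(x, edges[1:-1])

    bin_effects = np.zeros(nof_bins)
    for k in range(nof_bins):
        pts = data[idx == k]
        if len(pts) == 0:
            continue  # empty bin: no effect in this sketch
        lo, hi = pts.copy(), pts.copy()
        lo[:, feature] = edges[k]
        hi[:, feature] = edges[k + 1]
        # Average change of the model output across the bin.
        bin_effects[k] = np.mean(model(hi) - model(lo))

    # Accumulate the bin effects; ale[j] is the effect at edges[j].
    ale = np.concatenate([[0.0], np.cumsum(bin_effects)])
    return edges, ale
```

For a model that is linear in the chosen feature, each bin effect equals the slope times the bin width, so the accumulated curve is a straight line starting at zero.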
Source code in effector/global_effect_ale.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    binning_method: typing.Union[str, ap.Fixed] = "fixed",
    centering: typing.Union[bool, str] = True,
    points_for_centering: int = 30
) -> None:
    """Fit the ALE plot.

    Args:
        features: the features to fit. If set to "all", all the features will be fitted.

        binning_method:

            - If set to `"fixed"`, the ALE plot will be computed with the  default values, which are
            `20` bins with at least `10` points per bin and the feature is considered as categorical if it has
            less than `15` unique values.
            - If you want to change the parameters of the method, you pass an instance of the
            class `effector.binning_methods.Fixed` with the desired parameters.
            For example: `Fixed(nof_bins=20, min_points_per_bin=0, cat_limit=10)`

        centering: whether to compute the normalization constant for centering the plot:

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.

        points_for_centering: the number of points to use for centering the plot. Default is 30.
    """
    assert binning_method == "fixed" or isinstance(
        binning_method, ap.Fixed
    ), "ALE can work only with the fixed binning method!"

    self._fit_loop(features, binning_method, centering, points_for_centering)

eval(feature, xs, heterogeneity=False, centering=True, **kwargs)

Evaluate the (RH)ALE feature effect of feature feature at points xs.

Notes

This is a common method inherited by both ALE and RHALE.

Parameters:

Name Type Description Default
feature int

index of feature of interest

required
xs ndarray

the points along the s-th axis to evaluate the FE plot - np.ndarray of shape (T, )

required
heterogeneity bool

whether to return heterogeneity:

  • False, returns the mean effect y at the given xs
  • True, returns a tuple (y, H) of two ndarrays; y is the mean effect and H is the heterogeneity evaluated at xs
False
centering Union[bool, str]

whether to center the plot:

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
True

Returns: the mean effect y, if heterogeneity=False (default) or a tuple (y, std) otherwise
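
The three `centering` options can be read as follows. This is a sketch under the assumption that `zero_integral` subtracts the average of the effect over an equally spaced grid (the role `norm_const` plays in the source below) and that `zero_start` subtracts the value at the first grid point; the helper name `center` is hypothetical.

```python
import numpy as np


def center(y, centering=True):
    """Center effect values y evaluated on an equally spaced grid (sketch)."""
    if centering is False:
        return y  # no centering
    if centering is True or centering == "zero_integral":
        # The grid average approximates the integral of the effect.
        return y - np.mean(y)
    if centering == "zero_start":
        # Shift the curve so that it starts at y = 0.
        return y - y[0]
    raise ValueError(f"unknown centering option: {centering!r}")
```

For example, the values `[1, 2, 3]` become `[-1, 0, 1]` under `zero_integral` and `[0, 1, 2]` under `zero_start`.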

Source code in effector/global_effect_ale.py
def eval(
    self,
    feature: int,
    xs: np.ndarray,
    heterogeneity: bool = False,
    centering: typing.Union[bool, str] = True,
    **kwargs
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
    """Evaluate the (RH)ALE feature effect of feature `feature` at points `xs`.

    Notes:
        This is a common method inherited by both ALE and RHALE.

    Args:
        feature: index of feature of interest
        xs: the points along the s-th axis to evaluate the FE plot
          - `np.ndarray` of shape `(T, )`
        heterogeneity: whether to return heterogeneity:

              - `False`, returns the mean effect `y` at the given `xs`
              - `True`, returns a tuple `(y, H)` of two `ndarrays`; `y` is the mean effect and `H` is the
              heterogeneity evaluated at `xs`

        centering: whether to center the plot:

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.
    Returns:
        the mean effect `y`, if `heterogeneity=False` (default) or a tuple `(y, std)` otherwise

    """
    centering = helpers.prep_centering(centering)

    if self.requires_refit(feature, centering):
        self.fit(features=feature, centering=centering)

    # Check if the lower bound is less than the upper bound
    assert self.axis_limits[0, feature] < self.axis_limits[1, feature]

    # Evaluate the feature
    yy = self._eval_unnorm(feature, xs, heterogeneity=heterogeneity)
    y, std = yy if heterogeneity else (yy, None)

    # Center if asked
    y = (
        y - self.feature_effect["feature_" + str(feature)]["norm_const"]
        if centering
        else y
    )

    return (y, std) if heterogeneity is not False else y

plot(feature, heterogeneity=True, centering=True, scale_x=None, scale_y=None, show_avg_output=False, y_limits=None, dy_limits=None, show_only_aggregated=False)

Plot the (RH)ALE feature effect of feature feature.

Notes

This is a common method inherited by both ALE and RHALE.

Parameters:

Name Type Description Default
feature int

the feature to plot

required
heterogeneity bool

whether to plot the heterogeneity

  • False, plots only the mean effect
  • True, the std of the bin-effects will be plotted using a red vertical bar
True
centering Union[bool, str]

whether to center the plot:

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
True
scale_x Optional[dict]

None or Dict with keys ['std', 'mean']

  • If set to None, no scaling will be applied.
  • If set to a dict, the x-axis will be scaled by the standard deviation and the mean.
None
scale_y Optional[dict]

None or Dict with keys ['std', 'mean']

  • If set to None, no scaling will be applied.
  • If set to a dict, the y-axis will be scaled by the standard deviation and the mean.
None
show_avg_output bool

if True, the average output will be shown as a horizontal line.

False
y_limits Optional[List]

None or tuple, the limits of the y-axis

  • If set to None, the limits of the y-axis are set automatically
  • If set to a tuple, the limits are manually set
None
dy_limits Optional[List]

None or tuple, the limits of the dy-axis

  • If set to None, the limits of the dy-axis are set automatically
  • If set to a tuple, the limits are manually set
None
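
The `scale_x` and `scale_y` dictionaries are typically useful when the model was trained on standardized data and the plot should show original units. The sketch below assumes the axis values are mapped back as `value * std + mean`; that mapping and the helper name `unscale` are assumptions made for illustration, not something the text above confirms.

```python
import numpy as np

# Hypothetical raw feature values, e.g. ages in years.
raw = np.array([20.0, 35.0, 50.0, 65.0])

# Statistics used to standardize the feature before training.
scale_x = {"mean": raw.mean(), "std": raw.std()}
standardized = (raw - scale_x["mean"]) / scale_x["std"]


def unscale(values, scale):
    """Map standardized values back to original units; None means no scaling."""
    if scale is None:
        return values
    return values * scale["std"] + scale["mean"]
```

Here `unscale(standardized, scale_x)` recovers the raw values, while passing `None` returns the input unchanged.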
Source code in effector/global_effect_ale.py
def plot(
    self,
    feature: int,
    heterogeneity: bool = True,
    centering: Union[bool, str] = True,
    scale_x: Optional[dict] = None,
    scale_y: Optional[dict] = None,
    show_avg_output: bool = False,
    y_limits: Optional[List] = None,
    dy_limits: Optional[List] = None,
    show_only_aggregated: bool = False,
):
    """
    Plot the (RH)ALE feature effect of feature `feature`.

    Notes:
        This is a common method inherited by both ALE and RHALE.

    Parameters:
        feature: the feature to plot
        heterogeneity: whether to plot the heterogeneity

              - `False`, plots only the mean effect
              - `True`, the std of the bin-effects will be plotted using a red vertical bar

        centering: whether to center the plot:

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.

        scale_x: None or Dict with keys ['std', 'mean']

            - If set to None, no scaling will be applied.
            - If set to a dict, the x-axis will be scaled by the standard deviation and the mean.
        scale_y: None or Dict with keys ['std', 'mean']

            - If set to None, no scaling will be applied.
            - If set to a dict, the y-axis will be scaled by the standard deviation and the mean.
        show_avg_output: if True, the average output will be shown as a horizontal line.
        y_limits: None or tuple, the limits of the y-axis

            - If set to None, the limits of the y-axis are set automatically
            - If set to a tuple, the limits are manually set

        dy_limits: None or tuple, the limits of the dy-axis

            - If set to None, the limits of the dy-axis are set automatically
            - If set to a tuple, the limits are manually set
    """
    heterogeneity = helpers.prep_confidence_interval(heterogeneity)
    centering = helpers.prep_centering(centering)

    # hack to fit the feature if not fitted
    self.eval(
        feature, np.array([self.axis_limits[0, feature]]), centering=centering
    )

    if show_avg_output:
        avg_output = helpers.prep_avg_output(
            self.data, self.model, self.avg_output, scale_y
        )
    else:
        avg_output = None

    vis.ale_plot(
        self.feature_effect["feature_" + str(feature)],
        self.eval,
        feature,
        centering=centering,
        error=heterogeneity,
        scale_x=scale_x,
        scale_y=scale_y,
        title=self.method_name.upper(),
        avg_output=avg_output,
        feature_names=self.feature_names,
        target_name=self.target_name,
        y_limits=y_limits,
        dy_limits=dy_limits,
        show_only_aggregated=show_only_aggregated,
    )

effector.global_effect_ale.RHALE(data, model, model_jac=None, nof_instances=10000, axis_limits=None, data_effect=None, feature_names=None, target_name=None)

Bases: ALEBase

Constructor for RHALE.

Definition

RHALE is defined as: $$ \hat{f}^{RHALE}(x_s) = TODO $$

The heterogeneity is: $$ TODO $$

The std of the bin-effects is: $$ TODO $$

Notes

The required parameters are data and model. The rest are optional.

Parameters:

Name Type Description Default
data ndarray

the design matrix

  • shape: (N,D)
required
model callable

the black-box model. Must be a Callable with:

  • input: ndarray of shape (N, D)
  • output: ndarray of shape (N, )
required
model_jac Union[None, callable]

the Jacobian of the model. Must be a Callable with:

  • input: ndarray of shape (N, D)
  • output: ndarray of shape (N, D)
None
nof_instances Union[int, str]

the number of instances to use for the explanation

  • use an int, to specify the number of instances
  • use "all", to use all the instances
10000
axis_limits Optional[ndarray]

The limits of the feature effect plot along each axis

  • use a ndarray of shape (2, D), to specify them manually
  • use None, to be inferred from the data
None
data_effect Optional[ndarray]
  • if np.ndarray, the model Jacobian computed on the data
  • if None, the Jacobian will be computed using model_jac
None
feature_names Optional[list]

The names of the features

  • use a list of str, to specify the name manually. For example: ["age", "weight", ...]
  • use None, to keep the default names: ["x_0", "x_1", ...]
None
target_name Optional[str]

The name of the target variable

  • use a str, to specify its name manually. For example: "price"
  • use None, to keep the default name: "y"
None
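
A possible way to satisfy the `model_jac` contract is sketched below: an analytic Jacobian that returns one column per feature, checked against central finite differences. The model, the helper `numerical_jac` and the random data are hypothetical and only illustrate the documented shapes.

```python
import numpy as np


def model(x):
    """Hypothetical model: (N, 2) inputs to (N,) outputs."""
    return x[:, 0] ** 2 + x[:, 0] * x[:, 1]


def model_jac(x):
    """Analytic Jacobian: column j holds d model / d x_j, shape (N, D)."""
    return np.stack([2 * x[:, 0] + x[:, 1], x[:, 0]], axis=1)


def numerical_jac(f, x, eps=1e-6):
    """Central finite differences, used here only as a sanity check."""
    jac = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        step = np.zeros(x.shape[1])
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))

# The Jacobian evaluated on the data has the shape expected of data_effect.
data_effect = model_jac(data)
```

Precomputing `data_effect` this way can be convenient when evaluating the Jacobian is expensive and several explanations are built on the same data.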

Methods:

Name Description
fit

Fit the model.

eval

Evaluate the (RH)ALE feature effect of feature feature at points xs.

plot

Plot the (RH)ALE feature effect of feature feature.

Source code in effector/global_effect_ale.py
def __init__(
    self,
    data: np.ndarray,
    model: callable,
    model_jac: typing.Union[None, callable] = None,
    nof_instances: typing.Union[int, str] = 10_000,
    axis_limits: typing.Optional[np.ndarray] = None,
    data_effect: typing.Optional[np.ndarray] = None,
    feature_names: typing.Optional[list] = None,
    target_name: typing.Optional[str] = None,
):
    """
    Constructor for RHALE.

    Definition:
        RHALE is defined as:
        $$
        \hat{f}^{RHALE}(x_s) = TODO
        $$

        The heterogeneity is:
        $$
        TODO
        $$

        The std of the bin-effects is:
        $$
        TODO
        $$

    Notes:
        The required parameters are `data` and `model`. The rest are optional.

    Args:
        data: the design matrix

            - shape: `(N,D)`
        model: the black-box model. Must be a `Callable` with:

            - input: `ndarray` of shape `(N, D)`
            - output: `ndarray` of shape `(N, )`

        model_jac: the Jacobian of the model. Must be a `Callable` with:

            - input: `ndarray` of shape `(N, D)`
            - output: `ndarray` of shape `(N, D)`

        nof_instances: the number of instances to use for the explanation

            - use an `int`, to specify the number of instances
            - use `"all"`, to use all the instances

        axis_limits: The limits of the feature effect plot along each axis

            - use a `ndarray` of shape `(2, D)`, to specify them manually
            - use `None`, to be inferred from the data

        data_effect:
            - if np.ndarray, the model Jacobian computed on the `data`
            - if None, the Jacobian will be computed using model_jac

        feature_names: The names of the features

            - use a `list` of `str`, to specify the name manually. For example: `["age", "weight", ...]`
            - use `None`, to keep the default names: `["x_0", "x_1", ...]`

        target_name: The name of the target variable

            - use a `str`, to specify its name manually. For example: `"price"`
            - use `None`, to keep the default name: `"y"`
    """
    super(RHALE, self).__init__(
        data,
        model,
        model_jac,
        data_effect,
        nof_instances,
        axis_limits,
        feature_names,
        target_name,
        "RHALE",
    )

fit(features='all', binning_method='greedy', centering=True, points_for_centering=30)

Fit the model.

Parameters:

Name Type Description Default
features (int, str, list)

the features to fit.

  • If set to "all", all the features will be fitted.
'all'
binning_method str

the binning method to use.

  • Use "greedy" for using the Greedy binning solution with the default parameters. For custom parameters initialize a binning_methods.Greedy object
  • Use "dp" for using a Dynamic Programming binning solution with the default parameters. For custom parameters initialize a binning_methods.DynamicProgramming object
  • Use "fixed" for using a Fixed binning solution with the default parameters. For custom parameters initialize a binning_methods.Fixed object
'greedy'
centering Union[bool, str]

whether to compute the normalization constant for centering the plot:

  • False means no centering
  • True or zero_integral centers around the y axis
  • zero_start starts the plot from y=0
True
points_for_centering int

the number of points to use for centering the plot. Default is 30.

30
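
The role of the bins in a derivative-based accumulated effect can be sketched as follows. For simplicity the sketch uses equal-width bins rather than the greedy or dynamic-programming solutions; it is an illustration of the general idea (mean derivative per bin, accumulated over bin widths, with the per-bin standard deviation as a spread measure), not effector's implementation, and `rhale_fixed_bins` is a hypothetical name.

```python
import numpy as np


def rhale_fixed_bins(x, dfdx, nof_bins=20):
    """Accumulate per-bin mean derivatives on equal-width bins (sketch).

    x:    feature values, shape (N,)
    dfdx: derivative of the model w.r.t. that feature at each instance, shape (N,)
    """
    edges = np.linspace(x.min(), x.max(), nof_bins + 1)
    idx = np.digitize(x, edges[1:-1])  # bin index in 0..nof_bins-1

    mu = np.zeros(nof_bins)     # mean derivative per bin
    sigma = np.zeros(nof_bins)  # std of the derivatives per bin
    for k in range(nof_bins):
        d = dfdx[idx == k]
        if d.size:
            mu[k] = d.mean()
            sigma[k] = d.std()

    # effect[j] is the uncentered accumulated effect at edges[j].
    effect = np.concatenate([[0.0], np.cumsum(mu * np.diff(edges))])
    return edges, effect, sigma
```

A constant derivative yields a straight line with zero spread in every bin, whereas a derivative that depends on other features produces a non-zero per-bin standard deviation.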
Source code in effector/global_effect_ale.py
def fit(
    self,
    features: typing.Union[int, str, list] = "all",
    binning_method: typing.Union[
        str, ap.DynamicProgramming, ap.Greedy, ap.Fixed
    ] = "greedy",
    centering: typing.Union[bool, str] = True,
    points_for_centering: int = 30
) -> None:
    """Fit the model.

    Args:
        features (int, str, list): the features to fit.

            - If set to "all", all the features will be fitted.

        binning_method (str): the binning method to use.

            - Use `"greedy"` for using the Greedy binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.Greedy` object
            - Use `"dp"` for using a Dynamic Programming binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.DynamicProgramming` object
            - Use `"fixed"` for using a Fixed binning solution with the default parameters.
              For custom parameters initialize a `binning_methods.Fixed` object

        centering: whether to compute the normalization constant for centering the plot:

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis
            - `zero_start` starts the plot from `y=0`

        points_for_centering: the number of points to use for centering the plot. Default is 30.
    """
    assert (
        binning_method in ["greedy", "dynamic", "fixed"]
        or isinstance(binning_method, ap.Greedy)
        or isinstance(binning_method, ap.DynamicProgramming)
        or isinstance(binning_method, ap.Fixed)
    ), "Unknown binning method!"

    self._fit_loop(features, binning_method, centering, points_for_centering)

eval(feature, xs, heterogeneity=False, centering=True, **kwargs)

Evaluate the (RH)ALE feature effect of feature feature at points xs.

Notes

This is a common method inherited by both ALE and RHALE.

Parameters:

Name Type Description Default
feature int

index of feature of interest

required
xs ndarray

the points along the s-th axis to evaluate the FE plot - np.ndarray of shape (T, )

required
heterogeneity bool

whether to return heterogeneity:

  • False, returns the mean effect y at the given xs
  • True, returns a tuple (y, H) of two ndarrays; y is the mean effect and H is the heterogeneity evaluated at xs
False
centering Union[bool, str]

whether to center the plot:

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
True

Returns: the mean effect y, if heterogeneity=False (default) or a tuple (y, std) otherwise

Source code in effector/global_effect_ale.py
def eval(
    self,
    feature: int,
    xs: np.ndarray,
    heterogeneity: bool = False,
    centering: typing.Union[bool, str] = True,
    **kwargs
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
    """Evaluate the (RH)ALE feature effect of feature `feature` at points `xs`.

    Notes:
        This is a common method inherited by both ALE and RHALE.

    Args:
        feature: index of feature of interest
        xs: the points along the s-th axis to evaluate the FE plot
          - `np.ndarray` of shape `(T, )`
        heterogeneity: whether to return heterogeneity:

              - `False`, returns the mean effect `y` at the given `xs`
              - `True`, returns a tuple `(y, H)` of two `ndarrays`; `y` is the mean effect and `H` is the
              heterogeneity evaluated at `xs`

        centering: whether to center the plot:

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.
    Returns:
        the mean effect `y`, if `heterogeneity=False` (default) or a tuple `(y, std)` otherwise

    """
    centering = helpers.prep_centering(centering)

    if self.requires_refit(feature, centering):
        self.fit(features=feature, centering=centering)

    # Check if the lower bound is less than the upper bound
    assert self.axis_limits[0, feature] < self.axis_limits[1, feature]

    # Evaluate the feature
    yy = self._eval_unnorm(feature, xs, heterogeneity=heterogeneity)
    y, std = yy if heterogeneity else (yy, None)

    # Center if asked
    y = (
        y - self.feature_effect["feature_" + str(feature)]["norm_const"]
        if centering
        else y
    )

    return (y, std) if heterogeneity is not False else y

plot(feature, heterogeneity=True, centering=True, scale_x=None, scale_y=None, show_avg_output=False, y_limits=None, dy_limits=None, show_only_aggregated=False)

Plot the (RH)ALE feature effect of feature feature.

Notes

This is a common method inherited by both ALE and RHALE.

Parameters:

Name Type Description Default
feature int

the feature to plot

required
heterogeneity bool

whether to plot the heterogeneity

  • False, plots only the mean effect
  • True, the std of the bin-effects will be plotted using a red vertical bar
True
centering Union[bool, str]

whether to center the plot:

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
True
scale_x Optional[dict]

None or Dict with keys ['std', 'mean']

  • If set to None, no scaling will be applied.
  • If set to a dict, the x-axis will be scaled by the standard deviation and the mean.
None
scale_y Optional[dict]

None or Dict with keys ['std', 'mean']

  • If set to None, no scaling will be applied.
  • If set to a dict, the y-axis will be scaled by the standard deviation and the mean.
None
show_avg_output bool

if True, the average output will be shown as a horizontal line.

False
y_limits Optional[List]

None or tuple, the limits of the y-axis

  • If set to None, the limits of the y-axis are set automatically
  • If set to a tuple, the limits are manually set
None
dy_limits Optional[List]

None or tuple, the limits of the dy-axis

  • If set to None, the limits of the dy-axis are set automatically
  • If set to a tuple, the limits are manually set
None
Source code in effector/global_effect_ale.py
def plot(
    self,
    feature: int,
    heterogeneity: bool = True,
    centering: Union[bool, str] = True,
    scale_x: Optional[dict] = None,
    scale_y: Optional[dict] = None,
    show_avg_output: bool = False,
    y_limits: Optional[List] = None,
    dy_limits: Optional[List] = None,
    show_only_aggregated: bool = False,
):
    """
    Plot the (RH)ALE feature effect of feature `feature`.

    Notes:
        This is a common method inherited by both ALE and RHALE.

    Parameters:
        feature: the feature to plot
        heterogeneity: whether to plot the heterogeneity

              - `False`, plots only the mean effect
              - `True`, the std of the bin-effects will be plotted using a red vertical bar

        centering: whether to center the plot:

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.

        scale_x: None or Dict with keys ['std', 'mean']

            - If set to None, no scaling will be applied.
            - If set to a dict, the x-axis will be scaled by the standard deviation and the mean.
        scale_y: None or Dict with keys ['std', 'mean']

            - If set to None, no scaling will be applied.
            - If set to a dict, the y-axis will be scaled by the standard deviation and the mean.
        show_avg_output: if True, the average output will be shown as a horizontal line.
        y_limits: None or tuple, the limits of the y-axis

            - If set to None, the limits of the y-axis are set automatically
            - If set to a tuple, the limits are manually set

        dy_limits: None or tuple, the limits of the dy-axis

            - If set to None, the limits of the dy-axis are set automatically
            - If set to a tuple, the limits are manually set
    """
    heterogeneity = helpers.prep_confidence_interval(heterogeneity)
    centering = helpers.prep_centering(centering)

    # hack to fit the feature if not fitted
    self.eval(
        feature, np.array([self.axis_limits[0, feature]]), centering=centering
    )

    if show_avg_output:
        avg_output = helpers.prep_avg_output(
            self.data, self.model, self.avg_output, scale_y
        )
    else:
        avg_output = None

    vis.ale_plot(
        self.feature_effect["feature_" + str(feature)],
        self.eval,
        feature,
        centering=centering,
        error=heterogeneity,
        scale_x=scale_x,
        scale_y=scale_y,
        title=self.method_name.upper(),
        avg_output=avg_output,
        feature_names=self.feature_names,
        target_name=self.target_name,
        y_limits=y_limits,
        dy_limits=dy_limits,
        show_only_aggregated=show_only_aggregated,
    )

effector.global_effect_pdp.PDP(data, model, axis_limits=None, nof_instances=10000, feature_names=None, target_name=None)

Bases: PDPBase

Constructor of the PDP class.

Definition

PDP: $$ PDP(x_s) = {1 \over N} \sum_{i=1}^N f(x_s, \mathbf{x}_c^i) $$

centered-PDP: $$ PDP_c(x_s) = PDP(x_s) - c, \quad c = {1 \over M} \sum_{j=1}^M PDP(x_s^j) $$

ICE: $$ ICE^i(x_s) = f(x_s, \mathbf{x}_c^i), \quad i=1, \dots, N $$

centered-ICE: $$ ICE_c^i(x_s) = ICE^i(x_s) - c_i, \quad c_i = {1 \over M} \sum_{j=1}^M ICE^i(x_s^j) $$

heterogeneity function: $$ h(x_s) = {1 \over N} \sum_{i=1}^N ( ICE_c^i(x_s) - PDP_c(x_s) )^2 $$

The heterogeneity value is: $$ \mathcal{H}(x_s) = {1 \over M} \sum_{j=1}^M h(x_s^j), $$ where \(x_s^j\) are an equally spaced grid of points in \([x_s^{\min}, x_s^{\max}]\).
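
The definitions above translate almost line by line into NumPy. The following is a minimal sketch, assuming an equally spaced grid of M points between the minimum and maximum of the feature; the function names are hypothetical and this is not the library's code.

```python
import numpy as np


def ice_curves(data, model, feature, grid):
    """ICE^i(x_s) for every instance i; returns an array of shape (N, M)."""
    curves = np.empty((data.shape[0], grid.size))
    for j, value in enumerate(grid):
        x = data.copy()
        x[:, feature] = value  # fix x_s, keep the remaining features x_c^i
        curves[:, j] = model(x)
    return curves


def pdp_with_heterogeneity(data, model, feature, nof_points=30):
    """Centered PDP, centered ICE, h(x_s) on the grid and the value H."""
    x_s = data[:, feature]
    grid = np.linspace(x_s.min(), x_s.max(), nof_points)

    ice = ice_curves(data, model, feature, grid)
    pdp = ice.mean(axis=0)                          # PDP(x_s)
    ice_c = ice - ice.mean(axis=1, keepdims=True)   # subtract c_i per curve
    pdp_c = pdp - pdp.mean()                        # subtract c
    h = np.mean((ice_c - pdp_c) ** 2, axis=0)       # heterogeneity function
    H = h.mean()                                    # heterogeneity value
    return grid, pdp_c, ice_c, h, H
```

For an additive model all centered ICE curves coincide with the centered PDP, so the heterogeneity is zero; an interaction term such as `x_0 * x_1` makes the curves fan out and the heterogeneity becomes positive.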

Notes

The required parameters are data and model. The rest are optional.

Parameters:

Name Type Description Default
data ndarray

the design matrix

  • shape: (N,D)
required
model Callable

the black-box model. Must be a Callable with:

  • input: ndarray of shape (N, D)
  • output: ndarray of shape (N,)
required
axis_limits Optional[ndarray]

The limits of the feature effect plot along each axis

  • use a ndarray of shape (2, D), to specify them manually
  • use None, to be inferred from the data
None
nof_instances Union[int, str]

maximum number of instances to be used

  • use "all", for using all instances.
  • use an int, for selecting nof_instances instances randomly.
10000
feature_names Optional[List]

The names of the features

  • use a list of str, to specify the name manually. For example: ["age", "weight", ...]
  • use None, to keep the default names: ["x_0", "x_1", ...]
None
target_name Optional[str]

The name of the target variable

  • use a str, to specify its name manually. For example: "price"
  • use None, to keep the default name: "y"
None
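
The effect of `nof_instances` can be pictured as a random subsample of the rows of `data`. The sketch below is an assumption about the behaviour described in the table (sampling without replacement, and using everything when fewer rows are available); the helper `subsample` and its `seed` argument are hypothetical.

```python
import numpy as np


def subsample(data, nof_instances="all", seed=None):
    """Return at most nof_instances randomly chosen rows of data (sketch)."""
    n = data.shape[0]
    if nof_instances == "all" or nof_instances >= n:
        return data  # use every instance
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=nof_instances, replace=False)
    return data[idx]
```

Using fewer instances mainly trades accuracy of the averages for speed, since each grid point requires one model evaluation per retained instance.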

Methods:

Name Description
fit

Fit the Feature effect to the data.

eval

Evaluate the effect of the s-th feature at positions xs.

plot

Plot the feature effect.

Source code in effector/global_effect_pdp.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    axis_limits: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 10_000,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
):
    """
    Constructor of the PDP class.

    Definition:
        PDP:
        $$
        PDP(x_s) = {1 \over N} \sum_{i=1}^N f(x_s, \mathbf{x}_c^i)
        $$

        centered-PDP:
        $$
        PDP_c(x_s) = PDP(x_s) - c, \quad c = {1 \over M} \sum_{j=1}^M PDP(x_s^j)
        $$

        ICE:
        $$
        ICE^i(x_s) = f(x_s, \mathbf{x}_c^i), \quad i=1, \dots, N
        $$

        centered-ICE:
        $$
        ICE_c^i(x_s) = ICE^i(x_s) - c_i, \quad c_i = {1 \over M} \sum_{j=1}^M ICE^i(x_s^j)
        $$

        heterogeneity function:
        $$
        h(x_s) = {1 \over N} \sum_{i=1}^N ( ICE_c^i(x_s) - PDP_c(x_s) )^2
        $$

        The heterogeneity value is:
        $$
        \mathcal{H}(x_s) = {1 \over M} \sum_{j=1}^M h(x_s^j),
        $$
        where $x_s^j$ are an equally spaced grid of points in $[x_s^{\min}, x_s^{\max}]$.

    Notes:
        The required parameters are `data` and `model`. The rest are optional.

    Args:
        data: the design matrix

            - shape: `(N,D)`
        model: the black-box model. Must be a `Callable` with:

            - input: `ndarray` of shape `(N, D)`
            - output: `ndarray` of shape `(N,)`

        axis_limits: The limits of the feature effect plot along each axis

            - use a `ndarray` of shape `(2, D)`, to specify them manually
            - use `None`, to be inferred from the data

        nof_instances: maximum number of instances to be used

            - use "all", for using all instances.
            - use an `int`, for selecting `nof_instances` instances randomly.

        feature_names: The names of the features

            - use a `list` of `str`, to specify the name manually. For example: `["age", "weight", ...]`
            - use `None`, to keep the default names: `["x_0", "x_1", ...]`

        target_name: The name of the target variable

            - use a `str`, to specify its name manually. For example: `"price"`
            - use `None`, to keep the default name: `"y"`
    """

    super(PDP, self).__init__(
        data,
        model,
        None,
        axis_limits,
        nof_instances,
        feature_names,
        target_name,
        method_name="PDP",
    )

fit(features='all', centering=False, points_for_centering=30, use_vectorized=True)

Fit the Feature effect to the data.

Notes

Calling .fit explicitly is optional. .fit only computes the normalization constants used to center the PDP and ICE plots, and .eval and .plot run it automatically when needed.

Parameters:

Name Type Description Default
features Union[int, str, list]

the features to fit.

  • If set to "all", all the features will be fitted.
'all'
centering Union[bool, str]

whether to center the plot:

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
False
points_for_centering int

number of linspaced points along the feature axis used for centering.

30
use_vectorized bool

whether to use vectorized operations for the PDP and ICE curves

True
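The effect of the `centering` options on a single curve can be sketched as follows. This is an illustration under assumptions, not the library's code: `center_curve_sketch` is a hypothetical helper, `zero_integral` is read as subtracting the grid average (the constant \(c\) in the centered-PDP definition), and `zero_start` is read as subtracting the first grid value.

```python
import numpy as np


def center_curve_sketch(ys, centering=False):
    """Illustrative only: apply a centering mode to a curve sampled on a grid."""
    ys = np.asarray(ys, dtype=float)
    if centering is True:          # True is treated as "zero_integral"
        centering = "zero_integral"
    if centering is False:
        return ys
    if centering == "zero_integral":
        return ys - ys.mean()      # curve averages to zero over the grid
    if centering == "zero_start":
        return ys - ys[0]          # curve starts at y = 0
    raise ValueError(f"unknown centering mode: {centering!r}")


curve = np.array([2.0, 3.0, 7.0])
centered_integral = center_curve_sketch(curve, True)        # [-2., -1., 3.]
centered_start = center_curve_sketch(curve, "zero_start")   # [0., 1., 5.]
```

In the class itself one such constant is kept per ICE curve (the `norm_const` entry used in `eval`), so every ICE curve is shifted by its own offset.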
Source code in effector/global_effect_pdp.py
def fit(
    self,
    features: Union[int, str, list] = "all",
    centering: Union[bool, str] = False,
    points_for_centering: int = 30,
    use_vectorized: bool = True,
):
    """
    Fit the Feature effect to the data.

    Notes:
        Calling `.fit` explicitly is optional. `.fit` only computes the normalization constants
        used to center the PDP and ICE plots, and `.eval` and `.plot` run it automatically when needed.

    Args:
        features: the features to fit.
            - If set to "all", all the features will be fitted.

        centering: whether to center the plot:

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.

        points_for_centering: number of linspaced points along the feature axis used for centering.
        use_vectorized: whether to use vectorized operations for the PDP and ICE curves

    """
    centering = helpers.prep_centering(centering)
    features = helpers.prep_features(features, self.dim)

    for s in features:
        self.feature_effect["feature_" + str(s)] = self._fit_feature(
            s, centering, points_for_centering, use_vectorized
        )
        self.is_fitted[s] = True
        self.fit_args["feature_" + str(s)] = {
            "centering": centering,
            "points_for_centering": points_for_centering,
        }

eval(feature, xs, heterogeneity=False, centering=False, return_all=False, use_vectorized=True)

Evaluate the effect of the s-th feature at positions xs.

Parameters:

Name Type Description Default
feature int

index of feature of interest

required
xs ndarray

the points along the s-th axis to evaluate the FE plot

  • np.ndarray of shape (T, )
required
heterogeneity bool

whether to return the heterogeneity measures.

  • If heterogeneity=False, the function returns the mean effect at the given xs
  • If heterogeneity=True, the function returns (y, var) where y is the mean effect and var is the variance of the ICE curves around it
False
centering Union[bool, str]

whether to center the PDP

  • If centering is False, the PDP is not centered
  • If centering is True or zero_integral, the PDP is centered around the y axis.
  • If centering is zero_start, the PDP starts from y=0.
False
return_all bool

whether to return PDP and ICE plots evaluated at xs

  • If return_all=False, the function returns the mean effect at the given xs
  • If return_all=True, the function returns a ndarray of shape (T, N) with the N ICE plots evaluated at xs
False
use_vectorized bool

whether to use the vectorized version of the computation

True

Returns:

Type Description
Union[ndarray, Tuple[ndarray, ndarray]]

the mean effect y if heterogeneity=False (default), or a tuple (y, var) otherwise
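The `(T, N)` array of ICE values and the `use_vectorized` flag can be illustrated with two equivalent ways of building that array. Both functions below are hypothetical sketches (the names `ice_loop` and `ice_vectorized` are not part of effector), and what the flag actually toggles internally is an assumption: one model call per grid point versus a single call on a stacked matrix.

```python
import numpy as np


def ice_loop(model, data, xs, s):
    """One model call per grid point; returns an array of shape (T, N)."""
    out = np.empty((len(xs), data.shape[0]))
    for t, x in enumerate(xs):
        x_mod = data.copy()
        x_mod[:, s] = x
        out[t] = model(x_mod)
    return out


def ice_vectorized(model, data, xs, s):
    """A single model call on a stacked (T * N, D) matrix; same (T, N) result."""
    T, N = len(xs), data.shape[0]
    stacked = np.tile(data, (T, 1))      # T copies of the data, stacked row-wise
    stacked[:, s] = np.repeat(xs, N)     # grid value t fills the t-th copy
    return model(stacked).reshape(T, N)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
xs = np.linspace(-1.0, 1.0, 5)
f = lambda x: np.sin(x[:, 0]) * x[:, 1] + x[:, 2]

y_ice = ice_vectorized(f, X, xs, s=0)    # shape (T, N) = (5, 50)
y_mean = y_ice.mean(axis=1)              # mean effect, shape (T,)
y_var = y_ice.var(axis=1)                # spread of the ICE curves, shape (T,)
```

The stacked version trades memory for fewer model calls: it materializes `T * N` rows at once, which may matter for large datasets or fine grids.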

Source code in effector/global_effect_pdp.py
def eval(
    self,
    feature: int,
    xs: np.ndarray,
    heterogeneity: bool = False,
    centering: typing.Union[bool, str] = False,
    return_all: bool = False,
    use_vectorized: bool = True,
) -> typing.Union[np.ndarray, typing.Tuple[np.ndarray, np.ndarray]]:
    """Evaluate the effect of the s-th feature at positions `xs`.

    Args:
        feature: index of feature of interest
        xs: the points along the s-th axis to evaluate the FE plot

          - `np.ndarray` of shape `(T, )`

        heterogeneity: whether to return the heterogeneity measures.

              - If `heterogeneity=False`, the function returns the mean effect at the given `xs`
              - If `heterogeneity=True`, the function returns `(y, var)` where `y` is the mean effect and `var` is the variance of the ICE curves around it

        centering: whether to center the PDP

            - If `centering` is `False`, the PDP is not centered
            - If `centering` is `True` or `zero_integral`, the PDP is centered around the `y` axis.
            - If `centering` is `zero_start`, the PDP starts from `y=0`.

        return_all: whether to return PDP and ICE plots evaluated at `xs`

            - If `return_all=False`, the function returns the mean effect at the given `xs`
            - If `return_all=True`, the function returns a `ndarray` of shape `(T, N)` with the `N` ICE plots evaluated at `xs`

        use_vectorized: whether to use the vectorized version of the computation

    Returns:
        the mean effect `y` if `heterogeneity=False` (default), or a tuple `(y, var)` otherwise

    """
    centering = helpers.prep_centering(centering)

    if self.requires_refit(feature, centering):
        self.fit(
            features=feature, centering=centering, use_vectorized=use_vectorized
        )

    # Check if the lower bound is less than the upper bound
    assert self.axis_limits[0, feature] < self.axis_limits[1, feature]

    # new implementation
    y_ice = self._predict(self.data, xs, feature, use_vectorized)
    if centering:
        norm_consts = np.expand_dims(
            self.feature_effect["feature_" + str(feature)]["norm_const"], axis=0
        )
        y_ice = y_ice - norm_consts

    y_mean = np.mean(y_ice, axis=1)

    if return_all:
        return y_ice

    if heterogeneity:
        y_var = np.var(y_ice, axis=1)
        return y_mean, y_var
    else:
        return y_mean

plot(feature, heterogeneity='ice', centering=True, nof_points=30, scale_x=None, scale_y=None, nof_ice='all', show_avg_output=False, y_limits=None, use_vectorized=True)

Plot the feature effect.

Parameters:

Name Type Description Default
feature int

the feature to plot

required
heterogeneity Union[bool, str]

whether to plot the heterogeneity

  • False, plot only the mean effect
  • True or std, plot the standard deviation of the ICE curves
  • ice, also plot the ICE curves
'ice'
centering Union[bool, str]

whether to center the plot

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
True
nof_points int

the grid size for the PDP plot

30
scale_x Optional[dict]

None or Dict with keys ['std', 'mean']

  • If set to None, no scaling will be applied.
  • If set to a dict, the x-axis will be scaled x = (x + mean) * std
None
scale_y Optional[dict]

None or Dict with keys ['std', 'mean']

  • If set to None, no scaling will be applied.
  • If set to a dict, the y-axis will be scaled y = (y + mean) * std
None
nof_ice Union[int, str]

number of ICE plots to show on top of the PDP curve

'all'
show_avg_output bool

whether to show the average output of the model

False
y_limits Optional[List]

None or tuple, the limits of the y-axis

  • If set to None, the limits of the y-axis are set automatically
  • If set to a tuple, the limits are manually set
None
use_vectorized bool

whether to use the vectorized version of the PDP computation

True
Source code in effector/global_effect_pdp.py
def plot(
    self,
    feature: int,
    heterogeneity: Union[bool, str] = "ice",
    centering: Union[bool, str] = True,
    nof_points: int = 30,
    scale_x: Optional[dict] = None,
    scale_y: Optional[dict] = None,
    nof_ice: Union[int, str] = "all",
    show_avg_output: bool = False,
    y_limits: Optional[List] = None,
    use_vectorized: bool = True,
):
    """
    Plot the feature effect.

    Parameters:
        feature: the feature to plot
        heterogeneity: whether to plot the heterogeneity

              - `False`, plot only the mean effect
              - `True` or `std`, plot the standard deviation of the ICE curves
              - `ice`, also plot the ICE curves

        centering: whether to center the plot

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.

        nof_points: the grid size for the PDP plot

        scale_x: None or Dict with keys ['std', 'mean']

            - If set to None, no scaling will be applied.
            - If set to a dict, the x-axis will be scaled `x = (x + mean) * std`

        scale_y: None or Dict with keys ['std', 'mean']

            - If set to None, no scaling will be applied.
            - If set to a dict, the y-axis will be scaled `y = (y + mean) * std`

        nof_ice: number of ICE plots to show on top of the PDP curve
        show_avg_output: whether to show the average output of the model

        y_limits: None or tuple, the limits of the y-axis

            - If set to None, the limits of the y-axis are set automatically
            - If set to a tuple, the limits are manually set

        use_vectorized: whether to use the vectorized version of the PDP computation
    """
    self._plot(
        feature,
        heterogeneity,
        centering,
        nof_points,
        scale_x,
        scale_y,
        nof_ice,
        show_avg_output,
        y_limits,
        use_vectorized,
    )

effector.global_effect_pdp.DerPDP(data, model, model_jac=None, axis_limits=None, nof_instances=10000, feature_names=None, target_name=None)

Bases: PDPBase

Constructor of the DerivativePDP class.

Definition

d-PDP: $$ dPDP(x_s) = {1 \over N} \sum_{i=1}^N {\partial f \over \partial x_s}(x_s, \mathbf{x}_c^i) $$

centered-PDP: $$ dPDP_c(x_s) = dPDP(x_s) - c, \quad c = {1 \over M} \sum_{j=1}^M dPDP(x_s^j) $$

ICE: $$ dICE^i(x_s) = {\partial f \over \partial x_s}(x_s, \mathbf{x}_c^i), \quad i=1, \dots, N $$

centered-ICE: $$ dICE_c^i(x_s) = dICE^i(x_s) - c_i, \quad c_i = {1 \over M} \sum_{j=1}^M dICE^i(x_s^j) $$

heterogeneity function: $$ h(x_s) = {1 \over N} \sum_{i=1}^N ( dICE_c^i(x_s) - dPDP_c(x_s) )^2 $$

The heterogeneity value is: $$ \mathcal{H}(x_s) = {1 \over M} \sum_{j=1}^M h(x_s^j), $$ where \(x_s^j\) are an equally spaced grid of points in \([x_s^{\min}, x_s^{\max}]\).

Notes
  • The required parameters are data and model. The rest are optional.
  • The model_jac is the Jacobian of the model. If None, the Jacobian will be computed numerically.
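To make the expected shape of `model_jac` concrete, here is a minimal sketch of a numerical Jacobian based on central finite differences, compared against an analytic one. The source does not say which numerical scheme effector uses when `model_jac` is `None`, so the scheme, the step size `eps` and the name `numerical_jacobian_sketch` are all assumptions.

```python
import numpy as np


def numerical_jacobian_sketch(model, x, eps=1e-6):
    """Illustrative only: central differences, returns an array of shape (N, D)."""
    N, D = x.shape
    jac = np.empty((N, D))
    for d in range(D):
        step = np.zeros(D)
        step[d] = eps
        # Partial derivative of the model output w.r.t. feature d, for every instance.
        jac[:, d] = (model(x + step) - model(x - step)) / (2 * eps)
    return jac


# Toy model and its analytic Jacobian, both mapping (N, D) inputs as documented above.
f = lambda x: x[:, 0] ** 2 + 3.0 * x[:, 1]
f_jac = lambda x: np.stack([2.0 * x[:, 0], np.full(x.shape[0], 3.0)], axis=1)

X = np.random.default_rng(0).normal(size=(20, 2))
approx = numerical_jacobian_sketch(f, X)
```

A finite-difference Jacobian costs `2 * D` model calls per evaluation, so passing an analytic `model_jac` is usually preferable when one is available.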

Parameters:

Name Type Description Default
data ndarray

the design matrix

  • shape: (N,D)
required
model Callable

the black-box model. Must be a Callable with:

  • input: ndarray of shape (N, D)
  • output: ndarray of shape (N, )
required
model_jac Optional[Callable]

the black-box model Jacobian. Must be a Callable with:

  • input: ndarray of shape (N, D)
  • output: ndarray of shape (N, D)
None
axis_limits Optional[ndarray]

The limits of the feature effect plot along each axis

  • use a ndarray of shape (2, D), to specify them manually
  • use None, to be inferred from the data
None
nof_instances Union[int, str]

maximum number of instances to be used for PDP.

  • use "all", for using all instances.
  • use an int, for using nof_instances instances.
10000
feature_names Optional[List]

The names of the features

  • use a list of str, to specify the name manually. For example: ["age", "weight", ...]
  • use None, to keep the default names: ["x_0", "x_1", ...]
None
target_name Optional[str]

The name of the target variable

  • use a str, to specify its name manually. For example: "price"
  • use None, to keep the default name: "y"
None

Methods:

Name Description
fit

Fit the Feature effect to the data.

eval

Evaluate the effect of the s-th feature at positions xs.

plot

Plot the feature effect.

Source code in effector/global_effect_pdp.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    model_jac: Optional[Callable] = None,
    axis_limits: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 10_000,
    feature_names: Optional[List] = None,
    target_name: Optional[str] = None,
):
    """
    Constructor of the DerivativePDP class.

    Definition:
        d-PDP:
        $$
        dPDP(x_s) = {1 \over N} \sum_{i=1}^N {\partial f \over \partial x_s}(x_s, \mathbf{x}_c^i)
        $$

        centered-PDP:
        $$
        dPDP_c(x_s) = dPDP(x_s) - c, \quad c = {1 \over M} \sum_{j=1}^M dPDP(x_s^j)
        $$

        ICE:
        $$
        dICE^i(x_s) = {\partial f \over \partial x_s}(x_s, \mathbf{x}_c^i), \quad i=1, \dots, N
        $$

        centered-ICE:
        $$
        dICE_c^i(x_s) = dICE^i(x_s) - c_i, \quad c_i = {1 \over M} \sum_{j=1}^M dICE^i(x_s^j)
        $$

        heterogeneity function:
        $$
        h(x_s) = {1 \over N} \sum_{i=1}^N ( dICE_c^i(x_s) - dPDP_c(x_s) )^2
        $$

        The heterogeneity value is:
        $$
        \mathcal{H}(x_s) = {1 \over M} \sum_{j=1}^M h(x_s^j),
        $$
        where $x_s^j$ are an equally spaced grid of points in $[x_s^{\min}, x_s^{\max}]$.

    Notes:
        - The required parameters are `data` and `model`. The rest are optional.
        - The `model_jac` is the Jacobian of the model. If `None`, the Jacobian will be computed numerically.

    Args:
        data: the design matrix

            - shape: `(N,D)`
        model: the black-box model. Must be a `Callable` with:

            - input: `ndarray` of shape `(N, D)`
            - output: `ndarray` of shape `(N, )`

        model_jac: the black-box model Jacobian. Must be a `Callable` with:

            - input: `ndarray` of shape `(N, D)`
            - output: `ndarray` of shape `(N, D)`

        axis_limits: The limits of the feature effect plot along each axis

            - use a `ndarray` of shape `(2, D)`, to specify them manually
            - use `None`, to be inferred from the data

        nof_instances: maximum number of instances to be used for PDP.

            - use "all", for using all instances.
            - use an `int`, for using `nof_instances` instances.

        feature_names: The names of the features

            - use a `list` of `str`, to specify the name manually. For example: `["age", "weight", ...]`
            - use `None`, to keep the default names: `["x_0", "x_1", ...]`

        target_name: The name of the target variable

            - use a `str`, to specify its name manually. For example: `"price"`
            - use `None`, to keep the default name: `"y"`
    """

    super(DerPDP, self).__init__(
        data,
        model,
        model_jac,
        axis_limits,
        nof_instances,
        feature_names,
        target_name,
        method_name="d-PDP",
    )

fit(features='all', centering=False, points_for_centering=30, use_vectorized=True)

Fit the Feature effect to the data.

Notes

Calling .fit explicitly is optional. .fit only computes the normalization constants used to center the PDP and ICE plots, and .eval and .plot run it automatically when needed.

Parameters:

Name Type Description Default
features Union[int, str, list]

the features to fit.

  • If set to "all", all the features will be fitted.
'all'
centering Union[bool, str]

whether to center the plot:

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
False
points_for_centering int

number of linspaced points along the feature axis used for centering.

30
use_vectorized bool

whether to use vectorized operations for the PDP and ICE curves

True
Source code in effector/global_effect_pdp.py
def fit(
    self,
    features: Union[int, str, list] = "all",
    centering: Union[bool, str] = False,
    points_for_centering: int = 30,
    use_vectorized: bool = True,
):
    """
    Fit the Feature effect to the data.

    Notes:
        Calling `.fit` explicitly is optional. `.fit` only computes the normalization constants
        used to center the PDP and ICE plots, and `.eval` and `.plot` run it automatically when needed.

    Args:
        features: the features to fit.
            - If set to "all", all the features will be fitted.

        centering: whether to center the plot:

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.

        points_for_centering: number of linspaced points along the feature axis used for centering.
        use_vectorized: whether to use vectorized operations for the PDP and ICE curves

    """
    centering = helpers.prep_centering(centering)
    features = helpers.prep_features(features, self.dim)

    for s in features:
        self.feature_effect["feature_" + str(s)] = self._fit_feature(
            s, centering, points_for_centering, use_vectorized
        )
        self.is_fitted[s] = True
        self.fit_args["feature_" + str(s)] = {
            "centering": centering,
            "points_for_centering": points_for_centering,
        }

eval(feature, xs, heterogeneity=False, centering=False, return_all=False, use_vectorized=True)

Evaluate the effect of the s-th feature at positions xs.

Parameters:

Name Type Description Default
feature int

index of feature of interest

required
xs ndarray

the points along the s-th axis to evaluate the FE plot

  • np.ndarray of shape (T, )
required
heterogeneity bool

whether to return the heterogeneity measures.

  • If heterogeneity=False, the function returns the mean effect at the given xs
  • If heterogeneity=True, the function returns (y, var) where y is the mean effect and var is the variance of the ICE curves around it
False
centering Union[bool, str]

whether to center the PDP

  • If centering is False, the PDP is not centered
  • If centering is True or zero_integral, the PDP is centered around the y axis.
  • If centering is zero_start, the PDP starts from y=0.
False
return_all bool

whether to return PDP and ICE plots evaluated at xs

  • If return_all=False, the function returns the mean effect at the given xs
  • If return_all=True, the function returns a ndarray of shape (T, N) with the N ICE plots evaluated at xs
False
use_vectorized bool

whether to use the vectorized version of the computation

True

Returns:

Type Description
Union[ndarray, Tuple[ndarray, ndarray]]

the mean effect y if heterogeneity=False (default), or a tuple (y, var) otherwise

Source code in effector/global_effect_pdp.py
def eval(
    self,
    feature: int,
    xs: np.ndarray,
    heterogeneity: bool = False,
    centering: typing.Union[bool, str] = False,
    return_all: bool = False,
    use_vectorized: bool = True,
) -> typing.Union[np.ndarray, typing.Tuple[np.ndarray, np.ndarray]]:
    """Evaluate the effect of the s-th feature at positions `xs`.

    Args:
        feature: index of feature of interest
        xs: the points along the s-th axis to evaluate the FE plot

          - `np.ndarray` of shape `(T, )`

        heterogeneity: whether to return the heterogeneity measures.

              - If `heterogeneity=False`, the function returns the mean effect at the given `xs`
              - If `heterogeneity=True`, the function returns `(y, var)` where `y` is the mean effect and `var` is the variance of the ICE curves around it

        centering: whether to center the PDP

            - If `centering` is `False`, the PDP is not centered
            - If `centering` is `True` or `zero_integral`, the PDP is centered around the `y` axis.
            - If `centering` is `zero_start`, the PDP starts from `y=0`.

        return_all: whether to return PDP and ICE plots evaluated at `xs`

            - If `return_all=False`, the function returns the mean effect at the given `xs`
            - If `return_all=True`, the function returns a `ndarray` of shape `(T, N)` with the `N` ICE plots evaluated at `xs`

        use_vectorized: whether to use the vectorized version of the computation

    Returns:
        the mean effect `y` if `heterogeneity=False` (default), or a tuple `(y, var)` otherwise

    """
    centering = helpers.prep_centering(centering)

    if self.requires_refit(feature, centering):
        self.fit(
            features=feature, centering=centering, use_vectorized=use_vectorized
        )

    # Check if the lower bound is less than the upper bound
    assert self.axis_limits[0, feature] < self.axis_limits[1, feature]

    # new implementation
    y_ice = self._predict(self.data, xs, feature, use_vectorized)
    if centering:
        norm_consts = np.expand_dims(
            self.feature_effect["feature_" + str(feature)]["norm_const"], axis=0
        )
        y_ice = y_ice - norm_consts

    y_mean = np.mean(y_ice, axis=1)

    if return_all:
        return y_ice

    if heterogeneity:
        y_var = np.var(y_ice, axis=1)
        return y_mean, y_var
    else:
        return y_mean

plot(feature, heterogeneity='ice', centering=False, nof_points=30, scale_x=None, scale_y=None, nof_ice=100, show_avg_output=False, dy_limits=None, use_vectorized=True)

Plot the feature effect.

Parameters:

Name Type Description Default
feature int

the feature to plot

required
heterogeneity Union[bool, str]

whether to plot the heterogeneity

  • False, plot only the mean effect
  • True or std, plot the standard deviation of the ICE curves
  • ice, also plot the ICE curves
'ice'
centering Union[bool, str]

whether to center the plot

  • False means no centering
  • True or zero_integral centers around the y axis.
  • zero_start starts the plot from y=0.
False
nof_points int

the grid size for the PDP plot

30
scale_x Optional[dict]

None or Dict with keys ['std', 'mean']

  • If set to None, no scaling will be applied.
  • If set to a dict, the x-axis will be scaled x = (x + mean) * std
None
scale_y Optional[dict]

None or Dict with keys ['std', 'mean']

  • If set to None, no scaling will be applied.
  • If set to a dict, the y-axis will be scaled y = (y + mean) * std
None
nof_ice Union[int, str]

number of ICE plots to show on top of the d-PDP curve

100
show_avg_output bool

whether to show the average output of the model

False
dy_limits Optional[List]

None or tuple, the limits of the y-axis for the derivative PDP

  • If set to None, the limits of the y-axis are set automatically
  • If set to a tuple, the limits are manually set
None
use_vectorized bool

whether to use the vectorized version of the PDP computation

True
Source code in effector/global_effect_pdp.py
def plot(
    self,
    feature: int,
    heterogeneity: Union[bool, str] = "ice",
    centering: Union[bool, str] = False,
    nof_points: int = 30,
    scale_x: Optional[dict] = None,
    scale_y: Optional[dict] = None,
    nof_ice: Union[int, str] = 100,
    show_avg_output: bool = False,
    dy_limits: Optional[List] = None,
    use_vectorized: bool = True,
):
    """
    Plot the feature effect.

    Parameters:
        feature: the feature to plot
        heterogeneity: whether to plot the heterogeneity

              - `False`, plot only the mean effect
              - `True` or `std`, plot the standard deviation of the ICE curves
              - `ice`, also plot the ICE curves

        centering: whether to center the plot

            - `False` means no centering
            - `True` or `zero_integral` centers around the `y` axis.
            - `zero_start` starts the plot from `y=0`.

        nof_points: the grid size for the PDP plot

        scale_x: None or Dict with keys ['std', 'mean']

            - If set to None, no scaling will be applied.
            - If set to a dict, the x-axis will be scaled `x = (x + mean) * std`

        scale_y: None or Dict with keys ['std', 'mean']

            - If set to None, no scaling will be applied.
            - If set to a dict, the y-axis will be scaled `y = (y + mean) * std`

        nof_ice: number of ICE plots to show on top of the d-PDP curve
        show_avg_output: whether to show the average output of the model

        dy_limits: None or tuple, the limits of the y-axis for the derivative PDP

            - If set to None, the limits of the y-axis are set automatically
            - If set to a tuple, the limits are manually set

        use_vectorized: whether to use the vectorized version of the PDP computation
    """
    self._plot(
        feature,
        heterogeneity,
        centering,
        nof_points,
        scale_x,
        scale_y,
        nof_ice,
        show_avg_output,
        dy_limits,
        use_vectorized,
    )

effector.global_effect_shap.ShapDP(data, model, axis_limits=None, nof_instances=1000, feature_names=None, target_name=None, shap_values=None)

Bases: GlobalEffectBase

Constructor of the SHAPDependence class.

Definition

The value of a coalition of \(S\) features is estimated as: $$ \hat{v}(S) = {1 \over N} \sum_{i=1}^N f(x_S \cup x_C^i) - f(x^i) $$ The value of a coalition \(S\) quantifies what the values \(\mathbf{x}_S\) of the features in \(S\) contribute to the output of the model. It is the average (over all instances) difference on the output between setting features in \(S\) to be \(x_S\), i.e., \(\mathbf{x} = (\mathbf{x}_S, \mathbf{x}_C^i)\) and leaving the instance as it is, i.e., \(\mathbf{x}^i = (\mathbf{x}_S^i, \mathbf{x}_C^i)\).

The contribution of a feature \(j\) added to a coalition \(S\) is estimated as: $$ \hat{\Delta}_{S, j} = \hat{v}(S \cup \{j\}) - \hat{v}(S) $$

The SHAP value of a feature \(j\) with value \(x_j\) is the average contribution of feature \(j\) across all possible coalitions with a weight \(w_{S, j}\):

\[ \hat{\phi}_j(x_j) = {1 \over N} \sum_{S \subseteq \{1, \dots, D\} \setminus \{j\}} w_{S, j} \hat{\Delta}_{S, j} \]

where \(w_{S, j}\) ensures that every coalition size receives the same total weight, shared equally among the coalitions of that size. For example, there are \(D-1\) ways for \(x_j\) to enter a coalition of \(|S| = 1\) feature, so \(w_{S, j} = {1 \over D (D-1)}\) for each of them. In contrast, there is only one way for \(x_j\) to enter a coalition of \(|S|=0\) (to be the first specified feature), so \(w_{S, j} = {1 \over D}\).
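The two weights quoted above are instances of the standard Shapley weight \(w_{S, j} = {|S|! \, (D - |S| - 1)! \over D!}\). The short sketch below enumerates all coalitions for a small \(D\) and checks both examples; the helper name `shapley_weight` and the choice \(D = 4\) are made up for this illustration and say nothing about how the shap package computes its estimates.

```python
from itertools import combinations
from math import factorial


def shapley_weight(size, D):
    """Weight of one coalition S with |S| = size, for a model with D features."""
    return factorial(size) * factorial(D - size - 1) / factorial(D)


D = 4
j = 0
others = [k for k in range(D) if k != j]

# Enumerate every coalition S that does not contain feature j.
weights = {
    S: shapley_weight(len(S), D)
    for size in range(D)
    for S in combinations(others, size)
}

# Total weight carried by each coalition size.
weight_per_size = {
    size: sum(w for S, w in weights.items() if len(S) == size)
    for size in range(D)
}
```

Each coalition size carries a total weight of \(1/D\), so the weights over all \(2^{D-1}\) coalitions sum to one.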

The SHAP Dependence Plot (SHAP-DP) is a spline \(\hat{f}^{SDP}_j(x_j)\) fit to the dataset \(\{(x_j^i, \hat{\phi}_j(x_j^i))\}_{i=1}^N\) using the UnivariateSpline function from scipy.interpolate.

Notes
  • The required parameters are data and model. The rest are optional.
  • SHAP values are computed using the shap package, using the class Explainer.
  • SHAP values are centered by default, i.e., the average SHAP value is subtracted from the SHAP values.
  • More details on the SHAP values can be found in the original paper and in the book Interpreting Machine Learning Models with SHAP

Parameters:

Name Type Description Default
data ndarray

the design matrix

  • shape: (N,D)
required
model Callable

the black-box model. Must be a Callable with:

  • input: ndarray of shape (N, D)
  • output: ndarray of shape (N,)
required
axis_limits Optional[ndarray]

The limits of the feature effect plot along each axis

  • use a ndarray of shape (2, D), to specify them manually
  • use None, to be inferred from the data
None
nof_instances Union[int, str]

maximum number of instances to be used for SHAP estimation.

  • use "all", for using all instances.
  • use an int, for using nof_instances instances.
1000
avg_output

The average output of the model.

  • use a float, to specify it manually
  • use None, to be inferred as np.mean(model(data))
required
feature_names Optional[List[str]]

The names of the features

  • use a list of str, to specify the name manually. For example: ["age", "weight", ...]
  • use None, to keep the default names: ["x_0", "x_1", ...]
None
target_name Optional[str]

The name of the target variable

  • use a str, to specify its name manually. For example: "price"
  • use None, to keep the default name: "y"
None
shap_values Optional[ndarray]

The SHAP values of the model

  • if shap values are already computed, they can be passed here
  • if None, the SHAP values will be computed using the shap package
None

Methods:

Name Description
fit

Fit the SHAP Dependence Plot to the data.

eval

Evaluate the effect of the s-th feature at positions xs.

plot

Plot the SHAP Dependence Plot (SDP) of the s-th feature.

Source code in effector/global_effect_shap.py
def __init__(
    self,
    data: np.ndarray,
    model: Callable,
    axis_limits: Optional[np.ndarray] = None,
    nof_instances: Union[int, str] = 1_000,
    feature_names: Optional[List[str]] = None,
    target_name: Optional[str] = None,
    shap_values: Optional[np.ndarray] = None,
):
    """
    Constructor of the SHAPDependence class.

    Definition:
        The value of a coalition of $S$ features is estimated as:
        $$
        \hat{v}(S) = {1 \over N} \sum_{i=1}^N \left[ f(x_S \cup x_C^i) - f(x^i) \right]
        $$
        The value of a coalition $S$ quantifies what the values $\mathbf{x}_S$ of the features in $S$ contribute to the output of the model. It
        is the average (over all instances) difference in the output between setting the features in $S$ to $\mathbf{x}_S$, i.e., $\mathbf{x} = (\mathbf{x}_S, \mathbf{x}_C^i)$, and leaving the instance as it is, i.e., $\mathbf{x}^i = (\mathbf{x}_S^i, \mathbf{x}_C^i)$.

        The contribution of a feature $j$ added to a coalition $S$ is estimated as:
        $$
        \hat{\Delta}_{S, j} = \hat{v}(S \cup \{j\}) - \hat{v}(S)
        $$

        The SHAP value of a feature $j$ with value $x_j$ is the average contribution of feature $j$ across all possible coalitions with a weight $w_{S, j}$:

        $$
        \hat{\phi}_j(x_j) = {1 \over N} \sum_{S \subseteq \{1, \dots, D\} \setminus \{j\}} w_{S, j} \hat{\Delta}_{S, j}
        $$

        where $w_{S, j}$ ensures that the contribution of feature $j$ is the same for all coalitions of the same size. For example, there are $D-1$ ways for $x_j$ to enter a coalition of $|S| = 1$ feature, so $w_{S, j} = {1 \over D (D-1)}$ for each of them. In contrast, there is only one way for $x_j$ to enter a coalition of $|S|=0$ (to be the first specified feature), so $w_{S, j} = {1 \over D}$.

        The SHAP Dependence Plot (SHAP-DP) is a spline $\hat{f}^{SDP}_j(x_j)$ fit to the dataset $\{(x_j^i, \hat{\phi}_j(x_j^i))\}_{i=1}^N$ using the `UnivariateSpline` function from `scipy.interpolate`.

    Notes:
        * The required parameters are `data` and `model`. The rest are optional.
        * SHAP values are computed using the `shap` package, using the class `Explainer`.
        * SHAP values are centered by default, i.e., the average SHAP value is subtracted from the SHAP values.
        * More details on the SHAP values can be found in the [original paper](https://arxiv.org/abs/1705.07874) and in the book [Interpreting Machine Learning Models with SHAP](https://christophmolnar.com/books/shap/)

    Args:
        data: the design matrix

            - shape: `(N,D)`
        model: the black-box model. Must be a `Callable` with:

            - input: `ndarray` of shape `(N, D)`
            - output: `ndarray` of shape `(N,)`

        axis_limits: The limits of the feature effect plot along each axis

            - use a `ndarray` of shape `(2, D)`, to specify them manually
            - use `None`, to be inferred from the data

        nof_instances: maximum number of instances to be used for SHAP estimation.

            - use "all", for using all instances.
            - use an `int`, for using `nof_instances` instances.

        avg_output: The average output of the model.

            - use a `float`, to specify it manually
            - use `None`, to be inferred as `np.mean(model(data))`

        feature_names: The names of the features

            - use a `list` of `str`, to specify the name manually. For example: `["age", "weight", ...]`
            - use `None`, to keep the default names: `["x_0", "x_1", ...]`

        target_name: The name of the target variable

            - use a `str`, to specify its name manually. For example: `"price"`
            - use `None`, to keep the default name: `"y"`

        shap_values: The SHAP values of the model

            - if shap values are already computed, they can be passed here
            - if `None`, the SHAP values will be computed using the `shap` package
    """
    # precomputed SHAP values, or None to compute them with the `shap` package
    self.shap_values = shap_values
    super(ShapDP, self).__init__(
        "SHAP DP",
        data,
        model,
        None,
        None,
        nof_instances,
        axis_limits,
        feature_names,
        target_name,
    )

fit(features='all', centering=True, points_for_centering=30, binning_method='greedy')

Fit the SHAP Dependence Plot to the data.

Notes

The SHAP Dependence Plot (SDP) \(\hat{f}^{SDP}_j(x_j)\) is a spline fit to the dataset \(\{(x_j^i, \hat{\phi}_j(x_j^i))\}_{i=1}^N\) using the UnivariateSpline function from scipy.interpolate.

The SHAP standard deviation, \(\hat{\sigma}^{SDP}_j(x_j)\), is a spline fit to the absolute value of the residuals, i.e., to the dataset \(\{(x_j^i, |\hat{\phi}_j(x_j^i) - \hat{f}^{SDP}_j(x_j^i)|)\}_{i=1}^N\), using the UnivariateSpline function from scipy.interpolate.
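The two spline fits described above can be sketched with `scipy.interpolate.UnivariateSpline`. The data here are synthetic stand-ins for SHAP values, and the default smoothing settings of `UnivariateSpline` are an assumption; the package may configure the spline differently.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)

# synthetic stand-ins: feature values x_j^i and their SHAP values phi_j(x_j^i)
x = np.sort(rng.uniform(-1.0, 1.0, size=200))  # UnivariateSpline needs increasing x
phi = x**2 + rng.normal(scale=0.05, size=x.size)

# mean curve: spline fit to the pairs (x_j^i, phi_j(x_j^i))
spline_mean = UnivariateSpline(x, phi)

# heterogeneity curve: spline fit to the absolute residuals
residuals = np.abs(phi - spline_mean(x))
spline_std = UnivariateSpline(x, residuals)

# evaluate both curves on a grid
xs = np.linspace(-1.0, 1.0, 5)
y, std = spline_mean(xs), spline_std(xs)
```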

Parameters:

Name Type Description Default
features Union[int, str, List]

the features to fit.

  • If set to "all", all the features will be fitted.
'all'
centering Union[bool, str]
  • If set to False, no centering will be applied.
  • If set to "zero_integral" or True, the integral of the feature effect will be set to zero.
  • If set to "zero_start", the feature effect will start from y=0.
True
points_for_centering Union[int, str]

number of linearly spaced points along the feature axis used for centering.

  • If set to "all", all the dataset points will be used.
30
Notes

SHAP values are by default centered, i.e., \(\sum_{i=1}^N \hat{\phi}_j(x_j^i) = 0\). This does not mean that the SHAP curve is centered around zero; this happens only if the \(s\)-th feature of the dataset instances, i.e., the set \(\{x_s^i\}_{i=1}^N\) is uniformly distributed along the \(s\)-th axis. So, use:

  • centering=False, to leave the SHAP values as they are.
  • centering=True or centering="zero_integral", to center the SHAP curve vertically around y=0.
  • centering="zero_start", to start the SHAP curve from y=0.

SHAP values are expensive to compute. To speed up the computation, consider using a subset of the dataset points for computing the SHAP values (nof_instances in the constructor) and fewer points for centering the spline (points_for_centering). The default values (nof_instances=1000 and points_for_centering=30) are a moderate choice.
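The centering modes can be pictured as subtracting a constant from the fitted curve. The sketch below assumes that the constant is the mean of the curve over linearly spaced points for "zero_integral" and the value at the left axis limit for "zero_start"; the helper `norm_const` is hypothetical and only mirrors the name of the stored normalization constant.

```python
import numpy as np


def norm_const(curve, x_min, x_max, method, nof_points=30):
    """Constant to subtract from `curve` for the chosen centering method."""
    xs = np.linspace(x_min, x_max, nof_points)
    ys = curve(xs)
    if method == "zero_integral":
        return float(np.mean(ys))  # approximates the mean height of the curve
    if method == "zero_start":
        return float(ys[0])  # value at the left axis limit
    return 0.0  # no centering


# toy curve standing in for a fitted SHAP spline
curve = lambda x: x**2 + 1.0

c_int = norm_const(curve, 0.0, 1.0, "zero_integral")
c_start = norm_const(curve, 0.0, 1.0, "zero_start")
```

With more centering points the "zero_integral" constant approaches the exact mean height of the curve (4/3 for this toy curve), at the cost of more evaluations.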

Source code in effector/global_effect_shap.py
def fit(
    self,
    features: Union[int, str, List] = "all",
    centering: Union[bool, str] = True,
    points_for_centering: Union[int, str] = 30,
    binning_method: Union[str, ap.Greedy, ap.Fixed] = "greedy",
) -> None:
    """Fit the SHAP Dependence Plot to the data.

    Notes:
        The SHAP Dependence Plot (SDP) $\hat{f}^{SDP}_j(x_j)$ is a spline fit to
        the dataset $\{(x_j^i, \hat{\phi}_j(x_j^i))\}_{i=1}^N$
        using the `UnivariateSpline` function from `scipy.interpolate`.

        The SHAP standard deviation, $\hat{\sigma}^{SDP}_j(x_j)$, is a spline fit
        to the absolute value of the residuals, i.e., to the dataset
        $\{(x_j^i, |\hat{\phi}_j(x_j^i) - \hat{f}^{SDP}_j(x_j^i)|)\}_{i=1}^N$,
        using the `UnivariateSpline` function from `scipy.interpolate`.

    Args:
        features: the features to fit.
            - If set to "all", all the features will be fitted.
        centering:
            - If set to False, no centering will be applied.
            - If set to "zero_integral" or True, the integral of the feature effect will be set to zero.
            - If set to "zero_start", the feature effect will start from `y=0`.

        points_for_centering: number of linspaced points along the feature axis used for centering.

            - If set to `all`, all the dataset points will be used.

    Notes:
        SHAP values are by default centered, i.e., $\sum_{i=1}^N \hat{\phi}_j(x_j^i) = 0$. This does not mean that the SHAP _curve_ is centered around zero; this happens only if the $s$-th feature of the dataset instances, i.e., the set $\{x_s^i\}_{i=1}^N$ is uniformly distributed along the $s$-th axis. So, use:

        * `centering=False`, to leave the SHAP values as they are.
        * `centering=True` or `centering="zero_integral"`, to center the SHAP curve vertically around `y=0`.
        * `centering="zero_start"`, to start the SHAP curve from `y=0`.

        SHAP values are expensive to compute.
        To speed up the computation, consider using a subset of the dataset
        points for computing the SHAP values (`nof_instances` in the constructor)
        and fewer points for centering the spline (`points_for_centering`).
        The default values (`nof_instances=1000` and `points_for_centering=30`)
        are a moderate choice.
    """
    centering = helpers.prep_centering(centering)
    features = helpers.prep_features(features, self.dim)

    # new implementation
    for s in features:
        self.feature_effect["feature_" + str(s)] = self._fit_feature(
            s, binning_method, centering, points_for_centering,
        )
        self.is_fitted[s] = True
        self.fit_args["feature_" + str(s)] = {
            "centering": centering,
            "points_for_centering": points_for_centering,
        }

eval(feature, xs, heterogeneity=True, centering=True)

Evaluate the effect of the s-th feature at positions xs.

Parameters:

Name Type Description Default
feature int

index of feature of interest

required
xs ndarray

the points along the s-th axis to evaluate the FE plot

  • np.ndarray of shape (T,)
required
heterogeneity bool

whether to return the heterogeneity measures.

  • if heterogeneity=False, the function returns the mean effect at the given xs
  • If heterogeneity=True, the function returns (y, std) where y is the mean effect and std is the standard deviation of the mean effect
True
centering Union[bool, str]

whether to center the plot

  • If centering is False, the SHAP curve is not centered
  • If centering is True or zero_integral, the SHAP curve is centered around the y axis.
  • If centering is zero_start, the SHAP curve starts from y=0.
True

Returns:

Type Description
Union[ndarray, Tuple[ndarray, ndarray]]

the mean effect y, if heterogeneity=False, or a tuple (y, std) if heterogeneity=True (default)

Source code in effector/global_effect_shap.py
def eval(
    self,
    feature: int,
    xs: np.ndarray,
    heterogeneity: bool = True,
    centering: typing.Union[bool, str] = True,
) -> typing.Union[np.ndarray, typing.Tuple[np.ndarray, np.ndarray]]:
    """Evaluate the effect of the s-th feature at positions `xs`.

    Args:
        feature: index of feature of interest
        xs: the points along the s-th axis to evaluate the FE plot

          - `np.ndarray` of shape `(T,)`
        heterogeneity: whether to return the heterogeneity measures.

              - if `heterogeneity=False`, the function returns the mean effect at the given `xs`
              - If `heterogeneity=True`, the function returns `(y, std)` where `y` is the mean effect and `std` is the standard deviation of the mean effect

        centering: whether to center the plot

            - If `centering` is `False`, the SHAP curve is not centered
            - If `centering` is `True` or `zero_integral`, the SHAP curve is centered around the `y` axis.
            - If `centering` is `zero_start`, the SHAP curve starts from `y=0`.

    Returns:
        the mean effect `y`, if `heterogeneity=False`, or a tuple `(y, std)` if `heterogeneity=True` (default)
    """
    centering = helpers.prep_centering(centering)

    if self.requires_refit(feature, centering):
        self.fit(features=feature, centering=centering)

    # Check if the lower bound is less than the upper bound
    assert self.axis_limits[0, feature] < self.axis_limits[1, feature]

    yy = self.feature_effect["feature_" + str(feature)]["spline_mean"](xs)

    if centering is not False:
        norm_const = self.feature_effect["feature_" + str(feature)]["norm_const"]
        yy = yy - norm_const

    if heterogeneity:
        yy_var = self.feature_effect["feature_" + str(feature)]["spline_std"](xs)
        return yy, yy_var
    else:
        return yy

plot(feature, heterogeneity='shap_values', centering=True, nof_points=30, scale_x=None, scale_y=None, nof_shap_values='all', show_avg_output=False, y_limits=None, only_shap_values=False)

Plot the SHAP Dependence Plot (SDP) of the s-th feature.

Parameters:

Name Type Description Default
feature int

index of the plotted feature

required
heterogeneity Union[bool, str]

whether to output the heterogeneity of the SHAP values

  • If heterogeneity is False, no heterogeneity is plotted
  • If heterogeneity is True or "std", the standard deviation of the shap values is plotted
  • If heterogeneity is "shap_values", the shap values are scattered on top of the SHAP curve
'shap_values'
centering Union[bool, str]

whether to center the SDP

  • If centering is False, the SHAP curve is not centered
  • If centering is True or zero_integral, the SHAP curve is centered around the y axis.
  • If centering is zero_start, the SHAP curve starts from y=0.
True
nof_points int

number of points to evaluate the SDP plot

30
scale_x Optional[dict]

dictionary with keys "mean" and "std" for scaling the x-axis

None
scale_y Optional[dict]

dictionary with keys "mean" and "std" for scaling the y-axis

None
nof_shap_values Union[int, str]

number of shap values to show on top of the SHAP curve

'all'
show_avg_output bool

whether to show the average output of the model

False
y_limits Optional[List]

limits of the y-axis

None
only_shap_values bool

whether to plot only the shap values

False
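The scale_x and scale_y dictionaries are useful when the model was trained on standardized data but the plot should show original units. The sketch below assumes that scaling is the affine map `value * std + mean`; this is an assumption about how the dictionaries are applied, and the helper `unscale` is hypothetical.

```python
import numpy as np


def unscale(values, scale):
    """Map standardized values back to original units (assumed affine map)."""
    if scale is None:
        return values  # no scaling requested
    return values * scale["std"] + scale["mean"]


# hypothetical example: a feature standardized with mean 40 and std 10
x_standardized = np.array([-1.0, 0.0, 1.0])
scale_x = {"mean": 40.0, "std": 10.0}
x_original = unscale(x_standardized, scale_x)
```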
Source code in effector/global_effect_shap.py
def plot(
    self,
    feature: int,
    heterogeneity: Union[bool, str] = "shap_values",
    centering: Union[bool, str] = True,
    nof_points: int = 30,
    scale_x: Optional[dict] = None,
    scale_y: Optional[dict] = None,
    nof_shap_values: Union[int, str] = "all",
    show_avg_output: bool = False,
    y_limits: Optional[List] = None,
    only_shap_values: bool = False,
) -> None:
    """
    Plot the SHAP Dependence Plot (SDP) of the s-th feature.

    Args:
        feature: index of the plotted feature
        heterogeneity: whether to output the heterogeneity of the SHAP values

            - If `heterogeneity` is `False`, no heterogeneity is plotted
            - If `heterogeneity` is `True` or `"std"`, the standard deviation of the shap values is plotted
            - If `heterogeneity` is `"shap_values"`, the shap values are scattered on top of the SHAP curve

        centering: whether to center the SDP

            - If `centering` is `False`, the SHAP curve is not centered
            - If `centering` is `True` or `zero_integral`, the SHAP curve is centered around the `y` axis.
            - If `centering` is `zero_start`, the SHAP curve starts from `y=0`.

        nof_points: number of points to evaluate the SDP plot
        scale_x: dictionary with keys "mean" and "std" for scaling the x-axis
        scale_y: dictionary with keys "mean" and "std" for scaling the y-axis
        nof_shap_values: number of shap values to show on top of the SHAP curve
        show_avg_output: whether to show the average output of the model
        y_limits: limits of the y-axis
        only_shap_values: whether to plot only the shap values
    """
    heterogeneity = helpers.prep_confidence_interval(heterogeneity)

    x = np.linspace(
        self.axis_limits[0, feature], self.axis_limits[1, feature], nof_points
    )

    # get the SHAP curve
    y = self.eval(feature, x, heterogeneity=False, centering=centering)
    # heterogeneity (std) curve, evaluated at the same points as the SHAP curve
    y_std = np.sqrt(
        self.feature_effect["feature_" + str(feature)]["spline_std"](x)
    )

    # get some SHAP values
    _, ind = helpers.prep_nof_instances(nof_shap_values, self.data.shape[0])
    yy = (
        self.feature_effect["feature_" + str(feature)]["yy"][ind]
        if heterogeneity == "shap_values"
        else None
    )
    if yy is not None and centering is not False:
        yy = yy - self.feature_effect["feature_" + str(feature)]["norm_const"]
    xx = (
        self.feature_effect["feature_" + str(feature)]["xx"][ind]
        if heterogeneity == "shap_values"
        else None
    )

    if show_avg_output:
        avg_output = helpers.prep_avg_output(
            self.data, self.model, self.avg_output, scale_y
        )
    else:
        avg_output = None

    vis.plot_shap(
        x,
        y,
        xx,
        yy,
        y_std,
        feature,
        heterogeneity=heterogeneity,
        scale_x=scale_x,
        scale_y=scale_y,
        avg_output=avg_output,
        feature_names=self.feature_names,
        target_name=self.target_name,
        y_limits=y_limits,
        only_shap_values=only_shap_values,
    )