Model with conditional interaction
In this example, we show the heterogeneity of the global effects, using PDP, ALE, and RHALE, on a model with conditional interactions. We will use the following model:

$$ f(x_1, x_2, x_3) = \begin{cases} -x_1^2, & x_2 < 0 \\ x_1^2, & x_2 \geq 0 \end{cases} $$

where the features $x_1, x_2, x_3$ are independent and uniformly distributed in $[-1, 1]$. The effect of $x_1$ is conditioned on the sign of $x_2$ (a conditional interaction), while $x_3$ has no effect on the prediction. In contrast to an additive model, the local effects of $x_1$ differ across instances, so the global effect alone hides important structure.
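The model used in this example is provided by the library as effector.models.ConditionalInteraction. As a rough sketch, its behavior (reconstructed from the ground-truth derivations later in this example, not the library's source) can be written as:

```python
import numpy as np

def conditional_interaction(x):
    """Hypothetical sketch of the tutorial's model:
    f = -x1^2 if x2 < 0 else x1^2; x3 has no effect."""
    sign = np.where(x[:, 1] < 0, -1.0, 1.0)  # the sign of x2 conditions x1's effect
    return sign * x[:, 0] ** 2               # x3 (x[:, 2]) does not appear

x_demo = np.array([[0.5, -0.2, 0.9],
                   [0.5, 0.3, -0.1]])
print(conditional_interaction(x_demo))  # [-0.25  0.25]
```

The same pair of $x_1$ values yields opposite outputs depending on the sign of $x_2$, which is exactly the interaction the heterogeneity measures below should detect.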
import numpy as np
import matplotlib.pyplot as plt
import effector
np.random.seed(21)
model = effector.models.ConditionalInteraction()
dataset = effector.datasets.IndependentUniform(dim=3, low=-1, high=1)
x = dataset.generate_data(10_000)
PDP
Effector
Let's see below the PDP heterogeneity for each feature, using effector.
pdp = effector.PDP(x, model.predict, dataset.axis_limits)
pdp.fit(features="all", centering=True)
for feature in [0, 1, 2]:
    pdp.plot(feature=feature, centering=True, heterogeneity=True, y_limits=[-2, 2])
for feature in [0, 1, 2]:
    pdp.plot(feature=feature, centering=True, heterogeneity="ice", y_limits=[-2, 2])
pdp = effector.PDP(x, model.predict, dataset.axis_limits, nof_instances="all")
pdp.fit(features="all", centering=True)
heter_per_feat = []
for feature in [0, 1, 2]:
    y_mean, y_var = pdp.eval(feature=feature, xs=np.linspace(-1, 1, 100), centering=True, heterogeneity=True)
    print(f"Heterogeneity of x_{feature}: {y_var.mean():.3f}")
    heter_per_feat.append(y_var.mean())
Heterogeneity of x_0: 0.093
Heterogeneity of x_1: 0.088
Heterogeneity of x_2: 0.000
Conclusions:

- The global effect of $x_1$ arises from heterogeneous local effects, as $H_{x_1}(x_1) = (x_1^2 - \frac{1}{3})^2 > 0$ for all $x_1 \neq \pm\frac{1}{\sqrt{3}}$. The std margins (the red band of one standard deviation around the global effect) misleadingly suggest that the heterogeneity is minimized at $x_1 = \pm\frac{1}{\sqrt{3}}$. ICE plots provide a clearer picture; they reveal two groups of effects, $x_1^2$ and $-x_1^2$. The heterogeneity as a scalar value is $H_{x_1} \approx 0.09$.
- Similar to $x_1$, the global effect of $x_2$ arises from heterogeneous local effects. However, unlike $x_1$, both std margins and ICE plots indicate a constant heterogeneity along the whole axis, i.e., $H_{x_2}(x_2) \approx 0.09$. ICE plots further show a smooth range of varying local effects around the global effect, without distinct groups. The heterogeneity as a scalar is $H_{x_2} \approx 0.09$, which is identical to $H_{x_1}$. This is consistent, as heterogeneity measures the interaction between a feature and all other features; in this case, since only $x_1$ and $x_2$ interact, their heterogeneity values should be the same.
- $x_3$ shows no heterogeneity; all local effects align perfectly with the global effect.
Derivations
How did PDP (and effector) arrive at these heterogeneity functions?
For $x_1$:

Each centered ICE curve is $\hat{f}^i_c(x_1) = s^i \left(x_1^2 - \frac{1}{3}\right)$, where $s^i = -1$ if $x_2^i < 0$ and $s^i = 1$ otherwise. The average effect is therefore zero, and the heterogeneity function is the variance across the ICE curves:

$$ H_{x_1}(x_1) = \left(x_1^2 - \frac{1}{3}\right)^2 = x_1^4 - \frac{2}{3}x_1^2 + \frac{1}{9} $$

The heterogeneity as a scalar is simply the mean of the heterogeneity function:

$$ H_{x_1} = \frac{1}{2}\int_{-1}^{1} \left(x_1^4 - \frac{2}{3}x_1^2 + \frac{1}{9}\right) dx_1 = \frac{1}{5} - \frac{2}{9} + \frac{1}{9} = \frac{4}{45} \approx 0.089 $$

For $x_2$:

Each centered ICE curve is a step function, $\hat{f}^i_c(x_2) = (x_1^i)^2 \, s(x_2)$, with $s(x_2) = -1$ for $x_2 < 0$ and $s(x_2) = 1$ otherwise. The average effect is $\frac{1}{3} s(x_2)$, and the heterogeneity function is constant:

$$ H_{x_2}(x_2) = \mathrm{Var}_i\left[(x_1^i)^2\right] = \mathbb{E}[x_1^4] - \left(\mathbb{E}[x_1^2]\right)^2 = \frac{1}{5} - \frac{1}{9} = \frac{4}{45} $$

The heterogeneity as a scalar is simply the mean of the heterogeneity function, so:

$$ H_{x_2} = \frac{4}{45} \approx 0.089 $$

For $x_3$:

All centered ICE curves are identically zero, so the average effect, the heterogeneity function, and the scalar heterogeneity are all zero.
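The PDP derivation can be sanity-checked with a small Monte Carlo simulation under the reconstructed model, where each centered ICE curve for $x_1$ is $s^i (x_1^2 - \frac{1}{3})$ with the sign $s^i$ determined by $x_2^i$ (an assumption based on the ground-truth functions used in this example):

```python
import numpy as np

# Simulate the per-instance signs coming from x2^i
rng = np.random.default_rng(0)
s = np.where(rng.uniform(-1, 1, 10_000) < 0, -1.0, 1.0)
xs = np.linspace(-1, 1, 101)

# Centered ICE curves: s_i * (x1^2 - 1/3)
ice = s[:, None] * (xs[None, :] ** 2 - 1 / 3)
heter = ice.var(axis=0)  # heterogeneity function, estimated empirically

# Should match (x1^2 - 1/3)^2, whose mean over [-1, 1] is 4/45 ~ 0.089
assert np.allclose(heter, (xs ** 2 - 1 / 3) ** 2, atol=1e-2)
assert abs(heter.mean() - 4 / 45) < 0.01
```

The empirical variance reproduces the analytic heterogeneity function, and its mean lands near the $\approx 0.09$ values printed earlier.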
Conclusions
PDP heterogeneity provides intuitive insights:
- The global effects of $x_1$ and $x_2$ arise from heterogeneous local effects, while $x_3$ shows no heterogeneity.
- The heterogeneity of $x_1$ and $x_2$ is quantified at the same level ($\approx 0.09$), which makes sense since only these two features interact.
- However, the heterogeneity function of $x_1$ appears misleading when centering the ICE plots, as it falsely suggests minimized heterogeneity at $x_1 = \pm\frac{1}{\sqrt{3}}$, which is not accurate.
def pdp_ground_truth(feature, xs):
    if feature == 0:
        ff = lambda x: x**4 - 2/3*x**2 + 1/9
        return ff(xs)
    elif feature == 1:
        ff = lambda x: np.ones_like(x) * 0.088
        return ff(xs)
    elif feature == 2:
        ff = lambda x: np.zeros_like(x)
        return ff(xs)
# make a test
xx = np.linspace(-1, 1, 100)
for feature in [0, 1, 2]:
    pdp_mean, pdp_heter = pdp.eval(feature=feature, xs=xx, centering=True, heterogeneity=True)
    y_heter = pdp_ground_truth(feature, xx)
    np.testing.assert_allclose(pdp_heter, y_heter, atol=1e-1)
ALE
Effector
Let's see below the ALE heterogeneity for each feature, using effector.
ale = effector.ALE(x, model.predict, axis_limits=dataset.axis_limits)
ale.fit(features="all", centering=True, binning_method=effector.axis_partitioning.Fixed(nof_bins=31))
for feature in [0, 1, 2]:
    ale.plot(feature=feature, centering=True, heterogeneity=True, y_limits=[-2, 2])
ALE states that:

- Feature $x_1$: The heterogeneity varies across all values of $x_1$; it starts large at $x_1 = -1$, decreases until it becomes zero at $x_1 = 0$, and then increases again until $x_1 = 1$. This behavior contrasts with the heterogeneity observed in the PDP, which has two zero-points, at $x_1 = \pm\frac{1}{\sqrt{3}}$.
- Feature $x_2$: Heterogeneity is observed only around $x_2 = 0$. This also contrasts with PDP's heterogeneity, which is constant for all values of $x_2$.
- Feature $x_3$: No heterogeneity is present for this feature.
ale.feature_effect["feature_1"]
{'limits': array([-1. , -0.93548387, -0.87096774, -0.80645161, -0.74193548,
-0.67741935, -0.61290323, -0.5483871 , -0.48387097, -0.41935484,
-0.35483871, -0.29032258, -0.22580645, -0.16129032, -0.09677419,
-0.03225806, 0.03225806, 0.09677419, 0.16129032, 0.22580645,
0.29032258, 0.35483871, 0.41935484, 0.48387097, 0.5483871 ,
0.61290323, 0.67741935, 0.74193548, 0.80645161, 0.87096774,
0.93548387, 1.00000001]),
'dx': array([0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
0.06451614]),
'points_per_bin': array([320, 339, 318, 345, 313, 312, 331, 311, 293, 326, 309, 294, 314,
312, 335, 318, 354, 328, 312, 331, 315, 307, 341, 320, 330, 336,
312, 368, 325, 294, 337]),
'bin_effect': array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
9.93884099, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]),
'bin_variance': array([ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
82.51292593, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]),
'alg_params': 'fixed',
'norm_const': np.float64(0.3206077740261283)}
Derivations

For $x_1$:

Within every bin, the local effect of instance $i$ is $\pm 2x_1$, with the sign determined by $x_2^i$. The bin means are therefore zero and the bin variances are $4x_1^2$, evaluated at the bin centers.

For $x_2$:

In all bins except the central one, the local effects are zero. In the central bin, however, crossing $x_2 = 0$ flips $-(x_1^i)^2$ to $(x_1^i)^2$, so the local effects are $2(x_1^i)^2$ divided by the bin width $\Delta x$, which introduces some heterogeneity.

So, if $x_2$ is not in the central bin (index 15 for $K = 31$), both the bin effect and the bin variance are zero. In the central bin:

$$ \mu = \frac{\mathbb{E}[2x_1^2]}{\Delta x} = \frac{2/3}{2/31} \approx 10.3, \qquad \sigma^2 = \frac{\mathrm{Var}[2x_1^2]}{\Delta x^2} = \frac{16/45}{(2/31)^2} \approx 85, $$

which matches, up to sampling noise, the bin_effect and bin_variance printed above.

For $x_3$:

The local effects are zero everywhere, so both the effect and the heterogeneity are zero.
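The central-bin numbers for $x_2$ can be reproduced with a quick simulation under the reconstructed model; $K$ and the bin width follow the Fixed(nof_bins=31) binning used above, and the small discrepancies from the printed bin_effect ($\approx 9.9$) and bin_variance ($\approx 82.5$) are sampling noise:

```python
import numpy as np

# Crossing x2 = 0 flips -x1^2 to x1^2: a jump of 2*x1^2 per instance.
K = 31
dx = 2 / K  # bin width of Fixed(nof_bins=31) on [-1, 1]

rng = np.random.default_rng(1)
x1 = rng.uniform(-1, 1, 10_000)
jump = 2 * x1 ** 2

bin_effect = jump.mean() / dx        # analytic value: (2/3)/dx ~ 10.3
bin_variance = jump.var() / dx ** 2  # analytic value: (16/45)/dx^2 ~ 85
print(bin_effect, bin_variance)
```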
def ale_ground_truth(feature):
    K = 31
    bin_centers = np.linspace(-1 + 1/K, 1 - 1/K, K)
    if feature == 0:
        return 4 * bin_centers**2
    elif feature == 1:
        y = np.zeros_like(bin_centers)
        y[15] = np.nan  # central bin excluded from the test via the NaN mask
        return y
    elif feature == 2:
        return np.zeros_like(bin_centers)
# make a test
K = 31
bin_centers = np.linspace(-1 + 1/K, 1 - 1/K, K)
for feature in [0, 1, 2]:
    bin_var = ale.feature_effect[f"feature_{feature}"]["bin_variance"]
    gt_var = ale_ground_truth(feature)
    mask = ~np.isnan(gt_var)
    np.testing.assert_allclose(bin_var[mask], gt_var[mask], atol=1e-1)
Conclusions

Is the heterogeneity implied by the ALE plots meaningful? It is: for $x_1$, the heterogeneity is zero only at $x_1 = 0$ and grows toward the boundaries, without the misleading zero-points of the PDP; for $x_2$, it is correctly localized at the discontinuity around $x_2 = 0$; and $x_3$ correctly shows none.
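To make the contrast for $x_1$ concrete, here is a quick check of where each method's heterogeneity function vanishes, using the closed forms derived earlier for the reconstructed model:

```python
import numpy as np

pdp_h = lambda x: (x ** 2 - 1 / 3) ** 2  # PDP heterogeneity: zero at +/- 1/sqrt(3)
ale_h = lambda x: 4 * x ** 2             # ALE heterogeneity: zero only at 0

z = 1 / np.sqrt(3)
print(pdp_h(z), ale_h(z))      # PDP vanishes at 1/sqrt(3); ALE does not
print(pdp_h(0.0), ale_h(0.0))  # ALE vanishes at 0; PDP does not
```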
RHALE
Effector
Let's see below the RHALE effects for each feature, using effector.
rhale = effector.RHALE(x, model.predict, model.jacobian, axis_limits=dataset.axis_limits)
rhale.fit(features="all", centering=True)
for feature in [0, 1, 2]:
    rhale.plot(feature=feature, centering=True, heterogeneity=True, y_limits=[-2, 2])
RHALE states that:

- Feature $x_1$: As in ALE, the heterogeneity varies across all values of $x_1$; it starts large at $x_1 = -1$, decreases until it becomes zero at $x_1 = 0$, and then increases again until $x_1 = 1$.
- Feature $x_2$: No heterogeneity is present for this feature.
- Feature $x_3$: No heterogeneity is present for this feature.
Derivations

RHALE averages the instance-level derivatives $\partial f / \partial x_j$ within each bin. For $x_1$, $\partial f / \partial x_1 = \pm 2x_1$, so the bin means are zero and the bin variances are $4x_1^2$, as in ALE. For $x_2$, $\partial f / \partial x_2 = 0$ everywhere except at the discontinuity $x_2 = 0$, which has zero measure, so both the effect and the heterogeneity are zero. The same holds for $x_3$.

Conclusions

Are the RHALE effects intuitive? For $x_1$, yes: the heterogeneity matches ALE's and correctly reflects the interaction with $x_2$. For $x_2$, however, RHALE misses both the effect and the heterogeneity: because it relies on derivatives, the jump at $x_2 = 0$ is invisible to it. This is a limitation of derivative-based methods on models with discontinuities.
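A sketch of the Jacobian under the reconstructed model shows why derivative-based RHALE sees no heterogeneity for $x_2$ or $x_3$; this is a hypothetical stand-in for the model.jacobian that effector's ConditionalInteraction supplies in the RHALE call above:

```python
import numpy as np

def jacobian(x):
    """Hypothetical Jacobian of the reconstructed model f = sign(x2) * x1^2."""
    sign = np.where(x[:, 1] < 0, -1.0, 1.0)
    J = np.zeros_like(x)
    J[:, 0] = 2 * sign * x[:, 0]  # df/dx1 = +/- 2*x1 -> bin variance 4*x1^2
    # df/dx2 = 0 almost everywhere: the jump at x2 = 0 is invisible to derivatives
    # df/dx3 = 0: x3 does not enter the model
    return J

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, (1_000, 3))
J = jacobian(x)
print(J[:, 0].var() > 0, J[:, 1].var() == 0, J[:, 2].var() == 0)  # True True True
```

Only the $x_1$ column of the Jacobian varies across instances, which is exactly what the RHALE plots report.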