
Model with conditional interaction

In this example, we show the heterogeneity of the global effects, using PDP, ALE, and RHALE, on a model with conditional interactions. We will use the following model:

$$f(x_1, x_2, x_3) = -x_1^2 \mathbb{1}_{x_2 < 0} + x_1^2 \mathbb{1}_{x_2 \geq 0} + e^{x_3}$$

where the features $x_1, x_2, x_3$ are independent and uniformly distributed in the interval $[-1, 1]$. The model has an interaction between $x_1$ and $x_2$, caused by the terms $f_{1,2}(x_1, x_2) = -x_1^2 \mathbb{1}_{x_2 < 0} + x_1^2 \mathbb{1}_{x_2 \geq 0}$. This means that the effect of $x_1$ on the output $y$ depends on the value of $x_2$, and vice versa. Terms like this introduce heterogeneity. Each global effect method has a different formula for quantifying such heterogeneity; below, we will see how PDP, ALE, and RHALE handle it.

In contrast, x3 does not interact with any other feature, so its global effect has zero heterogeneity.
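To make the interaction concrete, here is a plain-NumPy sketch of the formula above (an assumption about what `effector.models.ConditionalInteraction` implements, reconstructed from the terms listed in this example):

```python
import numpy as np

def f(x):
    """Sketch of the example model: -x1^2 when x2 < 0, +x1^2 when x2 >= 0, plus e^x3.

    x: array of shape (n, 3) with columns x1, x2, x3.
    """
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    return np.where(x2 < 0, -x1**2, x1**2) + np.exp(x3)

# The two rows differ only in the sign of x2, yet the x1^2 term flips sign:
# this is the conditional interaction between x1 and x2.
x = np.array([[0.5, -0.5, 0.0], [0.5, 0.5, 0.0]])
print(f(x))  # [0.75 1.25]: -0.25 + 1 and +0.25 + 1
```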

import numpy as np
import matplotlib.pyplot as plt
import effector

np.random.seed(21)

model = effector.models.ConditionalInteraction()
dataset = effector.datasets.IndependentUniform(dim=3, low=-1, high=1)
x = dataset.generate_data(10_000)

PDP

Effector

Let's see below the PDP heterogeneity for each feature, using effector.

pdp = effector.PDP(x, model.predict, dataset.axis_limits)
pdp.fit(features="all", centering=True)
for feature in [0, 1, 2]:
    pdp.plot(feature=feature, centering=True, heterogeneity=True, y_limits=[-2, 2])

[Figures: centered PDP plots with std-based heterogeneity bands for x_1, x_2, and x_3]

for feature in [0, 1, 2]:
    pdp.plot(feature=feature, centering=True, heterogeneity="ice", y_limits=[-2, 2])

[Figures: centered PDP plots with ICE curves for x_1, x_2, and x_3]

pdp = effector.PDP(x, model.predict, dataset.axis_limits, nof_instances="all")
pdp.fit(features="all", centering=True)
heter_per_feat = []
for feature in [0, 1, 2]:
    y_mean, y_var = pdp.eval(feature=feature, xs=np.linspace(-1, 1, 100), centering=True, heterogeneity=True)
    print(f"Heterogeneity of x_{feature}: {y_var.mean():.3f}")
    heter_per_feat.append(y_var.mean())
Heterogeneity of x_0: 0.093
Heterogeneity of x_1: 0.088
Heterogeneity of x_2: 0.000

Conclusions:

  • The global effect of $x_1$ arises from heterogeneous local effects, as $h(x_1) > 0$ almost everywhere. The std margins (the red area of height $\pm h(x_1)$ around the global effect) misleadingly suggest that the heterogeneity is minimized at $x_1 = \pm\sqrt{1/3}$. ICE plots provide a clearer picture; they reveal two groups of effects, $-x_1^2 + c_1$ and $x_1^2 + c_2$. The heterogeneity as a scalar value is $H_{x_1} \approx 0.09$.
  • Similar to $x_1$, the global effect of $x_2$ arises from heterogeneous local effects. However, unlike $x_1$, both the std margins and the ICE plots indicate a constant heterogeneity along the whole axis, i.e., $h(x_2) \approx 0.09$ for all $x_2$. The ICE plots further show a smooth range of varying local effects around the global effect, without distinct groups. The heterogeneity as a scalar is $H_{x_2} \approx 0.09$, identical to $H_{x_1}$. This is consistent, as heterogeneity measures the interaction between a feature and all other features; since only $x_1$ and $x_2$ interact, their heterogeneity values should be the same.
  • $x_3$ shows no heterogeneity; all local effects align perfectly with the global effect.

Derivations

How does PDP (and effector) arrive at these heterogeneity functions?

For x1:

The average effect is $\hat{f}^{PDP}(x_1) = 0$. The ICE plots are $-x_1^2 + \frac{1}{3}$ when $x_2^i < 0$ and $x_1^2 - \frac{1}{3}$ when $x_2^i \geq 0$. Due to the square, both create the same deviation from the average effect: $\left( x_1^2 - \frac{1}{3} \right)^2$.

$$h(x_1) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{f}_c^{ICE,i}(x_1) - \hat{f}_c^{PDP}(x_1) \right)^2 = \frac{1}{N} \sum_{i=1}^{N} \left( x_1^2 - \frac{1}{3} \right)^2 = x_1^4 - \frac{2}{3} x_1^2 + \frac{1}{9}$$

The heterogeneity as a scalar is simply the mean of the heterogeneity function:

$$H_{x_1} = \frac{1}{2} \int_{-1}^{1} \left( x_1^4 - \frac{2}{3} x_1^2 + \frac{1}{9} \right) \mathrm{d}x_1 = \frac{4}{45} \approx 0.09$$
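As a sanity check, this integral can be approximated numerically (a sketch, independent of effector):

```python
import numpy as np

# h(x1) = (x1^2 - 1/3)^2; averaging it over a fine grid of x1 in [-1, 1]
# approximates the mean of h under the uniform distribution.
grid = np.linspace(-1, 1, 100_001)
H_x1 = np.mean((grid**2 - 1/3) ** 2)
print(H_x1, 4/45)  # both ≈ 0.089
```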

For x2:

The average effect is $\hat{f}^{PDP}(x_2) = -\frac{1}{3} \mathbb{1}_{x_2 < 0} + \frac{1}{3} \mathbb{1}_{x_2 \geq 0}$ and the ICE plots are $\hat{f}^{ICE,i}(x_2) = -(x_1^i)^2 \mathbb{1}_{x_2 < 0} + (x_1^i)^2 \mathbb{1}_{x_2 \geq 0}$.

$$h(x_2) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{f}_c^{ICE,i}(x_2) - \hat{f}_c^{PDP}(x_2) \right)^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \mathbb{1}_{x_2 < 0} \left( \frac{1}{3} - (x_1^i)^2 \right)^2 + \mathbb{1}_{x_2 \geq 0} \left( (x_1^i)^2 - \frac{1}{3} \right)^2 \right) = \frac{1}{N} \sum_{i=1}^{N} \left( (x_1^i)^2 - \frac{1}{3} \right)^2 \approx \frac{4}{45} \approx 0.09$$

The heterogeneity as a scalar is simply the mean of the heterogeneity function, so:

$H_{x_2} \approx 0.09$.
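The claim that $h(x_2)$ does not depend on $x_2$ can be checked with a quick Monte Carlo sketch (independent of effector; the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 200_000)

# For any value of x2, each centered ICE curve deviates from the centered PDP
# by +/-((x1^i)^2 - 1/3), so the mean squared deviation is the same everywhere.
h_x2 = np.mean((x1**2 - 1/3) ** 2)
print(h_x2)  # ≈ 4/45 ≈ 0.089
```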

For $x_3$, there is no heterogeneity, so $h(x_3) = 0$ and $H_{x_3} = 0$.

Conclusions

PDP heterogeneity provides intuitive insights:

  • The global effects of $x_1$ and $x_2$ arise from heterogeneous local effects, while $x_3$ shows no heterogeneity.
  • The heterogeneity of $x_1$ and $x_2$ is quantified at the same level ($\approx 0.09$), which makes sense since only these two features interact.
  • However, the std-based heterogeneity of $x_1$ is misleading: it falsely suggests that the heterogeneity is minimized at $x_1 = \pm\sqrt{1/3}$. The ICE plots give the accurate picture.
def pdp_ground_truth(feature, xs):
    if feature == 0:
        # h(x1) = (x1^2 - 1/3)^2 = x1^4 - 2/3 x1^2 + 1/9
        ff = lambda x: x**4 - 2/3*x**2 + 1/9
        return ff(xs)
    elif feature == 1:
        # h(x2) is constant, approximately 4/45 ~= 0.089
        ff = lambda x: np.ones_like(x) * 0.088
        return ff(xs)
    elif feature == 2:
        # x3 does not interact with any feature, so h(x3) = 0
        ff = lambda x: np.zeros_like(x)
        return ff(xs)
# make a test
xx = np.linspace(-1, 1, 100)
for feature in [0, 1, 2]:
    pdp_mean, pdp_heter = pdp.eval(feature=feature, xs=xx, centering=True, heterogeneity=True)
    y_heter = pdp_ground_truth(feature, xx)
    np.testing.assert_allclose(pdp_heter, y_heter, atol=1e-1)

ALE

Effector

Let's see below the ALE effects for each feature, using effector.

ale = effector.ALE(x, model.predict, axis_limits=dataset.axis_limits)
ale.fit(features="all", centering=True, binning_method=effector.axis_partitioning.Fixed(nof_bins=31))

for feature in [0, 1, 2]:
    ale.plot(feature=feature, centering=True, heterogeneity=True, y_limits=[-2, 2])

[Figures: centered ALE plots with heterogeneity for x_1, x_2, and x_3]

ALE states that:

  • Feature x1:
    The heterogeneity varies across all values of $x_1$. It starts large at $x_1 = -1$, decreases until it becomes zero at $x_1 = 0$, and then increases again up to $x_1 = 1$.
    This behavior contrasts with the heterogeneity observed in the PDP, which has two zero-points at $x_1 = \pm\sqrt{1/3}$.

  • Feature x2:
    Heterogeneity is observed only around $x_2 = 0$. This also contrasts with PDP's heterogeneity, which is constant for all values of $x_2$.

  • Feature x3:
    No heterogeneity is present for this feature.

ale.feature_effect["feature_1"]
{'limits': array([-1.        , -0.93548387, -0.87096774, -0.80645161, -0.74193548,
        -0.67741935, -0.61290323, -0.5483871 , -0.48387097, -0.41935484,
        -0.35483871, -0.29032258, -0.22580645, -0.16129032, -0.09677419,
        -0.03225806,  0.03225806,  0.09677419,  0.16129032,  0.22580645,
         0.29032258,  0.35483871,  0.41935484,  0.48387097,  0.5483871 ,
         0.61290323,  0.67741935,  0.74193548,  0.80645161,  0.87096774,
         0.93548387,  1.00000001]),
 'dx': array([0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
        0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
        0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
        0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
        0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
        0.06451613, 0.06451613, 0.06451613, 0.06451613, 0.06451613,
        0.06451614]),
 'points_per_bin': array([320, 339, 318, 345, 313, 312, 331, 311, 293, 326, 309, 294, 314,
        312, 335, 318, 354, 328, 312, 331, 315, 307, 341, 320, 330, 336,
        312, 368, 325, 294, 337]),
 'bin_effect': array([0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        9.93884099, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ]),
 'bin_variance': array([ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        82.51292593,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ]),
 'alg_params': 'fixed',
 'norm_const': np.float64(0.3206077740261283)}

Derivations

For x1:

The $x_1$-axis is divided into $K$ equal bins, indexed by $k = 1, \ldots, K$, with the center of the $k$-th bin denoted as $c_k$. In each bin: if $x_2 < 0$, the local effect is $-2c_k$, and if $x_2 \geq 0$ it is $+2c_k$. Since the two signs occur equally often, the bin mean is $\hat{\mu}_k^{ALE} \approx 0$ and the variance within each bin is $4c_k^2$. Therefore, $h(x_1) = 4c_k^2$, where $k$ is the index of the bin that contains $x_1$.

$$h(x_1) = \frac{1}{|\mathcal{S}_k|} \sum_{i: \mathbf{x}^i \in \mathcal{S}_k} \left( \frac{f(z_k, x_2^i, x_3^i) - f(z_{k-1}, x_2^i, x_3^i)}{z_k - z_{k-1}} - \hat{\mu}_k^{ALE} \right)^2 = \frac{1}{|\mathcal{S}_k|} \sum_{i: \mathbf{x}^i \in \mathcal{S}_k} \left( \frac{-2 c_k (z_k - z_{k-1}) \mathbb{1}_{x_2^i < 0} + 2 c_k (z_k - z_{k-1}) \mathbb{1}_{x_2^i \geq 0}}{z_k - z_{k-1}} \right)^2 = \frac{1}{|\mathcal{S}_k|} \sum_{i: \mathbf{x}^i \in \mathcal{S}_k} 4 c_k^2 = 4 c_k^2$$
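The within-bin variance $4c_k^2$ can be reproduced by simulating the local effects in a single bin (a sketch; the bin center and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
c_k = 0.5                      # an arbitrary bin center on the x1-axis
x2 = rng.uniform(-1, 1, 100_000)

# Within the bin, the local effect is -2*c_k for x2 < 0 and +2*c_k for x2 >= 0,
# so the bin effect averages to ~0 while the bin variance is ~4*c_k^2.
local_effects = np.where(x2 < 0, -2 * c_k, 2 * c_k)
print(local_effects.mean())    # ≈ 0
print(local_effects.var())     # ≈ 1.0 = 4 * c_k**2
```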

For x2:

In all bins except the central one, the local effects are zero. In the central bin, however, crossing $x_2 = 0$ flips the model output from $-(x_1^i)^2$ to $(x_1^i)^2$, so the local effects are $\frac{2(x_1^i)^2}{z_k - z_{k-1}}$, which introduces heterogeneity. So, if $x_2$ is not in the central bin, $h(x_2) = 0$. If $x_2$ is in the central bin:

$$h(x_2) = \frac{1}{|\mathcal{S}_k|} \sum_{i: \mathbf{x}^i \in \mathcal{S}_k} \left( \frac{f(x_1^i, z_k, x_3^i) - f(x_1^i, z_{k-1}, x_3^i)}{z_k - z_{k-1}} - \hat{\mu}_k^{ALE} \right)^2 = \left( \frac{2}{z_k - z_{k-1}} \right)^2 \frac{1}{|\mathcal{S}_k|} \sum_{i: \mathbf{x}^i \in \mathcal{S}_k} \left( (x_1^i)^2 - \overline{(x_1)^2} \right)^2 \approx \left( \frac{2}{z_k - z_{k-1}} \right)^2 \frac{4}{45} = \frac{4 \cdot 31^2}{45} \approx 85 \quad \text{for } K = 31$$

So: $h(x_2) = \begin{cases} 0, & \text{if } x_2 \text{ is not in the central bin}, \\ \frac{4 \cdot 31^2}{45} \approx 85, & \text{if } x_2 \text{ is in the central bin}. \end{cases}$
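The central-bin value can likewise be approximated by simulation (a sketch; the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 31
dx = 2 / K                     # width of each fixed bin on [-1, 1]
x1 = rng.uniform(-1, 1, 200_000)

# Crossing x2 = 0 flips the model output from -x1^2 to +x1^2, so the local
# effect in the central bin is 2*x1^2/dx; its variance is the bin heterogeneity.
local_effects = 2 * x1**2 / dx
print(local_effects.var())     # ≈ 31**2 * 4/45 ≈ 85
```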

For x3:

All instances share the same local effect, so the heterogeneity is zero everywhere.

def ale_ground_truth(feature):
    K = 31
    bin_centers = np.linspace(-1 + 1/K, 1 - 1/K, K)
    if feature == 0:
        # h(x1) = 4 * c_k^2 in every bin
        return 4*bin_centers**2
    elif feature == 1:
        y = np.zeros_like(bin_centers)
        y[15] = np.nan  # central bin: variance ~ 85, masked out of the test below
        return y
    elif feature == 2:
        return np.zeros_like(bin_centers)
# make a test
K = 31
bin_centers = np.linspace(-1 + 1/K, 1 - 1/K, K)
for feature in [0, 1, 2]:
    bin_var = ale.feature_effect[f"feature_{feature}"]["bin_variance"]
    gt_var = ale_ground_truth(feature)
    mask = ~np.isnan(gt_var)
    np.testing.assert_allclose(bin_var[mask], gt_var[mask], atol=1e-1)

Conclusions

Is the heterogeneity implied by the ALE plots meaningful? It is: for $x_1$, the variance of the bin-local effects grows as $4c_k^2$ away from zero, and for $x_2$, all of the interaction is concentrated in the bin that contains the discontinuity at $x_2 = 0$. Note, however, that it differs from the PDP heterogeneity, because ALE measures the variability of bin-level local effects rather than the deviations of ICE curves.

RHALE

Effector

Let's see below the RHALE effects for each feature, using effector.

rhale = effector.RHALE(x, model.predict, model.jacobian, axis_limits=dataset.axis_limits)
rhale.fit(features="all", centering=True)

for feature in [0, 1, 2]:
    rhale.plot(feature=feature, centering=True, heterogeneity=True, y_limits=[-2, 2])

[Figures: centered RHALE plots with heterogeneity for x_1, x_2, and x_3]

RHALE states that:

  • Feature x1: As in ALE, the heterogeneity varies across all values of $x_1$. It starts large at $x_1 = -1$, decreases until it becomes zero at $x_1 = 0$, and then increases again up to $x_1 = 1$.

  • Feature x2:
    No heterogeneity is present for this feature.

  • Feature x3:
    No heterogeneity is present for this feature.

Derivations

RHALE uses the model's derivatives instead of finite differences, computing their mean and variance per bin.

For x1: $\frac{\partial f}{\partial x_1} = -2 x_1 \mathbb{1}_{x_2 < 0} + 2 x_1 \mathbb{1}_{x_2 \geq 0}$. Within a bin centered at $c_k$, the derivatives are approximately $\pm 2 c_k$, so $h(x_1) \approx 4 c_k^2$, matching ALE.

For x2: $\frac{\partial f}{\partial x_2} = 0$ almost everywhere; the interaction with $x_1$ lives entirely in the discontinuity at $x_2 = 0$, which the derivative does not capture. Hence $h(x_2) = 0$.

For x3: $\frac{\partial f}{\partial x_3} = e^{x_3}$ is identical for all instances, so $h(x_3) = 0$.
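These derivative-based arguments can be sketched numerically. The jacobian below is the analytic derivative of the model formula (an assumption about what `model.jacobian` computes, reconstructed from the model definition at the top of this example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (200_000, 3))

# Analytic partial derivatives of f(x1, x2, x3) = -x1^2*[x2<0] + x1^2*[x2>=0] + e^x3
df_dx1 = np.where(x[:, 1] < 0, -2 * x[:, 0], 2 * x[:, 0])
df_dx2 = np.zeros(len(x))      # zero almost everywhere: the jump at x2 = 0 is invisible
df_dx3 = np.exp(x[:, 2])       # identical for every instance at any given x3

# Within a narrow bin around x1 = 0.5 the derivative is ±1, so its variance is ≈ 4*0.5^2.
in_bin = np.abs(x[:, 0] - 0.5) < 0.05
print(df_dx1[in_bin].var())    # ≈ 1.0
print(df_dx2.var())            # 0.0 -> RHALE reports zero heterogeneity for x2
```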

Conclusions

Are the RHALE effects intuitive? For $x_1$ and $x_3$, yes; they agree with ALE. For $x_2$, however, RHALE reports zero heterogeneity even though $x_2$ interacts with $x_1$: derivative-based heterogeneity cannot detect an interaction that manifests only as a discontinuity.