ceteris_paribus package

Submodules

ceteris_paribus.explainer module

class ceteris_paribus.explainer.Explainer(model, var_names, data, y, predict_fun, label)
data

Alias for field number 2

label

Alias for field number 5

model

Alias for field number 0

predict_fun

Alias for field number 4

var_names

Alias for field number 1

y

Alias for field number 3
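The "Alias for field number N" entries suggest that Explainer is a collections.namedtuple. The following minimal sketch assumes that and reproduces the documented field order. It is a hypothetical reconstruction for illustration, not the package's source.

```python
from collections import namedtuple

# Hypothetical reconstruction: field order matches the documented aliases
# (model=0, var_names=1, data=2, y=3, predict_fun=4, label=5).
Explainer = namedtuple(
    "Explainer", ["model", "var_names", "data", "y", "predict_fun", "label"]
)


def sum_rows(rows):
    """Toy prediction function: the sum of each row's values."""
    return [sum(r) for r in rows]


explainer = Explainer(
    model="toy-model",
    var_names=["x1", "x2"],
    data=[[1, 2], [3, 4]],
    y=[0, 1],
    predict_fun=sum_rows,
    label="toy",
)

# Named access and positional access refer to the same field.
assert explainer.data is explainer[2]
print(explainer.label, explainer.predict_fun(explainer.data))  # toy [3, 7]
```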

ceteris_paribus.explainer.explain(model, variable_names=None, data=None, y=None, predict_function=None, label=None)

This function creates a unified representation of a model that can be further processed by various explainers.

Parameters:
  • model – a model to be explained
  • variable_names – names of variables; if not supplied, they are derived from data
  • data – data that was used for fitting the model
  • y – labels for the data
  • predict_function – function that takes the data and returns predictions
  • label – label of the model; if not supplied, the function tries to infer it from the model object, otherwise it is left unset
Returns:

Explainer object
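The documentation describes predict_function only as a function that takes the data and returns predictions. A common pattern is to wrap a classifier so that it returns one score per observation, such as the probability of the positive class. The sketch below assumes the model exposes a scikit-learn-style predict_proba method. ToyClassifier and positive_class_probability are hypothetical names used for illustration.

```python
class ToyClassifier:
    """Hypothetical stand-in for a fitted classifier with a predict_proba method."""

    def predict_proba(self, rows):
        # Probability of class 1 grows with the first feature, clipped to [0, 1].
        probs = []
        for row in rows:
            p1 = min(max(row[0] / 10.0, 0.0), 1.0)
            probs.append([1.0 - p1, p1])
        return probs


def positive_class_probability(model):
    """Build a predict_function that returns one number (P(class 1)) per row."""

    def predict_function(rows):
        return [p[1] for p in model.predict_proba(rows)]

    return predict_function


model = ToyClassifier()
predict_function = positive_class_probability(model)
print(predict_function([[2.0, 0.0], [15.0, 1.0]]))  # [0.2, 1.0]
```

The resulting callable could then be passed as the predict_function argument of explain(), so that downstream explainers receive a single numeric prediction per observation.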

ceteris_paribus.gower module

Gower distance is a distance measure that can be used to calculate the similarity between two observations containing both categorical and numerical values. It also tolerates missing values in categorical variables, which makes it applicable to most mixed-type datasets. Here it is used as the default function for finding the observations closest to a given one.

The original paper describing the idea can be found here.

This module calculates Gower's distance (dissimilarity).

ceteris_paribus.gower.gower_distances(data, observation)

Return an array of distances between all observations and a chosen one.

Based on:
  • https://sourceforge.net/projects/gower-distance-4python
  • https://beta.vu.nl/nl/Images/stageverslag-hoven_tcm235-777817.pdf
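Below is a minimal pure-Python sketch of Gower distance, assuming the standard equal-weight formulation:

  • each numeric variable contributes |a − b| divided by that variable's range in the data
  • each categorical variable contributes 0 for a match and 1 otherwise
  • comparisons involving a missing value are skipped
  • the result is the average over the remaining comparisons

This is not the package's implementation. gower_distances_sketch and its numeric argument are hypothetical, and the ranges here are computed from the dataset rows only.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def gower_distances_sketch(rows, observation, numeric):
    """Gower distance from each row to `observation`.

    rows: list of tuples; observation: tuple of the same length;
    numeric: set of column indices treated as numeric (others are categorical).
    """
    # Range of each numeric column over the dataset, ignoring missing values.
    ranges = {}
    for k in numeric:
        values = [r[k] for r in rows if not _is_missing(r[k])]
        ranges[k] = max(values) - min(values) if values else 0.0

    distances = []
    for row in rows:
        total, count = 0.0, 0
        for k, (a, b) in enumerate(zip(row, observation)):
            if _is_missing(a) or _is_missing(b):
                continue  # this comparison is not counted
            if k in numeric:
                d = abs(a - b) / ranges[k] if ranges[k] > 0 else 0.0
            else:
                d = 0.0 if a == b else 1.0
            total += d
            count += 1
        distances.append(total / count if count else float("nan"))
    return distances


rows = [(1.0, "a"), (3.0, "b"), (5.0, "a"), (None, "b")]
print(gower_distances_sketch(rows, (1.0, "a"), numeric={0}))  # [0.0, 0.75, 0.5, 1.0]
```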

ceteris_paribus.profiles module

class ceteris_paribus.profiles.CeterisParibus(explainer, new_observation, y, selected_variables, grid_points, variable_splits)

Bases: object

print_profile()
set_label(label)
split_by(column)

Split the ceteris paribus profile data frame by the values of a given column.

Returns:

sorted mapping of column values to DataFrames

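The grouping behind split_by can be sketched in plain Python, using a list of dicts in place of a pandas DataFrame. split_by_sketch and the column names "variable", "value" and "prediction" are hypothetical. With pandas, a similar result could be obtained by iterating over df.groupby(column), which yields groups in sorted key order by default.

```python
from collections import defaultdict


def split_by_sketch(records, column):
    """Group profile records (dicts) by the value of `column`.

    Returns a dict whose keys are the distinct values in sorted order,
    each mapped to the list of records with that value.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[column]].append(record)
    # Dicts preserve insertion order (Python 3.7+), so inserting keys in
    # sorted order produces a sorted mapping.
    return {key: groups[key] for key in sorted(groups)}


profile = [
    {"variable": "income", "value": 10, "prediction": 0.3},
    {"variable": "age", "value": 30, "prediction": 0.4},
    {"variable": "age", "value": 40, "prediction": 0.5},
]
print(list(split_by_sketch(profile, "variable")))  # ['age', 'income']
```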
ceteris_paribus.profiles.individual_variable_profile(explainer, new_observation, y=None, variables=None, grid_points=101, variable_splits=None)

Calculate ceteris paribus profiles for a new observation.

Parameters:
  • explainer – Explainer object (as returned by explain()) for the model to be explained
  • new_observation – a new observation for which the profiles are calculated
  • y – true labels for new_observation; if specified, they are added to the ceteris paribus plots
  • variables – collection of variables for which profiles are calculated
  • grid_points – number of grid points in each profile
  • variable_splits – dictionary of splits for variables, in most cases created with _calculate_variable_splits(). If None, the splits are calculated from the validation data available in the explainer.
Returns:

instance of CeterisParibus class
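A ceteris paribus profile varies one variable over a grid of values while holding all other variables of new_observation fixed, and records the model's prediction at each grid point. The sketch below illustrates that idea with dict-based observations. The evenly spaced grid between each variable's observed minimum and maximum is an assumption standing in for _calculate_variable_splits(), whose actual strategy may differ. All function names here are hypothetical.

```python
def variable_splits_sketch(data, variables, grid_points=101):
    """Evenly spaced grid per variable between its observed min and max (assumption)."""
    if grid_points < 2:
        raise ValueError("grid_points must be at least 2")
    splits = {}
    for var in variables:
        values = [row[var] for row in data]
        low, high = min(values), max(values)
        step = (high - low) / (grid_points - 1)
        splits[var] = [low + i * step for i in range(grid_points)]
    return splits


def ceteris_paribus_sketch(predict_fun, observation, variable_splits):
    """For each variable, vary it over its grid while keeping the others fixed."""
    profiles = {}
    for var, grid in variable_splits.items():
        what_if_rows = []
        for value in grid:
            row = dict(observation)  # copy, so other variables keep their values
            row[var] = value
            what_if_rows.append(row)
        predictions = predict_fun(what_if_rows)
        profiles[var] = list(zip(grid, predictions))
    return profiles


def linear_model(rows):
    """Hypothetical toy model: 2 * x + z."""
    return [2 * r["x"] + r["z"] for r in rows]


data = [{"x": 0.0, "z": 1.0}, {"x": 4.0, "z": 3.0}]
splits = variable_splits_sketch(data, ["x", "z"], grid_points=5)
profiles = ceteris_paribus_sketch(linear_model, {"x": 1.0, "z": 2.0}, splits)
print(profiles["x"])  # [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0), (3.0, 8.0), (4.0, 10.0)]
```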

ceteris_paribus.select_data module

ceteris_paribus.select_data.select_neighbours(data, observation, y=None, variable_names=None, selected_variables=None, dist_fun='gower', n=20)

Select observations from the dataset that are similar to a given observation.

Parameters:
  • data – array or DataFrame with observations
  • observation – reference observation for neighbours selection
  • y – labels for observations
  • variable_names – names of variables
  • selected_variables – selected variables; requires variable_names to be supplied along with data
  • dist_fun – ‘gower’ or a distance function in the form accepted by scikit-learn's pairwise distances; ‘gower’ works with missing data
  • n – number of neighbours to select (size of the sample)
Returns:

DataFrame with selected observations and pandas Series with corresponding labels if provided
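The core of neighbour selection is to compute the distance from every observation to the reference one and keep the n closest. The following minimal sketch makes one simplifying assumption: dist_fun is a callable taking two observations (defaulting to Euclidean distance via math.dist, Python 3.8+). This differs from the ‘gower’ or scikit-learn-style pairwise metric that select_neighbours accepts. select_neighbours_sketch is a hypothetical name.

```python
import math


def select_neighbours_sketch(data, observation, y=None, dist_fun=math.dist, n=20):
    """Return the n rows of `data` closest to `observation` (and their labels)."""
    distances = [dist_fun(row, observation) for row in data]
    # Indices ordered by increasing distance; sorted() is stable, so ties keep data order.
    order = sorted(range(len(data)), key=lambda i: distances[i])[:n]
    selected = [data[i] for i in order]
    if y is None:
        return selected
    return selected, [y[i] for i in order]


data = [(0, 0), (5, 5), (1, 1), (10, 0)]
labels = ["a", "b", "c", "d"]
print(select_neighbours_sketch(data, (0, 0), y=labels, n=2))  # ([(0, 0), (1, 1)], ['a', 'c'])
```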

ceteris_paribus.select_data.select_sample(data, y=None, n=15, seed=42)

Select a random sample from the dataset.

Parameters:
  • data – array or DataFrame with observations
  • y – labels for observations
  • n – size of the sample
  • seed – seed for random number generator
Returns:

selected observations and corresponding labels if provided
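A seeded sample can be sketched with the standard library. The assumptions are as follows:

  • rows are drawn without replacement
  • the sample is capped at the dataset size
  • a local random.Random(seed) makes the selection reproducible

The package may use a different generator (for example, NumPy's), so the exact rows chosen for a given seed would differ. select_sample_sketch is a hypothetical name.

```python
import random


def select_sample_sketch(data, y=None, n=15, seed=42):
    """Draw up to n rows without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator; leaves global random state untouched
    indices = rng.sample(range(len(data)), min(n, len(data)))
    sample = [data[i] for i in indices]
    if y is None:
        return sample
    # Labels are taken at the same indices, so they stay aligned with the rows.
    return sample, [y[i] for i in indices]


data = [(i, i * i) for i in range(100)]
labels = list(range(100))
rows, row_labels = select_sample_sketch(data, y=labels)
print(len(rows), rows[0][0] == row_labels[0])  # 15 True
```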

Module contents