ceteris_paribus package
Subpackages
Submodules
ceteris_paribus.explainer module
class ceteris_paribus.explainer.Explainer(model, var_names, data, y, predict_fun, label)
- data – Alias for field number 2
- label – Alias for field number 5
- model – Alias for field number 0
- predict_fun – Alias for field number 4
- var_names – Alias for field number 1
- y – Alias for field number 3
ceteris_paribus.explainer.explain(model, variable_names=None, data=None, y=None, predict_function=None, label=None)
Create a unified representation of a model that can be further processed by various explainers.
Parameters: - model – a model to be explained
- variable_names – names of variables; if not supplied, they are derived from data
- data – data that was used for fitting
- y – labels for the data
- predict_function – function that takes the data and returns predictions
- label – label of the model; if not supplied, the function tries to infer it from the model object, and otherwise leaves it unset
Returns: Explainer object
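The "Alias for field number" entries suggest that Explainer is a named tuple, so a unified representation of this kind might be sketched as follows. This is only an illustration, not the library's code: explain_sketch, MeanModel, the list-of-dicts data layout, and the fallback rules for the predict function and label are all assumptions made for the example.

```python
from collections import namedtuple

# Field order matches the "Alias for field number" entries:
# model=0, var_names=1, data=2, y=3, predict_fun=4, label=5.
Explainer = namedtuple("Explainer", "model var_names data y predict_fun label")


def explain_sketch(model, variable_names=None, data=None, y=None,
                   predict_function=None, label=None):
    """Illustrative stand-in for explain(); not the library's code."""
    if variable_names is None and data:
        # Assumption: data is a list of dicts keyed by variable name.
        variable_names = list(data[0].keys())
    if predict_function is None:
        # Assumption: fall back to a predict() method if the model has one.
        predict_function = getattr(model, "predict", None)
    if label is None:
        # Assumption: use the model's class name as its label.
        label = type(model).__name__
    return Explainer(model, variable_names, data, y, predict_function, label)


class MeanModel:
    """Hypothetical toy model: predicts the mean of each row's values."""

    def predict(self, rows):
        return [sum(r.values()) / len(r) for r in rows]


rows = [{"age": 30, "bmi": 22.0}, {"age": 50, "bmi": 28.0}]
explainer = explain_sketch(MeanModel(), data=rows, y=[0, 1])
print(explainer.var_names, explainer.label)
print(explainer[2] is explainer.data)  # index 2 and .data are the same field
```

Because the result is a tuple, downstream code can read fields either by name or by position.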
ceteris_paribus.gower module
Gower distance is a measure of dissimilarity between two observations that may contain both categorical and numerical values. It also permits missing values in categorical variables, so it can be applied to datasets with mixed variable types. Here it is used as the default function for finding the observations closest to a given one.
The original paper describing the idea might be found here.
This module calculates Gower's distance (dissimilarity).
ceteris_paribus.gower.gower_distances(data, observation)
Return an array of distances between all observations and a chosen one.
Based on:
- https://sourceforge.net/projects/gower-distance-4python
- https://beta.vu.nl/nl/Images/stageverslag-hoven_tcm235-777817.pdf
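To make the idea concrete, here is a minimal pure-Python sketch of the textbook Gower dissimilarity. It is not the implementation behind gower_distances. It assumes rows are plain lists, numbers are numerical variables, strings are categorical, None marks a missing value, and a variable with a missing value is skipped for that pair of observations.

```python
def gower_distance_sketch(data, observation):
    """Gower dissimilarity between each row of data and observation."""
    n_vars = len(observation)

    # Range of each numerical column over all non-missing values;
    # None for columns that hold no numbers (categorical columns).
    ranges = []
    for j in range(n_vars):
        col = [row[j] for row in data + [observation]
               if isinstance(row[j], (int, float))]
        ranges.append(max(col) - min(col) if col else None)

    distances = []
    for row in data:
        total, used = 0.0, 0
        for j in range(n_vars):
            a, b = row[j], observation[j]
            if a is None or b is None:
                continue  # assumption: skip variables with a missing value
            if isinstance(a, str) or isinstance(b, str):
                # Categorical: 0 when the categories match, 1 otherwise.
                total += 0.0 if a == b else 1.0
            else:
                # Numerical: absolute difference scaled by the column range.
                total += abs(a - b) / ranges[j] if ranges[j] else 0.0
            used += 1
        # Average over the variables that could be compared.
        distances.append(total / used if used else float("nan"))
    return distances


data = [[20, "a"], [30, "b"], [40, None]]
print(gower_distance_sketch(data, [20, "a"]))  # [0.0, 0.75, 1.0]
```

Each per-variable contribution lies between 0 and 1, so the averaged distance does too, which is what lets numerical and categorical variables be mixed.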
ceteris_paribus.profiles module
class ceteris_paribus.profiles.CeterisParibus(explainer, new_observation, y, selected_variables, grid_points, variable_splits)
Bases: object
print_profile()
set_label(label)
split_by(column)
Split the ceteris paribus profile data frame by the values of a given column.
Returns: sorted mapping of values to data frames
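The "sorted mapping of values to dataframes" returned by split_by can be pictured with a small pure-Python analogue. This sketch assumes rows are dicts rather than a data frame, and split_by_sketch is a hypothetical name. It does not reproduce the library's method.

```python
from collections import defaultdict


def split_by_sketch(rows, column):
    """Group rows by the value in `column`, with keys in sorted order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    # dict() preserves insertion order, so inserting sorted items
    # yields a mapping whose keys iterate in sorted order.
    return dict(sorted(groups.items()))


profile_rows = [
    {"_vname_": "bmi", "value": 20.0},
    {"_vname_": "age", "value": 30.0},
    {"_vname_": "bmi", "value": 25.0},
]
by_variable = split_by_sketch(profile_rows, "_vname_")
print(list(by_variable))
```

The "_vname_" column name is made up for the example; any column of the profile table could serve as the split key.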
ceteris_paribus.profiles.individual_variable_profile(explainer, new_observation, y=None, variables=None, grid_points=101, variable_splits=None)
Calculate ceteris paribus profiles.
Parameters: - explainer – a model to be explained
- new_observation – a new observation for which the profiles are calculated
- y – true labels for new_observation; if specified, they are added to ceteris paribus plots
- variables – collection of variables selected for calculating profiles
- grid_points – number of points for a profile
- variable_splits – dictionary of splits for variables, in most cases created with _calculate_variable_splits(); if None, it is calculated from the validation data available in the explainer
Returns: instance of CeterisParibus class
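The idea behind a ceteris paribus profile is to vary one variable over a grid while every other variable keeps its observed value, recording the model's prediction at each grid point. The sketch below illustrates that idea only. profile_sketch and toy_predict are hypothetical names, and the evenly spaced grid between explicit low and high bounds is an assumption that stands in for the library's variable_splits.

```python
def profile_sketch(predict, observation, variable, low, high, grid_points=101):
    """Vary one variable over an evenly spaced grid, holding the rest fixed."""
    step = (high - low) / (grid_points - 1)
    profile = []
    for i in range(grid_points):
        value = low + i * step
        # Copy the observation and overwrite only the selected variable.
        modified = dict(observation, **{variable: value})
        profile.append((value, predict(modified)))
    return profile


def toy_predict(row):
    # Hypothetical model: a fixed linear function of two variables.
    return 2.0 * row["age"] + 10.0 * row["bmi"]


observation = {"age": 40, "bmi": 25.0}
profile = profile_sketch(toy_predict, observation, "age", 20, 60, grid_points=5)
for value, prediction in profile:
    print(value, prediction)
```

For this linear toy model the profile is a straight line in "age", because "bmi" stays fixed at 25.0 throughout.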
ceteris_paribus.select_data module
ceteris_paribus.select_data.select_neighbours(data, observation, y=None, variable_names=None, selected_variables=None, dist_fun='gower', n=20)
Select observations from a dataset that are similar to a given observation.
Parameters: - data – array or DataFrame with observations
- observation – reference observation for neighbours selection
- y – labels for observations
- variable_names – names of variables
- selected_variables – selected variables; requires supplying variable names along with data
- dist_fun – ‘gower’ or a distance function, as for pairwise distances in sklearn; ‘gower’ works with missing data
- n – size of the sample
Returns: DataFrame with selected observations and, if labels were provided, a pandas Series with the corresponding labels
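Neighbour selection amounts to computing a distance from every row to the reference observation and keeping the n rows with the smallest distances. A minimal sketch, assuming rows are numeric lists and using Euclidean distance (math.dist) in place of the library's default ‘gower’; select_neighbours_sketch is a hypothetical name:

```python
import math


def select_neighbours_sketch(data, observation, y=None, dist_fun=math.dist, n=20):
    """Return the n rows of data closest to observation (and their labels)."""
    distances = [dist_fun(row, observation) for row in data]
    # Indices of the rows, ordered from closest to farthest, cut to n.
    order = sorted(range(len(data)), key=lambda i: distances[i])[:n]
    neighbours = [data[i] for i in order]
    if y is None:
        return neighbours
    return neighbours, [y[i] for i in order]


points = [[0, 0], [1, 1], [5, 5], [3, 3]]
labels = ["a", "b", "c", "d"]
nearest, nearest_labels = select_neighbours_sketch(points, [1, 1], y=labels, n=2)
print(nearest, nearest_labels)  # [[1, 1], [0, 0]] ['b', 'a']
```

Any callable that takes two rows and returns a number can be passed as dist_fun in this sketch, which mirrors the choice the dist_fun parameter offers.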
ceteris_paribus.select_data.select_sample(data, y=None, n=15, seed=42)
Select a sample from a dataset.
Parameters: - data – array or DataFrame with observations
- y – labels for observations
- n – size of the sample
- seed – seed for random number generator
Returns: selected observations and corresponding labels if provided
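Seeded sampling of this kind can be sketched with the standard library alone. The example assumes list-based data and sampling without replacement, and select_sample_sketch is a hypothetical name. The library's own function may differ in both respects.

```python
import random


def select_sample_sketch(data, y=None, n=15, seed=42):
    """Draw up to n rows without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator; global state is untouched
    indices = rng.sample(range(len(data)), min(n, len(data)))
    sample = [data[i] for i in indices]
    if y is None:
        return sample
    # Labels are picked with the same indices, so they stay aligned.
    return sample, [y[i] for i in indices]


values = list(range(100))
doubled = [2 * v for v in values]
sample, sample_labels = select_sample_sketch(values, y=doubled, n=5)
print(sample, sample_labels)
```

Fixing the seed makes repeated calls return the same rows, which keeps plots and comparisons reproducible.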