Imputer API

This page documents the API of the censorImputer class.

class censorfix.censorImputer(sample_posterior=True, column_choice='auto', initial_point='auto', distribution='gaussian', missing_values=nan, no_columns='all', max_iter=5, stan_iterations=4000, debug=True, n_jobs=8, imputation_order='ascending', number_imputations=1)

__init__(sample_posterior=True, column_choice='auto', initial_point='auto', distribution='gaussian', missing_values=nan, no_columns='all', max_iter=5, stan_iterations=4000, debug=True, n_jobs=8, imputation_order='ascending', number_imputations=1)

Multivariate imputer that multiply imputes censored values.

This is a strategy for dealing with missing and censored values in data sets. It can handle both lower and upper censoring points.
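To make the terminology concrete, the sketch below shows what lower (left) and upper (right) censoring does to a measurement: any value outside the detection limits is recorded at the limit itself. The helper `censor` and the limits used are hypothetical and not part of censorfix; this only illustrates the kind of data the imputer is meant for.

```python
def censor(values, lower=None, upper=None):
    """Clamp values to [lower, upper] and flag which ones were censored.

    Hypothetical helper for illustration; not part of censorfix.
    """
    recorded, flags = [], []
    for v in values:
        if lower is not None and v < lower:
            recorded.append(lower)   # true value is somewhere below `lower`
            flags.append("left")
        elif upper is not None and v > upper:
            recorded.append(upper)   # true value is somewhere above `upper`
            flags.append("right")
        else:
            recorded.append(v)       # fully observed
            flags.append("observed")
    return recorded, flags


true_values = [0.2, 1.5, 3.7, 9.1]
recorded, flags = censor(true_values, lower=1.0, upper=5.0)
print(recorded)
print(flags)
```

The imputer's job is the reverse direction: given the recorded values and the censoring points, propose plausible values for the entries flagged as censored.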

Parameters:

sample_posterior (bool) – whether to sample imputations from the posterior (Bayesian imputation) rather than use the best prediction at each step

distribution (gaussian, t-distribution, skew-normal, exponential) – the distribution to use for the imputation

missing_values (str) – the placeholder for missing values that will be imputed

max_iter (int) – the number of imputation cycles

no_columns (int) – how many columns to use for the imputation

stan_iterations (int) – number of iterations for Stan to run

imputation_order (str, default 'ascending') – the order of imputations

debug (bool) – display debug information

number_imputations (int) – the number of imputations required

impute(data, right_cen=None, left_cen=None, iter_val=1)

Impute multiple columns iteratively.

Returns the data in a sorted form. If multiple imputations are requested, the data is returned.

Parameters:

data (pandas DataFrame) – the data to impute, as a pandas DataFrame

right_cen (list of doubles) – the right-censoring points of the data; NA if there is no censoring

left_cen (list of doubles) – the left-censoring points of the data; NA if there is no censoring

iter_val (int) – the number of imputation rounds to perform

Returns:
Dataset with imputed values

Return type:
array
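A minimal usage sketch for the constructor and impute() follows. The keyword arguments and method signature come from this page; everything else is an assumption: the import path, the use of NaN to stand for "NA", the reading of right_cen and left_cen as one entry per column, and the example column names and values. The imports are guarded so the snippet still runs where pandas, Stan or censorfix are not installed.

```python
import math

# Keyword arguments taken from the documented constructor signature.
imputer_options = {
    "distribution": "gaussian",
    "max_iter": 5,
    "stan_iterations": 4000,
    "number_imputations": 1,
    "debug": False,
}

# Assumed layout: one censoring point per column, NaN standing in for
# "NA" where a column has no censoring on that side.
right_cen = [5.0, math.nan]
left_cen = [1.0, math.nan]

try:
    import pandas as pd
    import censorfix  # import path is an assumption based on the documented name
    imputer_cls = getattr(censorfix, "censorImputer", None)
except Exception:
    # pandas, Stan or censorfix is unavailable in this environment.
    pd = None
    imputer_cls = None

if imputer_cls is not None:
    # Hypothetical data: column "a" is assumed to hold censored entries
    # recorded at the limits 1.0 and 5.0; column "b" is uncensored.
    data = pd.DataFrame({
        "a": [1.0, 2.3, 5.0, 3.1, 4.2, 1.0],
        "b": [0.4, 1.2, 2.2, 0.9, 1.8, 0.3],
    })
    imputer = imputer_cls(**imputer_options)
    imputed = imputer.impute(data, right_cen=right_cen, left_cen=left_cen, iter_val=1)
    print(imputed)
else:
    print("censorfix is not available; skipping the imputation call")
```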

impute_once(y, X, U, L)

Impute one column of censored values using Stan with the chosen options.

Parameters:

y (array like) – censored values

X (array like) – independent values

U (double) – the upper censoring value

L (double) – the lower censoring value
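The idea behind imputing a single censored column can be illustrated without Stan. The sketch below is a stand-in, not the library's model: it ignores X, assumes a Gaussian with a fixed, hypothetical mean and standard deviation, and replaces each censored entry of y with a draw from that Gaussian truncated to the region beyond the censoring value (above U or below L). The function names are invented for this example.

```python
import random
from statistics import NormalDist


def draw_truncated(mu, sigma, lower=None, upper=None, rng=random):
    """Draw from N(mu, sigma) restricted to [lower, upper] by inverse-CDF sampling."""
    dist = NormalDist(mu, sigma)
    lo = dist.cdf(lower) if lower is not None else 0.0
    hi = dist.cdf(upper) if upper is not None else 1.0
    u = lo + (hi - lo) * rng.random()
    # inv_cdf is only defined on the open interval (0, 1).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    x = dist.inv_cdf(u)
    # Guard against floating-point round-off at the boundaries.
    if lower is not None:
        x = max(x, lower)
    if upper is not None:
        x = min(x, upper)
    return x


def impute_column_sketch(y, U, L, mu, sigma, seed=0):
    """Replace entries recorded at U or L with truncated-Gaussian draws."""
    rng = random.Random(seed)
    out = []
    for v in y:
        if U is not None and v >= U:
            out.append(draw_truncated(mu, sigma, lower=U, rng=rng))  # true value >= U
        elif L is not None and v <= L:
            out.append(draw_truncated(mu, sigma, upper=L, rng=rng))  # true value <= L
        else:
            out.append(v)  # observed values are left untouched
    return out


y = [1.0, 2.3, 5.0, 3.1]
imputed_y = impute_column_sketch(y, U=5.0, L=1.0, mu=3.0, sigma=2.0)
print(imputed_y)
```

Drawing a random value, as here, corresponds in spirit to a sampling-based imputation; a "best prediction" variant would instead use a single summary such as the mean of the truncated distribution.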