Imputer API

API reference for the censorImputer class.

class censorfix.censorImputer(sample_posterior=True, column_choice='auto', initial_point='auto', distribution='gaussian', missing_values=nan, no_columns='all', max_iter=5, stan_iterations=4000, debug=True, n_jobs=8, imputation_order='ascending', number_imputations=1)
__init__(sample_posterior=True, column_choice='auto', initial_point='auto', distribution='gaussian', missing_values=nan, no_columns='all', max_iter=5, stan_iterations=4000, debug=True, n_jobs=8, imputation_order='ascending', number_imputations=1)

Multivariate imputer that multiply imputes censored values.

This is a strategy for dealing with missing and censored values in data sets. It can handle both lower and upper censoring points.
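To make the terminology concrete, the following is a minimal sketch of what lower and upper censoring do to a measurement. It does not use the package; the `censor` helper and the sample values are assumptions made purely for illustration.

```python
import math

def censor(values, lower=math.nan, upper=math.nan):
    """Replace values beyond a censoring point with the point itself.

    NaN for `lower` or `upper` means "no censoring on that side".
    """
    out = []
    for v in values:
        if not math.isnan(lower) and v < lower:
            out.append(lower)   # left-censored: only "at most lower" is known
        elif not math.isnan(upper) and v > upper:
            out.append(upper)   # right-censored: only "at least upper" is known
        else:
            out.append(v)       # fully observed
    return out

true_values = [0.2, 1.5, 3.7, 5.1, 0.9]
observed = censor(true_values, lower=1.0, upper=5.0)
print(observed)  # [1.0, 1.5, 3.7, 5.0, 1.0]
```

An imputer for censored data works in the opposite direction: given `observed` and the censoring points, it proposes plausible values for the entries that sit on a limit.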

Parameters:
  • sample_posterior (bool) – if True, draw each imputed value from the posterior (Bayesian imputation); otherwise use the best prediction at each step
  • distribution (gaussian, t-distribution, skew-normal, exponential) – the distribution assumed for the imputation model
  • missing_values (str) – the placeholder for missing values that will be imputed
  • max_iter (int) – the number of cycles
  • no_columns (int) – how many columns to use for the imputation
  • stan_iterations (int) – number of iterations for Stan to run
  • imputation_order (str, default 'ascending') – the order in which columns are imputed
  • debug (bool) – display debug information
  • number_imputations (int) – the number of imputations required
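The difference between a best prediction and a posterior draw (the `sample_posterior` option) can be illustrated for a single left-censored Gaussian value. This is a sketch under stated assumptions, not the package's implementation: the mean and standard deviation are taken as known, the limit is assumed not to lie extremely far out in the tails, and both helper names are hypothetical.

```python
import random
from statistics import NormalDist

def left_censored_mean(mu, sigma, limit):
    """Mean of N(mu, sigma) conditional on the value lying below `limit`."""
    std = NormalDist()
    a = (limit - mu) / sigma
    return mu - sigma * std.pdf(a) / std.cdf(a)

def left_censored_draw(mu, sigma, limit, rng):
    """One random draw from N(mu, sigma) truncated to values below `limit`."""
    dist = NormalDist(mu, sigma)
    p = dist.cdf(limit)              # probability mass below the limit
    u = p * (1.0 - rng.random())     # uniform on (0, p]
    return dist.inv_cdf(u)           # inverse-CDF sampling

rng = random.Random(0)
best = left_censored_mean(0.0, 1.0, 0.0)
draws = [left_censored_draw(0.0, 1.0, 0.0, rng) for _ in range(20000)]
```

The single value `best` is the same on every run, whereas repeated draws vary around it. That variation is what lets several imputed data sets (`number_imputations` greater than 1) reflect the uncertainty in the censored values.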
impute(data, right_cen=None, left_cen=None, iter_val=1)

Impute multiple columns iteratively.

Returns the data in sorted form. When multiple imputations are requested, the imputed data is returned.

Parameters:
  • data (pandas dataframe) – the data as a pandas dataframe
  • right_cen (list of doubles) – the right censoring points of the data; NA where there is no censoring
  • left_cen (list of doubles) – the left censoring points of the data; NA where there is no censoring
  • iter_val (int) – the number of imputation rounds to perform
Returns:

Dataset with imputed values

Return type:

array
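A possible way to prepare the `left_cen` and `right_cen` arguments is sketched below. The column names and limits are hypothetical, and the sketch assumes one entry per column, in column order, with NaN standing for NA; neither assumption is confirmed by the parameter descriptions. The `run_imputation` wrapper is only defined, not called, because running it needs pandas, the package and Stan, and its import path is a guess based on the class name.

```python
import math

NA = math.nan  # stands in for "no censoring" on that side

# Hypothetical columns and detection limits, for illustration only.
limits = {
    "lead":    {"left": 0.5, "right": NA},
    "arsenic": {"left": 0.1, "right": 10.0},
    "depth":   {"left": NA,  "right": NA},
}
columns = list(limits)

# Assumption: one censoring point per column, in column order.
left_cen = [limits[c]["left"] for c in columns]
right_cen = [limits[c]["right"] for c in columns]

def run_imputation(data):
    """Impute a pandas DataFrame whose columns match `columns`."""
    # Import path assumed; the package layout may instead require
    # `from censorfix import censorfix`.
    import censorfix
    imputer = censorfix.censorImputer(distribution='gaussian',
                                      max_iter=5,
                                      number_imputations=1,
                                      debug=False)
    return imputer.impute(data, right_cen=right_cen, left_cen=left_cen)
```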

impute_once(y, X, U, L)

Impute one column of censored values using Stan with the chosen options.

Parameters:
  • y (array like) – censored values
  • X (array like) – independent values
  • U (double) – the upper censoring point
  • L (double) – the lower censoring point
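For intuition about what a censored-data model for one column has to evaluate, here is a sketch of the standard Gaussian log-likelihood with censoring. It is not the Stan model used by the package. The function name is hypothetical, `mu` holds a per-observation mean (for example a linear predictor built from `X`), and values recorded at or beyond `U` or `L` are assumed to be censored.

```python
import math
from statistics import NormalDist

def censored_gaussian_loglik(y, mu, sigma, U=math.inf, L=-math.inf):
    """Log-likelihood of y under N(mu_i, sigma) with censoring at U and L."""
    total = 0.0
    for value, mean in zip(y, mu):
        dist = NormalDist(mean, sigma)
        if value >= U:
            # Right-censored: only P(true value > U) is known.
            total += math.log(1.0 - dist.cdf(U))
        elif value <= L:
            # Left-censored: only P(true value < L) is known.
            total += math.log(dist.cdf(L))
        else:
            # Fully observed: use the density.
            total += math.log(dist.pdf(value))
    return total

# One observed value and one value sitting on the lower limit.
ll = censored_gaussian_loglik([0.3, -1.0], mu=[0.0, 0.0], sigma=1.0, L=-1.0)
```

Censored entries contribute a tail probability rather than a density, which is why treating them as ordinary observations at the limit biases the fit.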