Imputer API
This is the API for the imputer class.
class censorfix.censorImputer(sample_posterior=True, column_choice='auto', initial_point='auto', distribution='gaussian', missing_values=nan, no_columns='all', max_iter=5, stan_iterations=4000, debug=True, n_jobs=8, imputation_order='ascending', number_imputations=1)
__init__(sample_posterior=True, column_choice='auto', initial_point='auto', distribution='gaussian', missing_values=nan, no_columns='all', max_iter=5, stan_iterations=4000, debug=True, n_jobs=8, imputation_order='ascending', number_imputations=1)
Multivariate imputer that multiply imputes censored values.
This is a strategy for dealing with missing and censored values in data sets. It can handle both lower and upper censoring points.
Parameters:
- sample_posterior (bool) – whether to impute with the best prediction at each step or with a Bayesian imputation sampled from the posterior
- distribution (gaussian, t-distribution, skew-normal, exponential) – the distribution to use for the experiment
- missing_values (str) – the placeholder for missing values that will be imputed
- max_iter (int) – the number of cycles
- no_columns (int or 'all') – how many columns to use for the imputation
- stan_iterations (int) – number of iterations for Stan to run
- imputation_order (str, default 'ascending') – the order of imputations
- debug (bool) – display debug information
- number_imputations (int) – the number of imputations required
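To make the sample_posterior choice concrete, the sketch below contrasts a single "best prediction" with a random draw for one right-censored Gaussian value. It is only an illustration under simplifying assumptions: the mean and standard deviation are treated as known, and the helper names are hypothetical. It is not how censorImputer is implemented, since the class delegates fitting to Stan.

```python
import random
from statistics import NormalDist


def best_prediction_right_censored(mu, sigma, upper):
    """Conditional mean E[y | y > upper] for y ~ Normal(mu, sigma)."""
    std = NormalDist()
    alpha = (upper - mu) / sigma
    # Inverse Mills ratio: pdf(alpha) / (1 - cdf(alpha)).
    return mu + sigma * std.pdf(alpha) / (1.0 - std.cdf(alpha))


def posterior_draw_right_censored(mu, sigma, upper, rng):
    """One draw from Normal(mu, sigma) truncated to values above `upper`.

    Uses inverse-CDF sampling; assumes `upper` is not so extreme that
    cdf(upper) rounds to 0 or 1.
    """
    dist = NormalDist(mu, sigma)
    lo = dist.cdf(upper)
    u = lo + (1.0 - lo) * rng.random()  # uniform on [cdf(upper), 1)
    return dist.inv_cdf(u)


rng = random.Random(42)
# Deterministic: the same value every time for a given mu, sigma and limit.
point_estimate = best_prediction_right_censored(0.0, 1.0, 1.0)
# Stochastic: each draw differs, which is what multiple imputation relies on.
sampled = [posterior_draw_right_censored(0.0, 1.0, 1.0, rng) for _ in range(5)]
```

The point estimate gives every censored entry with the same predictors the same value, while repeated draws preserve the spread of plausible values, which is why sampling pairs naturally with number_imputations greater than 1.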
impute(data, right_cen=None, left_cen=None, iter_val=1)
Impute multiple columns in an iterative style.
Returns the data in a sorted form. If multiple imputations are requested, the data is returned.
Parameters:
- data (pandas dataframe) – the data as a pandas dataframe
- right_cen (list of doubles) – the right censoring points of the data; NA if there is no censoring
- left_cen (list of doubles) – the left censoring points of the data; NA if there is no censoring
- iter_val (int) – the number of imputation rounds to perform
Returns: Dataset with imputed values
Return type: array
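The following sketch shows one way the censoring-point lists might be laid out: one entry per column, with NaN standing in for "no censoring". Several things here are assumptions rather than documented behaviour: that censored observations are recorded at the censoring point, that `import censorfix` is the right import path, and that impute accepts the lists exactly as shown. The helper names are hypothetical.

```python
import math

NA = float("nan")  # marker for "no censoring" on that side of a column


def clip_to_censoring_points(rows, left_cen, right_cen):
    """Record values the way a censored measurement would report them.

    Values beyond a limit are replaced by the limit itself (an assumption
    about how censored entries are stored, not a documented requirement).
    """
    clipped = []
    for row in rows:
        new_row = []
        for value, lo, hi in zip(row, left_cen, right_cen):
            if not math.isnan(lo) and value < lo:
                value = lo
            if not math.isnan(hi) and value > hi:
                value = hi
            new_row.append(value)
        clipped.append(new_row)
    return clipped


def impute_with_censorfix(rows, columns, left_cen, right_cen):
    """Hypothetical wrapper around the documented constructor and impute()."""
    # Local imports so the helper above works without these packages.
    import pandas as pd
    import censorfix  # assumed import path

    data = pd.DataFrame(rows, columns=columns)
    imputer = censorfix.censorImputer(debug=False, number_imputations=1)
    return imputer.impute(data, right_cen=right_cen, left_cen=left_cen)


rows = [[0.2, 5.0], [1.7, 9.5], [-3.1, 12.0]]
left_cen = [-1.0, NA]    # first column censored below -1, second not at all
right_cen = [1.0, 10.0]  # first column censored above 1, second above 10
observed = clip_to_censoring_points(rows, left_cen, right_cen)
```

With pandas, Stan and the package installed, `impute_with_censorfix(observed, ["a", "b"], left_cen, right_cen)` would be the corresponding call under the assumptions above.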
impute_once(y, X, U, L)
Impute one column of censored values using Stan with the chosen options.
Parameters:
- y (array like) – censored values
- X (array like) – independent values
- U (double) – the upper censoring point
- L (double) – the lower censoring point
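As background for what a censored model of a single column has to account for, here is a minimal sketch of the standard censored (Tobit-style) Gaussian log-likelihood. It is not the Stan model shipped with the package. It assumes censored entries of y are stored at the limits U and L, and it takes the per-observation means mu as given rather than fitting them from X. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def censored_gaussian_loglik(y, mu, sigma, U, L):
    """Log-likelihood of y under Normal(mu_i, sigma) with censoring at L and U.

    Entries at or above U contribute log P(y > U), entries at or below L
    contribute log P(y < L), and all other entries contribute the log density.
    """
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        dist = NormalDist(mu_i, sigma)
        if y_i >= U:
            total += math.log(1.0 - dist.cdf(U))  # right-censored
        elif y_i <= L:
            total += math.log(dist.cdf(L))        # left-censored
        else:
            total += math.log(dist.pdf(y_i))      # fully observed
    return total


y = [0.3, 1.0, -1.0]   # second entry sits at U, third at L
mu = [0.0, 0.0, 0.0]   # stand-in for predictions derived from X
loglik = censored_gaussian_loglik(y, mu, sigma=1.0, U=1.0, L=-1.0)
```

A sampler or optimiser would evaluate a quantity like this for many candidate parameter values; censored entries only say "beyond the limit", so they enter through tail probabilities rather than densities.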