ehrapy.preprocessing.iterative_svd_impute
- ehrapy.preprocessing.iterative_svd_impute(adata, var_names=None, copy=False, warning_threshold=30, rank=10, convergence_threshold=1e-05, max_iters=200, gradual_rank_increase=True, svd_algorithm='arpack', init_fill_method='mean', min_value=None, max_value=None, verbose=False)
Impute missing values in an AnnData object using the IterativeSVD algorithm.
The IterativeSVD algorithm is a matrix completion method based on iterative low-rank singular value decomposition (SVD). This function can impute missing values for numerical and ordinal-encoded data.
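To illustrate the idea, here is a minimal NumPy sketch of iterative low-rank SVD imputation. It is not ehrapy's implementation: the function name `iterative_svd_sketch` is hypothetical, it uses a dense `np.linalg.svd` rather than the 'arpack' or 'randomized' solvers, it applies the full rank immediately, and it assumes every column has at least one observed value.

```python
import numpy as np


def iterative_svd_sketch(X, rank=2, convergence_threshold=1e-5, max_iters=200):
    """Illustrative low-rank SVD imputation (hypothetical helper, not ehrapy code)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from a column-mean fill, comparable to init_fill_method='mean'.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(max_iters):
        # Truncated SVD: keep only the top `rank` singular triplets.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Observed entries stay fixed; only missing entries are updated.
        updated = np.where(missing, low_rank, X)

        # Relative change in Frobenius norm between two iterations.
        denom = np.linalg.norm(filled)
        change = np.linalg.norm(updated - filled) / denom if denom > 0 else 0.0
        filled = updated
        if change < convergence_threshold:
            break

    return filled


# Demo on a synthetic rank-1 matrix with roughly 20% of entries removed.
rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=20), rng.normal(size=5))
X_demo = truth.copy()
X_demo[rng.random(truth.shape) < 0.2] = np.nan
imputed = iterative_svd_sketch(X_demo, rank=1)
```

The loop mirrors the stopping rules described for `convergence_threshold` and `max_iters`: it ends as soon as the relative change drops below the threshold, or after the maximum number of iterations.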
- Parameters:
adata (AnnData) – An AnnData object to impute missing values in.
var_names (Optional[Iterable[str]]) – A list of var names indicating which columns to impute. If None, all columns will be imputed. Defaults to None.
copy (bool) – Whether to return a copy of the AnnData object or act in place. Defaults to False.
warning_threshold (int) – Threshold of percentage of missing values to display a warning for. Defaults to 30.
rank (int) – Rank of the SVD decomposition. Defaults to 10.
convergence_threshold (float) – Convergence threshold for the iterative algorithm. The algorithm stops when the relative difference in Frobenius norm between two iterations is less than convergence_threshold. Defaults to 0.00001.
max_iters (int) – Maximum number of iterations. The algorithm stops after max_iters iterations if it does not converge. Defaults to 200.
gradual_rank_increase (bool) – Whether to increase the rank gradually or to use the rank value immediately. Defaults to True.
svd_algorithm (Literal['arpack', 'randomized']) – The SVD algorithm to use. Can be one of {‘arpack’, ‘randomized’}. Defaults to ‘arpack’.
init_fill_method (Literal['zero', 'mean', 'median']) – The fill method to use for initializing missing values. Can be one of {‘zero’, ‘mean’, ‘median’}. Defaults to ‘mean’.
min_value (Optional[float]) – The minimum value allowed for the imputed data. Any imputed value less than min_value is clipped to min_value. Defaults to None.
max_value (Optional[float]) – The maximum value allowed for the imputed data. Any imputed value greater than max_value is clipped to max_value. Defaults to None.
verbose (bool) – Whether to print progress messages during the imputation. Defaults to False.
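The init_fill_method and min_value/max_value parameters can be pictured with the following NumPy sketch. The helpers `initial_fill` and `clip_imputed` are hypothetical names introduced for illustration only; the sketch assumes fills are computed per column and that clipping is applied only to imputed entries, which may differ from what the library does internally.

```python
import numpy as np


def initial_fill(X, method="mean"):
    """Fill NaNs per column before iterating (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if method == "zero":
        fill = np.zeros(X.shape[1])
    elif method == "mean":
        fill = np.nanmean(X, axis=0)
    elif method == "median":
        fill = np.nanmedian(X, axis=0)
    else:
        raise ValueError("method must be one of {'zero', 'mean', 'median'}")
    # Broadcast the per-column fill value into the missing positions.
    return np.where(missing, fill, X)


def clip_imputed(filled, missing, min_value=None, max_value=None):
    """Clip only the imputed entries to [min_value, max_value] (hypothetical helper)."""
    out = filled.copy()
    if min_value is not None or max_value is not None:
        out[missing] = np.clip(out[missing], min_value, max_value)
    return out


X_small = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 10.0]])
mask = np.isnan(X_small)
mean_filled = initial_fill(X_small, "mean")           # NaNs become 2.0 and 7.0
zero_filled = initial_fill(X_small, "zero")           # NaNs become 0.0
clipped = clip_imputed(mean_filled, mask, max_value=5.0)  # imputed 7.0 becomes 5.0
```

Note that the observed value 10.0 is left untouched by the clipping step in this sketch, because only positions that were originally missing are clipped.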
- Return type:
AnnData
- Returns:
An AnnData object with imputed values.
- Raises:
ValueError – If svd_algorithm is not one of {‘arpack’, ‘randomized’}.
ValueError – If init_fill_method is not one of {‘zero’, ‘mean’, ‘median’}.
Examples
>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.iterative_svd_impute(adata)