ehrapy.preprocessing.knn_impute
- ehrapy.preprocessing.knn_impute(adata, var_names=None, *, n_neighbours=5, copy=False, backend='faiss', warning_threshold=70, backend_kwargs=None)
Imputes missing values in the input AnnData object using K-nearest neighbor imputation.
KNN imputation only works on numerical data. For mixed data (numerical and non-numerical), the non-numerical columns must therefore be ordinal-encoded first. This encoding is only a utility step and is undone once imputation has completed successfully.
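The encode, impute, decode round trip described above can be illustrated with a minimal pure-Python sketch. This is not ehrapy's implementation: the helper names (`ordinal_encode`, `ordinal_decode`, `nan_distance`, `knn_impute_rows`) are hypothetical, the distance is a simple root-mean-square difference over the features observed in both rows, and imputed codes are assumed to be rounded to the nearest valid category.

```python
import math


def ordinal_encode(column):
    """Map each distinct non-missing category to a float code; None becomes NaN."""
    categories = sorted({v for v in column if v is not None})
    to_code = {cat: float(i) for i, cat in enumerate(categories)}
    encoded = [to_code[v] if v is not None else math.nan for v in column]
    return encoded, categories


def ordinal_decode(encoded, categories):
    """Round each (possibly imputed) code to the nearest valid category."""
    last = len(categories) - 1
    return [categories[min(max(int(round(c)), 0), last)] for c in encoded]


def nan_distance(a, b):
    """Root-mean-square difference over the features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf  # no common features: the rows cannot be compared
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute_rows(rows, k=5):
    """Fill each NaN with the mean of that feature over the k nearest donor rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Donors are other rows that have feature j observed.
            donors = sorted(
                (nan_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and not math.isnan(other[j])
            )
            donors = [d for d in donors if d[0] != math.inf][:k]
            if donors:
                imputed[i][j] = sum(v for _, v in donors) / len(donors)
    return imputed


# Toy mixed data: one numerical column and one categorical column with a gap.
ages = [30.0, 31.0, 70.0, 72.0]
sex = ["F", None, "M", "M"]

codes, categories = ordinal_encode(sex)          # encode before imputation
rows = [[a, c] for a, c in zip(ages, codes)]
imputed = knn_impute_rows(rows, k=1)             # impute on numeric codes
sex_imputed = ordinal_decode([r[1] for r in imputed], categories)  # undo encoding
print(sex_imputed)  # ['F', 'F', 'M', 'M']
```

The missing category is filled from the row with the closest age, and the caller gets categorical labels back rather than the intermediate integer codes.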
- Parameters:
adata (AnnData) – An annotated data matrix containing EHR data.
var_names (Iterable[str] | None) – A list of variable names indicating which columns to impute. If None, all columns are imputed. Defaults to None.
n_neighbours (int) – Number of neighbors to use when performing the imputation. Defaults to 5.
copy (bool) – Whether to perform the imputation on a copy of the original AnnData object. If True, the original object remains unmodified. Defaults to False.
backend (Literal['scikit-learn', 'faiss']) – The implementation to use for the KNN imputation. ‘scikit-learn’ uses an exact KNN algorithm but is very slow, whereas ‘faiss’ is drastically faster but approximates the KNN graph. In practice, ‘faiss’ results are close to those of ‘scikit-learn’. Defaults to ‘faiss’.
warning_threshold (int) – Percentage of missing values above which a warning is issued. Defaults to 70.
backend_kwargs (dict | None) – Keyword arguments passed to the backend. For the ‘faiss’ backend, set ‘strategy’ to “mean”, “median”, or “weighted” to choose the imputation strategy. See sklearn.impute.KNNImputer for more information on the ‘scikit-learn’ backend. See fknni.faiss.FaissImputer for more information on the ‘faiss’ backend.
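To make the three ‘strategy’ values concrete, the following sketch shows one plausible way neighbor values could be combined into an imputed value. It is an illustration only, not the fknni code: the function `aggregate_neighbours` is hypothetical, and “weighted” is assumed here to mean inverse-distance weighting.

```python
import statistics


def aggregate_neighbours(values, distances, strategy="mean"):
    """Combine the values of the k nearest neighbours into one imputed value."""
    if strategy == "mean":
        return statistics.fmean(values)
    if strategy == "median":
        return statistics.median(values)
    if strategy == "weighted":
        # Assumption: closer neighbours count more (inverse-distance weights).
        # The small epsilon avoids division by zero for identical rows.
        weights = [1.0 / (d + 1e-10) for d in distances]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Three neighbours; the outlier (9.0) is also the farthest one.
neighbour_values = [1.0, 2.0, 9.0]
neighbour_distances = [1.0, 1.0, 2.0]

mean_value = aggregate_neighbours(neighbour_values, neighbour_distances, "mean")
median_value = aggregate_neighbours(neighbour_values, neighbour_distances, "median")
weighted_value = aggregate_neighbours(neighbour_values, neighbour_distances, "weighted")
```

In ehrapy itself the strategy would be selected through the documented parameter, for example `ep.pp.knn_impute(adata, backend="faiss", backend_kwargs={"strategy": "median"})`.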
- Return type:
- Returns:
An updated AnnData object with imputed values.
- Raises:
ValueError – If the input data matrix contains only categorical (non-numeric) values.
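The two documented guard conditions, the missing-value warning controlled by warning_threshold and the ValueError for purely categorical input, can be sketched as follows. This is a hedged illustration rather than ehrapy's actual checks: `check_imputation_input` is a hypothetical helper, the data is modeled as a plain dict of columns, and the threshold is assumed to apply per column.

```python
import math
import warnings


def check_imputation_input(columns, warning_threshold=70):
    """Validate columns (dict: name -> list of values; None or NaN marks missing).

    Returns a dict of column name -> percentage missing for flagged columns.
    """

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    def is_numeric(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    # A column counts as numeric if it has observed values and all are numbers.
    numeric_columns = [
        name
        for name, values in columns.items()
        if any(not is_missing(v) for v in values)
        and all(is_numeric(v) for v in values if not is_missing(v))
    ]
    if not numeric_columns:
        raise ValueError("Cannot impute: the data contains only non-numeric values.")

    flagged = {}
    for name, values in columns.items():
        if not values:
            continue
        percent_missing = 100.0 * sum(is_missing(v) for v in values) / len(values)
        # Assumption: the threshold is compared per column, strictly greater than.
        if percent_missing > warning_threshold:
            flagged[name] = percent_missing
            warnings.warn(
                f"Column {name!r} has {percent_missing:.1f}% missing values "
                f"(threshold: {warning_threshold}%).",
                UserWarning,
                stacklevel=2,
            )
    return flagged
```

A column with three of four values missing (75%) would be flagged under the default threshold of 70, while imputation still proceeds; only input without any numeric column is rejected outright.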
Examples
>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.ad.infer_feature_types(adata)
>>> ep.pp.knn_impute(adata)