Note
This page was generated from ontology_mapping.ipynb.
Ontology mapping
Ontologies are structured, standardized representations of knowledge in a specific domain that define the domain's concepts, relationships, and properties. They matter for Electronic Health Records (EHR) because they provide a common vocabulary and framework for organizing and integrating healthcare data. By using ontologies, EHR systems can improve interoperability and semantic understanding and facilitate effective data exchange, which in turn supports decision making, data analysis, and collaboration among healthcare providers and analysts.
ehrapy is compatible with Bionty, which provides access to public ontologies and functionality to map values against them.
Here, we’ll create an artificial AnnData object containing different diseases and map them against an ontology to ensure that all of our annotations adhere to it.
[1]:
import anndata as ad
import numpy as np
import pandas as pd
Create an AnnData object with disease annotations in the obs slot.
[2]:
adata = ad.AnnData(
    X=np.random.random((3, 3)),
    var=pd.DataFrame(index=[f"Lab value {val}" for val in range(3)]),
    obs=pd.DataFrame(
        columns=["Immune system disorders", "nervous system disorder", "injury"],
        data=[
            ["Rheumatoid arthritis", "Alzheimer's disease", "Fracture"],
            ["Celiac disease", "Parkinson's disease", "Traumatic brain injury"],
            ["Multipla sclurosis", "Epilepsy", "Fractured Femur"],
        ],
    ),
)
adata
/home/zeth/miniconda3/envs/ehrapy/lib/python3.11/site-packages/anndata/_core/anndata.py:183: ImplicitModificationWarning: Transforming to str index.
warnings.warn("Transforming to str index.", ImplicitModificationWarning)
[2]:
AnnData object with n_obs × n_vars = 3 × 3
obs: 'Immune system disorders', 'nervous system disorder', 'injury'
[3]:
adata.obs
[3]:
| | Immune system disorders | nervous system disorder | injury |
|---|---|---|---|
| 0 | Rheumatoid arthritis | Alzheimer's disease | Fracture |
| 1 | Celiac disease | Parkinson's disease | Traumatic brain injury |
| 2 | Multipla sclurosis | Epilepsy | Fractured Femur |
We notice that one of our immune system disorders, “Multipla sclurosis”, is misspelled and does not exist under that name, so we expect to have to correct it later.
Introduction to Bionty
First we import Bionty.
[4]:
import bionty_base as bt
✅ wrote new records from public sources.yaml to /home/zeth/.lamin/bionty/versions/sources_local.yaml!
if you see this message repeatedly, run: import bionty_base; bionty_base.reset_sources()
Bionty provides support for several ontologies related to diseases.
[5]:
bt.display_available_sources().loc["Disease"]
[5]:
entity | source | organism | version | url | md5 | source_name | source_website |
---|---|---|---|---|---|---|---|
Disease | mondo | all | 2024-02-06 | http://purl.obolibrary.org/obo/mondo/releases/... | 78914fa236773c5ea6605f7570df6245 | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
Disease | mondo | all | 2023-08-02 | http://purl.obolibrary.org/obo/mondo/releases/... | 7f33767422042eec29f08b501fc851db | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
Disease | mondo | all | 2023-04-04 | http://purl.obolibrary.org/obo/mondo/releases/... | 700c43dd9ba51aecc7a8edfc3bc2dab1 | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
Disease | mondo | all | 2023-02-06 | http://purl.obolibrary.org/obo/mondo/releases/... | 2b7d479d4bd02a94eab47d1c9e64c5db | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
Disease | mondo | all | 2022-10-11 | http://purl.obolibrary.org/obo/mondo/releases/... | 04b808d05c2c2e81430b20a0e87552bb | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
Disease | doid | human | 2024-01-31 | http://purl.obolibrary.org/obo/doid/releases/2... | b36c15a4610757094f8db64b78ae2693 | Human Disease Ontology | https://disease-ontology.org |
Disease | doid | human | 2023-03-31 | http://purl.obolibrary.org/obo/doid/releases/2... | 64f083a1e47867c307c8eae308afc3bb | Human Disease Ontology | https://disease-ontology.org |
Disease | doid | human | 2023-01-30 | http://purl.obolibrary.org/obo/doid/releases/2... | 9f0c92ad2896dda82195e9226a06dc36 | Human Disease Ontology | https://disease-ontology.org |
Disease | icd | human | icd-11-2023 | s3://bionty-assets/df_human__icd__icd-11-2023_... | 16263aef644d2c62c47b7b1ecfbad9d6 | International Classification of Diseases (ICD) | https://www.cdc.gov/nchs/icd/icd9cm.htm |
Disease | icd | human | icd-10-2020 | s3://bionty-assets/df_human__icd__icd-10-2020_... | 93ec5734fcc2edd64686d5ffc6f6105f | International Classification of Diseases (ICD) | https://www.cdc.gov/nchs/icd/icd9cm.htm |
Disease | icd | human | icd-9-2011 | s3://bionty-assets/df_human__icd__icd-9-2011__... | cb3aefb3c4f7b2c47bf3de38453350c7 | International Classification of Diseases (ICD) | https://www.cdc.gov/nchs/icd/icd9cm.htm |
Disease | icd | human | icd-10-2024 | s3://bionty-assets/df_human__icd__icd-10-2024_... | None | International Classification of Diseases (ICD) | https://www.cdc.gov/nchs/icd/icd9cm.htm |
Bionty provides three key functionalities:
- inspect: Check whether any of our values (here diseases) are mappable against a specified ontology.
- map_synonyms: Map values against synonyms. This is not relevant for our diseases.
- curate: Curate ontology values against the ontology to ensure compliance.
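To make the first of these operations concrete, here is a minimal plain-Python sketch of what an inspect-style check boils down to: an exact membership test of each value against the ontology's term names. The toy `ontology_names` set and the `inspect_values` helper are hypothetical and only illustrate the idea. They are not Bionty's implementation.

```python
# Toy stand-in for an ontology: a small set of canonical term names (assumed).
ontology_names = {"rheumatoid arthritis", "celiac disease", "multiple sclerosis"}


def inspect_values(values, names):
    """Return {value: True/False} for exact, case-sensitive name matches."""
    return {value: value in names for value in values}


result = inspect_values(
    ["Rheumatoid arthritis", "celiac disease", "Multipla sclurosis"],
    ontology_names,
)
for value, validated in result.items():
    print(f"{value!r}: validated={validated}")
```

Because the comparison is exact, a value that differs only in casing fails the check just like a misspelled one, which is why a separate standardization step is useful.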
Mapping against the MONDO Disease Ontology with Bionty
We will now showcase how to access the Mondo Disease Ontology with Bionty. The Mondo Disease Ontology (Mondo) aims to harmonize disease definitions across the world.
There are several different sources available that provide definitions and data models for diseases, such as HPO, OMIM, SNOMED CT, ICD, PhenoDB, MedDRA, MedGen, ORDO, DO, GARD, and others. However, these sources often overlap and sometimes conflict with each other, making it challenging to understand how they are related.
To address the need for a unified disease terminology that offers precise equivalences between disease concepts, Mondo was developed. Mondo is designed to unify multiple disease resources using a logic-based structure.
Bionty is centered around entity objects that provide the functionality introduced above. We’ll now create a Bionty Disease object with the MONDO ontology as our source and pin a specific version for reproducibility.
[6]:
disease_bionty = bt.Disease(source="mondo", version="2023-02-06")
disease_bionty
[6]:
PublicOntology
Entity: Disease
Organism: all
Source: mondo, 2023-02-06
#terms: 25913
We can access the DataFrame that contains all ontology terms:
[7]:
disease_bionty.df()
[7]:
ontology_id | name | definition | synonyms | parents |
---|---|---|---|---|
MONDO:0000001 | disease or disorder | A Disease Is A Disposition To Undergo Patholog... | disorders|medical condition|other disease|dise... | [] |
MONDO:0000002 | obsolete 46,XX sex reversal | None | None | [] |
MONDO:0000003 | obsolete 17-hydroxysteroid dehydrogenase defic... | None | None | [] |
MONDO:0000004 | adrenocortical insufficiency | An Endocrine Or Hormonal Disorder That Occurs ... | adrenal gland insufficiency|adrenal cortical i... | [MONDO:0002816] |
MONDO:0000005 | alopecia, isolated | None | None | [MONDO:0021034] |
... | ... | ... | ... | ... |
MONDO:8000030 | obsolete morphological anomaly | None | None | [] |
MONDO:8000031 | obsolete subtype of a disorder | None | None | [] |
MONDO:8000032 | obsolete malformation syndrome | None | None | [] |
MONDO:8000033 | obsolete group of disorders | None | None | [] |
MONDO:8000034 | obsolete disorder | None | None | [] |
25913 rows × 4 columns
Let’s inspect all of our “Immune system disorders” to learn which terms map against the MONDO Disease ontology.
[8]:
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
❗ 3 terms (100.00%) are not validated for name: Rheumatoid arthritis, Celiac disease, Multipla sclurosis
detected 2 terms with inconsistent casing/synonyms: Rheumatoid arthritis, Celiac disease
→ standardize terms via .standardize()
[8]:
| | __validated__ |
|---|---|
| Rheumatoid arthritis | False |
| Celiac disease | False |
| Multipla sclurosis | False |
None of the values validate as-is. However, “Rheumatoid arthritis” and “Celiac disease” differ from ontology terms only in casing or match a synonym, so they can be standardized.
[9]:
adata.obs["Immune system disorders"] = disease_bionty.standardize(adata.obs["Immune system disorders"], field=disease_bionty.name)
💡 standardized 2/3 terms
[10]:
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
✅ 2 terms (66.70%) are validated for name
❗ 1 term (33.30%) is not validated for name: Multipla sclurosis
[10]:
| | __validated__ |
|---|---|
| rheumatoid arthritis | True |
| celiac disease | True |
| Multipla sclurosis | False |
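The standardization step above can be pictured as a case-insensitive lookup of each value among the ontology's names and synonyms. The following is a rough plain-Python sketch under that assumption. The `toy_ontology` mapping, its synonyms, and the helper functions are made up for illustration and do not reflect how Bionty implements standardization internally.

```python
# Hypothetical toy ontology: canonical name -> list of synonyms (illustrative only).
toy_ontology = {
    "rheumatoid arthritis": ["RA"],
    "celiac disease": ["coeliac disease"],
    "multiple sclerosis": [],
}


def build_index(ontology):
    """Map lower-cased names and synonyms to their canonical name."""
    index = {}
    for name, synonyms in ontology.items():
        index[name.lower()] = name
        for synonym in synonyms:
            index[synonym.lower()] = name
    return index


def standardize_values(values, ontology):
    """Replace values that match a name or synonym (ignoring case); keep the rest."""
    index = build_index(ontology)
    return [index.get(value.lower(), value) for value in values]


standardized = standardize_values(
    ["Rheumatoid arthritis", "Coeliac disease", "Multipla sclurosis"], toy_ontology
)
print(standardized)
```

A misspelled value such as “Multipla sclurosis” matches neither a name nor a synonym, so it passes through unchanged and has to be resolved by other means.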
We can use Bionty’s lookup functionality to find the corresponding MONDO term for the value that could not be standardized. For this purpose we create a lookup object, whose attributes can be browsed with auto-complete.
[11]:
disease_bionty_lookup = disease_bionty.lookup()
[12]:
disease_bionty_lookup.multiple_sclerosis
[12]:
Disease(ontology_id='MONDO:0005301', name='multiple sclerosis', definition='A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.', synonyms=None, parents=array(['MONDO:0006704', 'MONDO:0000568', 'MONDO:0002562', 'MONDO:0005560'],
dtype=object), _5='multiple sclerosis')
We found a match! Let’s look at the definition of our result.
[13]:
disease_bionty_lookup.multiple_sclerosis.definition
[13]:
'A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.'
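This is exactly what we’ve been looking for. Alternatively, we can search for the misspelled value directly; the results are ranked by string similarity (the __ratio__ column).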
[14]:
disease_bionty.search(
    "Multipla sclurosis", field=disease_bionty.name, case_sensitive=False
)
[14]:
name | ontology_id | definition | synonyms | parents | __agg__ | __ratio__ |
---|---|---|---|---|---|---|
multiple sclerosis | MONDO:0005301 | A Progressive Autoimmune Disorder Affecting Th... | None | [MONDO:0006704, MONDO:0000568, MONDO:0002562, ... | multiple sclerosis | 88.888889 |
multiple sclerosis variant | MONDO:0016428 | None | None | [MONDO:0005071] | multiple sclerosis variant | 72.727273 |
pediatric multiple sclerosis | MONDO:0018784 | Pediatric Multiple Sclerosis (Ms) Is A Rare Mu... | None | [MONDO:0016428] | pediatric multiple sclerosis | 69.565217 |
lateral sclerosis | MONDO:0018155 | Primary Lateral Sclerosis (Pls) Is An Idiopath... | primary lateral sclerosis|adult-onset PLS|PLS|... | [MONDO:0024257] | lateral sclerosis | 68.571429 |
glomerulosclerosis | MONDO:0000490 | A Hardening Of The Kidney Glomerulus Caused By... | glomerular sclerosis | [MONDO:0019722] | glomerulosclerosis | 68.421053 |
... | ... | ... | ... | ... | ... | ... |
BAFopathy | MONDO:0700120 | Disorder Caused By Mutations In The Various Su... | None | [MONDO:0003847] | bafopathy | 14.814815 |
hydrocele | MONDO:0004920 | None | None | [MONDO:0003150] | hydrocele | 14.814815 |
XH antigen | MONDO:0010760 | None | XH antigen | [MONDO:0003847] | xh antigen | 14.285714 |
angiomyxoma | MONDO:0006086 | A Benign Soft Tissue Neoplasm Characterized By... | None | [MONDO:0021581, MONDO:0044335] | angiomyxoma | 13.793103 |
Pygmy | MONDO:0009941 | None | Pygmy | [MONDO:0003847] | pygmy | 8.695652 |
25913 rows × 6 columns
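The idea behind such a similarity-ranked search can be sketched with Python's standard library. This example scores a handful of term names from the output above against the misspelled value using `difflib.SequenceMatcher`. It is an illustration of fuzzy matching in general, and the scoring Bionty uses internally may differ; `rank_by_similarity` is a hypothetical helper.

```python
from difflib import SequenceMatcher

# A handful of term names taken from the search output above.
candidate_names = [
    "multiple sclerosis",
    "multiple sclerosis variant",
    "pediatric multiple sclerosis",
    "lateral sclerosis",
    "glomerulosclerosis",
    "hydrocele",
]


def rank_by_similarity(query, names):
    """Sort names by case-insensitive similarity to the query, best first."""
    scored = [
        (name, SequenceMatcher(None, query.lower(), name.lower()).ratio())
        for name in names
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity("Multipla sclurosis", candidate_names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The top-ranked candidate is the term to consider as a replacement, but a high score is only a hint: the match should still be confirmed by a human, for example by reading the term's definition.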
Now we can finally replace the values of our obs column with the MONDO Disease ontology values.
[15]:
adata.obs["Immune system disorders"] = adata.obs["Immune system disorders"].replace(
    {"Multipla sclurosis": disease_bionty_lookup.multiple_sclerosis.name}
)
adata.obs["Immune system disorders"]
[15]:
0 rheumatoid arthritis
1 celiac disease
2 multiple sclerosis
Name: Immune system disorders, dtype: object
[16]:
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
✅ 3 terms (100.00%) are validated for name
[16]:
| | __validated__ |
|---|---|
| rheumatoid arthritis | True |
| celiac disease | True |
| multiple sclerosis | True |
Voilà, all of our immune system disorders are mapped against the ontology. We could now repeat this process for all other columns.
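Repeating the process for the other columns amounts to looping over them and collecting the values that still need attention. Here is a small sketch of that loop using plain Python containers instead of AnnData and Bionty objects. The `columns` dictionary, the `valid_names` set, and the `unvalidated_by_column` helper are hypothetical, and the names in `valid_names` are illustrative rather than checked against MONDO.

```python
# Hypothetical stand-ins: columns as plain lists and a tiny set of valid names.
columns = {
    "Immune system disorders": [
        "rheumatoid arthritis",
        "celiac disease",
        "multiple sclerosis",
    ],
    "nervous system disorder": [
        "Alzheimer's disease",
        "Parkinson's disease",
        "Epilepsy",
    ],
}
valid_names = {
    "rheumatoid arthritis",
    "celiac disease",
    "multiple sclerosis",
    "Alzheimer disease",
    "Parkinson disease",
    "epilepsy",
}


def unvalidated_by_column(columns, names):
    """Collect, per column, the values that have no exact match among the names."""
    return {
        column: [value for value in values if value not in names]
        for column, values in columns.items()
    }


todo = unvalidated_by_column(columns, valid_names)
for column, values in todo.items():
    print(f"{column}: {values or 'all validated'}")
```

Each non-empty list is a worklist for the standardize, lookup, and search steps shown earlier.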
Mapping against other Disease ontologies
Besides the MONDO Disease Ontology, Bionty supports other disease ontologies such as the Human Disease Ontology (DOID) and ICD. The workflow is the same:
we only need to change the source and the version.
[17]:
disease_bionty = bt.Disease(source="icd", version="icd-11-2023")
disease_bionty
[17]:
PublicOntology
Entity: Disease
Organism: human
Source: icd, icd-11-2023
#terms: 35574
The remaining workflow would be the same as above.
Conclusion
ehrapy supports ontology management, inspection, and mapping through Bionty. Bionty provides access to ontologies such as the Mondo Disease Ontology, the Disease Ontology, and many others. To access these ontologies, we create Bionty Disease objects, whose methods map synonyms and inspect data for adherence to an ontology. Mismatches can be remedied by finding the correct ontology name using lookup objects or fuzzy matching.