Matcher
- class Matcher
An interface for a named entity normalizer.
Methods Summary
get_best_match(text, *[, strict]) – Get the best match in the SSSLM format.
get_matches(text, **kwargs) – Get matches in the SSSLM format.
ground_df(df, column, *[, target_column, ...]) – Ground the elements of a column in a Pandas dataframe as CURIEs, in-place.
Return whether the matcher has any entries.
Methods Documentation
- get_best_match(text: str, *, strict: Literal[False] = False, **kwargs: Any) → Match | None
- get_best_match(text: str, *, strict: Literal[True] = False, **kwargs: Any) → Match
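Get the best match for text in the SSSLM format. When strict is False (the default), the return type is Match | None, so None can come back when nothing matches; when strict is True, the return type is Match.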
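The two overloads use typing.Literal so that a type checker can narrow the return type from the value of strict. The sketch below shows that pattern in isolation. Everything in it is hypothetical: ToyMatch stands in for ssslm's Match, LEXICON is made-up data, picking the highest-scoring candidate is an assumption about how "best" is chosen, and raising ValueError in strict mode is an assumed way of reporting "no match", not confirmed ssslm behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, overload


@dataclass
class ToyMatch:
    """Hypothetical stand-in for ssslm's Match: a CURIE plus a score."""

    curie: str
    score: float


# Hypothetical lexicon: lowercased text -> candidate matches.
LEXICON: dict[str, list[ToyMatch]] = {
    "diabetes": [ToyMatch("EX:1", 0.6), ToyMatch("EX:2", 0.9)],
}


@overload
def get_best_match(text: str, *, strict: Literal[False] = False) -> ToyMatch | None: ...


@overload
def get_best_match(text: str, *, strict: Literal[True]) -> ToyMatch: ...


def get_best_match(text: str, *, strict: bool = False) -> ToyMatch | None:
    """Return the highest-scoring candidate for ``text``."""
    candidates = LEXICON.get(text.lower(), [])
    if not candidates:
        if strict:
            # Assumption: strict mode reports "no match" by raising.
            raise ValueError(f"no match for {text!r}")
        return None
    return max(candidates, key=lambda match: match.score)


print(get_best_match("Diabetes"))  # ToyMatch(curie='EX:2', score=0.9)
print(get_best_match("asthma"))  # None
```

A type checker sees get_best_match("asthma", strict=True) as returning ToyMatch, so callers that opt into strict mode do not need a None check.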
- abstractmethod get_matches(text: str, **kwargs: Any) → list[Match]
Get all matches for text in the SSSLM format. This is an abstract method that concrete matchers must implement.
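Because get_matches is abstract, a concrete matcher only has to say how text turns into a list of matches. The following is a minimal sketch of that contract using Python's abc module. ToyMatcher, DictMatcher, and ToyMatch are hypothetical names that mirror the interface described here; they are not ssslm's classes, and the exact case-insensitive lookup is just one simple matching strategy.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyMatch:
    """Hypothetical stand-in for ssslm's Match."""

    curie: str
    score: float


class ToyMatcher(ABC):
    """Hypothetical mirror of the Matcher interface, not ssslm's own class."""

    @abstractmethod
    def get_matches(self, text: str, **kwargs: Any) -> list[ToyMatch]:
        """Return every match for ``text``; subclasses must override this."""


class DictMatcher(ToyMatcher):
    """Exact, case-insensitive lookup of labels and synonyms."""

    def __init__(self, synonyms: dict[str, str]) -> None:
        # Normalize keys once so that lookups ignore case.
        self._index = {label.lower(): curie for label, curie in synonyms.items()}

    def get_matches(self, text: str, **kwargs: Any) -> list[ToyMatch]:
        # kwargs are accepted for interface compatibility and ignored here.
        curie = self._index.get(text.strip().lower())
        return [] if curie is None else [ToyMatch(curie, 1.0)]


matcher = DictMatcher({"Diabetes mellitus": "EX:1", "DM": "EX:1"})
print(matcher.get_matches(" dm "))  # [ToyMatch(curie='EX:1', score=1.0)]
```

Instantiating ToyMatcher directly raises TypeError, which is how abc enforces that every concrete matcher supplies get_matches.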
- ground_df(df: pd.DataFrame, column: str | int, *, target_column: None | str | int = None, target_type: PandasTargetType | str = PandasTargetType.curie, **kwargs: Any) → None
Ground the elements of a column in a Pandas dataframe as CURIEs, in-place.
- Parameters:
df – A pandas dataframe
column – The column to ground. This column contains text corresponding to named entities’ labels or synonyms
target_column – The column in which to put the groundings (either a CURIE string or None). Passing a string that is not yet a column name creates a new column. If not given, a new column with a name like <source column>_grounded is created.
target_type – The type of value to fill the target column with (defaults to PandasTargetType.curie)
kwargs – Keyword arguments passed to Grounder.ground(), which could include context, organisms, or namespaces.
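To make the in-place contract and the default column name concrete, here is a small pandas-only sketch. ground_column is a hypothetical helper, not ssslm's implementation: it takes a plain lookup function instead of a real matcher, and skipping non-string cells (such as missing values) is an assumption about how ground_df treats them.

```python
from __future__ import annotations

from collections.abc import Callable

import pandas as pd


def ground_column(
    df: pd.DataFrame,
    column: str | int,
    ground: Callable[[str], str | None],
    *,
    target_column: str | int | None = None,
) -> None:
    """Hypothetical sketch of ground_df's in-place behavior (not ssslm's code)."""
    if target_column is None:
        # Mirrors the documented default name: "<source column>_grounded".
        target_column = f"{column}_grounded"
    # Assumption: non-string cells (e.g. missing values) are left ungrounded.
    df[target_column] = [
        ground(value) if isinstance(value, str) else None for value in df[column]
    ]


lookup = {"asthma": "EX:1"}  # hypothetical lexicon
df = pd.DataFrame({"disease": ["Asthma", "unknown", None]})
result = ground_column(df, "disease", lambda text: lookup.get(text.lower()))
# result is None: the new "disease_grounded" column was added to df itself.
print(df)
```

Returning None rather than a new frame is the usual pandas signal that the caller's object was modified, which matches the None return type in the ground_df signature.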
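    import pandas as pd
    import ssslm

    # Build a grounder from the phenotype lexicon published by biolexica
    INDEX = "phenotype"
    mappings_url = f"https://github.com/biopragmatics/biolexica/raw/main/lexica/{INDEX}/{INDEX}.ssslm.tsv.gz"
    grounder = ssslm.make_grounder(mappings_url)

    # Load example data that has a "disease" column of disease names
    data_url = "https://raw.githubusercontent.com/OBOAcademy/obook/master/docs/tutorial/linking_data/data.csv"
    df = pd.read_csv(data_url)

    # Ground the "disease" column into a new "disease_curie" column, modifying df in place
    grounder.ground_df(df, "disease", target_column="disease_curie")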