Matcher

Bases: ABC, Generic[R]

An interface for a named entity normalizer.

Methods Summary

`empty`()	Return if the matcher doesn't have entries in it.
`get_best_match`(text, *[, strict], **kwargs)	Get the best match in the SSSLM format.
`get_matches`(text, **kwargs)	Get matches in the SSSLM format.
`ground_df`(df, column, *[, target_column, ...])	Ground the elements of a column in a Pandas dataframe as CURIEs, in-place.
`not_empty`()	Return if the matcher has entries in it.

Methods Documentation

empty() → bool[source]: Return if the matcher doesn’t entries in it.

get_best_match(text: str, *, strict: Literal[False] = False, **kwargs: Any) → Match | None
get_best_match(text: str, *, strict: Literal[True], **kwargs: Any) → Match: Get the best match in the SSSLM format. The return type depends on strict: None is a possible result only when strict is False.
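The two overloads of get_best_match let a type checker narrow the return type from the value of strict. The following is a minimal, self-contained sketch of that typing pattern, not ssslm's implementation: `SketchMatch`, its `curie` and `score` fields, the `pick_best` helper, ranking by highest score, and raising `ValueError` in strict mode are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, overload


@dataclass
class SketchMatch:
    """Hypothetical stand-in for ssslm's Match; only the fields needed here."""

    curie: str
    score: float


@overload
def pick_best(matches: list[SketchMatch], *, strict: Literal[False] = False) -> SketchMatch | None: ...


@overload
def pick_best(matches: list[SketchMatch], *, strict: Literal[True]) -> SketchMatch: ...


def pick_best(matches: list[SketchMatch], *, strict: bool = False) -> SketchMatch | None:
    """Return the highest-scoring match, or None if there are no matches.

    Ranking by score and raising ValueError in strict mode are assumptions.
    """
    if not matches:
        if strict:
            raise ValueError("no matches found")
        return None
    return max(matches, key=lambda match: match.score)


candidates = [SketchMatch("ex:0001", 0.5), SketchMatch("ex:0002", 0.9)]
print(pick_best(candidates))  # SketchMatch(curie='ex:0002', score=0.9)
print(pick_best([]))  # None
```

A type checker infers `SketchMatch | None` for `pick_best(candidates)` but plain `SketchMatch` for `pick_best(candidates, strict=True)`, so callers that opt into strict mode can skip the None check.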

abstractmethod get_matches(text: str, **kwargs: Any) → list[Match]: Get matches in the SSSLM format.

ground_df(df: pd.DataFrame, column: str | int, *, target_column: None | str | int = None, target_type: PandasTargetType | str = PandasTargetType.curie, **kwargs: Any) → None

Ground the elements of a column in a Pandas dataframe as CURIEs, in-place.

Parameters:

df – A pandas dataframe
column – The column to ground. This column contains text corresponding to named entities’ labels or synonyms
target_column – The column in which to put the groundings (either a CURIE string or None). Passing a string creates a new column with that name. If not given, a new column named <source column>_grounded is created.
target_type – The type of value to fill the target column with (CURIEs by default)
kwargs – Keyword arguments passed to Grounder.ground(); these could include context, organisms, or namespaces.

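import pandas as pd
import ssslm

# Build a grounder from the biolexica "phenotype" lexicon (SSSLM TSV format)
INDEX = "phenotype"
mappings_url = f"https://github.com/biopragmatics/biolexica/raw/main/lexica/{INDEX}/{INDEX}.ssslm.tsv.gz"

grounder = ssslm.make_grounder(mappings_url)

# Load example data with a free-text "disease" column
data_url = "https://raw.githubusercontent.com/OBOAcademy/obook/master/docs/tutorial/linking_data/data.csv"
df = pd.read_csv(data_url)

# Ground each "disease" value in-place, writing CURIEs (or None) to "disease_curie"
grounder.ground_df(df, "disease", target_column="disease_curie")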

abstractmethod not_empty() → bool: Return if the matcher has entries in it.
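Because get_matches() and not_empty() are the abstract methods, a concrete matcher needs to supply at least those two. The sketch below mirrors the documented interface on a hypothetical stand-in base class (`SketchMatcher`) instead of importing ssslm, so it runs on its own. The `SketchMatch` fields, the exact, case-insensitive label lookup in the hypothetical `DictMatcher`, and deriving empty() from not_empty() are assumptions for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchMatch:
    """Hypothetical stand-in for ssslm's Match."""

    curie: str
    name: str
    score: float


class SketchMatcher(ABC):
    """Hypothetical mirror of the documented Matcher interface."""

    @abstractmethod
    def get_matches(self, text: str, **kwargs: Any) -> list[SketchMatch]:
        """Get all matches for the given text."""

    @abstractmethod
    def not_empty(self) -> bool:
        """Return if the matcher has entries in it."""

    def empty(self) -> bool:
        # Assumed: the concrete empty() is the negation of not_empty().
        return not self.not_empty()


class DictMatcher(SketchMatcher):
    """Hypothetical matcher that looks up exact, case-insensitive labels."""

    def __init__(self, labels: dict[str, tuple[str, str]]) -> None:
        # Map each lowercased label to its (CURIE, preferred name) pair.
        self._index = {label.lower(): value for label, value in labels.items()}

    def get_matches(self, text: str, **kwargs: Any) -> list[SketchMatch]:
        hit = self._index.get(text.lower())
        if hit is None:
            return []
        curie, name = hit
        return [SketchMatch(curie=curie, name=name, score=1.0)]

    def not_empty(self) -> bool:
        return bool(self._index)


matcher = DictMatcher({"Headache": ("ex:0001", "headache")})
print(matcher.empty())  # False
print(matcher.get_matches("HEADACHE"))  # [SketchMatch(curie='ex:0001', name='headache', score=1.0)]
print(DictMatcher({}).empty())  # True
```

Since the base class is an ABC, instantiating it directly (or a subclass that skips either abstract method) raises TypeError, which surfaces a missing implementation early.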