Matcher

class Matcher

Bases: ABC, Generic[R]

An interface for a named entity normalizer.

Methods Summary

get_best_match(text, *[, strict])

Get the best match in the SSSLM format, if any.

get_matches(text, **kwargs)

Get matches in the SSSLM format.

ground_df(df, column, *[, target_column, ...])

Ground the elements of a column in a Pandas dataframe as CURIEs, in-place.

not_empty()

Return whether the matcher has any entries.

Methods Documentation

get_best_match(text: str, *, strict: Literal[False] = False, **kwargs: Any) -> Match | None
get_best_match(text: str, *, strict: Literal[True] = False, **kwargs: Any) -> Match

Get the best match in the SSSLM format, if any.
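
The two overloads above suggest that the return type narrows from Match | None to Match when strict=True. The following is a minimal sketch of that typing pattern, not the library's implementation: ToyMatch, its score field, pick_best, the "highest score wins" rule, and the ValueError raised in strict mode are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, overload


@dataclass(frozen=True)
class ToyMatch:
    """Hypothetical stand-in for ssslm's Match; the real class has its own fields."""

    curie: str
    score: float


@overload
def pick_best(matches: list[ToyMatch], *, strict: Literal[False] = False) -> ToyMatch | None: ...
@overload
def pick_best(matches: list[ToyMatch], *, strict: Literal[True]) -> ToyMatch: ...
def pick_best(matches: list[ToyMatch], *, strict: bool = False) -> ToyMatch | None:
    """Return the highest-scoring match (assumed ranking rule)."""
    if not matches:
        if strict:
            # Assumption: strict mode signals "no match" with an exception
            raise ValueError("no matches found")
        return None
    return max(matches, key=lambda match: match.score)
```

With this shape, a type checker can treat the result of a strict call as never being None, so callers do not need an extra None check.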

abstractmethod get_matches(text: str, **kwargs: Any) -> list[Match]

Get matches in the SSSLM format.

ground_df(df: pd.DataFrame, column: str | int, *, target_column: None | str | int = None, target_type: PandasTargetType | str = PandasTargetType.curie, **kwargs: Any) -> None

Ground the elements of a column in a Pandas dataframe as CURIEs, in-place.

Parameters:
  • df – A pandas dataframe

  • column – The column to ground. This column contains text corresponding to named entities’ labels or synonyms

  • target_column – The column in which to put the groundings (each either a CURIE string or None). Passing a string that does not name an existing column creates a new column. If not given, a new column named like <source column>_grounded is created.

  • target_type – The type of value used to fill the target column (CURIEs by default)

  • kwargs – Keyword arguments passed to Grounder.ground(), could include context, organisms, or namespaces.
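
To make the in-place behavior and the default target column name concrete, here is a minimal sketch that mimics them with a plain dictionary instead of a real matcher. The function toy_ground_df, its lookup argument, the case-insensitive exact-match rule, and the EX:0001 identifier are all hypothetical; the actual method delegates to the matcher and supports other target types.

```python
from __future__ import annotations

import pandas as pd


def toy_ground_df(df, column, lookup, *, target_column=None):
    """Hypothetical stand-in for ground_df that uses a plain dict as the lexicon."""
    if target_column is None:
        # Assumed default naming, following the "<source column>_grounded" pattern
        target_column = f"{column}_grounded"
    # Assumption: exact, case-insensitive lookup; unmatched text maps to None
    curies = [lookup.get(str(value).strip().casefold()) for value in df[column]]
    # dtype=object keeps unmatched cells as None rather than converting them
    df[target_column] = pd.Series(curies, index=df.index, dtype=object)


df = pd.DataFrame({"disease": ["Asthma", "not a disease"]})
# The dataframe is modified in place; nothing is returned
toy_ground_df(df, "disease", {"asthma": "EX:0001"})
```

After the call, df has a second column named disease_grounded holding the made-up CURIE for the first row and a missing value for the second.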

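# Example: ground the "disease" column of a tutorial table against the phenotype lexicon
import pandas as pd
import ssslm

# Name of the Biolexica lexicon to load
INDEX = "phenotype"
mappings_url = f"https://github.com/biopragmatics/biolexica/raw/main/lexica/{INDEX}/{INDEX}.ssslm.tsv.gz"

# Build a grounder from the remote lexicon file
grounder = ssslm.make_grounder(mappings_url)

# Load the example table to ground
data_url = "https://raw.githubusercontent.com/OBOAcademy/obook/master/docs/tutorial/linking_data/data.csv"
df = pd.read_csv(data_url)

# Write the groundings into a new "disease_curie" column, in-place
grounder.ground_df(df, "disease", target_column="disease_curie")

abstractmethod not_empty() -> bool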

Return whether the matcher has any entries.
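
Since get_matches and not_empty are the abstract methods of this interface, a concrete matcher has to supply both. The sketch below illustrates that pattern with a self-contained toy hierarchy rather than ssslm itself: ToyMatcher, DictMatcher, and the use of bare CURIE strings in place of Match objects are assumptions, and the Generic[R] parameter is left out for brevity.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ToyMatcher(ABC):
    """Hypothetical stand-in for the Matcher interface."""

    @abstractmethod
    def get_matches(self, text: str) -> list[str]:
        """Return all matches for the text (CURIE strings in this sketch)."""

    @abstractmethod
    def not_empty(self) -> bool:
        """Return whether the matcher has any entries."""


class DictMatcher(ToyMatcher):
    """A toy matcher backed by a label-to-CURIE dictionary."""

    def __init__(self, lexicon: dict[str, str]) -> None:
        # Normalize keys once so that lookups are case-insensitive
        self.lexicon = {label.casefold(): curie for label, curie in lexicon.items()}

    def get_matches(self, text: str) -> list[str]:
        curie = self.lexicon.get(text.strip().casefold())
        return [] if curie is None else [curie]

    def not_empty(self) -> bool:
        return bool(self.lexicon)
```

Instantiating ToyMatcher directly raises a TypeError because its abstract methods are not implemented, which is the same guarantee the ABC base gives for the real interface.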