API Reference

This section documents the core components of the europmc-dev-tool package: the API clients, the JATS processor, and the accession number and resource extractor.

API Clients

class europmc_dev_tool.api.client.BaseClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)

Base client for Europe PMC APIs.

class europmc_dev_tool.api.client.RateLimiter(rate: float, per: float)

Token-bucket rate limiter to throttle requests.
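As an illustration of the token-bucket idea, the sketch below allows `rate` requests per `per` seconds and blocks when the bucket is empty. It is not the package's implementation: the class name `TokenBucket`, the `acquire` method, and the injectable `clock` and `sleep` arguments are assumptions made for this example.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical, not the package's RateLimiter)."""

    def __init__(self, rate: float, per: float, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate              # bucket capacity and tokens added per period
        self.per = per                # length of the period in seconds
        self.tokens = rate            # start with a full bucket
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens in proportion to the time elapsed, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))

    def acquire(self) -> float:
        """Block until one token is available; return the seconds waited."""
        self._refill()
        waited = 0.0
        if self.tokens < 1.0:
            # Time needed for the missing fraction of a token to accumulate.
            waited = (1.0 - self.tokens) * (self.per / self.rate)
            self._sleep(waited)
            self._refill()
        self.tokens -= 1.0
        return waited
```

Calling `acquire()` before each HTTP request would throttle a client to roughly `rate / per` requests per second while still allowing short bursts up to the bucket capacity.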

class europmc_dev_tool.api.articles.ArticlesClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)

Client for the Europe PMC Articles RESTful API.

get_article(source: str, article_id: str, result_type: str = 'core') → dict

Fetch metadata for a single article by ID.

get_fulltext_xml(article_id: str) → str

Fetch the full-text XML for an open-access article.

get_references(source: str, article_id: str, page: int = 1, page_size: int = 25) → dict

Fetch references for a single article by ID.

search(query: str, page: int = 1, page_size: int = 25, result_type: str = 'core') → dict

Search articles via the /search endpoint.
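Because search takes page and page_size, callers will usually want to walk through pages until the results run out. The helper below is a minimal sketch of that loop; `iter_pages` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever code calls the client and extracts the list of results from the returned dict.

```python
from typing import Callable, Iterator


def iter_pages(fetch_page: Callable[[int], list], page_size: int,
               max_pages: int = 100) -> Iterator[list]:
    """Yield one list of results per page until a short or empty page is seen.

    fetch_page(page) is a caller-supplied function (hypothetical) that returns
    the results for the given 1-based page number.
    """
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        if not results:
            break                      # empty page: nothing left to fetch
        yield results
        if len(results) < page_size:
            break                      # short page: this was the last one
```

In practice `fetch_page` might wrap ArticlesClient.search and pull the result list out of the returned dict; the exact keys depend on the response layout, which this reference does not specify. The `max_pages` cap guards against unbounded loops on very broad queries.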

class europmc_dev_tool.api.annotations.AnnotationsClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)

Client for the Europe PMC Annotations API. See: https://europepmc.org/AnnotationsApi

get_by_article_ids(article_ids: list, provider: str | None = None) → dict

Retrieve text-mined annotations for the given article IDs. Each ID should use the format SOURCE:ID, e.g., PMC:11704132.
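A small helper can build IDs in the SOURCE:ID format from separate source and identifier values before they are passed to get_by_article_ids. This is only a sketch; `to_annotation_ids` is a hypothetical name and is not part of the package.

```python
def to_annotation_ids(pairs: list[tuple[str, str]]) -> list[str]:
    """Join (source, article_id) pairs into 'SOURCE:ID' strings.

    Hypothetical helper: it only normalises whitespace and upper-cases the
    source; it does not check that the article exists.
    """
    ids = []
    for source, article_id in pairs:
        source = source.strip().upper()
        article_id = str(article_id).strip()
        if not source or not article_id:
            raise ValueError("both source and article_id are required")
        ids.append(f"{source}:{article_id}")
    return ids
```

For example, `to_annotation_ids([("pmc", "11704132")])` produces a list containing the ID shown above.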

get_by_entity(entity: str, provider: str | None = None) → dict

Find articles that cite a specific entity (e.g., gene, chemical).

get_by_section_and_or_type(annotation_type: str, subtype: str | None = None, section: str | None = None, provider: str | None = None, filter: int = 1, page_size: int = 4, cursor_mark: str = '0.0') → dict

Get annotations of a specific type, optionally filtered by subtype and section.

class europmc_dev_tool.api.grants.GrantsClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)

Client for the Europe PMC Grants RESTful API.

search(query: str, page: int = 1, page_size: int = 25) → dict

Search grants via the /search endpoint.

class europmc_dev_tool.api.oai.OAIClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)

Client for the Europe PMC OAI-PMH service.

harvest(verb: str = 'ListRecords', metadata_prefix: str = 'oai_dc', from_date: str | None = None, until: str | None = None, set_spec: str | None = None) → str

Harvest metadata via OAI-PMH.

JATS Processor

class europmc_dev_tool.jats_processor.XMLProcessor(sentenciser=True)

A class to process JATS XML content.

This processor parses JATS XML, cleans it, structures the content into sections, and optionally splits the text into sentences.
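To make the parse, structure, and split steps concrete, here is a minimal standard-library sketch that groups the paragraphs of each JATS `sec` element under its `title` and splits the text into sentences with a naive regular expression. It does not reproduce XMLProcessor; the function name `jats_sections`, the output keys, and the sentence splitter are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET


def jats_sections(xml_text: str, split_sentences: bool = True) -> list[dict]:
    """Return one dict per <sec> element with its title and text content.

    Hypothetical sketch: only direct <p> children of each <sec> are read,
    and sentences are split naively after '.', '!' or '?'.
    """
    root = ET.fromstring(xml_text)
    sections = []
    for sec in root.iter("sec"):
        title_el = sec.find("title")
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        # Flatten inline markup (e.g. <italic>) and collapse whitespace.
        paragraphs = [" ".join("".join(p.itertext()).split()) for p in sec.findall("p")]
        text = " ".join(p for p in paragraphs if p)
        if split_sentences:
            content = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
        else:
            content = [text] if text else []
        sections.append({"title": title, "content": content})
    return sections
```

A real sentence splitter needs to handle abbreviations such as species names or "et al."; the regular expression here is only a placeholder for that step.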

Accession Number and Resource Extractor

europmc_dev_tool.spacy_extractor.extract_with_spacy(nlp, text, section='unknown', sentence_id=None, offline=False)

Extracts accession numbers and resources from text using spaCy’s Matcher.

This function uses a predefined list of patterns to find potential accession numbers and resources in a given text. It also performs context validation to reduce false positives.

Parameters:
  • nlp – The loaded spaCy language model.

  • text (str) – The input text (sentence) to search within.

  • section (str, optional) – The document section where the text originates, defaults to “unknown”.

  • sentence_id (str, optional) – The ID of the sentence, defaults to None.

  • offline (bool, optional) – If True, skips online validation, defaults to False.

Returns:

A list of dictionaries, where each dictionary represents an extracted accession number or resource and its metadata.

Return type:

list
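The pattern-then-validate approach described above can be sketched without spaCy. The example below swaps in plain regular expressions for spaCy's Matcher, so it is a stand-in for the technique rather than the function's actual logic. The two patterns, the cue words, the function name `extract_accessions`, and the dictionary keys are all assumptions; the package's own pattern list and output fields are not shown in this reference.

```python
import re

# Illustrative patterns only (assumed, not the package's list).
PATTERNS = {
    "geo": re.compile(r"\bGSE\d+\b"),                   # GEO series accessions
    "bioproject": re.compile(r"\bPRJ(?:NA|EB|DB)\d+\b"),
}

# Words that suggest the sentence really talks about deposited data (assumed).
CONTEXT_CUES = ("accession", "deposited", "available", "submitted")


def extract_accessions(text: str, section: str = "unknown", sentence_id=None) -> list[dict]:
    """Return pattern matches, keeping them only if the sentence has a context cue."""
    lowered = text.lower()
    if not any(cue in lowered for cue in CONTEXT_CUES):
        return []                      # context validation failed: drop all matches
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append({
                "text": match.group(0),
                "label": label,
                "start": match.start(),
                "end": match.end(),
                "section": section,
                "sentence_id": sentence_id,
            })
    return hits
```

Dropping every match in a sentence without a cue word is a deliberately blunt form of context validation; it trades some recall for fewer false positives.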