API Reference
This section provides a detailed API reference for the core components of the europmc-dev-tool package.
API Clients
- class europmc_dev_tool.api.client.BaseClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)
Base client for Europe PMC APIs.
- class europmc_dev_tool.api.client.RateLimiter(rate: float, per: float)
Token-bucket rate limiter to throttle requests.
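The token-bucket idea can be sketched as follows. This is a minimal illustration, not the package's implementation: the class name `TokenBucket`, the `try_acquire` method, and the injectable `clock` argument are assumptions made for the example. Only the `rate` and `per` parameters come from the signature above, read here as "at most `rate` requests per `per` seconds".

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket (hypothetical, not RateLimiter itself)."""

    def __init__(self, rate: float, per: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = rate            # bucket holds at most `rate` tokens
        self.tokens = rate              # start with a full bucket
        self.fill_rate = rate / per     # tokens regained per second
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add the tokens earned since the last check, capped at capacity.
        now = self.clock()
        earned = (now - self.last) * self.fill_rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; False means the caller should wait."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Passing the clock in as an argument keeps the sketch deterministic under test; a real limiter would more likely sleep until a token becomes available.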
- class europmc_dev_tool.api.articles.ArticlesClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)
Client for the Europe PMC Articles RESTful API.
- get_article(source: str, article_id: str, result_type: str = 'core') → dict
Fetch metadata for a single article by ID.
- get_fulltext_xml(article_id: str) → str
Fetch the full-text XML for an open-access article.
- get_references(source: str, article_id: str, page: int = 1, page_size: int = 25) → dict
Fetch references for a single article by ID.
- search(query: str, page: int = 1, page_size: int = 25, result_type: str = 'core') → dict
Search articles via the /search endpoint.
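Because `search` takes `page` and `page_size`, collecting more than one page needs a loop. The sketch below shows one way to do that. It assumes the returned dict mirrors the Europe PMC REST JSON, with hits under `resultList` → `result`; that layout is not confirmed by this reference. `iter_results` and `fake_search` are hypothetical names, and `fake_search` stands in for `ArticlesClient.search` so the example runs offline.

```python
from typing import Callable, Iterator


def iter_results(search: Callable[..., dict], query: str,
                 page_size: int = 25, max_pages: int = 10) -> Iterator[dict]:
    """Yield hits page by page until a page is empty, short, or max_pages is hit."""
    for page in range(1, max_pages + 1):
        response = search(query, page=page, page_size=page_size)
        # Assumed response layout: {"resultList": {"result": [...]}}
        hits = response.get("resultList", {}).get("result", [])
        if not hits:
            return
        yield from hits
        if len(hits) < page_size:
            return  # short page: nothing further to fetch


def fake_search(query: str, page: int = 1, page_size: int = 25) -> dict:
    """Offline stand-in with the same call shape as the documented search()."""
    data = [{"id": str(i)} for i in range(5)]
    start = (page - 1) * page_size
    return {"resultList": {"result": data[start:start + page_size]}}


ids = [hit["id"] for hit in iter_results(fake_search, "malaria", page_size=2)]
```

With a real client, the bound method would be passed in place of `fake_search`, for example `iter_results(client.search, "malaria")`.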
- class europmc_dev_tool.api.annotations.AnnotationsClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)
Client for the Europe PMC Annotations API. See: https://europepmc.org/AnnotationsApi
- get_by_article_ids(article_ids: list, provider: str | None = None) → dict
Retrieve text-mined annotations for given article IDs. IDs should be in the format: SOURCE:ID, e.g., PMC:11704132
- get_by_entity(entity: str, provider: str | None = None) → dict
Find articles that cite a specific entity (e.g., gene, chemical).
- get_by_section_and_or_type(annotation_type: str, subtype: str | None = None, section: str | None = None, provider: str | None = None, filter: int = 1, page_size: int = 4, cursor_mark: str = '0.0') → dict
Get annotations of a specific type, optionally filtered by subtype and section.
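The SOURCE:ID format expected by `get_by_article_ids` can be produced with a small helper. `format_article_ids` is a hypothetical convenience function, not part of the package, and upper-casing the source is an assumption based on the `PMC:11704132` example above.

```python
from typing import Iterable, List, Tuple, Union


def format_article_ids(pairs: Iterable[Tuple[str, Union[str, int]]]) -> List[str]:
    """Turn (source, id) pairs into 'SOURCE:ID' strings."""
    formatted = []
    for source, article_id in pairs:
        source = source.strip().upper()       # assumption: sources are upper-case
        article_id = str(article_id).strip()
        if not source or not article_id:
            raise ValueError("both source and article ID are required")
        formatted.append(f"{source}:{article_id}")
    return formatted


article_ids = format_article_ids([("pmc", "11704132"), ("MED", 123)])
```

The resulting list could then be passed as the `article_ids` argument.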
- class europmc_dev_tool.api.grants.GrantsClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)
Client for the Europe PMC Grants RESTful API.
- search(query: str, page: int = 1, page_size: int = 25) → dict
Search grants via the /search endpoint.
- class europmc_dev_tool.api.oai.OAIClient(email: str | None = None, tool: str | None = None, rate_limit: float = 10.0)
Client for the Europe PMC OAI-PMH service.
- harvest(verb: str = 'ListRecords', metadata_prefix: str = 'oai_dc', from_date: str | None = None, until: str | None = None, set_spec: str | None = None) → str
Harvest metadata via OAI-PMH.
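The keyword arguments of `harvest` correspond naturally to the standard OAI-PMH request arguments (`verb`, `metadataPrefix`, `from`, `until`, `set`). How the client performs that mapping internally is not documented here, so the following is only a sketch of one plausible translation. `build_oai_query` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def build_oai_query(verb: str = "ListRecords",
                    metadata_prefix: str = "oai_dc",
                    from_date: Optional[str] = None,
                    until: Optional[str] = None,
                    set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH query string, omitting arguments that were not given."""
    params = {
        "verb": verb,
        "metadataPrefix": metadata_prefix,
        "from": from_date,      # OAI-PMH datestamp lower bound
        "until": until,         # OAI-PMH datestamp upper bound
        "set": set_spec,        # selective harvesting set
    }
    return urlencode({k: v for k, v in params.items() if v is not None})


query = build_oai_query(from_date="2024-01-01")
```

Dropping `None` values keeps optional arguments out of the request entirely, rather than sending them empty.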
JATS Processor
- class europmc_dev_tool.jats_processor.XMLProcessor(sentenciser=True)
A class to process JATS XML content.
This processor handles the parsing of JATS XML, cleaning, structuring the content into sections, and optionally splitting text into sentences.
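To make the "structuring the content into sections" step concrete, here is a small standard-library sketch that pulls titled sections out of a JATS fragment. It is not how `XMLProcessor` works internally, and it does no sentence splitting. `extract_sections` and its output keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def extract_sections(jats_xml: str) -> List[Dict[str, str]]:
    """Return one {'title', 'text'} dict per <sec> element."""
    root = ET.fromstring(jats_xml)
    sections = []
    for sec in root.iter("sec"):
        title = sec.find("title")
        # itertext() flattens inline markup such as <italic>; split/join
        # collapses runs of whitespace left over from the XML layout.
        paragraphs = [" ".join("".join(p.itertext()).split())
                      for p in sec.findall("p")]
        sections.append({
            "title": "".join(title.itertext()).strip() if title is not None else "",
            "text": " ".join(paragraphs),
        })
    return sections


SAMPLE = """<article><body>
<sec><title>Methods</title>
<p>Data were deposited in <italic>GEO</italic>.</p>
<p>Second  paragraph.</p></sec>
</body></article>"""

sections = extract_sections(SAMPLE)
```

Only paragraphs that are direct children of a `<sec>` are collected, so nested subsections appear as their own entries.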
Accession Number and Resource Extractor
- europmc_dev_tool.spacy_extractor.extract_with_spacy(nlp, text, section='unknown', sentence_id=None, offline=False)
Extracts accession numbers and resources from text using spaCy’s Matcher.
This function uses a predefined list of patterns to find potential accession numbers and resources in a given text. It also performs context validation to reduce false positives.
- Parameters:
nlp – The loaded spaCy language model.
text (str) – The input text (sentence) to search within.
section (str, optional) – The document section where the text originates, defaults to “unknown”.
sentence_id (str, optional) – The ID of the sentence, defaults to None.
offline (bool, optional) – If True, skips online validation, defaults to False.
- Returns:
A list of dictionaries, where each dictionary represents an extracted accession number or resource and its metadata.
- Return type:
list
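The pattern-then-context approach described above can be illustrated without a spaCy model. The sketch below deliberately swaps spaCy's Matcher for a plain regular expression, covers only GEO series accessions (GSE followed by digits), and uses a made-up list of cue words. The function name and dictionary keys are hypothetical and do not reflect the actual output of `extract_with_spacy`.

```python
import re
from typing import Dict, List, Optional

ACCESSION_RE = re.compile(r"\bGSE\d+\b")   # GEO series accessions only
CONTEXT_CUES = ("accession", "deposited", "geo", "available")  # hypothetical cues


def extract_accessions(text: str, section: str = "unknown",
                       sentence_id: Optional[str] = None) -> List[Dict]:
    """Regex stand-in for pattern matching plus a crude context check."""
    lowered = text.lower()
    # Context validation: keep matches only if the sentence contains a cue word.
    if not any(cue in lowered for cue in CONTEXT_CUES):
        return []
    return [
        {
            "text": match.group(0),
            "start": match.start(),
            "end": match.end(),
            "section": section,
            "sentence_id": sentence_id,
        }
        for match in ACCESSION_RE.finditer(text)
    ]


found = extract_accessions("Data were deposited in GEO under GSE12345.",
                           section="methods", sentence_id="s1")
```

A sentence that contains a matching token but no cue word, such as "Sample GSE12345 was relabelled.", yields an empty list under this check.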