Corpus class

sm_corpus() `[`(<sm_corpus>) length(<sm_corpus>) dim(<sm_corpus>) as_tibble(<sm_corpus>) as.data.frame(<sm_corpus>) print(<sm_corpus>) format(<sm_corpus>) summary(<sm_corpus>) str(<sm_corpus>)
Create an sm_corpus object
validate_sm_corpus()
Validate an sm_corpus object
sm_validate()
Validate corpus integrity
is_sm_corpus()
Test if an object is an sm_corpus
as_sm_corpus()
Coerce objects to sm_corpus
sm_save_corpus() sm_load_corpus()
Save and load an sm_corpus
sm_filter_works()
Filter works in a corpus
sm_materialise()
Materialise cached enrichment into a corpus
scimapR-stability
scimapR accessor return-type stability contract

File ingestion

sm_read_auto()
Auto-detect bibliographic file format and read
sm_read_bib()
Read BibTeX files
sm_read_cochrane()
Read Cochrane Library export files
sm_read_dimensions()
Read Dimensions CSV files
sm_read_endnote()
Read EndNote XML export files
sm_read_lens()
Read Lens.org CSV files
sm_read_openalex_json()
Read OpenAlex JSON files
sm_read_pubmed_xml()
Read PubMed XML files
sm_read_ris()
Read RIS files
sm_read_scopus()
Read Scopus CSV files
sm_read_wos()
Read Web of Science plaintext files
sm_read_zotero()
Read Zotero CSV export files

API ingestion

sm_fetch_arxiv()
Fetch works from arXiv
sm_fetch_biorxiv()
Fetch works from bioRxiv/medRxiv
sm_fetch_crossref()
Fetch works from Crossref
sm_fetch_openalex()
Fetch works from OpenAlex
sm_fetch_orcid()
Fetch works from ORCID
sm_fetch_overton()
Fetch policy citations from Overton
sm_fetch_pubmed()
Fetch works from PubMed
sm_fetch_semantic_scholar()
Fetch works from Semantic Scholar

Live refresh

sm_refresh()
Refresh stale corpus data
sm_staleness()
Check corpus staleness
sm_lock() sm_unlock()
Lock or unlock a corpus

Corpus enrichment

sm_enrich_altmetric()
Enrich corpus with Altmetric attention data
sm_enrich_concepts()
Enrich corpus with concepts from OpenAlex or MeSH
sm_enrich_opencitations()
Enrich corpus with OpenCitations citation data
sm_enrich_orcid()
Enrich corpus authors with ORCID data
sm_enrich_retraction()
Enrich corpus with retraction data
sm_enrich_ror()
Enrich corpus institutions with ROR data
sm_enrich_specter()
Enrich corpus with SPECTER embeddings
sm_enrich_unpaywall()
Enrich corpus with Unpaywall open-access data

bibliometrix interop

sm_to_bibliometrix()
Convert sm_corpus to bibliometrix format

Corpus assembly

sm_build_corpus()
Build a corpus from multiple sources
sm_bind_corpora()
Bind two corpora together
sm_dedupe()
Deduplicate corpus works by DOI
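
Deduplicating by DOI generally depends on normalising the identifier first, because the same DOI can appear with different casing or with a resolver prefix. The base R sketch below illustrates that idea on a plain data frame. It is not the implementation of sm_dedupe(); the helper names `normalise_doi()` and `dedupe_by_doi()` and the `doi` column are assumptions made for illustration.

```r
# Hypothetical helper: reduce a DOI string to a comparable key.
normalise_doi <- function(doi) {
  doi <- tolower(trimws(doi))
  # Strip common resolver and "doi:" prefixes
  doi <- sub("^(https?://(dx\\.)?doi\\.org/|doi:[[:space:]]*)", "", doi)
  # Treat empty strings as missing
  doi[!is.na(doi) & doi == ""] <- NA_character_
  doi
}

# Hypothetical helper: keep the first row for each normalised DOI.
# Rows without a DOI are kept, since they cannot be matched on this key.
dedupe_by_doi <- function(works) {
  key <- normalise_doi(works$doi)
  keep <- is.na(key) | !duplicated(key)
  works[keep, , drop = FALSE]
}

works <- data.frame(
  doi = c("10.1000/ABC", "https://doi.org/10.1000/abc", NA,
          "doi:10.1000/xyz", NA),
  title = c("Paper A", "Paper A (duplicate)", "No DOI 1",
            "Paper B", "No DOI 2"),
  stringsAsFactors = FALSE
)

deduped <- dedupe_by_doi(works)
nrow(deduped)  # 4: the second row is dropped as a duplicate of the first
```

Keeping rows with a missing DOI is a design choice in this sketch; matching those records would need a secondary key such as a normalised title.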

Research questions and screening

sm_question() print(<sm_question>) format(<sm_question>)
Create a structured research question
is_sm_question()
Test if an object is an sm_question
sm_query()
Search works by text query
sm_corpus_for_question()
Build a corpus from a research question
sm_screen_against_question()
Screen corpus against a research question
sm_screen_regex()
Screen corpus using regex matching
sm_screen_summary()
Summarise screening decisions
sm_merge_screening_decisions()
Merge external screening decisions into corpus

Networks

sm_network_citation()
Build a direct citation network
sm_network_cocitation()
Build a co-citation network
sm_network_collab()
Build a collaboration network
sm_network_coupling()
Build a bibliographic coupling network
sm_network_coword()
Build a co-word (co-occurrence) network
sm_network_semantic()
Build a semantic similarity network
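
Bibliographic coupling and co-citation are two projections of the same citing-by-cited matrix: coupling counts the references two works share, and co-citation counts how often two references are cited together. The base R sketch below shows this on a toy matrix. It does not use or describe the scimapR network functions, and the object names are illustrative.

```r
# Toy incidence matrix: rows are citing works, columns are cited references
# (1 = the work cites the reference).
cites <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("w1", "w2", "w3"), c("r1", "r2", "r3"))
)

# Bibliographic coupling: works x works, number of shared references
coupling <- tcrossprod(cites)
diag(coupling) <- 0

# Co-citation: references x references, number of works citing both
cocitation <- crossprod(cites)
diag(cocitation) <- 0

coupling["w1", "w2"]    # 2: w1 and w2 share r1 and r2
cocitation["r1", "r3"]  # 1: only w2 cites both r1 and r3
```

The resulting symmetric matrices can be treated as weighted adjacency matrices, with the diagonal zeroed to avoid self-loops.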

Embeddings and clustering

sm_embed_works()
Compute work embeddings using transformer models
sm_embed_save()
Save embeddings to disk
sm_embed_load()
Load embeddings from disk
sm_cluster_evolution()
Track cluster evolution over time
sm_cluster_hdbscan()
HDBSCAN clustering of works
sm_cluster_kmeans()
K-means clustering of works
sm_cluster_label()
Label clusters with representative terms
sm_cluster_leiden()
Leiden community detection

Indicators

sm_metric_collab_index()
Calculate collaboration index
sm_metric_disruption()
Calculate the CD (disruption) index
sm_metric_fnci()
Calculate Field-Normalized Citation Impact
sm_metric_g_index()
Calculate g-index
sm_metric_h_index()
Calculate h-index
sm_metric_m_index()
Calculate m-index (m-quotient)
sm_metric_novelty()
Calculate Uzzi novelty score
sm_metric_rcr()
Calculate Relative Citation Ratio
sm_metric_summary()
Robust summary of a heavy-tailed impact metric
sm_summary_authors()
Summary statistics for authors
sm_summary_period()
Summary statistics by publication period
sm_summary_sources()
Summary statistics for sources (journals)
sm_summary_works()
Summary statistics for works
sm_self_citation() print(<sm_self_citation>) summary(<sm_self_citation>)
Compute self-citation from corpus reference lists
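
For orientation, the standard definitions of the h-index, g-index, and m-quotient can be written in a few lines of base R. This is a sketch of the textbook definitions on a plain vector of citation counts, not the scimapR implementation. The function names and the `career_years` argument are assumptions, and the g-index here is capped at the number of works.

```r
# h-index: largest h such that h works have at least h citations each
h_index <- function(citations) {
  ranked <- sort(citations, decreasing = TRUE)
  sum(ranked >= seq_along(ranked))
}

# g-index: largest g such that the top g works together have
# at least g^2 citations (capped at the number of works)
g_index <- function(citations) {
  ranked <- sort(citations, decreasing = TRUE)
  sum(cumsum(ranked) >= seq_along(ranked)^2)
}

# m-quotient: h-index divided by career length in years
m_quotient <- function(citations, career_years) {
  h_index(citations) / career_years
}

citations <- c(10, 8, 5, 4, 3, 0)
h_index(citations)                       # 4
g_index(citations)                       # 5
m_quotient(citations, career_years = 8)  # 0.5
```

Conventions for career length vary (for example, whether the first publication year counts as year one), so the sketch takes it as an explicit argument.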

Author trajectory

sm_author_trajectory() print(<sm_trajectory>) format(<sm_trajectory>)
Build an author trajectory analysis
sm_trajectory
sm_trajectory S3 class
is_sm_trajectory()
Test if an object is an sm_trajectory
sm_plot_trajectory()
Plot author trajectory
sm_plot_topic_pivots()
Plot topic pivots
sm_plot_collab_turnover()
Plot collaborator turnover

Equity and representation audit

sm_audit_funding() print(<sm_audit_funding>)
Funding source audit
sm_audit_gender() print(<sm_audit_gender>)
Gender representation audit
sm_audit_geographic() print(<sm_audit_geographic>)
Geographic representation audit
sm_audit_oa() print(<sm_audit_oa>)
Open access status audit
sm_audit_summary() print(<sm_audit_summary>)
Combined equity audit summary
sm_plot_equity_dashboard()
Plot equity dashboard

Coverage and reconciliation

sm_coverage_audit() print(<sm_coverage>) summary(<sm_coverage>) autoplot(<sm_coverage>)
Audit corpus coverage against a ground-truth reference
sm_coverage_breakdowns()
Coverage breakdowns as a flat tibble
sm_journal_in_index()
Verify journal source coverage against an index by ISSN
sm_match_types()
Controlled vocabulary for coverage match types
sm_reconcile() print(<sm_reconciliation>) summary(<sm_reconciliation>) autoplot(<sm_reconciliation>)
Reconcile two corpora by DOI and title

Affiliation and attribution

sm_affiliation_match()
Match author affiliations to institutions
sm_affiliation_summary()
Summarise affiliation matches
sm_affiliation_signals()
Controlled vocabulary for affiliation match signals
sm_affiliation_methods()
Controlled vocabulary for affiliation match methods
sm_attribute_institution()
Attribute matched affiliations to a controlled institution vocabulary
sm_affiliation_dict
Default affiliation-matching dictionary

Causal and policy evaluation

sm_its() print(<sm_its>) summary(<sm_its>) autoplot(<sm_its>)
Interrupted time series for a corpus outcome
sm_did() print(<sm_did>) summary(<sm_did>) autoplot(<sm_did>)
Difference-in-differences for a treated vs control institution set
sm_synth() print(<sm_synth>)
Synthetic control for a treated institution

Counting and robust impact

sm_citation_maturity()
Flag citation-immature recent years
sm_count()
Full and fractional output / impact counting
sm_metric_summary()
Robust summary of a heavy-tailed impact metric
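
The difference between full and fractional counting is easiest to see on a small example: under full counting every contributing unit receives one credit per work, while under fractional counting each work's single credit is split equally among its units. The base R sketch below illustrates this. It is not the sm_count() implementation, and the function name `count_credit()` and the input shape are assumptions.

```r
# Hypothetical helper: papers is a named list, one character vector of
# contributing units (e.g. institutions) per work.
count_credit <- function(papers) {
  units <- lapply(papers, unique)       # count each unit once per work
  n_units <- lengths(units)
  unit <- unlist(units, use.names = FALSE)
  weight <- rep(1 / n_units, times = n_units)

  full <- tapply(rep(1, length(unit)), unit, sum)
  fractional <- tapply(weight, unit, sum)

  data.frame(
    unit = names(full),
    full = as.vector(full),
    fractional = as.vector(fractional),
    stringsAsFactors = FALSE
  )
}

papers <- list(
  p1 = c("A", "B"),
  p2 = c("A"),
  p3 = c("A", "B", "C")
)

credit <- count_credit(papers)
credit$full             # 3 2 1
sum(credit$fractional)  # 3, i.e. one credit per work in total
```

Fractional totals sum to the number of works, which is why fractional counting is often preferred when comparing units with different collaboration intensities.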

Reproducible reporting

sm_figure_manifest()
Build a figure caption and alt-text manifest
sm_corpus_from_tables()
Construct an sm_corpus from a relational set of tables

Conversational exploration

sm_chat() print(<sm_chat_response>)
Retrieval-grounded corpus chat
sm_chat_render()
Render a chat response

Reproducibility and certificates

sm_certificate() sm_rebuild_from_cert() sm_verify_certificate() print(<sm_certificate>) print(<sm_cert_verification>)
Create, rebuild from, and verify corpus certificates
sm_provenance()
Get corpus provenance
sm_hash_corpus()
Hash a corpus for reproducibility
sm_cite_corpus()
Generate citation block for a corpus
sm_snapshot() sm_snapshot_load()
Save and load corpus snapshots
sm_diff_corpora() print(<sm_corpus_diff>) (superseded)
Compare two corpora

Visualization

sm_theme()
scimapR plot theme
sm_scale_color() sm_scale_fill()
Viridis colour scale for scimapR
sm_palette_qualitative()
Qualitative viridis palette
autoplot(<sm_corpus>)
Autoplot for sm_corpus
sm_plot_bradford()
Plot Bradford's law
sm_plot_citation_network()
Plot citation network
sm_plot_collab()
Plot collaboration map
sm_plot_collab_turnover()
Plot collaborator turnover
sm_plot_equity_dashboard()
Plot equity dashboard
sm_plot_evolution()
Plot topic evolution over time
sm_plot_heaps()
Plot Heaps' law
sm_plot_landscape()
Plot research landscape
sm_plot_lotka()
Plot Lotka's law
sm_plot_production()
Plot annual scientific production
sm_plot_thematic_map()
Plot thematic map (Callon centrality-density)
sm_plot_top()
Plot top entities
sm_plot_topic_pivots()
Plot topic pivots
sm_plot_trajectory()
Plot author trajectory

Export

sm_export_covidence()
Export corpus for Covidence
sm_export_csv()
Export corpus tables as CSV files
sm_export_cytoscape()
Export network for Cytoscape (JSON)
sm_export_figure()
Export a plot as a publication-ready figure
sm_export_gephi()
Export network for Gephi (GEXF format)
sm_export_quarto_report()
Export a Quarto report
sm_export_rayyan()
Export corpus for Rayyan
sm_export_rds()
Export corpus as RDS
sm_export_table()
Export a table as formatted XLSX or CSV
sm_export_vosviewer()
Export network for VOSviewer
sm_export_zip()
Export corpus as self-contained ZIP bundle

Systematic review bridge

sm_screen_prisma()
Generate PRISMA flow diagram
sm_import_rayyan()
Import screening decisions from Rayyan

Field-specific helpers

sm_field_clinical_trials()
Link corpus works to clinical trials
sm_field_funder()
Extract funding information
sm_field_pubmed_mesh()
Extract MeSH terms from corpus

Researcher profile

sm_researcher_profile()
Build a researcher profile

Shiny app

sm_run_app()
Launch the scimapR Shiny application

Example data

sm_example_corpus()
Generate a synthetic example corpus
sm_example_files()
Get paths to example data files
sm_example_db
Bundled example corpus