GitHub Repository

76,907 phytochemical records enriched with PubMed, ClinicalTrials.gov, ChEMBL bioactivity & USPTO patents. Production-ready JSON + Parquet. Free 400-row sample. Full dataset: ethno-api.com

5 stars · Jupyter Notebook

I enriched 24K phytochemicals with trials, bioactivity, and patent data

by wirthal1990 · Mar 17, 2026 · 2 points · 1 comment

AI Analysis

Mid · Niche Gem · Solve My Problem

Pre-joins USDA phytochemical records with clinical trial and patent data, saving biotech teams ETL time.

Strengths
  • Joins four data sources (PubMed, ClinicalTrials.gov, ChEMBL, USPTO) into a single consistent schema, so users do not have to work around each API's rate limits.
  • Parquet format supports efficient columnar querying in large-scale bioinformatics pipelines and local analysis.
  • Includes a DOI and HuggingFace hosting, adding academic credibility and easy sample access.
Weaknesses
  • The full dataset sits behind a paywall, which limits community verification and open-source collaboration.
  • Competes with free sources such as PubChem and ChEMBL for teams willing to build their own ETL pipeline.
Category
Target Audience

Bioinformatics researchers, nutraceutical data scientists, drug discovery teams

Similar To

PubChem · ChEMBL · DrugBank

Similar Projects