NRC nuclear licensing RAG pipeline and regulatory embeddings dataset

by davenporten·Apr 13, 2026·2 points·0 comments

AI Analysis

●●● Banger · Niche Gem · Solve My Problem

First public NRC regulatory embeddings dataset—37K chunks ready for ChromaDB and Pinecone.

Strengths
  • Complete regulatory corpus covering all documents needed for COL submissions
  • Pre-embedded with OpenAI text-embedding-3-small for immediate vector store integration
  • No comparable public dataset existed before this release
Weaknesses
  • Narrow applicability limited to nuclear regulatory and compliance AI use cases
  • Accompanying RAG pipeline code remains incomplete following the SaaS business pivot
Category
Target Audience

AI engineers building regulatory compliance systems, nuclear industry developers

Post Description

I've been building an AI system to automate parts of the NRC Combined License (COL) process: gap analysis against the Standard Review Plan, FSAR strength scoring, and RAI prediction using vector similarity to historical NRC requests. I intended this as a SaaS business, but was ultimately beaten to market.
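To make the "vector similarity to historical NRC requests" idea concrete, here is a minimal sketch of ranking past RAIs against the embedding of one FSAR section by cosine similarity. It is not the code from the repository: the function names, the record fields (`id`, `embedding`), and the toy 3-dimensional vectors are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar_rais(section_embedding, historical_rais, top_k=3):
    """Return the top_k (score, id) pairs, most similar first.

    `historical_rais` is a hypothetical list of dicts with `id` and
    `embedding` keys; the real dataset layout may differ.
    """
    scored = [
        (cosine_similarity(section_embedding, rai["embedding"]), rai["id"])
        for rai in historical_rais
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data standing in for real embeddings.
historical_rais = [
    {"id": "rai-a", "embedding": [1.0, 0.0, 0.0]},
    {"id": "rai-b", "embedding": [0.0, 1.0, 0.0]},
    {"id": "rai-c", "embedding": [0.7, 0.7, 0.0]},
]
section_embedding = [0.9, 0.1, 0.0]

ranked = rank_similar_rais(section_embedding, historical_rais, top_k=2)
for score, rai_id in ranked:
    print(f"{rai_id}: {score:.3f}")
```

A section whose nearest neighbors are RAIs with high similarity scores would be a candidate for drawing a similar request, though how the scores map to an actual prediction is a design choice this sketch does not cover.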

What I think is the most interesting artifact is the dataset: 37,734 chunks of NRC regulatory documents (NUREG-0800, 10 CFR Parts 20/50/51/52/72/73/100, and Regulatory Guides) embedded with OpenAI text-embedding-3-small. It covers the full regulatory corpus an applicant would need for a COL submission. I'm not aware of anything like this being publicly available before.

The embeddings are ready to load directly into ChromaDB, Pinecone, or any other vector store. If you're doing nuclear AI, regulatory NLP, or just want a large real-world RAG dataset to experiment with, it should be useful.
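As a rough illustration of what loading into ChromaDB could look like, the sketch below adds pre-computed embeddings to a collection in batches. The record fields (`id`, `text`, `embedding`, `source`), the collection name, and the batch size are assumptions; check the repository for the actual file format before adapting it.

```python
from itertools import islice


def batched(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_into_chroma(records, collection_name="nrc_regulatory", batch_size=1000):
    """Add pre-embedded chunks to an in-memory Chroma collection.

    `records` is a hypothetical iterable of dicts with `id`, `text`,
    `embedding`, and `source` keys. Passing `embeddings=` explicitly means
    Chroma stores the supplied vectors instead of computing its own.
    """
    import chromadb  # imported lazily so the helper above works without it

    client = chromadb.Client()  # in-memory client; data is not persisted
    collection = client.get_or_create_collection(name=collection_name)
    for batch in batched(records, batch_size):
        collection.add(
            ids=[r["id"] for r in batch],
            embeddings=[r["embedding"] for r in batch],
            documents=[r["text"] for r in batch],
            metadatas=[{"source": r["source"]} for r in batch],
        )
    return collection
```

Once loaded, `collection.query(query_embeddings=[vector], n_results=5)` returns the nearest chunks. The query vector has to come from the same model as the dataset (text-embedding-3-small, 1536 dimensions by default), otherwise the distances are meaningless.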

Here's the full codebase if you're interested: https://github.com/Davenporten/nrc-licensing-rag

Similar Projects

AI/ML · ●● Solid

Deploy a RAG pipeline as a REST API using RAGLight

Modular RAG with MCP integration, but Langchain and LlamaIndex already dominate.

Ship It
bessouat40
313mo ago