GitHub Repository

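Bilibili Favorites RAG Knowledge Base: favorites that don't gather dust. Bilibili favorites → speech transcription → vector retrieval → conversational Q&A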

1,105 stars · Python

Turn Bilibili favorites into a personal RAG knowledge base

by via2026 · Feb 22, 2026 · 2 points · 0 comments

AI Analysis

●● Solid · Solve My Problem · Niche Gem

Bilibili-specific RAG pipeline with fallback ASR for inaccessible audio URLs.

Strengths
  • Bilibili auth + audio extraction with 403-fallback handling is region-specific and clever; it solves real platform friction.
  • End-to-end retrieval: sync → ASR → embeddings → chat with source attribution; thoughtful pipeline design.
  • Open source with diagnostic scripts (debug_asr, diagnose_rag) shows real ops thinking; cost warnings baked in.
Weaknesses
  • 'Chat with your video collection' is a solved pattern (e.g., NotebookLM, Glasp, local RAG tools); the main differentiation is the Bilibili-specific integration.
  • Heavy dependence on the DashScope (Alibaba) API for ASR and embeddings; costs are unclear and the pipeline is tied to a third-party SaaS.
Category
Target Audience

Bilibili users who save long-form videos (talks, courses, streams); Chinese-language learners and archivists

Similar To

NotebookLM · Glasp · Local RAG stacks (LlamaIndex, LangChain examples)

Post Description

I built this to solve a personal problem: I save many long-form Bilibili videos (talks/courses), but cannot retrieve key ideas later.

Pipeline:
- Bilibili auth + favorites sync
- Audio extraction + ASR fallback (handles inaccessible audio URLs)
- Chunking + embeddings + ChromaDB
- RAG chat UI with source links
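
The chunking step in the pipeline above could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's actual code: it assumes the ASR step yields segments with start and end timestamps, and the names `Segment`, `Chunk`, `chunk_segments`, and `source_link` are hypothetical. Keeping timestamps on every chunk is what would let the chat UI link an answer back to a moment in the video; the `?t=` query parameter used for that link is likewise an assumption about Bilibili's URL format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One ASR segment (hypothetical shape): text plus offsets in seconds."""
    start: float
    end: float
    text: str


@dataclass
class Chunk:
    """A retrieval unit that remembers which part of the video it covers."""
    start: float
    end: float
    text: str


def _to_chunk(segments):
    # A chunk spans from its first segment's start to its last segment's end.
    return Chunk(
        start=segments[0].start,
        end=segments[-1].end,
        text=" ".join(s.text for s in segments),
    )


def chunk_segments(segments, max_chars=500, overlap_segments=1):
    """Group consecutive segments into chunks of roughly max_chars characters.

    The last `overlap_segments` segments of a chunk are repeated at the start
    of the next one, so an idea that straddles a boundary stays retrievable.
    """
    chunks = []
    current = []
    size = 0
    for seg in segments:
        if current and size + len(seg.text) > max_chars:
            chunks.append(_to_chunk(current))
            # Carry over the tail only if the chunk had more than the overlap,
            # otherwise the next chunk would just contain the previous one.
            if 0 < overlap_segments < len(current):
                current = current[-overlap_segments:]
            else:
                current = []
            size = sum(len(s.text) for s in current)
        current.append(seg)
        size += len(seg.text)
    if current:
        chunks.append(_to_chunk(current))
    return chunks


def source_link(bvid, chunk):
    """Build a deep link to the chunk's start time (URL format is assumed)."""
    return f"https://www.bilibili.com/video/{bvid}?t={int(chunk.start)}"
```

Each `Chunk.text` would then be embedded and stored alongside `start`, `end`, and the video ID as metadata, so that a retrieved chunk can be rendered with a link from `source_link`. Chunking by segment count rather than raw characters keeps boundaries aligned with the ASR output, at the price of uneven chunk sizes.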

Stack: FastAPI, LangChain, ChromaDB, Next.js, SQLite.

I’d love feedback on:
1) retrieval quality tradeoffs
2) better indexing strategy for long videos
3) cost control for ASR + embeddings
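
On the cost-control question, one low-effort option is to cache embeddings by content hash so that re-syncing a favorites folder never pays twice for the same chunk. The sketch below is only an illustration of that idea, not something the project is known to do: `EmbeddingCache` and `embed_fn` are hypothetical names, and `embed_fn` stands in for whatever paid embedding call the pipeline makes. It uses SQLite because that is already part of the stack.

```python
import hashlib
import json
import sqlite3


class EmbeddingCache:
    """Stores vectors in SQLite, keyed by a hash of (model name, text)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings "
            "(key TEXT PRIMARY KEY, vector TEXT NOT NULL)"
        )

    @staticmethod
    def _key(model, text):
        # Including the model name avoids mixing vectors from different models.
        return hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()

    def get_or_compute(self, model, texts, embed_fn):
        """Return one vector per text, calling embed_fn only for cache misses.

        embed_fn is a hypothetical callable: list[str] -> list[list[float]].
        """
        keys = [self._key(model, t) for t in texts]

        # Look up everything we already have.
        found = {}
        for key in set(keys):
            row = self.db.execute(
                "SELECT vector FROM embeddings WHERE key = ?", (key,)
            ).fetchone()
            if row is not None:
                found[key] = json.loads(row[0])

        # Collect misses once each, preserving first-seen order.
        missing = {}
        for key, text in zip(keys, texts):
            if key not in found and key not in missing:
                missing[key] = text

        if missing:
            vectors = embed_fn(list(missing.values()))
            for key, vector in zip(missing, vectors):
                found[key] = list(vector)
                self.db.execute(
                    "INSERT OR REPLACE INTO embeddings VALUES (?, ?)",
                    (key, json.dumps(found[key])),
                )
            self.db.commit()

        return [found[key] for key in keys]
```

The same keying scheme could plausibly be applied to ASR results, hashing the audio bytes instead of the text, since transcription is likely the larger share of the bill for long videos.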

Similar Projects

Security · ●● Solid

RAG knowledge base poisoning lab, 100% local

Embedding anomaly detection cuts attack success from 95% to 20%.

Dark Horse · Big Brain
aminerj
155482mo ago
AI/ML · ●●● Banger

Director-AI – token-level NLI+RAG

Token-level streaming halt stops hallucinations mid-sentence, before the user sees them; a genuinely novel safety layer.

Big Brain · Wizardry
anulum
273mo ago