Video to Text AI Transcription

Author: gregzeng95

by gregzeng95 · Mar 3, 2026 · 2 points · 0 comments

AI Analysis

● Mid · Ship It

Reliable transcription with diarization, but Otter.ai, Rev, and AssemblyAI already dominate this space.

Strengths

•Thoughtful reliability work: extracting audio before transcription made long-file processing more stable.
•Speaker diarization for interviews; batch uploads on the paid tier unlock real workflow value.
•Clear failure messaging and retry behavior show UX maturity beyond raw model outputs.
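A "clear message instead of raw model errors" policy combined with retries can be sketched as follows. This is a minimal illustration, not the app's actual code: the `JobError` class, the error codes, the message strings, and the choice of which codes are retryable are all hypothetical assumptions. Transient failures are retried with exponential backoff; anything else, or the final failed attempt, is translated into a user-facing message.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

class JobError(Exception):
    """Hypothetical backend error carrying a machine-readable code."""
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code

# Hypothetical codes: only transient failures are worth retrying.
RETRYABLE = {"timeout", "rate_limited"}
USER_MESSAGES = {
    "timeout": "Transcription timed out. Please try again.",
    "unsupported_format": "This file type isn't supported.",
}
DEFAULT_MESSAGE = "Something went wrong while transcribing this file."

def run_with_retries(
    task: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[T], Optional[str]]:
    """Return (result, None) on success, or (None, user_message) on failure."""
    for attempt in range(attempts):
        try:
            return task(), None
        except JobError as err:
            if err.code not in RETRYABLE or attempt == attempts - 1:
                # Map the internal code to a friendly message; never show raw errors.
                return None, USER_MESSAGES.get(err.code, DEFAULT_MESSAGE)
            sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
    return None, DEFAULT_MESSAGE  # reached only when attempts <= 0
```

Injecting `sleep` keeps the backoff testable without real waiting; a production queue would more likely reschedule the job than block a worker.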

Weaknesses

•Crowded category with entrenched competitors (Otter.ai, Rev, AssemblyAI) offering similar or better pricing.
•No clear technical or feature differentiation: 55+ languages and 2–3 min turnaround are table stakes.

Post Description

I’ve been building a video-to-text web app and wanted to share it for feedback. The core flow is straightforward: upload files, start transcription, then track progress in a history page that refreshes automatically while jobs are running. Paid users can submit multiple files at once, and speaker diarization is supported for conversations and interviews.
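The auto-refreshing history page boils down to "keep polling while any job is still active." Below is a minimal sketch of that rule, assuming hypothetical status names (`queued`, `running`, `done`, `failed`) and a caller-supplied `fetch_statuses` function; in the real app this loop would probably live in browser code rather than Python.

```python
import time
from typing import Callable, Iterable, List

ACTIVE_STATUSES = {"queued", "running"}  # hypothetical status names

def has_active_jobs(statuses: Iterable[str]) -> bool:
    """True while at least one job has not reached a final state."""
    return any(s in ACTIVE_STATUSES for s in statuses)

def poll_history(
    fetch_statuses: Callable[[], List[str]],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Re-fetch job statuses until none is active (or max_polls is hit)."""
    statuses = fetch_statuses()
    polls = 1
    while has_active_jobs(statuses) and polls < max_polls:
        sleep(interval)
        statuses = fetch_statuses()
        polls += 1
    return statuses
```

Stopping once every job is final avoids needless requests, and `max_polls` caps a stuck job from polling forever.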

Over the last few weeks I focused mostly on reliability. I changed the pipeline to extract audio first and then run transcription, which made long-file handling more stable. I also spent time improving failure handling so users see a clear message when a job fails, instead of raw model errors.
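The extract-audio-first pipeline can be sketched with ffmpeg. The flags below are standard ffmpeg options (`-vn` drops video, `-ac 1` downmixes to mono, `-ar 16000` resamples to 16 kHz, `-c:a pcm_s16le` writes 16-bit WAV), but the exact settings, the `transcribe` callable, and the file layout are assumptions; the post does not say which tool or parameters it uses.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

def ffmpeg_extract_cmd(video: Path, audio: Path) -> List[str]:
    """Build an ffmpeg command that strips video and writes mono 16 kHz WAV."""
    # Assumed settings: mono 16 kHz PCM is a common input format for speech models.
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000",
            "-c:a", "pcm_s16le", str(audio)]

def transcribe_video(
    video: Path,
    workdir: Path,
    transcribe: Callable[[Path], str],  # hypothetical speech-to-text step
    run=subprocess.run,
) -> str:
    """Extract audio first, then hand only the small audio file to the model."""
    audio = workdir / f"{video.stem}.wav"
    run(ffmpeg_extract_cmd(video, audio), check=True, capture_output=True)
    return transcribe(audio)
```

Separating the two stages means the transcription step never has to decode large video containers, and a failure can be attributed to either extraction or transcription.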

Pricing is intentionally simple right now: free users get 3 transcriptions per day, and there is one Unlimited plan at $20/month or $120/year.
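The free-tier limit and the annual discount can both be made concrete. The quota class below is a hypothetical in-memory sketch (real storage, the `"unlimited"` plan identifier, and which timezone defines a "day" are assumptions left to the caller). The arithmetic uses the post's prices: $120/year works out to $10/month, half of the $20 monthly price.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Tuple

FREE_DAILY_LIMIT = 3  # from the post: 3 free transcriptions per day

class DailyQuota:
    """In-memory per-user counter keyed by calendar day (hypothetical storage)."""

    def __init__(self, limit: int = FREE_DAILY_LIMIT):
        self.limit = limit
        self.counts: DefaultDict[Tuple[str, date], int] = defaultdict(int)

    def try_consume(self, user_id: str, plan: str, today: date) -> bool:
        """Record one transcription and return True, or return False if over quota."""
        if plan == "unlimited":  # hypothetical plan identifier
            return True
        key = (user_id, today)  # a new day yields a new key, i.e. an automatic reset
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True

# Annual vs. monthly billing, using the post's prices.
MONTHLY_PRICE, YEARLY_PRICE = 20, 120
effective_monthly = YEARLY_PRICE / 12                      # 10.0 dollars per month
annual_discount = 1 - YEARLY_PRICE / (MONTHLY_PRICE * 12)  # 0.5, i.e. 50% off
```

Stating "$10/month billed yearly" next to "$20/month" may help first-time users compare the two options.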

I’d really appreciate feedback on the overall UX, whether the failure/retry behavior feels right, and whether the pricing is understandable for first-time users.

Similar Projects

AI/ML · ●●● Banger

Overlapping Speaker Transcription Model

Transcribes overlapping speakers in a single pass without needing separate diarization steps.
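In a conventional pipeline, a separate diarization step is followed by a merge that gives each transcript segment the speaker whose turn overlaps it most. The sketch below is a generic illustration of that merge, not this project's method; the `Segment` and `Turn` records and the `"unknown"` fallback are hypothetical. It also shows the limitation a single-pass model avoids: each segment can carry only one speaker label, so overlapping speech is forced onto whoever talked longest.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:  # hypothetical transcript segment, times in seconds
    start: float
    end: float
    text: str

@dataclass
class Turn:  # hypothetical diarization turn, times in seconds
    start: float
    end: float
    speaker: str

def overlap(a, b) -> float:
    """Length of the time interval shared by a and b (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each segment with the speaker of its most-overlapping turn."""
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        speaker = best.speaker if best is not None and overlap(seg, best) > 0 else "unknown"
        labeled.append((speaker, seg.text))
    return labeled
```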

Wizardry · Big Brain · Solve My Problem

mcgov

101mo ago

SaaS · ●● Solid

Transcriptum – fast video transcription with speaker labels and summary

It pairs WhisperX-grade transcription (speaker diarization and word-level timestamps) with optional multi-LLM analysis — summaries, Q&A, sentiment, topics and even fact-checking — plus YouTube import and standard export formats. Being vendor-agnostic and offering fact-checking is a smart differentiator, but the space is crowded (Descript/Otter/etc.); clearer accuracy numbers, pricing, or unique workflow hooks would make this stand out.
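SubRip (SRT) is one widely used subtitle format a tool like this might export; the listing does not say which formats Transcriptum supports. A minimal sketch, assuming cues arrive as hypothetical `(start_seconds, end_seconds, speaker, text)` tuples: each SRT cue is an index, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line. Prefixing the text with the speaker is a common convention, not part of the format itself.

```python
from typing import Iterable, Tuple

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues: Iterable[Tuple[float, float, str, str]]) -> str:
    """Render numbered SRT cues, prefixing each line with its speaker label."""
    blocks = []
    for i, (start, end, speaker, text) in enumerate(cues, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}: {text}\n"
        )
    return "\n".join(blocks)  # joining adds the blank line between cues
```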

Solve My Problem · Slick

lpeancovschi

103mo ago

Productivity · ● Mid