I read Replika's privacy policy and then built a competitor

Author: krunkworx

by krunkworx·Apr 26, 2026·3 points·0 comments


AI Analysis

●● Solid · Dark Horse · Solve My Problem

Local inference for AI companions when Replika stores everything on their servers.

Strengths

•Bonsai-8B 1-bit quantized model (~1.3GB) makes on-device inference actually feasible on phones.
•Routes between MLX for speed and llama.cpp for vision—pragmatic multi-model architecture.
•Zero data collection policy addresses the actual privacy failure mode of cloud AI companions.

Weaknesses

•MLX and llama.cpp are standard tech—no novel architecture beyond wiring existing tools.
•Local LLM mobile apps are becoming a crowded space (Off Grid, private LLM apps on both stores).

Post Description

I'm genuinely surprised at what people are willing to share with AI companions. Read Replika's privacy policy. Then Character.AI's. These apps store your most personal conversations on their servers, linked to your email address. A breach or subpoena and your identity is attached to everything you ever told your "AI friend." Eek.

The only thing I think actually solves this is local inference. I remember browsing r/LocalLLaMA years ago and thinking this is the future. Local models are finally good enough. I was playing with the Bonsai-8B 1-bit quant model a few weeks back and I think we're almost there. I built friendAI to see if there's market demand for local inference. Everything runs on your phone.

What's actually on-device:

- Bonsai-8B (1-bit quantized Qwen3-8B, ~1.3GB) via MLX for speed
- Gemma 4 E2B (~4.5GB, GGUF) via llama.cpp for vision
- A unified client that routes between them
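
For anyone curious what "routes between them" might mean in practice, here is a minimal sketch in Python. It is not friendAI's code: `ChatRequest`, `UnifiedClient`, and the two stub backends are hypothetical names, and the routing rule (any attached image goes to the vision model, everything else to the text model) is an assumption. The stubs stand in for the MLX and llama.cpp runtimes and do no real inference.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ChatRequest:
    """One user turn: text plus optional image attachments (hypothetical type)."""
    text: str
    image_paths: List[str] = field(default_factory=list)


# A backend is any callable that turns a request into a reply string.
Backend = Callable[[ChatRequest], str]


class UnifiedClient:
    """Single entry point that picks a backend per request."""

    def __init__(self, backends: Dict[str, Backend]):
        missing = {"text", "vision"} - backends.keys()
        if missing:
            raise ValueError(f"missing backends: {sorted(missing)}")
        self._backends = backends

    def route(self, request: ChatRequest) -> str:
        # Assumed rule: images need the vision model, plain text takes the fast path.
        return "vision" if request.image_paths else "text"

    def reply(self, request: ChatRequest) -> str:
        return self._backends[self.route(request)](request)


# Stubs standing in for the real runtimes; they only echo the input.
def stub_mlx_text(request: ChatRequest) -> str:
    return f"[text model] {request.text}"


def stub_llamacpp_vision(request: ChatRequest) -> str:
    return f"[vision model] {request.text} ({len(request.image_paths)} image(s))"


client = UnifiedClient({"text": stub_mlx_text, "vision": stub_llamacpp_vision})
```

Keeping the backends behind plain callables means the caller never needs to know which runtime answered, and swapping a model only touches the dictionary passed to the client.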

A few things I'm reasonably proud of solving in about a week:

- Background model downloads that survive crashes, network drops and reboots. This turned out to be the hardest part. You can start chatting before the download finishes.
- Runtime thread auto-tuning that benchmarks your actual device at startup rather than guessing with a static heuristic.
- Local memory without a vector DB: TF-IDF style ranking with recency decay. No embedding model needed.
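
To make the memory item above concrete, here is a rough Python sketch of TF-IDF style ranking with recency decay. The post does not give the actual formula, so everything specific here is an assumption: the `Memory` type, the `rank_memories` function, the smoothed IDF, and the exponential half-life decay (30 days by default) are illustrative choices, not friendAI's implementation.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

SECONDS_PER_DAY = 86400.0


@dataclass
class Memory:
    """A stored snippet of conversation (hypothetical type)."""
    text: str
    timestamp: float  # seconds since epoch


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def rank_memories(query: str, memories: List[Memory], now: float,
                  half_life_days: float = 30.0,
                  top_k: int = 3) -> List[Tuple[float, Memory]]:
    """Score = TF-IDF overlap with the query, multiplied by a recency decay."""
    docs = [Counter(tokenize(m.text)) for m in memories]
    n = len(docs)

    # Document frequency: in how many memories each term appears.
    df: Counter = Counter()
    for doc in docs:
        df.update(doc.keys())

    query_terms = set(tokenize(query))
    scored = []
    for memory, doc in zip(memories, docs):
        length = sum(doc.values()) or 1
        relevance = sum(
            (doc[t] / length) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t in query_terms if t in doc
        )
        if relevance == 0.0:
            continue  # no shared terms, so this memory is not a candidate
        age_days = max(0.0, (now - memory.timestamp) / SECONDS_PER_DAY)
        decay = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
        scored.append((relevance * decay, memory))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


now = 1_800_000_000.0
memories = [
    Memory("My dog Biscuit loves the beach", now - 60 * SECONDS_PER_DAY),
    Memory("I took my dog Biscuit to the vet", now - 1 * SECONDS_PER_DAY),
    Memory("Work deadline is Friday", now - 1 * SECONDS_PER_DAY),
]
results = rank_memories("dog biscuit", memories, now)
```

With these sample memories the older beach note has the slightly higher raw term overlap, but the decay pushes yesterday's vet note to the top. Since the whole thing is counting and a logarithm, it needs no embedding model and no index beyond the stored text.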

Happy to go deep on any of it. www.friendai.pro