
TONELENS

Google Translate tells you the words. ToneLens tells you the truth.

Role

Solo Developer

Stack
Python · FastAPI · Gemini Live API · Google ADK · Vertex AI · Firestore · Cloud Run · WebSockets
Status

Submitted — Gemini Live Agent Challenge 2026

Overview

ToneLens is a real-time emotional intelligence agent that watches conversations through your camera, listens through your microphone, and streams back translation, emotional subtext, cultural context, and tactical suggestions — all simultaneously via the Gemini Live API.

The core insight behind ToneLens is that words carry only 30% of meaning. The remaining 70% lives in tone, hesitation, confidence, and cultural context. Existing tools like Google Translate decode language. ToneLens decodes intent.

Built in 7 days for the Gemini Live Agent Challenge 2026, ToneLens runs four distinct agent modes — Travel, Meeting, Present, and Negotiate — each with a specialized system prompt, real-time emotional scoring, and autonomous agent actions triggered by keyword detection across the conversation stream.
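The four modes described above can be sketched as a simple mapping from mode name to specialized system prompt. This is an illustrative reconstruction, not the actual ToneLens configuration; the prompt text and function names are assumptions:

```python
# Hypothetical sketch of ToneLens's four agent modes, each backed by a
# specialized system prompt. Prompt wording here is illustrative only.
MODES = {
    "travel": "You are a live interpreter. Translate speech and add cultural context.",
    "meeting": "You are a meeting assistant. Track decisions, deadlines, and commitments.",
    "present": "You are a delivery coach. Flag filler words and pacing issues.",
    "negotiate": "You are a negotiation coach. Track power balance and flag bluffing.",
}

def system_prompt(mode: str) -> str:
    """Return the specialized system prompt for a mode, defaulting to travel."""
    return MODES.get(mode, MODES["travel"])
```

Keeping mode behavior in prompt data rather than branching code makes it cheap to add a fifth mode later.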

Screenshots

Key Features

Live translation: Detects and translates any language in real time with cultural context
Emotion engine: 8 emotion types with confidence scoring streamed per utterance
Negotiation coach: Tracks power balance, detects bluffing, whispers tactical strategy
Presentation coach: Filler word detection, pace analysis, real-time delivery coaching
Meeting intelligence: Auto-saves decisions, deadlines, and commitments to Firestore
Agent actions: Autonomous triggers for cultural tips, emergency detection, and stress reports
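The per-utterance emotion scoring in the list above can be sketched as a small parser. The specific emotion labels and the `EMOTION: label (score)` line format are assumptions for illustration, not the actual ToneLens protocol:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed set of 8 emotion labels; the real ToneLens taxonomy may differ.
EMOTIONS = {"neutral", "happy", "frustrated", "anxious",
            "confident", "hesitant", "angry", "sad"}

@dataclass
class EmotionScore:
    label: str
    confidence: float  # 0.0 to 1.0

def parse_emotion(line: str) -> Optional[EmotionScore]:
    """Parse a streamed line like 'EMOTION: frustrated (0.82)' into a score."""
    m = re.match(r"EMOTION:\s*(\w+)\s*\(([\d.]+)\)", line)
    if not m or m.group(1) not in EMOTIONS:
        return None  # reject malformed lines and unknown labels
    return EmotionScore(m.group(1), float(m.group(2)))
```

Validating against a closed label set keeps downstream UI code from having to handle free-form model output.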

Tech Deep Dive

ToneLens uses a two-model pipeline to work around a critical constraint: the Gemini Live native audio model does not support function declarations or text modality simultaneously.

Audio streams bidirectionally via WebSockets to a FastAPI backend on Cloud Run. The Gemini Live session handles real-time multimodal understanding and responds in audio only. A second Vertex AI call to gemini-2.0-flash then reformats the transcribed response into a strict four-line structured output: TRANSLATION, EMOTION, SUBTEXT, and SUGGEST.

Agent actions are triggered via keyword detection in Python rather than function calling, keeping the Live session config clean. Session state and conversation history are persisted in Firestore. The entire pipeline runs end-to-end in under two seconds.
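The two post-processing steps above can be sketched in a few lines. Both the strict four-line parsing and the trigger keywords are illustrative assumptions; the actual field separators and keyword lists in ToneLens may differ:

```python
from typing import Optional

# The reformatter is asked for exactly these four labeled lines.
REQUIRED_KEYS = ("TRANSLATION", "EMOTION", "SUBTEXT", "SUGGEST")

def parse_structured(text: str) -> Optional[dict]:
    """Split the reformatter's four-line output into its labeled fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        if key.strip() in REQUIRED_KEYS:
            fields[key.strip()] = value.strip()
    # Reject partial output so the client never renders a half-filled card.
    return fields if len(fields) == len(REQUIRED_KEYS) else None

# Keyword detection stands in for function calling: scanning the transcript
# in Python keeps the Live session config free of tool declarations.
# These keyword lists are hypothetical examples.
TRIGGERS = {
    "cultural_tip": ("custom", "etiquette", "tradition"),
    "emergency": ("help", "police", "ambulance"),
    "stress_report": ("deadline", "pressure", "overwhelmed"),
}

def detect_actions(transcript: str) -> list:
    """Return the agent actions whose keywords appear in the transcript."""
    lowered = transcript.lower()
    return [action for action, words in TRIGGERS.items()
            if any(w in lowered for w in words)]
```

Because the keyword scan runs outside the model session, adding a new autonomous action is a one-line change that never touches the Live API configuration.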
