AI Undergraduate Researcher & Software Developer

UC Davis Center for Mind and Brain (Miller Lab)

Apr 2023 – Jun 2025 · Davis, California

Developed a wearable AR assistive device for hearing accessibility, improving real-time ASR accuracy by 12% and reducing end-to-end latency to under 500 ms.

Key Accomplishments

  • Migrated speech recognition pipeline from Google Cloud Speech-to-Text to AWS Transcribe; achieved 12% accuracy improvement and sub-500 ms response latency.
  • Designed diarization and confidence-scoring modules to better separate speakers in noisy, multi-speaker environments.
  • Built gaze-adaptive AR caption display using ARKit and Unity, improving readability and usability for hearing-impaired participants.

Skills & Technologies

  • Python
  • AWS Transcribe
  • Machine Learning
  • Audio Processing
  • ARKit
  • Unity
  • FastAPI
  • PostgreSQL

What I did

Led development of a wearable AR assistive device that transcribes live speech into captions displayed on smart glasses. Integrated advanced ASR, diarization, and gaze-adaptive visualization to make speech more accessible for individuals with hearing loss.

Why it mattered

  • Enabled inclusive communication for hearing-impaired users through real-time, personalized captioning.
  • Improved accuracy by 12% and cut transcription latency by 75% through optimized audio and model integration.
  • Supported 40+ user studies that shaped future iterations of assistive hearing technology.

How it worked

  • ASR pipeline: Implemented AWS Transcribe streaming with noise-reduction preprocessing and speaker diarization.
  • Accuracy optimization: Fine-tuned confidence scoring and word-timestamp smoothing for clearer real-time output.
  • AR overlay: Used ARKit gaze tracking to position captions dynamically; built Unity interface for visual comfort.
  • System architecture: Modularized services for capture, inference, display, and telemetry; FastAPI backend with PostgreSQL.
  • Research tools: Built a web-based dashboard for device calibration and logging, reducing setup friction for researchers.
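The confidence-scoring and word-timestamp smoothing step could be sketched roughly like this. The `Word` type, thresholds, and helper name are illustrative assumptions, not the project's actual code:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # 0.0 - 1.0, as reported by the ASR engine

def smooth_words(words, min_conf=0.5, max_gap=0.05):
    """Drop low-confidence words, then snap tiny gaps between
    consecutive word timestamps so captions render without flicker.

    min_conf and max_gap are hypothetical tuning values.
    """
    kept = [w for w in words if w.confidence >= min_conf]
    for prev, cur in zip(kept, kept[1:]):
        if 0 < cur.start - prev.end <= max_gap:
            cur.start = prev.end  # close sub-50 ms gaps between words
    return kept
```

In a real pipeline this would run over each partial-result batch before the caption layer consumes it, so displayed text never jumps backward in time.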

Tech

  • Audio & ML: PyAudio, librosa, NumPy, AWS Transcribe
  • Backend: Python, FastAPI, PostgreSQL, AWS Lambda
  • AR & Frontend: ARKit, Unity, WebSocket-based dashboard
  • Data & Telemetry: Pandas, Plotly, custom analytics tracking
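The modular capture → inference → display routing described above could be sketched with plain `asyncio` queues standing in for the WebSocket transport. The fan-out shape is the point here; the names and the `None` end-of-stream convention are assumptions for illustration:

```python
import asyncio

async def route(source: asyncio.Queue, sinks: list) -> None:
    """Fan frames from one source queue out to every registered sink
    (e.g. ASR engine, AR display, telemetry). A None frame signals
    end-of-stream and is forwarded before the router stops."""
    while True:
        frame = await source.get()
        for sink in sinks:
            await sink.put(frame)
        if frame is None:
            break

async def demo() -> list:
    # Wire one source to two sinks and push a short stream through.
    source = asyncio.Queue()
    asr, display = asyncio.Queue(), asyncio.Queue()
    for frame in ("chunk-1", "chunk-2", None):
        await source.put(frame)
    await route(source, [asr, display])
    received = []
    while not display.empty():
        received.append(await display.get())
    return received
```

Swapping the queues for WebSocket connections gives each service the same interface whether it runs live on-device or offline in a Docker container.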

My role

  • Architected the ASR and audio router system, enabling real-time streaming between microphone array, beamformer, ASR engine, and AR display with sub-500 ms latency.
  • Implemented adaptive beamforming algorithms (Delay-and-Sum, MVDR) in NumPy and PyTorch, improving speech clarity and noise suppression in dynamic environments.
  • Integrated AWS Transcribe with a modular router service, supporting live and offline modes through Dockerized microservices and WebSocket messaging.
  • Developed telemetry and analytics tools to measure signal quality, latency, and model confidence, guiding major iterations of the device architecture.
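A minimal Delay-and-Sum beamformer of the kind mentioned above might look like this in NumPy. This sketch uses integer-sample steering delays for simplicity; a production version would use fractional delays, and MVDR would weight channels by the noise covariance instead of averaging:

```python
import numpy as np

def delay_and_sum(channels: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Delay-and-Sum beamformer.

    channels: (n_mics, n_samples) array of microphone signals
    delays:   per-mic steering delays in whole samples

    Advancing each channel by its delay time-aligns the target
    direction; averaging then reinforces the target while
    uncorrelated noise averages down.
    """
    n_mics, _ = channels.shape
    out = np.zeros(channels.shape[1])
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -int(d))  # undo the propagation delay
    return out / n_mics
```

With the delays matched to the speaker's direction, the summed output recovers the source signal while off-axis noise is attenuated by roughly the number of microphones.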