Julien Offray

Portfolio
Case Study 2024

Blindenbrille

Rediscover the world through innovative technologies

AI · Accessibility · IoT · Hardware
Role
Designer & Developer
Duration
5 months
Team
Julien Offray, Franz Pfitzer
Tools
Raspberry Pi 4, Node.js, OpenAI Vision, OpenAI TTS
Overview

Blindenbrille is an IoT solution that helps visually impaired people perceive and navigate their environment. The glasses use a Raspberry Pi, a camera module, and AI to continuously describe the surroundings through a bone-conduction headset — leaving the ears open to ambient sound, which matters more than most sighted designers realise.

The project ran two design tracks simultaneously: the physical form factor and the software experience. Both had to work together for the system to be genuinely useful.

Problem

The challenge

Most assistive technologies for the visually impaired rely on haptic feedback or direct audio cues — useful for navigation but limited in conveying the richness of an environment. What if visually impaired people could experience not just where they are, but what's around them?

The technical challenge was clear enough. The harder challenge was the design one: AI scene descriptions are not inherently useful. Generic captions — "a room with furniture" — add no value to someone who can't see. The system had to produce descriptions oriented toward the specific needs of someone navigating the world without sight.

Process

How we approached it

Hardware and software design happened in parallel from day one. On the hardware side: integrating the Raspberry Pi, camera module, and touch sensor into wearable glasses frames while keeping the device light enough to wear. On the software side: building the Node.js server that manages the camera capture loop, OpenAI Vision API calls, and TTS audio generation — and iterating extensively on the prompts themselves.

Key insights:

  • Bone conduction was the only viable speaker choice — blocking ambient sound with in-ear audio removes the spatial awareness visually impaired users depend on
  • Generic scene captions are useless — prompts had to be engineered specifically for someone who cannot see
  • Two contexts need two entirely different modes: outdoor navigation vs. indoor spatial orientation
  • Audio pacing matters as much as content — too much information too fast is worse than nothing
  • Touch-based mode switching had to be discoverable and operable without looking
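The pacing insight shaped the playback logic directly. A minimal sketch of a "latest wins" policy (the class and method names here are illustrative, not the project's actual code): while one description is playing, newer ones replace each other in a single pending slot instead of queueing, so the listener never hears a backlog of stale scene information.

```javascript
// "Latest wins" playback gate: only one description plays at a time.
// While playing, newer descriptions overwrite the single pending slot
// instead of queueing, so outdated scene info is silently dropped.
class PacedPlayer {
  constructor(playFn) {
    this.playFn = playFn;   // async function that actually plays the audio
    this.playing = false;
    this.pending = null;    // at most one description waits its turn
  }

  async submit(description) {
    if (this.playing) {
      this.pending = description; // replace, don't queue
      return;
    }
    this.playing = true;
    try {
      await this.playFn(description);
    } finally {
      this.playing = false;
    }
    if (this.pending !== null) {
      const next = this.pending;
      this.pending = null;
      await this.submit(next);
    }
  }
}
```

The same idea works for any slow output channel: the camera can capture as fast as it likes, but the audio path only ever sees the freshest result.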
Solution

What we built

The system works as a continuous feedback loop: the camera captures a frame → the server sends it to OpenAI Vision with a mode-specific prompt → receives descriptive text → converts it to speech via OpenAI TTS → plays the result through the bone-conduction headset. Mode switching happens via a touch sensor built into the frame.

Two modes emerged from testing: Outdoor mode delivers rich environmental descriptions — atmosphere, surroundings, people nearby — to support social participation and navigation. Indoor mode shifts to concise spatial orientation: obstacles, room layout, directional guidance. The UX challenge was entirely in prompt engineering and audio pacing, not the pipeline itself.

  • Outdoor mode: Rich environmental descriptions — atmosphere, scenery, spatial context
  • Indoor mode: Concise spatial orientation — obstacles, layout, navigation instructions
  • Bone-conduction audio: Descriptions delivered without blocking ambient environmental sound
  • Touch sensor mode switching: Context change on the frame, no visual interaction required
  • Continuous capture loop: Camera → OpenAI Vision → TTS → playback, fully automated
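One pass through that loop can be sketched as follows, with the concrete capture, vision, and playback functions injected rather than hard-wired. All names here are assumptions for illustration — not the project's real module API — which is exactly what keeps the skeleton independent of the camera driver and OpenAI client wiring.

```javascript
// Continuous feedback loop: capture -> describe (mode-specific) -> play.
// Dependencies are injected so the loop itself contains no hardware or
// OpenAI specifics (all function names here are illustrative).
async function runLoop(deps, shouldContinue) {
  while (shouldContinue()) {
    const frame = await deps.captureFrame();              // JPEG from the Pi camera
    const mode = deps.getMode();                          // "outdoor" | "indoor", via touch sensor
    const text = await deps.describeFrame(frame, mode);   // OpenAI Vision call
    await deps.playDescription(text);                     // OpenAI TTS -> bone conduction
  }
}
```

Because `playDescription` is awaited, the loop naturally paces itself: no new frame is described while audio is still playing.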
Outcome
  • 2 operating modes for different contexts
  • 1 fully working hardware prototype
  • 3 core systems: hardware, software, prompt design

Blindenbrille demonstrated how combining accessible AI APIs with thoughtful physical design can produce a meaningful assistive device at a fraction of the cost of existing solutions. Beyond the technical achievement, the project required genuine empathy for the target user — designing not for screens but for ears, and learning that the quality of a description is a design problem just as much as the quality of an interface.

The prompt engineering work was the most surprising part: the difference between a useless caption and a genuinely useful description is almost entirely in how you instruct the model.
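To make that difference concrete, here is a sketch of mode-specific system prompts. The wording is illustrative — these are not the project's actual prompts — but it shows the principle: each mode tells the model what the listener needs to act on, not merely what the image contains.

```javascript
// Mode-specific system prompts (illustrative wording, not the project's
// actual prompts). Each one frames the model's task around the listener's
// needs in that context, rather than asking for a generic caption.
const PROMPTS = {
  outdoor:
    "You describe scenes aloud for a blind pedestrian. Convey atmosphere, " +
    "nearby people, and notable surroundings in two or three short spoken sentences.",
  indoor:
    "You guide a blind person through a room. Name obstacles, the room " +
    "layout, and a clear direction to move, one short sentence each.",
};

function buildPrompt(mode) {
  const prompt = PROMPTS[mode];
  if (!prompt) throw new Error(`unknown mode: ${mode}`);
  return prompt;
}
```

Swapping one of these strings changes the entire character of the system's output — which is why the prompt text deserved the same iteration cycles as the hardware.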