How OpenAI and Google Are Making The World More Accessible

Throughout May, the AI landscape was buzzing with significant developments, most notably the launch of GPT-4o and updates to Google Gemini announced at Google I/O 2024. One key advancement in these new models is multimodality, which enables a model to process and output text, audio, and visual data.

This innovation has greatly enhanced accessibility technology, particularly in developing virtual assistants for visually impaired people. Here’s a rundown of the new accessibility tools that OpenAI and Google are rolling out:

‘Be My Eyes’ with GPT-4o

Alongside the debut of its latest flagship model, GPT-4o, OpenAI showcased an update to ‘Be My Eyes’, a digital visual assistant designed to aid people who are blind or have low vision. First integrated with ChatGPT in 2023, ‘Be My Eyes’ used OpenAI’s GPT-4 to turn visual input captured by the phone’s camera into written descriptions of the user’s surroundings, which ChatGPT’s Voice Mode then read aloud.
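
For a sense of the plumbing behind this kind of assistant, here is a minimal sketch using OpenAI’s Python client: a single camera frame goes to a vision-capable model, and the reply is converted to speech. The prompt, file names and voice are illustrative assumptions rather than Be My Eyes’ actual code; the real app streams audio and video continuously instead of sending one frame at a time.

```python
# A rough sketch of the kind of request a visual-assistant app makes:
# send a camera frame to a vision-capable model, then read the reply aloud.
# The prompt, file names and voice choice are illustrative, not Be My Eyes' code.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode the latest camera frame so it can travel inside the request.
with open("camera_frame.jpg", "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Briefly describe what is in front of me and any obstacles."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
        ],
    }],
)
description = response.choices[0].message.content
print(description)

# Convert the description to speech so it can be played back to the user.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=description)
speech.write_to_file("description.mp3")
```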

Now integrated with GPT-4o, ‘Be My Eyes’ benefits from a model that matches GPT-4-level intelligence while being faster and more capable across text, voice and vision. Because the model is natively multimodal, the assistant can answer by voice at close to human conversational speed: users can point their device at an object or ask a question aloud and receive real-time, conversational responses from the ‘Be My Eyes’ assistant. Watch the company’s video showcasing the updated assistant below:

Google TalkBack

During its Google I/O developer conference, Google announced that Gemini Nano, the on-device version of its Gemini AI model for Android devices, is gaining multimodal capabilities and coming to Pixel phones.

Later this year, Gemini Nano’s multimodal capabilities are coming to TalkBack, helping people who experience blindness or low vision get richer and clearer descriptions of what’s happening in an image. On average, TalkBack users come across 90 unlabeled images per day. This update will help fill in missing information — whether it’s more details about what’s in a photo that family or friends sent or the style and cut of clothes when shopping online. Since Gemini Nano is on-device, these descriptions happen quickly and even work when there’s no network connection.

Google Lookout

Lookout on Android helps people with blindness and low vision use their phone’s camera to get more information about the world around them. Find mode is rolling out in beta, providing a new way to find specific objects. Select from seven categories of items — like seating and tables or bathrooms — then as you move your camera around the room, Lookout will notify you of the direction and distance to the item. Earlier this year, AI-generated image captions became available on Lookout globally in English. Now, when someone captures a photo directly within the app, they’ll receive an AI-generated description of the image, so they can learn more about what they’ve captured.

Look to Speak

With the Android app Look to Speak, you can select pre-written, customisable phrases with your eyes and have them spoken aloud. Now, Look to Speak is rolling out a text-free mode. With this mode you can also select and personalise emojis, symbols and photos to activate speech. This new feature is based on feedback from the community, and it helps make communication more accessible for people with cognitive differences, literacy challenges and language barriers.

Hands-free cursor for Android developers

Last year, Google released Project Gameface, an open-source, hands-free gaming mouse, on PC. Developers can now access Project Gameface for Android devices via GitHub. With the help of Android’s Accessibility Service and Google’s MediaPipe, developers can build applications that let users customise facial expressions, gesture sizes, cursor speed and more. Through a collaboration with Incluzza, a social enterprise in India that supports people with disabilities, Google were also able to learn how Project Gameface can be expanded to educational, work and other settings, such as typing messages to family or searching for new jobs.
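
To make the idea concrete, here is a rough sketch of how facial gestures can drive a pointer with MediaPipe’s face landmarker. The Python Tasks API is shown for brevity; the Android release pairs the equivalent MediaPipe Tasks API with Android’s Accessibility Service to move the real cursor. The sensitivity value, smile threshold and move_cursor helper are illustrative placeholders, not Project Gameface’s actual implementation.

```python
# Rough sketch: map a MediaPipe face-landmark result to cursor movement.
# Thresholds, the sensitivity value and move_cursor() are illustrative;
# on Android the real pointer is driven through the Accessibility Service.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.FaceLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,  # blendshapes score expressions such as smiles
    num_faces=1,
)
landmarker = vision.FaceLandmarker.create_from_options(options)

def move_cursor(dx: float, dy: float) -> None:
    """Placeholder: on Android this would dispatch a gesture via the
    Accessibility Service; on PC it could call a mouse-control library."""
    print(f"move cursor by ({dx:.1f}, {dy:.1f})")

SENSITIVITY = 400            # pixels of cursor travel per unit of head movement
SMILE_CLICK_THRESHOLD = 0.6  # blendshape score treated as a deliberate gesture

result = landmarker.detect(mp.Image.create_from_file("camera_frame.png"))
if result.face_landmarks:
    nose = result.face_landmarks[0][1]  # landmark 1 sits roughly at the nose tip
    # Steer the cursor with head position relative to the frame centre.
    move_cursor((nose.x - 0.5) * SENSITIVITY, (nose.y - 0.5) * SENSITIVITY)

    blendshapes = {c.category_name: c.score for c in result.face_blendshapes[0]}
    if blendshapes.get("mouthSmileLeft", 0.0) > SMILE_CLICK_THRESHOLD:
        print("click")  # an expression like a smile can be mapped to a tap
```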

Expanded accessibility features in Maps

To make Maps even more helpful for people with disabilities, Google are expanding access to features and offering new options for businesses.

  • Receive more detailed walking instructions and learn more about the world around you with Maps. For those who are blind or low-vision, the company is expanding detailed voice guidance and screen reader capabilities for Lens in Maps to Android and iOS globally in all supported languages. With screen reader capabilities for Lens in Maps, you’ll hear the name and category of places around you — like ATMs, restaurants or transit stations — and how far away a place is so you can quickly orient yourself and decide where to go. And when you’re walking and can’t see your phone, detailed voice guidance provides audio prompts letting you know when you’re heading in the right direction, crossing a busy intersection, or being rerouted if you’ve gone the wrong way.
  • Get accessibility information no matter where you search. Maps now has accessibility information for more than 50 million places, thanks to contributions from business owners and members of the Maps community. The ♿ icon indicates a place with a wheelchair-accessible entrance, with more details about accessible restrooms, parking and seating under the About tab. The ♿ icon was previously available worldwide on Android and iOS, and is now expanding to desktop so you can easily find this information no matter where you’re searching. And now when you’re viewing a place on mobile, you’ll also be able to filter reviews to easily find helpful information about wheelchair accessibility.
  • Find places that can cast to your hearing devices. To help those in need of hearing assistance, business owners can now add the Auracast attribute to their business profile. Auracast broadcast audio allows venues — like theaters, gyms, places of worship, and auditoriums — to broadcast enhanced or assistive audio to visitors with Auracast-enabled Bluetooth hearing aids, earbuds and headphones.

New designs for Project Relate and Sound Notifications

Google say they are committed to an ongoing partnership with the disability community to improve accessibility features, including updates based on user feedback.

  • Customise how you teach Project Relate. In 2022, Google launched Project Relate, an Android app for people with non-standard speech that lets you create a personalised speech recognition model to communicate and be better understood. Custom Cards let you customise the phrases you teach the model so it understands words that are important to you. Now there’s a new way to select text and import phrases from other apps as Custom Cards, like a note in a Google Doc.
  • New design for Sound Notifications with feedback from you. Sound Notifications alerts you when household sounds happen — like a doorbell ringing or a smoke alarm going off — with push notifications, flashes from your camera light, or vibrations on your phone. They’ve redesigned Sound Notifications based on user feedback, improving the onboarding process and sound event browsing, and making it easier to save custom sounds for appliances.