The smartphone reads prescriptions, describes vacation photos and recognizes friends at parties: With the iPhone application "Seeing AI", Microsoft released a digital companion for people with visual impairments two years ago. As of this Tuesday, the app is available in French, Spanish, Dutch, Japanese and German in addition to English.
Speaking to SPIEGEL, Saqib Shaikh, chief developer of the app at Microsoft, explains which new features of the application artificial intelligence (AI) has made possible. He talks about his vision of digital companions for visually impaired and blind people, but also about the problems the developers face and why users need to be patient with the technology.
Saqib Shaikh is responsible for the app "Seeing AI" as chief developer at Microsoft. Shaikh lost his sight at the age of seven. At Microsoft he co-developed the search engine Bing and the voice assistant Cortana. Today, Shaikh's focus is on how artificial intelligence can make life easier for people with visual impairments.
SPIEGEL: Mr. Shaikh, 15 years ago you first had the idea of developing a digital companion for people with visual impairments. Is artificial intelligence the technology you've always been waiting for?
Saqib Shaikh: We have made tremendous progress in artificial intelligence research. But we are still many years away from computers that can look around and understand everything they see. I am blind myself, so I like to go for walks with my wife and friends. We then talk about what they see along the way. They tell me when they spot something exciting, or I ask when I cannot place a sound. I wish that an AI could do just that one day. Like a personal assistant.
SPIEGEL: How long do we have to wait until digital helpers replace a human companion?
Saqib Shaikh: It is very difficult to look into the future. I can only speculate wildly. I do not know whether it will take another two, three, four or five years. So much is happening in the field. On the one hand, there are so many advances every year, but on the other, we also face enormous challenges. For example, it is still very hard for an AI to recognize what exactly people are doing in a given situation.
SPIEGEL: What can AI do really well?
Saqib Shaikh: Over the past few years, computers have learned incredibly fast to take over clearly defined tasks from people. Above all, they are getting better and better at recognizing images and speech. These are the areas in which artificial intelligence is most advanced. Thanks to this technology, you can, for example, explore photos by touch with "Seeing AI". Users touch the display of their smartphone to find out whether text, a face or some other object can be seen under their finger.
SPIEGEL: The text recognition of "Seeing AI" works quite well. But the scene detection is still extremely error-prone. Children's slides are confused with hydrants, round windows are interpreted as stop signs, and stone benches become tombstones. What is so difficult about recognizing objects?
Saqib Shaikh: You can think of artificial intelligence as a three-year-old child. You show it many pictures and explain: "This is a car, this is a tree, this is a dog." In the beginning, the child only recognizes things that it has already seen. Then it starts to describe these things in sentences. Like a child, artificial intelligence also improves over time. Scientists are also working to make the training methods better and better.
SPIEGEL: It's a pretty big responsibility to guide blind and visually impaired people through the world with an app.
Saqib Shaikh: Yes. But it is a research project. Many features are still very experimental. Nevertheless, we want to give users the opportunity to try newly developed Microsoft technologies as early as possible. We benefit from users telling us what they think about the features. We develop the app together with our customers.
SPIEGEL: Does it bother users that the app keeps making mistakes?
Saqib Shaikh: For some, it does not have to be perfect. One user told me that he wanted to send holiday photos to his family at home. Even if the app did not recognize everything, he could at least tell the photos apart and choose the right pictures. Another told me that he scans his surroundings to see whether a photo is suitable for Facebook. A rough description from the app was enough for him. Another told us that he points the app at the television during football matches to hear the score, because the commentator mentioned the current score too rarely. Others scan beverage cans to distinguish a Coke from a Diet Coke. All these little things make the app a useful companion.
SPIEGEL: Companies like Google and Facebook make a big show of their AI research, winning duels against Go champions and defeating professional poker players. What is Microsoft doing in the race for AI supremacy?
Saqib Shaikh: With "Seeing AI", we see a great deal of potential for AI to improve the lives of people with visual impairments. But our colleagues also develop, for example, tools for the hard of hearing that display subtitles in real time. In addition, there are many other examples such as "Eye Gaze", which lets users control a Windows PC with eye movements alone.