Is emotional AI ready to be a key component of our cars and other devices?

Analysts are predicting huge growth for emotional AI in the coming years, albeit with widely differing estimates.

A 2018 study by Market Research Future (MRFR) predicted that the “emotional analytics” market, which includes video, speech, and facial analytics technologies among others, will be worth a whopping $25 billion globally by 2025. Tractica has made a more conservative estimate in its own analysis, but still predicted the “emotion recognition and sentiment analysis” market to reach $3.8 billion by 2025. Researchers at Gartner have predicted that by 2022 10 percent of all personal electronic devices will have emotion AI capabilities, either on the device itself or via cloud-based services. The market will be driven by use cases running the gamut from consumer experience and entertainment to healthcare and automotive.

Yet the technology itself still has strides to make. In a 2019 meta-analysis of 1,000 studies on inferring emotion from human facial expressions, a group of scientists concluded that the relationship between our faces and emotions is more complex than meets the eye. The study was published in the journal Psychological Science in the Public Interest and reported by The Washington Post.

In an interview with The Washington Post, Lisa Feldman Barrett, a professor of psychology at Northeastern University who worked on the study, said: “About 20 to 30 percent of the time, people make the expected facial expression,” such as smiling when happy. But the rest of the time, they don’t. “They’re not moving their faces in random ways. They’re expressing emotion in ways that are specific to the situation.”

In short, if emotional AI is going to deliver on the lofty expectations placed upon it, it’s going to need a very complex understanding of how our faces and voices correspond to our emotions.

In the same article, Rana el Kaliouby, co-founder and CEO of emotional AI company Affectiva, described the emotional AI space as an ever-evolving one. She agreed that emotional AI technology hasn’t reached the level of sophistication needed for widespread deployment, but she expressed hope that more research will someday achieve this, and also better educate industries and consumers about the limitations of emotional AI.

Affectiva has emerged as one of the leaders in the emotional AI space. The company has focused primarily on applying its emotional AI to vehicles – imagining a world where cars can respond to the emotions of their drivers and passengers in a variety of ways, from adjusting music and temperature to even pulling themselves over and offering emergency roadside assistance.

Just how far does emotional AI have to go? And how are we getting there?

Following his talk at the 2019 Drive World Expo and Conference, Abdelrahman Mahmoud, senior product manager at Affectiva, sat down with Design News to discuss the current state of emotional AI research and what’s needed to push the technology forward.

Abdelrahman Mahmoud

(Image source: Affectiva)

Design News: What’s your response to the meta-analysis that concluded that there needs to be more research done in the area for emotional AI to really have any efficacy?

Abdelrahman Mahmoud: A lot of that study was focused on the prototypical expressions of emotion like joy, anger, and surprise. But fundamentally we believe that emotional expression is much more than just those five or six prototypical emotions. That is why, as a company, we don’t just focus on these emotions. We actually focus first on the different facial muscles, like how people express a smile.

DN: Can you talk a bit about what’s been happening at your company as far as research into emotional AI lately?

Mahmoud: From a research perspective there’s a lot of continuous focus on multi-modal [methods] for recognizing things like frustration. We’ve done a lot of internal studies, and we know that you need a multi-modal approach to try to solve that problem.

Early on we did a lot of studies using just the face or just the voice, and we’ve seen that the accuracy jumps dramatically if we use data from both — which is kind of intuitive, but we just had to validate that. That’s the main focus of our multi-modal effort these days: detecting signals like frustration and drowsiness that are important in a car.
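The kind of multi-modal combination Mahmoud describes can be illustrated with a simple late-fusion sketch, in which separate face and voice models each produce per-emotion probabilities that are then averaged. All the names, labels, and scores here are hypothetical illustrations, not Affectiva's actual models or API:

```python
# Minimal late-fusion sketch: combine per-class probabilities from a
# (hypothetical) face model and voice model with a weighted average.

def fuse_predictions(face_probs, voice_probs, face_weight=0.5):
    """Weighted average of two probability dictionaries over the same labels."""
    assert face_probs.keys() == voice_probs.keys()
    w = face_weight
    return {
        label: w * face_probs[label] + (1 - w) * voice_probs[label]
        for label in face_probs
    }

# Illustrative scores: the face alone is ambiguous (a smile can accompany
# frustration), but the voice channel tips the balance.
face = {"joy": 0.45, "frustration": 0.40, "neutral": 0.15}
voice = {"joy": 0.20, "frustration": 0.65, "neutral": 0.15}

fused = fuse_predictions(face, voice)
top = max(fused, key=fused.get)
print(top)  # frustration
```

Real systems typically fuse learned features rather than final scores, but the intuition is the same: two weak, noisy channels can together disambiguate a signal that neither resolves alone.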

DN: Has there been work into studying things such as body language or position as well? For example, imagine someone who shows frustration less in their face and more as tension in their shoulders or hands.

Mahmoud: There are strong signals both on the face and in the voice. But for sure, adding gestures would be beneficial in some cases. Keep in mind that the automotive ecosystem focuses a lot on optimizing cost, which means you don’t have a lot of room for adding many models that can do different things.

DN: Meaning there has to be a balance between what information you want to capture and how many cameras you can place inside the vehicle?

Mahmoud: For us it’s always a matter of choosing the signals that will most strongly indicate what’s happening in the cabin. It might not be the complete picture, but you want to get as close as possible. But we think that’s very short term. In the longer term, computational power and better compute platforms inside the car are going to change how much we can capture.

DN: Market analysts have been talking a lot about use cases for emotional AI beyond automotive. Affectiva itself even made a deal with Softbank to supply AI for its robots. Do you think automotive is still where the greatest opportunity for emotional AI lies?

Mahmoud: There are a lot of markets where we can deploy general emotion recognition or emotional AI. We actually don’t see automotive as a very distinct market from things like robotics. And the reason that’s the case is because there is a lot of focus going into the HMI [human-machine interface] in the car these days.

Traditionally, OEMs didn’t really focus on the HMI, and you saw very ugly HMIs in the car that were not intuitively designed. Recently, there’s been a lot of focus on how the HMI in the car is going to have to change. With the push toward more autonomy, if the HMI in the car is not intuitive, the driver is just going to switch to the next intuitive HMI they can interact with, which is their cellphone.

And you see a parallel to that in the cellphone market, where there is a lot of focus on the UI because that’s the main differentiator among hardware manufacturers. This is very similar to robotics, because a robot’s human-machine interface is the thing that would benefit most from having emotion recognition. The cool thing about automotive is that you get to test HMIs at a larger scale, because the robotics market is still limited in terms of deployment.

DN: In your talk at Drive you spoke about how context is a very important aspect of emotion recognition. Can you elaborate on this?

Mahmoud: There is definitely a lot of research that we’ve been doing with partners on how to translate different facial muscles into emotions in specific contexts. The thing is, context really matters as far as detecting emotion. With frustration, for instance, one of the fundamental facial expressions is a smile, which is counterintuitive, but people do smile when they are frustrated.

Context is why a hybrid and multi-modal approach is important. You can have some of the machine learning detecting things like how people express a smile or how people move their facial muscles, but then there needs to be a layer on top of that that takes context into account in order to understand the difference between frustration and just a smile.
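The layered design Mahmoud outlines — low-level detectors reporting facial actions, with a separate layer interpreting them in context — can be sketched as follows. The rule, the action names, and the context events are all hypothetical illustrations of the idea, not a real production system:

```python
# Hypothetical two-layer sketch: a low-level model reports facial actions
# (e.g. a smile), and a context-aware layer maps them to an emotion label.

def interpret(facial_actions, context):
    """Map raw facial actions to an emotion label given situational context."""
    if "smile" in facial_actions:
        # The same smile can mean different things in different situations:
        # smiling right after a system failure often signals frustration.
        if context.get("event") == "navigation_error":
            return "frustration"
        return "joy"
    return "neutral"

print(interpret({"smile"}, {"event": "navigation_error"}))  # frustration
print(interpret({"smile"}, {"event": "music_playing"}))     # joy
```

A deployed system would learn this mapping from data rather than hand-coding rules, but the structure is the point: the identical low-level signal (a smile) yields different labels once the situation is taken into account.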

DN: Would you say context recognition is the big missing component of emotional AI right now?

Mahmoud: I think we’re still far off from a kind of human intuition, as far as having an AI that can just analyze all of these different signals and understand emotion. But this is an active area of research. For the emotion recognition models, you just have to understand the context you are deploying them in and what they are trained on, which is very similar to any machine learning model you could think of.

*This interview has been edited for content and clarity.

Chris Wiltz is a Senior Editor at Design News covering emerging technologies including AI, VR/AR, blockchain, and robotics.