AI-enabled features such as gaze correction and face alignment give virtual collaboration a more natural, face-to-face feel, the company says.
In recent months, organizations around the globe have shifted to remote work because of the coronavirus, and many schools and universities have adopted online learning this fall to limit the spread of COVID-19 on campus. As a result, video conferencing has replaced in-person experiences ranging from work to social activities, although these virtual platforms come with their own drawbacks and limitations. On Monday, NVIDIA announced Maxine, a cloud-based video conferencing platform designed to enhance remote work, online learning, and more. Maxine includes artificial intelligence (AI) features intended to give virtual meetings a more natural, in-person feel.
“Video conferencing is now a part of everyday life, helping millions of people work, learn and play, and even see the doctor,” said Ian Buck, vice president and general manager of accelerated computing at NVIDIA. “NVIDIA Maxine integrates our most advanced video, audio and conversational AI capabilities to bring breakthrough efficiency and new capabilities to the platforms that are keeping us all connected.”
AI gaze correction and face alignment
Face-to-face communication is harder to replicate on platforms such as Zoom and Microsoft Teams than in person. To make "direct" eye contact on a video call, attendees need to look straight into their webcam. While this gives other attendees a more traditional interaction, people who look at the camera rather than the screen risk missing critical body language and other nonverbal cues.
To address this, NVIDIA is drawing on its research into generative adversarial networks (GANs). Maxine offers face alignment and gaze correction for video conferences. Gaze correction automatically adjusts the position of the eyes to "simulate eye contact," making it appear as though a person is looking at the webcam even when they are actually looking at the screen. Face alignment goes a step further and adjusts the position of a person's face for a more realistic "face-to-face" feel during virtual calls.
Doing more with less bandwidth
Maxine also uses AI to improve the quality of virtual meetings while reducing bandwidth demands. Rather than sending a "person's entire screen of pixels," Maxine's AI software analyzes "key focal points" of each attendee and then "re-animates the face in the video on the other side," which reduces the amount of data that must be transmitted. According to NVIDIA, this AI-enabled video compression can cut bandwidth consumption "down to one-tenth of the requirements of the H.264 streaming video compression standard."
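To give a rough sense of why sending a handful of facial points instead of full frames saves so much data, the Python sketch below compares the bytes in one uncompressed 720p frame with the bytes in a set of facial keypoint coordinates. It also does the "one-tenth of H.264" arithmetic. Every number in it is an assumption chosen for illustration: the 68-point layout, float32 coordinates, and the 1,000 kbps H.264 bitrate are not Maxine parameters, and the sketch does not model how Maxine actually encodes or re-animates faces. Real calls already send compressed H.264 video rather than raw pixels, so the practical saving is far smaller than the raw-frame comparison suggests, which is why NVIDIA states its figure relative to H.264.

```python
# Hypothetical sketch: per-frame payload of raw pixels versus facial keypoints.
# The resolution, keypoint count, float32 encoding, and H.264 bitrate are
# illustrative assumptions, not documented NVIDIA Maxine parameters.


def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Bytes for one uncompressed 8-bit RGB frame (every pixel sent)."""
    return width * height * bytes_per_pixel


def keypoint_bytes(num_keypoints: int, coords_per_point: int = 2,
                   bytes_per_coord: int = 4) -> int:
    """Bytes for the (x, y) float32 coordinates of each keypoint."""
    return num_keypoints * coords_per_point * bytes_per_coord


def reduced_bitrate_kbps(h264_kbps: float, factor: float = 10.0) -> float:
    """Bitrate at 'one-tenth' of an H.264 stream, per NVIDIA's stated figure."""
    return h264_kbps / factor


if __name__ == "__main__":
    frame = raw_frame_bytes(1280, 720)  # uncompressed 720p frame
    points = keypoint_bytes(68)         # 68 assumed facial keypoints
    print(f"raw 720p frame: {frame} bytes")
    print(f"68 keypoints:   {points} bytes")
    # Assumed 1,000 kbps H.264 call; real bitrates vary with resolution and content.
    print(f"one-tenth of 1000 kbps H.264: {reduced_bitrate_kbps(1000):.0f} kbps")
```

The sketch also assumes the receiving side already has an image of the speaker's face to animate; keypoints alone carry only position information, not appearance.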
Organizations can also add virtual assistants that use language models to provide speech recognition during video meetings. These assistants can take notes during calls, answer questions, and more. They can also provide closed captions, call transcriptions, and translations to help other attendees follow the conversation.