Summary |
Developing technologically integrated solutions is critical to realizing next-generation AI applications. Our proposed framework integrates affective responses and contactless behavioral modeling technologies into a personalized multimedia recommendation system. It combines structured crowd user feedback, audio-visual content, and multimodal reactions extracted from video, speech, and millimeter-wave (mmWave) signals. Movie content and facial reactions are learned as representations using a deep intra-genre projection network. In addition, we model breathing and heart rates using mmWave radar in wide 5G frequency bands, which provides real-time physiological profiles of the audience. Finally, we develop a machine learning framework to predict users' preferences for multimedia content, including movies, music, and TV shows. Our next-generation AI-based multimedia recommender delivers recommendations in a more accurate, personalized, and unobtrusive manner. |
Scientific Breakthrough |
The scientific breakthroughs of our integrated system are demonstrated in three parts. First, audio-visual content and audience reactions are embedded using deep projection, achieving 79.07% in box-office prediction. Second, we apply a multi-degree Fast Fourier Transform to the mmWave signal to extract breathing and heart rates. Third, our recommendation system uses collaborative filtering over both user feedback and multimodal reactions, which improves on traditional recommendation based on structured features. It predicts preferences and content clusters simultaneously and obtains a state-of-the-art accuracy of 67.61% using only 0.69% of the comparisons (labels) on the Last.fm music dataset. Moreover, distributed learning is proposed for faster data collection and stronger privacy protection for end users. |
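To illustrate the second point, the following is a minimal sketch of how breathing and heart rates might be read off a chest-displacement signal by picking spectral peaks. It uses a single textbook radix-2 FFT rather than the multi-degree FFT named above, and it runs on a synthetic signal instead of real mmWave radar data. The function names, the 20 Hz sampling rate, and the band limits (0.1 to 0.5 Hz for breathing, 0.8 to 2.0 Hz for heartbeat) are assumptions made for illustration only.

```python
import cmath
import math
import random


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(samples)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def dominant_rate_per_minute(signal, fs, low_hz, high_hz):
    """Return the strongest spectral peak inside [low_hz, high_hz], in cycles per minute."""
    n = len(signal)
    mean = sum(signal) / n
    # Remove the DC offset and apply a Hann window to limit spectral leakage.
    windowed = [
        (x - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(signal)
    ]
    spectrum = fft(windowed)
    # Keep only the positive-frequency bins that fall inside the band (assumed limits).
    bins = [k for k in range(1, n // 2) if low_hz <= k * fs / n <= high_hz]
    peak = max(bins, key=lambda k: abs(spectrum[k]))
    return 60.0 * peak * fs / n


def synthetic_chest_displacement(n, fs, breath_hz, heart_hz, seed=0):
    """Hypothetical stand-in for a radar-derived signal: two sinusoids plus noise."""
    rng = random.Random(seed)
    return [
        1.0 * math.sin(2 * math.pi * breath_hz * i / fs)    # breathing motion
        + 0.1 * math.sin(2 * math.pi * heart_hz * i / fs)   # much weaker heartbeat
        + rng.gauss(0.0, 0.02)                              # measurement noise
        for i in range(n)
    ]


fs = 20.0  # assumed sampling rate in Hz
signal = synthetic_chest_displacement(n=1024, fs=fs, breath_hz=0.25, heart_hz=1.2)
breathing_per_min = dominant_rate_per_minute(signal, fs, 0.1, 0.5)
heart_per_min = dominant_rate_per_minute(signal, fs, 0.8, 2.0)
print(f"breathing: {breathing_per_min:.1f}/min, heart: {heart_per_min:.1f}/min")
```

Searching each band separately keeps the much stronger breathing component from masking the heartbeat peak. With 1024 samples at 20 Hz, the frequency resolution is roughly 1.2 cycles per minute, so the estimates are only accurate to about that granularity.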
Industrial Applicability |
Recommendation systems and emotion recognition services have massive market potential across multimedia platforms, advertising, film, and the gaming and entertainment industries. Industry leaders are developing AI technologies for continuous business expansion. Disney analyzes audience reactions to movie content, and Snap detects bored user faces in order to refresh multimedia content. Reported market growth since 2019 amounts to 3.58 billion, with an estimated 26.12% growth this year. Since our recommendation system, which integrates emotional and behavioral reactions, can augment current applications for both content creators and consumers, the multimedia industry can leverage our proposed solution to pursue substantial future market opportunities. |