Advancements in Computer Vision Emotion Recognition: Benchmark Results Across Multiple Evaluation Scenarios

Three task-oriented visual emotion recognition models optimized for real-world environments

Key Takeaways

· Multi-scenario emotion recognition capability
MinsightAI developed three visual emotion recognition models optimized for baseline emotion detection, aggression-related emotion detection, and stress detection tasks.

· Stable performance in simulated real-world scenarios
The models demonstrate consistent performance across both close-range and high-angle monitoring environments.

· Strong generalization across datasets
Evaluation on public datasets shows stable performance in both accuracy and F1 metrics.

Abstract

Emotion recognition is a core research area within Affective Computing and a key enabling technology for natural human–computer interaction.

With advances in Computer Vision (CV) and deep learning, emotion recognition models have achieved significant improvements in both accuracy and scalability. However, real-world environments present persistent challenges, including:

· complex lighting conditions

· camera angle variations

· individual differences in emotional expression

This article presents MinsightAI’s latest research progress in visual emotion recognition, including three specialized CV emotion recognition models optimized for different tasks.

The models are evaluated using simulated operational datasets and public benchmark datasets to assess their robustness and generalization capability.

Technical Background: Emotion Recognition in Real-World Environments

In practical deployments, emotion recognition systems must operate across diverse camera setups and application scenarios.

Close-range scenarios

Examples include:

· customer service quality inspection

· pre-employment psychological screening

· mental health evaluation

In these cases, facial details are clearer, but emotional changes may be subtle.

High-angle monitoring scenarios

Examples include:

· campus security monitoring

· public safety surveillance

Here, faces may appear smaller due to camera distance and may be affected by resolution, lighting, or viewing angle.

To evaluate model performance in these environments, MinsightAI designed a systematic evaluation framework based on simulated operational datasets and public benchmarks.

Model Architecture

The research includes three specialized visual emotion recognition models:

· Baseline Emotion Recognition Model

· Aggression Emotion Recognition Model

· Stress Emotion Recognition Model

Each model is optimized for specific operational requirements.
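One common way to organize such a family of task-specific classifiers is a shared feature extractor with a lightweight head per task. The article does not state whether the three MinsightAI models actually share weights; the sketch below is a hypothetical illustration, with all names, label sets, and dimensions assumed.

```python
# Hypothetical sketch: a shared "backbone" feature extractor with one
# classification head per task. All names, label sets, and dimensions
# are illustrative assumptions, not MinsightAI's actual architecture.
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a CNN / vision-transformer backbone: one projection matrix.
W_BACKBONE = rng.standard_normal((64 * 64, 128)) * 0.01

def backbone(image):
    """Flatten a 64x64 face crop and project it to a 128-d feature vector."""
    return image.reshape(-1) @ W_BACKBONE

HEADS = {
    # task name -> (head weights, class labels) -- illustrative only
    "baseline":   (rng.standard_normal((128, 7)) * 0.01,
                   ["neutral", "happy", "sad", "angry",
                    "fear", "disgust", "surprise"]),
    "aggression": (rng.standard_normal((128, 2)) * 0.01,
                   ["non-aggressive", "aggressive"]),
    "stress":     (rng.standard_normal((128, 2)) * 0.01,
                   ["calm", "stressed"]),
}

def predict(image, task):
    """Run the shared backbone, then the task-specific head."""
    features = backbone(image)
    weights, labels = HEADS[task]
    return labels[int(np.argmax(features @ weights))]

frame = rng.random((64, 64))  # stand-in grayscale face crop
print({task: predict(frame, task) for task in HEADS})
```

A shared backbone keeps per-task cost low, but fully separate models (as the article's wording suggests) allow each task to be optimized independently.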

Evaluation Methodology

Simulated Operational Dataset

Due to privacy and security considerations, the evaluation uses a simulated operational dataset designed to approximate real-world environments while preserving data safety.

The dataset includes variations in:

· age groups

· lighting conditions

· camera angles

· emotion intensity levels

This design allows realistic evaluation without exposing sensitive data.
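As a rough illustration of how two of the listed variation axes could be simulated, the sketch below dims an image to mimic poor lighting and downsamples it to mimic the smaller, lower-resolution faces seen by elevated cameras. The transform choices and parameter values are assumptions; the article does not describe MinsightAI's actual data pipeline.

```python
# Illustrative sketch of simulating two variation axes from the text:
# lighting conditions and high-angle (low-resolution) capture.
# Parameter values are arbitrary assumptions for demonstration.
import numpy as np

rng = np.random.default_rng(0)

def vary_lighting(img, gain):
    """Scale pixel intensities to mimic under- or over-exposure."""
    return np.clip(img * gain, 0, 255).astype(np.uint8)

def simulate_high_angle(img, factor=4):
    """Nearest-neighbour downsample then upsample, mimicking the
    detail loss of a small face seen from an elevated camera."""
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

face = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in image
dim = vary_lighting(face, gain=0.4)   # low-light variant
coarse = simulate_high_angle(face)    # high-angle, low-detail variant
print(dim.shape, coarse.shape)
```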

Evaluation Metrics

Two primary metrics are used:

Accuracy

Measures the overall percentage of correct predictions.

F1 Score

The harmonic mean of precision and recall, providing a balanced evaluation of model performance.

Higher F1 values generally indicate a better balance between false positives and false negatives.
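The two metrics above can be computed directly from predictions. The snippet below is a minimal pure-Python sketch for a binary labelling task; the label names and example predictions are illustrative, not taken from the study.

```python
# Minimal implementations of the two evaluation metrics described above.
# Labels and predictions are illustrative examples.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive="stress"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

y_true = ["stress", "calm", "stress", "stress", "calm", "calm"]
y_pred = ["stress", "calm", "calm", "stress", "calm", "stress"]
print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # 0.67
print(f"F1       = {f1_score(y_true, y_pred):.2f}")  # 0.67
```

Because F1 penalizes both false positives (precision) and false negatives (recall), it is the more informative metric when the classes are imbalanced, as negative emotional states typically are in monitoring data.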

Experimental Results

Baseline Emotion Recognition

On the simulated operational dataset:

· Close-range accuracy: 88%

· High-angle monitoring accuracy: 84%

The model demonstrates stable performance, particularly in recognizing negative emotional states under complex conditions.

Aggression Emotion Recognition

This model focuses on detecting emotions associated with potential aggressive behavior.

Performance results include:

· Close-range accuracy: 97%

· F1 Score: 94%

The model maintains strong detection capability even in high-angle monitoring scenarios.

Stress Emotion Recognition

Performance metrics include:

· Close-range accuracy: 95%

· F1 Score: 94%

· High-angle monitoring accuracy: 90%

The model maintains a good balance between precision and recall, making it suitable for psychological screening and stress monitoring.

Public Dataset Evaluation

The models were also tested on widely used public datasets, including:

· RAF-DB

· DFEW

Evaluation results show:

· stable accuracy performance on RAF-DB

· strong F1 performance on DFEW, particularly for negative emotion detection

These results suggest good cross-dataset generalization capability.

Technical Characteristics

Across multiple evaluations, the MinsightAI CV emotion recognition models demonstrate several strengths.

Scenario Adaptability

Models are optimized for both close-range and high-angle camera environments.

Balanced Performance

The models maintain a strong balance between precision and recall, reducing both false positives and false negatives.

Generalization Ability

Public dataset evaluation confirms stable performance across different data distributions.

Efficient Deployment

Low computational cost and fast inference make the models suitable for real-time applications.
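Real-time suitability can be checked by measuring per-frame inference latency and converting it to frames per second. The sketch below uses a stand-in single-matrix-multiply "model" purely to show the measurement pattern; it is not MinsightAI's network, and the dimensions are assumptions.

```python
# Sketch of a per-frame latency / FPS measurement. The dummy model is a
# placeholder (one matrix multiply), not the actual emotion recognizer.
import time
import numpy as np

W = np.random.default_rng(1).standard_normal((64 * 64, 7))

def dummy_infer(frame):
    """Placeholder 'model': project a face crop onto 7 emotion logits."""
    return int(np.argmax(frame.reshape(-1) @ W))

frames = [np.random.rand(64, 64) for _ in range(100)]
start = time.perf_counter()
for f in frames:
    dummy_infer(f)
elapsed = time.perf_counter() - start
fps = len(frames) / elapsed
print(f"mean latency: {1000 * elapsed / len(frames):.3f} ms (~{fps:.0f} FPS)")
```

For a real deployment, the same loop would wrap the actual model's forward pass, and the resulting FPS would be compared against the camera's frame rate.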

Application Significance

As emotion recognition technologies mature, their potential applications continue to expand, including:

· public and campus safety monitoring

· customer service quality management

· mental health screening

· workplace stress monitoring

Reliable emotion recognition can provide valuable auxiliary insights for decision-support systems.

Conclusion

Emotion recognition technology is gradually transitioning from laboratory research to real-world deployment.

Through continuous optimization of algorithms and datasets, visual emotion recognition models are becoming increasingly capable of operating in complex real-world environments.

MinsightAI will continue advancing multimodal affective computing technologies, exploring higher accuracy models with stronger generalization ability to enable broader real-world applications.