Research Overview
Current Research
The core question driving my research is: How can we make AI models more fair?
This overarching question expands into several sub-questions:
- What does “fair” mean in the context of AI models?
- What happens when a model is unfair, and what causes such unfair performance?
- How can we improve the fairness of AI models?
Although fairness and explainability are often treated as separate topics in responsible AI, I have found them to be closely interconnected. A key insight from my research is that without correctly reasoning about the source of a model's bias, mitigation efforts are unlikely to succeed. Explaining the causes of unfair performance therefore becomes a crucial step toward improving fairness in AI models.
My current research can be categorized into the following three sub-topics:
Bias reasoning
Understanding and diagnosing AI model behavior to improve robustness from a fairness perspective. In particular, we investigated biased performance between genders in automatic diagnosis from X-rays and identified the source of bias by systematically ruling out potential causes, including underrepresentation, physiological differences, and shortcut learning (Weng et al. 2023). We found that the bias arises from shortcut learning: the model relies on the presence of chest drains rather than on actual disease features, and because chest drain prevalence differs across genders, this shortcut leads to biased performance. We subsequently implemented a targeted mitigation strategy to address this issue (Olesen et al. 2024).
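To give a flavour of this step-by-step reasoning, the minimal sketch below stratifies model performance by gender and by a suspected shortcut feature. The file name, column names, and annotations are hypothetical placeholders, not the actual analysis code from the paper.

```python
# Hedged sketch: stratified performance analysis for bias reasoning.
# All file and column names below are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("predictions.csv")  # hypothetical per-image predictions with metadata

# 1) Check the overall performance gap between genders.
for sex, group in df.groupby("sex"):
    print(sex, "AUC:", round(roc_auc_score(group["label"], group["score"]), 3))

# 2) Stratify by a suspected shortcut feature (chest drain annotation).
#    If the gender gap shrinks within each stratum while drain prevalence
#    differs by gender, shortcut learning on the drain is a plausible cause.
for (sex, drain), group in df.groupby(["sex", "chest_drain"]):
    if group["label"].nunique() == 2:  # AUC is undefined with a single class
        print(sex, "drain" if drain else "no drain",
              "AUC:", round(roc_auc_score(group["label"], group["score"]), 3),
              "n =", len(group))
```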
Revealing and detecting shortcut learning with explainability methods
Since step-by-step reasoning to identify bias sources is time-consuming, we first explored an automatic pipeline that uses generated counterfactual visualizations to detect shortcut learning (Weng et al. 2024). This idea has since expanded into a broader line of work on dataset discovery through clustering-based methods (ongoing and unpublished). We also investigated shortcut learning in segmentation tasks (Lin et al. 2024), a problem that has been largely overlooked by the community, and demonstrated simple yet effective mitigation strategies. These findings further reinforce that identifying the cause of bias enables much more efficient bias mitigation.
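As a rough illustration of what clustering-based dataset discovery can look like, the sketch below clusters image embeddings and checks how a suspected shortcut feature distributes across clusters. The embedding files, cluster count, and annotations are hypothetical assumptions, not the ongoing pipeline itself.

```python
# Hedged sketch: clustering-based dataset discovery for candidate shortcuts.
# File names and annotations below are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

embeddings = np.load("image_embeddings.npy")   # per-image feature vectors (hypothetical)
has_drain = np.load("chest_drain_flags.npy")   # 0/1 shortcut annotations (hypothetical)

# Cluster images in embedding space; clusters dominated by the suspected
# feature suggest the representation is organised around it.
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(normalize(embeddings))

for c in range(8):
    idx = clusters == c
    print(f"cluster {c}: n={idx.sum():4d}, chest-drain rate={has_drain[idx].mean():.2f}")
```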
Explainability in diffusion-based generative models
Research on explainability and bias assessment in generative AI remains limited. I am therefore exploring the interpretability of diffusion-based generative models, which could help detect shortcut learning by improving our understanding of their internal dynamics. To this end, we developed an intrinsically explainable diffusion-based generative model for image generation (Weng, Feragen, and Bigdeli 2025).
Future Research Directions
I am particularly interested in the following directions:
- When and why bias reasoning matters for bias mitigation.
- Methods for automatic model debugging and dataset discovery.
- Identifying sources of bias in multi-modal settings.
- Responsible AI for real-world applications and value alignment.