In this talk, I will present the approach we have taken, as AI experts, to contribute to the goal of social justice with AI. The goal is to progress towards fairer gender representation on screen, and to this end we consider the detection of complex forms of bias in visual media. I will describe the pipeline we follow, from problem spotting and task definition to data creation, model design, evaluation, and socio-technical implementation. I will present the MObyGaze dataset we have introduced to the multimedia community, and discuss the design of models that learn from this unique dataset by tying multimodal explanations to an interpretive task label. We will focus specifically on multimodal trustworthiness and on the design of explainable concept-based models suited to such interpretive tasks.