christophheilig

AI, Diversity, and Marginalized Perspectives

A couple of days ago, Google’s release of Gemini 1.5 Pro made headlines. While it comes with a standard 128,000-token context window, a limited group of developers and enterprise customers can already try it with a context window of up to 1 million tokens. That this might indeed be “revolutionary” (an adjective used far too much over the last 1.5 years, to be sure) is indicated by Google’s report on the LLM’s performance in “needle-in-a-haystack” tests. Now, I reserve judgement for the moment, because Google has not always been reliable in the past when assessing the capabilities of its own AI systems. But if these claims hold up, this would indeed open up a wide field of potential applications of LLMs in everyday life that would be highly commercially viable. Just think about the last time you had a fight with your spouse about who said what earlier in the day – with Gemini, this will no longer be a problem. You could just record your entire day of interactions and then ask “who said x?!” and get a quick and reliable response.
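To make concrete what such a test involves, here is a minimal Python sketch of the idea: hide one unique fact (the “needle”) inside a long stretch of filler text and check whether the model’s answer recovers it. The filler text, the needle, and the helper names are all invented for illustration; Google’s actual evaluation is, of course, far more elaborate.

```python
# Minimal needle-in-a-haystack sketch. All strings are illustrative.

def build_haystack(needle: str, depth: float, n_filler: int = 1000) -> str:
    """Insert `needle` at relative position `depth` (0.0-1.0) in filler text."""
    filler = ["The sky was clear and the day was unremarkable."] * n_filler
    filler.insert(int(depth * n_filler), needle)
    return " ".join(filler)

def recovered(answer: str, expected: str) -> bool:
    """Did the model's answer contain the hidden fact?"""
    return expected.lower() in answer.lower()

needle = "The secret ingredient in the pie is cardamom."
prompt = build_haystack(needle, depth=0.5)
# A real harness would now send `prompt` plus the question
# "What is the secret ingredient in the pie?" to the model under test,
# score the reply with recovered(model_answer, "cardamom"), and repeat
# across many depths and context lengths.
```

The interesting result in Google’s report is precisely that retrieval stays reliable even when the needle sits deep inside an extremely long context.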

The convenience that will come with such applications actually makes me quite worried – once they are implemented and we get used to the benefits, I don’t think we will have meaningful discussions about privacy and related issues any more. And this is why I think we need to creatively imagine our potential futures right now – to prepare ourselves for the realities that are approaching. Journalists who simply parrot Sam Altman’s press releases won’t get us there, nor will politicians, who are incredibly slow and often miss the mark. Scientists, too, face the problem that our apparatus is very slow – the acquisition of third-party funding takes years, not to speak of the limited success research results have in reaching the public anyway. And this is why I think creative minds are challenged to step up to the plate now and help us simulate the futures we want to avoid and the futures we want to chase!

Anyway, these fundamental questions were unfortunately overshadowed by Google’s significant misstep last week in implementing a sensible diversity policy for the generation of pictures of people in Gemini. I think by now you have all seen the images of black Nazi leaders and the like; they are all over the place. And indeed, I think the outrage was entirely justified – though, in my opinion, for reasons different from those I’ve seen on social media. I do not think that a marginalization of white dudes is an imminent danger to our Western societies. It’s not that I need to see even more white males to feel comfortable, being one myself. My problem with this diversity policy is that it actually erases diversity. For against the backdrop of this visual absurdity, historically accurate visual representations of marginalized figures and groups must seem like a joke! Just give us a dark-skinned Jesus and Asian railroad workers. Help us recognize the biases that make us overlook voices and perspectives that have been pushed to the sidelines all throughout history – and don’t give people reasons to dismiss such important challenges as mere virtue signaling and excessive wokeness.

And even more generally, I think that the big tech companies haven’t really thought through how complicated their well-meant diversity policies actually are. Take, for example, Jack Krawczyk, who stated in a tweet that Google is working on fixing the problem of “inaccuracies in some historical image generation depictions.” However, he also insists on the general principle to “design our image generation capabilities to reflect our global user base, and we take representation and bias seriously.” Hence, Google “will continue to do this for open ended prompts (images of a person walking a dog are universal!).” I wonder, to begin with, how a company as big as Google couldn’t see the problem with black Nazi leaders coming. To me, this seems indicative of a more fundamental naivety. And this talk about “open ended prompts” seems to confirm it. There is, of course, no such thing as a culturally neutral human activity! Depictions, thus, will always presuppose certain perspectives and prioritize certain conceptualizations over others. Just take the example of dog-walking itself. As a regular runner who has been confronted by aggressive dogs and irresponsible dog-owners a couple of times (I can’t even keep my own viewpoint out of my description of these occurrences, which of course proves my point), I have a very specific take on such a scene. And this does not even take into account fundamental social parameters such as religion (with dogs, for example, being generally viewed as unclean in Islam). Any attempt to do justice to the global diversity of users will thus inevitably require selecting very specific parameters, and values for those parameters, to prioritize – which will inevitably result in the perception of discrimination. (Thankfully, dogs can’t talk. Otherwise, even they would probably object, arguing that the conceptualization of people “walking dogs” is in itself discriminatory, given that it downplays their own agency.)
My personal opinion, therefore, is that it would be best for multimodal LLMs to faithfully represent visually the biases they encounter in their training material – and then to encourage, through dialogue, reflection on images that would otherwise appear familiar to us.

But be that as it may, I wanted to use this opportunity to share with you a way in which LLMs, in my view, can indeed be used as a tool to achieve the stimulating effect of “defamiliarization.” As part of our project on narrative perspectives at the University of Munich, I recently customized a GPT that re-tells biblical stories that users copy-paste into the chat – from a different perspective. More specifically, it is instructed to look out for narrative characters in the scene (either explicitly mentioned or deducible) whom the original storyteller has not employed as focalizers, that is, as the characters through whose perspective the events are portrayed. You can access the GPT “Marginalized Biblical Perspectives” here if you have an OpenAI subscription. You may need to push it for vividness and historical plausibility at points, but I found the initial experiments quite interesting for my own work. If you particularly like a re-narration, make sure to share it with us on Twitter/X, using the hashtag #marginalizedperspectives.
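In case you want to build something similar yourself, the core of such a customization is essentially a carefully worded instruction plus the pasted story. The following Python sketch is purely illustrative – the prompt wording and the helper function are my own assumptions, not the actual configuration of “Marginalized Biblical Perspectives”:

```python
# Hypothetical sketch of a perspective-shifting instruction for a
# chat-completion-style API. The wording below is illustrative only.
SYSTEM_PROMPT = (
    "You will receive a biblical narrative pasted by the user. "
    "First, list the characters present in the scene, whether explicitly "
    "mentioned or deducible from context. Second, identify which of them "
    "the original narrator uses as focalizers, i.e., through whose "
    "perspective the events are portrayed. Third, retell the story through "
    "the eyes of a character who is NOT a focalizer, keeping the retelling "
    "vivid and historically plausible."
)

def build_messages(story: str) -> list[dict]:
    """Package a pasted story as a message list for a chat API call."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": story},
    ]
```

The resulting message list could then be passed to any chat-based model endpoint; the interesting work lies almost entirely in refining the instruction until the retellings stay faithful to the scene while genuinely shifting the focalization.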
