An exercise in frustration


There’s an anonymous Facebook post that’s been making the rounds, in which a studio art director tried to hire AI prompters to make art, only to discover that they were completely unable to carry out minor revision requests. Asked to remove a person from a shot or fix perspective errors, the prompters would return completely different art instead, or art with other weird changes that didn’t fit the brief. I completely believe that trying to revise AI-generated art really is that frustrating.
For example, I recently asked ChatGPT4:
Please generate an image of a tyrannosaurus inside a box. The box should be closed, and the tyrannosaurus should not be visible at all.
I got this completely reasonable response:
It sounds like you’re asking for an image that implies the presence of a tyrannosaurus without actually showing it, since it’s completely enclosed within a box. If the tyrannosaurus is not visible, what kind of details or themes would you like to emphasize about the box or the setting to suggest the dinosaur’s presence?
I confirmed:
No tyrannosaurus visible, just some air holes for its comfort.
And ChatGPT (using DALL-E3 for the image generation) gave me this:

It’s clearly not quite there – it’s a weird box, and the symbols on the side are particularly confusing. Are those squid? A perching flycatcher? Rolling pins?
ChatGPT has a feature where you can erase some parts of the image and submit more text describing how it should change those parts. So, I blocked out the line of symbols and asked:
Replace these symbols with a silhouette of a tyrannosaurus and the word “DANGER”.
ChatGPT produced this:

Which is sort of what I asked for, but that doesn’t really look like a tyrannosaurus. I erased that part of the box again and asked for more revisions:
The T-rex should be facing the word “DANGER”. Its silhouette needs to have the powerful, heavy look of a T-rex, especially in the head and neck.
ChatGPT gave me:

The dinosaur is now facing the other way, but it’s still not facing the word “DANGER” and it’s still not a very good tyrannosaurus silhouette. I asked for revisions again.
Keep the heavy dark font, but go back to the red warning text color. The t-rex still needs to face toward the words, not away from it, and its lower jaw is too heavy.
ChatGPT responds with:

Which… no. That does not count as facing toward the text, the lower jaw is still not right, and does the word still say “danger” or more like “dandier”?
The T-rex needs to be beside the text, not on top of it. And what’s up with the extra spiky hump on its shoulders? Its lower jaw is still much too heavy.
ChatGPT responded with this:

My response:
… that’s perfect. Now can you keep everything – textures, lighting, materials – the same but make the box appear much larger? Right now it looks like it’s only knee height. It needs to be large enough to hold a T-rex.
ChatGPT’s revision:

It seems like the more revisions I ask for, the worse the result becomes. If I allow it to revise the entire image, the whole process starts over.
Here’s a similar revision process, this time attempting to revise my way toward prettier pies.
Me: Please generate an image of a giraffe entering the great british bakeoff.

Revision 1: I erased the counter and asked for the giraffe to be making a cherry pie with intricate latticework.

Below is revision 5, by which time it has become evident that the more I ask for intricate latticework, the sloppier it gets; the more I ask for a deep dish pie, the thinner the pie gets; and the more I beg for a single pie, the more pies I get.

And when I don’t restrict my changes to a specific part of the image, I get much better results but also a huge reset. For the image below I asked ChatGPT/DALL-E3 to keep each detail exactly the same but to make it so that we can see it’s raining outside the tent. Not only did it completely change the image, but it’s not even raining.

In this image I asked for a snake making a painting of a bear, then erased the painting and asked for the bear to be standing on all fours. The revision is much worse.


Image description: On the left, a rainbow-colored snake is coiled in front of an easel. (The snake’s coil splits in two so I guess it’s a fork-tailed snake). On the easel is a painting of a seated bear. It’s fairly realistic. On the right, the bear has been replaced by a smaller, much worse, replica of the snake. There is no bear.
Here’s another example, in which I asked for a deer in a grocery store, and then asked for the deer to be a fawn instead. By revision 5 (trying to give the fawn spots, trying to fix the shadows that were making it appear to hover), the quality of both the deer and the grocery store background has deteriorated.


Image description: In the original image on the left, the deer looks a bit like a plastic mannequin, but fits with the rest of the grocery store scene in its lighting and shading. In the revised image on the right, the deer is now a fawn with strange flat shading, superimposed on the scene rather than standing in it. Its eyes are strangely liquid and 3D, but the rest of the fawn is staring straight on at the camera with no apparent depth. Where the grocery store aisle background has been revised, the shelves have lost all detail and resemblance to a grocery store, and the ceiling light fixture now descends into the floor.
Here’s a one-shot, in which I asked ChatGPT/DALL-E3 to generate a damaged fresco of Jesus, and then restore the fresco to as good as new.



Image description: Image on the left appears to be a damaged fresco of Jesus (with telltale long fingers that are connected to both hands). Image in the middle is the same except for Jesus’s face and halo, which no longer appear damaged. However, Jesus’s face is now weirdly smooth and shiny and doesn’t fit with the rest of the painting any more, and his eyes are weird blue and black spirals. Image on the right has Jesus’s face in a completely different, almost airbrushed, style, and his eyes don’t remotely match any more.
So, every time AI is asked to revise an image, it either starts over or makes it more and more of a disaster. People who work with AI-generated imagery have to adapt their creative vision to what comes out of the system – or go in with a mentality that anything that fits the brief is good enough.
There’s also the fact that the image-generating models directly compete with artists whose work was used to train these models without permission or compensation. And the fact that training and running the models has a large environmental footprint. AI-generated imagery has become a tip-off that an advertisement, a search result, or a research paper is a scam.
I’m not surprised that there are some places looking for cheap filler images that don’t mind the problems with AI-generated imagery. But for everyone else I think it’s quickly becoming clear that you need a real artist, not a knockoff.