I'm no image gen expert but these prompts are downright terrible even by my standards.
Are you really complaining that ", from the British Museum." leads to a painting in the actual British Museum? Just remove the sentence, and you'll be fine. Now good luck trying to make Midjourney place the image at the museum!
I'm a paying MJ user and am impressed by Nano Banana. They're different models. They each serve their purpose.
This analysis is just noise. Yawn.
Ironically, even an LLM with its fake reasoning capabilities can point out the issue with the prompts if you ask it to critique this article.
It is interesting what the nbp model takes away from the prompt, though
Eg instead of focusing on the artist, it focuses on the location
This makes sense! I imagine it was trained in some sort of RLVR-like way where you give it a prompt and then interrogate "does this image ..." (where each question examines a different aspect of the prompt)
It's obviously an incredible model. I think there's a limit to how useful another article praising it is in contrast with one expressing frustration
I would also welcome someone writing a short takedown where they fix the prompts and get better-than-2022 results from nbp
> I would also welcome someone writing a short takedown where they fix the prompts and get better-than-2022 results from nbp
NBP (and the new ChatGPT generator) are integrated with LLMs to various degrees, so it seems like the obvious starting point is a reverse approach: ask them to describe the old images that have the esthetics Fernando Borretti likes, and start generating from those prompts. If you can recover the old images, then it was just a prompting issue. ("Sampling can show the presence of knowledge but not the absence.") If you can't even with their own 'native' descriptions, then that points to mode-collapse (especially all of the 'esthetic tuning' like DPO everyone does now) as being the biggest problem.
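For what it's worth, the round-trip check described above could be sketched roughly like this. Everything here is hypothetical: `describe`, `generate`, and `similarity` are placeholder callables you would have to supply yourself (they are not any real NBP or ChatGPT API), and the `threshold` of 0.8 is an arbitrary assumption.

```python
from typing import Callable, Sequence


def round_trip_recovery(
    old_images: Sequence[str],
    describe: Callable[[str], str],           # hypothetical: image -> model's own description
    generate: Callable[[str], str],           # hypothetical: prompt -> new image
    similarity: Callable[[str, str], float],  # hypothetical: score in [0, 1]
    threshold: float = 0.8,                   # assumed cutoff, not from the thread
) -> float:
    """Return the fraction of old images recovered from the model's own descriptions."""
    if not old_images:
        raise ValueError("need at least one image")
    recovered = 0
    for image in old_images:
        prompt = describe(image)       # ask the model for its 'native' description
        candidate = generate(prompt)   # regenerate from that description
        if similarity(image, candidate) >= threshold:
            recovered += 1
    return recovered / len(old_images)
```

A high fraction would suggest it was mostly a prompting issue; a low one, even with the model's own descriptions, would be more consistent with the mode-collapse explanation. It can't prove absence either way, per the sampling caveat.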
These sorts of prompts used to be quite important when DALL-E was new. I do feel like a lot of the article is just that prompts should be written differently though I think there’s some truth in the idea that nanobanana feels less artistic in some ways.
The author is using special prompts exploiting flaws of the old models, and doesn't like that new models interpret the hacks literally instead.
The new models have prompt adherence precise enough to distinguish what "British Museum" or "auction at Christie's" is from the art itself, instead of blending a bag of words together into a single vector and implicitly copying all of the features of all works containing "museum" or "ArtStation" in their description.
The prompts bothered me a lot, too. I don't do a lot of work with AI, but
> A painting sold at Sotheby's
and
> A painting in the style of something that would be sold at Sotheby's
convey very different meaning (to me).
Eno applies:
> It's the sound of failure: so much modern art is the sound of things going out of control, of a medium pushing to its limits and breaking apart. The distorted guitar sound is the sound of something too loud for the medium supposed to carry it. The blues singer with the cracked voice is the sound of an emotional cry too powerful for the throat that releases it. The excitement of grainy film, of bleached-out black and white, is the excitement of witnessing events too momentous for the medium assigned to record them.
And
> "By the time a whole technology exists for something it probably isn't the most interesting thing to be doing."
Where did you get this from? Searching for it, in a weird irony I guess, just leads me back to this post.
I recognize it as a quote from A Year With Swollen Appendices, which is a great read even if you aren't an Eno fan (although I am, which admittedly makes me biased :P)
Years of refinement on the taste of people with no taste has produced a model with no taste. Crazy
it's not shocking that this is the result of "art" from people that think complexity and accuracy are the only qualifying factors.
I tasted the model, but then I spit it right back out.
they put a special coating on the model to discourage this behavior
Ah, that explains it.
Lol, yeah.
It's ridiculous lol.
Midjourney is optimized for beautiful images, while Nano Banana is optimized for better prompt adherence and (more importantly) image editing. It should be obvious for anyone who spent 20 minutes trying out these models.
If your goal is to replace human designers with cheaper options[0], Nano Banana / ChatGPT is infinitely more useful than Midjourney. I'd argue Midjourney is completely useless except for social media clout or making concept art for experienced designers.
[0]: A hideous goal, I know. But we shouldn't sugarcoat it: this is what underpins the whole AI scheme now.
It is what has underpinned all of human progress towards automation. It isn't a bad thing. Every time we automate something the luddites cry out about the coming mass unemployment. It has never happened.
>Every time we automate something the luddites cry out about the coming mass unemployment. It has never happened
It has happened each and every time, it just hasn't affected you personally. Starting of course with the original luddites - they didn't complain out of some philosophical opposition to automation.
Each time in changes like this a huge number of people lost their jobs and took big hits in their quality of life. The "new jobs", when they arrive, arrive for others.
This includes the post 1990s switch to service and digital economies and outsourcing, which obliterated countless factory towns in the US - and those people didn't magically turn into coders and creatives. At best they took unemployment, big decreases in job prospects, shitty "gig" economy jobs, or, well, worse, including alcohol and opioids.
With AI it's even worse, since it has the capacity to replace jobs without adding new ones, or a tiny handful at a hugely smaller rate.
Strictly speaking outsourcing to cheap labour isn't automation.
It literally happens every single time - people DO lose jobs. They might get new jobs, but they definitely lose their old ones.
And not everyone gets new jobs, because usually the new job is fundamentally different and might not be compatible with the person or their original desire out of their employment.
The problem isn't so much automation, but that the benefits of automation are invariably reaped by a few tech CEOs. It's not society in general that benefits, it's that the rich get richer, and the rest of us barely scrape by. If wealth were evenly distributed, nobody would bat an eyelid at AI.
AI is not the problem. Late-stage capitalism and wealth disparity is.
It has happened. There is a related term we use which is related to a historical fact .. see https://en.wikipedia.org/wiki/Luddite
GP is saying mass unemployment caused by technology hasn't happened, not that the Luddites weren't a real historical group.
Correct, and I am saying the Luddites were a group of people that suffered mass unemployment following a technological change. Specifically, the luddites were a group of 19th century textile workers that were left out of work due to the introduction of automated machinery in the textile industry. In other words, they are a perfect example of what GP claims hasn’t happened.
A small group is not "mass unemployment" -- that's the point.
> In a British textile industry that employed a million people, the [Luddite] movement’s numbers never rose above a couple of thousand.
https://www.cam.ac.uk/research/news/rage-against-the-machine
The "never rose above a couple of thousand" small group refers to the number of activist Luddites. It doesn't refer to the people working in the textile industry in general - which was a big group, and which was heavily affected.
What other automations have been hyped to automate and replace so many different types of jobs at once?
Whether or not it comes to fruition, it's making large portions of society feel uneasy, and not just programmers, or artists, or teachers.
The steam engine, for example
Not finding a lot of sears and roebuck ads for steam engine driven girlfriends.
You’ve got the wrong catalog.
The promise is to automate the drudge work, freeing people to pursue their passions.
Like, you know... creating art.
But most work IS drudge work, and the automation causes new, different drudgery. Used to be you could dictate a letter and someone from the typing pool would clean it up, proof it, and send it. Now those same people get to write their own crappy email themselves.
Art will be created by AI - just like it already got its hands on graphic design, and game art, and vfx, and music.
It will leave not-yet-automatable drudge work to people instead.
I mean...
There's the concept, and then there's the painting.
AI slop from a generic prompt is not the same as "using AI to get my concept in physical form faster."
Imagine, for example, a one-man animated movie. But, like, with a huge amount of work put into good, artistic, key-frames; what would previously have been a manga. That's possible, soon, and I think that's huge and actual art.
> what would previously have been a manga
Completely out of touch to downplay the entire manga industry as "skill issue".
Akira Toriyama totally created Dragonball as a manga because he just wasn't good enough to make an animated movie!
Berserk is a book because Kentaro Miura just had skill issue!
Just imagine if Tolkien had had AI when he wanted to create the Lord of the Rings!
As if a medium only has artistic merit because sufficiently advanced technology just didn't exist yet. groooaaaaan
Except all the manufacturing jobs got shipped overseas and now those people are Walmart greeters or similar unskilled labor. Having a shit job isn’t unemployment but it’s not a huge step up
That isn't what happened. American jobs are more productive than ever. Americans are richer than ever. The modern luddites dramatically underestimate how bad the past was.
> Americans are richer than ever.
By what metric? One way is to look at the Gini coefficient - that’s worse than ever.
The bottom 20% has 2-3% of total net worth in the US. The middle 40% has seen its share decline from 36% in 1989 to 28% in 2020. The top 0.1% has seen its share of net worth double from 7% in 1989 to 14% today.
The subtle thing that net worth ignores of course is inflation from growth in costs, so actually it’s harder for most people than in 1989, unless you’re talking about the ease of buying a TV or phone. Technology is more available and cheaper than ever but food and medicine are more expensive than ever.
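To put the shares quoted above in relative terms, here is a quick back-of-the-envelope calculation. It only uses the percentages stated in the comment (36% to 28%, and 7% to 14%); the helper name `relative_change` is just for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change of a share, e.g. 0.5 means +50%."""
    return (after - before) / before


# Shares of total US net worth as quoted in the comment above.
middle_40 = relative_change(36, 28)  # middle 40%: 1989 -> 2020
top_01 = relative_change(7, 14)      # top 0.1%: 1989 -> today

print(f"middle 40%: {middle_40:+.1%}")  # middle 40%: -22.2%
print(f"top 0.1%:  {top_01:+.1%}")      # top 0.1%:  +100.0%
```

So on those numbers the middle 40% lost a bit over a fifth of its share while the top 0.1% doubled its share.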
> Every time we automate something the luddites cry out about the coming mass unemployment. It has never happened.
It has happened every single time.
While I don’t disagree with the author, these are simply two completely different tools with different use cases. Nano Banana Pro throws out fantastic images you can actually use in your marketing right away. It’s not an art tool - it’s a business tool
As long as the older tools still exist to make art, I don’t see what the problem is. Use NBP to make your marketing pics, MJv2 for your art
You’re definitely on to something, people wouldn’t criticize as much as they are otherwise, they’d ignore it.
I think the whole point is that in optimizing for instruction following and boring realism we’ve lost what could have been some unique artistic elements of a new medium, but anyway.
The author's prompts are fighting against what Nano Banana was optimized for. Saying "British Museum" to MJv2 worked because it blurred all images tagged with museums into the aesthetic. NBP interprets it literally: show me something IN a museum.
This isn't worse - it's different. MJv2 was a happy accident machine. NBP is a precision tool.
If you want the coarse aesthetic, prompt for it: "rough brushstrokes, visible canvas texture, unfinished edges, painterly, loose composition". NBP will give you exactly that because it actually understands what you're asking for.
The real lesson: we're in a transition period where prompting strategies that exploited old model quirks no longer work. That's fine - we just need to adapt our prompting to match what the model was designed to do.
Thanks ChatGPT. I’m wondering about the motivation to spam HN with LLM generated comments. Not the worst comments though.
Have to agree that it sounds GPT-generated. Why so many colons? And the incurable marketing-speak.
I don't think that comment is LLM generated. I would've certainly written it like that myself.
I love the inherent wonder and joy in this post around the original images.
I had similar feelings with art generation. The early midjourney was definitely impressionistic, and I just kind of like impressionism. It's cool how accurate these have become, but they also feel closer to uncanny or boring.
Maybe it's better that this author is using LLMs because they would be an immensely frustrating client for an artist. Asks for futurism: complains about getting it. Wants bright colors: refuses to ask. Parts of the request are supposed to be evocative and parts are supposed to be literal, who knows which.
Why does anyone serious about art want to make art with AI?
A large part of the magic of art is the human choices that go into it.
Prompting an AI and then filtering the results is a "human choice".
Two choices - one of prompt, one of the result, versus hundreds or thousands from the subject and composition, through the medium, to every single brushstroke, where one may have a significant meaning. To be improved upon when your skill improves as well.
This is more akin to going to a supermarket and buying peanut butter (prompt: peanut butter, filter by brand/price/taste). The product may be tasty and enjoyable but I am not impressed by that.
I don’t see splashes of primary color as more artistic. Anyway, what if you just ask it “more coarse”? I see impressive depth in the latest outputs, but as with all technically proficient performers, you might just have to consciously scale it back.
The problem is not in the image models but rather in the training data and its context. "British museum" for MJ is the image source, "British museum" is the setting for Nano Banana.
The author claims the old models are better at creating art than the new ones. I disagree; art requires consciousness and intent while this type of model is capable of neither.
I define art as something that evokes an emotion or feeling. I’ve seen people wax poetic about the “meaning” of an image only to find out that the image was created synthetically.
Were those “feelings” not authentic?
If I see a cloud in the shape of my childhood dog and start to cry, is the cloud art?
Yes. The Earth and its formations are art. I disagree that art requires consciousness and intent, but those admittedly do improve its value [to me]. (For reference, I value AI content/art poorly and avoid it)
Everything is art, fantastic. I see nothing wrong with this definition.
We have at least established that very boring pieces, such as Andy Warhol's Empire, Kazimir Malevich's White on White, and John Cage's As Slow As Possible, are not art.
Bad code is still code. A painting of code is not code.
I think you're saying bad art is still art, but I'm unsure what to do with the second sentence. I'm toying with "an encoding of art is not art", which might mean that art has to be available to an audience.
I don't think it is about the feelings or emotions evoked in the observer. At least not in that generality. It only is, if there is an intention in the creating process of the art, that aims at evoking the emotions or feelings. Otherwise going by the more general definition, many everyday objects become art. Home becomes art. The way to the office becomes art, even if it completely sucks.
Is a car crash art?
A drawing/painting of a car crash certainly can be
https://www.etsy.com/listing/4329570102/crash-impact-car-can...
As can a photo of one (sorry, I don't have a good example of that).
And, both a camera and AI are an example of "using a tool to create an image of something". Both involve a creator to determine what picture is created; but the tool is central/crucial to the creation.
When I was about 12 a car crashed in my quiet street (somebody tried to drive it through a concrete fence), so the next day I sat in the street and did an ink drawing of the wreckage with a mapping pen nib. That was excellent art. Then I stole one of the gigantic suspension springs and took it home to use as a stool, which by some silly definitions was also an act of art. But this all evades the original question about whether the actual car crash is art for evoking feelings, or whether art in fact must involve pictures, or human communication, or what. It's one of the impossible definitions, along with "intelligence" and "freedom". I'm a fan of "I know it when I see it".
I would never argue that a painting of a car crash couldn’t be art. It’s funny you’re bringing up that a camera is a tool for creating art; I also hold photographic art in lower esteem than other kinds of visual art (though I still think some kinds of photography can be art).
At a certain point, we need to be realistic about the amount of effort involved in artistic creation. Here’s a thought experiment: someone puts two paintings in a photocopier and makes a single sheet of paper with both paintings. Did that person create art? They certainly had the vision to put those two specific paintings together, and they used a tool to create that vision in reality!
> Here’s a thought experiment: someone puts two paintings in a photocopier and makes a single sheet of paper with both paintings. Did that person create art?
Yeah, it gets really murky there. For that specific thought experiment, I would say it depends on if it's something that people will see and think about and talk about, etc. For example, a collection of pairs of images of people that were assassinated over the years and an image of their assassin would certainly get people talking (some in a good way, some bad).
When it comes to effort, I think that's only a factor, too; and not even necessarily a good one. There's art out there like
- Someone taped a banana to a wall (and included instructions for taping another banana to replace it)
- Someone (literally) threw a few cans of paint at a canvas and created something chaotic looking
Both of those things are "low effort" at first glance. But someone spent time thinking about it, and what they wanted to do, and what people might think of it. And, without a doubt, there's people that would refer to both as art.
It's going to be "creativity" (another hazy definition!) rather than effort, though. Photography, often said to be all about framing, seems very low effort. You might take one lucky snap. Then the effort can be claimed to be in years of getting ready to be lucky, which is a fair point, but that displaced effort isn't really in the specific photo. Besides, maybe you're a very happy photographer, loved every minute of learning your craft, and found it no effort at all, just really interesting.
Yeah, photography (editing aside) is about having taste and getting lucky. A good photographer can of course raise their odds of getting lucky, but still. There's some technique in there too, but that's really not all that complicated. That said, I think few things match a good photo. There's something about a photo subject being real that I find fascinating. A photo exhibition does not display the imagination of the photographers, but rather the incredible in the real world.
It does, however, display the photographer's ability to say "hey, you should see this" and be right about it.
Perhaps it has to be a more sophisticated emotion, such as feeling tired of a hackneyed definition.
If someone lies and convinces you that a loved one has died and you cry, were those feelings authentic?
Art that provokes emotion in a cheap or manipulative way is often, if not always, bad art.
I'm pretty sure people have created images via random physical processes, then selected the best ones, and people have called it "art." That's no different than cherry picking AI generated images that resonate. The only difference is the anti-generative AI crusade being spearheaded by gatekeepers who want to keep their technical skills scarce in their own interests.
I think one could still point out a little difference: Random physical processes usually do not involve mixing and matching millions of other people's works. Instead, something new in every aspect and its origin can emerge.
It feels like AI art is often just a version of: "I take all the things and mix them! You can't tell which original work that tree is taken from! Tiihiiihi!"
Where "tree" stands for any aspect of arbitrary size. The relationship is not that direct, of course, because all the works gen AI learns from kind of get mixed in the weights of edges in the ANN. Nevertheless, the output is still some kind of mix of the stuff it learned from, even if it is not necessarily recognizable as such any longer. It is in the nature of how these things work.
Good title!
Just fucking buy canvas, brushes and good quality oil paint. You need only five colours[1]. Cost you maybe 50-80 euros. And any mess you produce will give you more joy than any shot produced by any clanker brain. Keep at it for a few years, take evening classes, watch tutorials and you have taught yourself a skill. You can now travel to any major art museum across the world and have a discussion with masters through their works hanging on the wall.
And you will also see how fucking sad and inferior all these ai images are. Really, trust me, please. There is more to art than this. There is more to life.
Is some kind of MoE or routing (but for image models obviously), depending on the prompt ask, a possible solve?
The OP would likely prefer Disco Diffusion if they want their art to remain coarse. Modern models possess advanced spatial understanding and adhere strictly to prompts, whereas the OP is using unstructured inputs better suited for older models with CLIP or T5 encoders that lack that spatial awareness. These legacy prompting styles are incompatible with Gen3 models that utilize VLMs as text encoders. If the OP wants to explore modern architecture, they should use Flux.2 with a LoRA or perhaps a coarser model like Zit if they prefer to rely solely on text conditioning. Nano Banana Pro requires extremely long and distinctive prompting to achieve specific aesthetics. His blog post shows a lack of understanding and a lack of adaption to modern architecture which would be fine if it wasn't that dismissive.
Here is an image from NBP with an adapted prompt for Italian futurism: https://imgur.com/a/4pN0I0R
and for Kowloon:
Peanut butter. Agree.
Another word for coarse is impasto technique, where the paint is so thick the painting-knife or brush strokes are visible and leave a pronounced texture (e.g. Van Gogh, Rembrandt).
Another cool prompt could be specific painting techniques (e.g. pencil shading, glaze) as if you were training an actual artist in a specific technique.
Just asked sora for an impasto image of a coca cola bottle. But it still came out looking like a coca cola ad/AI art. Super glossy, slick, meaningless. It didn't look like paint. (And the logo wasn't impasto, which I thought was interesting - I guess that logo's utterly ingrained in the model, it's seen it so many times).
AI doesn’t make art. The OP is trying to fit the square peg of their intuitive understanding about the art creation process into the round hole of generating it via AI
Correct! The process and struggle of creation is a large part of what makes art art. Removing friction from the process makes something artless.
Yes, but: when I was young I used to love photorealism and hyperrealism, which is super-smooth-and-shiny art that conceals its process in order to awe simpletons. Then I bought an airbrush, and then true color computer graphics happened, and soon after that I began to appreciate brush strokes and the texture of pen marks and the idea of the personality of the artist's hand. But that doesn't mean the process-hiding stuff is non-art, or even bad art. What's wrong with creating an amazingly convincing illusion, wasn't that always the goal, historically? Also there are no prizes for effort, and if your artwork is only struggle, I don't want to see it. Unless you're really badass about it.
I really like Cory Doctorow’s description of why it feels empty, quote:
“Herein lies the problem with AI art. Just like with a law school letter of reference generated from three bullet points, the prompt given to an AI to produce creative writing or an image is the sum total of the communicative intent infused into the work. The prompter has a big, numinous, irreducible feeling and they want to infuse it into a work in order to materialize versions of that feeling in your mind and mine. When they deliver a single line's worth of description into the prompt box, then – by definition – that's the only part that carries any communicative freight.”
OK, but then there's the possibility of reestablishing the bandwidth by selecting the output. If the artist selects one AI image from hundreds, that's like photography, or collage, or "found sculpture" if you can dig it. Then we can do away with the need for hundreds of versions by saying that the artist selected this image from among all the assorted sights seen during the day to frame as art and present to the viewer, and that's just like picking a preferred version from among hundreds, and thus is just like crafting an image. Tenuously. (This falls apart because the selectivity of the selection isn't good enough, I guess. But the process - throwing away bad ideas as you go along - is just like drawing.)
art without will is like street vomit: it might be pretty but it's just lumps of old content arranged how you'd expect. less than food; more a waste than a triumph. and it always smells the same.
the street vomit photographer is offering a bit more art through his choices but I can already see he makes poor choices
Sort of. It’s like selecting from hundreds of versions of a letter of reference that word the same three bullet points slightly differently. It still feels empty to me, but I guess that’s personal.
I reckon it's not personal, and you and Doctorow are objectively correct, but the explanation isn't great.
Art that takes tremendous effort but looks effortless isn't negated by my comment. The process and struggle is still there.