This reminded me of YouTube videos that are one stock clip after another: the emotional moments didn't register as real, and there was no feeling of continuity as the story unfolded.
Amazing, especially for $500 - but this feels like Fiverr Pixar to me, even given the advancing state of the art.
I wonder if anyone has tried to replicate an AAA movie scene from a prompt; it would also be interesting to try a whole movie. I'd want to see the side by side, but the only problem is that since the movie might be in the training data, it might not make a good test.
That's what I've noticed: if you reference something real that exists in an LLM's training, it will cling onto that because it then has something credible to work from.
On the other hand, it is also challenging to accurately describe an AAA movie scene in any terms where the AI won't then connect the dots to a familiar scene from an AAA movie and incorporate those details.
... or refuse to do the work if it recognises the scene/movie/artist (etc).
It's way more than $500 when you account for the month of extreme editing by the author. You probably couldn't hire him for a week for $500, let alone half a month of overtime.
Joanna Stern at the Wall Street Journal presented a project earlier this year where she and a team of editors created a short film using AI tools - not exactly the same thing here but the results were very good. They had a bigger budget too.
You can see it here: https://www.youtube.com/watch?v=US2gO7UYEfY
In their case, they interspersed live actors with AI-generated imagery.
From YouTube, which I second:
"Pinned by @hashemalghailiofficialchannel @philipashane 4 days ago (edited)
I’ve been wondering when this day would come, when we’d see an AI film that was just a damn good film, without the distraction of AI blemishes. This is well written, well directed, well edited, just about everything is top notch. The “acting” isn’t stellar but nor is it bad. This is very impressive and a landmark achievement, kudos to you."
> without the distraction of AI blemishes
Maybe I'm just detail oriented, but "police" wasn't even spelled right on the officer's sparse uniform. That isn't even halfway into the movie, and by then I'd spotted dozens of other weird AI details.
Fair enough.-
I think the point still stands: not "no blemishes", but "story and execution good enough for the blemishes not to be distracting".-
I... don't. The story was fine, and the execution was understandable given the state of the tooling, but viewed as a film, not as a tech demo of what's achievable with modern AI tools, it's not great. Many of the voices have the same very noticeable robotic features, and the delivery, whether narration or diegetic dialogue, is monotonous; the "angry crowd" is almost the only place in the whole work where speaking voices appear affected by emotion, and even that feels off. The scenes have a consistent, very limited range of lengths and a very limited palette of simple continuous camera movements, consistently one per clip.
Even though the mockumentary format is an excellent choice for minimizing the impact of several of those problems, they are still pretty glaring there, even if less so than if you tried to make literally any other style of film with the same techniques.
It reminded me a lot of video game cut scenes. It's still hitting the uncanny valley a lot, e.g. voice acting recorded phrase by phrase with limited context and odd pacing better suited to the stage, and acting by puppets with somewhat inhuman movement.
I really appreciate your insightful look into this, and will again view the video with an eye to these issues you point out.-
PS. We might be a tad beyond "train arriving at the station" territory, at least - that much can be granted methinks.-
The point is that if you are looking for details at all, you will notice this was full of AI blemishes.
The common word "POLICE" misspelled with non-letters on an otherwise empty uniform was an obvious one, but so was the historical singer's face changing in just the first few cuts.
This short is an amazing achievement, and (at least to me) a very skillful and clever use of AI. But I don't think it's a good film, and that has nothing to do with AI blemishes. If it had been made shot-for-shot without AI, I probably wouldn't have watched to the end. If I had to put my finger on it, I'd say we spend way too long with a character (the only character) we never really get to know. The sort-of movie reel concept that keeps her at a distance could work, but I think it would need to be cut way down, maybe to half the time. Cutting the clone comeback (which doesn't really advance the plot) would save 4 or 5 minutes.
The film is about us not accepting a clone for the original. The massive irony is that the film is likely going to generate the same response from the commenters.
At a meta level it is also about LLMs/AI generated content too, the twist at the end makes that clear.
from the reddit post:
AI tools used to make this short film:
Image Generation: Whisk, Runway, Midjourney, Dreamina, Sora
Video Generation: FLOW & Veo 3, Dreamina, HIGGSFIELD, Kling AI
Voice Generation: ElevenLabs
Lip Sync: FLOW & Veo 3, Dreamina, HeyGen
Music Generation: Suno AI
Sound FX Generation: MMAudio, ElevenLabs
Prompt Optimization: ChatGPT
I'm curious if there's a limit to how good AI can get at movie making. I think getting past it will take revolutionary new algorithms/tech.
This video is a great example. Looks great, sounds great, but also looks like a really good amateur found a bunch of clips on a stock video site and edited them together, probably because stock video is a really plentiful source of learning data. The interviews look the best, but again, lots of interviews in the training data.
When you combine the skill it takes to generate good prompts with the lack of sufficient training data, I'll just say I don't think Christopher Nolan has anything to worry about just yet. Maybe Wes Anderson does, though.
It is impressive technically but I think the whole plot and story details are pretty bad.
The premise just doesn't work for me. A black-and-white-era, mega-popular white female jazz singer doesn't really make sense. Maybe a Judy Garland type would work, but she's singing in a style that I don't think makes sense, like someone making what they think jazz vocals should sound like without really listening to much jazz. Billie Holiday wasn't even that popular.
Likewise, the black and white part doesn't work for me, because you can tell it's just the same color clips desaturated, while real black and white would be shot on film and look like it.
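For what it's worth, the "desaturated color" look is easy to pin down: a simple desaturation filter just takes a weighted average of each pixel's RGB channels, so the footage keeps the tonal fingerprint of the original color shoot, unlike real film stock with its own spectral response and grain. A minimal per-pixel sketch (illustrative only, using the standard BT.601 luma weights):

```python
def desaturate(r: int, g: int, b: int) -> int:
    """Naive 'black and white' for one RGB pixel (each channel 0-255).

    This BT.601 luma-weighted average is essentially what a simple
    desaturation filter applies to every pixel of a clip. The weights
    reflect perceived brightness: green contributes most, blue least.
    """
    return int(0.299 * r + 0.587 * g + 0.114 * b)

# A pure-green pixel maps to a bright ~149 gray, not the naive mean of 85:
print(desaturate(0, 255, 0))  # 149
```

Real black-and-white negative stock responds to the scene's spectrum directly (often with colored lens filters to shape contrast), which is one reason desaturated digital color reads as "fake" monochrome to a trained eye.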
I think the AI stuff is actually pretty good; it's the direction/human creativity here that isn't. The sound design and music are pretty bad.
I am waiting to see what Aronofsky can do with these tools since the studios won't let him set 30 million dollars on fire again like with The Fountain.
I wouldn't be surprised if the video models were vastly undertrained compared to our text models. There's probably millions of hours of video we haven't used to train the video models yet.
Still seems like early days on this tech. We're nowhere near the limits.
Just a year ago we could only create the distorted video of Will Smith eating spaghetti. A year from now this is going to be even more flawless.
But what does flawless mean? How is this not flawless? I see very few "flaws" in this. But the comprehensiveness of the video training space is probably minuscule compared to photo and text.
I don't think Wes Anderson has anything to worry about either; his style isn't only panning shots in pastel colors.
The reason it looks like many joined clips is because long-form video generation is currently not possible. Most SOTA models only allow generating a few seconds at a time. Past that it becomes much harder for the model to maintain consistency; objects pop in and out of existence, physics errors are more likely, etc.
I think that these are all limitations that can be improved with scale and iterative improvements. Image and video generation models are not affected as much by the problems that plague LLMs, so it should be possible to improve them by brute force alone.
I'm frankly impressed with this short film. They managed to maintain the appearance of characters across scenes, the sound and lip syncing are well done, the music is great. I could see myself enjoying this type of content in a few years, especially if I can generate it myself.
> I’ll just say I don’t think Christopher Nolan has anything to worry about just yet.
The transition will happen gradually. We'll start seeing more and more generated video in mainstream movies, and traditional directors will get more comfortable with using it. I think we'll still need human directors for a long time to come. They will just use their talents in very different ways.
This is probably the worst it's ever going to be?
That's not incompatible with a ceiling. I'm not sure what point you're trying to make.