I got access to the preview, here's what it gave me for "A pelican riding a bicycle along a coastal path overlooking a harbor" - this video has all four versions shown:
https://static.simonwillison.net/static/2024/pelicans-on-bic...
Of the four, two were a pelican riding a bicycle (one of them wearing a weird sort of pelican bicycle helmet), one was a pelican just running along the road, and one was a pelican perched on a stationary bicycle.
All four were better than what I got from Sora: https://simonwillison.net/2024/Dec/9/sora/
There's another important contender in the space: the Hunyuan model from Tencent.
My company (Nim) is hosting the Hunyuan model, so here's a quick test (first attempt) at "pelican riding a bycicle" via Hunyuan on Nim: https://nim.video/explore/OGs4EM3MIpW8
I think it's as good as, if not better than, Sora/Veo.
> A whimsical pelican, adorned in oversized sunglasses and a vibrant, patterned scarf, gracefully balances on a vintage bicycle, its sleek feathers glistening in the sunlight. As it pedals joyfully down a scenic coastal path, colorful wildflowers sway gently in the breeze, and azure waves crash rhythmically against the shore. The pelican occasionally flaps its wings, adding a playful touch to its enchanting ride. In the distance, a serene sunset bathes the landscape in warm hues, while seagulls glide gracefully overhead, celebrating this delightful and lighthearted adventure of a pelican enjoying a carefree day on two wheels.
What does it produce for “A pelican riding a bicycle along a coastal path overlooking a harbor”?
Or, what do Sora and Veo produce for your verbose prompt?
If Sora is anything like Dall-e a prompt like "A pelican riding a bicycle along a coastal path overlooking a harbor" will be extended into something like the longer prompt behind the scenes. OpenAI has been augmenting image prompts from day 1.
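For illustration only, here is a toy sketch of what such a behind-the-scenes augmentation step might look like. The function name and the template are made up for this example; the real pipelines reportedly use an LLM rewrite and are not public.

```python
def augment_prompt(user_prompt: str) -> str:
    """Expand a terse user prompt into a richer scene description.

    A real system would call an LLM here; this template-based
    stand-in only illustrates the shape of the step.
    """
    embellishments = [
        "golden-hour lighting",
        "shallow depth of field",
        "gentle camera dolly",
    ]
    return f"{user_prompt}, {', '.join(embellishments)}, cinematic, highly detailed"

expanded = augment_prompt(
    "A pelican riding a bicycle along a coastal path overlooking a harbor"
)
print(expanded)
```

The user only ever sees the short prompt; the model sees the expanded one, which is why short and verbose prompts often produce similarly elaborate results.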
Hard to say about Sora, but the video you shared is most definitely worse than Veo's.
The pelican is doing some weird flying motion, motion blur is hiding a lack of detail, the bicycle is moving fast so the background is blurred, etc. I would even say Sora is better, because I like the slow motion and detail, but it did do something very non-physical.
Veo is clearly the best in this example. It has high detail but also feels the most physically grounded among the examples.
The prompt asks that it flaps its wings. So it's actually really impressive how closely it adheres (including the rest of the little details in the prompt, like the scarf). Definitely the best of the three, in my opinion.
Pretty good except the backwards body and the strange wing movement. The feeling of motion is fantastic though.
I was curious how it would perform with prompt enhancement turned off. Here's a single attempt (no regenerations etc.): https://www.youtube.com/watch?v=730cb2qozcM
If you'd like to replicate this, the sign-up process was very easy and I was able to run a single generation attempt without trouble. Maybe later, when I want to generate video for real, I'll use prompt enhancement. Without it, the video appears to have lost any notion of direction. Most image-generation models I'm aware of do prompt enhancement; I've seen it on Grok+Flow/Aurora and ChatGPT+DALL-E.
Prompt: A pelican riding a bicycle along a coastal path overlooking a harbor
Seed: 15185546
Resolution: 720×480

I mean, you didn’t SAY riding forwards…
I suppose if you reversed it, it would look OK-ish.
FYI your website shows me a static image on iOS 18.2 Safari. Strangely, the progress bar still appears to “loop,” but the bird isn’t moving at all.
Turning content blockers off does not make a difference.
Fwiw, it is finicky but the video played after a couple seconds (iOS 18.2 Safari).
Reddit says it is much better than Sora. Are you hosting the full version of Hunyuan? (Your video looks great.)
Hunyuan is also open source / source-available, unless you have 100M DAU.
Then there are Lightricks' LTX-1 model and Genmo's Mochi-1. Even the research model CogVideoX is making progress.
Open source video AI is just getting started, but it's off to a strong start.
Our limited tests show that yes, Hunyuan is comparable to, or better than, Sora on most prompts. A very promising model.
Is it still better if you copy his whole prompt instead of half of it?
I mean, the pelican's body is backwards...
Here's one of a penguin paragliding and it's surprisingly realistic https://x.com/Plinz/status/1868885955597549624
This is the first GenAI video to produce an "oh shit" reflex in me.
oh, shit!
As long as at least one option is exactly what you asked for, throwing variations at you that don't conform to 100% of your prompt seems like it could be useful, if it gives the model leeway to improve the output in other respects.
Here is my version of a pelican on a bicycle, made with HailuoAI:
His little bike helmet is adorable
The AI safety team was really proud of that one.
It's funny having looked forward to Sora for so long, only to see it superseded so soon after access to it was finally made public.
I am surprised that the top-right one still shows a cut and switch to a different scene. I would assume that's something that could be trivially filtered out of the training data, as those discontinuities don't seem to be useful either for these short six-second video segments or for building an understanding of the real world.
It looks much better than Sora, but still kind of in the uncanny valley.
This is the worst it will ever be…
That is surprisingly good. We are at a point where it seems good enough for at least B-roll content, replacing stock video clips.
Well yeah, if you look closely at the example videos on the site, one of them is not quite right either:
> Prompt: The sun rises slowly behind a perfectly plated breakfast scene. Thick, golden maple syrup pours in slow motion over a stack of fluffy pancakes, each one releasing a soft, warm steam cloud. A close-up of crispy bacon sizzles, sending tiny embers of golden grease into the air. [...]
In the video, the bacon is unceremoniously slapped onto the pancakes, while the prompt sounds like it was intended to be a separate shot, with the bacon still in the pan? Or, alternatively, everything described in the prompt should have been on the table at the same time?
So, yet again: AI produces impressive results, but it rarely does exactly what you wanted it to do...
Technically speaking, I'd say your expectation is definitely not laid out in the prompt, so anything goes. Believe me, I've had such requirements from users, and I, as a mere human programmer, am never quite sure what they actually want. So I take guesses, just like the AI (because simply asking doesn't get you very far; you always have to show something), and take it from there. In other words, if AI works like me, I can pack my stuff already.
This tech is cute but the only viable outcomes are going to be porn and mass produced slop that'll be uninteresting before it's even created. Why even bother?
There will be both of those things in abundance.
But I'm also seeing some genuinely creative uses of generative video - stuff I could argue has some genuine creative validity. I am loath to dismiss an entire technique because it is mostly used to create garbage.
We'll have to figure out how to solve the slop problem - it was already an issue before AI, so maybe this is just hastening the inevitable.
The real problem is that trust in legacy media hit rock bottom right as we enter the era where we need such trust the most. Soon enough, nothing you see on video can be believed, but (perhaps more importantly) nothing has to be believed either.
Comments like this one are so predictable and incredulous. As if the current state of the art is the final form of this technology. This is just getting started. Big facepalm.
Have you already noticed the trend of image search results for porn containing inferior AI slop porn?
I have. It sucks. The world we're headed for maybe isn't one we actually wind up wanting in the end.
I like the idea of increasingly advanced video models as a technologist, but in practice, I'm noticing slop and I don't like it. Having grown up on porn, when video models are in my hands, the addiction steers me toward only using the technology to generate it. That's a slot machine so addictive it's akin to the leap from the dirty magazines of old to the world of internet porn that I witnessed growing up. So, porn addiction on steroids. I found it eventually damaging enough to my mental health that I sold my 4090. I'm a lot better off now.
The nerd in me absolutely loves Generative models from a technology perspective, but just like the era of social media before it, it's a double edged sword.
It sounds like you have a personal problem that you’re trying to project onto the rest of society.
No, I'm providing a personal anecdote that some members of society that do have, or may develop, the same or similar problems are having both the (perceived) good and the bad aspects of those problems seriously magnified by this technology. This can have personal consequences, but also the consequences can affect the lives of others.
Hence, a certain % of the population will be negatively affected by this. I personally think it's worth raising awareness of.
I hope they're right. If the technology improves to such a degree that meaningful content can be produced then it could spell global disaster for a number of reasons.
Also I just don't want to live in a world where the things we watch just aren't real. I want to be able to trust what I see, and see the human-ness in it. I'm aware that these things can co-exist, but I'm also becoming increasingly aware that as long as this technology is available and in development, it will be used for deception.
That ship sailed shortly after the invention of photography. Photos were altered for political purposes during the US Civil War.
Now, we have entire TV shows shot on green screen in virtual sets. Replacing all the actors is just the next logical step.
That's exactly what I mean, all of those methods take some human effort, there is a human involved in the process. Now we face a reality that it might take no human effort to do... well, anything. Which is terrifying to me.
I do believe that humans are restless, and even when there is no longer any point to create, and it is far easier to dictate, we still will, just because we are too driven not to.
You know, there are still offline art forms like concerts, theater, opera, installations, etc., so I wouldn't see it that negatively. And we have nearly 100 years of music and film we can enjoy. So maybe video is a dying art form for humans to act in, but there is so much more.
The most predictable comment is yours, especially since you completely missed the point of the original comment which had nothing to do with the video quality.
AI generated slop content begets human generated slop comment.
So, even better porn?
Winning 2:1 in user preference versus Sora Turbo is impressive. It seems to have very similar limitations to Sora. For example, the leg swapping in the ice-skating video, and the beekeeper picks up the jar with a very unnatural acceleration (it just pops up). Though to my eye it is maybe slightly better at emulating natural movement and physics than Sora. The blog post has slightly more info:
>at resolutions up to 4K, and extended to minutes in length.
https://blog.google/technology/google-labs/video-image-gener...
It looks like Sora is actually the worst performer in the benchmarks, with Kling being the best and the others not far behind.
Anyways, I strongly suspect that the funny meme content that seems to be the practical use case of these video generators won't be possible on either Veo or Sora, because of copyright, PC, famous people appearing, or other 'safety'-related reasons.
I’ve been using Kling a lot recently and been really impressed, especially by 1.5.
I was so excited to see Sora out - only to see it has most of the same problems. And Kling seems to do better in a lot of benchmarks.
I can’t quite make sense of it: what OpenAI were showing when they first launched Sora was so amazing. Was it cherry-picked? Or was it using loads more compute than what they’ve released?
The Sora model available to the public is a smaller, distilled model called Sora Turbo. What was originally shown was a more capable model that was probably too slow to meet their UX requirements for the sora.com user interface.
> the jar is at a very unnatural acceleration (like it pops up).
It does pop up. Look at where his hand is relative to the jar when he grabs it vs when he stops lifting it. The hand and the jar are moving, but the jar is non-physically unattached to the grab.
Last time Google made a big Gemini announcement, OpenAI owned them by dropping the Sora preview shortly after.
This feels like a bit of a comeback as Veo 2 (subjectively) appears to be a step up from what Sora is currently able to achieve.
Some PM is literally sitting on this release waiting for their benchmarks to finish
And it's going to be hard for OpenAI to do that again, now that Google's woken up.
I appreciate they posted the skateboarding video. Wildly unrealistic whenever he performs a trick - just morphing body parts.
Some of the videos look incredibly believable though.
our only hope for verifying truth in the future is that state officials give their speeches while doing kick flips and frontside 360s.
sadly it's likely that video gen models will master this ability faster than state officials
Remember when the iPhone came out and BlackBerry smugly advertised that their products were “tools not toys”?
I remember saying to someone at the time that I was pretty sure iPhone was going to get secure corporate email and device management faster than BlackBerry was going to get an approachable UI, decent camera, or app ecosystem.
Maybe they will do more in person talks, I guess. Back to the old times.
Once the AI masters the art of creating "footage from someone's phone, as if they were in the crowd of the speech in this other video", we can't even trust that.
What officials actually say doesn't make a difference anymore. People do not get bamboozled because of lack of facts. People who get bamboozled are past facts.
Off topic from the video AI thread, but to elaborate on your point: people believe what they want, based on what they have been primed to believe from mass media. This is mainly the normal TV and paper news, filtered through institutions like government proclamations, schools, and now supercharged by social media. This is why the "narrative" exists, and news media does the consensus messaging of what you should believe (and why they hate X and other freer media sources).
By the time the politician says it, you've been soaking in it for weeks or months, if not longer. That just confirms the bias that has been implanted in you.
If anything I'd say the opposite. Look at the last US elections, a lot of the criticisms against the side that lost were things people "thought" and "felt" they were for/against, without them actually coming out and saying anything of the like. It was people criticising them for stuff that wasn't actually real on X, traditional TV, and the like that made voters "feel" like that stuff is real.
And X is really egregious, where the owner shitposts frequently and often things of dubious factuality.
You say offtopic, but I think AI video generation is the most on-topic place to bring up the subject of falsified politically charged statements. Companies showcasing these things aren't exactly lining up to include "moral" as one of the bullet point adjectives in a limitations section.
> and why they hate X and other freer media sources
I left X precisely because it was flooded with Russian propaganda/misinformation.
Such as?
Nonsense like:
- People were 'forced' into vaccinations
- Covid 19 was a testing ground for the next global pandemic so that "they" can control us
- Climate change is a hoax/Renewables are our doom
- Everything our government does is to create a totalitarian state next.
- Putin is actually the victim; it is all NATO's fault and their imperialism
Why is it impossible for these opinions to be homegrown? Would people be a hivemind without Putin?
It's not impossible, but of course they're not homegrown.
Putin's apologists always demand he be given the benefit of the doubt. That's akin to convicting a spy beyond a reasonable doubt. That standard is meant to favor false negatives over false positives when incarcerating people. Better to let a thousand criminals go free than to imprison an innocent person.
If we used that for spies, we'd have 1000 of them running around for each convicted one. Not to mention that they have a million ways to avoid detection. They rely on their training, on the resources of the state, and on infiltrators who sabotage detection efforts. The actual ratio would be much higher.
In the case of opinion manipulation, the balance is even more pernicious. That's because the West decided a couple decades ago to use the "it's just a flesh wound" approach to foreign interference.
The problem is that we're not just protecting gullible voters. We're also defending the reputation of democracy. Either democracy works, or it doesn't. If it doesn't, then we're philosophically no better than Russia and China.
But if it were possible to control the outcome of elections by online manipulation alone, that would imply that democracy doesn't really work. Therefore, online manipulation "can't work." Officially, it might sway opinion by a few points, but a majority of voters must, by definition, be right. And if manipulation makes little difference, then there's not much reason to fight it (at least not too openly).
Paradoxically, when it comes to detecting Russian voter manipulation, the West and Putin are strange bedfellows. Nothing to see here, move along.
That's an interesting question.
My sense is that the "hivemind" is, in a symbiotic way, both homegrown and significantly foreign-influenced.
More specifically: the core sentiment of the hivemind (basically: anti-war/anti-interventionist mixed with a broader distrust of anything the perceived "establishment" supports) is certainly indigenous -- and it is very important to not overlook this fact.
But many of its memes, and its various nuggets of disinformation do seem to be foreign imports. This isn't just an insinuation; sometimes the lineage can actually be traced word-for-word with statements originating from foreign sources (for example, "8 years of shelling the Donbas").
The memes don't create the sentiment. But they do seem to reinforce it, and provide it with a certain muscle and kick. While all the while maintaining the impression that it's all entirely homegrown.
And the farther one goes down the "multipolar" rabbit hole, the more often one encounters not just topical memes, but signature phrases lifted directly from known statements by Putin and Lavrov themselves. E.g. that Ukraine urgently needs to "denazify". The more hardcore types even have no qualms about using that precious phrase "Special Military Operation", with a touch of pride in their voice.
It's really genuinely weird, what's happening. What people don't realize is that none of this is happening by accident. It's a very specific craft that the Russian security services (in particular) have nurtured and developed, literally across generations, to create language that pushes people's buttons in this way.
The Western agencies and institutions have their own brand of propaganda, of course, but usually it's far more bland and boring (e.g. how NATO "fosters broader European integration" and all that).
Would we have the same kind of hivemind without Putin? There's always some kind of a hivemind -- but as applies to Eastern Europe, it does seem that the general climate of discourse was quite different before his ascendancy. And that it certainly took a very sharp, weird bend in the road after the start of Special Military Operation.
> People were 'forced' into vaccinations
"Take this novel vaccine primarily for someone else's benefit or lose your job: it's your choice, you totally aren't being 'forced.'"
For people like nurses, yes. That's just a normal job requirement.
What about remote software engineers, who were often equally affected by this?
What are you talking about? News media LOVE twitter/X, it is where they get all their stories from and journalists are notoriously addicted to it, to their detriment.
This was my favorite of all of the videos. There's no uncanny valley; it's openly absurd, and I watched it 4-5 times with increasing enjoyment.
Cracks in the system are often places where artists find the new and interesting. The leg swapping of the ice skater is mesmerizing in its own way. It would be useful to be able to direct the models in those directions.
It is great to see a limitations section. What would be even more honest is a very large list of videos generated without any cherry-picking, to judge the expected quality for the average user. Anyway, the lack of more videos suggests that there might be something wrong somewhere.
The honey, Peruvian women, swimming dog, beekeeper, DJ, etc. are stunning. They’re short, but I can barely find any artifacts.
The prompt for the honey video mentions ending with a shot of an orange. The orange just...isn't there, though?
Just pretend it's a movie about a shape-shifting alien that's just trying its best at ice skating; art is subjective like that, isn't it? I bet Salvador Dalí would have found those morphing body parts highly amusing.
I don't know why they say the model understands physics when it makes mistakes like that still.
Imho it is stunning, yet what is happening here is super dangerous.
These videos will be, and may already be, too realistic.
Our society is not prepared for this kind of reality "bending" media. These hyperrealistic videos will be the reason for hate and murder. Evil actors will use them to influence elections on a global scale, create cults around virtual characters, and deny the rules of physics and human reason. And yet, there is no way for a person to instantly detect that they are watching a generated video. Maybe there is now, but in a year it will be indistinguishable from a real recorded video.
Are Apple and other phone/camera makers working on ways to "sign" a video to say it's an unedited video from a camera? Does this exist now? Is it possible?
I'm thinking of simple cryptographic signing of a file, rather than embedding watermarks into the content, but that's another option.
I don't think it will solve the fake video onslaught, but it could help.
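The core idea behind camera-side signing (as in C2PA / Content Credentials) is: hash the captured file and sign the hash with a key tied to the device, so any later edit invalidates the signature. A minimal sketch in Python, using a stdlib HMAC as a stand-in for the asymmetric signature a real camera would produce (the key and payload here are made up for illustration):

```python
import hashlib
import hmac

# Stand-in for a per-device secret. A real camera would use an
# asymmetric key pair whose public half is vouched for by the
# manufacturer (e.g. via an X.509 certificate chain, as in C2PA);
# HMAC is symmetric and only illustrates the hash-then-sign shape.
DEVICE_KEY = b"example-device-secret"

def sign_capture(video_bytes: bytes) -> str:
    """Camera side: produce a tag over the exact captured bytes."""
    return hmac.new(DEVICE_KEY, video_bytes, hashlib.sha256).hexdigest()

def verify_capture(video_bytes: bytes, tag: str) -> bool:
    """Viewer side: check the bytes were not modified after capture."""
    expected = hmac.new(DEVICE_KEY, video_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

original = b"\x00\x01fake video payload"
tag = sign_capture(original)
print(verify_capture(original, tag))            # True: untouched file
print(verify_capture(original + b"edit", tag))  # False: any edit breaks it
```

Note what this does and doesn't prove: it shows the file is byte-identical to what the device emitted, but says nothing about whether the scene in front of the lens was real.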
Leica M11 signs each photo, via the "Content Authenticity Initiative": https://leica-camera.com/en-US/news/partnership-greater-trus...
Cute hack showing that it's kinda useless unless the user-facing UX does a better job of actually verifying that the certificate represents the manufacturer of the sensor (the author just uses a self-signed cert with "Leica Camera AG" as the name). Clearly cryptography literacy is lagging behind... https://hackaday.com/2023/11/30/falsified-photos-fooling-ado...
Even if the certs were properly cryptographically vetted, you could just point the camera at a high-enough resolution screen displaying false content.
I think this will be a thing one day, where photos are digitally watermarked by the camera sensor in a non-repudiable manner.
This is a losing battle. You can always just record an AI video with your camera. Done, now you have a real video.
This is what I think every time I hear about AI watermarking. If anything, convincing people that AI watermarking is a real, reliable thing is just going to cause more harm, because bad actors who want to convince people something fake is real would obviously use these simple subversion tactics. Then you have a bunch of people seeing that it passes the watermark check and concluding it is therefore real.
I agree it's probably a losing battle, but maybe one worth fighting. If the metadata is also covered by the signature, you can verify the time and place the video was recorded. Of course, this requires closed/locked hardware and is still possible to spoof. Not ideal, but some assurances are better than a future where you can't trust anything.
Potential solutions:
1. AI video watermarks that carry over even if a video of the AI video is taken
2. Cameras that can see AI video watermarks and put an AI video watermark on the videos of any AI videos they take
Nikon has had digital signature ability in some of their flagship cameras since at least 2007, and maybe before then. The feature is used by law enforcement when documenting evidence. I assume other brands also have this available for the same reasons.
We've had realistic sci-fi and alternate history movies for a very long time.
Which take millions of dollars and huge teams to make. These take one bored person, a sentence, and a few minutes to go from idea to posting on social media. That difference is the entire concern.
If “evil actors” could really “manipulate elections” with fake video, would they really let a few million dollars stop them?
That’s not that much money.
Who says they don't? Interference is being "democratized".
Any examples of hoax videos that you can name? I’m having a hard time placing any.
I really find the threat to be overhyped.
You can find citations in https://en.wikipedia.org/wiki/Russian_interference_in_the_20...
So, nothing. Got it.
We already have hate and murder, evil actors influencing elections on a global scale, denial of physics and reason, and cults of personality. We also already have the ability to create realistic videos - not that it matters because for many people the bar of credulity isn't realism but simply confirming their priors. We already live in a world where TikTok memes and Facebook are the primary sources around which the masses base their reality, and that shit doesn't even take effort.
The only thing this changes is not needing to pay human beings for work.
Instead of calling for regulations, the big tech companies should run big campaigns educating the public, especially boomers, that they can no longer trust images, videos, and audio on the Internet. Put paid articles and ads about this in local newspapers around the world so that even the least online people get educated about this.
Do we really want a world where we can't trust anything we see, hear, or read? Where people need to be educated to not trust their senses, the things we use to interpret reality and the world around us.
I feel this kind of hypervigilance will be mentally exhausting, and not being able to trust your primary senses will have untold psychological effects.
You can trust what you see and hear around you. You might be able to trust information from a party you trust. You certainly shouldn't trust digital information from unknown entities with unknown agendas.
We're already in a world where "fake news" and "alt-facts" influence our daily lives and political outcomes.
What I see and hear around me is a minuscule fraction of the outside world. To have a shared understanding of reality, of what is happening in my town, my city, my state, my country, my continent, the world, requires much more than what is available in your immediate environment.
In the grand scheme of understanding the world at large, our immediate senses are not particularly valuable. So we _have_ to rely on other streams of information. And the trend is towards more of those streams being digital.
The existence of "fake news" and "alt facts", doesn't mean we should accept a further and dramatic worsening of our ability to have a shared reality. To accept that as an inevitability is defeatist and a kind of learned helplessness.
Have you seen the Adam Curtis documentary "Hypernormalisation"? It deals with some similar themes, but on a much smaller scale (at least it is smaller in the context of current and near future tech)
One absolutely should not trust what one sees and hears around oneself. One cannot trust the opinions of others, and one should not trust faith; one can only reliably develop critical analysis, employ secondary considerations to the best of one's ability, and then be cautious at every step. Trust and faith are relics of a time now gone, and it is time to realize it, to grow up and see reality.
I wonder if we’ll eventually see people abandoning the digital reality in favor of real-life, physical interactions wherever possible.
I recently had an issue with my mobile service provider, and I was insanely glad when I could interact with a friendly and competent shop clerk (I know I got lucky there) in a brick-and-mortar store instead of a chatbot stuck in a loop.
Yeah, I think it's a real possibility that people will disconnect from the digital world. Though I fear the human touch will become a luxury only the wealthy can afford. If it becomes a point of distinction, people will charge extra for it, while the rest are pleading with brainless chatbots.
That world is already here. Nothing you can do about it, might as well democratize access to the technology.
No it's not. We are not at the stage where reality is completely indistinguishable from fiction. We are still in the uncanny valley. Nothing is inevitable
Do you think China will stop here?
This is like trying to hide Photoshop from the public. Realistic AI generated videos and adversary-sponsored mass disinformation campaigns are 100% inevitable, even if the US labs stopped working on it today.
So, you might as well open access to it to blunt the effect, and make sure our own labs don't fall behind on the global stage.
That is reality; that is nature. The natural world is filled with camouflaged animals and plants that prey on one another via their camouflage. This is evolution, and those unable to discriminate reality from fiction will be the casualties, as they always have been since the dawn of life.
The naturalistic fallacy is weak at best, but this is one of the weirdest deployments of it I've encountered. It's not evolution, it's nothing like it.
If it's kill or be killed, we should do away with medicine right? Only the strong survive. Why are we saving the weak? Sorry but this argument is beyond silly
Deception is a key part of life, and the ability to discriminate fact from fiction is absolutely a key metric of success. Who said "kill or be killed"? Not I. It is survival or not, flourish or not, succeed or not.
But why must the deception take place? Evolution is natural, The development of AI generated videos takes teams of people, years of effort and millions of pounds. Why should those that are more easily deceived be culled? Do you believe that the future of technology is weeding out the weak? Do you believe the future of humanity is the existence of only those that can use the technologies we develop? You might very well find yourself in a position, a long time from now, where you are easily deceived by newer technologies that you are not familiar with.
Deception takes place because it can. I'm not the gatekeeper of it; I'm just acknowledging it and some of the secondary effects that will occur due to these inevitable technologies. I don't believe the future is anything other than a hope. That hope will require future individuals to be very discriminating of their surroundings to survive, and "surroundings" includes all societal information and socialization, because that is filled with misinformation too. All of it is filled with misinformation right now, and it will only get more sophisticated. That's what I'm saying.
I can't disagree with you there. It's a shame I can't. The future is a scary place to be.
Maybe people should have some of those psychological effects.
Maybe operation Timber Sycamore, which is bearing fruit in Syria right now, wouldn't have happened if the population were less trusting of the shit they see on TV.
We have evolved to trust our senses as roughly representative of reality. I'm not convinced we are able to adapt to that kind of rapid shift.
I had not heard of Timber Sycamore until this comment. After a quick look at Wikipedia, I'm struggling to see the relevance here. Can you elaborate?
Sure. No amount of perception will let you see the financing of Al-Qaeda or Al-Nusra soldiers. You can't perceive your way out of your blindness. You need to reflect.
Of all the different sci-fi futures I've encountered, I never thought we'd end up in the Philip K. Dick one.
It will also reinforce whatever bias we have already. When facing ambiguous or unknowable situations our first reaction is to go with "common sense" or "trusting our gut".
"Uh, Is that video of [insert your least favourite politician here] taking a bribe real or not? Well, I'm going to trust my instincts here..."
What would motivate "big tech" to warn people about their own products, if not regulations?
Don't forget text. You can't trust text either.
And no big tech company would run the ads you're suggesting, because they only make money when people use the systems that deliver the untrustworthy content.
The same things could be said when everyone could print their own newspapers or books. How would people distinguish between fake and real news?
I think we will need the same healthy media diet.
There wasn't even a healthy media diet before generative AI given the amount of 'fake news' in 2016 and 2020.
Photoshop has been a thing for over 30 years.
Isn't the whole point of OP that we're currently watching the barrier to generating realistic assets go from "spend months grinding Photoshop tutorials" to "type what you want into this box and wait a few minutes"?
I still don't really know why we're doing this. What is the upside? Democratising Hollywood? At the expense of... enormous catastrophic disinformation and media manipulation.
Society voted with its money. Google refrained from launching its early chatbots and image-generation tools due to perceived risks of unsafe and misleading content being generated, and got beaten to the punch in the market. Of course now they'll launch early and often; the market has spoken.
We have constructed a society where market forces feel inevitable, but it doesn't have to be that way.
Of course; but this is the current society, and attempts to reform it, e.g. communism, failed abjectly, so by evolutionary pressure, the capitalist society dominated by market forces is the best that we have.
Right, but there are plenty of middle grounds between true communism and just letting markets freewheel.
And places with these systems are those that achieved the best quality of life and peace.
There's no evidence that this fearmongering over safety is actually correct. The worst thing you can do is pummel an emerging technology into the grave because of misplaced fear.
Just take a look at how many everyday things were "incredibly dangerous for society" - https://pessimistsarchive.org/
Spend any amount of time on mainstream social media and you'll see AI-generated media being shared credulously. It's not a hypothetical risk, it's already happening.
Even if you're not convinced that it's dangerous, at the very least it's incredibly annoying.
If someone dumped a trailer full of trash in your garden, you're not going to say "oh well, market forces compelled them to do that".
Eh, growing pains.
FWIW it feels like Google should dominate text/image -> video since they have access to Youtube unfettered. Excited to see what the reception is here.
Everyone has access to YouTube. It’s safe to assume that Sora was trained on it as well.
All you can eat? Surely they charge a lot for that, at least. And how would you even find all the videos?
Nobody in this space gives a fuck about anyone or anything further upstream than the file sitting in their ingestion queue. If they can see it, they 'own' it.
Who says they've talked to Google about it at all?
I can't speak to OpenAI but ByteDance isn't waiting for permission.
ByteDance has their own unlimited supply of videos...
That hasn't stopped them.
They already did it, and I'm guessing they were using some of the various YouTube downloaders Google has been going after.
Does everyone have "legal" access to YouTube?
In theory that should matter to something like Open(Closed)AI. But who knows.
I mean, I have trained myself on Youtube.
Why can't a silicon being train itself on Youtube as well?
Because silicon is a robot. A camcorder can't catch a flick with me in the theater even if I dress it up like a muppet.
Not with that attitude.
A corporation "is a person" with all the rights that come along with that - free speech etc.
What if I'm part-carbon, part-silicon?
Like, a blind person with vision restored by silicon eyes?
Do I not have rights to run whatever firmware I want on those eyes, because it's part of my body?
Okay, so what if that firmware could hypothetically save and train AI models?
Presumably, it should be illegal to record a movie with an inbuilt camera. Capturing the data in such a way that an identical copy can automatically be reproduced breaks the social contract around the way those works are shared. The majority of media is produced by large companies that are ultimately not harmed by such activities, but individual artisans who create things shouldn't be subjected to this.
We can take this a step further: if your augmented eyes and ears can record people in a conversation, should you be allowed to produce lifelike replicas of people's appearance and voice? A person can definitely imagine someone saying or doing anything. A talented person with enough effort could even make a 3D model and do a voice impression on their own. It should be obvious that having a conversation with a stranger doesn't give them permission to clone your every detail, and shouldn't that also be true for your creations?
The difference is that you didn't need to scrape millions of videos from YouTube with residential proxy network scrapers to train yourself.
Only because I'm significantly more intelligent than ChatGPT, so I can achieve its level of competency on a lot of things with a thousand videos instead of a million videos.
If it just reduces to an issue of data efficiency, AI research will eventually get there though.
Humans have rights, machines don't.
When a company trains an AI model on something, and then that company sells access to the AI model, the company, not the AI model, is the one violating copyright. If Jimmy makes an android in his garage and gives it free will, and then it trains itself on YouTube, I doubt anyone would have an issue.
If OpenAI training on youtube videos violates copyright then so does Google training on them.
In what possible way is that true? Not that I like it, but Google has its creators sign away the rights to their material for uses like this. Nobody signs a contract with OpenAI when they make their YouTube videos.
When you sign away full rights to one company, that one company can give rights to another company (for money or not).
They could also just acquire that other company.
From the creator's standpoint, signing away rights to one company is as good as gone.
Did OpenAI make a deal with Google to train on YouTube?
They also had a good chunk of the web's text indexed, millions of people's emails sent every day, Google Scholar papers, and the massive Google Books project that digitized most books ever published, and they even discovered transformers.
Superficially impressive but what is the actual use case of the present state of the art? It makes 10-second demos, fine. But can a producer get a second shot of the same scene and the same characters, with visual continuity? Or a third, etc? In other words, can it be used to create a coherent movie --even a 60-second commercial -- with multiple shots having continuity of faces, backgrounds, and lighting?
This quote suggests not: "maintaining complete consistency throughout complex scenes or those with complex motion, remains a challenge."
B-roll for YouTube videos.
This is still early. It's only going to get better.
Fun. Fun! I find it a lot of fun to have a computer spit out pixels based on silly ideas I have. It is very amusing to me
You blend them, extend the videos, and then connect enough of them for a 2-minute short.
That's what I think the tech at this stage cannot do. You make two clips from the same prompt with a minor change, e.g.:
> a thief threatens a man with a gun, demanding his money, then fires the gun (etc add details)
> the thief runs away, while his victim slowly collapses on the sidewalk (etc same details)
Would you get the same characters, wearing the identical clothing, the same lighting and identical background details? You need all these elements to be the same, that's what filmmakers call "continuity". I doubt that Veo or any of the generators would actually produce continuity.
Dank memes.
> "what is the actual use case of the art?"
Not much. Low quality over-saturated advertising? Short films made by untalented lazy filmmakers?
When text prompts are the only source, creativity is absent. No craft, no art. Audiences won't gravitate towards fake crap that oozes out of AI vending machines, unrefined, artistically uncontrolled.
Imagine visiting a restaurant because you heard the chef is good. You enjoy your meal but later discover the chef has a "food generator" where he prompts the food into existence. Would you go back to that restaurant?
There's one exception. Video-to-video and image-to-video, where your own original artwork, photos, drawings and videos are the source of the generated output. Even then, it's like outsourcing production to an unpredictable third party. Good luck getting lighting and details exactly right.
I see the role of this AI gen stuff as background filler, such as populating set details or distant environments via green screen.
> Imagine visiting a restaurant because you heard the chef is good. You enjoy your meal but later discover the chef has a "food generator" where he prompts the food into existence. Would you go back to that restaurant?
That's an obvious yes from me. I liked it, and not only that, but I can reasonably assume it will be consistently good in the future, something lots of places can't do.
So you'd forgive the deception (there's no "chef" only a button pusher) and revisit the restaurant even though you could easily generate the food yourself.
You don't care about the absence of a lifetime of hard work behind your meal, or the efforts of small business owners inspired by good food and passion in the kitchen. All that matters to you is that your taste buds were satisfied?
Interesting. Perhaps we can divide the world into those who'd happily dine at "Skynet Gourmet", and those who'd seek a real restaurant.
I think it's more complex than that. At least to me, food in particular is nothing special; I truly eat just for sustenance, so if it doesn't taste bad, I'm happy.
I still believe that there's a place for creative work; I just don't see why something created by something other than a human is inherently bad.
Short-video creation tools; it's a huge market.
Misinformation
This looks great, but I'm confused by this part:
> Veo sample duration is 8s, VideoGen’s sample duration is 10s, and other models' durations are 5s. We show the full video duration to raters.
Could the positive result for Veo 2 mean the raters like longer videos? Why not trim Veo 2's output to 5s for a better controlled test?
I'm not surprised this isn't open to the public by Google yet; there's still a huge amount of volunteer red-teaming being done by the public on other services like hailuoai.video.
P.S. The skate tricks in the final video are delightfully insane.
> I'm not surprised this isn't open to the public by Google yet,
Closed models aren't going to matter in the long run. Hunyuan and LTX both run on consumer hardware and produce videos similar in quality to Sora Turbo, yet you can train them and prompt them on anything. They fit into the open source ecosystem which makes building plugins and controls super easy.
Video is going to play out in a way that resembles images. Stable Diffusion and Flux like players will win. There might be room for one or two Midjourney-type players, but by and large the most activity happens in the open ecosystem.
> Hunyuan and LTX both run on consumer hardware
Are there other versions than the official?
> An NVIDIA GPU with CUDA support is required.
> Recommended: We recommend using a GPU with 80GB of memory for better generation quality.
https://github.com/Tencent/HunyuanVideo
> I am getting CUDA out of memory on an Nvidia L4 with 24 GB of VRAM, even after using the bfloat16 optimization.
Yes you can, with some limitations
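The 80 GB recommendation and the OOM on a 24 GB L4 follow from back-of-the-envelope math alone. Assuming HunyuanVideo's transformer is roughly 13B parameters (treat that figure, and the helper below, as assumptions for illustration), the weights by themselves need:

```python
# Back-of-the-envelope VRAM estimate for model weights alone.
# Activations, the text encoder, and the VAE all add more on top,
# which is why a card whose VRAM merely matches the weight size still OOMs.

def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 1024**3

N_PARAMS = 13e9  # assumed parameter count for the video transformer

fp32 = weight_memory_gib(N_PARAMS, 4)  # full precision
bf16 = weight_memory_gib(N_PARAMS, 2)  # the bfloat16 "optimization"

print(f"fp32 weights: {fp32:.1f} GiB")  # ~48.4 GiB
print(f"bf16 weights: {bf16:.1f} GiB")  # ~24.2 GiB
```

Even in bfloat16, the weights alone slightly exceed an L4's 24 GB before any activations for a multi-second video are allocated, so consumer cards need CPU offloading or quantization on top of the dtype change.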