I'm always curious about the examples in these announcements: how close is the training data to the sample prompts? And how much of the prompt is important or ends up ignored in the result?
The prompt for the figure running through glowing threads seems to contain a lot of detail that doesn't show up in the video.
In the first example (close-up of DJ), the last line about her captivating presence and the power of music I guess should give the video a "vibe" (compared to prescriptively describing the video). I wonder how the result changes if you leave it out?
Cynically I think that it's a leading statement there for the reader rather than the model. Like now that you mention it, her presence _is_ captivating! Wow!
I got access to the preview, here's what it gave me for "A pelican riding a bicycle along a coastal path overlooking a harbor" - this video has all four versions shown:
https://static.simonwillison.net/static/2024/pelicans-on-bic...
Of the four, two were a pelican riding a bicycle. One was a pelican just running along the road, one was a pelican perched on a stationary bicycle, and one had the pelican wearing a weird sort of pelican bicycle helmet.
All four were better than what I got from Sora: https://simonwillison.net/2024/Dec/9/sora/
As long as at least one option is exactly what you asked for, throwing variations at you that don't conform 100% to your prompt seems like it could be useful, if it gives the model leeway to improve the output in other aspects.
It looks much better than Sora, but still kind of in the uncanny valley.
His little bike helmet is adorable
The AI safety team was really proud of that one.
Winning 2:1 in user preference versus Sora Turbo is impressive. It seems to have very similar limitations to Sora. For example, the leg swapping in the ice skating video, and the beekeeper picking up the jar at a very unnatural acceleration (it just pops up). Though to my eye it's maybe slightly better at emulating natural movement and physics than Sora. The blog post has slightly more info:
>at resolutions up to 4K, and extended to minutes in length.
https://blog.google/technology/google-labs/video-image-gener...
It looks like Sora is actually the worst performer in the benchmarks, with Kling being the best and the others not far behind.
Anyway, I strongly suspect that the funny meme content that seems to be the practical use case of these video generators won't be possible on either Veo or Sora, because of copyright, political correctness, famous people, or other 'safety'-related reasons.
I’ve been using Kling a lot recently and been really impressed, especially by 1.5.
I was so excited to see Sora out - only to see it has most of the same problems. And Kling seems to do better in a lot of benchmarks.
I can't quite make sense of it. What OpenAI was showing when they first launched Sora was so amazing. Was it cherry-picked? Or was it using loads more compute than what they've released?
The Sora model available to the public is a smaller, distilled model called Sora Turbo. What was originally shown was a more capable model that was probably too slow to meet their UX requirements for the sora.com user interface.
> the jar is at a very unnatural acceleration (like it pops up).
It does pop up. Look at where his hand is relative to the jar when he grabs it vs. when he stops lifting it. The hand and the jar are both moving, but the jar isn't physically attached to the grab.
I appreciate they posted the skateboarding video. Wildly unrealistic whenever he performs a trick - just morphing body parts.
Some of the videos look incredibly believable though.
our only hope for verifying truth in the future is that state officials give their speeches while doing kick flips and frontside 360s.
sadly it's likely that video gen models will master this ability faster than state officials
Remember when the iPhone came out and BlackBerry smugly advertised that their products were “tools not toys”?
I remember saying to someone at the time that I was pretty sure iPhone was going to get secure corporate email and device management faster than BlackBerry was going to get an approachable UI, decent camera, or app ecosystem.
Maybe they will do more in-person talks, I guess. Back to the old times.
This was my favorite of all of the videos. There's no uncanny valley; it's openly absurd, and I watched it 4-5 times with increasing enjoyment.
It is great to see a limitations section. What would be even more honest is a very large list of videos generated without any cherry-picking, to judge the expected quality for the average user. Anyway, the lack of more videos suggests that there might be something wrong somewhere.
Cracks in the system are often places where artists find the new and interesting. The leg swapping of the ice skater is mesmerizing in its own way. It would be useful to be able to direct the models in those directions.
The honey, Peruvian women, swimming dog, beekeeper, DJ, etc. videos are stunning. They're short, but I can barely find any artifacts.
The prompt for the honey video mentions ending with a shot of an orange. The orange just...isn't there, though?
Just pretend it's a movie about a shape-shifting alien that's just trying its best at ice skating; art is subjective like that, isn't it? I bet Salvador Dali would have found those morphing body parts highly amusing.
I don't know why they say the model understands physics when it makes mistakes like that still.
Last time Google made a big Gemini announcement, OpenAI owned them by dropping the Sora preview shortly after.
This feels like a bit of a comeback as Veo 2 (subjectively) appears to be a step up from what Sora is currently able to achieve.
Some PM is literally sitting on this release waiting for their benchmarks to finish
My friend working in a TV station is already using these tools to generate videos for public advertising programs. It has been a blast.
FWIW, it feels like Google should dominate text/image -> video since they have unfettered access to YouTube. Excited to see what the reception is here.
Everyone has access to YouTube. It’s safe to assume that Sora was trained on it as well.
All you can eat? Surely they charge a lot for that, at least. And how would you even find all the videos?
Who says they've talked to Google about it at all?
I can't speak to OpenAI but ByteDance isn't waiting for permission.
They already did it, and I'm guessing they were using some of the various YouTube downloaders Google has been going after.
Does everyone have "legal" access to YouTube?
In theory that should matter to something like Open(Closed)AI. But who knows.
I mean, I have trained myself on Youtube.
Why can't a silicon being train itself on Youtube as well?
When a company trains an AI model on something, and then that company sells access to the AI model, the company, not the AI model, is the one violating copyright. If Jimmy makes an android in his garage and gives it free will, and it then trains itself on YouTube, I doubt anyone would have an issue.
Because silicon is a robot. A camcorder can't catch a flick with me in the theater even if I dress it up like a muppet.
They also had a good chunk of the web's text indexed, millions of people's emails sent every day, Google Scholar papers, and the massive Google Books project that digitized most books ever published. They even discovered transformers.
This looks great, but I'm confused by this part:
> Veo sample duration is 8s, VideoGen’s sample duration is 10s, and other models' durations are 5s. We show the full video duration to raters.
Could the positive result for Veo 2 mean the raters like longer videos? Why not trim Veo 2's output to 5s for a better controlled test?
I'm not surprised this isn't open to the public by Google yet; there's still a huge amount of volunteer red-teaming to be done by the public on other services like hailuoai.video.
P.S. The skate tricks in the final video are delightfully insane.
> I'm not surprised this isn't open to the public by Google yet,
Closed models aren't going to matter in the long run. Hunyuan and LTX both run on consumer hardware and produce videos similar in quality to Sora Turbo, yet you can train them and prompt them on anything. They fit into the open source ecosystem which makes building plugins and controls super easy.
Video is going to play out in a way that resembles images. Stable Diffusion- and Flux-like players will win. There might be room for one or two Midjourney-type players, but by and large the most activity happens in the open ecosystem.
> Hunyuan and LTX both run on consumer hardware
Are there other versions than the official?
> An NVIDIA GPU with CUDA support is required.
> Recommended: We recommend using a GPU with 80GB of memory for better generation quality.
https://github.com/Tencent/HunyuanVideo
> I am getting CUDA out of memory on an Nvidia L4 with 24 GB of VRAM, even after using the bfloat16 optimization.
Yes. Lots of folks on Reddit are running it on 24 GB cards.
Yes you can, with some limitations
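For reference, the way people seem to squeeze it onto 24 GB cards is bf16 weights plus CPU offloading and VAE tiling. Here's a rough sketch assuming the diffusers integration of HunyuanVideo; the pipeline class, checkpoint id, and settings below are my guesses, so check the actual docs:

```python
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Checkpoint id is assumed; substitute whatever the official/community repo is actually called.
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype=torch.bfloat16,   # the bfloat16 optimization mentioned above
)
pipe.enable_model_cpu_offload()   # keep only the active submodule on the GPU
pipe.vae.enable_tiling()          # decode the video in tiles to cap VRAM spikes

frames = pipe(
    prompt="a pelican riding a bicycle along a coastal path",
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "pelican.mp4", fps=15)
```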
I wonder if the more decisive aspect is the data, not the model. Will closed data win over open data?
With the YouTube corpus at their disposal, I don't see how anyone can beat Google for AI video generation.
Stable Diffusion and Flux did not win, though. Midjourney and ChatGPT won.
“Won” what exactly? I have no issues running stable diffusion locally.
Since Llama 3.3 came out, it has been my first stop for coding questions, and I'm only using closed models when Llama 3.3 has trouble.
I think it’s fairly clear that between open weights and LLMs plateauing, the game will be who can build what on top of largely equivalent base models.
The quality for SD is nowhere near the clear leaders.
> The quality for SD is nowhere near the clear leaders.
It absolutely is. Moreover, the tools built on top of SD (and now Flux) are superior to any commercial vertical.
The second-place companies and research labs will continue to release their models as open source, which will cause further atrophy to the value of building a foundation model. Value will accrue in the product, as has always been the case.
SD will also generate what I tell it, unlike the corporate models that have all kinds of “safeguards”.
OpenAI is like a super luxurious yacht, all pretty and shiny, while Google's AI department is a humongous nuclear submarine at least five times bigger than the yacht, with a relatively cool conning tower but not that spectacular to look at.
Like a tanker that is still steering to fully align with the course people expect of it; they don't recognize that it will soon be there, capable of rolling over everything that comes in its way.
If OpenAI claims they're close to having AGI, Google most likely already has it and is doing its shenanigans with the US government under the radar, while Microsoft plays the cool guy and Amazon is still trying to get its act together.
All it took was good old competition with the potential to steal the user base from Google's core search product. Nice to be back in the competition era of web tech.
Google definitely does not have AGI hahaha
ex-Googler confirms :/
Yeah pretty bad example from parent but the point stands I think... I mostly just assume that for everything ChatGPT hypes/teases Google probably has something equivalent internally that they just aren't showing off to the public.
I know that Google's internal ChatGPT alternative was significantly worse than ChatGPT (confirmed both in the news and by Googlers) around a year back. So you might say they could overtake OpenAI because of more resources, but they aren't significantly ahead of OpenAI.
Or, using Occam's razor: Sundar is a shit CEO and is playing catch-up with a company largely fueled by innovations created at Google but never brought to market because they would eat into ads revenue.
That, or they have a secret super human intelligence under wraps at the pentagon.
Just to remind everyone: the state of the art was Will Smith eating spaghetti in March 2023.
https://arstechnica.com/information-technology/2023/03/yes-v...
We're not even done with 2024.
Just imagine what's waiting for us in 2025.
This might be a dumb question to ask, but what exactly is this useful for? B-Roll for YouTube videos? I'm not sure why so much effort is being put into something like this when the applications are so limited.
If you want to train a model to have a general understanding of the physical world, one way is to show it videos and ask it to predict what comes next, and then evaluate it on how close it was to what actually came next.
To really do well on this task, the model basically has to understand physics, and human anatomy, and all sorts of cultural things. So you're forcing the model to learn all these things about the world, but it's relatively easy to train because you can just collect a lot of videos and show the model parts of them -- you know what the next frame is, but the model doesn't.
Along the way, this also creates a video generation model - but you can think of this as more of a nice side effect rather than the ultimate goal.
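As a toy illustration of that setup (everything here is made up for illustration; real video models are diffusion/transformer based and vastly more involved), the training loop is basically "hide the last frame, predict it, score the guess":

```python
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    """Toy model: look at a few context frames, guess the next one."""
    def __init__(self, frame_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, frame_dim),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, num_context_frames, frame_dim); average over time (toy choice)
        return self.net(context.mean(dim=1))

def training_step(model, optimizer, clip):
    # clip: (batch, num_frames, frame_dim) -- flattened pixels of a short video clip
    context, target = clip[:, :-1], clip[:, -1]   # the model never sees the last frame
    loss = nn.functional.mse_loss(model(context), target)  # "how close was the guess?"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g. model = NextFramePredictor(frame_dim=64 * 64 * 3)
#      opt = torch.optim.Adam(model.parameters(), lr=1e-4)
#      training_step(model, opt, torch.rand(8, 16, 64 * 64 * 3))
```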
It doesn't have to understand anything; none of these demonstrate reasoning or understanding.
All these models have just “seen” enough videos of all those things to build a probability distribution to predict the next step.
This is not bad, nor does it make it inherently dumb; a major component of human intelligence is built on similar strategies. I couldn't tell you which grammatical rules are broken in a text, or which physical rules in a photograph, but I can tell something is wrong using the same methods.
Inference can take it far with large enough data sets, but sooner or later, without reasoning, you will hit a ceiling.
This is true for humans as well: plenty of people go far in life with just memorization and replication, doing a lot of jobs fairly competently, but not everything.
Reasoning is essential for higher-order functions, and transformers are not the path to that.
That's like saying that your brain doesn't understand anything, it just analyzes the visual data coming in via your eyes and predicts the next step of reality
Back when computers took up a whole room, you'd also have asked: "But what exactly is this useful for? Some simple calculations that anybody can do with a piece of paper and a pen?"
Think 5-10 years into the future, this is a stepping stone
That's comparing apples to oranges though isn't it? Generating videos is the output of the technology, not the tech itself. It would be like someone asking "this computer that takes up a whole room printed out ascii art, what is this useful for?"
This is kind of an unfair comparison. What's the endpoint of generating AI videos? What can this do that is useful, contributes something to society, has artistic value, etc.? We can make educational videos with a script, but it's also pretty easy for motivated parties to do that already, and it's getting easier as cameras get better and smaller. I think asking "what's the point of this" is at least fair.
They’re a way firo
They were calculating missile trajectories; everybody understood what they were useful for.
TV commercials / youtube ads. You don't need a video team anymore to make an ad.
We're preparing to use video generation (specifically image+text => video so we can also include an initial screenshot of the current game state for style control) for generating in-game cutscenes at our video game studio. Specifically, we're generating them at play-time in a sandbox-like game where the game plays differently each time, and therefore we don't want to prerecord any cutscenes.
Okay, so is the aim to run this locally on a client's computer or served from a cloud? How does the math work out where it's not just easier at that point to render it in game?
In its current state, it's already useful for B-roll, video backgrounds for websites, and any other sort of "generic" application where the point of the shot is just to establish mood and fill time.
But more than anything, it's useful as a stepping stone to more full-featured video generation that can maintain characters and story across multiple scenes. It seems clear that at some point tools like this will be able to generate full videos, not just shots.
Are they that limited? It's a machine that can make videos from user input: it can ostensibly be used wherever you need video, including for creative, technical and professional applications.
Now, it may not be the best fit for those yet due to its limitations, but you've gotta walk before you can run: compare Stable Diffusion 1.x to FLUX.1 with ControlNet to see where quality and controllability could head in the future.
I have observed some musicians creating their own music videos with tools like this.
This silly music video was put together by one person in about 10 hours.
https://www.reddit.com/r/aivideo/comments/1hbnyi2/comment/m1...
Another more serious music video also made entirely by one person. https://www.youtube.com/watch?v=pdqcnRGzH5c Don't know how long it took though.
Because it's pretty cool to be able to imagine any kind of scene in your head, put it into words, then see it be made into a video file that you can actually see and share and refine.
Use your imagination.
this is perfect for the landing page of any website I make
my templates all are waiting for stock videos to be added looping in the background
you have no idea how cool I am with the lack of copyright protections afforded to these videos I will generate, I'm making my money other ways
Streaming services where there is no end to new content that matches your viewing patterns.
this sounds awful haha
You really think making videos with computers is not useful? Is this a joke?
The example of a "Renaissance palace chamber" is historically inaccurate by around a century or two; the generated video looks a lot like a pastiche of Versailles from the Age of Enlightenment instead. I guess that's what you get by training on the internet.
I watched that 10 times because the details are bonkers, and I find it amazing that she and the candle are visible in the mirror! Speaking of inaccuracy, though: are those pencils/text markers/pens on the desk? ;)
It's interesting that they host these videos on YouTube, because it signals they're fine with AI-generated content. I wonder if Google forgets that the creators themselves are what makes YouTube interesting for viewers.
What makes you think that viewers wouldn't be watching AI generated content? Considering the possibilities of fake videos, I'm sure that it can be very engaging. And the costs are zero.
Website keeps crashing and reloading on Brave iOS.
As OpenAI released a feature that hit Google where it hurts, Google released Veo 2 to utterly destroy OpenAI's Sora.
Google won.
Is it just me or do all these models generate everything in a weird pseudo-slow motion framerate?
Does anybody realize this is very sad?
Namely, that it takes so few neurons to get a picture into our heads.
I guess end-of-the-world scenarios may lead us to create that superintelligence with a gigantic, ultra-performant artificial "brain".
It's telling that safety and responsibility get so many fluff words and the technical details are fairly extensive, but there's no mention of the training data. It's clearly relevant for both performance and ethical discussions.
Maybe it's just me who couldn't find it (the website barely works at all on Firefox for iOS).
Most people called it: the second one of these companies stops caring about safety, the others will stop as well. People hate being told what they're not supposed to do. And now companies will go forward with abandoning their responsible use policies.
Judging by how they've been trying to ram AI into YouTube creators' workflows, I suppose it's only a matter of time before they try to automate the entire pipeline from idea, to execution, to "engaging" with viewers. It won't be good at doing any of that, but when did that ever stop them?
They basically already have this: https://workspace.google.com/products/vids/
Last week I started seeing a banner in Google Docs along the lines of "Create a video based on the content of this doc!" with a call to action that brought me to Google Vids.
Hey, it's AI and so it is good, right?
Seriously, it sounds like something kids can have fun with, or bored deskworkers. But a serious use case, at the current state of the art? I doubt it.
And then suddenly this is not something that fascinates people anymore… in 10 years as non-synthetic becomes the new bio or artisan or whatever you like.
Humanity has its ways of objecting to accelerationism.
> Humanity has its ways of objecting to accelerationism.
Actually, typically human objection only slows it down and often it becomes a fringe movement, while the masses continue to consume the lowest common denominator. Take the revival of the flip phone, typewriter, etc. Sadly, technology marches on and life gets worse.
Does life get worse for the majority of people or do the fruits of new technology rarely address any individual person’s progress toward senescence? (The latter feels like tech moves forward but life gets worse.)
Of course, it depends on how you define "worse". If you use life expectancy, infant mortality, and disease, then life has in the past gotten better (although the technology of the past 20 years has RARELY contributed to any of that).
If you use 'proximity to wild nature', 'clean air', 'more space', then life has gotten worse.
But people don't choose between these two. They choose between alternatives that give them analgesics in an already corrupt society, creating a series of descending local maxima.
Are you kidding?
TikTok is one of the easiest platforms to create for, and look at how much human attention it has sucked up.
The attention/dopamine magnet is accelerating its transformation into a gravitational singularity for human minds.
TikTok's main attraction is the people, not just the videos. Trends, drama, etc. all involve real humans doing real human stuff, so it's relatable.
I might be wrong, but AI videos are on the same path as AI generated images. Cool for the first year, then “ah ok, zero effort content”.
Put another way, over time people devalue things which can be produced with minimal human effort. I suspect it's less about humanity's values, and more about the way money closely tracks "time" (specifically the duration of human effort).
Yes, exactly. Marx had this right. Money is a way to trade time.
I strongly disagree. How many of the clothes you buy have a 100 thread count and are machine-made, vs. hand-knit sweaters or something?
When did you last ask people for directions, or other major questions, instead of Google?
You can wax poetic about wanting "the human touch", but at the end of the day, the market speaks -- people will just prefer everything automated. Including their partners, after your boyfriend can remember every little detail about you, notice everything including your pupils dilating, know exactly how you like it, when you like it, never get angry unless it's to spice things up, and has been trained on 1000 other partners, how could you go back? When robots can raise children better than parents, with patience and discipline and teaching them with individual attention, know 1000 ways to mold their behavior and achieve healthier outcomes. Everything people do is being commodified as we speak. Soon it will be humor, entertainment, nursing, etc. Then personal relations.
Just extrapolate a decade or three into the future. Best case scenario: if we nail alignment, we build a zoo for ourselves where we have zero power and are treated like animals who have sex and eat and fart all day long. No one will care about whatever you have to offer, because everyone will be surrounded by layers of bots from the time they are born.
PS: anything you write on HN can already have been written by AI, pretty soon you may as well quit producing any content at all. No one will care whether you wrote it.
>PS: anything you write on HN can already have been written by AI
Yeah in some broad sense, the same as we've always had: back in the 2010s it could have been generated by a Markov chain, after all. The only difference now is that the average quality of these LLMs is much, much higher. But the distribution of their responses is still not on par with what I'd consider a good response, and so I hunt out real people to listen to. This is especially important because LLMs are still not capable of doing what I care most about: giving me novel data and insights about the real world, coming from the day to day lived experience of people like me.
HN might die but real people will still write blogs, and real people will seek them out for so long as humans are still economically relevant.
> PS: anything you write on HN can already have been written by AI, pretty soon you may as well quit producing any content at all. No one will care whether you wrote it.
People theoretically would care, but the internet has already set up producing things to be pseudo-anonymous, so we have forgotten the value of actually having a human being behind content. That's why AI is so successful, and it's a damn shame.
What exactly is the value of having a human behind content if it gets to the point that content generated by AI is indistinguishable from content generated by humans?
The fact that anyone would ask this question is incredible!
It's so we can in a fraction of those cases, develop real relationships to others behind the content! The whole point of sharing is to develop connections with real people. If all you want to do is consume independently of that, you are effectively a soulless machine.
What does indistinguishable even mean here?
If a fish could write a novel, would you find what it wrote interesting, or would it seem like a fish wrote it? Humans absorb information relative to the human experience, and without living a human existence the information will feel fuzzy or uncanny. AI can approximate that but can't live it for real. Since it is a derivative of an information set, it can never truly express the full resolution of its primary source.
I think "indistinguishable" is a receding horizon. People are already good at picking out AI text, and AI video is even easier. Even if it looks 100% realistic on the surface, the content itself (writing, concept, etc) will have a kind of indescribable "sameness" that will give it away.
If there's one thing that connects all media made in human history, it's that humans find humans interesting. No technology (like literally no technology ever) will change that.
I have both machine-made and hand-knit sweaters. In general, I expect handmade clothes to be more expensive than machine-made, which kinda proves my point. I never said machine-made things had zero value. I said we will tend to devalue them relative to more human-intensive things.
Asking for directions is a bad example, because it takes very little time for both humans and machines to give you directions. Therefore it would be highly unusual for anyone to pay for this service (LOL)
Sure, humanity has its ways of objecting to Accelerationism, but the process fundamentally challenges human identity:
"The Human Security System is structured by delusion. What's being protected there is not some real thing that is mankind, it's the structure of illusory identity. Just as at the more micro level it's not that humans as an organism are being threatened by robots, it's rather that your self-comprehension as an organism becomes something that can't be maintained beyond a certain threshold of ambient networked intelligence." [0]
See also my research project on the core thesis of Accelerationism that capitalism is AI. [1]
[0] https://syntheticzero.net/2017/06/19/the-only-thing-i-would-...
Who needs viewers anyway? Automate the whole thing. I just see the endgame for the internet is https://en.wikipedia.org/wiki/Dead_Internet_theory
> Judging by how they've been trying to ram AI into YouTube creators' workflows […]
Thanks for sharing that video and post!
One way to think about this stuff is to imagine that you are 14 and starting to create videos, art, music, etc in order to build a platform online. Maybe you dream of having 7 channels at the same time for your sundry hobbies and building audiences.
For that 14 year old, these tools are available everywhere by default and are a step function above what the prior generation had. If you imagine these tools improving even faster in usability and capability than prior generations' tools did …
If you are of a certain age you'll remember how we were harangued endlessly about "remix culture" and how mp3s were enabling us to steal creativity without making an effort at being creative ourselves, about how photobashing in Photoshop (pirated cracked version anyway) was not real art, etc.
And yet, halfway through the linked video, the speaker, who has misgivings, was laughing out loud at the inventiveness of the generated replies and I was reminded that someone once said that one true IQ test is the ability to make other humans laugh.
> laughing out loud at the inventiveness of the generated replies
Inventive is one way of putting it, but I think he was laughing at how bizarre or out-of-character the responses would be if he used them. Like the AI suggesting that he post "it is indeed a beverage that would make you have a hard time finding a toilet bowl that can hold all of that liquid" as if those were his own words.
"remix culture" required skill and talent. Not everyone could be Girl Talk or make The Grey Album or Wugazi. The artists creating those projects clearly have hundreds if not thousands of hours of practice differentiating them from someone who just started pasting MP3s together in a DAW yesterday.
If this is "just another tool" then my question is: does the output of someone who has used this tool for one thousand hours display a meaningful difference in quality to someone who just picked it up?
I have not seen any evidence that it does.
Another idea: What the pro generative AI crowd doesn't seem to understand is that good art is not about _execution_ it's about _making deliberate choices_. While a master painter or guitarist may indeed pull off incredible technical feats, their execution is not the art in and of itself, it is widening the amount of choices they can make. The more and more generative AI steps into the role of making these choices ironically the more useless it becomes.
And lastly: I've never met anyone who has spent significant time creating art react to generative AI as anything more than a toy.
> does the output of someone who has used this tool for one thousand hours display a meaningful difference in quality to someone who just picked it up?
Yes. A thousand hours confers you with a much greater understanding of what it's capable of, its constraints, and how to best take advantage of these.
By comparison, consider photography: it is ostensibly only a few controls and a button, but getting quality results requires the user to understand the language of the medium.
> What the pro generative AI crowd doesn't seem to understand is that good art is not about _execution_ it's about _making deliberate choices_. While a master painter or guitarist may indeed pull off incredible technical feats, their execution is not the art in and of itself, it is widening the amount of choices they can make.
This is often not true, as evidenced by the pre-existing fields of generative art and evolutionary art. It's also a pretty reductive definition of art: viewers can often find art in something with no intentional artistry behind it.
> I've never met anyone who has spent significant time creating art react to generative AI as anything more than a toy.
It's a big world out there, and you haven't met everyone ;) Just this last week, I went to two art exhibitions in Paris that involved generative AI as part of the artwork; here's one of the pieces: https://www.muhka.be/en/exhibitions/agnieszka-polska-flowers...
> Just this last week, I went to two art exhibitions in Paris that involved generative AI as part of the artwork; here's one of the pieces
The exhibition you shared is rather beautiful. Thank you for the link!
> If this is "just another tool" then my question is: does the output of someone who has used this tool for one thousand hours display a meaningful difference in quality to someone who just picked it up?
Yes, absolutely. Not necessarily in apparent execution without knowledge of intent (though often there, too), but in the scope of meaningful choices that they can make and reflect with the tools, yes.
This is probably even more pronounced with use of open models than the exclusively hosted ones, because more choices and controls are exposed to the user (with the right toolchain) than with most exclusively-hosted models.
> "remix culture" required skill and talent.
We were told that what we were doing didn't require as much skill as whatever the previous generation were doing to sample music and make new tracks. In hindsight, of course you find it easy to cite the prominent successes that you know from the generation. That's arguing from survivorship bias and availability bias.
But those successes were never the point: the publishers and artists were pissed off at the tens of thousands of teenagers remixing stuff for their own enjoyment and forming small yet numerous communities and subcultures globally over the net. Many of us never became famous, so you can't cite our fame as proof of skill, but we made money hosting parties at the local raves with beats we remixed together ad hoc and that others enjoyed.
> The artists creating those projects clearly have hundreds if not thousands of hours of practice differentiating them from someone who just started pasting MP3s together in a DAW yesterday.
But they all began as I did, by being someone who "just started pasting MP3s together" in my bedroom. Darude, Skrillex, Burial, and all the others simply kept doing it longer than those who decided they had to get an office job instead.
The teenagers today are in exactly the same position, except with vastly more powerful tools and the entire corpus of human creativity free to download, whether in the public domain or not.
I guess in response to your "required skill and talent", I'm saying that skill is something that's developed within the context of the technology a generation has available. But it is always developed, then viewed as such in hindsight.
We should collectively ignore these announcements of unavailable models. There are models you can use today, even in the EU.
Actually there is a pretty significant new model announced today and available now: "MiniMax (Hailuo)Video-01-Live" https://blog.fal.ai/introducing-minimax-hailuo-video-01-live...
Although I tried that, and it has the same issue all of them seem to have for me: if you are familiar with the face but the person is not really famous, the features in the video are never close enough to recognize them as the same person.
It was announced weeks ago.
50 cents per video. Far more when accounting for a cherrypick rate.
I don't see why, unless you think they're lying and they filmed their demos or used some other preexisting model. I didn't ignore the JWST launch just because I haven't been granted the ability to use the telescope.
Back when Imagen was not public, they didn't properly validate whether you were a "trusted tester" on the backend, so I managed to generate a few images..
..and that's when I realized how much cherry picking we have in these "demos". These demos are about deceiving you into thinking the model is much better than it actually is.
This encourages not making the models available, because people then compare their extrapolation from demo images against other models' actual outputs. That can trick people into thinking Google is winning the game.
That product name sucks for Veo, the AI sports video camera company, which literally makes a product called the Veo 2. (https://www.veo.co)
Time and money are better spent on creating actual video, animation, and art than this gen AI drivel.
Huge swathes of social media users are going to love this shit. It makes me so sad.
Google being Google:
> VideoFX isn't available in your country yet.
Don't worry, even if it was "available" in your country, it's not really available. I am in the US and I just see a waitlist sign up.
Give it a few months and it'll get cancelled
Why would the country get cancelled?
He means the project, obviously
Random fact: "Veo" means "I see" in Spanish. Take it any way you want.
Hernan Moraldo is from Argentina. That may be all there is to it.
While "video" means "I see" in Latin.
My theory as to why all the bigtech companies are investing so much money in video generation models is simple: they are trying to eliminate the threat of influencers/content creators to their ad revenue.
Think about it, almost everyone I know rarely clicks on ads or buys from ads anymore. On the other hand, a lot of people, including myself, look into buying something advertised implicitly or explicitly by content creators we follow, say a router recommended by LinusTechTips. A lot of brands started moving their ad spending to influencers too.
Google doesn't have a lot of control over these influencers. But if they can get good video generation models, they can control this ad space too, without a human in the loop.
It's so much simpler than that:
1) AI is a massive wave right now and everyone's afraid that they're going to miss it, and that it will change the world. They're not obviously wrong!
2) AI is showing real results in some places. Maybe a lot of us are numb to what gen AI can do by now, but the fact that it can generate the videos in this post is actually astounding! 10 years ago it would have been borderline unbelievable. Of course they want to keep investing in that.
> Think about it, almost everyone I know rarely clicks on ads or buys from ads anymore.
This is a typical tech echo chamber. There is a significant number of people who make direct purchases through ads.
> But if they can get good video generation models, they can control this ad space too, without a human in the loop.
Looks like it's based on a misguided assumption. Format might have a significant impact on reach, but the deciding factor is trust in the reviewer. The video format itself does not guarantee a decent CTR/CVR. It's true that those ad companies find this space lucrative, but they're smart enough to acknowledge this complexity.
> This is a typical tech echo chamber. There is a significant number of people who make direct purchases through ads.
Even if it's not, TV ads, newspaper ads, magazine ads, billboards, etc. get exactly zero click-throughs, and yet people still bought (and continue to buy) them. Why do we act like impressions are hunky-dory for every other medium, but worthless for web ads?
> Think about it, almost everyone I know rarely clicks on ads or buys from ads anymore.
I remember saying this to a google VP fifteen years ago. Somehow people are still clicking on ads today.
I hadn't thought about that angle yet, but I have to admit I agree. I rarely even pay attention to the YT ads and kind of just zone out, but the recommendations by content creators I usually watch are one of the main ways I keep up with new products and decide what to buy.
> Think about it, almost everyone I know rarely clicks on ads or buys from ads anymore.
Most people have claimed not to be influenced by ads since long before networked computers were a major medium for delivering them.