I'll be sure to post an update! It's not a Diffusion Model, but I believe this VQGAN-CLIP implementation has weights available - https://github.com/nerdyrodent/VQGAN-CLIP
Appreciate it! And I know ... I feel like we're really hitting a watershed moment with ML/DL. It looks like artists on DeviantArt have already objected to AI-generated art being allowed on the website or suggested a mandatory watermark at the very least.
Maybe I'm wrong that it was DeviantArt, but I read something about this. Sorry I don't have a source, I should've checked before mentioning it. The closest thing I could find were rules only for specific groups!
Thanks. It appears one of the nine admins posted a message to a group that has fewer than two hundred members; to me, that's neither significant nor representative of DeviantArt. It also appears they got pushback from the group.
While I didn't review much of the "art" from the group, it looked like clip-art memes with text; if so, it's a little odd that an admin would take issue with anything made using text-to-image generation.
Seems a bit daft to me. Who are they to say that AI-generated art isn't art? Couldn't you say the brush has been replaced by keystrokes? Someone still needs to type in inputs and decide what is good/bad. I can understand banning bots that automatically generate and upload stuff. I also wonder how they could prove whether a human or an AI made something if the quality is good enough.
For a creative website like that, one intended to showcase artists, I think it makes sense. Unfortunately, I don't think there's much they can do about losing the market to text-to-image models in the long run... the costs are essentially zero.
Who's to say a monkey picking stocks by throwing darts at a wall can't be a portfolio manager?
"AI Art" all looks the same to me. Just enough fuzziness in the style so you can't see the hard edges of what they copied, or rather indexed in memory as part of the training dataset, then created a small variation of that.
That might be good enough for replacing Pexels, Unsplash or any of those stock photo sites that blogs pull from. But not much else.
You're going to be proven wrong in weeks to months. Just a couple of years ago the consensus was that DALL-E/Imagen/Stable Diffusion quality image generation was impossible. Now it's very real and quality is improving every month.
Interesting, thanks for the link. It seems that CLIP encodings aren't as useful as frozen encoders from the textual domain, which is a little unintuitive imo. Can't keep up with all these advancements!
You don't think that in a few years AI art will be indistinguishable from human-generated art? In 2012, self-driving cars were a funny joke. A decade later, here they are. I think AI is somehow chronically both overestimated and underestimated.
Everything on stock photo sites is human-generated, free, and effectively infinite. In other words, commodified to the point of having a market value of $0.
I'm sure this can be monetized to generate convincing AI porn, but for non-porn uses, what will it replace? Deep fried memes?
The only photos worth money are those of real people photographed at real moments in time. Nobody ever bought a Getty subscription for photorealistic clipart.
Stock images are not free. Hiring an artist to create a concept or illustration based on your instructions is also not free. Creating art assets for games is not free. Copyright is a pretty big thing and these models currently seem to sidestep it wonderfully.
Also I can generate a photo of Leonardo DiCaprio picking his nose with a French fry, so that has some value for me.
Yeah, but the joke is on us because we're the ones who are being forced to share the road with them. I sure didn't sign up to be a part of the beta testing.
Exactly, it's just not feasible. Maybe they will train a discriminator to determine which ones are generated, but I don't think that would work very well. Also, DALL-E 2 images come with a watermark but I'm pretty sure there are already tools to remove that...
I remember the stink-eye cast by (some) artists who grew up with traditional media at how digital painting didn't count, and was somehow 'cheating'. As if the masters of the past wouldn't have loved having layers and control-z!
Same situation now. I know working artists thrilled with the idea of being able to iterate in hours what previously would have taken weeks to test composition, form, and the like as they focus in on what they have in mind. I can't wait to see what they come up with.
I think digital media is OK for most artists. Almost all the commercial science-visualization folks are using it (based on a web mini-conference panel), and a fair number of artists in the nonprofit I help out with use digital tools.
I think this is different than machine generated art.
I know a few of us who have dabbled in creating procedurally generated art have found it fun and useful for some things, but kind of soulless and less satisfying. AI art gets around that by using huge training sets of human-generated work and mimicking it. It's good and getting better, but it's not like you get exactly what you had envisioned (though you do get what you asked for).
I understand that artists are scared. Personally, I look at it as freedom. Having a tool like this in your artist's toolkit will augment your output by a long stretch.
And, of course, these models learn in part from the art generated by these artists. If we stop having humans create art, we're no longer generating data to train the next generation of models and so in some sense the "creativity" of these models would seem to necessarily be hindered.
Don't you think that the selection event that occurs when human minds choose exactly what to use and post for others to see will continue pushing things forward, even if what's driving the brush strokes changes a bit?
I suspect it will take more like 10 years at least to produce convincing video. The technology isn't too far off except that the compute requirements are pretty extreme without some clever work. Lots of clever stitching needs to be done too.
You need models that can take a description of a scene and produce a storyboard-like series of low-res images (and maybe vice versa). Then you need a model that can infer semantics and movement in logical ways between those panels, generating images to fill in the gaps. Then lots and lots of clever cleanup and resolution enhancement, both of individual frames and of the changes between neighboring frames, without introducing all kinds of weird, fuzzy, moving, dream-like artifacts.
...Then you've got to somehow add audio that ALSO understands semantics in the same way as the story board. Maybe something that can generate an audio clip to go with the storyboard. ....And then fill in the gaps based on the generated video. Making those match seems like a really hard, but not impossible problem. In the short term, a bunch of moaning at appropriate times to mouths moving and whatnot seems feasible though.
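The staged pipeline sketched in the last few comments can be made concrete as a chain of function stubs. Every name and signature below is hypothetical, just to show the dependency order (storyboard, motion interpolation, upscaling/temporal smoothing, then audio conditioned on the same storyboard); none of it is a real system.

```python
from typing import List, Tuple

Frame = List[float]  # placeholder stand-in for an image


def storyboard(description: str, panels: int = 4) -> List[Frame]:
    """Stage 1 (stub): text -> a short series of low-res keyframe panels."""
    return [[0.0] for _ in range(panels)]


def interpolate(panels: List[Frame], frames_between: int = 2) -> List[Frame]:
    """Stage 2 (stub): infer motion between panels and fill in the gaps."""
    out: List[Frame] = []
    for i, panel in enumerate(panels):
        out.append(panel)
        if i < len(panels) - 1:
            out.extend(panel[:] for _ in range(frames_between))
    return out


def upscale_and_smooth(frames: List[Frame]) -> List[Frame]:
    """Stage 3 (stub): per-frame super-resolution plus temporal consistency."""
    return frames


def add_audio(frames: List[Frame], description: str) -> Tuple[List[Frame], List[float]]:
    """Stage 4 (stub): audio generated against the same storyboard semantics."""
    return frames, [0.0] * len(frames)


video, audio = add_audio(
    upscale_and_smooth(interpolate(storyboard("a scene"))), "a scene"
)
```

The point of the sketch is just that stages 2-4 all condition on stage 1's output, which is why the storyboard model is the hard prerequisite.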
Although, I expect fairly high quality text-to-image porn is likely only a few months to a year away.
The technology is there; someone just needs to pay to train the model... and the cost of compute is what, like 300 grand? A few hundred grand more should get you enough engineering to apply existing techniques. Say $1 million in costs for a product; that seems like incentive enough if it gets a bunch of members paying a monthly fee.
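Taking the commenter's $1 million figure at face value, the break-even math is simple; the subscriber count and $20/month fee below are my own illustrative assumptions, not numbers from the thread.

```python
def months_to_break_even(total_cost: int, subscribers: int, monthly_fee: int) -> int:
    """Months of subscription revenue needed to cover a fixed up-front cost
    (ceiling division, since a partial month still has to be paid for)."""
    monthly_revenue = subscribers * monthly_fee
    return -(-total_cost // monthly_revenue)


# Assumed: $1M up-front, 5,000 subscribers at $20/month -> $100k/month
print(months_to_break_even(1_000_000, 5_000, 20))  # -> 10
```

So even with fairly modest adoption, the up-front training cost amortizes within a year, which is what makes the bet plausible.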
At the rate AI image generation is going, I highly doubt it'll take another 10 years. It was only 10 years ago that AlexNet came onto the scene and blew away image-recognition contests.
[0] https://www.assemblyai.com/blog/diffusion-models-for-machine...
I'm eager to see what the next 5 years will bring us
A very interesting time we're living in
(A few quick Google searches turned up nothing.)
https://www.deviantart.com/rtnightmare/journal/New-Group-Rul...
"AI Art" all looks the same to me. Just enough fuzziness in the style so you can't see the hard edges of what they copied, or rather indexed in memory as part of the training dataset, then created a small variation of that.
That might be good enough for replacing Pexels, Unsplash or any of those stock photo sites that blogs pull from. But not much else.
They continued building off their latent diffusion direction (encode with a VQGAN-style VAE, then run diffusion in the latent space).
All roads lead to Rome.
https://github.com/CompVis/stable-diffusion https://arxiv.org/abs/2112.10752
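The latent-diffusion recipe linked above boils down to three steps: compress the image into a smaller latent space, run the diffusion noise/denoise process there, then decode back to pixels. A toy sketch of that data flow, with a made-up average-pooling "encoder" standing in for the VQGAN-VAE and an oracle noise prediction standing in for the trained denoiser (none of this is the CompVis implementation):

```python
import math
import random


def encode(pixels: list, factor: int = 4) -> list:
    """Toy 'encoder': average-pool groups of `factor` pixels into one latent
    value. Stands in for the VQGAN-style VAE encoder; purely illustrative."""
    return [sum(pixels[i:i + factor]) / factor for i in range(0, len(pixels), factor)]


def decode(latent: list, factor: int = 4) -> list:
    """Toy 'decoder': upsample each latent value back into `factor` pixels."""
    return [z for z in latent for _ in range(factor)]


def add_noise(z0: list, alpha_bar: float, eps: list) -> list:
    """Forward diffusion in latent space: z_t = sqrt(a)*z0 + sqrt(1-a)*eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, eps)]


def denoise(zt: list, alpha_bar: float, eps_pred: list) -> list:
    """Invert the forward step given a noise prediction (here: the true noise,
    i.e. an oracle; a real model would have to predict eps)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1 - alpha_bar)
    return [(z - b * e) / a for z, e in zip(zt, eps_pred)]


random.seed(0)
pixels = [random.random() for _ in range(16)]     # stand-in "image"
z0 = encode(pixels)                               # 1) compress to latent space
eps = [random.gauss(0, 1) for _ in z0]
zt = add_noise(z0, alpha_bar=0.5, eps=eps)        # 2) diffuse in latent space
z_rec = denoise(zt, alpha_bar=0.5, eps_pred=eps)  # ...and invert it
out = decode(z_rec)                               # 3) decode back to pixels
```

The win the paper claims is exactly what the sketch shows: the expensive diffusion loop runs on the 4-element latent, not the 16-element image.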
That would lead to even higher quality images.
I support this effort.
- we will surely see sequential image synthesis by then
- we will surely see matching motion audio synthesis
- we will surely see single image to 3D reconstruction
- we will surely see haptic feedback and VR progress
- we will win.