How Imagen Works

(assemblyai.com)

142 points | by SleekEagle 12 days ago

13 comments

  • skinner_ 12 days ago
    > The central intuition in using T5 is that extremely large language models, by virtue of their sheer size alone, may still learn useful representations despite the fact that they are not explicitly trained with any text/image task in mind. [...] Therefore, the central question being addressed by this choice is whether or not a massive language model trained on a massive dataset independent of the task of image generation is a worthwhile trade-off for a non-specialized text encoder. The Imagen authors bet on the side of the large language model, and it is a bet that seems to pay off well.

    The way out of this dilemma is to fine-tune T5 on the caption dataset instead of keeping it frozen. The paper notes that they don't do fine-tuning, but does not provide any ablation or other justification. I wonder if it would help or not.
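
    In code, the difference between the two options is small (a minimal sketch using Hugging Face's T5; the paper actually uses the much larger T5-XXL, and the cross-attention plumbing is only gestured at here):

      import torch
      from transformers import T5EncoderModel, T5Tokenizer

      # Smaller stand-in for the T5-XXL encoder Imagen conditions on.
      tokenizer = T5Tokenizer.from_pretrained("t5-large")
      encoder = T5EncoderModel.from_pretrained("t5-large")

      # Frozen, as in the paper: no gradients ever flow into T5.
      for p in encoder.parameters():
          p.requires_grad = False

      tokens = tokenizer(["a corgi riding a bicycle"], return_tensors="pt")
      with torch.no_grad():
          text_emb = encoder(**tokens).last_hidden_state  # (1, seq_len, d_model)
      # text_emb is what the diffusion U-Nets cross-attend to.

      # Fine-tuning instead would mean leaving requires_grad=True and letting
      # the diffusion loss backpropagate through the encoder as well.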

  • varispeed 12 days ago
    > is trained on hundreds of millions of images and their associated captions

    So how do you get access to hundreds of millions of images and use them to create derivative works? Did they get consent from millions of authors?

    Or is something like that only available to the rich with access to lawyers on tap?

    I mean, I can imagine that if a nobody wanted to do something like this, they'd get bankrupted by having to deal with all the photographers / artists spotting a tiny sliver of their art in the images produced by the model.

    Furthermore, would something like this work with music? For instance, train the model on all Spotify songs and then generate songs based on "Get me a Bach symphony played on sticks with someone rapping like Dr Dre with a lisp." Or does the music industry have enough money to bully anyone into not doing that?

    • sumy23 12 days ago
      There are open datasets with that many image-text pairs. E.g. https://laion.ai/blog/laion-400-open-dataset/ There is even a dataset with 5 billion image-text pairs if you're feeling adventurous: https://laion.ai/blog/laion-5b/
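
      The metadata ships as parquet shards you can inspect before downloading any images (the images themselves are usually fetched afterwards with a tool like img2dataset). A quick sketch - the filename is a placeholder, and the URL/TEXT column names are my recollection of the published shards:

        import pandas as pd

        # One metadata shard: rows of image URLs, captions, sizes, etc.
        df = pd.read_parquet("laion400m-meta/part-00000.parquet")  # placeholder path
        pairs = df[["URL", "TEXT"]]  # image URL + its caption
        print(len(pairs), "image-text pairs in this shard")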
      • varispeed 11 days ago
        I didn't know about this! Thank you
      • SleekEagle 12 days ago
        Presumably Google's terms of service or fair use laws. The real restriction is that, even if you had the dataset, training costs tens of thousands of dollars. Only corporations can really afford to train these things.

        Regarding music - audio generation with Diffusion Models (the main component of Imagen and DALL-E 2) has been done, but I'm not sure about music specifically. We will definitely reach the point where most e.g. pop beats can be made by AI relatively soon.

        All a producer has to do is generate 100 beats and select the one s/he likes, potentially interpolating between two or fine-tuning one.
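
        That workflow is just best-of-N sampling with a human (or a critic model) as the selector. A sketch, where generate() and score() are hypothetical stand-ins for a generative audio model and a preference scorer:

          import random

          def generate(prompt: str, seed: int):
              # Stand-in for a call to a generative model.
              rng = random.Random(seed)
              return {"seed": seed, "sample": [rng.random() for _ in range(8)]}

          def score(candidate) -> float:
              # Stand-in for a learned critic or a human rating.
              return sum(candidate["sample"])

          candidates = [generate("lofi pop beat, 90 bpm", seed=i) for i in range(100)]
          best = max(candidates, key=score)
          print("picked seed", best["seed"])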

        • astrange 12 days ago
          This is a real issue, but it's solvable with work.

          It's claimed that ML models' output isn't copyrightable because it's fair use, but that's hard to believe; a large model can easily memorise one of its inputs and output it verbatim. This is easier to see with text, where GPT and Copilot both do it, but image models can do it too.

          > So how do you get access to hundreds of millions of images and use them to create derivative works? Did they get consent from millions of authors?

          Build the model out of Creative Commons images only. There's a lot of 'em and it's good enough. You may need to exclude CC-BY since they currently can't follow the attribution requirement.

          > Or is something like that only available to the rich with access to lawyers on tap?

          More likely companies willing to license a stock photography database.

          • davikr 12 days ago
            I've seen an image generated by AI contain an "Alamy" watermark before.
          • astrange 12 days ago
            Is there a compare and contrast between Imagen and Parti anywhere? I realize the paper came out yesterday, but maybe other people remember what "autoregressive" means better than I do.
            • SleekEagle 11 days ago
              Upon first inspection, Parti is not as good. This is perhaps unsurprising - for DALL-E 2's prior, the authors tested both an autoregressive and a diffusion model, and the diffusion model performed better.
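
              For anyone fuzzy on the distinction, a toy sketch of the two sampling regimes - nothing Imagen- or Parti-specific, and both `model` and `denoiser` are hypothetical callables:

                import torch

                # Autoregressive (Parti-style): the image is a sequence of discrete
                # tokens predicted one at a time, each conditioned on the text and
                # on all previous tokens.
                def autoregressive_sample(model, text_emb, seq_len):
                    tokens = []
                    for _ in range(seq_len):
                        logits = model(text_emb, torch.tensor(tokens).long())
                        tokens.append(int(torch.multinomial(logits.softmax(-1), 1)))
                    return tokens  # a separate tokenizer decodes these to pixels

                # Diffusion (Imagen-style): start from pure noise and repeatedly
                # denoise the whole image at once, conditioned on the text.
                def diffusion_sample(denoiser, text_emb, steps, shape):
                    x = torch.randn(shape)
                    for t in reversed(range(steps)):
                        x = denoiser(x, t, text_emb)  # slightly less noisy image
                    return x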
            • Workaccount2 12 days ago
              I have shown imagen (and dalle2) to a number of people now (non-tech, just everyday friends, family, co-workers) and I have been pretty stunned by the response I get from most people:

              "Meh, that's kinda cool? I guess?" or "What am I looking at?"..."Ok? So a computer made it? That seems neat"

              To me I am still trying to get my jaw off the floor from 2 months ago. But the responses have been so muted and shoulder-shrugging that I think either I am missing something or they are missing something. Even really drilling in, practically shaking them "DO YOU NOT UNDERSTAND THAT THIS IS AN ORIGINAL IMAGE CONSTRUCTED ENTIRELY BY AN AI?!?!" and people just seem to see it as a party trick at best.

              • dougmwne 12 days ago
                I think I can explain this: for most people, the whole world is basically magic anyway. They don’t understand any of the details of how any digital tech works, so they have no framework for which things are impressive and which things are not. They just know that computers can do a great many things that they know nothing about. “Oh, I can bank online? Ok.” “Oh, I can have the computer write my book report for me? Ok.” “Oh, this McDonalds is fully staffed by sentient robots? Ok.”
                • GrabbinD33ze69 12 days ago
                  A pretty common pattern I've witnessed among non-technical people (even people who are tech savvy but have no CS background do this) is assuming that a feature which is in reality quite difficult to implement won't take much effort, and vice versa.
                • endymi0n 12 days ago
                  I think that hits home.

                  A lot of people would just answer something along the lines of "Well, they made The Matrix with a computer 20 years ago", and technically that's just as true.

                  From their remote viewpoint on what's happening in IT, the rest is an implementation detail to them.

                  • mortenjorck 12 days ago
                    This is the other side of the classic XKCD "Tasks" (https://xkcd.com/1425/).

                    A non-technical person in 2014 (when the above was originally published) would likely have the same conception of the difficulty of recognizing a bird from an image as they would in 2022, even though the task itself has gone from near-insurmountable to off-the-shelf-library in eight years.

                    Even as Imagen and Dall-E 2 amaze us today, these feats will likely be commonplace in a few years. The non-technical may have only a vague sense that their new TikTok filter is doing something that was impossible only a few years prior.

                    • dougmwne 12 days ago
                      Exactly, and I was thinking of that XKCD. Very much a case in point: I have the Merlin Bird ID app, which can determine species from ridiculously blurry photos and can also identify hundreds of birds from their calls alone in noisy environments. In 2014 I would have sworn this would be impossible.
                      • DonHopkins 11 days ago
                        The tooltip you get when you hover your cursor over the comic:

                        "In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it."

                        I'm working with his son Henry Minsky and other great people at Leela AI on that same old problem, applying hybrid symbolic-connectionist constructivist AI by combining neat neural networks with scruffy symbolic logic to understand video, and it's mind boggling what is possible now:

                        https://leela.ai/

                        >Our AI system, Leela, is motivated by intrinsic curiosity. Leela creates theories about cause and effect in her world, and then conducts experiments to test these theories. Leela can connect all her knowledge and use this network to make plans, reason about goals, and communicate using grounded natural language.

                        >Leela has at her core a hybrid symbolic-connectionist network. This means that she uses a dynamic combination of artificial neural networks and symbol networks to learn. Hybrid networks open the door to AI agents that can build their own abstractions on the fly, while still taking full advantage of the power of deep learning.

                        https://en.wikipedia.org/wiki/Neats_and_scruffies

                        >Neats and scruffies: Neat and scruffy are two contrasting approaches to artificial intelligence (AI) research. The distinction was made in the 70s and was a subject of discussion until the middle 80s. In the 1990s and 21st century AI research adopted "neat" approaches almost exclusively and these have proven to be the most successful.

                        >"Neats" use algorithms based on formal paradigms such as logic, mathematical optimization or neural networks. Neat researchers and analysts have expressed the hope that a single formal paradigm can be extended and improved to achieve general intelligence and superintelligence.

                        >"Scruffies" use any number of different algorithms and methods to achieve intelligent behavior. Scruffy programs may require large amounts of hand coding or knowledge engineering. Scruffies have argued that the general intelligence can only be implemented by solving a large number of essentially unrelated problems, and that there is no magic bullet that will allow programs to develop general intelligence autonomously.

                        >The neat approach is similar to physics, in that it uses simple mathematical models as its foundation. The scruffy approach is more like biology, where much of the work involves studying and categorizing diverse phenomena.

                        We're looking for talented engineers and designers to help, including neats and scruffies working together!

                        https://leela.ai/jobs/

                      • DonHopkins 11 days ago
                        That is exactly what Will Wright (the creator of SimCity and The Sims, and Robot Wars / Battle Bots contestant) was getting at when we made these one-minute robot reality videos about "Empathy" and "Servitude".

                        His idea was to probe just how much random people on the street (or in a diner) would believe about autonomous intelligent robots operating in the real world.

                        Of course we were actually hiding behind the scenes tele-operating the robots through hidden cameras and a wireless web interface, listening to what the people said and making the robots respond with a voice synthesizer and sound effects, clicking on pre-written phrases and typing ad-libbed responses.

                        Empathy (a broken down robot begs for help from passers by on the streets of Oakland):

                        https://www.youtube.com/watch?v=KXrbqXPnHvE

                        Servitude (a robot waiter takes orders and serves food in a diner in Oakland, making stupid mistakes and asking for a good review):

                        https://www.youtube.com/watch?v=NXsUetUzXlg

                        Not all his robots are as harmless, non-violent, polite, and obsequious as those two. Here's an old interview with Will at Robot Wars 1997:

                        https://www.youtube.com/watch?v=5nmbs0WqDQM

                        Here is Super ChiaBot and her MiniBots, created by Will and his daughter Cassidy, getting her leaves shredded and body slammed at BattleBots:

                        https://www.youtube.com/watch?v=DrArvRG2yQA

                        Here's a more recent video of Will throwing a tantrum about the failure of SimSandwich, destroying his old creations because they're pixely and poorly rendered, then complaining about how those jerks at EA hate him:

                        https://www.youtube.com/watch?v=i-7F7s46-9A

                        • firecall 12 days ago
                          "Oh, they have the internet on computers now!" Homer J Simpson.
                        • jazzyjackson 12 days ago
                          I think if you've been paying attention to the space, this generation of image diffusion is shocking in how quickly it has improved on what we had a year ago.

                          But if you've never considered that a computer can produce an original image, this is just a new thing computers can do. OTOH I think there's also a lack of imagination about how useful this is; so far the output has been kind of random, so it seems a little gimmicky. Already "Parti" has gotten much closer to allowing a user to describe exactly what they want in the image, and as people start to see the use cases for them personally, it will hit them that they no longer have to hire someone - they can just type a request into a box.

                          • astrange 12 days ago
                            You can just type a request in the box if you don't particularly care what the result looks like and also don't care that some of the features might be copyrighted (since large models are quite capable of memorizing their training data.)

                            Asking for two different images in a series that have similar "art styles" is going to be enough work to still need a specialist aka an artist; it'll be most useful in cases you never would've bothered finding one before.

                            • treesprite82 11 days ago
                              > Asking for two different images in a series that have similar "art styles" is going to be enough work to still need a specialist aka an artist

                              Running a separate style transfer network on the generated images is currently possible, although it won't achieve the best possible results.

                              I wouldn't be surprised to see, in the near future, generation models that can take a text prompt plus an image to mimic the style of, which would let them take style into account when generating the image rather than just at the surface level.
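
                              In code the two options would look roughly like this (every component here is a hypothetical stand-in, not a real API):

                                import torch

                                # (a) Post-hoc: generate from text, then restyle the result
                                # with an off-the-shelf style transfer model.
                                def posthoc(generate, style_transfer, prompt, style_image):
                                    return style_transfer(content=generate(prompt), style=style_image)

                                # (b) Built-in: embed the reference image once and feed that
                                # embedding into every denoising step alongside the text, so
                                # style shapes the generation itself rather than the surface.
                                def styled_sample(denoiser, style_encoder, text_emb, style_image, steps=50):
                                    style_emb = style_encoder(style_image)
                                    x = torch.randn(1, 3, 256, 256)
                                    for t in reversed(range(steps)):
                                        x = denoiser(x, t, text_emb, style_emb)
                                    return x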

                              • astrange 11 days ago
                                Since “style” isn’t at the surface level, you can’t take it into account with a single input. It means whatever the client wants it to mean. Getting an AI to do what you want there is still going to be a long conversation that clients won’t want to have.

                                It might be (and likely will be) easier to use the AI as a storyboard generator and have your in-house artists redraw it.

                            • SleekEagle 12 days ago
                              I'm not sure there has been a period of more rapid development in DL than the current wave of Diffusion Models (maybe transformers?). The next few years will be really interesting.
                            • rasz 12 days ago
                              It's because people have been able to do this for years now, and so have you. You can try right now: go to Google, type "cat on a bicycle" and hit image search. TADA, the computer made cat-on-a-bicycle images appear! Where's the magic in that?

                              >THIS IS AN ORIGINAL IMAGE

                              Yeah, about that. Ask it to draw you a fast inverse square root.

                              • ratioprosperous 12 days ago
                                I love this comment, and I'd love to see an AI conjecture an explanation as to why I love it...
                              • clircle 12 days ago
                                People don't care because all their text-to-image needs are well covered by Google Images.
                                • monkeybutton 12 days ago
                                  Perhaps it's the combination of AI being so overhyped among the general public, plus media that's already inundated with CGI, that keeps it from blowing them away?
                                  • joshcryer 12 days ago
                                    I've made perhaps overly absolutist statements like "don't you see! this kills artists' jobs!" and it was shrugged off as if I was insane. I probably could've phrased it differently, but to me this is game changing in several fields. Granted, it will open up a new field of "generative artists" but, having played with these things, it's a pretty trivial job, and the trained networks are only going to get better.
                                    • Uehreka 12 days ago
                                      I’ve had a lot of fun playing with Disco Diffusion prompts, but I agree that the people excited about “a generation of prompt artists” are a bit misguided. Soon an AI will emerge that can come up with “better” prompts than you, and the “art” of creating prompts will have a lower skill ceiling.
                                      • tsol 12 days ago
                                        Like a neural network just for making prompts that result in aesthetically pleasing Imagen images? And then maybe we can come up with a neural net that can decide which pictures are good and which aren't. Then we can just have robots making art for the sake of consumption solely by robots.
                                        • russdill 12 days ago
                                          The GPT algorithms are actually pretty good at making detailed image generation prompts if you ask it to describe in detail the general idea you want.
                                          • SleekEagle 12 days ago
                                            Do you have a link to any papers about this? Would love to check them out
                                            • russdill 12 days ago
                                              No, just playing around with dall-e mini (no access yet to anything else) and beta.openai.com's text-davinci-002 model. For instance, if I ask dall-e mini for "painting with dancers":

                                              https://i.imgur.com/flXoTgZ.png

                                              I can ask davinci-002 "Vivid description of a painting with dancers:" and get:

                                              The painting is of two dancers in a passionate embrace, their bodies entwined as they move together in a sensual dance. The woman's dress is flowing and reveals her curves, while the man's shirt is open, revealing his muscular chest. They are surrounded by a crowd of people who are watching them with looks of admiration and desire. The painting is full of color and movement, and the dancers seem to be in a world of their own, lost in their passion for each other.

                                              And then pass that to dall-e mini:

                                              https://i.imgur.com/eOIQuPF.png

                                              dall-e mini is sadly not quite up to the challenge, but it gives the generation a lot more detail. Some other examples:

                                              "The painting is of two dancers in the middle of a dance. They are both wearing white, and their hair is flowing around them as they move. The background is a blur of color, and the light is shining on the dancers, making them look like they are in the spotlight."

                                              https://i.imgur.com/ldktMHO.png

                                              "The painting is full of energy and movement, with the dancers leaping and spinning around the stage. They are all wearing brightly coloured costumes, which stand out against the dark background. The light from the stage spotlight is shining on them, making them look even more vibrant. The whole scene is full of life and excitement."

                                              https://i.imgur.com/1KFJbzJ.png
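
                                              The whole pipeline is two calls: expand the idea with the completions API, then hand the result to your image model. A sketch against the mid-2022 OpenAI API (the key is a placeholder):

                                                import openai

                                                openai.api_key = "sk-..."  # placeholder

                                                idea = "a painting with dancers"
                                                resp = openai.Completion.create(
                                                    engine="text-davinci-002",
                                                    prompt=f"Vivid description of {idea}:",
                                                    max_tokens=150,
                                                )
                                                detailed = resp.choices[0].text.strip()

                                                # `detailed` then replaces the short idea as the
                                                # prompt for dall-e mini / Imagen / etc.
                                                print(detailed)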

                                        • danielvaughn 12 days ago
                                          To me, it paves the way for creative prototyping. I don't see this as a zero-sum game between artists and AI. Instead, I could see artists using this for some serious time saving, and leveraging that extra time and energy for creating better results.
                                          • SleekEagle 12 days ago
                                            It could also be used for more nefarious purposes like disinformation campaigns though... it will be interesting to see what the next few years have in store
                                            • astrange 12 days ago
                                              You don't need good-looking pictures for propaganda. Old people (the main targets) believe literally anything they see on Facebook, especially if it confirms their priors aka fits their worldview, and prefer it to look bad because that's more authentic. For anyone else, the point is to make them disbelieve everything, not to believe you specifically.
                                          • corysama 12 days ago
                                            Over a decade ago, Will Wright (of SimCity fame) faked conversational AGI robots in the streets and restaurants of Oakland. It consistently took people 2.4 seconds to go from “Oh look. The robots have arrived.” to “And, I’ll have fries with that.”

                                            Hollywood and the media have taught the public that tech is literally magic and can do literally anything. “Anything” is expected and pedestrian.

                                            • ALittleLight 12 days ago
                                              I often think a similar thing about aliens. That is, instead of the panic and hysteria that fiction imagines might accompany the discovery of aliens, I fully expect that people will mostly go "Oh, neat. Aliens." and go on with their lives.
                                            • thruuavay 12 days ago
                                              Well, I'm still in awe that I have a bunch of walls around me and can cover my body with clothes, or that I'm still alive after all this time, and that I can even rest most of the day and not spend body energy running after or from animals. Amazing stuff.

                                              A program that transforms text to an image? Huh.

                                              • Genbox 12 days ago
                                                I find that most people are primarily driven by a need. You need food? Pick some berries. You need warmth? Start a fire.

                                                When it comes to technology - especially advanced technology like Imagen - people don't see the value because they don't have a need associated with it.

                                                • Wistar 12 days ago
                                                  I haven't gotten such dismissive responses, but probably only because those I'm inclined to share such things with are the exact kinds of people who'd be blown away by them, and immediately grasp the significance.
                                                  • ja3k 12 days ago
                                                    I couldn't convince my mother-in-law it was more impressive than Photoshop.
                                                    • madiator 12 days ago
                                                      Similarly, when I sometimes talk to people about AI (and AGI) and how it will change the world, people respond, meh, yeah ok, so what?
                                                      • SleekEagle 12 days ago
                                                        I've gotten a lot of "wow, that's cool!"s, which is a pretty fair response for a non-technical person if you ask me!
                                                        • yreg 12 days ago
                                                          Non-techy people understandably don't have a grasp of the difficulty of (programming) tasks. I think that makes it hard for them to get amazed in cases like this.

                                                          https://xkcd.com/1425/

                                                          • trention 12 days ago
                                                            It's just an illustration of the fact that the average person doesn't give a sh*t about AI "art" and that it will have ~zero cost and ~zero value.
                                                            • phailhaus 12 days ago
                                                              Treating Imagen as just an "AI art generator" is extremely short sighted. Sure, you could just try to sell the outputs directly. But the real value is using it to supplement larger works. No need for a stock photo subscription service if you can just generate them automatically. Don't need artists to create textures for your simple games. I can spin up a merch shop powered entirely by AI art and nobody would know. The marginal cost of creation is approaching zero.
                                                              • SleekEagle 12 days ago
                                                                And perhaps even more interestingly these things not only exist but there is competition in this space! Essentially unregulated competition as well (and likely for the next 10 years). The cost will be driven into the ground.
                                                              • Miraste 12 days ago
                                                                The apocryphal Henry Ford quote about the average person wanting better horses comes to mind. People off the street have no concept of the impact this tech and the methods behind it will have. Sure, no one is going to be printing these and hanging them in museums. Very few artists support themselves that way, though. The people diffusion models are coming for are the graphic designers, the concept artists, the marketers, and everyone else with a copy of Photoshop and a Getty subscription. GPT-3 is amazing, but it's also not good enough to be useful. Imagen is industry-destroying.
                                                                • trention 12 days ago
                                                                  Although I agree that a somewhat less extreme version of that will happen in the course of this decade, barring a legal decision to prohibit using those models, it won't translate to comparable revenues. The companies providing those services will struggle to make even 10% of the displaced workers' salaries in revenue. In fact, this will probably be a GDP-destroying (though not value-destroying) application of technology.
                                                                  • SleekEagle 12 days ago
                                                                    It's not about generating more revenue, it's about cutting costs. Any company that employs graphic designers etc. will be able to cut 90% of the staff.

                                                                    Video game companies that need concept art? How about one guy/gal with Imagen generating baselines and then curating/tailoring as necessary, instead of a team of 5.

                                                                    • trention 12 days ago
                                                                      That has nothing to do with anything I wrote. And doesn't contradict it actually.

                                                                      Saved costs will not translate to higher margins for those that cut them because all competitors will be able to slash them as well, resulting in lower prices across the board.

                                                                • bergenty 12 days ago
                                                                  With the amount of context awareness this AI has, there’s nothing all that special about human “art” to be honest.
                                                                  • trention 12 days ago
                                                                    I am willing to bet that the revenue from AI-generated "art" will be smaller than the revenue from human-generated art in 5 years (or even 10 years) despite the former probably being at least 2 orders of magnitude higher in volume. This is basic supply and demand + acknowledging the fact that humans don't care about AI "achievements".
                                                                    • bergenty 12 days ago
                                                                      AI achievements will be indistinguishable from human achievements. Humans will try to pass off AI achievements as their own. The line will become so blurred that it will be impossible to tell the difference.
                                                                      • trention 12 days ago
                                                                        If that happens, all art will simply have no value and art as % of GDP will plummet.

                                                                        Incidentally, this hasn't happened in areas where AI already dominates like chess and go. Magnus Carlsen alone probably generates more "revenue" than all chess AIs combined.

                                                                        • astrange 12 days ago
                                                                          In general, it's not possible for machines to replace labor - this is the Luddite fallacy. If the machines do exactly what you ask them to do this becomes even more true, because labor has the comparative advantage that they'll do things you don't know to ask for.

                                                                          It is possible for the labor to quit and find something better to do, as happened to elevator operators, but that's a good thing.

                                                                          In the case of chess, AIs don't want money and Magnus does, so they're not going to help you find ways to get more of it.

                                                                          • Workaccount2 11 days ago
                                                                            If there was a way to have an AI feed you moves without being caught, then I am positive Carlsen wouldn't be at the top for long.
                                                                            • trention 10 days ago
                                                                              He wouldn't be at the top only if he wasn't using said method while his competitors were. This is of course missing the point. If chess enthusiasts knew that grandmasters were using AI to aid them when playing in tournaments, the interest (and the revenues) would simply plummet.
                                                                • coding123 12 days ago
                                                                  Is this by a person that knows or is guessing?
                                                                • sagarpatil 12 days ago
                                                                  I wonder how developers can monetise this? What use cases does it have?
                                                                  • natch 12 days ago
                                                                    > Imagen, released just last month, can generate high-quality, high-resolution images given only a description of a scene

                                                                    “Released”? What? Papers are published. Websites are published. Tools are “released.”

                                                                    Where has Imagen been released?

                                                                    • bpiche 12 days ago
                                                                      This implementation popped up on hacker news not too long ago. I got it working on Colab first, and then my own GPU at home. But just barely. Need more memory :)

                                                                      https://github.com/lucidrains/imagen-pytorch
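
                                                                      A condensed sketch along the lines of the repo's README (argument values from memory, so treat the details as approximate):

                                                                        import torch
                                                                        from imagen_pytorch import Unet, Imagen

                                                                        # Base 64x64 U-Net plus a 64->256 super-res U-Net, as in the paper.
                                                                        unet1 = Unet(dim=32, cond_dim=512, dim_mults=(1, 2, 4, 8),
                                                                                     layer_attns=(False, True, True, True),
                                                                                     layer_cross_attns=(False, True, True, True))
                                                                        unet2 = Unet(dim=32, cond_dim=512, dim_mults=(1, 2, 4, 8),
                                                                                     layer_attns=(False, False, False, True),
                                                                                     layer_cross_attns=(False, False, False, True))

                                                                        imagen = Imagen(unets=(unet1, unet2), image_sizes=(64, 256), timesteps=1000)

                                                                        # Training: images plus raw caption strings (T5 encoding is handled
                                                                        # internally); each U-Net is trained separately.
                                                                        images = torch.randn(4, 3, 256, 256)
                                                                        loss = imagen(images, texts=["a whale breaching at sunset"] * 4, unet_number=1)
                                                                        loss.backward()

                                                                        # Sampling (only meaningful once trained):
                                                                        samples = imagen.sample(texts=["a whale breaching at sunset"], cond_scale=3.)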

                                                                      • Voloskaya 12 days ago
                                                                        The value is in the data and the trained weights; the implementation is not where the bottleneck is in terms of reproducing those models.

                                                                        Still, great work from the author, but we most definitely cannot say that Imagen has been released.

                                                                        • echelon 12 days ago
                                                                          Are there any large publicly available models, ready to fine tune and deploy, that were trained on massive data sets?

                                                                          I really want to build services with these.

                                                                        • stavros 12 days ago
                                                                          Wait, so I can try this on Colab right now?
                                                                          • refulgentis 12 days ago
                                                                            No - something that's been causing a lot of confusion in AI art is that people stand up quick implementations roughly matching the description in the paper, but they're not really investing in training them. Then people see "imagen-pytorch" on GitHub and get confused, thinking either that it's Imagen itself or that it's a suitable replica of it.

                                                                            There's like 3 projects named DallE, and then the 2 real DallEs...frustrating.

                                                                            • joshcryer 12 days ago
                                                                              People are really thirsty to play with this tech, you can't blame them. Just search for dataset creators on Hugging Face. I'd link directly to several of them running but it would just overwhelm the creators. If you want to be in early you'll find them. The beautiful thing is open source is going to make this stuff available for everyone and in very short timeframe. It's crazy how fast it moves.
                                                                              • yreg 12 days ago
                                                                                >The beautiful thing

                                                                                I'm very much looking forward to being able to play with this tech asap. I'm still excited about AIDungeon.

                                                                                However, OpenAI and creators of the other big name models are restricting the access for good reasons and I'm unsure whether it will be a beautiful thing once it's available for everyone…

                                                                                • paconbork 12 days ago
                                                                                  Eh, it exists and it's an inevitability that it will eventually be used in terrible ways. OpenAI and Google people just want the CV boost for having created it, but want to pretend it's not their fault when it's used to do a racism/sexism.
                                                                                  • bpiche 12 days ago
                                                                                    I agree. Tay is still fresh in a lot of folks’ minds.
                                                                              • spullara 12 days ago
                                                                                It is a suitable replica of it. Just isn't trained.
                                                                                • mapt 12 days ago
                                                                                  "I gave you an open implementation of NAND and NOR gates. That's the core of this groundbreaking CPU. Just finish the job!"
                                                                                  • natch 12 days ago
                                                                                    But the training is the thing that would make it suitable.
                                                                                    • bpiche 12 days ago
                                                                                      I mean, you try training this thing without a warehouse full of GPUs… to me, the algorithm is just as interesting as the model. Perhaps more so.
                                                                                      • natch 12 days ago
                                                                                        "This thing" has already been trained. Nobody is saying the algorithm is not interesting. Just that "this thing" has not been released.
                                                                                        • bpiche 12 days ago
                                                                                          Yes, nobody said one way or another. I chose to shine a light on the algorithm. What’s your point?
                                                                                          • natch 10 days ago
                                                                                            Rather rude to ask it that way. My point is there for you to read, or miss. Up to you.
                                                                          • alexccccc 12 days ago
                                                                            Super interesting
                                                                            • dubswithus 12 days ago
                                                                              If Google has something similar or better it definitely makes it look like OpenAI is wasting its time. None of this relates to AGI.
                                                                              • SleekEagle 12 days ago
                                                                                I don't think anyone is saying that humanity is close to AGI, but check out DeepMind's Gato work for a more well-rounded agent:

                                                                                https://www.deepmind.com/publications/a-generalist-agent

                                                                                • visarga 12 days ago
                                                                                  I think we're past a certain threshold, maybe not AGI but some definite qualitative change is happening.
                                                                                  • SleekEagle 12 days ago
                                                                                    I mean DALL-E 2 was the first time my jaw really hit the floor, although in fairness GPT-3 probably should've done that, but it's easier to do with images.

                                                                                    And then for this to drop just a month later? Insane. It makes you wonder whether they're actually releasing their cutting edge, or whether Google decided to write this paper just because of the publication of DALL-E 2. Maybe they've had this model in the bag for a year.

                                                                                    • alphabetting 12 days ago
                                                                                      Google also released this different text-to-image model yesterday

                                                                                      https://parti.research.google/

                                                                                      I think they've just got a lot of projects going on under the hood and timing was coincidence.

                                                                                      • SleekEagle 12 days ago
                                                                                        Looks cool, although not as good as Imagen. Autoregressive vs diffusion, I guess
                                                                                      • astrange 12 days ago
                                                                                        It seems you can do a lot by making a really big model, but it'd be more impressive to do a lot with a small model, or build one that can explain itself and its "inspirations" in the training data.

                                                                                        WebGPT can do the last one, and seems more useful than GPT3, but also like less of a magic trick so it might not impress people as much.

                                                                                    • Veedrac 12 days ago
                                                                                      Lots of people are saying that. I am saying that. OpenAI has it as a foundational mid-term goal.
                                                                                  • funstuff007 12 days ago
                                                                                    What's the highest price paid for an AI-generated image NFT?
                                                                                    • lofatdairy 12 days ago
                                                                                      Unfortunately it seems like it's greater than 0...

                                                                                      If we ignore the procedurally generated NFTs created from mixing and matching various assets and go with ones where AI is the selling point, we're left with a few notable ones: Sophia, a robot w/ some low-level AI, sold a single piece for 689k USD [^1]. Botto, a VQGAN-based algorithm, sold a single piece for 430k USD [^2] and has sold multiple other pieces for tens to hundreds of thousands of dollars. Slightly more modest are some other projects like Metascapes [^3] and Eponym [^4], which produced some really tedious pieces that managed to sell for 3.5k USD and 10k USD respectively. That said, the Eponym piece seems to be some sort of self-promotion, so maybe we can say that the actual prices for these collections are somewhere in the fraction-of-an-ETH range, if they can be sold at all.

                                                                                      Honestly, only the Botto piece is remotely interesting to look at, and even then I feel as if the blurred, "dreamy" aesthetic that seems to be in so many different AI painting approaches (style transfer, VQGANs, DALL-E, maybe others I'm not aware of) has worn thin. I think it was more interesting back when we could pretend that these were the electric sheep at the fringes of some deep-sleeping latent intelligence, but now they just feel kinda arbitrary and lacking deliberation. I absolutely love the field and think these researchers have done tremendous work, but I feel as though all the lay news attention is on the art, and not on the algorithm that generated it. The fascinating thing is that we have a machine that can produce something novel from words or basic ideas, and that the output's content retains these ideas - not so much that the art itself has that much compositional or stylistic merit.

                                                                                      [^1]: https://niftygateway.com/itemdetail/primary/0xbe60d0a37ebde6...

                                                                                      [^2]: https://superrare.com/artwork-v2/scene-precede-29922

                                                                                      [^3]: https://opensea.io/assets/ethereum/0x75d639e5e52b4ea5426f2fb...

                                                                                      [^4]: https://opensea.io/assets/ethereum/0xaa20f900e24ca7ed897c44d...

                                                                                    • aceon48 12 days ago
                                                                                      AI is now creative
                                                                                      • DonHopkins 12 days ago
                                                                                        Wait, this isn't about the line of intelligent xerographic laser printers developed by Imagen Corporation in 1981, supporting the Impress printer language?

                                                                                        https://tug.org/TUGboat/tb02-2/tb03imagen.pdf

                                                                                        https://www.openprinting.org/driver/imagen

                                                                                        • SleekEagle 12 days ago
                                                                                          How do you think it prints the images!