Show HN: Generate webpage summary images with DALL-E mini

(research.google.com)

75 points | by txtai 662 days ago

9 comments

  • danso 662 days ago
    Meta: I know that official Google Research content (e.g. their AI blog) redirects to the `research.google` domain, but maybe there should be a special rule for the `colab.research.google.com` subdomain, so that it's more obvious that submissions are coming from a Colab notebook.

    (don't get me wrong, it would be very funny if Google started using AI to generate summary images for webpages, given how haphazard their Featured Snippets text extraction can be)

  • txtai 662 days ago
    This link describes a workflow to create webpage summary images with DALL-E mini. The workflow extracts text from a specified article, builds a summary and then generates an image for the summary text.
    • TOMDM 662 days ago
      This comment describes a workflow to create Hacker News summary comments. The workflow extracts text at a specified article, builds a summary and then posts as a comment the summary text.
      • Tao3300 662 days ago
        Smells snarky. OP isn't allowed to summarize?
        • TOMDM 662 days ago
          I'm joking about how the technique relates to the recent trend of generated text in general, and how the comment is somewhat self-referential: it reads like generated text while describing a use for it.

          The OP's comment is useful, and while I probably came across as snarky, I don't think I diminished the value of the comment or the content it describes.
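The extract, summarize, generate-image workflow txtai describes can be sketched as three chained steps. This is a minimal stdlib-only sketch, not the notebook's code: `ParagraphExtractor`, `summarize` and `summary_image_prompt` are hypothetical names, the "summary" is a naive first-N-words truncation standing in for the ML summarizer, and the final prompt would be handed to an image model such as DALL-E mini rather than printed.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements; a crude stand-in for a real article extractor."""

    def __init__(self) -> None:
        super().__init__()
        self.in_paragraph = False
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) still belongs to the open paragraph.
        if self.in_paragraph:
            self.paragraphs[-1] += data


def extract_text(html: str) -> str:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(p.strip() for p in parser.paragraphs if p.strip())


def summarize(text: str, max_words: int = 60) -> str:
    # Naive stand-in: keep the first max_words words. A real workflow would use an ML summarizer.
    return " ".join(text.split()[:max_words])


def build_prompt(summary: str, prefix: str = "Illustration of ") -> str:
    return prefix + summary


def summary_image_prompt(html: str, prefix: str = "Illustration of ", max_words: int = 60) -> str:
    """Extract -> summarize -> prompt; the result would then go to an image model."""
    return build_prompt(summarize(extract_text(html), max_words), prefix)


page = "<h1>War of 1812</h1><p>The War of <b>1812</b> was fought partly at sea.</p><p>Frigates clashed.</p>"
print(summary_image_prompt(page, max_words=5))
```

Keeping each step a plain function makes it easy to swap in a real extractor or summarizer without touching the rest of the chain.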

  • Tao3300 662 days ago
    That picture of "artificial intelligence" it made is hilariously derpy. It looks like frame -1 on the galaxy brain meme.
    • txtai 662 days ago
      Agreed, that one stood out to me too, though they are all fascinating in a way.

      For example, if you read the Wikipedia article on the War of 1812 and then look at the image generated, it makes sense why it chose frigates at sea. If you've been to Epcot, the image looks similar to the World Showcase around Germany/Italy, but it's original in its own way.

  • trebligdivad 662 days ago
    Any cosmologists who can say if the generated 'galaxy' image follows the mass distribution rules?
  • jtwaleson 662 days ago
    When re-running the notebook I get the same results with every execution. I'd like to see some alternatives for the same text. Any pointers on how to achieve that?
    • txtai 662 days ago
      You can change the seed. If you search the notebook for 1024, you can replace that argument with a different seed to get different results.
      • txtai 662 days ago
        You can also change the prefix (or remove it) to change how the drawings are created. Right now it's set to "Illustration of ". You can try different things like "Sketch of ", "Oil painting of ", "3d animation of " and "Mosaic of ".
        • txtai 662 days ago
          You can also change the min/max summary length, which default to 0 and 60, respectively.
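Why re-running gives identical images, and how varying the seed and prefix produces alternatives, can be illustrated with a stand-in generator. This is a sketch under stated assumptions: `fake_generate` is hypothetical and only mimics a seeded model by deriving pseudo-random "pixels" from (prompt, seed); the prefixes are the ones listed in the thread, and 1024 is the seed value mentioned above.

```python
import itertools
import random

# Prefixes mentioned in the thread; "" means no prefix at all.
PREFIXES = ["Illustration of ", "Sketch of ", "Oil painting of ", "3d animation of ", "Mosaic of ", ""]


def fake_generate(prompt: str, seed: int, size: int = 16) -> list[int]:
    """Hypothetical stand-in for an image model: pseudo-random 'pixels' fixed by (prompt, seed)."""
    rng = random.Random(f"{seed}:{prompt}")
    return [rng.randrange(256) for _ in range(size)]


def variations(summary: str, seeds, prefixes=PREFIXES) -> list[tuple[str, int]]:
    """Every (prompt, seed) combination to try for one summary."""
    return [(prefix + summary, seed) for prefix, seed in itertools.product(prefixes, seeds)]


# The same prompt and seed always give the same output; changing either gives a new one.
for prompt, seed in variations("frigates at sea", seeds=[1024, 7]):
    print(seed, repr(prompt), fake_generate(prompt, seed)[:3])
```

A fixed seed is what makes notebook runs reproducible; sweeping seeds and prefixes trades that reproducibility for a gallery of candidates.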
  • bredren 661 days ago
    I realize the point is to use AI to generate something new, but you could also submit these keywords to the Unsplash API and get relevant, useful images.
    • txtai 661 days ago
      That is a good point if you're looking for an existing image. This post is looking towards DALL-E mini as a start in generating descriptive images for any text.
  • moneywoes 662 days ago
    What’s the cost to use this on a larger scale?
    • txtai 662 days ago
      Everything in the notebook is open-source. It mainly uses the following projects:

      https://github.com/neuml/txtai https://github.com/kuprel/min-dalle

      txtai workflows can be containerized and run as a cloud serverless function - https://neuml.github.io/txtai/cloud/

      • wutbrodo 662 days ago
        I assume he meant the computational cost.
        • txtai 662 days ago
          OK, I misunderstood there. Running the code, which generates 15 images, on a standard GPU Colab environment takes about 2 minutes. It may be possible to submit a single batch of text summaries to the DALL-E mini model, which would improve performance a good deal.
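The batching idea (one call for a group of summaries instead of one call per summary) can be sketched generically. It is not confirmed here that DALL-E mini or min-dalle accepts a batch of prompts; `generate_batch` is a hypothetical callable standing in for whatever batched model call is available.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def generate_all(
    prompts: Sequence[str],
    generate_batch: Callable[[Sequence[str]], list],
    batch_size: int = 8,
) -> tuple[list, int]:
    """Call generate_batch once per batch instead of once per prompt; output order is preserved."""
    images: list = []
    calls = 0
    for batch in batched(prompts, batch_size):
        images.extend(generate_batch(batch))
        calls += 1
    return images, calls


# 15 summaries, as in the notebook run; upper-casing stands in for image generation.
prompts = [f"summary {i}" for i in range(15)]
images, calls = generate_all(prompts, lambda batch: [p.upper() for p in batch], batch_size=8)
print(calls, images[:2])
```

Any speedup would come from the model processing a batch in one pass on the GPU; the loop itself only reduces the number of calls (here from 15 to 2).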
    • fswd 662 days ago
      We looked into it. At current TPU/GPU prices it's about $0.30 to $0.60 an image.
      • txtai 662 days ago
        Was this a cost just for the DALL-E piece or the full text extraction, summarization and image generation workflow?
        • KaoruAoiShiho 662 days ago
          DALL-E mini, not DALL-E; just making sure everyone has the right expectations.
        • fswd 662 days ago
          Just DALL-E
          • txtai 662 days ago
            Thanks, good to know.
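A per-image cost estimate for hourly-billed hardware is simple arithmetic: hourly rate times seconds per image, divided by 3600. This sketch uses the 15 images in about 2 minutes from txtai's Colab run above; the $3.60/hour rate is a made-up placeholder, not a quoted price, and the estimate ignores idle time, setup and anything else billed outside that run.

```python
def cost_per_image(hourly_rate_usd: float, runtime_seconds: float, images: int) -> float:
    """Dollar cost of one image when a rented accelerator is billed by the hour."""
    seconds_per_image = runtime_seconds / images
    return hourly_rate_usd * seconds_per_image / 3600


# 15 images in about 120 seconds; $3.60/hour is a hypothetical rate for illustration only.
print(f"${cost_per_image(3.60, 120, 15):.4f} per image")
```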
  • kache_ 662 days ago
    I love Colab & I love all these cool little toy models coming out :)
  • liuxiaopai 661 days ago
    cool, I like it