- Fanfiction: https://www.reddit.com/r/harrypotterfanfiction/comments/1bn6... (includes some sounds from outside Eggnog)
- Comedy: https://x.com/jitsvm/status/1771609353725919316?s=20
- Post-apocalyptic vibes: https://x.com/saucebook/status/1771212617601659279?s=20
We got into making funny AI videos over the last year, but were annoyed that the characters looked different in every scene. That made it harder to make cool videos and harder for our friends to follow the plot.
Diffusion models, like those that make AI videos, start with random noise, then add detail. So the little things that make a character recognizable will almost always come out looking different across generations, no matter how many tricks you add into the prompt. For instance, if you want your wizard to have a red hat, it might be crooked in one generation and straight in the next.
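For intuition, here's a toy sketch of that sampling loop. The denoiser is a placeholder for a trained, prompt-conditioned network; the point is just the structure: start from pure noise and strip a little away each step.

```python
# Toy sketch of a DDPM-style reverse diffusion loop. The "denoiser" is a
# stand-in for a trained, prompt-conditioned network.
import numpy as np

rng = np.random.default_rng()  # fresh randomness each run -> different details

def predict_noise(x, t):
    # Placeholder for the learned noise predictor eps_theta(x, t).
    return np.zeros_like(x)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

x = rng.standard_normal((64, 64, 3))  # begin with pure Gaussian noise
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # Posterior mean: remove the noise predicted for this step
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        # Re-inject a small amount of randomness; this is why the wizard's
        # hat can be crooked in one run and straight in the next
        x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
```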
Eggnog allows users to make consistent characters by using Low-Rank Adaptation (LoRA). LoRA takes in a set of images of a character in different scenarios and uses those images to teach the model that character as a concept. We do this by taking a single prompt that a user writes for a character (e.g., an ancient Greek soldier with dark hair and a bushy beard) and turning it into a training set of images of the character in different poses, shot from different angles. Once the character is trained into the model, the user can then invoke that character concept in the prompt and get consistent generations about 80% of the time.
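To make that concrete, here's a rough sketch of what invoking a trained character can look like with Hugging Face diffusers. The base model, weight path, and "sks soldier" trigger token below are illustrative stand-ins, not our production stack:

```python
# Illustrative sketch of invoking a character LoRA at generation time with
# Hugging Face diffusers. Model, paths, and the trigger token are
# hypothetical examples, not Eggnog's actual pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# LoRA weights fine-tuned on the auto-generated image set of the character,
# bound to a rare trigger token so the prompt can summon the concept.
pipe.load_lora_weights("path/to/lora_dir", weight_name="character_lora.safetensors")

image = pipe(
    "a photo of sks soldier, an ancient Greek soldier with dark hair and a "
    "bushy beard, marching through a temple courtyard",
    num_inference_steps=30,
).images[0]
image.save("soldier.png")
```

The trigger token matters: a rare token avoids colliding with concepts the base model already knows, so the learned character stays cleanly addressable from the prompt.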
There is still a lot of room to make Eggnog generations more consistent and controllable. Generations sometimes come out with the wrong gender, miss key details of the costume, or fail in a long tail of other ways. We also often struggle to control the character’s exact body movement. We’re planning to address these cases by further optimizing the prompt that invokes the character concept and by using new open-source video models that bake in 3D representations of humans.
The other fun thing about making characters with Eggnog is that you can share them. We already made one San Francisco “American Psycho” video that got over 100k views on Twitter (https://x.com/jitsvm/status/1766987382916894966?s=20). Then we expanded on the SF universe by making another video with the same character and a new friend for him (https://x.com/SamuelMGPlank/status/1767405784986718406?s=20). Eventually, you’ll be able to create and remix all the components of a good video—the characters, the costumes, the sets, and the sounds will all be part of a library of assets built up by the Eggnog community.
Eggnog is free to use, and you can try it out in the playground: https://www.eggnog.ai/playground. If you’re looking for some inspiration, you can try using the character “herm” waving a glowing wand or the character “lep” walking down a Dublin street. We’ll make money eventually by showing ads to viewers who come to Eggnog to watch AI videos.
We’re really excited to see all the fun videos and characters people are making with Eggnog, and looking forward to hearing what you all think!
For more explanation: I've been playing around with Stable Diffusion on my laptop recently; I have an RTX 4070 with 8GB of dedicated VRAM, so it's not nothing.
The main problem I have is that it takes a lot of iteration on a prompt, at lower resolution and fewer sampling steps, before I know that I'll get roughly what I want.
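Concretely, the loop I mean looks something like this with diffusers (the model name is just an example); fixing the seed keeps the initial latent, so a cheap low-step draft is usually a decent preview of the full render's composition:

```python
# Sketch of a draft-then-final loop: iterate cheaply at few sampling steps,
# then rerun the keeper at full quality with the same seed. With the seed
# and image size fixed, the initial latent stays the same.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a wizard with a straight red hat, oil painting"
seed = 42

# Cheap draft: a dozen steps is enough to judge the composition
draft = pipe(
    prompt, num_inference_steps=12,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]

# Keeper: same seed, full step count
final = pipe(
    prompt, num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
```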
I tried making a character in Eggnog, and before I could be sure what I was getting, it told me it'd take 15-20 minutes to be ready. I worry that this will just make me wait a long time for a character that isn't what I want, and starting again too many times will put me off.
The iteration and feedback loop needs to be tighter, in my opinion, or people will get unsatisfactory results and be unwilling to go back and fine-tune.
Something is wrong; it looks like it is just using multiple layers of images/video and cutting back and forth between predetermined combinations of layers...
I would polish the idea a bit more before publishing it, or people may think (quoting your Reddit link), "Looks like stinky doo doo. Don’t quit your day job"
> We think you can make fun stuff ...
To be brutally honest, don't say "We think you can". It does not matter whether you think people will like it. Do people like it? If the average Joe sits down and plays with the model, will they have a good time?
I'm not trying to be abrasive or rude here, just honest.
To do that I figure I would need about 5-6 minutes of video per chapter (perhaps less with some looping), and the ability to DL the video or otherwise export it (assuming I can upload my other media into your Composite tool) to put on YT or the like. And would probably want to lose the watermark in that case as well.
If so, then what is the plan (if any) to monetize for creators?
Good luck with Eggnog! I think AI generated media is really cool.
Eventually, you should probably mention this on your website: a broad range of starter projects, high-quality assets, etc.
Cache invalidation, naming things, and off by one errors.
Given all that, I believe that an adequate compensation for unethical sourcing in AI - the absolute best we can do for humanity - is this:
1. We admit that the works are unethically sourced, which means they could be banned in the future and may require switching to an "ethical piracy" distribution model
2. We ensure that the models produced this way belong to the whole of humanity by default, by distributing them for free under copyleft licenses like the GPLv3
3. We abstain from monetizing or otherwise drawing revenue or profit from unethically sourced models
4. We assemble zero-cost service models for the AI, drawing on volunteers to publicly pool compute
Case Study: Whisper
I follow these guidelines in my own work on OpenAI's Whisper. Whisper can do much good for humanity, as it allows anyone to freely transcribe and translate speech, meeting a core human communication need.
But Whisper needs many improvements before it is a freely available service making an impact in millions of people's lives, at zero cost. To that end, I'm building extensions that let people pool CPUs and other cheap hardware to put together independent, free transcription services based on Whisper. I'm building rapid customization models to help people with accents. And I'm building real-time feedback and correction models to enhance the accuracy of the naive model.
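For reference, the base workflow all of these extensions wrap is only a few lines with the open-source openai-whisper package (the file name here is illustrative):

```python
# Minimal use of the open-source openai-whisper package, the base these
# extensions build on. The "base" model is small enough for cheap CPUs.
# Install with: pip install openai-whisper (requires ffmpeg)
import whisper

model = whisper.load_model("base")
result = model.transcribe("speech.mp3")  # file name is illustrative
print(result["text"])

# Translation into English is the same call with task="translate"
translated = model.transcribe("speech.mp3", task="translate")
print(translated["text"])
```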
Yes, even here we are faced with the possibility of future bans, especially given the unethically sourced nature of Whisper's training set. That means I have chosen to abstain from ever collecting revenue through my Whisper work, and I consider that work to be my contribution toward the legacy of all of humanity.
I encourage you to consider the upsides of this form of engagement.
Objection: How Do I Feed Myself?
Yes, this model requires you to have a fully ethical alternative business that you run. I support myself on about 25k-60k of CAD design revenue through my semi-automated CAD reverse engineering service, which is based on fully ethical automation models that I built by hand and calibrated ethically on my own work stream.
Objection: This does not protect us from bans.
No, adopting this model does not prevent the law from banning models like yours in the future. It can even enable legal audits of your code and mark you as a "flight risk" - someone likely to continue illegally distributing models for ethical reasons even after they have been banned for legal reasons. I have no good answer here yet.
Closing Statement: Choose the ethical high ground. It is Based, and that will guide you.
I encourage you to see this strategy as "Ethically Based". I define Ethical Basedness as a form of true ethical high ground that can guide you toward the best contribution you can make to the shared knowledge of humanity.
Given the extreme ethical quandaries of attempting to monetize a proprietary service on top of unethical AI, you stand to lose that higher ethical authority, and be cut off from its guidance. But you have an opportunity, now and in the future, to pick it back up.
Good luck and keep up the good work. I hope I have moved your views, rather than merely agitating your feelings. I hope you will embrace the need for a free and openly shared AI legacy for all humanity.
Overcome the ethical quandaries of AI sourcing by giving it away.
Provide freely for the core human needs of creativity and visual storytelling.
Consider it.