12 comments

  • fxtentacle 4 days ago

    While the results certainly are pretty, I don't see how learning has taken place.

    This cartoon filter still has the same issues as previous attempts, which are:

    - omitting borders that are semantically important but have a low color gradient

    - not collapsing small areas into lines

    For the first issue, there's an example image of a picnic on a white background. A human cartoonist would most likely draw a full outline around the white spoon, because it is important for conveying the type of object that this is supposed to be. With this algorithm, the spoon gets partially merged with the white background, and without the reference photo I would have a hard time identifying it as a spoon.

    For the second issue, look at the photo of the Asian girl with the patterned skirt. A human cartoonist would most likely observe the regular grid pattern and replace it with thin lines, thereby communicating that all of it is one and the same thing. This algorithm, on the other hand, treats each tile of the pattern individually, thereby making it look more like a crystal or crumpled foil.

    I personally also prefer white-box algorithms, but there's no denying that creating a cartoon requires a lot of prior knowledge about which features to retain as important and which features to abstract away. As such, I see the real challenge in somehow producing good saliency training data for millions of images. I mean ideally you would want the 5 year video stream plus eye tracking data of a baby starting to grow up...

    • thaumasiotes 4 days ago

      > For the second issue, look at the photo of the Asian girl with the patterned skirt. A human cartoonist would most likely observe the regular grid pattern and replace it with thin lines, thereby communicating that all of it is one and the same thing. This algorithm, on the other hand, treats each tile of the pattern individually, thereby making it look more like a crystal or crumpled foil.

      Something similar is going on with the photo of the merlion statue. The entire body is scaled, and a cartoonist would definitely represent that. But (because of lighting?), the algorithm renders the tail smooth instead of scaled.

      • posterboy 4 days ago

        Assume this were a photo in a picture frame in the background of a bigger picture.

      • xingyzt 5 days ago

        This could replace all of the WikiHow illustrators that trace from stock photos, and muddy the waters even more for their copyright status.

        https://onezero.medium.com/we-finally-figured-out-who-makes-...

        • enriquto 4 days ago

          It's probably much harder to arrange people and objects to reproduce such a scene than to make a basic wikihow-style drawing.

          • cheerlessbog 4 days ago

            Thank you for that very amusing article.

          • taneq 5 days ago

            If I understand this right, they've built a 'cartoonify' filter to convert real-world images into cartoon format, and then trained a neural net based on these image pairs? If so, what does the neural net add?

            • ramblerman 4 days ago

              Kind of agree, I think if they have a good way to create the underlying dataset, a NN to go in the opposite direction might be more interesting.

            • sk0g 5 days ago

              Sorta, it breaks down the images (from anime?) into three representations - surface, structure, and details, and also extracts each of those representations from generated images. Those representations are then cross-checked by the adversarial network, which improves the GAN's anime-esque generation ability.
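
              A rough sketch of that three-way split, for anyone who wants something concrete. The operators here (box blur, tone quantisation, blur residue) are toy stand-ins picked for illustration and are assumptions, not what the paper uses; the function names are hypothetical too. Images are plain 2D lists of grey values in [0, 255].

```python
# Toy stand-ins for the three representations named above.
# ASSUMPTION: these operators are illustrative, not the paper's own.


def surface(img, radius=1):
    """Smooth colour regions: box blur over a 2D grid of grey values."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Average every pixel inside the (clipped) square window.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def structure(img, levels=4):
    """Flat blocks of tone: quantise values in [0, 255] into a few levels."""
    step = 256 / levels
    return [[(int(v // step) + 0.5) * step for v in row] for row in img]


def details(img, radius=1):
    """High-frequency residue: the image minus its smoothed surface."""
    smooth = surface(img, radius)
    return [
        [v - s for v, s in zip(row, srow)]
        for row, srow in zip(img, smooth)
    ]


def representations(img):
    """Extract all three views; hypothetical helper for the sketch."""
    return {
        "surface": surface(img),
        "structure": structure(img),
        "details": details(img),
    }
```

              In the setup described above, the same extraction would be run on both the reference cartoons and the generated images, so the adversarial network compares like with like.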

              • travbrack 4 days ago

                I know in the case of physics simulations the neural net can perform better than the classical algorithm. Not sure if that's the case here but just thought it was worth mentioning.

                • conradludgate 4 days ago

                  By "perform better" I assume you mean time/memory performance and not accuracy

              • xwdv 5 days ago

                I could have sworn this kind of thing already exists in some apps. Prism?

                • JKCalhoun 4 days ago

                  Yeah, looks exactly like some of the Prism filtering I've toyed with.

                • dzink 4 days ago

                  The big question: Does copyright law apply to cartoon versions of copyrighted images? Transformative work can circumvent copyright law, but are you allowed to feed copyrighted images into an AI algorithm to create cartooned versions? Who owns the copyright at that point?

                  • jrockway 4 days ago

                    I don't see any reason why copyright would not apply. If you take someone else's photograph, change the white balance, and start selling copies of it, that's classic copyright infringement.

                    As for input data to models, my intuition is that they would be tainted by the copyright of the input images. It's just that nobody has a bot for scanning AI models for their photographs, so you don't see a lot of litigation or DMCA takedown requests here. It's easy when someone just uploads your photo to their website. It's hard when the photo contributes some weights to a neural network.

                    My main takeaway is that copyright is very imperfect. It doesn't allow for any unsolicited enhancements of someone else's work.

                    • ximeng 4 days ago

                      You'll get further issues once Getty Images automatically generates copyrighted images covering as much of the latent space of interesting images as possible.

                      • rcxdude 4 days ago

                        In principle, given the way copyright works, this isn't really an issue: if an image is generated independently then copyright does not apply. However, in the event that you wind up with something similar, independent creation becomes something you would need to prove (and people have lost cases because they could not). On the other hand, such large-scale generation of images will likely be treated differently by the courts than other means of production.

                      • rcxdude 4 days ago

                        Indeed, people often seem to assume that the output of GANs is free of the copyright of the training data, but this has not been tested in court, and I get the impression that legal opinion leans towards the copyright carrying over, which makes the copyright status of most GANs (and in fact most neural nets) a pretty huge mess.

                      • elicash 4 days ago

                        It's hard to know what would have happened if Shepard Fairey hadn't gotten caught destroying documents, and we don't know the specifics of all of the settlement, but I think it's fair to say he lost that case.

                        https://en.wikipedia.org/wiki/Barack_Obama_%22Hope%22_poster

                      • ashleyn 5 days ago

                        Be great to see this in a fragment shader someday.

                        • thih9 4 days ago

                          I really like the cartoonized photos of detailed landscapes.

                          I think human faces and bokeh could use some improvement.

                          The former seems especially tricky; showing or hiding certain lines might change the perceived emotion and result in a different image.

                          • patel011393 5 days ago

                            To a certain extent, PowerPoint's "Artistic Features" option under "Picture Format" allows for similar effects. The paintbrush option is like this cartoon style. If I had the time, I'd definitely choose this cartoon program, as PowerPoint's effect is not as clean.

                            • amelius 4 days ago

                              I didn't even notice these pictures were "cartoonized" until I zoomed in on my smartphone ...

                              If you make an imaging effect, please make sure it works at all scales. And I'm also looking at you bokeh-folks ;)

                              • ggm 5 days ago

                                Rotoscope?

                                • gonzo41 5 days ago

                                  Yep, this seems exactly the same. It's also fitting that Keanu Reeves is in one of the test images. A Scanner Darkly was a great film, time for a rewatch.

                                  • jcims 4 days ago

                                    Automated rotoscope.

                                  • wnkrshm 4 days ago

                                    Those images that don't already work because of their colors and composition have the charm of traced photos.

                                    • talaketu 5 days ago

                                      The terms "White-box model" and "Black-box model" mentioned in the paper seem to be standard terms from the ML literature, though I didn't understand precisely what they meant here. I know the metaphor: As we can see inside a white box but not inside a black box, so we can observe the inner workings of a white box model. Similar terminology is used for other domains such as system design and testing.

                                      The conclusion here, broadly, is that white-box is better than black-box for this application.

                                      Is there a modern terminology that avoids concerns about racial bias in language?

                                      • colordrops 5 days ago

                                        You're finding racial issues where there are none. The master-slave terminology used in technology comes from the human activity hence the sensitivity around usage. The terms blackbox and whitebox never had anything to do with human usage or racial meaning. Let's not create history that doesn't exist and go overboard. They are just colors.

                                        • darkwater 4 days ago

                                          I do find a racial bias in the photos used to showcase the work though: not a single black person in there, not even in the famous ones.

                                          • mthoms 4 days ago

                                            >I do find a racial bias in the photos used to showcase the work though

                                            I'm as liberal as they come but — is this the kind of thing we're going to be doing now? Really?

                                            Anyways, you didn't look hard enough. The fourth person on the page is African American (the first three appear to be one Asian person and two Indigenous persons).

                                        • gitgud 5 days ago

                                          > "The conclusion here, broadly, is that white-box is better than black-box for this application."

                                          Well I think this is incorrect. Generally speaking, a black-box exposes an interface that you can use, and a white-box is something you can internally modify.

                                          One is not necessarily better than the other. Using a white-box approach can create tight-coupling between components in a system as you could be relying on internal mechanisms, whereas a black-box approach enforces boundaries in your system which is generally good. Also, it's often better to test systems with a black-box mentality to ensure security, resilience etc...
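
                                          A minimal sketch of that coupling point, assuming a made-up component (`Counter`, `DictCounter` and both check functions are hypothetical names, not from any library). The black-box check only touches the public interface, so it keeps passing after the internals are rewritten; the white-box check depends on a private attribute and stops passing.

```python
class Counter:
    """Hypothetical component; stores every event in a list."""

    def __init__(self):
        self._events = []  # internal detail, free to change

    def record(self, name):
        self._events.append(name)

    def count(self, name):
        return sum(1 for e in self._events if e == name)


class DictCounter:
    """Same public interface, different internals (running totals)."""

    def __init__(self):
        self._totals = {}

    def record(self, name):
        self._totals[name] = self._totals.get(name, 0) + 1

    def count(self, name):
        return self._totals.get(name, 0)


def black_box_check(counter):
    """Uses only record() and count(), so it survives the refactor."""
    counter.record("click")
    counter.record("click")
    return counter.count("click") == 2


def white_box_check(counter):
    """Inspects the private _events list, so it is tied to one design."""
    counter.record("click")
    return getattr(counter, "_events", None) == ["click"]


if __name__ == "__main__":
    for impl in (Counter, DictCounter):
        print(impl.__name__, black_box_check(impl()), white_box_check(impl()))
```

                                          Neither check is wrong; the white-box one simply verifies more about the implementation and costs more when the implementation changes.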

                                          It's not inherently representing good/bad; however, modern terms like opaque/transparent might be more accurate and less controversial, I suppose.

                                          • onetom 4 days ago

                                            ...

                                            Between day and night, between black and white

                                            There is more, there is more than gray

                                            Between the question and the answer

                                            There's the silence of the sea

                                            Between the cradle and the grave

                                            There is the someone that is me

                                            Between yesterday and tomorrow

                                            There is more, there is more than a day

                                            ...

                                            https://www.streetdirectory.com/lyricadvisor/song/ceeoup/bet...

                                            • PaulBGD_ 5 days ago

                                              I'm curious how this is racial? Light is white, a lack of light is black.

                                              • core-questions 4 days ago

                                                I guess you've missed the last few weeks in virtue signalling

                                                • Normille 4 days ago

                                                  To avoid accusations of racism, use "double-plus-un-white" instead of "black"

                                              • jcims 4 days ago

                                                The 'box' is generally considered to be a system that completes a process on input to generate output. White and black are references to the state of illumination of the inner workings of the machine.

                                                White box is not better, it just defeats the mystery/freedom of the abstraction.

                                                That said, given that we describe races as white and black, I'm happy to eventually pick up a new convention if it makes folks uncomfortable. I say eventually because the current trend is to consider most of these shallow accommodations as 'performative', and I have no interest in being involved in any action that could be construed as being done for social credit where there is little or no actual value. If I felt or had evidence that it actually did help, it would be a different story.

                                                • xyzzy_plugh 5 days ago

                                                  I would probably use transparent and opaque.

                                                  • ben_w 4 days ago

                                                    I was about to suggest “open box” and “sealed box”.

                                                    The terms “black box” and “white box” just don’t do a very good job of illuminating the concepts they represent to people new to the field — white doesn’t mean transparent.

                                                  • redis_mlc 5 days ago

                                                    > seem to be standard terms from the ML literature

                                                    No, those terms greatly predate any kind of ML.

                                                    Another term is crystal box.