I'm probably missing something obvious here, maybe someone can explain the following to me.
- Their approach is a composition of 2 steps, what they call "stylization" and "smoothing".
- Top left of 2nd page they claim: "Both of the steps have closed-form solutions"
- Equation 5 is the closed-form solution for the "smoothing" step.
My question: Where's the closed-form solution for the stylization step that they're claiming?
Are they calling equation 3 a closed-form expression? If so, the title and the claim in the introduction are rather misleading, because computing equation 3 requires you to train autoencoders.
You don't train it for every image. In this sense, a neural network often is a "closed-form solution": it gives you an equation, admittedly a very convoluted one, which can be used to obtain its solution (admittedly usually an approximation) in a finite amount of time. The usual approach to this problem (according to the paper) is an iterative technique "to solve an optimization problem of matching the Gram matrices of deep features extracted from the content and style photos", whereas this one is simply two passes: stylization and smoothing.
Previous stylisation was slow because it needed SGD optimisation for each image to be stylised. This uses a NN trained once. Once you've trained a NN, it is precisely a closed-form solution, in the style of y = max(0, 3x + 4). However, they are normally a little longer to write down :P
Ah okay, right, this is the answer. Previous approaches [1] are deep generative models that you have to optimize for each input, whereas here you just run a forward evaluation on a model that you've trained beforehand.
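To make the distinction concrete, here's a toy sketch (plain Python, made-up numbers, deliberately much simpler than the paper's actual whitening-coloring transform): "stylization" reduced to matching the mean and standard deviation of some 1-D "features". The closed-form route is a single formula reusable for any input; the iterative route re-runs an optimization for every content/style pair.

```python
from statistics import mean, pstdev

content = [0.2, 0.4, 0.9, 1.3, 0.7]  # stand-in for content features
style = [2.0, 3.5, 5.0, 2.5, 4.0]    # stand-in for style features

mc, sc = mean(content), pstdev(content)
ms, ss = mean(style), pstdev(style)

# Closed form: y = a*x + b, with a and b given directly by the statistics.
a_cf = ss / sc
b_cf = ms - a_cf * mc

# Per-input iterative route: gradient descent on
# loss(a, b) = (a*mc + b - ms)^2 + (a*sc - ss)^2
a, b, lr = 1.0, 0.0, 0.05
for _ in range(2000):
    grad_a = 2 * (a * mc + b - ms) * mc + 2 * (a * sc - ss) * sc
    grad_b = 2 * (a * mc + b - ms)
    a -= lr * grad_a
    b -= lr * grad_b

stylized = [a_cf * x + b_cf for x in content]
print(a_cf, b_cf)  # one formula evaluation
print(a, b)        # same numbers, reached after 2000 update steps
```

Both routes land on the same transform; the paper's point is that its two steps admit the formula route, so no per-image optimization loop is needed.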
I would still argue that the term closed-form is misleading here, because:
- Even during training, at any given moment you can read off a "closed-form expression" for a neural network of this type, so closed-form in this broad sense really doesn't mean much. Furthermore, any result of any numerical computation ever is also a closed-form solution by this standard, on the grounds that it results from a computation that completed in a finite number of steps. So really, whenever you ask a grad student to run some numerical simulation, expect them to come back saying "Hey, I found a closed-form expression!"
- The reason the above is absurd is that these trained NNs aren't really solutions to the optimization problem, but approximations. So this really amounts to saying: I have a problem, I don't know how to solve it, but I can produce an infinite sequence of approximations. Now I'm going to truncate this sequence of approximations and call that a closed-form solution.
The analogy in high-school math would be an infinite sum that doesn't converge: instead of evaluating it, we just add up to some large N and call that a closed-form solution.
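The divergent-sum analogy is easy to check numerically. Below (my own illustration, not from the paper), the partial sums of the harmonic series track ln(N) + γ, so picking a big N just picks a point on an unbounded climb rather than a value the sum "equals":

```python
import math

def partial_harmonic(n):
    # Truncation of the divergent sum 1 + 1/2 + 1/3 + ...
    return sum(1.0 / k for k in range(1, n + 1))

gamma = 0.5772156649015329  # Euler-Mascheroni constant
for n in (10**2, 10**4, 10**6):
    # The "answer" keeps growing like ln(n) + gamma; it never converges.
    print(n, partial_harmonic(n), math.log(n) + gamma)
```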
Actually, I agree with you. Initially you seemed to object to the term "closed form"; this highlights the more pertinent point: these models are 100% closed form, but 0% "solution" in the formal sense.
Someone correct me if I'm wrong, but I believe this refers to the fact that it can be expressed in terms of certain simple mathematical operations like addition, subtraction, multiplication, powers, roots etc.—and as a consequence, the execution is very efficient. My understanding is that 'closed form' solution is essentially something that resembles a polynomial (again, accepting corrections!).
Closed form just means you can do it in a finite number of operations. So just "run X" rather than the previous versions of this kind of thing which are "repeat X until measure Y is lower than the limit I care about". (my basic understanding)
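A minimal illustration of that "run X" vs "repeat X" split, using √2 (my own toy example, nothing from the paper): the formula is a fixed finite recipe, while Newton's method loops until a tolerance is met.

```python
import math

# Closed form: a fixed, finite sequence of operations ("run X").
root_closed = math.sqrt(2)  # x^2 - 2 = 0 solved by formula

# Iterative: "repeat X until measure Y is below a limit" (Newton's method).
x, steps = 2.0, 0
while abs(x * x - 2) > 1e-12:
    x -= (x * x - 2) / (2 * x)  # Newton update for f(x) = x^2 - 2
    steps += 1

print(root_closed, x, steps)  # both ~1.41421356..., Newton takes a few steps
```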
I checked the Wikipedia article, and the sorts of operations involved do appear to be a part of the definition: https://en.wikipedia.org/wiki/Closed-form_expression —though it sounds like it's a somewhat loosely defined term.
I really don't think it's fair to call neural networks closed-form solutions. The term immediately makes me assume that it enables you to bypass the training stage altogether.
Notice that all of the examples illustrated in the paper contain similar scenes. The content image is a building, while the style image is also a building. Or an image of trees is styled using another image of trees.
But how well does it fare when you give it an image of a house and an image of something completely different, like a dog or a slipper?
The interesting question, then, is how far off can this be and still work? Is the limit "reasonable", or is there room for improvement of the algorithm?
Did you have to do anything extra to get it working? I've set things up according to the documentation (I think), but I get dimension size errors when running it.
Haha, yes, I had to rewrite their code a bit. All the .unsqueeze(1).expand_as(...) in photo_wct.py need to be replaced by just .expand_as(...) and the return value of __feature_wct needs to be wrapped in torch.autograd.Variable.
I'm going to submit a PR, but it took me a bit of experimentation to fix these errors, so the code is a bit messier than I'd like.
Ahh that looks like the error I was hitting, thanks. I might try replacing the bits as well, though I just upgraded pytorch from 0.1.12 to 0.3 and it became much slower (I killed it after 5-6 minutes of setup).
Those interested in this technology: I made two videos 18 months ago. No optical flow, and YouTube compression kills everything, but they're still decent if watched in 4K on a big screen :)
Consider artists. There's a tremendous potential in using technology like this in art, and preventing someone from selling their works will often put them off of using it at all.
What does the license of the product have to do with the output of the product? You can use GIMP and GCC commercially, for example, and libraries used with GCC often have runtime exemptions for their output.
Hmmmm. Does the licence of the tool affect the output from the tool? Photoshop is proprietary, but Adobe doesn't have to explicitly grant me rights to the work I create with it.
You only need a license for the copyright, though; in the worst case you waive your right to distribute your derivative code if it has been used for commercial applications (which would be a weird interpretation, but I can't find a precise explanation of what the 'non-commercial' license covers).
Unlike modern software developers, artists are used to the notion that tools developed by other people are worth some kind of compensation, even if found at some flea market.
I wonder whether this could be applied in a real-time scenario. Modern real-time renderers for games often have a tone-mapping step that lets artists color grade the final output. The paper cites an 11+ second runtime for 1K inputs, which is orders of magnitude off what it would need to be (a 60 fps budget is roughly 16 ms per frame), but perhaps a simpler version run on the GPU is feasible.
Nvidia is pretty big in the machine learning space in general, not game specific these days - GPUs are pretty general purpose highly multithreaded number crunchers and Nvidia's been making moves further in this direction with their own CUDA-based training tools, the DGX-1, the Jetson and other products.
(a) This problem has long been known as color/contrast transfer, and it was solved more than 10 years ago; (b) the results shown in this paper aren't objectively or subjectively better or more photo-realistic than the far simpler work of Kokaram et al.; and (c) I question whether this task even requires deep learning at all.
Only tangentially related, but has anyone ever tried to apply style-transfer on human faces for artificial aging or rejuvenation? Like for the movie industry or something?
The only machines with decent GPUs in them I have access to run Windows and Windows Subsystem for Linux doesn't allow GPU access. Other than dual-booting or running Linux in VirtualBox - is there any way I can try this?
None of the dependencies seemed to be Linux-specific at a quick glance. You might be able to install all that on Windows (not sure how pleasant an experience it'll be).
Virtualbox won't help you, because you can't give proper access to the GPU for the VM guest unless you set up PCI-e passthrough and dedicate your whole GPU to the VM guest (and use your integrated graphics for the host). Not sure if this is even possible if Windows is the host.
If you don't feel like setting up a Linux install on your box, you could try some of the GPU cloud services.
Also I am told the proprietary nvidia drivers have a software lock that prevents you from using GPU passthrough unless you buy certain more expensive models.
There is a workaround. A number of GeForce cards have the exact same chipset as a Quadro card, but with a resistor pulling down an external pin. That resistor can be changed to make the card identify as a Quadro.
This is just spoofing the PCI VID:PID numbers to the driver and relying on driver bugs(?) to function. You could do the same with a few lines of kernel hacks, far more easily than by soldering. It does not enable any features that are fused off in the hardware. This setup is not reliable.
Also, these posts are from 2008 and 2013 (10 and 5 years old, respectively). These hacks probably don't work any more.
Is it really all that hard to put up a demo site for these things? It would be a lot of fun to play with crossing pictures. I'm guessing it's because GPU access from the browser isn't good enough yet?
I'm not sure how fast their FastPhotoStyle approach is, but a TensorFlow implementation of the original neural style transfer can take upwards of 20 minutes to create the final stylized image. If someone had the pre-trained model, plus neural-net code in JS to read it, you could do it all client side, but it would still be very slow.
The tech has come a long long way since the original, even before this FastPhotoStyle project.
A few months ago, there was TensorFire [0] that was able to do it in the browser. Quick google also gives other results [1]. There's also many apps that can do it in seconds. Speed definitely isn't an issue anymore, but getting it to work in browser can be tricky.
Conda is basically an alternative to pip and virtualenv, used by the Anaconda python distribution that's really popular in the data science and machine learning community. The easiest way to get it is to install miniconda: https://conda.io/docs/user-guide/install/linux.html
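If it helps, a typical setup looks roughly like this (the installer filename is whatever conda.io currently ships, so treat it as illustrative; the last line is the command from the project's USAGE.md):

```shell
# Install Miniconda (grab the current Linux installer from conda.io)
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Then, in a fresh shell, install the dependencies listed in USAGE.md
conda install pytorch torchvision cuda90 -y -c pytorch
```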
Looks like it's a lot faster. They compare their approach to the Luan et al. approach, and for a 1024x512 image, they are about 30-60x faster. They also seem to be more accurate with better results.
[1] e.g. https://arxiv.org/pdf/1508.06576.pdf
> But how well does it fare when you give it an image of a house and an image of something completely different, like a dog or a slipper?
What is the correct answer to a question that's not well-formed?
E.g. I think most humans would say taking this content picture:
https://wallpapershome.com/images/pages/pic_hs/10150.jpg
(a red crab on a beach in front of the bright blue ocean, with a blue sky and white clouds)
and styling it with this picture:
https://c2.staticflickr.com/4/3499/3876547311_c2e32759d9_z.j...
is a pretty well-posed operation. How does that look using this algorithm? I guess the transfer of the wooden house amidst yellow fields with a reddening sky might lead to a wooden crab on a yellow field in front of a reddish-yellow ocean with red sky and clouds, or something.
> Did you have to do anything extra to get it working? I've set things up according to the documentation (I think), but I get dimension size errors when running it.
I was using the pytorch 0.1.12 installed with conda (following their USAGE.md) and it took ~30s total for the transfer.
For some reason it's taking me about 4-5 minutes for the transfer, but the code now runs and the rest of the runtime is only a few seconds.
https://www.youtube.com/watch?v=2YRVt80g2Ek
https://www.youtube.com/watch?v=i69cBYI6f-w
Seems fine to me. If you want to develop something commercial you'd roll your own anyway. Nothing else is restricted by this license.
The license of GCC doesn't affect the license of your binaries.
The license of python doesn't affect the license of your software.
etc.
It's really great that NVIDIA is releasing code for their deep learning research.
https://francois.pitie.net/colour/
> you can't give proper access to the GPU for the VM guest unless you set up PCI-e passthrough

This requires dedicating the whole GPU and the PCI-e slot to the virtual machine guest. For more flexible virtualization setups, you need the professional-quality cards.
http://www.eevblog.com/forum/chat/hacking-nvidia-cards-into-...
Apparently this can also be done from software
http://archive.techarp.com/showarticleefc1.html
[0] https://tenso.rs/demos/fast-neural-style/
[1] https://reiinakano.github.io/fast-style-transfer-deeplearnjs...
https://www.youtube.com/watch?v=I3l4XLZ59iw
Images and speech require different architectures (CNNs vs RNNs).
> conda install pytorch torchvision cuda90 -y -c pytorch
What is conda? How do I install it on Ubuntu 16.04?