A promenade of PyTorch

(goldsborough.me)

150 points | by astdb 2246 days ago

5 comments

  • sytelus 2246 days ago
    To be fair, static graphs have their own advantages, specifically in figuring out how to distribute computation over heterogeneous units. I think TF will still have the upper hand when distributing computation over 10K CPUs/GPUs, but for everything else it's a major pain to tolerate. The major issue with TF is its disappointing API design, especially the Estimator APIs and, in many cases, even the lower-level APIs. I want to say WTF at every line. Nothing is intuitive in that land. TF has jumped the shark here. In 1.5, they replaced the old MNIST quick start in the documentation with a dumbed-down Iris tutorial using the Estimator API. I can only say WTF? Hopefully this disaster will divert enough newcomers to better things like PyTorch or even CNTK.
    • singhrac 2246 days ago
      I wholeheartedly agree about the TF API becoming very cluttered. The entire way contrib is handled seems confusing.

      While I think TF probably has a better distributed story, it's still not great (several months of experience getting this to work), and very very few people can justify distributing computation across 10K CPUs/GPUs - in my experience, it turns out most async SGD algorithms don't really work very well.

      One place I see this actually working is using SVI/EM and storing local parameters across the network - something maybe 1 GPU can't handle for extremely large models.

    • cjalmeida 2246 days ago
      The Estimator and Dataset API docs are so bad I'm planning on starting a blog with posts on running MNIST and a few other common datasets so others don't have to suffer.
      • mrry 2246 days ago
        [Disclosure: I designed and wrote most of the docs for TensorFlow’s Dataset API.]

        I’m sorry to hear that you’ve not had a pleasant experience with tf.data. One of the doc-related criticisms we’ve heard is that they aim for broad coverage, rather than being examples you can drop in to your project and run straight away. We’re trying to address that with more tutorials and blog posts, and it’d be great if you started a blog on that topic to help out the community!

        If there are other areas where we could improve, I’d be delighted to hear suggestions (and accept PRs).

        • sytelus 2245 days ago
          No offense, but TensorFlow's Dataset API documentation also sucks. This, combined with bad API design (which could actually be used as a classroom case study in bad design), is a disaster in the making. For example, shuffle() takes a mysterious argument. Why? It's not explained in the docs, except that it should be larger than the number of items in the dataset. Why can't shuffle() just be shuffle(), and why do I now have to remember to pass the correct parameter for the rest of my life? Whatever. I still don't get what exactly repeat() does. Does it rewind back to the start when you read past the end? Why do you need it? Why not just stick to epochs? Why make things complicated with steps vs epochs anyway? The docs give zero clue. Then there's a whole bunch of mysteriously named, unexplained methods like make_one_shot_iterator() or from_tensor_slices(). Why is make_one_shot_iterator() not just iterator()? Why do I have to rebuild the dataset using from_tensor_slices()? The docs are written from the point of view of "take all this code calling mysteriously designed APIs, copy-paste it, and don't bother too much about understanding what those APIs really do". It really sucks.
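
          To be concrete, here's roughly the incantation the docs seem to expect in TF 1.x (the arrays and sizes below are made up just to show the calls in question):

            import numpy as np
            import tensorflow as tf

            # made-up toy data, only to exercise the methods being complained about
            features = np.random.rand(100, 4).astype(np.float32)
            labels = np.random.randint(0, 2, size=100)

            dataset = tf.data.Dataset.from_tensor_slices((features, labels))  # one element per row
            dataset = dataset.shuffle(buffer_size=100)  # buffer should cover the dataset for a full shuffle
            dataset = dataset.repeat()                  # start over when the data runs out, i.e. loop over epochs
            dataset = dataset.batch(32)

            iterator = dataset.make_one_shot_iterator()
            next_batch = iterator.get_next()            # tensors that yield one (features, labels) batch per run

            with tf.Session() as sess:
                batch_x, batch_y = sess.run(next_batch)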
          • cjalmeida 2245 days ago
            IMO, shuffle is something they actually got right. Unlike PyTorch datasets, TF allows streaming unbounded data. For something like this to work with shuffle, it must cache some data before passing it down the pipeline. You specify how much in the argument.

            This may not seem useful for conventional training, where you usually work with a fixed number of samples you know beforehand. But there may be cases where this is not true (for instance, in some special cases of augmentation) - the streaming part is useful, but then you must use this caching trick (see the sketch below).

            But I agree the API naming is not stellar, or at least it should come with better documentation.
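
            Roughly, for a stream whose length you don't know up front (the generator here is just a stand-in):

              import tensorflow as tf

              def sample_stream():
                  # stand-in for an unbounded source, e.g. samples augmented on the fly
                  i = 0
                  while True:
                      yield [float(i)]
                      i += 1

              dataset = tf.data.Dataset.from_generator(sample_stream, output_types=tf.float32)
              # shuffle() can't see the whole (possibly unbounded) stream, so it keeps a
              # buffer of 10000 elements and draws randomly from it as new elements arrive
              dataset = dataset.shuffle(buffer_size=10000)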

        • aub3bhat 2245 days ago
          I think tf.data is amazing; it's far, far better than the previous queue and string_input_producer style approach.

          More than documentation, I would argue that TF, and especially tf.data, lacks a tracing tool that would let a user quickly debug how data is being transformed and whether there are any obvious ways to speed things up. E.g. image_load -> cast -> resize vs image_load -> resize -> cast had different behavior and led to hard-to-identify bugs. And tf.data's prefetch ends up being key to improving speed, yet it is not documented; the only way I actually found out about it was by reading your tf.data presentation.
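
          For reference, a rough sketch of the kind of pipeline I mean (the filenames and sizes are placeholders):

            import tensorflow as tf

            def parse(path):
                image = tf.image.decode_jpeg(tf.read_file(path), channels=3)    # uint8 in [0, 255]
                image = tf.cast(image, tf.float32)                  # the cast/resize ordering is exactly
                image = tf.image.resize_images(image, [224, 224])   # the kind of step that's hard to trace
                return image / 255.0

            paths = tf.data.Dataset.from_tensor_slices(["a.jpg", "b.jpg"])      # placeholder filenames
            dataset = (paths.map(parse)
                            .batch(32)
                            .prefetch(1))  # prepare the next batch while the current one is training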

        • cjalmeida 2245 days ago
          I guess you misread me. The Dataset API is somewhat fine, much better than queues for instance. However, it's not clear from the documentation how to do more complex stuff, or how to integrate it with the rest of the TF stack, especially the new Estimator API.
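
          For example, the wiring I had to piece together goes roughly like this (the feature name, model, and sizes are just placeholders):

            import numpy as np
            import tensorflow as tf

            def train_input_fn():
                # toy in-memory data; the Estimator calls this function to build its input pipeline
                features = {"x": np.random.rand(100, 4).astype(np.float32)}
                labels = np.random.randint(0, 2, size=100)
                dataset = tf.data.Dataset.from_tensor_slices((features, labels))
                return dataset.shuffle(100).repeat().batch(32)

            estimator = tf.estimator.DNNClassifier(
                feature_columns=[tf.feature_column.numeric_column("x", shape=[4])],
                hidden_units=[16, 16])
            estimator.train(input_fn=train_input_fn, steps=1000)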
    • waleedka 2245 days ago
      Yes, totally agree. TF is solid and it's still my go-to tool, but the API has a very low elegance score, to put it nicely. Clearly the team has a lot of brilliant engineers, but they're missing the high-level sense of design (yes, APIs do need good design as well) that puts everything together beautifully.
  • mlboss 2246 days ago
    TensorFlow Eager provides dynamic graph computation similar to PyTorch. https://research.googleblog.com/2017/10/eager-execution-impe...
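
    Roughly (in later releases it's exposed as tf.enable_eager_execution; in 1.5 it still lives under tf.contrib.eager):

      import tensorflow as tf
      tf.enable_eager_execution()   # must be called before any other TF ops

      x = tf.constant([[1., 2.], [3., 4.]])
      y = tf.matmul(x, x)
      print(y)  # a concrete value right away, no Session or feed_dict needed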
    • cs702 2246 days ago
      Yes.

      Yet in my experience PyTorch is much nicer to use. Its API feels very natural and easy to extend -- it feels very Pythonic.

      TensorFlow's API, on the other hand, seems to get in my way whenever I try to do anything new that isn't already built into one of its higher-level APIs (e.g., Keras). I frequently find myself fighting with TensorFlow's API.

      For iterative R&D/exploratory work, I find I'm more productive -- and happier -- with PyTorch than TensorFlow.

      • epberry 2245 days ago
        Agreed that PyTorch feels "pythonic". I think TensorFlow doesn't because it's really a C++ API with an extensive set of wrappers, and it shows. PyTorch feels like they started with Python and added the C extensions after the fact.
    • amelius 2245 days ago
      TensorFlow requires the user to maintain a dataflow-graph, as if the user is writing a compiler, which IMHO is silly. It's the opposite of convenience.

      What TensorFlow should do instead: do dataflow-analysis, like any modern compiler, and figure out the dataflow-graphs at compile-time.

      OR... take PyTorch's approach and use dynamic graphs. I bet there's not even a significant performance penalty associated with dynamic graphs, as the tensors are usually quite large and consume most of the computation-time anyway.
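
      For comparison, roughly (toy tensors only):

        import tensorflow as tf
        import torch

        # TensorFlow graph mode: describe the computation first, then run it in a session
        a = tf.placeholder(tf.float32, shape=[2, 2])
        b = tf.matmul(a, a)
        with tf.Session() as sess:
            result = sess.run(b, feed_dict={a: [[1., 2.], [3., 4.]]})

        # PyTorch: the graph is recorded on the fly as ordinary Python executes
        x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
        y = (x @ x).sum()
        y.backward()   # gradients of y w.r.t. x, with no separate graph-building step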

      My point: why use a tool that's founded on a wrong design-decision?

  • hokkos 2246 days ago
    Maybe someone should extract the program flow from the AST to create the compute graph, or create a restricted language that facilitates that. It seems that in TF or PyTorch the syntactic sugar becomes ineffective at some point and you have to use the explicit API.
    • zitterbewegung 2246 days ago
      I think that a dataflow language would be interesting. Having dataflows as first-class values would be cool. Borrowing stuff from graph computation libraries or graph rewrite systems would be useful. The idea would be that you could take the individual pieces of a dataflow and package and manipulate them. To give an example, a set of layers of a CNN, or even a whole network like Inception, could be a first-class object. Also, you would want the dataflows to be differentiable. I think some category theory properties would be helpful in this context.
    • skierscott 2246 days ago
      Isn’t this what JITs do?
      • singhrac 2246 days ago
        There's a (very exciting) PyTorch JIT incoming sometime soon!
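
        The idea, as I understand it, is to trace ordinary PyTorch code on an example input and compile the recorded ops into a static graph; once it lands, usage should look roughly like this (the module and shapes here are made up):

          import torch

          class TwoLayerNet(torch.nn.Module):
              def __init__(self):
                  super(TwoLayerNet, self).__init__()
                  self.fc1 = torch.nn.Linear(4, 8)
                  self.fc2 = torch.nn.Linear(8, 2)

              def forward(self, x):
                  return self.fc2(torch.relu(self.fc1(x)))

          model = TwoLayerNet()
          example = torch.randn(1, 4)
          # run the model once on the example, record the executed ops,
          # and compile the recording into a static graph
          traced = torch.jit.trace(model, example)
          output = traced(torch.randn(3, 4))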
  • yvsong 2246 days ago
    Need an official tool to convert to CoreML.
  • stealthcat 2246 days ago
    Seeing that PyTorch started as a Chainer 'fork' (?) with a similar API but a different backend showed us how much a BigCo can do against a startup. It's unsettling.

    Chainer's autograd has been around for years.

    • agibsonccc 2246 days ago
      As a DL framework author myself, I can say for a fact we all steal good ideas from each other. There's no conspiracy here.

      A lot of folks not actually building the tools like to assume there's some big war going on where we're trying to sabotage each other.

      In reality, we're all just scratching our own itch. This goes for pytorch, TF, as well as us.

      Yeah there's occasional public debate, but we're not out to go to "war" with each other or anything. It's the end users that make this out to be something it's not.

      Just food for thought here.

      • thanatropism 2245 days ago
        TensorFlow is hardly a cat scratch fever; it's a major strategic effort by a top-3 global tech conglomerate.
        • agibsonccc 2245 days ago
          Yes, and so are the other frameworks. When I said "scratching our own itch", I meant that Google is building it for its own strategic initiatives, ranging from hiring to Google Cloud. FB uses PyTorch for their research and Caffe2 for deployment. CNTK is used by Microsoft. ONNX was a joint effort between FB and Microsoft to compete against TF's file format.

          I'm not sure what your point is. What I was attacking was this "conspiracy" that startups and these companies are somehow out to get each other. Of course there is competition and various strategic reasons folks implement their own frameworks. We ourselves implement our own framework that imports all the python frameworks and runs them in production on the JVM and big data stack. Imagine that, I do it to sell licenses.

          The "startup" the comment above was alluding to was PFN in Japan. FB supposedly "stole" ideas from Chainer. And yes, they did; they even say as much.

          It doesn't mean there's a conspiracy; it's just the smart thing to do. If something is working, adapt it for your use case. That's all that happens across any of the major frameworks.

          I'm not sure if I undersold myself a bit, but I just want to say this is all I've been doing since 2013. My framework is well used by a great portion of the Fortune 500 and all over the globe, and it is part of a major foundation. It's not some toy; it's a commercial venture with millions in funding, a decent-sized engineering team, and an open source foundation/community behind it.

          I'm more than familiar with the space and even compete with Google's business model to a certain extent. I have plenty of incentive to care about these things, but I'm still calling it out for what it is. I talk to other framework authors and have nothing but good things to say about them. We're all out there just building what we need to suit our purposes. Yes, those things have incentives behind them, but that doesn't mean there need to be conspiracies and trash talk.

          I'll call it out again: The users are the ones who blow this stuff way out of proportion. I've seen this play out for years now.

    • apaszke 2246 days ago
      PyTorch was never a Chainer fork. The whole codebase is the C libs from Lua Torch plus a bunch of Python code that was written entirely for this project. Chainer was an inspiration, but no code was ever shared between those two projects.