17 comments

  • throw6606 1308 days ago
    Careful, folks. S4TF is pretty much dead on arrival. It was pushed aggressively by Chris Lattner (for obvious reasons) but he left Google a while ago and since then most internal users lost interest. There's nothing in Swift that's inherently suitable for ML and building the ecosystem is a ton of work; without all the political pushing, it went nowhere and is close to a "semi-abandoned research project" phase.
    • MiroF 1307 days ago
      > There's nothing in Swift that's inherently suitable for ML

      The type system?

      • mkolodny 1307 days ago
        Speed, too. For PyTorch to train models and run inference quickly, your Python code gets translated to C++/CUDA. Part of the idea with S4TF is to be able to write ML code in a single, fast language.
        • socialdemocrat 1307 days ago
          You can already do that in Julia, and unlike Swift, it's a language with lots of scientific and machine learning libraries.

          I like Swift, but looking at the code examples I've got to say preparing data and setting up a model is 10x easier in Julia.

        • MiroF 1307 days ago
          I am skeptical that there are no calls to underlying C or CUDA libraries occurring. Swift doesn't natively beat BLAS.
        • dklend122 1307 days ago
          Well, it's not happening in Swift yet.

          S4TF still requires either C CUDA kernels or XLA. Julia, on the other hand, has JIT GPU codegen, and its CPU codegen has been benchmarked to beat OpenBLAS.

          • MiroF 1306 days ago
            > CPU codegen has been benchmarked to beat openblas

            Source on this claim?

            • ddragon 1306 days ago
              He probably means Tullio.jl (which also seems to integrate with Julia's source-to-source differentiation library Zygote.jl, the main competitor to Swift for TensorFlow):

              https://discourse.julialang.org/t/realistically-how-close-is...

              https://github.com/mcabbott/Tullio.jl

              Regardless of whether it can consistently beat Fortran/BLAS in every area, in general JIT languages have more opportunities for optimizations than AoT languages, so it's interesting to see what comes out of a language that focuses on leveraging this to get the most performance.

              • MiroF 1306 days ago
                I'm surprised re: BLAS - that is closer than I thought! Still a huge gap on the GPU, but impressive nonetheless.

                > in general JIT languages have more opportunities for optimizations than AoT languages

                I'm not sure I agree with this. I would say the opportunities for any non-GC mature AoT language (C++, Rust, etc.) are going to be pretty much the same, since you can just attach a JIT to most AoT langs.

                • amkkma 1306 days ago
                  The GPU gap only shows up if the code is written in the high-level index or loop style. There is little to no gap if it's done either using array abstractions (broadcast, map, etc.) or at a level similar to CUDA C (though with nicer Julia abstractions and syntax): https://juliagpu.org/cuda/

                  The Julia Lab at MIT is working on making the higher-level codegen faster.

                  • MiroF 1306 days ago
                    I guess that makes sense to me... you can just automatically convert the C in BLAS to Julia, and if they're both being converted to LLVM IR by Clang anyway then I guess it'll be about as fast!
                    • amkkma 1305 days ago
                      That's not at all what Julia is doing. It's much more sophisticated: it has very low-level intrinsic primitives that can compose, and it optimizes the IR to make it fast before compiling it to CUDA. These all map to Julia constructs.
                • ddragon 1306 days ago
                  Sure, if you give a static language a JIT it'll be able to get the advantages of having a JIT, though language semantics still matter. A language built for JITs, like Julia or Common Lisp, has native ways of interfacing with the compiler, and programs are built without worrying about an exponential explosion of implementations during method monomorphization (as you only compile the optimal versions you'll actually use, based on runtime information, without having to be pessimistic, since any overspecialization can be fixed on demand). AoT languages would probably need a compiler pragma, or a type similar to dynamic boxing but for delayed monomorphization/compilation of methods, when you want to avoid compiling all paths AoT (which might be a way to allow, for example, tensor specialization on sizes, similar to StaticArrays in Julia).
                  • MiroF 1306 days ago
                    I don't quite follow.

                    I am not too experienced with Julia, but my understanding was that it uses LLVM to JIT itself. Since the LLVM JIT compiler is also an API available to C++, anything that can be done in Julia can be done by JITting through the LLVM API in C++.

                    Then you just compile the methods that you'll actually use with LLVM right before using them.

                    • ddragon 1306 days ago
                      Sorry, you're right in that Julia is written in C/C++, so everything Julia does could be done in those languages by writing a language (like Julia itself, and not unlike TensorFlow's original interface), compiling it on demand, and finding a way to eval the new code and recover the results. I was talking about ways to make it sort of convenient (at least viable to implement, unlike the former), as an extension to the C++ compiler itself where you can just tell the compiler what stays AoT and what is JIT'd but otherwise keep the same C++ syntax.

                      Not to mention that if you want to reimplement Julia's logic in C++ you'll have to develop its sophisticated type inference, since Julia's compiler is so aggressive that it will compile entire blocks of the program at once (the entire program if it can) as long as it can infer what types are used downstream, which is why it can compete with AoT-compiled languages (it's basically a "just ahead of time" compiler).

                    • shele 1306 days ago
                      The crux is that figuring out which are "the methods that you'll actually use" is very difficult. A lot of effort is being put into this in the Julia "Package compiler" and "Compiler" projects.
    • belval 1307 days ago
      If true, it does not surprise me. While there is a lot of language war going on in the ML ecosystem, I never really heard of anyone using, planning to use, or waiting for Swift for TensorFlow.

      It might be a nice language (never used it), but there are other contenders with more engineer/scientist support, like Rust and Julia, for which the advantages were clearer.

      Finally, the whole effort looked bad because its main proponent was the creator of Swift, rather than the actual community pushing for it.

      • ksec 1307 days ago
        Precisely because Chris Lattner had the idea of replacing everything from C to JavaScript with Swift.

        I didn't like that idea then, and I still don't like it now. But the idea in itself is very Apple(-ish).

    • bartvk 1308 days ago
      > since then most internal users lost interest

      Why do you say so?

    • ddbb33 1308 days ago
      Can you backup your claims?
      • pjmlp 1308 days ago
        I guess the fact that the TensorFlow 2020 conference had zero references to it, including in the related blog posts, says it all.
      • Razengan 1307 days ago
        A newly created account claims something without proof, and comments asking for it seem to be getting buried.
        • MiroF 1307 days ago
          Yes, or a Googler who doesn’t want to be seen publicly attacking a Google project
          • Razengan 1305 days ago
            Yes, or someone with an agenda against Swift/TF.
      • UrSuchAGenius 1308 days ago
        This website has been cached already
    • rtorr 1308 days ago
      “Stillborn” is a pretty awful term to use for software.
      • throw6606 1308 days ago
        Mea culpa. Reworded.
      • StavrosK 1307 days ago
        Why is that? It conveys the DOA meaning pretty well.
        • MiroF 1307 days ago
          Generally, your metaphors should not rely on comparison to a pretty traumatic event that has happened to quite a few people, many of whom might be around you without you knowing.
          • unishark 1307 days ago
            DOA is a metaphor too, literally referring to people arriving at the ER too late.
          • layoutIfNeeded 1307 days ago
            Sure. I guess then you shouldn’t even use the word “dead”, as plenty of people have lost close relatives which is a pretty traumatic event.
            • MiroF 1307 days ago
              Just letting you know the reality that a lot of people view "stillborn" events as particularly traumatic. I'm not going to spend a lot of time arguing about this because it's a pretty simple point.

              It's up to you to choose what kind of person you want to be, I don't have control over what you say.

            • deadmutex 1307 days ago
              I think the difference is that the term "dead" is used in a lot of different contexts, e.g. "the battery is dead". "Stillborn" is not, and it is mainly associated with one very traumatic context.
          • lehi 1307 days ago
            I know anorexics for whom any mention of food can be a trigger. Should metaphors involving consumption be verboten in public discourse because they might be read by someone with an eating disorder?
      • kgwgk 1307 days ago
        Now that I think of it, does anyone know if "abort" is among the computing terms to be avoided (like master, black list or sanity check)?
        • p1esk 1307 days ago
          None of these should be avoided, as long as they are accurate terms.
        • devmunchies 1307 days ago
          Anyone who is avoiding "master, black list or sanity check" probably thinks abortion is super awesome and that the term "abort" should never be stigmatized.

          Best not to worry about such silly things and keep writing code.

          • falseprofit 1307 days ago
            I don't know of anyone who "thinks abortion is super awesome"...

            In any case, the primary meaning of 'abort' is more general, so it shouldn't be compared to the metaphorical use of 'stillborn'.

            • kgwgk 1306 days ago
              The original meaning of abort is miscarriage/miscarry, other uses are “metaphorical”.
  • bodono 1308 days ago
    I'm not sure this is really going to take off; it seems that most people who are abandoning TF are moving to JAX or PyTorch. My own experience with JAX is that it is much easier to use than TF, just an all-round more pleasant experience. It would be interesting to try this, but at this point I'm not really willing to learn 'yet another deep learning framework', and the extreme anti-user problems that TF had make me loath to give it another shot, even with a presumably better frontend. Moreover, I think that Python is just a better all-round ML/data science language at this point. Has anyone tried both JAX and this and would be willing to give us their thoughts on the strengths and weaknesses of each?
    • gas9S9zw3P9c 1307 days ago
      I'm skeptical of JAX. It feels good right now, but when the first TF beta version came out it was very much like that too - clean, simple, minimal, and just a better version of Theano. Then the "crossing the chasm" effort started and everyone at Google wanted to be part of it, making TF the big complex mess it is today. It's a great example of Conway's Law. I'm not convinced the same won't happen to JAX as it catches on.

      PyTorch has already stood the test of time and proven that its development is led by a competent team.

      • bodono 1307 days ago
        I know where you're coming from, but TF in my opinion was very user-hostile even on arrival. I can't tell you how much hair-pulling I did over tf.conds, tf.while_loops and the whole gather / scatter paradigm for simple indexing into arrays. I really think the people working on it wanted users to write TF code in a certain, particular way and made it really difficult to use it in other ways. Just thinking back on that time still raises my blood pressure! So far Jax is much better and I'm cautiously optimistic they have learned lessons from TF.
        • gas9S9zw3P9c 1307 days ago
          I had the opposite experience. The early TF versions were difficult to use in that they required a lot of boilerplate code to do simple things, but at least there was no hidden complexity. I knew exactly what my code did and what was going on under the hood. When I use today's high-level opaque TF libraries I have no idea what's going on. It's much harder to debug subtle problems. The workflow went from "Damn, I need to write 200 lines of code to do this simple thing" to "I need to spend 1 hour looking through library documentation, gotchas, deprecation issues and TF-internal code to figure out which function to call with what parameters and check if it actually does exactly what I need" - I much prefer the former.

          Having barriers to entry is not always a bad thing - it forces people to learn and understand concepts instead of blindly copying and pasting code from a Medium article and praying that it works.

          But I agree with you that there are many different use cases. Those people who want to do high-level work (I have some images, just give me a classifier) shouldn't need to deal with that complexity. IMO the big mistake was trying to merge all these different use cases into one framework. Let's hope JAX doesn't go down the same route.

        • brilee 1307 days ago
          (googler)

          Not quite sure why you picked those particular examples... JAX also requires usage of lax.cond, lax.while_loop, and ops.segment_sum. Only gather has been improved with slice notation support. IMO, TF has landed on a pretty nice solution to cond/while_loop via AutoGraph.

          • joaogui1 1307 days ago
            While JAX has those operations, you don't always need them - it depends on what transformations you want to do (jit or grad) - and they have been working on making normal control structures compatible with all transformations.
        • iflp 1307 days ago
          You can't blame the TF people for things like while_loop. Those are inherited from Theano, and back then the dynamic graph idea wasn't obvious.

          JAX is indeed a different situation as it has a more original design (although TF1 came with a huge improvement in compilation speed, so maybe there were innovations under the hood). But I don't know if I like it. The framework itself is quite neat, but last time I checked, the accompanying NN libraries had horrifying designs.

        • MiroF 1307 days ago
          > tf.conds, tf.while_loops and the whole gather / scatter paradigm

          I'm ill-informed - but isn't that exactly what lax is?

          • mattjjatgoogle 1306 days ago
            The difference is that in TF1 you had to use tf.cond, tf.while_loop etc for differentiable control flow. In JAX you can differentiate Python control flow directly, e.g.:

              In [1]: from jax import grad
              
              In [2]: def f(x):
                 ...:     if x > 0:
                 ...:         return 3. * x ** 2
                 ...:     else:
                 ...:         return 5. * x ** 3
                 ...:
              
              In [3]: grad(f)(1.)
              Out[3]: DeviceArray(6., dtype=float32)
              
              In [4]: grad(f)(-1.)
              Out[4]: DeviceArray(15., dtype=float32)
            
            In the above example, the control flow happens in Python, just as it would in PyTorch. (That's not surprising, since JAX grew out of the original Autograd [1]!)

            Structured control flow functions like lax.cond, lax.scan, etc exist so that you can, for example, stage control flow out of Python and into an end-to-end compiled XLA computation with jax.jit. In other words, some JAX transformations place more constraints on your Python code than others, but you can just opt into the ones you want. (More generally, the lax module lets you program XLA HLO pretty directly [2].)

            Disclaimer: I work on JAX!

            [1] https://github.com/hips/autograd [2] https://www.tensorflow.org/xla/operation_semantics

            • p1esk 1306 days ago
              What would you say is the main advantage of JAX over PyTorch?
      • iflp 1307 days ago
        > I'm not convinced the same won't happen to JAX

        And now there are already multiple NN libraries for JAX from Google...

        • joaogui1 1307 days ago
          There are a bunch of frameworks built on top of PyTorch too (fastai, Lightning, torchbearer, Ignite...), I don't see why this should be a problem (or at least a problem for JAX but not for PyTorch).
          • MiroF 1307 days ago
            IMO, this is not a fair comparison because PyTorch spans a larger amount of abstraction than JAX (I don't quite know how to explain it other than "spans a larger amount of abstraction").

            You can do much of the JAX stuff in PyTorch, but you can't do the high-level nn.LSTM stuff in JAX - you have to use something like Flax or Objax.

    • MiroF 1307 days ago
      All I want is a way to statically type check tensor axes. Why can't I get a way to statically type check tensors?
    • alpineidyll3 1308 days ago
      The subtext is that Google would love even more Google projects to be ML prerequisites.
    • sandGorgon 1307 days ago
      I have just started hearing about JAX. But it seems to be a low-level library that TensorFlow uses, right?

      The latest release of TensorFlow Probability uses JAX under the hood. So what do you mean when you say you're moving to JAX versus TensorFlow?

      • joaogui1 1307 days ago
        In your first sentence you're confusing JAX with XLA.

        XLA: Accelerated Linear Algebra. I guess it's kind of a backend/compiler that optimizes linear algebra/deep learning calculations with some very interesting techniques, among them kernel fusion.

        JAX: In some sense syntactic sugar over XLA, but a better way of describing it is composable transformations + NumPy + some SciPy. The composable transformations allow you to take derivatives (of single-, multi- or vector-valued functions, and also higher-order derivatives), JIT a function (which is then compiled to XLA), use 2 forms of parallelism (vmap and pmap), and others, while being compatible with one another and with TPUs, GPUs and CPUs.

        • sandGorgon 1307 days ago
          I'm not mistaking the articles around it - check this out: https://www.tensorflow.org/probability/examples/TensorFlow_P...

          "TensorFlow Probability (TFP) is a library for probabilistic reasoning and statistical analysis that now works on JAX! For those not familiar, JAX is a library for accelerated numerical computing based on composable function transformations.

          We have ported a lot of TFP's most useful functionality to JAX while preserving the abstractions and APIs that many TFP users are now comfortable with."

          TensorFlow is migrating a bunch of stuff to JAX. They even use the word "library" for their own porting. For a user like me, it looks like JAX is a library that TensorFlow uses... but the end-user-facing library is TensorFlow.

          • cgs1019 1306 days ago
            Hi, tech lead for TFP here. The wording here was unclear -- sorry! We're fixing it presently.

            We are not migrating away from TF; far from it!

            The change here was to interoperate with TF and JAX (and numpy!), by way of some rewrite trickery under the hood. Essentially, we wrote a translation layer that implements the TF API surface (or, the parts we actually use) in terms of numpy & JAX primitives [1]. This lets us leave most TFP code intact, written in terms of the TF API, but interoperate with JAX by way of the API translation layer. (Actually we implemented numpy support first, and mostly got JAX for "free" since JAX is largely API-compatible with numpy).

            Sorry for any confusion!

            We're pretty stoked about this work, so happy to answer any other questions you may have (also feel free to chime in on the github tracker or email tfprobability@tensorflow.org)

            [1] - https://github.com/tensorflow/probability/tree/master/tensor...

            • sandGorgon 1306 days ago
              hey thanks for the clarification.

              here's what everybody is puzzled about: it looks like the layers going forward are JAX -> TensorFlow -> Keras.

              and we are seeing people moving to JAX directly. So this is ending up like a Flutter vs Kotlin issue (also within Google).

              Do you envision JAX staying low-level, with the high-level TensorFlow/Keras interface being the most usable API?

    • sakex 1307 days ago
      I don't think that the main goal of TF on Swift is to train models using Swift. I think it's mainly to deploy them in production on iPhones
  • 6d65 1308 days ago
    Last time I looked, the automatic differentiation was in a compiler branch with no immediate plans to merge into master.

    But overall it is promising. I even installed Swift on Linux to play with it; I didn't get to ML, as I have an AMD GPU and that is a can of worms. Hope it's finished one day.

    I would prefer for Julia ML libraries to become mainstream. But it is what it is.

    Also, the ideal for me would be Rust for TensorFlow, but the slower compile times (I didn't compare with Swift) are an impediment to an iterative workflow such as tweaking models.

    • cs702 1308 days ago
      A Rust for TensorFlow (and/or a "RustTorch") would be awesome.

      I hope all the work being done on improving incremental compilation[a] and developing interactive Rust REPLs like evcxr[b] makes using Rust for AI a practical reality.

      [a] https://doc.rust-lang.org/edition-guide/rust-2018/the-compil...

      [b] https://github.com/google/evcxr

      • anoncept 1307 days ago
        As a starting point, maybe take a look at the 'tch' crate [1] and the 'rust-bert' crate [2] built on top of it?

        [1]: https://github.com/LaurentMazare/tch-rs -> https://crates.io/crates/tch

        [2]: https://github.com/guillaume-be/rust-bert -> https://crates.io/crates/rust-bert

        • danieldk 1307 days ago
          tch-rs is really nice. I have built a Rust sequence labeler + dependency parser + lemmatizer on top of it. It supports multi-task learning, finetuning of BERT-like models, model distillation, etc.:

          https://github.com/stickeritis/sticker2/

          Unfortunately, when I started this, guillaume-be's rust-bert crate wasn't around yet (I think), so I also ported some of the models from Hugging Face Transformers:

          https://github.com/stickeritis/sticker-transformers/

          At any rate, I can highly recommend the tch crate if you are looking to build neural networks in Rust.

      • adamnemecek 1307 days ago
        I love Rust as much as the next guy but it's not the best language for numerics. Julia is really nice though.
      • nestorD 1307 days ago
        There are Torch bindings that are used by some, but what I personally would like is a JAX clone built on top of Rust.

        I see a way to do it [0] but... I already have a PhD to finish.

        [0]: a macro compiling functions into an intermediate representation that is transformed with const functions (gradient computation) at compile time and JITted into XLA at runtime.

      • 6d65 1307 days ago
        I think there are official TensorFlow bindings for Rust, as well as for the PyTorch C++ API.

        But adding auto-differentiation to match the Swift for TensorFlow behaviour sounds like a serious undertaking, and I doubt it's on anyone's radar.

        But yeah, I've been wanting this for a while. Shoehorning Rust everywhere is the endgame.

    • turbinerneiter 1308 days ago
      I recently spent some days playing with differentiable programming in Swift on Linux:

      * as you said, auto-diff is in a branch of the Google fork of the project
      * the only pre-built images are for Ubuntu 18.04
      * on Linux, the REPL seems to be somewhat broken
      * many libraries assume macOS or iOS

      I don't feel a lot of hope for adoption of Swift on Linux. Apple obviously is not working on that (fair, they have no reason to do so) and the Swift community also has no focus on Linux, since ... they are in the Apple ecosystem. Meanwhile, the Open Source community is much more interested in Rust than Swift.

      For the differentiable programming - this is what got me excited, but after trying it, I was a bit underwhelmed. Not that it isn't great technology, it's just not figured out yet. I tried to come up with a use case outside of ML and the one I tried wasn't really applicable.

      I do feel however that someone will come up with something and that it will have quite some impact.

      • dsabanin 1307 days ago
        Apple I believe has a reason to work on Swift for Linux, and that reason is their considerable cloud infrastructure and various backend services.

        I’m sure being able to share domain specific Swift code between client apps and backend would be pretty high on their list of wants.

        Also, little clues like the way Xcode generates SwiftPM packages in a Linux-ready fashion out of the box shows that they care at least a bit.

        Having a lot of interest in programming languages, my opinion is that Swift is a damn good one. It's very high-level, supports FP deeply enough, has a great type system (that is getting better with every release), great OOP support, and native performance characteristics, and it still lets you get to a really low level when you need it.

        I also like how they took great ideas from Haskell, Scala, Smalltalk, C# and others. I code daily in Scala and Swift, previously had done Erlang, Clojure, Common LISP, TypeScript, Ruby, Python, Haskell, OCaml, Java, PHP, C, Smalltalk and some others. In this list, Swift is now almost at the top.

        They need to get Higher-Kinded Types and then it’s going to win the world (just kidding, JavaScript gets to win the world, unfortunately) :)

        • threatofrain 1307 days ago
          If Apple has intrinsic care for Swift on Linux then let it be evident in their story or direction.
          • dsabanin 1307 days ago
            I can't speak for Apple, of course, but some indications of their seriousness are there.

            SwiftNIO is a cross-platform asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients. (https://github.com/apple/swift-nio)

            Distributed Membership Protocol implementations in Swift: https://github.com/apple/swift-cluster-membership

            Docker Official Image packaging for Swift: https://github.com/apple/swift-docker

            Also, on official https://swift.org/download all releases and snapshots are automatically available for: Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, CentOS 7, CentOS 8, Amazon Linux 2

            Official Swift Server work group is a steering team that promotes the use of Swift for developing and deploying server applications: https://swift.org/server/

            Swift AWS Lambda Runtime: https://github.com/swift-server/swift-aws-lambda-runtime

            Official Swift 5.3 release goals have stated a major goal of "expanding the number of platforms where Swift is available and supported, notably adding support for Windows and additional Linux distributions." (https://swift.org/blog/5-3-release-process/)

            • pjmlp 1307 days ago
              Still waiting for the day when import Glibc isn't a thing on Swift examples.
              • dsabanin 1307 days ago
                Well, you can help make it come sooner… wink
                • pjmlp 1307 days ago
                  Why bother when F# already works everywhere where I care about, with better tooling support and the wealth of .NET libraries?

                  I already have my share of platforms I care about.

      • 6d65 1307 days ago
        Yep. It's difficult to find uses for autodiff, and it looks like a very niche thing to add to a language. But it's still cool. In a language with extensible syntax (e.g. proc macros), this would sit in a library.

        And I think you're right, there is a low chance that this gets traction. Especially since Chris Lattner is at SiFive now; they probably will have some kind of AI accelerators, but TF is for training rather than execution. So I'm not sure they'll find a reason to push it.

        Jeremy Howard from fast.ai might be able to convince people to give it a try. But without people working on it full time, the chances are not great. Especially with a compiler fork that requires constant merges/rebases. But, who knows.

        • improbable22 1307 days ago
          > It's difficult to find usages for autodiff

          I guess there are lots of uses in optimisation problems, and in sampling algorithms for statistics. I don't know how easy it will be to sell Swift to people who now use Stan or Stata (or R) and don't think of themselves as programmers.

          > In a language with extensible syntax(ex: proc macros), this would sit in a library.

          And this would allow easier iteration of different designs.

          • Scipio_Afri 1307 days ago
            Does autodiff exist in R? Then why not use R, since it seems it also has a lot of ML algorithm support due to its focus on statistics - it's used by many statisticians.

            Is the idea here that Swift is a more approachable language and thus this is to lower the barrier of entry to TF?

            • 6d65 1307 days ago
              I think the main selling points of Swift for TensorFlow are:

              * Speed. ML pipelines are often bottlenecked by data loading and transformation. TF has added new mechanisms (the last I've seen was tf.data), but a language compiled to native code is much more flexible in that regard.

              * Type safety. Sometimes issues with models only pop up long after they have started running. The hope is that typed APIs will surface simple errors at compile time.

              * Auto-differentiation built into the language (see the rough sketch at the end of this comment). If I'm not mistaken, this is more powerful than backpropagation in TF, which also has autodiff. The idea is that this would allow for more custom models without a performance penalty. My knowledge here is limited, since it's been over 2 years since I've implemented backpropagation. I've successfully forgotten most of the things I knew about ML/DL.

              I don't have any experience with R, but from what I've heard, it was known to be slow. But that might have changed or I may have misunderstood the situation.
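
              To make the autodiff point concrete, this is roughly what it looks like in plain Swift - an untested sketch, assuming an S4TF toolchain where @differentiable and gradient(at:in:) are available:

                import TensorFlow

                // Derivatives are a compiler feature rather than a library DSL:
                // mark an ordinary function @differentiable and ask for its gradient.
                @differentiable
                func loss(_ w: Float) -> Float {
                    return (w - 3) * (w - 3)
                }

                print(gradient(at: 0, in: loss))  // d/dw (w - 3)^2 at w = 0 -> -6.0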

              • socialdemocrat 1307 days ago
                It does not need to be built into the language. Julia does it great without special built-in support. But Julia is an extremely flexible language.

                Number types are first class which is a big part of it and you can add extra custom passes to the JIT compiler in regular library code.

            • socialdemocrat 1307 days ago
              You cannot really do autodiff in a slow language. I mean you can, but nobody wants to run large machine learning training algorithms in a slow language like Python or R.

              You could write autodiff in, say, C++, but it is a user-unfriendly language not well suited for machine learning and scientific computing.

              Swift is a nicer high-level language you can do autodiff in. But honestly I don't see the point of Swift either.

              Julia already does autodiff extremely well and outperforms Swift and pretty much everybody else.

      • shadowfiend 1307 days ago
        > Apple obviously is not working on that (fair, they have no reason to do so) and the Swift community also has no focus on Linux, since ... they are in the Apple ecosystem.

        Neither of these things are true, fwiw. Strong Linux support is an explicit current goal for Swift, though it is admittedly not all the way there yet. See this post regarding Swift 6 and its goals for wider support, for example: https://forums.swift.org/t/on-the-road-to-swift-6/32862

  • mark_l_watson 1307 days ago
    I question the long-term viability of Swift for TensorFlow. I hope that I don't sound like I am whining, but I invested a fair amount of time in Swift from last fall through this February because I wanted a better, faster language than Python to work in for DL; I was also interested in trying some iOS, iPadOS, and macOS development; and Swift looked interesting for Linux server-side work.

    For Apple development I have found Flutter+Dart to be more pleasant. For DL I decided to stick with taking advantage of my 5+ years experience with TF in Python.

    Off topic, but Julia and Flux are really worth checking out for DL.

    Some advice: if you want to experiment with Swift for TensorFlow, use Google's Colab and make your life easier. I wish I could get back the install time I spent on my Linux GPU laptop and on macOS. If you pay for Colab like I do, you get good GPU/TPU resources, and it is simply an easy and fun way to go.

  • adonese 1308 days ago
    Is this (the future of TF is in Swift) still the case? Because there has been speculation, especially since Lattner left Google. Also, projects like JAX have gained good momentum.

    I guess it's really hard to convince scientists to move away from their Python legacy. And the Python scientific stack is really incredible.

  • xvilka 1308 days ago
    Should have used Julia instead. The choice was only made because of Chris Lattner, who has already left.
    • marcinzm 1308 days ago
      They presumably wanted a semi-popular statically typed language as the gains of Julia over Python aren't enough to be worth it (and Julia isn't popular enough).
      • yyyk 1307 days ago
        Semi-popular where? The typical iOS programmer isn't going to touch ML. In the relevant demographic, Julia had (and still has) more users. Swift4TF was a poor choice, and now that Lattner is gone I doubt it has a future.
        • marcinzm 1307 days ago
          Julia is out because it's not statically typed and not enough of a difference from Python. If it was more popular then it may have had a chance on that alone but it's not. In terms of semi-popular statically typed modern languages that have good/performant C/C++ interop you have: Go, Swift and Rust. Can't think of any others to be honest. I'm guessing Go's lack of generics excluded it and Rust's complexity excluded it.
          • yyyk 1307 days ago
            TF's own analysis[0] was hesitant on static vs dynamic typing, finding downsides to either choice, and suggested a "middle ground" (which IMHO was doable starting from Julia's optional types).

            So static typing might be your requirement, but it wasn't TF's requirement. When examining Julia later on, they don't mention dynamic types as a downside at all - and every argument save for "we're familiar with Swift's internals" was contradicted within the same document...

            Now, being familiar with Swift is a very understandable reason for their choice, but it's IMHO not the best choice for others who are not as familiar with Swift or ML. Most ML users are very familiar with Python and I think they will find Julia to be a welcome improvement on Python's pain points without Swift's baggage.

            [0] https://github.com/tensorflow/swift/blob/master/docs/WhySwif...

      • pjmlp 1308 days ago
        • marcinzm 1308 days ago
          There's a difference between popular and "not a toy language." I'm not arguing Julia isn't used, I'm arguing it's not used widely enough for that to be a merit, irrespective of other reasons.
          • socialdemocrat 1307 days ago
            Julia is used in serious scientific work. Swift isn't. Massive projects running on supercomputers, such as next-generation climate models, are written in Julia. Many best-of-breed scientific packages are written in Julia. Swift, while great for app development, has limited presence in the scientific field.
          • pjmlp 1307 days ago
            Interesting given some of the renowned names using it, probably with more revenue than plenty of Rust unicorns.
            • MiroF 1307 days ago
              Apple, Microsoft, Google?

              This website isn't counting "Julia only" stacks, it's just companies that have used Julia for one project or another. If you really want to compare that to Rust, Julia is again going to fall short.

              • pjmlp 1307 days ago
                Those were not the companies I had in mind with my Rust remark, but if you so wish, Swift, Kotlin/Native, Go, C++, .NET Native, Verona, Checked C, Objective-C.

                It remains to be seen how much Rust will actually make it into tier-1 OS SDKs for userspace applications.

                In fact, currently it looks more like they are bringing their experience with Rust into their platform languages than anything else.

                Swift memory ownership, Verona, C++ Core Guidelines checker, Kotlin/Native ownership rules.

                Beware wishing for Julia's downfall with glass ceilings.

  • joaogui1 1307 days ago
    I really don't understand why they didn't go with Julia; it fits the purpose much better than Swift, already having an ML ecosystem and great interop with Python, C, C++, R and MATLAB. Heck, JAX, where a lot of TF refugees are going, is pretty similar to Zygote.
    • dunefox 1307 days ago
      It's because they were set on Swift and the 'language evaluation' was pointless. Julia would have been the natural choice.
  • Nelkins 1307 days ago
    There's some cool autodiff work going on in F#/.NET now too[0]. From a usage/API perspective they look kind of similar.

    [0] https://diffsharp.github.io/

  • VHRanger 1308 days ago
    Who would want to use an Apple-centric language for ML, seriously?

    Apple hardware is outright incompatible with the kind of hardware we use daily in machine learning workstations.

    • jasode 1308 days ago
      >Who would want to use an Apple-centric language for ML, seriously? Apple hardware is outright incompatible

      Based on how you wrote your comment, I'm guessing you may not know that S4TF is a Google initiative.

      Yes, Chris Lattner used to work for Apple, but he was at Google Brain during the start of this project. When his team wanted to create a language where automatic differentiation and gradient descent were first-class concepts in the core syntax (i.e. without libraries) of the programming language, they looked at Rust, Julia, Swift, etc.[1] They ended up choosing Swift as the base language to extend with new ML syntax.

      Also, Chris has said in previous interviews that he thought of Swift as a general-purpose language and hoped it would be used outside of Apple's ecosystem.

      EDIT to address the confusion where some assume Apple hardware dictates the direction of S4TF: For those not aware, Google's TPU (Tensor Processing Unit)[2] hardware is built with custom ASIC chips, not ARM or Apple Silicon chips. Presumably, S4TF would run natively against Google's TPU. In other words, the goals and execution targets of S4TF are not restricted by Apple's ecosystem of macOS/iOS/MacBooks/iMacs/iPads in any way. Yes, the SwiftUI framework is Apple-specific, but Swift-the-core-language-syntax[3] is not.

      [to downvoters: if I wrote inaccuracies, please correct me.]

      [1] https://github.com/tensorflow/swift/blob/master/docs/WhySwif...

      [2] https://en.wikipedia.org/wiki/Tensor_Processing_Unit

      [3] https://docs.swift.org/swift-book/ReferenceManual/zzSummaryO...

      • dunefox 1307 days ago
        What a coincidence that someone would choose his own programming language in an evaluation. There was no good reason not to choose Julia.
      • nsonha 1307 days ago
        For answering a rhetorical question? Yes, of course the guy who came up with this also roots for Swift to gain more traction outside the Apple ecosystem, but reality hasn't gone in that direction so far. There are plenty of cool languages out there, like F#, Julia, Mathematica, etc., and the ML mob settled on Python, which is an average language, so there is no reason to believe they are attracted to Swift.

        And you just ignored the hardware comment.

    • YetAnotherNick 1308 days ago
      I find Swift really good in terms of ease of coding and speed. But yeah, I would have preferred Julia or something over Swift.
    • maxerickson 1308 days ago
      People that make apps for Apple hardware?

      Seems kind of obvious.

    • yunohn 1308 days ago
      I agree on the hardware part. macOS only supports AMD GPUs, which are unsupported by TensorFlow...
    • vimy 1308 days ago
      That might change with Apple Silicon and Apple’s new ML Compute framework.
      • rudedogg 1307 days ago
        It seems to me like they'll be making more consumer-focused GPUs/ML chips. I think Apple Silicon is exciting, but I imagine for serious work you'll still need an external enclosure and a dedicated GPU from AMD.

        Or Linux/Windows :(. Losing Nvidia/CUDA kind of killed ML on macOS overnight.

    • iddan 1308 days ago
      Swift is gaining momentum outside the Apple ecosystem. Web servers are being built with it, and the TensorFlow team explains at length its advantages for developing ML models.
    • xiaodai 1308 days ago
      The appeal isn't broad enough
    • ericmay 1308 days ago
      They're working on expanding it outside of the Apple/iOS ecosystem, but many (probably most) developers use a Mac. You can write Swift code on Linux right now.

      I'd also say that it appears Swift is going to be a great language for machine learning. Things like calculating gradients are built into the language, and you can import Python to fill in gaps until Swift libraries are ready (rough sketch below). And Swift is quite fast.

      • MiroF 1307 days ago
        I tried to use Swift recently while avoiding Xcode crud. It's a nightmare, definitely not going to be adopted until Apple goes more hands off.
      • pjmlp 1308 days ago
        Try to use Framework code on Linux.
  • hedgehog 1307 days ago
    The name is a bit confusing. Swift for TensorFlow combines a bunch of things:

      - adding autodiff to Swift language & compiler
      - neural net construction & training API in Swift
      - low-friction Python bindings
      - low-friction C++ interop
      - ability to run neural nets on TensorFlow using the C++ interop
      - ability to alternately run neural nets directly on XLA ("X10")
    
    The Swift changes are supposed to get mainlined, and I think at least some of the stuff related to Python and C++ already is. The idea is that Swift is nice enough to cover experiment + train + embed-as-a-library, even on mobile platforms. It's a big engineering project and I hope Google keeps funding the work.
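
    To give a flavour of the neural-net construction & training piece, model definition and a training step look roughly like this (an untested sketch against the swift-apis surface; exact API names have shifted between releases):

      import TensorFlow

      // Layers are plain structs; dtypes are tracked by the type system.
      struct TinyClassifier: Layer {
          var hidden = Dense<Float>(inputSize: 4, outputSize: 8, activation: relu)
          var output = Dense<Float>(inputSize: 8, outputSize: 3)

          @differentiable
          func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
              return input.sequenced(through: hidden, output)
          }
      }

      var model = TinyClassifier()
      let optimizer = SGD(for: model, learningRate: 0.01)

      // Dummy batch, just to show the shape of a step.
      let x = Tensor<Float>(randomNormal: [32, 4])
      let y = Tensor<Int32>(repeating: 0, shape: [32])

      // Differentiate the loss with respect to the model, then update it.
      let (loss, grads) = valueWithGradient(at: model) { model -> Tensor<Float> in
          softmaxCrossEntropy(logits: model(x), labels: y)
      }
      optimizer.update(&model, along: grads)
      print(loss)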
  • pjmlp 1308 days ago
    Still no Windows support at the level of DirectML, or Julia.
  • Razengan 1307 days ago
    • dunefox 1307 days ago
      If you're dead set on using a particular language you can find reasons why it appears suitable even if they don't make any sense.
      • Razengan 1304 days ago
        That logic works both ways.
  • suyash 1307 days ago
    For those who prefer an alternative language, here is the Java for TensorFlow project: https://www.tensorflow.org/install/lang_java
  • cube2222 1308 days ago
    I think the title is cut off at the end.
  • YetAnotherNick 1308 days ago
    I am not getting the exact use case. If speed is the issue, then I think tf.function solves that. For most operations we need to use Python via PythonKit, so the things which can't be sped up using tf.function can't be sped up using Swift either. Also, if we need to use Python everywhere, the type safety benefit is very minor.
    • ddragon 1307 days ago
      The unique property is the ability to just pick any code or library that is unaware of the differentiation library (unlike tf.function, which needs to specifically use tf methods) and get the gradient. In a language like Julia this is immediately useful, as it has a massive ecosystem of numerical code that makes sense to take gradients of (like differential equations and the SciML project [1], or less conventional stuff like raytracers), but in a language like Swift (where there is no meaning to the gradient of GUI libraries or frontend stuff) it is more of an "if you build it, they'll come" kind of faith from Google.

      But regardless of unique features, it's a have-your-cake-and-eat-it-too kind of interface. You don't need to learn a second language within the language like TensorFlow's tf.*, making it even more natural and flexible than PyTorch, including all the debugging mechanisms of the host language itself, but you still get compile-time graph creation like TensorFlow, including all kinds of optimizations. It makes other approaches seem primitive by comparison, but creating it is much more complex, and the main audience is already more than used to language-within-a-language solutions (like NumPy) which can provide something almost as good, even if less elegantly, so it's not easy to convince people either (when it involves changing programming languages).
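
      As a rough illustration (an untested sketch, assuming an S4TF toolchain), the function below is ordinary Swift with an ordinary if/else, mirroring the JAX example upthread, and the compiler differentiates it directly - no tf.cond or tracing step involved:

        import TensorFlow

        // Plain Swift control flow inside a differentiable function.
        @differentiable
        func f(_ x: Float) -> Float {
            if x > 0 {
                return 3 * x * x
            } else {
                return 5 * x * x * x
            }
        }

        print(gradient(at: 1, in: f))   // 6.0
        print(gradient(at: -1, in: f))  // 15.0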

      [1] https://sciml.ai/

    • MiroF 1307 days ago
      > exact usecase

      Static type checking?

  • socialdemocrat 1307 days ago
    I am a fan of both Swift and Julia but I honestly don’t see the point of Swift in scientific computing. Julia is just a way better fit.

    However, for app development Swift of course has a far more impressive stack.

  • tanilama 1307 days ago
    I still think this proposal lacks justification for picking Swift as the host language.