Careful, folks. S4TF is pretty much dead on arrival. It was pushed aggressively by Chris Lattner (for obvious reasons) but he left Google a while ago and since then most internal users lost interest. There's nothing in Swift that's inherently suitable for ML and building the ecosystem is a ton of work; without all the political pushing, it went nowhere and is close to a "semi-abandoned research project" phase.
Speed, too. For PyTorch to train models and run inference quickly, your Python code gets translated to C++/CUDA. Part of the idea with S4TF is to be able to write ML code in a single, fast language.
S4TF still requires either C CUDA kernels or XLA. Julia, on the other hand, has JIT GPU codegen, and its CPU codegen has been benchmarked to beat OpenBLAS.
He probably means Tullio.jl (which also seems to integrate with Julia's source-to-source differentiation library Zygote.jl, the main competitor to Swift for TensorFlow):
Regardless of whether it can consistently beat Fortran/BLAS in every area, JIT languages in general have more opportunities for optimization than AoT languages, so it's interesting to see what comes out of a language that focuses on leveraging this to get the most performance.
I'm surprised re: BLAS - that is closer than I thought! There's still a huge gap on the GPU, but it's impressive nonetheless.
> in general JIT languages have more opportunities for optimizations than AoT languages
I'm not sure I agree with this. I would say the opportunities for any non-GC mature AoT language (C++, Rust, etc.) are going to be pretty much the same, since you can just attach a JIT to most AoT languages.
The GPU gap appears only if the code is written in the high-level index or loop style. There is little to no gap if it is written either using array abstractions (broadcast, map, etc.) or at a level similar to CUDA C (though with nicer Julia abstractions and syntax): https://juliagpu.org/cuda/
The Julia Lab at MIT is working on making the higher-level codegen faster.
I guess that makes sense to me... you could just automatically convert the C in BLAS to Julia, and if they're both being converted to LLVM IR by Clang anyway, then I guess it'll be about as fast!
That's not at all what Julia is doing. It's much more sophisticated: it has very low-level intrinsic primitives that compose, and it optimizes the IR to make it fast before compiling it to CUDA. These all map to Julia constructs.
Sure, if you give a static language a JIT it'll get the advantages of having a JIT, though language semantics still matter. Languages built for JITs, like Julia or Common Lisp, have native ways of interfacing with the compiler, and programs are built without worrying about an exponential explosion of implementations during method monomorphization (you only compile the specialized versions you'll actually use, based on runtime information, without having to be pessimistic, since any over-specialization can be fixed on demand). AoT languages would probably need a compiler pragma, or a type similar to dynamic boxing but for delayed monomorphization/compilation, for methods where you want to avoid compiling all paths ahead of time (which might be a way to allow, for example, tensor specialization on sizes, similar to StaticArrays in Julia).
I am not too experienced with Julia, but my understanding was that it uses LLVM to JIT itself. Since the LLVM JIT compiler is also an API available to C++, anything that can be done in Julia could be done by JITting to LLVM via that API in C++.
Then you just compile the methods that you'll actually use with LLVM right before using them.
Sorry, you're right in that Julia is written in C/C++, so everything Julia does could be done in those languages by writing a language (like Julia itself, and not unlike TensorFlow's original interface), compiling it on demand, and finding a way to eval the new code and recover the results. I was talking about how to make it somewhat convenient (at least viable to implement, unlike the former), as an extension to the C++ compiler itself where you can just tell the compiler what stays AoT and what is JIT'd, but otherwise keep the same C++ syntax.
Not to mention that if you want to reimplement Julia's logic in C++, you'll have to develop its sophisticated type inference, since Julia's compiler is so aggressive that it will compile entire blocks of the program at once (the entire program if it can) as long as it can infer what types are used downstream, which is why it can compete with AoT-compiled languages (it's basically a "just-ahead-of-time" compiler).
The crux is that determining which are "the methods that you'll actually use" is very difficult. A lot of effort is put into this in the Julia "PackageCompiler" and "Compiler" projects.
If true, it does not surprise me. While there is a lot of language warring going on in the ML ecosystem, I never really heard of anyone using, planning to use, or waiting for Swift for TensorFlow.
It might be a nice language (never used it), but there are other contenders with more engineers/scientists support like Rust and Julia, for which the advantages were clearer.
Finally, the whole ordeal got a very bad look from having its main proponent be the creator of Swift rather than an actual community pushing for it.
Generally, your metaphors should not rely on comparison to a pretty traumatic event that has happened to quite a few people, many of whom might be around you without you knowing.
Just letting you know the reality that a lot of people view "stillborn" events as particularly traumatic. I'm not going to spend a lot of time arguing about this because it's a pretty simple point.
It's up to you to choose what kind of person you want to be, I don't have control over what you say.
I think the difference is that the term "dead" is used in a lot of different contexts, e.g. "the battery is dead." "Stillborn" is not; it is mainly used in one very traumatic context.
I know anorexics for whom any mention of food can be a trigger. Should metaphors involving consumption be verboten in public discourse because they might be read by someone with an eating disorder?
Anyone who is avoiding "master, black list or sanity check" probably thinks abortion is super awesome and that the term "abort" should never be stigmatized.
Best not to worry about such silly things and keep writing code.
I'm not sure this is really going to take off; it seems that most people who are abandoning TF are moving to JAX or PyTorch. My own experience with JAX is that it is much easier to use than TF, just an all-round more pleasant experience. It would be interesting to try this, but at this point I'm not really willing to learn "yet another deep learning framework", and the extreme anti-user problems that TF had make me loath to give it another shot, even with a presumably better frontend. Moreover, I think that Python is just a better all-round ML/data-science language at this point. Has anyone tried both JAX and this and would be willing to give us their thoughts on the strengths and weaknesses of each?
I'm skeptical of JAX. It feels good right now, but when the first TF beta version came out it was very much like that too - clean, simple, minimal, and just a better version of Theano. Then the "crossing the chasm" effort started and everyone at Google wanted to be part of it, making TF the big complex mess it is today. It's a great example of Conway's Law. I'm not convinced the same won't happen to JAX as it catches on.
PyTorch has already stood the test of time and proven that its development is led by a competent team.
I know where you're coming from, but TF in my opinion was very user-hostile even on arrival. I can't tell you how much hair-pulling I did over tf.conds, tf.while_loops and the whole gather / scatter paradigm for simple indexing into arrays. I really think the people working on it wanted users to write TF code in a certain, particular way and made it really difficult to use it in other ways. Just thinking back on that time still raises my blood pressure! So far Jax is much better and I'm cautiously optimistic they have learned lessons from TF.
I had the opposite experience. The early TF versions were difficult to use in that they required a lot of boilerplate code to do simple things, but at least there was no hidden complexity. I knew exactly what my code did and what was going on under the hood. When I use today's high-level opaque TF libraries, I have no idea what's going on, and it's much harder to debug subtle problems. The workflow went from "Damn, I need to write 200 lines of code to do this simple thing" to "I need to spend an hour looking through library documentation, gotchas, deprecation issues, and TF-internal code to figure out which function to call with what parameters, and then check whether it actually does exactly what I need." I much prefer the former.
Having barriers to entry is not always a bad thing - it forces people to learn and understand concepts instead of blindly copying and pasting code from a Medium article and praying that it works.
But I agree with you that there are many different use cases. Those people who want to do high-level work (I have some images, just give me a classifier) shouldn't need to deal with that complexity. IMO the big mistake was trying to merge all these different use cases into one framework. Let's hope JAX doesn't go down the same route.
Not quite sure why you picked those particular examples... JAX also requires usage of lax.cond, lax.while_loop, and ops.segment_sum. Only gather has been improved with slice notation support. IMO, TF has landed on a pretty nice solution to cond/while_loop via AutoGraph.
While JAX has those operations, you don't always need them; it depends on which transformations you want (jit or grad), and they have been working on making normal control structures compatible with all transformations.
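As a rough sketch of that distinction (assuming a recent JAX install; the names `f` and `g` are just illustrative): under `jax.jit` a plain Python `if` on a traced value won't work, so you either select with `jnp.where` or stage the branch with `lax.cond`:

```python
# Hedged sketch: two ways to express a conditional that survive jax.jit.
import jax
import jax.numpy as jnp
from jax import lax

def f(x):
    # jnp.where evaluates both branches but works fine under jit
    return jnp.where(x > 0, 3.0 * x ** 2, 5.0 * x ** 3)

@jax.jit
def g(x):
    # lax.cond stages the conditional into the compiled XLA computation,
    # evaluating only the selected branch
    return lax.cond(x > 0,
                    lambda v: 3.0 * v ** 2,
                    lambda v: 5.0 * v ** 3,
                    x)

print(jax.jit(f)(2.0), g(2.0))  # both give 12.0
```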
You can't blame the TF people for things like while_loop. Those are inherited from Theano, and back then the dynamic graph idea wasn't obvious.
JAX is indeed a different situation as it has a more original design (although TF1 came with a huge improvement in compilation speed, so maybe there were innovations under the hood). But I don't know if I like it. The framework itself is quite neat, but last time I checked, the accompanying NN libraries had horrifying designs.
The difference is that in TF1 you had to use tf.cond, tf.while_loop etc for differentiable control flow. In JAX you can differentiate Python control flow directly, e.g.:
In [1]: from jax import grad

In [2]: def f(x):
   ...:     if x > 0:
   ...:         return 3. * x ** 2
   ...:     else:
   ...:         return 5. * x ** 3
   ...:

In [3]: grad(f)(1.)
Out[3]: DeviceArray(6., dtype=float32)

In [4]: grad(f)(-1.)
Out[4]: DeviceArray(15., dtype=float32)
In the above example, the control flow happens in Python, just as it would in PyTorch. (That's not surprising, since JAX grew out of the original Autograd [1]!)
Structured control flow functions like lax.cond, lax.scan, etc exist so that you can, for example, stage control flow out of Python and into an end-to-end compiled XLA computation with jax.jit. In other words, some JAX transformations place more constraints on your Python code than others, but you can just opt into the ones you want. (More generally, the lax module lets you program XLA HLO pretty directly [2].)
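For instance (a minimal sketch, assuming JAX is installed; `cumsum_scan` is a made-up name), `lax.scan` lets a whole loop be staged out of Python so the entire computation compiles end-to-end under `jax.jit`:

```python
# Hedged sketch: staging a loop with lax.scan instead of a Python for-loop.
import jax
import jax.numpy as jnp
from jax import lax

@jax.jit
def cumsum_scan(xs):
    # step returns (new carry, per-step output); scan stacks the outputs
    def step(carry, x):
        carry = carry + x
        return carry, carry
    _, ys = lax.scan(step, jnp.zeros((), dtype=xs.dtype), xs)
    return ys

print(cumsum_scan(jnp.arange(4.0)))  # [0. 1. 3. 6.]
```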
There are a bunch of frameworks built on top of PyTorch too (fastai, Lightning, torchbearer, Ignite...). I don't see why this should be a problem (or at least a problem for JAX but not for PyTorch).
IMO, this is not a fair comparison because Pytorch spans a larger amount of abstraction than jax (I don't quite know how to explain it other than "spans a larger amount of abstraction").
You can do much of the JAX stuff in PyTorch, but you can't do the high-level nn.LSTM stuff in JAX; you have to use something like Flax or Objax.
Oh I just noticed that you're one of the people behind that recent GAN compression work! Really cool stuff and a big step up this year, I've been following the field for a lil bit.
In your first sentence you're mistaking JAX for XLA.
XLA (Accelerated Linear Algebra): I guess it's kind of a backend/compiler that optimizes linear algebra/deep learning calculations with some very interesting techniques, among them kernel fusion.
JAX: in some sense syntactic sugar over XLA, but a better way of describing it is composable transformations + NumPy + some SciPy. The composable transformations let you take derivatives (of single-, multi-, or vector-valued functions, including higher-order derivatives), JIT a function (which is then compiled to XLA), and use two forms of parallelism (vmap and pmap), among others, all compatible with one another and with TPUs, GPUs, and CPUs.
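A small sketch of that composability (assuming JAX is installed; `f` is an arbitrary toy function): each transformation takes an ordinary function and returns one, so they stack freely:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x ** 2

# grad differentiates, vmap vectorizes over a batch, jit compiles via XLA;
# they compose because each returns an ordinary Python function.
df = jax.jit(jax.vmap(jax.grad(f)))

xs = jnp.array([0.0, 1.0, 2.0])
print(df(xs))  # derivative of f at each point in the batch
```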
"TensorFlow Probability (TFP) is a library for probabilistic reasoning and statistical analysis that now works on JAX! For those not familiar, JAX is a library for accelerated numerical computing based on composable function transformations.
We have ported a lot of TFP's most useful functionality to JAX while preserving the abstractions and APIs that many TFP users are now comfortable with."
TensorFlow is migrating a bunch of stuff to JAX. Even they use the word "library" for their own porting. For a user like me, it looks like JAX is a library that TensorFlow uses... but the end-user-facing library is TensorFlow.
Hi, tech lead for TFP here. The wording here was unclear -- sorry! We're fixing it presently.
We are not migrating away from TF; far from it!
The change here was to interoperate with TF and JAX (and numpy!), by way of some rewrite trickery under the hood. Essentially, we wrote a translation layer that implements the TF API surface (or, the parts we actually use) in terms of numpy & JAX primitives [1]. This lets us leave most TFP code intact, written in terms of the TF API, but interoperate with JAX by way of the API translation layer. (Actually we implemented numpy support first, and mostly got JAX for "free" since JAX is largely API-compatible with numpy).
Sorry for any confusion!
We're pretty stoked about this work, so happy to answer any other questions you may have (also feel free to chime in on the github tracker or email tfprobability@tensorflow.org)
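To make the "translation layer" idea above concrete, here's a toy illustration (not TFP's actual code; `TFOnNumpy` and `log_normalizer` are invented names): implement a small slice of a TF-style API surface on a NumPy backend, so "library" code written against `tf.*` runs unchanged when the backend is swapped.

```python
import numpy as np

class TFOnNumpy:
    """Hypothetical shim exposing TF-style names backed by NumPy."""
    def constant(self, value, dtype=None):
        return np.asarray(value, dtype=dtype)
    def exp(self, x):
        return np.exp(x)
    def reduce_sum(self, x, axis=None):
        return np.sum(x, axis=axis)
    def log(self, x):
        return np.log(x)

tf = TFOnNumpy()

# "Library" code written against the TF-style API, untouched by the swap
def log_normalizer(logits):
    return tf.log(tf.reduce_sum(tf.exp(tf.constant(logits))))

print(log_normalizer([0.0, 0.0]))  # log(2)
```

The same trick, applied to the parts of the real TF API a library actually uses, is what lets most of the library's code stay intact while gaining NumPy (and, via API compatibility, JAX) backends.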
Last time I looked, automatic differentiation was in a compiler branch with no immediate plans to merge into master.
But overall it is promising. I even installed Swift on Linux to play with it, didn't get to ML as I have an AMD GPU and this is a can of worms. Hope it's finished one day.
I would prefer for Julia ML libraries to become mainstream. But it is what it is.
Also, the ideal for me would be a Rust for TensorFlow, but the slower compile times (I didn't compare with Swift) are an impediment to an iterative workflow such as tweaking models.
A Rust for TensorFlow (and/or a "RustTorch") would be awesome.
I hope all the work being done on improving incremental compilation[a] and developing interactive Rust REPLs like evcxr[b] makes using Rust for AI a practical reality.
tch-rs is really nice. I have built a Rust sequence labeler + dependency parser + lemmatizer on top of it. It supports multi-task learning, finetuning of BERT-like models, model distillation, etc.:
Unfortunately, when I started this guillaume-be's rust-bert crate wasn't around yet (I think), so I also ported some of the models from Huggingface transformers:
There are Torch bindings that some users rely on, but what I personally would like is a JAX clone built on top of Rust.
I see a way to do it [0] but... I already have a PhD to finish.
[0]: a macro compiling functions into an intermediate representation that is transformed with const functions (gradient computation) at compile time and JITted into XLA at runtime.
I recently spent some days playing with differentiable programming in Swift on Linux:
* as you said, autodiff is in a branch of the Google fork of the project
* the only pre-built images are for Ubuntu 18.04
* on Linux, the REPL seems to be somewhat broken
* many libraries assume macOS or iOS
I don't feel a lot of hope for adoption of Swift on Linux. Apple obviously is not working on that (fair, they have no reason to do so) and the Swift community also has no focus on Linux, since ... they are in the Apple ecosystem.
Meanwhile, the Open Source community is much more interested in Rust than Swift.
As for differentiable programming - this is what got me excited, but after trying it I was a bit underwhelmed. Not that it isn't great technology; it's just not figured out yet. I tried to come up with a use case outside of ML, and the one I tried wasn't really applicable.
I do feel however that someone will come up with something and that it will have quite some impact.
Apple I believe has a reason to work on Swift for Linux, and that reason is their considerable cloud infrastructure and various backend services.
I’m sure being able to share domain specific Swift code between client apps and backend would be pretty high on their list of wants.
Also, little clues like the way Xcode generates SwiftPM packages in a Linux-ready fashion out of the box shows that they care at least a bit.
Having a lot of interest in the programming languages, my opinion is that Swift is a damn good one. It’s very high level, supports FP deep enough, has a great type system (that is getting better with every release), great OOP support, native performance characteristics and it still lets you get to a really low level when you need it.
I also like how they took great ideas from Haskell, Scala, Smalltalk, C# and others. I code daily in Scala and Swift, previously had done Erlang, Clojure, Common LISP, TypeScript, Ruby, Python, Haskell, OCaml, Java, PHP, C, Smalltalk and some others. In this list, Swift is now almost at the top.
They need to get Higher-Kinded Types and then it’s going to win the world (just kidding, JavaScript gets to win the world, unfortunately) :)
I can't speak for Apple, of course, but some indications of their seriousness are there.
SwiftNIO is a cross-platform asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients. (https://github.com/apple/swift-nio)
Also, on official https://swift.org/download all releases and snapshots are automatically available for: Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, CentOS 7, CentOS 8, Amazon Linux 2
Official Swift Server work group is a steering team that promotes the use of Swift for developing and deploying server applications: https://swift.org/server/
Official Swift 5.3 release goals have stated a major goal of "expanding the number of platforms where Swift is available and supported, notably adding support for Windows and additional Linux distributions."
(https://swift.org/blog/5-3-release-process/)
Yep. It's difficult to find uses for autodiff, and it looks like a very niche thing to add to a language. But it's still cool. In a language with extensible syntax (e.g. proc macros), this would sit in a library.
And I think you're right, there is a low chance that this gets traction. Especially since Chris Lattner is at SiFive now; they will probably have some kind of AI accelerator, but TF is for training rather than execution, so I'm not sure they'll find a reason to push it.
Jeremy Howard from fast.ai might be able to convince people to give it a try. But without people working on it full-time, the chances are not great - especially with a compiler fork that requires constant merges/rebases. But who knows.
I guess there are lots of uses in optimisation problems, and in sampling algorithms for statistics. I don't know how easy it will be to sell Swift to people who now use Stan or Stata (or R) and don't think of themselves as programmers.
> In a language with extensible syntax(ex: proc macros), this would sit in a library.
And this would allow easier iteration of different designs.
Does autodiff exist in R? If so, why not use R, since it also has a lot of ML algorithm support due to its focus on statistics - it's used by many statisticians.
Is the idea here that Swift is a more approachable language and thus this is to lower the barrier of entry to TF?
I think the main selling points of Swift to tensorflow are:
* Speed. ML pipelines are often bottlenecked by data loading and transformation. TF has added new mechanisms for this (the last I've seen was tf.data), but a language compiled to native code is much more flexible in that regard.
* Type safety. Sometimes issues with models pop up long after they have been running. The hope is that typed APIs will surface simple errors at compile time.
* Auto-differentiation built into the language. If I'm not mistaken, this is more powerful than backpropagation in TF, which also has autodiff. The idea is that this would allow for more custom models without a performance penalty. My knowledge here is limited, since it's been over two years since I implemented backpropagation, and I've successfully forgotten most of what I knew about ML/DL.
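(As a side note on what "autodiff built in" means mechanically, here is a minimal forward-mode sketch with dual numbers, in plain Python with invented names - the core trick that a language-level implementation automates and optimizes.)

```python
# Hedged sketch: forward-mode autodiff via dual numbers, pure Python.
class Dual:
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps  # value and derivative part

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.eps * o.val + self.val * o.eps)
    __rmul__ = __mul__

def derivative(f, x):
    # seed the derivative part with 1.0 and read it back off the result
    return f(Dual(x, 1.0)).eps

print(derivative(lambda x: 3 * x * x + x, 2.0))  # 13.0
```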
I don't have any experience with R, but from what I've heard it was known to be slow. That might have changed, or I may have misunderstood the situation.
You cannot really do autodiff in a slow language. I mean you can but nobody wants to run large machine learning training algorithms on a slow language like Python or R.
You could write autodiff in say C++ but it is a user unfriendly language not well suited for machine learning and scientific computing.
Swift is a nicer high level language you can do autodiff in. But honestly I don’t see the point with Swift either.
Julia already does AutoDiff extremely well and outperforms Swift and pretty much everybody else.
> Apple obviously is not working on that (fair, they have no reason to do so) and the Swift community also has no focus on Linux, since ... they are in the Apple ecosystem.
Neither of these things are true, fwiw. Strong Linux support is an explicit current goal for Swift, though it is admittedly not all the way there yet. See this post regarding Swift 6 and its goals for wider support, for example: https://forums.swift.org/t/on-the-road-to-swift-6/32862
I question the long-term viability of Swift for TensorFlow. I hope I don't sound like I'm whining, but I invested a fair amount of time in Swift from last fall through this February because I wanted a better, faster language than Python to work in for DL; I was also interested in trying some iOS, iPadOS, and macOS development; and Swift looked interesting for Linux server-side work as well.
For Apple development I have found Flutter+Dart to be more pleasant. For DL I decided to stick with taking advantage of my 5+ years experience with TF in Python.
Off topic, but Julia and Flux are really worth checking out for DL.
Some advice: if you want to experiment with Swift for TensorFlow use Google’s colab and make your life easier. I wish I could get back the install times on my Linux GPU laptop and on macOS. If you pay for colab like I do, you get good GPU/TPU resources, and it is simply an easy and fun way to go.
Is this (the future of TF being in Swift) still the case? Because there has been speculation, especially since Lattner left Google.
Also, projects like JAX have gained good momentum.
I guess it's really hard to convince scientists to move away from their Python legacy. And the Python scientific stack is really incredible.
They presumably wanted a semi-popular statically typed language as the gains of Julia over Python aren't enough to be worth it (and Julia isn't popular enough).
Semi-popular where? The typical iOS programmer isn't going to touch ML. In the relevant demography, Julia had (still has) more users. Swift4TF was a poor choice, and now that Lattner is gone I doubt it has a future.
Julia is out because it's not statically typed and not enough of a difference from Python. If it was more popular then it may have had a chance on that alone but it's not. In terms of semi-popular statically typed modern languages that have good/performant C/C++ interop you have: Go, Swift and Rust. Can't think of any others to be honest. I'm guessing Go's lack of generics excluded it and Rust's complexity excluded it.
TF's own analysis[0] was hesitant on static vs dynamic, finding downsides for either choice and suggested choosing a "middle ground" (which IMHO was doable starting from Julia's optional types).
So static typing might be your requirement, but it wasn't TF's requirement. When examining Julia later on, they don't mention dynamic types as a downside at all - and every argument save for "we're familiar with Swift's internal working" was contradicted within the same document...
Now, being familiar with Swift is a very understandable reason for their choice, but it's IMHO not the best choice for others which are not as familiar with Swift or ML. Most ML users are very familiar with Python and I think they will find Julia to be a welcome improvement on Python's pain points without Swift's baggage.
There's a difference between popular and "not a toy language." I'm not arguing Julia isn't used, I'm arguing it's not used often enough to be a merit irrespective of other reasons.
Julia is used in serious scientific work; Swift isn't. Massive projects running on supercomputers, such as next-generation climate models, are written in Julia, and many best-of-breed scientific packages are written in Julia. Swift, while great for app development, has limited presence in the scientific field.
This website isn't counting "Julia-only" stacks; it's just companies that have used Julia for one project or another. If you really want to compare that to Rust, Julia is again going to fall short.
Those were not the companies I had in mind with my Rust remark, but if you so wish, Swift, Kotlin/Native, Go, C++, .NET Native, Verona, Checked C, Objective-C.
It remains to be seen how much Rust they will actually make into tier 1 OS SDKs for userspace applications.
In fact, currently it looks more they are bringing their experience with Rust into their platform languages than anything else.
Swift memory ownership, Verona, C++ Core Guidelines checker, Kotlin/Native ownership rules.
Beware wishing for Julia's downfall with glass ceilings.
I really don't understand why they didn't go with Julia, it fits the purpose much better than Swift, already having a ML ecosystem and having great interop with Python, C, C++, R and Matlab.
Heck, JAX, where a lot of TF refugees are going, is pretty similar to Zygote.
>Who would want to use an Apple-centric language for ML, seriously? Apple hardware is outright incompatible
Based on how you wrote your comment, I'm guessing you may not know that S4TF is a Google initiative.
Yes, Chris Lattner used to work for Apple, but he was at Google Brain at the start of this project. When his team wanted to create a language where automatic differentiation and gradient descent were 1st-class concepts in the core syntax of the language (i.e. without libraries), they looked at Rust, Julia, Swift, etc.[1] They ended up choosing Swift as the base language to extend with new ML syntax.
Also, Chris has said in previous interviews that he thought of Swift as a general-purpose language and hoped it would be used outside of Apple's ecosystem.
EDIT, to address the confusion where some assume Apple hardware dictates the direction of S4TF: for those not aware, Google's TPU (Tensor Processing Unit)[2] is built with custom ASIC chips, not ARM or Apple Silicon. Presumably, S4TF would run natively against Google's TPUs. In other words, the goals and execution targets of S4TF are not restricted by Apple's ecosystem of macOS/iOS/MacBooks/iMacs/iPads in any way. Yes, the SwiftUI framework is Apple-specific, but Swift-the-core-language[3] is not.
[to downvoters: if I wrote inaccuracies, please correct me.]
For answering a rhetorical question? Yes, of course the guy who came up with this also roots for Swift to gain more traction outside the Apple ecosystem, but reality hasn't gone in that direction so far. There are plenty of cool languages out there, like F#, Julia, Mathematica, etc., and the ML crowd settled on Python, which is an average language, so there is no reason to believe they are attracted to Swift.
There is no official support, only AMD's custom TF branch. Furthermore, ROCm itself works only on Linux. I believe most people writing Swift are using macOS.
It seems like they'll be making more consumer focused GPUs/ML chips to me. I think Apple Silicon is exciting, but I imagine for serious work you'll still need an external enclosure and a dedicated GPU from AMD.
Or Linux/Windows :(. Losing Nvidia/CUDA kind of killed ML on macOS overnight.
Swift is gaining momentum outside the Apple ecosystem. Web servers are being built with it, and the TensorFlow team explains at length its advantages for developing ML models.
I don't have any inside knowledge, but as a developer using Swift (that followed the Swift web frameworks closely):
I think Kitura only existed as a long shot for kicking off their cloud offering. They were hoping iOS developers would code their backends in Swift/Kitura and host on IBMs cloud.
But that never happened - so Kitura was killed. Swift on the server is still niche. And the companies that used a Swift backend overwhelmingly went with https://vapor.codes instead. It is faster, and has a nicer API.
They're working on expanding it outside of the Apple/iOS ecosystem, but many (probably most) developers use Macs. You can write Swift code on Linux right now.
I’d also say that Swift appears to be shaping up as a great language for machine learning. Things like calculating gradients are built into the language, and you can import Python to fill in gaps until Swift libraries are ready. And Swift is quite fast.
The name is a bit confusing. Swift for TensorFlow combines a bunch of things:
- adding autodiff to Swift language & compiler
- neural net construction & training API in Swift
- low-friction Python bindings
- low-friction C++ interop
- ability to run neural nets on TensorFlow using the C++ interop
- ability to alternately run neural nets directly on XLA ("X10")
The Swift changes are supposed to get mainlined and I think at least some of the stuff related to Python and C++ already are. The idea is that Swift is nice enough to cover experiment + train + embed as library even on mobile platforms. It's a big engineering project and I hope Google keeps funding the work.
I am not getting the exact use case. If speed is the issue, then I think tf.function solves that. For most operations we need to use Python via PythonKit, so the things that can't be sped up using tf.function can't be sped up using Swift either. Also, if we need to use Python everywhere, type safety is a very minor benefit.
The unique property is the ability to pick any code or library that is unaware of the differentiation library (unlike tf.function, which requires you to specifically use tf methods) and get the gradient. In a language like Julia this is immediately useful, as it has a massive ecosystem of numerical code where gradients make sense (like differential equations and the SciML project [1], or less conventional stuff like raytracers), but in a language like Swift (where there is no meaning to the gradient of GUI libraries or frontend stuff) it is more of an "if you build it, they will come" bet by Google.
But unique features aside, it's a have-your-cake-and-eat-it-too kind of interface. You don't need to learn a second language within the language, like TensorFlow's tf.*, making it even more natural and flexible than PyTorch, including all the debugging mechanisms of the host language itself, yet you still get compile-time graph creation like TensorFlow, including all kinds of optimizations. It makes other approaches seem primitive by comparison, but creating it is much more complex, and the main audience is already more than used to language-within-language solutions (like NumPy) that can provide something almost as good, even if less elegantly, so it's not easy to convince people either (when it involves changing programming languages).
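To illustrate the "differentiate code that never heard of the framework" idea in JAX terms (a sketch, assuming JAX is installed; `integrate` is a made-up toy): the gradient goes straight through an ordinary hand-written Euler loop, with no framework-specific ops in the model code:

```python
import jax

def integrate(k, steps=100, dt=0.01):
    # Euler steps for dy/dt = -k * y with y(0) = 1, written as plain Python
    y = 1.0
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y

# Sensitivity of the final state to the decay rate k; grad traces straight
# through the loop without any special control-flow primitives.
dy_dk = jax.grad(integrate)(2.0)
print(dy_dk)  # roughly -0.135 for this step count
```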
The type system?
I like Swift, but looking at the code examples I have to say preparing data and setting up a model is 10x easier in Julia.
S4TF still requires either C CUDA kernels or XLA. Julia, on the other hand, has JIT GPU codegen, and its CPU codegen has been benchmarked to beat OpenBLAS.
Source on this claim?
https://discourse.julialang.org/t/realistically-how-close-is...
https://github.com/mcabbott/Tullio.jl
Regardless of whether it can consistently beat Fortran/BLAS in every area, in general JIT languages have more opportunities for optimizations than AoT languages, so it's interesting to see what comes out of a language that focuses on leveraging this to get the most performance.
> in general JIT languages have more opportunities for optimizations than AoT languages
I'm not sure I agree with this. I would say the opportunities for any non-GC mature AoT language (C++, Rust, etc.) is going to be pretty much the same, since you can just attach a JIT to most AoT langs.
The Julialab at MIT is working on making the higher level codegen faster
I am not too experienced with Julia, but my understanding is that it uses LLVM to JIT itself. Since the LLVM JIT compiler is also an API available from C++, anything that can be done in Julia can be done by JITting to LLVM via the API in C++.
Then you just compile the methods that you'll actually use with LLVM right before using them.
Not to mention, if you want to reimplement Julia's logic in C++ you'll have to develop its sophisticated type inference: the Julia compiler is so aggressive that it will compile entire blocks of the program at once (the entire program, if it can) as long as it can infer which types are used downstream, which is why it can compete with AoT-compiled languages (it's basically a "just-ahead-of-time" compiler).
It might be a nice language (never used it), but there are other contenders with more engineers/scientists support like Rust and Julia, for which the advantages were clearer.
Finally, the whole ordeal got a very bad look from having its main proponent be the creator of Swift, rather than the actual community pushing for it.
I didn't like that idea then, and I still don't like it now. But the idea in itself is very Apple(-ish).
Why do you say so?
It's up to you to choose what kind of person you want to be, I don't have control over what you say.
https://en.m.wikipedia.org/wiki/Miscarriage_of_justice
Best not to worry about such silly things and keep writing code.
In any case, the primary meaning of 'abort' is more general, so it shouldn't be compared to the metaphorical use of 'stillborn'.
PyTorch has already stood the test of time and proven that its development is led by a competent team.
Having barriers to entry is not always a bad thing: it forces people to learn and understand concepts instead of blindly copying and pasting code from a Medium article and praying that it works.
But I agree with you that there are many different use cases. Those people who want to do high-level work (I have some images, just give me a classifier) shouldn't need to deal with that complexity. IMO the big mistake was trying to merge all these different use cases into one framework. Let's hope JAX doesn't go down the same route.
Not quite sure why you picked those particular examples... JAX also requires usage of lax.cond, lax.while_loop, and ops.segment_sum. Only gather has been improved with slice notation support. IMO, TF has landed on a pretty nice solution to cond/while_loop via AutoGraph.
JAX is indeed a different situation as it has a more original design (although TF1 came with a huge improvement in compilation speed, so maybe there were innovations under the hood). But I don't know if I like it. The framework itself is quite neat, but last time I checked, the accompanying NN libraries had horrifying designs.
I'm ill-informed - but isn't that exactly what lax is?
Structured control flow functions like lax.cond, lax.scan, etc exist so that you can, for example, stage control flow out of Python and into an end-to-end compiled XLA computation with jax.jit. In other words, some JAX transformations place more constraints on your Python code than others, but you can just opt into the ones you want. (More generally, the lax module lets you program XLA HLO pretty directly [2].)
Disclaimer: I work on JAX!
[1] https://github.com/hips/autograd [2] https://www.tensorflow.org/xla/operation_semantics
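For intuition, here is a toy pure-Python sketch of the calling convention of lax.cond and lax.scan (not JAX itself; the real versions stage these into an XLA computation rather than running Python). The point is that branches and loop bodies are passed as functions, which is what lets a tracer capture them without knowing runtime values:

```python
# Toy stand-in for the lax.cond calling convention: both branches are
# first-class functions, so a tracer/compiler could stage them without
# needing the predicate's value at trace time.
def cond(pred, true_fn, false_fn, operand):
    return true_fn(operand) if pred else false_fn(operand)

def scan(f, init, xs):
    """Toy lax.scan: threads a carry through xs, collecting outputs."""
    carry, ys = init, []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    return carry, ys

# Cumulative sum expressed as a scan:
total, partials = scan(lambda c, x: (c + x, c + x), 0, [1, 2, 3, 4])
print(total, partials)  # 10 [1, 3, 6, 10]
```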
And now there are already multiple NN libraries for JAX from Google...
You can do much of the jax stuff in pytorch, you can't do the high level nn.LSTM stuff in jax, you have to use like flax or objax or something.
- Dex: https://github.com/google-research/dex-lang/
- Hasktorch: https://github.com/hasktorch/hasktorch
- This initiative from the Python Typing-sig: https://docs.google.com/document/d/1oaG0V2ZE5BRDjd9N-Tr1N0IK...
https://futhark-lang.org/blog/2020-03-15-futhark-0.15.1-rele...
and it seems to be ok for DL
https://elsman.com/pdf/fhpnc19.pdf
From what I've been able to tell (no shade to the PyTorch team, which has many different priorities), work on the port has been somewhat slow going.
Further, this is dynamic type checking as you mentioned.
Congrats!
The latest release of TensorFlow Probability uses JAX under the hood. So what do you mean when you say you're moving to JAX versus TensorFlow?
XLA: Accelerated Linear Algebra. I guess it's kind of a backend/compiler that optimizes linear algebra/deep learning computations with some very interesting techniques, among them kernel fusion.
JAX: In some sense syntactic sugar over XLA, but a better way of describing it is composable transformations + NumPy + some SciPy. The composable transformations let you take derivatives (of single-, multi-, or vector-valued functions, including higher-order derivatives), JIT a function (which is then compiled to XLA), and use two forms of parallelism (vmap and pmap), among others, all while being compatible with one another and with TPUs, GPUs, and CPUs.
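To make "composable transformations" concrete, here's a toy pure-Python sketch (not JAX: the real grad is exact autodiff and the real vmap vectorizes without a Python loop). The point is just that each transformation maps a function to a function, so they nest:

```python
def grad(f, h=1e-6):
    """Toy derivative via central differences (JAX's grad is exact autodiff)."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def vmap(f):
    """Toy vmap: maps f over a batch (JAX vectorizes without a Python loop)."""
    return lambda xs: [f(x) for x in xs]

square = lambda x: x * x

# Transformations compose: the derivative of x^2 is 2x, mapped over a batch.
dsquare_batch = vmap(grad(square))
print(dsquare_batch([1.0, 2.0, 3.0]))  # ~[2.0, 4.0, 6.0]
```

In real JAX you would also wrap the result in `jax.jit` to compile the whole composed function to XLA; jit is just another function-to-function transformation in the same spirit.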
"TensorFlow Probability (TFP) is a library for probabilistic reasoning and statistical analysis that now works on JAX! For those not familiar, JAX is a library for accelerated numerical computing based on composable function transformations.
We have ported a lot of TFP's most useful functionality to JAX while preserving the abstractions and APIs that many TFP users are now comfortable with."
TensorFlow is migrating a bunch of stuff to JAX. Even they use the word "library" for their own porting. For a user like me, it looks like JAX is a library that TensorFlow uses... but the end-user-facing library is TensorFlow.
We are not migrating away from TF; far from it!
The change here was to interoperate with TF and JAX (and numpy!), by way of some rewrite trickery under the hood. Essentially, we wrote a translation layer that implements the TF API surface (or, the parts we actually use) in terms of numpy & JAX primitives [1]. This lets us leave most TFP code intact, written in terms of the TF API, but interoperate with JAX by way of the API translation layer. (Actually we implemented numpy support first, and mostly got JAX for "free" since JAX is largely API-compatible with numpy).
Sorry for any confusion!
We're pretty stoked about this work, so happy to answer any other questions you may have (also feel free to chime in on the github tracker or email tfprobability@tensorflow.org)
[1] - https://github.com/tensorflow/probability/tree/master/tensor...
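As a rough illustration of that translation-layer idea (hypothetical names; TFP's actual shim covers a large part of the TF API surface), library code written once against a small "TF-like" surface can be pointed at different backends:

```python
import math

# A hypothetical minimal "TF-like" API surface, implemented on a plain-Python
# backend. A second backend (numpy, JAX, ...) would implement the same methods.
class PlainBackend:
    def reduce_sum(self, xs):
        return sum(xs)

    def exp(self, x):
        return math.exp(x)

# "Library" code is written once, against the API surface only, so swapping
# the backend object swaps the execution engine underneath it.
def softmax_denominator(tf, logits):
    return tf.reduce_sum([tf.exp(x) for x in logits])

print(softmax_denominator(PlainBackend(), [0.0, 0.0]))  # 2.0
```

The TFP approach described above is the module-level version of this: TFP code keeps importing what looks like the TF API, and the shim routes those calls to numpy or JAX primitives.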
Here's what everybody is puzzled about: it looks like the layers going forward are JAX -> TensorFlow -> Keras.
And we are seeing people move to JAX directly. So this is ending up like a Flutter vs. Kotlin situation (also within Google).
Do you envision JAX being low-level, with the high-level TensorFlow/Keras interface being the most usable API?
But overall it is promising. I even installed Swift on Linux to play with it, didn't get to ML as I have an AMD GPU and this is a can of worms. Hope it's finished one day.
I would prefer for Julia ml libraries to become mainstream. But, it is what it is.
Also, the ideal for me would be Rust for TensorFlow, but the slower compile times (I didn't compare with Swift) are an impediment to an iterative workflow such as tweaking models.
I hope all the work being done on improving incremental compilation[a] and developing interactive Rust REPLs like evcxr[b] makes using Rust for AI a practical reality.
[a] https://doc.rust-lang.org/edition-guide/rust-2018/the-compil...
[b] https://github.com/google/evcxr
[1]: https://github.com/LaurentMazare/tch-rs -> https://crates.io/crates/tch
[2]: https://github.com/guillaume-be/rust-bert -> https://crates.io/crates/rust-bert
https://github.com/stickeritis/sticker2/
Unfortunately, when I started this guillaume-be's rust-bert crate wasn't around yet (I think), so I also ported some of the models from Huggingface transformers:
https://github.com/stickeritis/sticker-transformers/
At any rate, I can highly recommend the tch crate if you are looking to build neural networks in Rust.
I see a way to do it [0] but... I already have a PhD to finish.
[0]: a macro compiling functions into an intermediate representations that are transformed with const functions (gradient computation) at compile time and jitted into XLA at runtime.
But adding auto-differentiation to match the Swift for TensorFlow behaviour sounds like a serious undertaking, and I doubt it is on anyone's radar.
But yeah, I've been wanting this for a while. Shoehorning Rust everywhere is the endgame.
* as you said, auto-diff is in a branch of the Google fork of the project
* the only pre-built images are for Ubuntu 18.04
* on Linux, the REPL seems to be somewhat broken
* many libraries assume macOS or iOS
I don't feel a lot of hope for adoption of Swift on Linux. Apple obviously is not working on that (fair, they have no reason to do so) and the Swift community also has no focus on Linux, since ... they are in the Apple ecosystem. Meanwhile, the Open Source community is much more interested in Rust than Swift.
For the differentiable programming part: this is what got me excited, but after trying it, I was a bit underwhelmed. Not that it isn't great technology; it's just not figured out yet. I tried to come up with a use case outside of ML, and the one I tried wasn't really applicable.
I do feel however that someone will come up with something and that it will have quite some impact.
I’m sure being able to share domain specific Swift code between client apps and backend would be pretty high on their list of wants.
Also, little clues like the way Xcode generates SwiftPM packages in a Linux-ready fashion out of the box shows that they care at least a bit.
Having a lot of interest in the programming languages, my opinion is that Swift is a damn good one. It’s very high level, supports FP deep enough, has a great type system (that is getting better with every release), great OOP support, native performance characteristics and it still lets you get to a really low level when you need it.
I also like how they took great ideas from Haskell, Scala, Smalltalk, C# and others. I code daily in Scala and Swift, previously had done Erlang, Clojure, Common LISP, TypeScript, Ruby, Python, Haskell, OCaml, Java, PHP, C, Smalltalk and some others. In this list, Swift is now almost at the top.
They need to get Higher-Kinded Types and then it’s going to win the world (just kidding, JavaScript gets to win the world, unfortunately) :)
SwiftNIO is a cross-platform asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients. (https://github.com/apple/swift-nio)
Distributed Membership Protocol implementations in Swift: https://github.com/apple/swift-cluster-membership
Docker Official Image packaging for Swift: https://github.com/apple/swift-docker
Also, on official https://swift.org/download all releases and snapshots are automatically available for: Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, CentOS 7, CentOS 8, Amazon Linux 2
Official Swift Server work group is a steering team that promotes the use of Swift for developing and deploying server applications: https://swift.org/server/
Swift AWS Lambda Runtime: https://github.com/swift-server/swift-aws-lambda-runtime
Official Swift 5.3 release goals have stated a major goal of "expanding the number of platforms where Swift is available and supported, notably adding support for Windows and additional Linux distributions." (https://swift.org/blog/5-3-release-process/)
I already have my share of platforms I care about.
And I think you're right, there is a low chance that this gets traction. Especially since Chris Lattner is at SiFive now; they will probably have some kind of AI accelerator, but TF is for training rather than execution. So I'm not sure they'll find a reason to push it.
Jeremy Howard from fast.ai might be able to convince people to give it a try. But without people working on it full time, the chances are not great. Especially with a compiler fork that requires constant merges/rebases. But who knows.
I guess there are lots of uses in optimisation problems, and in sampling algorithms for statistics. I don't know how easy it will be to sell Swift to people who now use Stan or Stata (or R) and don't think of themselves as programmers.
> In a language with extensible syntax(ex: proc macros), this would sit in a library.
And this would allow easier iteration of different designs.
Is the idea here that Swift is a more approachable language and thus this is to lower the barrier of entry to TF?
* Speed. ML training pipelines are often bottlenecked by data loading and transformation. TF has added new mechanisms (the last I've seen was tf.data), but a language compiled to native code is much more flexible in that regard.
* Type safety. Sometimes issues with models pop up long after they have started running. The hope is that typed APIs will surface simple errors at compile time.
* Auto differentiation built into the language. If I'm not mistaken, this is more powerful than backpropagation in TF, which also has autodiff. The idea is that this would allow more custom models without a performance penalty. My knowledge here is limited, since it's been over 2 years since I implemented backpropagation, and I've successfully forgotten most of what I knew about ML/DL.
I don't have any experience with R, but from what I've heard, it was known to be slow. But that might have changed or I may have misunderstood the situation.
Number types are first class which is a big part of it and you can add extra custom passes to the JIT compiler in regular library code.
You could write autodiff in say C++ but it is a user unfriendly language not well suited for machine learning and scientific computing.
Swift is a nicer high level language you can do autodiff in. But honestly I don’t see the point with Swift either.
Julia already does AutoDiff extremely well and outperforms Swift and pretty much everybody else.
Neither of these things are true, fwiw. Strong Linux support is an explicit current goal for Swift, though it is admittedly not all the way there yet. See this post regarding Swift 6 and its goals for wider support, for example: https://forums.swift.org/t/on-the-road-to-swift-6/32862
For Apple development I have found Flutter+Dart to be more pleasant. For DL I decided to stick with taking advantage of my 5+ years experience with TF in Python.
Off topic, but Julia and Flux are really worth checking out for DL.
Some advice: if you want to experiment with Swift for TensorFlow use Google’s colab and make your life easier. I wish I could get back the install times on my Linux GPU laptop and on macOS. If you pay for colab like I do, you get good GPU/TPU resources, and it is simply an easy and fun way to go.
I guess it's really hard to convince scientists to move away from their Python legacy. And the Python scientific stack is really incredible.
So static typing might be your requirement, but it wasn't TF's requirement. When examining Julia later on, they don't mention dynamic types as a downside at all - and every argument save for "we're familiar with Swift's internal working" was contradicted within the same document...
Now, being familiar with Swift is a very understandable reason for their choice, but it's IMHO not the best choice for others which are not as familiar with Swift or ML. Most ML users are very familiar with Python and I think they will find Julia to be a welcome improvement on Python's pain points without Swift's baggage.
[0] https://github.com/tensorflow/swift/blob/master/docs/WhySwif...
https://juliacomputing.com/case-studies/
This website isn't counting "Julia only" stacks, it's just companies that have used Julia for one project or another. If you really want to compare that to rust, julia is again going to fall short.
It remains to be seen how much Rust they will actually make into tier 1 OS SDKs for userspace applications.
In fact, currently it looks more they are bringing their experience with Rust into their platform languages than anything else.
Swift memory ownership, Verona, C++ Core Guidelines checker, Kotlin/Native ownership rules.
Beware wishing for Julia's downfall with glass ceilings.
[0] https://diffsharp.github.io/
Apple hardware is outright incompatible with the kind of hardware we use daily in machine-learning workstations.
Based on how you wrote your comment, I'm guessing you may not know that S4TF is a Google initiative.
Yes, Chris Lattner used to work for Apple, but he was at Google Brain at the start of this project. When his team wanted to create a language where automatic differentiation and gradient descent were first-class concepts in the core syntax (i.e. without libraries), they looked at Rust, Julia, Swift, etc.[1] They ended up choosing Swift as the base language to extend with new ML syntax.
Also, Chris has said in previous interviews that he thought of Swift as general purpose language and that hoped it would be used outside of Apple's ecosystem.
EDIT to address the confusion where some assume Apple hardware dictates the direction of S4TF: For those not aware, Google's hardware TPU (Tensor Processing Unit)[2] is built with custom ASIC chips and not ARM nor Apple Silicon chips. Presumably, S4TF would run natively against Google's TPU. In other words, the goals and execution targets of S4TF are not restricted by Apple's ecosystem of macOS/iOS/Macbooks/iMacs/iPads in any way. Yes, the SwiftUI framework is Apple-specific but Swift-the-core-language-syntax[3] is not.
[to downvoters: if I wrote inaccuracies, please correct me.]
[1] https://github.com/tensorflow/swift/blob/master/docs/WhySwif...
[2] https://en.wikipedia.org/wiki/Tensor_Processing_Unit
[3] https://docs.swift.org/swift-book/ReferenceManual/zzSummaryO...
And you just ignored the hardware comment
Seems kind of obvious.
https://juliagpu.gitlab.io/KernelAbstractions.jl/
https://github.com/JuliaGPU/GPUArrays.jl/pulse
Or Linux/Windows :(. Losing Nvidia/CUDA kind of killed ML on macOS overnight.
I think Kitura only existed as a long shot for kicking off their cloud offering. They were hoping iOS developers would code their backends in Swift/Kitura and host on IBMs cloud.
But that never happened - so Kitura was killed. Swift on the server is still niche. And the companies that used a Swift backend overwhelmingly went with https://vapor.codes instead. It is faster, and has a nicer API.
[0] https://swift.org/blog/aws-lambda-runtime/
[1] https://aws.amazon.com/blogs/opensource/continuous-delivery-...
[2] https://github.com/amzn/smoke-aws
I’d also say that it appears that Swift is going to be a great language for machine learning. Things like calculating gradients are built in to the language and you can import Python to fill in gaps until Swift libraries are ready. And Swift is quite fast.
[1] https://sciml.ai/
Static type checking?
However for App development Swift of course has a far more impressive stack.