Careful, folks. S4TF is pretty much dead on arrival. It was pushed aggressively by Chris Lattner (for obvious reasons) but he left Google a while ago and since then most internal users lost interest. There's nothing in Swift that's inherently suitable for ML and building the ecosystem is a ton of work; without all the political pushing, it went nowhere and is close to a "semi-abandoned research project" phase.
Speed, too. For PyTorch to train models and run inference quickly, your Python code gets translated to C++/CUDA. Part of the idea with S4TF is to be able to write ML code in a single, fast language.
S4TF still requires either C CUDA kernels or XLA. Julia, on the other hand, has JIT GPU codegen, and its CPU codegen has been benchmarked to beat OpenBLAS.
He probably means Tullio.jl (which also seems to integrate with Julia's source-to-source differentiation library Zygote.jl, the main competitor to Swift for TensorFlow):
Regardless of whether it can consistently beat Fortran/BLAS in every area, JIT languages in general have more opportunities for optimization than AoT languages, so it's interesting to see what comes out of a language that focuses on leveraging this to get the most performance.
I'm surprised re: BLAS - that is closer than I thought! There's still a huge gap on the GPU, but it's impressive nonetheless.
> in general JIT languages have more opportunities for optimizations than AoT languages
I'm not sure I agree with this. I would say the opportunities for any non-GC mature AoT language (C++, Rust, etc.) are going to be pretty much the same, since you can just attach a JIT to most AoT languages.
The GPU gap appears only if the code is written in the high-level index or loop style. There is little to no gap if it is written either using array abstractions (broadcast, map, etc.) or at a level similar to CUDA C (though with nicer Julia abstractions and syntax): https://juliagpu.org/cuda/
The Julia Lab at MIT is working on making the higher-level codegen faster.
I guess that makes sense to me... you could just automatically convert the C in BLAS to Julia, and if they're both being converted to LLVM IR by Clang anyway, then I guess it'll be about as fast!
That's not at all what Julia is doing. It's much more sophisticated: it has very low-level intrinsic primitives that compose, and it optimizes the IR to make it fast before compiling it to CUDA. These all map to Julia constructs.
Sure, if you give a static language a JIT it'll get the advantages of having a JIT, though language semantics still matter. Languages built for JITs, like Julia or Common Lisp, have native ways of interfacing with the compiler, and programs are built without worrying about an exponential explosion of implementations during method monomorphization (you only compile the specialized versions you'll actually use, based on runtime information, without having to be pessimistic, since any over-specialization can be fixed on demand). AoT languages would probably need a compiler pragma, or a type similar to dynamic boxing but for delayed monomorphization/compilation, for methods where you want to avoid compiling all paths ahead of time (which might be a way to allow, for example, tensor specialization on sizes, similar to StaticArrays in Julia).
I am not too experienced with Julia, but my understanding was that it uses LLVM to JIT itself. Since the LLVM JIT compiler is also an API available to C++, anything that can be done in Julia could be done by JITting to LLVM via that API in C++.
Then you just compile the methods that you'll actually use with LLVM right before using them.
Sorry, you're right in that Julia is written in C/C++, so everything Julia does could be done in those languages by writing a language (like Julia itself, and not unlike TensorFlow's original interface), compiling it on demand, and finding a way to eval the new code and recover the results. I was talking about how to make it somewhat convenient (at least viable to implement, unlike the former), as an extension to the C++ compiler itself where you can just tell the compiler what stays AoT and what is JIT'd, but otherwise keep the same C++ syntax.
Not to mention that if you want to reimplement Julia's logic in C++, you'll have to develop its sophisticated type inference, since Julia's compiler is so aggressive that it will compile entire blocks of the program at once (the entire program if it can) as long as it can infer what types are used downstream, which is why it can compete with AoT-compiled languages (it's basically a "just-ahead-of-time" compiler).
The crux is that determining which are "the methods that you'll actually use" is very difficult. A lot of effort is put into this in the Julia "PackageCompiler" and "Compiler" projects.
If true, it does not surprise me. While there is a lot of language warring going on in the ML ecosystem, I never really heard of anyone using, planning to use, or waiting for Swift for TensorFlow.
It might be a nice language (never used it), but there are other contenders with more engineers/scientists support like Rust and Julia, for which the advantages were clearer.
Finally, the whole ordeal got a very bad look from having its main proponent be the creator of Swift rather than an actual community pushing for it.
Generally, your metaphors should not rely on comparison to a pretty traumatic event that has happened to quite a few people, many of whom might be around you without you knowing.
Just letting you know the reality that a lot of people view "stillborn" events as particularly traumatic. I'm not going to spend a lot of time arguing about this because it's a pretty simple point.
It's up to you to choose what kind of person you want to be, I don't have control over what you say.
I think the difference is that the term "dead" is used in a lot of different contexts, e.g. "the battery is dead." "Stillborn" is not; it is mainly used in one very traumatic context.
I know anorexics for whom any mention of food can be a trigger. Should metaphors involving consumption be verboten in public discourse because they might be read by someone with an eating disorder?
Anyone who is avoiding "master, black list or sanity check" probably thinks abortion is super awesome and that the term "abort" should never be stigmatized.
Best not to worry about such silly things and keep writing code.
I'm not sure this is really going to take off; it seems that most people who are abandoning TF are moving to JAX or PyTorch. My own experience with JAX is that it is much easier to use than TF, just an all-round more pleasant experience. It would be interesting to try this, but at this point I'm not really willing to learn "yet another deep learning framework", and the extreme anti-user problems that TF had make me loath to give it another shot, even with a presumably better frontend. Moreover, I think that Python is just a better all-round ML/data-science language at this point. Has anyone tried both JAX and this and would be willing to give us their thoughts on the strengths and weaknesses of each?
I'm skeptical of JAX. It feels good right now, but when the first TF beta version came out it was very much like that too - clean, simple, minimal, and just a better version of Theano. Then the "crossing the chasm" effort started and everyone at Google wanted to be part of it, making TF the big complex mess it is today. It's a great example of Conway's Law. I'm not convinced the same won't happen to JAX as it catches on.
PyTorch has already stood the test of time and proven that its development is led by a competent team.
I know where you're coming from, but TF in my opinion was very user-hostile even on arrival. I can't tell you how much hair-pulling I did over tf.conds, tf.while_loops and the whole gather / scatter paradigm for simple indexing into arrays. I really think the people working on it wanted users to write TF code in a certain, particular way and made it really difficult to use it in other ways. Just thinking back on that time still raises my blood pressure! So far Jax is much better and I'm cautiously optimistic they have learned lessons from TF.
I had the opposite experience. The early TF versions were difficult to use in that they required a lot of boilerplate code to do simple things, but at least there was no hidden complexity. I knew exactly what my code did and what was going on under the hood. When I use today's high-level opaque TF libraries, I have no idea what's going on, and it's much harder to debug subtle problems. The workflow went from "Damn, I need to write 200 lines of code to do this simple thing" to "I need to spend an hour looking through library documentation, gotchas, deprecation issues, and TF-internal code to figure out which function to call with what parameters, and then check whether it actually does exactly what I need." I much prefer the former.
Having barriers to entry is not always a bad thing - it forces people to learn and understand concepts instead of blindly copying and pasting code from a Medium article and praying that it works.
But I agree with you that there are many different use cases. Those people who want to do high-level work (I have some images, just give me a classifier) shouldn't need to deal with that complexity. IMO the big mistake was trying to merge all these different use cases into one framework. Let's hope JAX doesn't go down the same route.
Not quite sure why you picked those particular examples... JAX also requires usage of lax.cond, lax.while_loop, and ops.segment_sum. Only gather has been improved with slice notation support. IMO, TF has landed on a pretty nice solution to cond/while_loop via AutoGraph.
While JAX has those operations, you don't always need them; it depends on which transformations you want (jit or grad), and they have been working on making normal control structures compatible with all transformations.
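As a rough sketch of that distinction (assuming a recent JAX install; the names `f` and `g` are just illustrative): under `jax.jit` a plain Python `if` on a traced value won't work, so you either select with `jnp.where` or stage the branch with `lax.cond`:

```python
# Hedged sketch: two ways to express a conditional that survive jax.jit.
import jax
import jax.numpy as jnp
from jax import lax

def f(x):
    # jnp.where evaluates both branches but works fine under jit
    return jnp.where(x > 0, 3.0 * x ** 2, 5.0 * x ** 3)

@jax.jit
def g(x):
    # lax.cond stages the conditional into the compiled XLA computation,
    # evaluating only the selected branch
    return lax.cond(x > 0,
                    lambda v: 3.0 * v ** 2,
                    lambda v: 5.0 * v ** 3,
                    x)

print(jax.jit(f)(2.0), g(2.0))  # both give 12.0
```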
You can't blame the TF people for things like while_loop. Those are inherited from Theano, and back then the dynamic graph idea wasn't obvious.
JAX is indeed a different situation as it has a more original design (although TF1 came with a huge improvement in compilation speed, so maybe there were innovations under the hood). But I don't know if I like it. The framework itself is quite neat, but last time I checked, the accompanying NN libraries had horrifying designs.
The difference is that in TF1 you had to use tf.cond, tf.while_loop etc for differentiable control flow. In JAX you can differentiate Python control flow directly, e.g.:
In [1]: from jax import grad

In [2]: def f(x):
   ...:     if x > 0:
   ...:         return 3. * x ** 2
   ...:     else:
   ...:         return 5. * x ** 3
   ...:

In [3]: grad(f)(1.)
Out[3]: DeviceArray(6., dtype=float32)

In [4]: grad(f)(-1.)
Out[4]: DeviceArray(15., dtype=float32)
In the above example, the control flow happens in Python, just as it would in PyTorch. (That's not surprising, since JAX grew out of the original Autograd [1]!)
Structured control flow functions like lax.cond, lax.scan, etc exist so that you can, for example, stage control flow out of Python and into an end-to-end compiled XLA computation with jax.jit. In other words, some JAX transformations place more constraints on your Python code than others, but you can just opt into the ones you want. (More generally, the lax module lets you program XLA HLO pretty directly [2].)
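For instance (a minimal sketch, assuming JAX is installed; `cumsum_scan` is a made-up name), `lax.scan` lets a whole loop be staged out of Python so the entire computation compiles end-to-end under `jax.jit`:

```python
# Hedged sketch: staging a loop with lax.scan instead of a Python for-loop.
import jax
import jax.numpy as jnp
from jax import lax

@jax.jit
def cumsum_scan(xs):
    # step returns (new carry, per-step output); scan stacks the outputs
    def step(carry, x):
        carry = carry + x
        return carry, carry
    _, ys = lax.scan(step, jnp.zeros((), dtype=xs.dtype), xs)
    return ys

print(cumsum_scan(jnp.arange(4.0)))  # [0. 1. 3. 6.]
```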
There are a bunch of frameworks built on top of PyTorch too (fastai, Lightning, torchbearer, Ignite...). I don't see why this should be a problem (or at least a problem for JAX but not for PyTorch).
IMO, this is not a fair comparison because Pytorch spans a larger amount of abstraction than jax (I don't quite know how to explain it other than "spans a larger amount of abstraction").
You can do much of the JAX stuff in PyTorch, but you can't do the high-level nn.LSTM stuff in JAX; you have to use something like Flax or Objax.
Oh I just noticed that you're one of the people behind that recent GAN compression work! Really cool stuff and a big step up this year, I've been following the field for a lil bit.
In your first sentence you're mistaking JAX for XLA.
XLA (Accelerated Linear Algebra): I guess it's kind of a backend/compiler that optimizes linear algebra/deep learning calculations with some very interesting techniques, among them kernel fusion.
JAX: in some sense syntactic sugar over XLA, but a better way of describing it is composable transformations + NumPy + some SciPy. The composable transformations let you take derivatives (of single-, multi-, or vector-valued functions, including higher-order derivatives), JIT a function (which is then compiled to XLA), and use two forms of parallelism (vmap and pmap), among others, all compatible with one another and with TPUs, GPUs, and CPUs.
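A small sketch of that composability (assuming JAX is installed; `f` is an arbitrary toy function): each transformation takes an ordinary function and returns one, so they stack freely:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x ** 2

# grad differentiates, vmap vectorizes over a batch, jit compiles via XLA;
# they compose because each returns an ordinary Python function.
df = jax.jit(jax.vmap(jax.grad(f)))

xs = jnp.array([0.0, 1.0, 2.0])
print(df(xs))  # derivative of f at each point in the batch
```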
"TensorFlow Probability (TFP) is a library for probabilistic reasoning and statistical analysis that now works on JAX! For those not familiar, JAX is a library for accelerated numerical computing based on composable function transformations.
We have ported a lot of TFP's most useful functionality to JAX while preserving the abstractions and APIs that many TFP users are now comfortable with."
TensorFlow is migrating a bunch of stuff to JAX. Even they use the word "library" for their own porting. For a user like me, it looks like JAX is a library that TensorFlow uses... but the end-user-facing library is TensorFlow.
Hi, tech lead for TFP here. The wording here was unclear -- sorry! We're fixing it presently.
We are not migrating away from TF; far from it!
The change here was to interoperate with TF and JAX (and numpy!), by way of some rewrite trickery under the hood. Essentially, we wrote a translation layer that implements the TF API surface (or, the parts we actually use) in terms of numpy & JAX primitives [1]. This lets us leave most TFP code intact, written in terms of the TF API, but interoperate with JAX by way of the API translation layer. (Actually we implemented numpy support first, and mostly got JAX for "free" since JAX is largely API-compatible with numpy).
Sorry for any confusion!
We're pretty stoked about this work, so happy to answer any other questions you may have (also feel free to chime in on the github tracker or email tfprobability@tensorflow.org)
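To make the "translation layer" idea above concrete, here's a toy illustration (not TFP's actual code; `TFOnNumpy` and `log_normalizer` are invented names): implement a small slice of a TF-style API surface on a NumPy backend, so "library" code written against `tf.*` runs unchanged when the backend is swapped.

```python
import numpy as np

class TFOnNumpy:
    """Hypothetical shim exposing TF-style names backed by NumPy."""
    def constant(self, value, dtype=None):
        return np.asarray(value, dtype=dtype)
    def exp(self, x):
        return np.exp(x)
    def reduce_sum(self, x, axis=None):
        return np.sum(x, axis=axis)
    def log(self, x):
        return np.log(x)

tf = TFOnNumpy()

# "Library" code written against the TF-style API, untouched by the swap
def log_normalizer(logits):
    return tf.log(tf.reduce_sum(tf.exp(tf.constant(logits))))

print(log_normalizer([0.0, 0.0]))  # log(2)
```

The same trick, applied to the parts of the real TF API a library actually uses, is what lets most of the library's code stay intact while gaining NumPy (and, via API compatibility, JAX) backends.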
Last time I looked, automatic differentiation was in a compiler branch with no immediate plans to merge into master.
But overall it is promising. I even installed Swift on Linux to play with it, didn't get to ML as I have an AMD GPU and this is a can of worms. Hope it's finished one day.
I would prefer for Julia ML libraries to become mainstream. But it is what it is.
Also, the ideal for me would be a Rust for TensorFlow, but the slower compile times (I didn't compare with Swift) are an impediment to an iterative workflow such as tweaking models.
A Rust for TensorFlow (and/or a "RustTorch") would be awesome.
I hope all the work being done on improving incremental compilation[a] and developing interactive Rust REPLs like evcxr[b] makes using Rust for AI a practical reality.
tch-rs is really nice. I have built a Rust sequence labeler + dependency parser + lemmatizer on top of it. It supports multi-task learning, finetuning of BERT-like models, model distillation, etc.:
Unfortunately, when I started this guillaume-be's rust-bert crate wasn't around yet (I think), so I also ported some of the models from Huggingface transformers:
There are Torch bindings that some users rely on, but what I personally would like is a JAX clone built on top of Rust.
I see a way to do it [0] but... I already have a PhD to finish.
[0]: a macro compiling functions into an intermediate representation that is transformed with const functions (gradient computation) at compile time and JITted into XLA at runtime.
I recently spent some days playing with differentiable programming in Swift on Linux:
* as you said, autodiff is in a branch of the Google fork of the project
* the only pre-built images are for Ubuntu 18.04
* on Linux, the REPL seems to be somewhat broken
* many libraries assume macOS or iOS
I don't feel a lot of hope for adoption of Swift on Linux. Apple obviously is not working on that (fair, they have no reason to do so) and the Swift community also has no focus on Linux, since ... they are in the Apple ecosystem.
Meanwhile, the Open Source community is much more interested in Rust than Swift.
As for differentiable programming - this is what got me excited, but after trying it I was a bit underwhelmed. Not that it isn't great technology; it's just not figured out yet. I tried to come up with a use case outside of ML, and the one I tried wasn't really applicable.
I do feel however that someone will come up with something and that it will have quite some impact.
Apple I believe has a reason to work on Swift for Linux, and that reason is their considerable cloud infrastructure and various backend services.
I’m sure being able to share domain specific Swift code between client apps and backend would be pretty high on their list of wants.
Also, little clues like the way Xcode generates SwiftPM packages in a Linux-ready fashion out of the box shows that they care at least a bit.
Having a lot of interest in the programming languages, my opinion is that Swift is a damn good one. It’s very high level, supports FP deep enough, has a great type system (that is getting better with every release), great OOP support, native performance characteristics and it still lets you get to a really low level when you need it.
I also like how they took great ideas from Haskell, Scala, Smalltalk, C# and others. I code daily in Scala and Swift, previously had done Erlang, Clojure, Common LISP, TypeScript, Ruby, Python, Haskell, OCaml, Java, PHP, C, Smalltalk and some others. In this list, Swift is now almost at the top.
They need to get Higher-Kinded Types and then it’s going to win the world (just kidding, JavaScript gets to win the world, unfortunately) :)
I can't speak for Apple, of course, but some indications of their seriousness are there.
SwiftNIO is a cross-platform asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients. (https://github.com/apple/swift-nio)
Also, on official https://swift.org/download all releases and snapshots are automatically available for: Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, CentOS 7, CentOS 8, Amazon Linux 2
Official Swift Server work group is a steering team that promotes the use of Swift for developing and deploying server applications: https://swift.org/server/
Official Swift 5.3 release goals have stated a major goal of "expanding the number of platforms where Swift is available and supported, notably adding support for Windows and additional Linux distributions."
(https://swift.org/blog/5-3-release-process/)
Yep. It's difficult to find uses for autodiff, and it looks like a very niche thing to add to a language. But it's still cool. In a language with extensible syntax (e.g. proc macros), this would sit in a library.
And I think you're right, there is a low chance that this gets traction. Especially since Chris Lattner is at SiFive now; they will probably have some kind of AI accelerator, but TF is for training rather than execution, so I'm not sure they'll find a reason to push it.
Jeremy Howard from fast.ai might be able to convince people to give it a try. But without people working on it full-time, the chances are not great - especially with a compiler fork that requires constant merges/rebases. But who knows.
I guess there are lots of uses in optimisation problems, and in sampling algorithms for statistics. I don't know how easy it will be to sell Swift to people who now use Stan or Stata (or R) and don't think of themselves as programmers.
> In a language with extensible syntax(ex: proc macros), this would sit in a library.
And this would allow easier iteration of different designs.
Does autodiff exist in R? If so, why not use R, since it also has a lot of ML algorithm support due to its focus on statistics - it's used by many statisticians.
Is the idea here that Swift is a more approachable language and thus this is to lower the barrier of entry to TF?
I think the main selling points of Swift to tensorflow are:
* Speed. ML pipelines are often bottlenecked by data loading and transformation. TF has added new mechanisms for this (the last I've seen was tf.data), but a language compiled to native code is much more flexible in that regard.
* Type safety. Sometimes issues with models pop up long after they have been running. The hope is that typed APIs will surface simple errors at compile time.
* Auto-differentiation built into the language. If I'm not mistaken, this is more powerful than backpropagation in TF, which also has autodiff. The idea is that this would allow for more custom models without a performance penalty. My knowledge here is limited, since it's been over two years since I implemented backpropagation, and I've successfully forgotten most of what I knew about ML/DL.
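(As a side note on what "autodiff built in" means mechanically, here is a minimal forward-mode sketch with dual numbers, in plain Python with invented names - the core trick that a language-level implementation automates and optimizes.)

```python
# Hedged sketch: forward-mode autodiff via dual numbers, pure Python.
class Dual:
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps  # value and derivative part

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.eps * o.val + self.val * o.eps)
    __rmul__ = __mul__

def derivative(f, x):
    # seed the derivative part with 1.0 and read it back off the result
    return f(Dual(x, 1.0)).eps

print(derivative(lambda x: 3 * x * x + x, 2.0))  # 13.0
```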
I don't have any experience with R, but from what I've heard it was known to be slow. That might have changed, or I may have misunderstood the situation.
You cannot really do autodiff in a slow language. I mean you can but nobody wants to run large machine learning training algorithms on a slow language like Python or R.
You could write autodiff in say C++ but it is a user unfriendly language not well suited for machine learning and scientific computing.
Swift is a nicer high level language you can do autodiff in. But honestly I don’t see the point with Swift either.
Julia already does AutoDiff extremely well and outperforms Swift and pretty much everybody else.
> Apple obviously is not working on that (fair, they have no reason to do so) and the Swift community also has no focus on Linux, since ... they are in the Apple ecosystem.
Neither of these things are true, fwiw. Strong Linux support is an explicit current goal for Swift, though it is admittedly not all the way there yet. See this post regarding Swift 6 and its goals for wider support, for example: https://forums.swift.org/t/on-the-road-to-swift-6/32862
I question the long-term viability of Swift for TensorFlow. I hope I don't sound like I'm whining, but I invested a fair amount of time in Swift from last fall through this February because I wanted a better, faster language than Python to work in for DL; I was also interested in trying some iOS, iPadOS, and macOS development; and Swift looked interesting for Linux server-side work as well.
For Apple development I have found Flutter+Dart to be more pleasant. For DL I decided to stick with taking advantage of my 5+ years experience with TF in Python.
Off topic, but Julia and Flux are really worth checking out for DL.
Some advice: if you want to experiment with Swift for TensorFlow use Google’s colab and make your life easier. I wish I could get back the install times on my Linux GPU laptop and on macOS. If you pay for colab like I do, you get good GPU/TPU resources, and it is simply an easy and fun way to go.
Is this (the future of TF being in Swift) still the case? Because there has been speculation, especially since Lattner left Google.
Also, projects like JAX have gained good momentum.
I guess it's really hard to convince scientists to move away from their Python legacy. And the Python scientific stack is really incredible.
They presumably wanted a semi-popular statically typed language as the gains of Julia over Python aren't enough to be worth it (and Julia isn't popular enough).
Semi-popular where? The typical iOS programmer isn't going to touch ML. In the relevant demography, Julia had (still has) more users. Swift4TF was a poor choice, and now that Lattner is gone I doubt it has a future.
Julia is out because it's not statically typed and not enough of a difference from Python. If it was more popular then it may have had a chance on that alone but it's not. In terms of semi-popular statically typed modern languages that have good/performant C/C++ interop you have: Go, Swift and Rust. Can't think of any others to be honest. I'm guessing Go's lack of generics excluded it and Rust's complexity excluded it.
TF's own analysis[0] was hesitant on static vs dynamic, finding downsides for either choice and suggested choosing a "middle ground" (which IMHO was doable starting from Julia's optional types).
So static typing might be your requirement, but it wasn't TF's requirement. When examining Julia later on, they don't mention dynamic types as a downside at all - and every argument save for "we're familiar with Swift's internal working" was contradicted within the same document...
Now, being familiar with Swift is a very understandable reason for their choice, but it's IMHO not the best choice for others which are not as familiar with Swift or ML. Most ML users are very familiar with Python and I think they will find Julia to be a welcome improvement on Python's pain points without Swift's baggage.
There's a difference between popular and "not a toy language." I'm not arguing Julia isn't used, I'm arguing it's not used often enough to be a merit irrespective of other reasons.
Julia is used in serious scientific work; Swift isn't. Massive projects running on supercomputers, such as next-generation climate models, are written in Julia, and many best-of-breed scientific packages are written in Julia. Swift, while great for app development, has limited presence in the scientific field.
This website isn't counting "Julia-only" stacks; it's just companies that have used Julia for one project or another. If you really want to compare that to Rust, Julia is again going to fall short.
Those were not the companies I had in mind with my Rust remark, but if you so wish, Swift, Kotlin/Native, Go, C++, .NET Native, Verona, Checked C, Objective-C.
It remains to be seen how much Rust they will actually make into tier 1 OS SDKs for userspace applications.
In fact, currently it looks more they are bringing their experience with Rust into their platform languages than anything else.
Swift memory ownership, Verona, C++ Core Guidelines checker, Kotlin/Native ownership rules.
Beware wishing for Julia's downfall with glass ceilings.
I really don't understand why they didn't go with Julia, it fits the purpose much better than Swift, already having a ML ecosystem and having great interop with Python, C, C++, R and Matlab.
Heck, JAX, where a lot of TF refugees are going, is pretty similar to Zygote.
>Who would want to use an Apple-centric language for ML, seriously? Apple hardware is outright incompatible
Based on how you wrote your comment, I'm guessing you may not know that S4TF is a Google initiative.
Yes, Chris Lattner used to work for Apple, but he was at Google Brain at the start of this project. When his team wanted to create a language where automatic differentiation and gradient descent were 1st-class concepts in the core syntax of the language (i.e. without libraries), they looked at Rust, Julia, Swift, etc.[1] They ended up choosing Swift as the base language to extend with new ML syntax.
Also, Chris has said in previous interviews that he thought of Swift as a general-purpose language and hoped it would be used outside of Apple's ecosystem.
EDIT, to address the confusion where some assume Apple hardware dictates the direction of S4TF: for those not aware, Google's TPU (Tensor Processing Unit)[2] is built with custom ASIC chips, not ARM or Apple Silicon. Presumably, S4TF would run natively against Google's TPUs. In other words, the goals and execution targets of S4TF are not restricted by Apple's ecosystem of macOS/iOS/MacBooks/iMacs/iPads in any way. Yes, the SwiftUI framework is Apple-specific, but Swift-the-core-language[3] is not.
[to downvoters: if I wrote inaccuracies, please correct me.]
For answering a rhetorical question? Yes, of course the guy who came up with this also roots for Swift to gain more traction outside the Apple ecosystem, but reality hasn't gone in that direction so far. There are plenty of cool languages out there, like F#, Julia, Mathematica, etc., and the ML crowd settled on Python, which is an average language, so there is no reason to believe they are attracted to Swift.
There is no official support, only AMD's custom TF branch. Furthermore, ROCm itself works only on Linux. I believe most people writing Swift are using macOS.
It seems like they'll be making more consumer focused GPUs/ML chips to me. I think Apple Silicon is exciting, but I imagine for serious work you'll still need an external enclosure and a dedicated GPU from AMD.
Or Linux/Windows :(. Losing Nvidia/CUDA kind of killed ML on macOS overnight.
Swift is gaining momentum outside the Apple ecosystem. Web servers are being built with it, and the TensorFlow team explains at length its advantages for developing ML models.
I don't have any inside knowledge, but as a developer using Swift (that followed the Swift web frameworks closely):
I think Kitura only existed as a long shot for kicking off their cloud offering. They were hoping iOS developers would code their backends in Swift/Kitura and host on IBMs cloud.
But that never happened - so Kitura was killed. Swift on the server is still niche. And the companies that used a Swift backend overwhelmingly went with https://vapor.codes instead. It is faster, and has a nicer API.
They're working on expanding it outside of the Apple/iOS ecosystem, but many (probably most) developers use Macs. You can write Swift code on Linux right now.
I’d also say that Swift appears to be shaping up as a great language for machine learning. Things like calculating gradients are built into the language, and you can import Python to fill in gaps until Swift libraries are ready. And Swift is quite fast.
The name is a bit confusing. Swift for TensorFlow combines a bunch of things:
- adding autodiff to Swift language & compiler
- neural net construction & training API in Swift
- low-friction Python bindings
- low-friction C++ interop
- ability to run neural nets on TensorFlow using the C++ interop
- ability to alternately run neural nets directly on XLA ("X10")
The Swift changes are supposed to get mainlined and I think at least some of the stuff related to Python and C++ already are. The idea is that Swift is nice enough to cover experiment + train + embed as library even on mobile platforms. It's a big engineering project and I hope Google keeps funding the work.
I am not getting the exact use case. If speed is the issue, then I think tf.function solves that. For most operations we need to use Python via PythonKit, so the things that can't be sped up using tf.function can't be sped up using Swift either. Also, if we need to use Python everywhere, type safety is a very minor benefit.
The unique property is the ability to pick any code or library that is unaware of the differentiation library (unlike tf.function, which requires you to specifically use tf methods) and get the gradient. In a language like Julia this is immediately useful, as it has a massive ecosystem of numerical code where gradients make sense (like differential equations and the SciML project [1], or less conventional stuff like raytracers), but in a language like Swift (where there is no meaning to the gradient of GUI libraries or frontend stuff) it is more of an "if you build it, they will come" bet by Google.
But unique features aside, it's a have-your-cake-and-eat-it-too kind of interface. You don't need to learn a second language within the language, like TensorFlow's tf.*, making it even more natural and flexible than PyTorch, including all the debugging mechanisms of the host language itself, yet you still get compile-time graph creation like TensorFlow, including all kinds of optimizations. It makes other approaches seem primitive by comparison, but creating it is much more complex, and the main audience is already more than used to language-within-language solutions (like NumPy) that can provide something almost as good, even if less elegantly, so it's not easy to convince people either (when it involves changing programming languages).
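To illustrate the "differentiate code that never heard of the framework" idea in JAX terms (a sketch, assuming JAX is installed; `integrate` is a made-up toy): the gradient goes straight through an ordinary hand-written Euler loop, with no framework-specific ops in the model code:

```python
import jax

def integrate(k, steps=100, dt=0.01):
    # Euler steps for dy/dt = -k * y with y(0) = 1, written as plain Python
    y = 1.0
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y

# Sensitivity of the final state to the decay rate k; grad traces straight
# through the loop without any special control-flow primitives.
dy_dk = jax.grad(integrate)(2.0)
print(dy_dk)  # roughly -0.135 for this step count
```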
The type system?
I like Swift, but looking at the code examples I have to say preparing data and setting up a model is 10x easier in Julia.
S4TF still requires either C CUDA kernels or XLA. Julia, on the other hand, has JIT GPU codegen, and its CPU codegen has been benchmarked to beat OpenBLAS.
Source on this claim?
https://discourse.julialang.org/t/realistically-how-close-is...
https://github.com/mcabbott/Tullio.jl
Regardless of whether it can consistently beat Fortran/BLAS in every area, in general JIT languages have more opportunities for optimizations than AoT languages, so it's interesting to see what comes out of a language that focuses on leveraging this to get the most performance.
> in general JIT languages have more opportunities for optimizations than AoT languages
I'm not sure I agree with this. I would say the opportunities for any non-GC mature AoT language (C++, Rust, etc.) is going to be pretty much the same, since you can just attach a JIT to most AoT langs.
The Julialab at MIT is working on making the higher level codegen faster
I am not too experienced with Julia, but my understanding is that it uses LLVM to JIT itself. Since the LLVM JIT compiler is also an API available from C++, anything that can be done in Julia can be done by JITting to LLVM via the API in C++.
Then you just compile the methods that you'll actually use with LLVM right before using them.
Not to mention, if you want to reimplement Julia's logic in C++ you'll have to develop its sophisticated type inference: the Julia compiler is so aggressive that it will compile entire blocks of the program at once (the entire program, if it can) as long as it can infer which types are used downstream, which is why it can compete with AoT-compiled languages (it's basically a "just-ahead-of-time" compiler).
It might be a nice language (never used it), but there are other contenders with more engineers/scientists support like Rust and Julia, for which the advantages were clearer.
Finally, the whole ordeal got a very bad look from having its main proponent be the creator of Swift, rather than the actual community pushing for it.
I didn't like that idea then, and I still don't like it now. But the idea in itself is very Apple(-ish).
Why do you say so?
It's up to you to choose what kind of person you want to be, I don't have control over what you say.
https://en.m.wikipedia.org/wiki/Miscarriage_of_justice
Best not to worry about such silly things and keep writing code.
In any case, the primary meaning of 'abort' is more general, so it shouldn't be compared to the metaphorical use of 'stillborn'.
PyTorch has already stood the test of time and proven that its development is led by a competent team.
Having barriers to entry is not always a bad thing: it forces people to learn and understand concepts instead of blindly copying and pasting code from a Medium article and praying that it works.
But I agree with you that there are many different use cases. Those people who want to do high-level work (I have some images, just give me a classifier) shouldn't need to deal with that complexity. IMO the big mistake was trying to merge all these different use cases into one framework. Let's hope JAX doesn't go down the same route.
Not quite sure why you picked those particular examples... JAX also requires usage of lax.cond, lax.while_loop, and ops.segment_sum. Only gather has been improved with slice notation support. IMO, TF has landed on a pretty nice solution to cond/while_loop via AutoGraph.
JAX is indeed a different situation as it has a more original design (although TF1 came with a huge improvement in compilation speed, so maybe there were innovations under the hood). But I don't know if I like it. The framework itself is quite neat, but last time I checked, the accompanying NN libraries had horrifying designs.
I'm ill-informed - but isn't that exactly what lax is?
Structured control flow functions like lax.cond, lax.scan, etc exist so that you can, for example, stage control flow out of Python and into an end-to-end compiled XLA computation with jax.jit. In other words, some JAX transformations place more constraints on your Python code than others, but you can just opt into the ones you want. (More generally, the lax module lets you program XLA HLO pretty directly [2].)
Disclaimer: I work on JAX!
[1] https://github.com/hips/autograd [2] https://www.tensorflow.org/xla/operation_semantics
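For intuition, here is a toy pure-Python sketch of the calling convention of lax.cond and lax.scan (not JAX itself; the real versions stage these into an XLA computation rather than running Python). The point is that branches and loop bodies are passed as functions, which is what lets a tracer capture them without knowing runtime values:

```python
# Toy stand-in for the lax.cond calling convention: both branches are
# first-class functions, so a tracer/compiler could stage them without
# needing the predicate's value at trace time.
def cond(pred, true_fn, false_fn, operand):
    return true_fn(operand) if pred else false_fn(operand)

def scan(f, init, xs):
    """Toy lax.scan: threads a carry through xs, collecting outputs."""
    carry, ys = init, []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    return carry, ys

# Cumulative sum expressed as a scan:
total, partials = scan(lambda c, x: (c + x, c + x), 0, [1, 2, 3, 4])
print(total, partials)  # 10 [1, 3, 6, 10]
```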
And now there are already multiple NN libraries for JAX from Google...
You can do much of the jax stuff in pytorch, you can't do the high level nn.LSTM stuff in jax, you have to use like flax or objax or something.
- Dex: https://github.com/google-research/dex-lang/
- Hasktorch: https://github.com/hasktorch/hasktorch
- This initiative from the Python Typing-sig: https://docs.google.com/document/d/1oaG0V2ZE5BRDjd9N-Tr1N0IK...
https://futhark-lang.org/blog/2020-03-15-futhark-0.15.1-rele...
and it seems to be ok for DL
https://elsman.com/pdf/fhpnc19.pdf
From what I've been able to tell (no shade to the PyTorch team, which has many different priorities), work on the port has been somewhat slow going.
Further, this is dynamic type checking as you mentioned.
Congrats!
The latest release of TensorFlow Probability uses JAX under the hood. So what do you mean when you say you're moving to JAX versus TensorFlow?
XLA: Accelerated Linear Algebra. I guess it's kind of a backend/compiler that optimizes linear algebra/deep learning computations with some very interesting techniques, among them kernel fusion.
JAX: In some sense syntactic sugar over XLA, but a better way of describing it is composable transformations + NumPy + some SciPy. The composable transformations let you take derivatives (of single-, multi-, or vector-valued functions, including higher-order derivatives), JIT a function (which is then compiled to XLA), and use two forms of parallelism (vmap and pmap), among others, all while being compatible with one another and with TPUs, GPUs, and CPUs.
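To make "composable transformations" concrete, here's a toy pure-Python sketch (not JAX: the real grad is exact autodiff and the real vmap vectorizes without a Python loop). The point is just that each transformation maps a function to a function, so they nest:

```python
def grad(f, h=1e-6):
    """Toy derivative via central differences (JAX's grad is exact autodiff)."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def vmap(f):
    """Toy vmap: maps f over a batch (JAX vectorizes without a Python loop)."""
    return lambda xs: [f(x) for x in xs]

square = lambda x: x * x

# Transformations compose: the derivative of x^2 is 2x, mapped over a batch.
dsquare_batch = vmap(grad(square))
print(dsquare_batch([1.0, 2.0, 3.0]))  # ~[2.0, 4.0, 6.0]
```

In real JAX you would also wrap the result in `jax.jit` to compile the whole composed function to XLA; jit is just another function-to-function transformation in the same spirit.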
"TensorFlow Probability (TFP) is a library for probabilistic reasoning and statistical analysis that now works on JAX! For those not familiar, JAX is a library for accelerated numerical computing based on composable function transformations.
We have ported a lot of TFP's most useful functionality to JAX while preserving the abstractions and APIs that many TFP users are now comfortable with."
TensorFlow is migrating a bunch of stuff to JAX. Even they use the word "library" for their own porting. For a user like me, it looks like JAX is a library that TensorFlow uses... but the end-user-facing library is TensorFlow.
We are not migrating away from TF; far from it!
The change here was to interoperate with TF and JAX (and numpy!), by way of some rewrite trickery under the hood. Essentially, we wrote a translation layer that implements the TF API surface (or, the parts we actually use) in terms of numpy & JAX primitives [1]. This lets us leave most TFP code intact, written in terms of the TF API, but interoperate with JAX by way of the API translation layer. (Actually we implemented numpy support first, and mostly got JAX for "free" since JAX is largely API-compatible with numpy).
Sorry for any confusion!
We're pretty stoked about this work, so happy to answer any other questions you may have (also feel free to chime in on the github tracker or email tfprobability@tensorflow.org)
[1] - https://github.com/tensorflow/probability/tree/master/tensor...
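As a rough illustration of that translation-layer idea (hypothetical names; TFP's actual shim covers a large part of the TF API surface), library code written once against a small "TF-like" surface can be pointed at different backends:

```python
import math

# A hypothetical minimal "TF-like" API surface, implemented on a plain-Python
# backend. A second backend (numpy, JAX, ...) would implement the same methods.
class PlainBackend:
    def reduce_sum(self, xs):
        return sum(xs)

    def exp(self, x):
        return math.exp(x)

# "Library" code is written once, against the API surface only, so swapping
# the backend object swaps the execution engine underneath it.
def softmax_denominator(tf, logits):
    return tf.reduce_sum([tf.exp(x) for x in logits])

print(softmax_denominator(PlainBackend(), [0.0, 0.0]))  # 2.0
```

The TFP approach described above is the module-level version of this: TFP code keeps importing what looks like the TF API, and the shim routes those calls to numpy or JAX primitives.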
Here's what everybody is puzzled about: it looks like the layers going forward are JAX -> TensorFlow -> Keras.
And we are seeing people move to JAX directly. So this is ending up like a Flutter vs. Kotlin situation (also within Google).
Do you envision JAX being low-level, with the high-level TensorFlow/Keras interface being the most usable API?
But overall it is promising. I even installed Swift on Linux to play with it, didn't get to ML as I have an AMD GPU and this is a can of worms. Hope it's finished one day.
I would prefer for Julia ml libraries to become mainstream. But, it is what it is.
Also, the ideal for me would be Rust for TensorFlow, but the slower compile times (I didn't compare with Swift) are an impediment to an iterative workflow such as tweaking models.
I hope all the work being done on improving incremental compilation[a] and developing interactive Rust REPLs like evcxr[b] makes using Rust for AI a practical reality.
[a] https://doc.rust-lang.org/edition-guide/rust-2018/the-compil...
[b] https://github.com/google/evcxr
[1]: https://github.com/LaurentMazare/tch-rs -> https://crates.io/crates/tch
[2]: https://github.com/guillaume-be/rust-bert -> https://crates.io/crates/rust-bert
https://github.com/stickeritis/sticker2/
Unfortunately, when I started this guillaume-be's rust-bert crate wasn't around yet (I think), so I also ported some of the models from Huggingface transformers:
https://github.com/stickeritis/sticker-transformers/
At any rate, I can highly recommend the tch crate if you are looking to build neural networks in Rust.
I see a way to do it [0] but... I already have a PhD to finish.
[0]: a macro compiling functions into an intermediate representations that are transformed with const functions (gradient computation) at compile time and jitted into XLA at runtime.
But adding auto-differentiation to match the Swift for TensorFlow behaviour sounds like a serious undertaking, and I doubt it is on anyone's radar.
But yeah, I've been wanting this for a while. Shoehorning Rust everywhere is the endgame.
* as you said, auto-diff is in a branch of the Google fork of the project
* the only pre-built images are for Ubuntu 18.04
* on Linux, the REPL seems to be somewhat broken
* many libraries assume macOS or iOS
I don't feel a lot of hope for adoption of Swift on Linux. Apple obviously is not working on that (fair, they have no reason to do so) and the Swift community also has no focus on Linux, since ... they are in the Apple ecosystem. Meanwhile, the Open Source community is much more interested in Rust than Swift.
For the differentiable programming part: this is what got me excited, but after trying it, I was a bit underwhelmed. Not that it isn't great technology; it's just not figured out yet. I tried to come up with a use case outside of ML, and the one I tried wasn't really applicable.
I do feel however that someone will come up with something and that it will have quite some impact.
I’m sure being able to share domain specific Swift code between client apps and backend would be pretty high on their list of wants.
Also, little clues like the way Xcode generates SwiftPM packages in a Linux-ready fashion out of the box shows that they care at least a bit.
Having a lot of interest in the programming languages, my opinion is that Swift is a damn good one. It’s very high level, supports FP deep enough, has a great type system (that is getting better with every release), great OOP support, native performance characteristics and it still lets you get to a really low level when you need it.
I also like how they took great ideas from Haskell, Scala, Smalltalk, C# and others. I code daily in Scala and Swift, previously had done Erlang, Clojure, Common LISP, TypeScript, Ruby, Python, Haskell, OCaml, Java, PHP, C, Smalltalk and some others. In this list, Swift is now almost at the top.
They need to get Higher-Kinded Types and then it’s going to win the world (just kidding, JavaScript gets to win the world, unfortunately) :)
SwiftNIO is a cross-platform asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients. (https://github.com/apple/swift-nio)
Distributed Membership Protocol implementations in Swift: https://github.com/apple/swift-cluster-membership
Docker Official Image packaging for Swift: https://github.com/apple/swift-docker
Also, on official https://swift.org/download all releases and snapshots are automatically available for: Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, CentOS 7, CentOS 8, Amazon Linux 2
Official Swift Server work group is a steering team that promotes the use of Swift for developing and deploying server applications: https://swift.org/server/
Swift AWS Lambda Runtime: https://github.com/swift-server/swift-aws-lambda-runtime
Official Swift 5.3 release goals have stated a major goal of "expanding the number of platforms where Swift is available and supported, notably adding support for Windows and additional Linux distributions." (https://swift.org/blog/5-3-release-process/)
I already have my share of platforms I care about.
And I think you're right, there is a low chance that this gets traction. Especially since Chris Lattner is at SiFive now; they will probably have some kind of AI accelerator, but TF is for training rather than execution. So I'm not sure they'll find a reason to push it.
Jeremy Howard from fast.ai might be able to convince people to give it a try. But without people working on it full time, the chances are not great. Especially with a compiler fork that requires constant merges/rebases. But who knows.
I guess there are lots of uses in optimisation problems, and in sampling algorithms for statistics. I don't know how easy it will be to sell Swift to people who now use Stan or Stata (or R) and don't think of themselves as programmers.
> In a language with extensible syntax(ex: proc macros), this would sit in a library.
And this would allow easier iteration of different designs.
Is the idea here that Swift is a more approachable language and thus this is to lower the barrier of entry to TF?
* Speed. ML training pipelines are often bottlenecked by data loading and transformation. TF has added new mechanisms (the last I've seen was tf.data), but a language compiled to native code is much more flexible in that regard.
* Type safety. Sometimes issues with models pop up long after they have started running. The hope is that typed APIs will surface simple errors at compile time.
* Auto differentiation built into the language. If I'm not mistaken, this is more powerful than backpropagation in TF, which also has autodiff. The idea is that this would allow more custom models without a performance penalty. My knowledge here is limited, since it's been over 2 years since I implemented backpropagation, and I've successfully forgotten most of what I knew about ML/DL.
I don't have any experience with R, but from what I've heard, it was known to be slow. But that might have changed or I may have misunderstood the situation.
Number types are first class which is a big part of it and you can add extra custom passes to the JIT compiler in regular library code.
You could write autodiff in say C++ but it is a user unfriendly language not well suited for machine learning and scientific computing.
Swift is a nicer high level language you can do autodiff in. But honestly I don’t see the point with Swift either.
Julia already does AutoDiff extremely well and outperforms Swift and pretty much everybody else.
Neither of these things are true, fwiw. Strong Linux support is an explicit current goal for Swift, though it is admittedly not all the way there yet. See this post regarding Swift 6 and its goals for wider support, for example: https://forums.swift.org/t/on-the-road-to-swift-6/32862
For Apple development I have found Flutter+Dart to be more pleasant. For DL I decided to stick with taking advantage of my 5+ years experience with TF in Python.
Off topic, but Julia and Flux are really worth checking out for DL.
Some advice: if you want to experiment with Swift for TensorFlow use Google’s colab and make your life easier. I wish I could get back the install times on my Linux GPU laptop and on macOS. If you pay for colab like I do, you get good GPU/TPU resources, and it is simply an easy and fun way to go.
I guess it's really hard to convince scientists to move away from their Python legacy. And the Python scientific stack is really incredible.
So static typing might be your requirement, but it wasn't TF's requirement. When examining Julia later on, they don't mention dynamic types as a downside at all - and every argument save for "we're familiar with Swift's internal working" was contradicted within the same document...
Now, being familiar with Swift is a very understandable reason for their choice, but it's IMHO not the best choice for others which are not as familiar with Swift or ML. Most ML users are very familiar with Python and I think they will find Julia to be a welcome improvement on Python's pain points without Swift's baggage.
[0] https://github.com/tensorflow/swift/blob/master/docs/WhySwif...
https://juliacomputing.com/case-studies/
This website isn't counting "Julia only" stacks, it's just companies that have used Julia for one project or another. If you really want to compare that to rust, julia is again going to fall short.
It remains to be seen how much Rust they will actually make into tier 1 OS SDKs for userspace applications.
In fact, currently it looks more they are bringing their experience with Rust into their platform languages than anything else.
Swift memory ownership, Verona, C++ Core Guidelines checker, Kotlin/Native ownership rules.
Beware wishing for Julia's downfall with glass ceilings.
[0] https://diffsharp.github.io/
Apple hardware is outright incompatible with the kind of hardware we use daily in machine-learning workstations.
Based on how you wrote your comment, I'm guessing you may not know that S4TF is a Google initiative.
Yes, Chris Lattner used to work for Apple, but he was at Google Brain at the start of this project. When his team wanted to create a language where automatic differentiation and gradient descent were first-class concepts in the core syntax (i.e. without libraries), they looked at Rust, Julia, Swift, etc.[1] They ended up choosing Swift as the base language to extend with new ML syntax.
Also, Chris has said in previous interviews that he thought of Swift as general purpose language and that hoped it would be used outside of Apple's ecosystem.
EDIT to address the confusion where some assume Apple hardware dictates the direction of S4TF: For those not aware, Google's hardware TPU (Tensor Processing Unit)[2] is built with custom ASIC chips and not ARM nor Apple Silicon chips. Presumably, S4TF would run natively against Google's TPU. In other words, the goals and execution targets of S4TF are not restricted by Apple's ecosystem of macOS/iOS/Macbooks/iMacs/iPads in any way. Yes, the SwiftUI framework is Apple-specific but Swift-the-core-language-syntax[3] is not.
[to downvoters: if I wrote inaccuracies, please correct me.]
[1] https://github.com/tensorflow/swift/blob/master/docs/WhySwif...
[2] https://en.wikipedia.org/wiki/Tensor_Processing_Unit
[3] https://docs.swift.org/swift-book/ReferenceManual/zzSummaryO...
And you just ignored the hardware comment
Seems kind of obvious.
https://juliagpu.gitlab.io/KernelAbstractions.jl/
https://github.com/JuliaGPU/GPUArrays.jl/pulse
Or Linux/Windows :(. Losing Nvidia/CUDA kind of killed ML on macOS overnight.
I think Kitura only existed as a long shot for kicking off their cloud offering. They were hoping iOS developers would code their backends in Swift/Kitura and host on IBMs cloud.
But that never happened - so Kitura was killed. Swift on the server is still niche. And the companies that used a Swift backend overwhelmingly went with https://vapor.codes instead. It is faster, and has a nicer API.
[0] https://swift.org/blog/aws-lambda-runtime/
[1] https://aws.amazon.com/blogs/opensource/continuous-delivery-...
[2] https://github.com/amzn/smoke-aws
I’d also say that it appears that Swift is going to be a great language for machine learning. Things like calculating gradients are built in to the language and you can import Python to fill in gaps until Swift libraries are ready. And Swift is quite fast.
[1] https://sciml.ai/
Static type checking?
However for App development Swift of course has a far more impressive stack.