While I almost always use TensorFlow for work, I appreciate Skymind's open source Deeplearning4j library for use with Common Lisp (via Armed Bear CL), Java, and Scala. Sometimes living on the JVM is the best choice.
Thanks! Of note: we also contribute to Keras and maintain JavaCPP (https://github.com/bytedeco/javacpp), which lets us call native C/C++ libraries, many of which are otherwise only accessible from Python, directly from the JVM. We need to make some of these projects better known; hopefully we can focus more on community growth this year.
We control the whole framework down to the bare metal.
We also have our own built-in memory allocators.
There's a bit more to the JVM than just training models. Java-based application servers, for example, are still widely deployed.
There's also a whole data-engineering niche we target here (the Spark pipelines) where Python isn't a good fit for the team.
Then there's the fact that, thanks to our model import, it's still easier to run Keras on Spark with us than with any other framework.
And that doesn't even account for what we're doing with inference. Many vendors in this space just run Kubernetes, bundling other tools they don't control. Because of our low-level control, we actually engage with various large companies on custom chip development, even ones running other DL frameworks.
Because if I have to use Databricks, then I need to keep things as notebooks, something I'm desperately trying to migrate my team away from so that we can have actually maintainable code that gets deployed, monitored, and held to the same rigour and maintainability standards as normal dev code.
Also, being forced to use a cluster is catastrophic overkill for so, so many tasks. I have teammates wanting to use Spark/Databricks just to process a handful of files in S3 totalling a few GB, tops. Realistically we could do the same work in a single container with Python/Julia/Scala/language of choice, in the same amount of time or less, and with an order of magnitude better maintainability.
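To make the "single container" point concrete, here's a minimal sketch using only the Python standard library. The file names and the per-file transform are hypothetical stand-ins; a real job would read the objects from S3 (e.g. via boto3) instead of a local temp directory.

```python
# Minimal sketch: processing a "handful of files" in one process instead of
# a cluster. File contents and the per-file transform are hypothetical.
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def process_file(path: Path) -> int:
    # Stand-in for the real per-file work: count lines.
    return sum(1 for _ in path.open())

# Create a few small sample files to stand in for the S3 objects.
tmp = Path(tempfile.mkdtemp())
for i in range(4):
    (tmp / f"part-{i}.txt").write_text("row\n" * (i + 1))

# A thread pool in a single container is often enough for a few GB of input.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(process_file, sorted(tmp.glob("part-*.txt"))))

print(total)  # 1 + 2 + 3 + 4 = 10
```

No scheduler, no cluster config, and the whole thing is an ordinary script that can be tested and deployed like any other code.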
Good point. Maybe that's just as easy. We see companies with on-prem Spark clusters telling us that we're the easiest way to do DL there. Enterprise clients are doing a lot on prem, and their future is probably hybrid for a long time to come.
After doing a significant deep dive into Databricks as a third-party solution for my team at work, we decided Databricks and a deep commitment to Spark were a very poor choice for machine learning, though Spark seems generally fine as an interface for map-reduce or scheduled cluster compute tasks.
Spark is in 2019 what Hadoop was in ~2014. In 5-6 years, Spark will be the cure-all that a bunch of people put all their eggs into without realizing its deep-seated limitations. This is especially true for machine learning.
Those tools (along with MLlib and virtually anything relying on a py4j bridge) were precisely the setup I tested and found to have unacceptably poor performance across our range of both large and small workloads, in addition to deep inflexibility in controlling the runtime environment on a per-project basis (our most critical requirement).
See my other comment below with a link to a previous discussion.
Congrats to the Skymind team. I have given a few talks on machine learning using DL4J, and it's been nothing but an excellent framework for Java developers to learn.
Fwiw, the Skymind team built Deeplearning4j, is the second-largest contributor to Keras after Google, and is the sole maintainer of Hyperopt.
https://github.com/deeplearning4j/
https://github.com/hyperopt/hyperopt/
Our code serves as a bridge between the Python data science ecosystem and tools like Spark, Kafka, Hadoop, etc.
https://deeplearning4j.org/docs/latest/deeplearning4j-scaleo...
You can also import Keras models to train them on a Spark cluster with DL4J:
https://deeplearning4j.org/docs/latest/keras-import-overview
Depending on what you're looking to do, we're still the standard for precompiled binaries packaged as JAR files: https://repo1.maven.org/maven2/org/nd4j/nd4j-native/1.0.0-be...
You won't find any other framework shipping prebuilt AVX binaries and IBM POWER support at the same time.
Happy to talk more depending on what your focus is.
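For reference, pulling those prebuilt binaries into a JVM project is a one-dependency affair. A sketch of the Maven coordinates, taken from the Maven Central URL above; the version element is a placeholder, since the link is truncated (check Maven Central for the current release):

```xml
<!-- groupId/artifactId taken from the repo1.maven.org URL above;
     the version below is a placeholder, not a confirmed release. -->
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-native</artifactId>
  <version>LATEST-RELEASE-HERE</version>
</dependency>
```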
https://news.ycombinator.com/item?id=19321372
People dramatically underestimate how far you can get with even a single machine and slightly better engineering. The best example of this I've seen is the super impressive work done by Frank McSherry:
http://www.frankmcsherry.org/assets/COST.pdf
http://www.frankmcsherry.org/graph/scalability/cost/2015/02/...
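McSherry's COST argument is easy to reproduce in miniature: a single-threaded pass over edge data often beats a cluster once you account for coordination overhead. A toy sketch in the same spirit as the graph computations he benchmarks (the edge list here is synthetic, and this is an illustration of the idea, not his implementation):

```python
# Toy illustration of the COST argument: a single-threaded union-find
# computes connected components over an edge list with zero cluster
# overhead. The graph below is synthetic.

def connected_components(num_nodes, edges):
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(x) for x in range(num_nodes)})

# Components: {0, 1, 2}, {3, 4}, and isolated node 5 -> 3 components.
edges = [(0, 1), (1, 2), (3, 4)]
print(connected_components(6, edges))  # 3
```

On a laptop this style of code chews through edge lists with hundreds of millions of entries, which is exactly the regime where the COST paper found single-threaded implementations outperforming distributed systems.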