I found Kubeflow Pipelines to be worse than vanilla Argo. Not only is the documentation poor for the Python DSL that gets compiled to the Argo workflow spec, but some operations are downright impossible to express in the DSL. Vanilla Argo + RBAC works wonderfully, especially with Argo Server since v3+.
I agree. For anything sophisticated you have to combine the Kubernetes Python client with the KFP DSL. The compiled Argo Workflow spec can become such a mess that you quickly run into size limitations -- especially if you choose to deploy a Python function directly (which then uses Kaniko to build the containers in-cluster and run them via Argo Workflows). At that point I would rather use Argo directly. And KFP generally lags behind on Argo Workflows versions.
And it's not like KFP makes CI/CD (MLOps) any easier than Argo Workflows itself.
Not from said company, but did do a workflow tool comparison at my work and we also went with Argo.
Argo had better Kubernetes and surrounding-ecosystem integration out of the box, and it was designed to run containers by default, which suited us because we had mixed-language workloads. Airflow was mostly Python-specific unless you added plugins and extensions; the config/pipeline definitions were written in Python, which I didn't want to do after witnessing my teammates write the worst Python I've seen in my career; and last time I evaluated it, it depended on a bunch of external, Python-specific tools (Celery etc.) that I had previously found painful to run.
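To make the "designed to run containers by default" point concrete, here's a minimal sketch of what a mixed-language Argo Workflow looks like: each step is just a container image, so steps written in different languages coexist in one pipeline with no shared runtime. The image names and step names are illustrative, not from any real deployment.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: mixed-language-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: extract            # a compiled-Go step -- no Python runtime needed
        template: go-step
    - - name: transform          # a Python step, isolated in its own image
        template: python-step
  - name: go-step
    container:
      image: my-registry/extractor:latest   # hypothetical image
      command: [./extract]
  - name: python-step
    container:
      image: python:3.12-slim
      command: [python, -c, "print('transform')"]
```

Because the workflow definition is plain YAML rather than Python code, it can also be linted and templated with ordinary Kubernetes tooling.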
Argo is extremely capable at its focus — coordinating workflows on Kubernetes.
I had used Airflow for a few years, and looked into Prefect; in retrospect I'm very happy we chose Argo.
Use Argo if:
- Your tasks are containerized.
- You're using Kubernetes, and can benefit from what it can offer — individually sized containers, autoscaling, fault-tolerance.
- You have loosely coupled tasks — which at most pass files to each other, rather than Python objects.
- You don't have tens of thousands of tasks / streaming / etc.
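The "loosely coupled tasks" point above maps directly onto Argo's artifact mechanism: steps exchange files through artifact storage rather than in-memory objects. A minimal sketch (names and paths are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: produce
        template: writer
    - - name: consume
        template: reader
        arguments:
          artifacts:
          - name: data
            from: "{{steps.produce.outputs.artifacts.data}}"
  - name: writer                 # writes a file and exports it as an artifact
    container:
      image: alpine:3.19
      command: [sh, -c, "echo hello > /tmp/data.txt"]
    outputs:
      artifacts:
      - name: data
        path: /tmp/data.txt
  - name: reader                 # the artifact is mounted at the given path
    inputs:
      artifacts:
      - name: data
        path: /tmp/data.txt
    container:
      image: alpine:3.19
      command: [cat, /tmp/data.txt]
```

Since the contract between steps is just "a file at a path", the producer and consumer can be written in completely different languages.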
Airflow can run on Kubernetes, but with Airflow we ended up with identically sized workers up 24/7 — whether a worker was running an expensive job, a query on a remote system, or nothing at all.
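The contrast with always-on, identically sized workers is that in Argo each step gets its own pod, sized for that step alone and deleted when it finishes. A sketch, with illustrative image names and resource values:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sized-steps-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: train              # expensive step: big pod, only while it runs
        template: heavy
    - - name: query              # cheap remote query: tiny pod
        template: light
  - name: heavy
    container:
      image: my-registry/trainer:latest   # hypothetical image
      command: [./train]
      resources:
        requests:
          cpu: "8"
          memory: 32Gi
  - name: light
    container:
      image: alpine:3.19
      command: [sh, -c, "echo 'query a remote system'"]
      resources:
        requests:
          cpu: 100m
          memory: 64Mi
```

When nothing is running, nothing is scheduled — the cluster autoscaler can shrink the node pool instead of keeping fixed workers warm.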