Aim: Record, search and compare ML training runs


45 points | by polm23 114 days ago


  • marcinzm 113 days ago
    Honestly, it's really hard for me to tell from the documentation whether this is self-hosted, run locally, or a SaaS product. The phrase "open source" shows up nowhere that I can see, for example.

    As best I can tell it's an open source version of comet or wandb that you run locally (or I guess host somewhere). Is that right?

    • ellisv 113 days ago
      From what I can glean from the docs, you run the UI locally, and it reads experiments from `.aim/`.

      The experiment results could be populated by running the training job remotely and committing the `.aim/` directory afterwards.

      It's kinda like a UI for DVC

      • gevorg_s 113 days ago
        Yeah, spot on. Right now the UI is run locally and it reads the data from `.aim`. We have seen users deploy it for their team, and we have seen it run standalone in a local environment too. Once training has run, the logs are saved in `.aim`. We are using a format that is ~50% more memory-efficient than the TB logs - and searchable (see the aimrecords repo).
        • jszymborski 113 days ago
          Yes, it looks like it's meant to be run in a similar way to Tensorboard.
      • gevorg_s 113 days ago
        Hi, Aim is fully open source and self-hosted (at the moment - just like Tensorboard). An open-source version of wandb would be one way to put it.

        But we are building a new way of interacting with ML training runs that lets researchers compare lots of them (1000s) in a really short period of time while having full access to the context of the experiments. This is a super early version, and lots more work needs to be done.

        We have implemented an easy-to-use Pythonic search for querying the experiments. Hopefully this sheds more light on the work we are doing.
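
        To illustrate the idea of a Pythonic search, here is a minimal sketch in plain Python. It is not Aim's actual query syntax: the `Run` class, the `search` helper, and the sample runs are hypothetical and only show what filtering runs with an ordinary Python expression could look like.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """A hypothetical record of one training run (not Aim's data model)."""
    name: str
    hparams: dict
    metrics: dict = field(default_factory=dict)  # metric name -> final value


def search(runs, predicate):
    """Return the runs for which the Python predicate is true."""
    return [run for run in runs if predicate(run)]


# Made-up sample data, for illustration only.
runs = [
    Run("a", {"lr": 0.01, "batch_size": 32}, {"loss": 0.42}),
    Run("b", {"lr": 0.001, "batch_size": 32}, {"loss": 0.31}),
    Run("c", {"lr": 0.001, "batch_size": 64}, {"loss": 0.55}),
]

# Select runs by hyperparameter and metric value instead of by run name.
matches = search(runs, lambda r: r.hparams["lr"] < 0.01 and r.metrics["loss"] < 0.5)
print([r.name for r in matches])
```

        The point is that the filter is a regular expression over tracked values, so there is no need for regexps on long run names.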

      • davidbuniat 113 days ago
        Oh nice, it's like an open-source version of W&B, and it seems much better than Tensorboard. How many experiments can I compare at the same time without crashing Chrome? :)
        • gevorg_s 113 days ago
          Aim easily handles 100s of ML training runs. There are folks who have 1000s of experiments too. We haven't officially benchmarked it yet, though.
        • gevorg_s 113 days ago
          Hi all, I am one of the co-authors of this project and will try to answer all the Qs here. I was just forwarded this link - one of the community members must have posted it.
          • pchal 113 days ago
            Interesting effort. Does it snapshot the state of source code at the time an experiment is run? Does it do it without requiring a git commit? I believe the Replicate experiment tracking tool does this.
            • gevorg_s 113 days ago
              We have gotten similar requests a couple of times and it's in the pipeline. We are currently focused on the comparison of 1000s of metrics/training runs. It's a serious challenge both on the UI and on the storage end.

              Inviting you to the Aim Slack channel. We would love to learn more about such use cases and why they are important.

          • frakt0x90 113 days ago
            I've seen a lot of these projects recently. I have used MLFlow for a while and enjoyed it. How does this compare?
            • gevorg_s 113 days ago
              We are building Aim as a new paradigm for interacting with and organizing ML training runs. The project is really just a few months old.

              It's focused on comparing 1000s of experiments really effectively, in minutes. MLFlow and Tensorboard don't have these capabilities, which is what motivated us to work on Aim. It's especially valuable when running hyperparam-sensitive tasks such as RL.

              • williamsmj 113 days ago
                Yes, I was looking for the comparison to local MLFlow and Tensorboard (and Losswise, and Weights & Biases, and all the remote options in this ecosystem). It looks nice, but without that, it's difficult to know whether it's worth spending more time looking into it.
                • gevorg_s 113 days ago
                  We are working on a new paradigm for interacting with ML training runs. A lot of the effort is now focused on very efficient experiment comparison capabilities - talking about 1000s of them. There are lots of challenges on the UI and the backend. When you load TB, or really any other tool, with lots of experiments, it gets super slow and becomes useless. Also no way to do effective comparison of runs by hyperparams or other metadata on the tensorboard or MLFlow. Quite basic capabilities.
                  • williamsmj 113 days ago
                    Sounds good (especially the performance) but ...

                    What is the new paradigm and how does it differ from the existing paradigms?

                    And what do you mean by "no way to do effective comparison of runs by hyperparams or other metadata on the tensorboard or MLFlow"? If you mean "you can't compare or sort a list of runs by hyperparameter or minimum loss or whatever" then MLFlow can certainly do that, so I think I'm misunderstanding.

                    Any comments on Losswise or W&B?

                    And do you have a plan for monetization or governance?

                    Sorry for all the questions! I have complaints about all the existing solutions, so I'm excited to see a new effort.

                    • gevorg_s 113 days ago
                      No worries at all, love the questions!

                      Re comparison: we have always wanted to use a free, open-source, self-hosted tool that would let us group metrics/runs by hyperparams, experiment context (train, val, test ...) and any other adjacent info about the training runs. We also wanted to be able to aggregate groups of metrics, give them different styles, divide them into subplots, search through the runs easily (without regexps on super-long names), etc. As far as I could tell the last time I checked, no such features are built into those tools. This is a huge motivation behind Aim.
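
                      As a rough sketch of what grouping by hyperparams and context means (plain Python, not Aim's API; the `series` records and the `group_by` helper are hypothetical):

```python
from collections import defaultdict

# Hypothetical logged series: each has a run name, hyperparameters,
# a context (train/val/test) and the metric values per step.
series = [
    {"run": "r1", "hparams": {"lr": 0.01}, "context": "train", "loss": [1.0, 0.6]},
    {"run": "r1", "hparams": {"lr": 0.01}, "context": "val", "loss": [1.1, 0.8]},
    {"run": "r2", "hparams": {"lr": 0.001}, "context": "train", "loss": [1.0, 0.9]},
    {"run": "r3", "hparams": {"lr": 0.01}, "context": "train", "loss": [0.9, 0.5]},
]


def group_by(series, key):
    """Bucket series by the value that `key` computes for each of them."""
    groups = defaultdict(list)
    for s in series:
        groups[key(s)].append(s)
    return dict(groups)


# One group per (learning rate, context) pair; each group could then get
# its own color, line style or subplot in a UI.
groups = group_by(series, lambda s: (s["hparams"]["lr"], s["context"]))
for key, members in groups.items():
    print(key, [m["run"] for m in members])
```

                      Because the grouping key is computed from tracked metadata, nothing has to be encoded into the run names.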

                      Probably the closest to this is W&B, but it's not open-source and doesn't let you see the full context of the runs while comparing them (that's a separate module). I haven't used Losswise, tbh.

                      We are trying to build a way to compare 1000s of ML training runs at the same time while still making the full info (context) of the runs available. This is what I meant by "new paradigm". (It turns out this is a fun problem :) ).

                      We (3 of us) have been working on Aim for just a few months, and it's in very early stages. Most of the ideas we have aren't really shipped yet.

                      But it's already very useful for many RL researchers who run lots of experiments that are sensitive to hyperparameters. Aim seems to be able to handle them.

                      Have you checked out the live demo from the README?

                      Check out my blog post on TowardsDataScience for more info on Aim.

                      Hope this info is useful and makes sense. Would be awesome to connect. I would love to learn more about your use-cases and needs in these tools. My twitter is @gevorg_s.

                      • gevorg_s 113 days ago
                        Would love to invite you to join the Aim community Slack. Let's connect!
                    • ashotarzumanyan 113 days ago
                      Can't agree more. A comparison with others (some sort of table or a blog post) would help a lot.
                    • jszymborski 113 days ago
                      or how either compares with Tensorboard and its HPARAMS tab? (PSA: PyTorch Lightning makes it even easier to use).
                      • gevorg_s 113 days ago
                        The big diff and the advantage is that on TB you can't group by hyperparams or easily divide into subplots while having all the research info in front of you.

                        Aim does that and also aggregates groups of runs to reduce the dimension and make it easy to compare. It has a proper search by hyperparams (and everything else tracked/collected, really). All in one panel where you can compare 100s of experiments at a time.
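
                        A minimal sketch of what aggregating a group of runs could look like, assuming equal-length curves and a simple per-step mean/min/max (this is an illustration in plain Python, not how Aim implements it; `aggregate` and the sample curves are hypothetical):

```python
from statistics import mean


def aggregate(curves):
    """Collapse several equal-length metric curves into per-step mean/min/max."""
    steps = list(zip(*curves))  # transpose: one tuple of values per step
    return {
        "mean": [mean(values) for values in steps],
        "min": [min(values) for values in steps],
        "max": [max(values) for values in steps],
    }


# Hypothetical loss curves of three runs that share the same hyperparameters.
group = [
    [1.0, 0.5, 0.25],
    [2.0, 1.0, 0.5],
    [3.0, 1.5, 0.75],
]

# Three curves become one mean line plus a min/max band.
band = aggregate(group)
print(band["mean"])
```

                        Plotting one line and one band per group, rather than one line per run, is what keeps the chart readable when there are 100s of runs.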

                        Loading many runs with very long names on TB makes it really slow to analyze them. Tbh that has been a motivation for building this open source tool - to have something efficient and beautiful :)

                        • gevorg_s 113 days ago
                          Pls see my answers above.
                      • mari_lee__ 113 days ago
                        Awesome! Will look forward to the project's further development.
                      • karankanwar 113 days ago
                        This is great stuff :))
                        • lusinem 113 days ago
                          Wow! Amazing project!