Customize Python dependency resolution with machine learning

(developers.redhat.com)

42 points | by BerislavLopac 10 days ago

7 comments

  • roganartu 9 days ago
    > Keeping the dependency information in a database, which is queried during the resolution process, allows us to choose dependencies using criteria specified by the developer instead of merely importing the latest possible versions, as pip's backtracking algorithm does. You can specify quality criteria depending on the application's traits and environment. For instance, applications deployed to production environments must be secure, so it is important that dependencies do not introduce vulnerabilities. When a data scientist trains a machine learning model in an isolated environment, however, it is acceptable to use dependency versions that are vulnerable but offer a performance gain, thus saving time and resources.

    This seems like a really bad idea to me. I could understand and perhaps get behind the idea that you might use something like this to find the optimal version of a package to use in a given project, but unexpected differences between your development environment and production are a common source of outages.

    It also requires using a different package manager called Thamos: https://thoth-station.ninja/docs/developers/thamos/. This tool then outputs requirements files compatible with Pipenv, pip, or pip-tools (though notably not Poetry).

    That being said, all of the examples and config seem very centered on ML use cases, with the Thamos config accepting settings for OS, CPU, and CUDA versions. Is variance in performance between otherwise-compatible versions of ML packages really that big a problem?
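
    To make the quoted approach concrete: a minimal, hypothetical sketch of criteria-driven version selection. This is not Thoth's actual implementation; the Candidate fields and the policy are invented for illustration.

        import dataclasses

        @dataclasses.dataclass
        class Candidate:
            version: str
            has_known_cve: bool
            benchmark_score: float  # higher is faster; a made-up metric

        def pick_version(candidates, environment):
            # Production forbids versions with known vulnerabilities; an isolated
            # training environment may trade security for speed.
            if environment == "production":
                pool = [c for c in candidates if not c.has_known_cve]
            else:
                pool = list(candidates)
            # Prefer the fastest remaining candidate.
            return max(pool, key=lambda c: c.benchmark_score) if pool else None

        candidates = [
            Candidate("1.2.0", has_known_cve=True, benchmark_score=9.1),
            Candidate("1.3.0", has_known_cve=False, benchmark_score=7.4),
        ]
        print(pick_version(candidates, "production").version)  # 1.3.0
        print(pick_version(candidates, "training").version)    # 1.2.0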

    • monkeybutton 9 days ago
      ML Engineer: Why does inference for this model take 0.9s per call?!

      Data scientist: I have no idea, inferences take 0.1s on average in my environment?

      I jest, but I've also lived this experience: data scientists developing an algorithm on Windows with one set of wheels, the same code being deployed to Linux with a different set of binaries, and the whole thing running 10x slower. We fixed it, but it was an unnecessary headache.

      • joconde 9 days ago
        When PyTorch fails to load the CUDA runtime for any reason, it falls back to CPU, often silently, and becomes more than 20 times slower on CNN inference. Not sure if this system could avoid it. Debugging that remotely on a user’s system was fun.
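
        One way to catch that class of bug is to refuse the fallback explicitly instead of letting PyTorch pick a device. A minimal sketch (assuming you already have an nn.Module and an input batch; the names are illustrative):

            import torch

            def run_inference(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
                # Fail fast rather than silently running far slower on CPU.
                if not torch.cuda.is_available():
                    raise RuntimeError("CUDA runtime not available; refusing CPU fallback")
                device = torch.device("cuda")
                model = model.to(device).eval()
                with torch.no_grad():
                    return model(batch.to(device))
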
      • deycallmeajay 9 days ago
        Yeah, this sounds like a terrible idea. The current goal is reproducible, hermetic builds. Adding more complexity makes it much harder to get the same artifact build after build, and it gives attackers another avenue for supply chain injection.
        • benjamir 9 days ago
          Yeah: Use even more complexity to build software. How about taming software? (duckandcover)
          • sigmonsays 9 days ago
            This is honestly a bad joke, right? Python packaging could not possibly get worse... or could it?
            • quinnftw 9 days ago
              It could: What if a machine learning algorithm picked each of your dependency versions for you ~magically~
            • atoav 8 days ago
              Surely the solution to uncontrollable dependencies is to throw random statistical processes into the mix that make a complex but deterministic problem totally nondeterministic.

              I plead for using astropy to choose dependencies based on the alignment of the stars instead, because it has a slight mythological advantage.

            • akx 9 days ago
              Um...

              > The Python Packaging Authority (PyPA), along with the Python community, is working on an endpoint to provide the dependency information.

              So what is the `requires_dist` key in e.g. https://pypi.org/pypi/Django/3.2/json ?

              (My experimental dependency locking tool Pipimi (https://github.com/akx/pipimi/blob/f055b0c0/pipimi.py#L43-L5...) uses that endpoint.)
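
              For illustration, a minimal sketch of reading that key from the JSON API (Django 3.2, as in the link above):

                  import json
                  import urllib.request

                  url = "https://pypi.org/pypi/Django/3.2/json"
                  with urllib.request.urlopen(url) as resp:
                      data = json.load(resp)

                  # requires_dist lists the declared dependencies for that release.
                  for requirement in data["info"]["requires_dist"] or []:
                      print(requirement)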

            • dgan 9 days ago
              Hm, bikeshedding, but "Thoth" is a horrible name for anyone who isn't a proficient English speaker. I honestly can't pronounce it.
              • geofft 9 days ago
                It's the English spelling of the Greek name for an Egyptian god: https://en.wikipedia.org/wiki/Thoth

                The actual Egyptian pronunciation, as best we can reconstruct it, uses sounds that didn't exist in either ancient Greek or even Coptic, and that don't quite exist in modern English either. I think you'd be entirely justified in using another sound if your language doesn't have the English "th".

                • kortex 9 days ago
                  Thoth, /θoʊθ/, rhymes with oath. Not sure if /oʊ/ renders here; the ʊ is the near-close back rounded vowel, or "horseshoe".

                  Normally, yes: /θ/, the dental fricative (commonly "th" in English), is uncommon as far as language sounds go, and it's particularly tricky to produce if it's not in your phonemic inventory.

                  But in this case, /toet/ or /toʊt/, "tote" (like the container; rhymes with goat), is also an acceptable pronunciation.

                  • jiggunjer 9 days ago
                    I always pronounced it rhyming with cloth.
                  • marginalia_nu 9 days ago
                    Thoth is the name of an Egyptian god, not really an English word.
                    • danudey 9 days ago
                      So it's a horrible name for anyone who isn't a proficient Ancient Egyptian speaker.
                      • marginalia_nu 9 days ago
                        I guess we should have them write it on a stele next to its Greek equivalent.
                  • zuj 7 days ago
                    Why, I mean, why? I am serious, why?
                    • TotallyNotOla 8 days ago
                      I see it's time to update xkcd 1987. https://xkcd.com/1987/