Ask HN: High quality Python scripts or small libraries to learn from

I found that reading other people's code is very beneficial for my own coding. But I haven't found a resource that lists some great code in Python which is not a giant codebase. Any suggestions?

148 points | by dir_balak 13 days ago

29 comments

  • SushiHippie 13 days ago
    I think I mention this all the time when this comes up, but I learned the most 'best practices' through using ruff.

    https://docs.astral.sh/ruff/

    I just installed and enabled all the rules by setting select = [ "ALL" ]

    And then looked at the 'errors' it showed me for my existing code base and then excluded some rules which don't interest me.

    I also have it set up in my IDE, so it'll show me the linting errors while coding.

    In a larger code base there will definitely be many 'errors', but using it cleaned up my code really well and it kept me away from some footguns.

    • sevensor 13 days ago
      This is good advice. I learned Python long before ruff came on the scene, but I did the same with Pylint. I don't adhere rigidly to its recommendations any more, but I learned a lot about the language from trying. In fact, I think some of its recommendations are downright wrong, and what I learned was that I made my code harder to maintain by following them.
      • peteradio 13 days ago
        I'm curious what recommendations you remember disagreeing with.
        • IshKebab 13 days ago
          I can't say I've seen any but some of the code style ones are very prescriptive (function longer than N lines, short variable names etc.).

          Single letter variable names are totally fine in some cases. And while very long functions may be bad, it's pretty annoying when you're adding one line to a function for the linter to say "nope. have you considered dropping everything and refactoring this?"

          You can easily turn them off though. I can't remember any code based ones that are really wrong.

          Maybe some are prone to false positives, e.g. warning about a default `= []` argument. But you can waive them individually.
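
          For context, the footgun behind the `= []` warning can be sketched as below. This is a minimal illustration with hypothetical function names; the warning is arguably a false positive only when the function never mutates the default.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Common fix: use None as a sentinel and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")
print(second)           # ['a', 'b']: state leaked from the first call
print(first is second)  # True

print(append_fresh("a"))  # ['a']
print(append_fresh("b"))  # ['b']
```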

          • sevensor 13 days ago
            Yeah, it's the "refactoring" category where I disagree most. Too many identifiers in a function, too many lines in a function, too many branches, too many methods in a class, too few methods in a class. You can learn a lot by "fixing" these, but sometimes there's no elegant fix. Take "too many branches." Sometimes you just have to write a function with 90 lines of

                if foo(x):
                    return bar(y)
                if baz(x):
                    return quux(y)
                ...
            
            Sure, there are tricks you can play. You can make the conditions hierarchical and subdivide. You can evaluate all of the conditions first and reduce the function to a lookup. You can define 45 subclasses and define a polymorphic method "frob" on x such that all you have to do is call

                x.frob(y)
            
            I can probably think of half a dozen others. Sometimes it's a good idea, but sometimes you're just better off with a big dumb function full of straightforward conditionals. It's worth listening to Pylint to learn all of these tricks, so that you can confidently ignore its advice when there really isn't a better way.
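
            One variation of the lookup trick, sketched under assumptions: the predicates and handlers below are hypothetical stand-ins for the foo/bar/baz/quux placeholders, and the table keeps the "first match wins" behavior of the if-chain. Whether this reads better than the plain conditionals depends on the case.

```python
from typing import Callable


# Hypothetical stand-ins for the foo/bar/baz/quux placeholders.
def is_negative(x: int) -> bool:
    return x < 0


def is_even(x: int) -> bool:
    return x % 2 == 0


def negate(y: int) -> int:
    return -y


def halve(y: int) -> int:
    return y // 2


# Each rule pairs a condition on x with the handler to apply to y.
# Order matters: the first matching rule wins, as in the if-chain.
RULES: list[tuple[Callable[[int], bool], Callable[[int], int]]] = [
    (is_negative, negate),
    (is_even, halve),
]


def dispatch(x: int, y: int) -> int:
    for condition, handler in RULES:
        if condition(x):
            return handler(y)
    return y  # fallback when no rule matches


print(dispatch(-2, 10))  # -10 (negative is checked before even)
print(dispatch(4, 10))   # 5
print(dispatch(3, 10))   # 10
```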
    • everforward 13 days ago
      The value of this will vary wildly based on experience level. I wouldn’t suggest this outside of fairly senior developers.

      Without a very good grasp of the language, it can be counterproductive to deep learning. The linters teach what to do, but not why, so it’s hard to grasp how the codebase is worse without following the linter, or when it’s appropriate to disable a linter vs dealing with it.

      Ie I’m not a great JS dev. When the linter says “this should be an arrow function” I understand how to do that, but not why an arrow function is preferable there or in general. I would probably be a better JS dev if I hadn’t had the linter, felt whatever pain from not using arrow functions, and know why they’re important. Or never feel the pain and realize it’s a style choice more than a functional one.

      I prefer using them to help me remember things I already understand. I know why mutating a copy of a string doesn’t mutate the caller’s copy, so I’m happy to have a linter point out when I’m trying to do that inadvertently.

      • BeefySwain 13 days ago
        > The linters teach what to do, but not why

        Ruff specifically actually has a web page for every single check with a section on the rationale behind it.

        I regularly use this when I see an error that I don't understand the purpose of, and am often convinced, though sometimes I recognize that the thing it's trying to protect me from doesn't apply in my particular case and so I disable it. Regardless, I feel it has had the effect of making me a better developer.

        • claytonjy 13 days ago
          Yes, this is a big part of what makes this approach tolerable. In VSCode, the on-hover tooltip includes a link to the ruff rule page with this information, so there's no searching needed.

          I've come to rely on that so much, it's really annoying when the error is from mypy, whose tooltips do not have such links.

          Patiently waiting for Astral's mypy-killer!

      • manjalyc 13 days ago
        I agree that sometimes linters can enforce code styles that are more of a hassle to deal with than they are worth in concrete gains for new developers. But I disagree that only senior developers should use linters. Especially if you are learning a new language, a linter can introduce you to common conventions in that language, push you toward cleaner and more idiomatic code, and help form good habits off the jump instead of bad habits you will eventually have to change in a professional setting. Sure it can be overzealous at times, but I think on the whole it is a net positive.
        • SushiHippie 13 days ago
          > But I disagree that only senior developers should use linters.

          I'm in the same boat. I started using python ~1 year ago, because it is the main language I use at my dayjob. And I didn't really use python before this (although I was already proficient in other languages).

          In the beginning my code was very messy and I spent much time searching for how to do things the 'correct' way.

          And ruff made this so much easier, and it made me look at some python topics more thoroughly. And now I'd say I have a very good understanding of python and its best practices, and I'm now one of the most proficient python developers in my department (it's not a high bar, we have many data scientists, which are most of the time only proficient in their libraries/tooling they use, and I'm one of the few that is not a data scientist).

          I'm not saying that ruff alone was the reason I'm now proficient in python, but it made it easier + I would have never looked into some things without it.

          For example, I also type my python code, and before I used ruff I had many problems with circular dependencies. But ruff could fix it with a simple automatic fix by using `from __future__ import annotations` and `if TYPE_CHECKING`.
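
          A minimal single-file sketch of that pattern, under assumptions: `collections.abc` stands in for a module that would otherwise import this one back, and `total` is a hypothetical function. The import inside the `TYPE_CHECKING` guard is seen by the type checker but never executed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only the type checker follows this import; it is skipped at
    # runtime, so it cannot take part in an import cycle.
    from collections.abc import Iterable


def total(values: "Iterable[int]") -> int:
    # The quoted annotation is not evaluated at runtime. With
    # `from __future__ import annotations` as the first statement of
    # the module, the quotes could be dropped.
    return sum(values)


print(total([1, 2, 3]))  # 6, although Iterable was never imported
```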

          And the ruff documentation also gives more explanation on the why and how for most of their rules, which is also very valuable.

          https://docs.astral.sh/ruff/rules/future-rewritable-type-ann...

          https://docs.astral.sh/ruff/settings/#lint_flake8-type-check...

    • eternityforest 12 days ago
      To me, keeping the linter and the type checker happy is almost as important as actually writing good code.

      If you don't know how to use the tools, you're missing out on time savings and error prevention, and once you write code that the linter doesn't like, you can't make it compatible without significant manual work.

      I almost think Python would be better with mandatory types....

  • thelastbender12 13 days ago
    Simon Willison's github would be a great place to get started imo -

    https://github.com/simonw/datasette https://github.com/simonw/sqlite-utils

    So, his code might not be a good place to find best patterns (for ex, I don't think they are fully typed), but his repos are very pragmatic, and his development process is super insightful (well documented PRs for personal repos!). Best part, he blogs about every non-trivial update, so you get all the context!

    • hiAndrewQuinn 13 days ago
      simonw might be one of the best and most down to earth Pythonistas of our time. He was one of the co-creators of Django, and that was almost 2 decades ago by this point - getting better all the while. I second this recommendation heartily.
  • shivekkhurana 13 days ago
    At my university, I followed David Beazley's talks and tutorials. Just seeing him work and present improved my style and approach manifold:

    https://www.dabeaz.com/tutorials.html

  • zamubafoo 13 days ago
    I think I've learned more reading bad code bases than reading good code bases.

    The entire point is not to just mindlessly consume a code base, but instead to form an idea of how to approach the problem and then see if your hypothesis is correct. Then compare your approach to the actual approach.

    This can show you things that you might've missed taking into account.

    For example, gallery-dl's incidental complexity all lies in centralizing persistent state, logging, and IO through the CLI. It doesn't have sufficient abstraction to allow it to be rewired to different UIs without relying on internal APIs that have no guarantee that they won't change.

    Meanwhile a similar application in yt-dlp has that abstraction and works better, but has similar complexity in the configuration side of things.

    • sevensor 13 days ago
      It's a pain, but you can definitely learn a lot from fixing a bad codebase. For that, I recommend trying to write type annotations and get the whole thing to type-check. I've found that bad codebases end up having very complex type annotations because their authors actually contradict themselves. One of my personal favorites in Python is mixing strings and UUIDs as dictionary keys. This positively guarantees a fun afternoon.
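
      A small sketch of why mixed string and UUID keys hurt, using a hypothetical `sessions` dict. The two key forms print alike but never compare equal, so the dict silently holds two entries for one ID.

```python
import uuid

user_id = uuid.UUID("12345678-1234-5678-1234-567812345678")

# Untyped dict where some code paths store the UUID object and others
# store its string form.
sessions = {}
sessions[user_id] = "created"
sessions[str(user_id)] = "refreshed"

print(len(sessions))            # 2: the keys look alike but are not equal
print(user_id == str(user_id))  # False

# Annotating the key type forces a decision; a type checker would then
# flag a str key. Incoming strings are converted at the boundary.
typed_sessions: dict[uuid.UUID, str] = {user_id: "created"}
typed_sessions[uuid.UUID(str(user_id))] = "refreshed"
print(len(typed_sessions))      # 1
```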

      Edit: speling

      • agumonkey 13 days ago
        Another learning point is being sensitive to your psychology / mental energy. I can start with high quality well named, well abstracted code.. but after two weeks I find myself writing shitty code.. and having a hard time realising I should stop, take a pause, take a step back instead of piling on.
        • eternityforest 12 days ago
          Pre-commit hooks help me immensely with this. At least there's some limits on crappiness.
      • mixmastamyk 13 days ago
        I'd start with pyflakes and ruff check/format as step 0. Much easier to get started, and will have fixed a lot of stuff quickly. Next, add types.
    • noufalibrahim 13 days ago
      Related to this. Read the stdlib. Decades of somewhat decent backward compatibility, optimisations etc.

      It's almost archaeological.

  • uneekname 13 days ago
    I can't think of any (small) libraries I could recommend to learn best practices, but what does come to mind is click [0], the CLI library for Python. Their documentation is pretty great, there are tons of short example scripts to be found online, and in my experience making little apps with click can be a nice way to learn different python features like args/kwargs, decorators, string manipulation, etc.

    I agree with others that code formatters like black or ruff might be helpful to you. The literature surrounding them, such as PEPs concerning code formatting, often include examples you may find useful.

    [0] https://click.palletsprojects.com/en/8.1.x/

    • mixmastamyk 13 days ago
      Beware, the pallets people are decorator supremacists. Everything is a decorator even when it arguably shouldn't be. DDD --> decorator driven development. It's a nice technique, to a point. Only exaggerating a bit. ;-)
    • thenipper 13 days ago
      I was just going to suggest this. Click is a great code base to learn from.
  • koutetsu 13 days ago
    If you're looking for some best practices related but limited to machine learning application code, you could have a look at Beyond Jupyter (https://github.com/aai-institute/beyond-jupyter)

    Here's an excerpt from the readme: "Beyond Jupyter is a collection of self-study materials on software design, with a specific focus on machine learning applications, which demonstrates how sound software design can accelerate both development and experimentation."

  • jihadjihad 13 days ago
    For scripts, I've learned a couple tricks from OpenAI's Cookbook examples.

    This one came in handy not too long ago: https://github.com/openai/openai-cookbook/blob/main/examples...

    • hopfenspergerj 13 days ago
      This is an example of bad code.
      • cinntaile 13 days ago
        Maybe you can expand upon that. Now we have no way of knowing why you think it's bad code.
        • radus 13 days ago
          Quick critique: module contains functions with many parameters, many branches, deep nesting, and multiple return points.
        • isoprophlex 13 days ago
          those nested if - for - for - if loops are horrendously difficult to understand.

          take the fn starting at line 387. they comment why they do certain imports, but this function is comparatively underdocumented. it's not easy to wrap my head around the control flow. some bits are nested about 6 levels too deep for comfort, there are too many positions from which it can return or raise, and the function is about 3x too long

          really difficult to grok what is happening here.

      • mixmastamyk 13 days ago
        It’s not horrible, but I found a few odd things, like f-strings w/o params, long cli options with underscores, non-pythonic if == 0, etc.
        • mixmastamyk 13 days ago
          Also the main god function is incredibly long and nested, as others mentioned. Nested --> long lines --> black making a mess.
      • parpfish 13 days ago
        [flagged]
  • oznt 1 day ago
    You should definitely read bottle.py. While it's full of hacks to support python2, it's still a very good code base for learning about many python features. Another one is the stencil template engine.
  • achanda358 13 days ago
    Peter Norvig's work is great to learn from https://github.com/norvig/pytudes
  • mixmastamyk 13 days ago
    Pyupgrade is a good tool that focuses on upgrading to newer idioms. Which is more important for learning than simple pep8 type stuff, which is useful but has its limits.

    On that subject, Raymond Hettinger has a great talk called “Beyond Pep8” that talks about how to de-java your codebase among other things. Also reading the book Fluent Python now and it is so far excellent.

    • claytonjy 13 days ago
      And ruff has a bunch of the pyupgrade rules included, which has made it easy for me to start catching things like using List instead of list in py3.10+.
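
      For illustration, the `List` versus `list` change looks roughly like this. The function names are hypothetical; the built-in generic form needs Python 3.9 or newer.

```python
from typing import Dict, List  # legacy aliases, deprecated since Python 3.9


def count_old(words: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


# Same function with built-in generics; no typing import required.
def count_new(words: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_new(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```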
  • begueradj 13 days ago
    Take a look at PY4WEB: https://github.com/web2py/py4web

    It is an improvement on the tiny but efficient web2py web framework.

  • rmorey 13 days ago
    Everything @simonw has worked on, honestly: https://github.com/simonw
  • llandy3d 13 days ago
    I don't know if I can consider my code "Great" but I dedicated way too many months on a prometheus library where I focused on quality since I did it for me.

    It's relatively small and I think the main take away would be the use of Protocols for the pluggable backend system. I hope you get something out of it :)

    https://github.com/Llandy3d/pytheus
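
    As a generic sketch of a Protocol-based pluggable backend (not pytheus's actual API; every class and method name here is hypothetical): the protocol describes the methods a backend must offer, and any class with matching methods satisfies it without inheriting from anything.

```python
from typing import Protocol


class Backend(Protocol):
    """Structural interface: any object with these methods qualifies."""

    def inc(self, name: str, amount: float) -> None: ...

    def get(self, name: str) -> float: ...


class InMemoryBackend:
    # Note: no inheritance from Backend is needed.
    def __init__(self) -> None:
        self._values: dict[str, float] = {}

    def inc(self, name: str, amount: float) -> None:
        self._values[name] = self._values.get(name, 0.0) + amount

    def get(self, name: str) -> float:
        return self._values.get(name, 0.0)


class Counter:
    def __init__(self, name: str, backend: Backend) -> None:
        self._name = name
        self._backend = backend

    def inc(self, amount: float = 1.0) -> None:
        self._backend.inc(self._name, amount)

    @property
    def value(self) -> float:
        return self._backend.get(self._name)


requests_total = Counter("requests_total", InMemoryBackend())
requests_total.inc()
requests_total.inc(2)
print(requests_total.value)  # 3.0
```

    Swapping storage then means passing a different object with the same two methods; a type checker verifies the match structurally.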

  • encoderer 13 days ago
    You do what everybody does:

    Grind this out painstakingly and then copy and paste that as boilerplate for the rest of your natural life.

  • vismit2000 12 days ago
    Karpathy also writes beautiful python code and his GPT lecture series contains some of the finest python code to learn from. https://github.com/karpathy/nn-zero-to-hero
  • rookie101 13 days ago
    I've recently looked at tasktiger https://github.com/closeio/tasktiger. It's a simple queue system that helped me understand how workers and schedulers work.
  • rsyring 13 days ago
    I'd suggest Flask or some of the smaller projects in the Pallets ecosystem:

    Flask, in particular, has a very small number of open issues (2) for a project that is pretty popular. It's also maintained by a competent team and follows a lot of project best practices.

    https://github.com/pallets/flask

    https://github.com/pallets

    https://pypistats.org/packages/flask

    https://pypistats.org/packages/django (for comparison)

  • 0xbadcafebee 13 days ago
    Mostly you should start by reading what's shipped with core. After that, look at larger projects with lots of committers, as they often end up getting "polished" down to something that's sort of generically useful without being too quirky or too simple, and the architecture tends to be more functional.

    Avoid anything developed/maintained by one corporation. In general, their organizational hierarchy leads to bad patterns and resists useful changes that don't conform to the goals and patterns of the business or engineering leads. Grassroots OSS projects aren't always better, but they're less likely to have a monoculture and perverse incentives.

    • martinky24 13 days ago
      Python is an example of a language where, in general, the standard library probably isn't the best thing to read if you want to learn to write downstream Python applications/libraries. It can be terse, and not follow "modern" best practices in places (it was written at a time with different "best practices", but the code works so no need to change it).

      There are some exceptions... I'd say the `statistics` module is one [1], the `collections` module might be another [2]. But in general, it's probably not the best place to start.

      If you stumble upon the `multiprocessing` library source code as inspiration for "good Python code"... you're going to be in for a bad time and your future collaborators will not be happy.

      [1]: https://github.com/python/cpython/blob/3.12/Lib/statistics.p...

      [2]: https://github.com/python/cpython/blob/3.12/Lib/collections/...

  • mind-blight 13 days ago
    The Django code base is excellent. I learned a ton early on by reading through it
    • 9dev 13 days ago
      As a derivation of this, in general I advise reading framework source code. Not because you should write code like that, but to learn what the language can do. Framework source code has often been refined by several people over a longer period of time, honed to avoid rough edges. I think you can learn a lot about designing an API, writing good abstractions, and encapsulating complexity.
      • aynyc 13 days ago
        I would urge caution here. Not because it's a bad idea, but because I've seen juniors who read framework code often end up writing over-engineered code that leans heavily into meta-style programming. Their code becomes overly complex due to the massive amount of abstraction they put in. I like the old saying, "cook the recipe exactly as it is three times before you can add your own spin to it".
  • vram22 13 days ago
    The 500 Lines or Less section from https://aosabook.org/en/

    The Architecture of Open Source Applications

  • perfmode 13 days ago
    Peter Norvig’s python scripts are quite beautiful.
  • pixelmonkey 13 days ago
    I had 2 suggestions (plus a blog post) in my style guide here:

    https://github.com/amontalenti/elements-of-python-style#some...

    The style guide itself, published a few years back, also has some suggestions with small code snippets.

  • nmaleki 12 days ago
    Check out https://github.com/recursion-computing/starcel-panda3D for bleeding edge, but not refactored to be pythonic, Python OS development.
  • in9 13 days ago
    if you are into ML libraries, take a look at fklearn, a scikit-learn-like lib, but written in a functional style. Fun read to compare both side by side.
  • nurettin 13 days ago
    Any codebase that uses and respects type annotations is probably a good place to start. So grep for "from typing".
  • kunley 13 days ago
    SQLAlchemy sources.

    Not exactly small, but great.

  • michaeljx 13 days ago
    The python requests library
    • oznt 1 day ago
      Requests is really not small and it's full of backwards compatibility code. Would not recommend it.
  • byyoung3 13 days ago
    just try to build stuff and then u will see over time what works and what causes issues. theres no shortcuts other than using chatgpt
  • srcreigh 13 days ago
    what’s your experience level? what kind of python code do you work with (web dev? Data analysis? Algorithms?)