A one-line change decreased our build times by 99%

(medium.com)

462 points | by luord 1277 days ago

54 comments

  • JOnAgain 1275 days ago
    I think it takes some real humility to post this. No doubt someone will follow up with an “of course...” or “if you don’t understand the tech you use...” comment.

    But thank you for this. It takes a bit of courage to point out you’ve been doing something grotesquely inefficient for years and years.

    • projektfu 1275 days ago
      I'd be interested to know how they came to realize what was missing. Did they read the Jenkins docs more thoroughly? Post on a mailing list? See something on StackOverflow? Hire a consultant?
      • pushrax 1275 days ago
        I would expect that internally someone profiled the build (i.e. looked at timestamps) and then either profiled git, or just looked at the logs and did some guessing/research. This didn't seem like it would be complicated to find once you realize the time is spent in git.

        Also, this probably has been an exponentially increasing problem, and wasn't really a priority to solve until relatively recently. I would bet there are a lot of stale undeleted branches.

        • marta_morena_28 1275 days ago
          It really doesn't sound complicated to find, unless you just have a hands-off approach to building things and just don't care as long as "something" comes out on the other end.

          What makes me wonder, however, is this: 40 min made them look into this? I mean, 40 min is crazy long. What builds this long? Chrome, Windows, the Linux Kernel on a single core? This should have been raising red flags much earlier. The only explanation I can come up with is that the whole build takes hours anyway; otherwise there is no way you wouldn't notice this sooner.

          • detaro 1274 days ago
            Remember "build" time likely also includes test runs, packaging for deployment, ... 40 mins is easy to get to, and has nothing to do with "on a single core".
            • sumedh 1274 days ago
              You can still see which stages are taking a long time.
          • jeromenerf 1274 days ago
            I wonder if people got into the habit of synchronizing git pushes with socializing breaks, with the proverbial excuse of "yeah, compiling ...".

            One day, someone forgot to brief the foreign intern about the necessity of breaks, intern fixes the issue, pointy-haired boss gets wind of the news, old crew gets fired, new intern gets promoted and fixes also the Pinterest spam on google images.

            • znpy 1274 days ago
              > new intern gets promoted and fixes also the Pinterest spam on google images

              A man can dream.

          • pushrax 1275 days ago
            It takes >12h to build Windows on the MS build platform.

            On a single core, Chromium surely takes hours to build.

            Though I agree that 40min for the repository in question is highly suspect.

            • masklinn 1274 days ago
              > On a single core, Chromium surely takes hours to build.

              Earlier this year Bruce Dawson had a post indicating that it took about a CPU-day. Coalescing files (“jumbo builds”) significantly reduces build time (we’re talking down to 5h), but that comes at the expense of incremental building, and it constrains the code since you can get symbol collisions between the coalesced files.

            • Dylan16807 1274 days ago
              I read that comment as only applying the "single core" part to the Linux kernel, not Chromium.
          • maccard 1274 days ago
            I work in games; a clean sync of our project takes well over an hour (and if you're at home it takes multiple hours), and compiling takes a large amount of time. We use lots of tricks - "unity" builds with our build system detecting modified files and compiling them standalone, as an example. On my last workstation (2x Intel Xeon Gold) it took about 30 minutes to compile.
          • oefrha 1275 days ago
            Yep, I’m shocked that it had to be bloated to 40min before they even thought about fixing it. Anyone who has used Jenkins for nontrivial builds must have had the experience of staring at the slowly expanding session log screen? It doesn’t take any “profiling” to realize git clone’s taking forever.
          • ImprovedSilence 1274 days ago
            I remember hearing facebook was on the order of hours (6+) to build.
      • jeromenerf 1274 days ago
        I would bet on "the new recruit found it, 45 minutes into the onboarding process".
        • conradludgate 1274 days ago
          My first task at my current job, after familiarising myself with the codebase, was to improve the CI pipeline.
        • Aloha 1274 days ago
          I think everyone has done this, sometimes it really does take a second set of eyes
          • ric2b 1264 days ago
            In my experience it only takes someone who is not drowning in a backlog of tasks yet.

            That tends to be the beginners during the onboarding weeks.

        • koliber 1270 days ago
          The beauty and power of the beginner's mind.

          It can see things that were there all along, but everyone who has been there has developed a blindness to.

      • bluedino 1274 days ago
        I see this kind of stuff at companies where only one or two developers work on a project, or the team working on it hasn't had much experience on other projects.

        An example would be a company I worked for who ran a pretty standard LAMP setup but had never heard of memcached. Simply adding that reduced the database load by like 90%.

      • aeyes 1274 days ago
        The same way it always happens to me: Over time build time creeps up and you don't really look at this code thinking "I implemented shallow clone years ago so there is nothing I can do, it's slower because we have more code."

        Until you or some other person looks at what the code is doing.

        It could also be that it was a new hire. I shallow clone a huge monorepo similar in commits/branches and it takes seconds. My experience would instantly tell me that something is worth looking into.

    • hashkb 1275 days ago
      They are a publicly traded company. They have a team dedicated to engineering support. A better article would include a management and hiring postmortem. It's shocking, really. Humility is nice, but competency is also nice.
      • jedberg 1275 days ago
        Why are you so angry about this? You've commented throughout this post about how this is boring and the Pinterest team is incompetent. Why?

        I found it quite interesting. I've been working in deployments for over 20 years at some pretty big places, and never really though about this before. I now have a new tool in my toolbox, and I'm quite happy about it.

        • hashkb 1275 days ago
          In general I'm frustrated that rigor, standards, etc are out the window in favor of all this warm fuzziness. I guess I might be angry...

          The culture change in our industry, towards warm fuzzies and away from tech screens, results in calculable waste. Time, money, electricity, customers. We lose good engineers and tell ourselves they were a bad fit. We push crap on users just to sell ads. Then we write engineering posts to brag about fixing our own mistakes. It's a terrible shame and I speak up about it to remind everyone there was a time when RTFM would be the only response to this.

          Edit: rate limited but one last thing: are we this forgiving of Equifax when they oopsie our data? Seeing this would immediately make me wonder if anything I have shared with Pinterest is safe. That's why they owe us a postmortem and not a thirst trap.

          • m45t3r 1275 days ago
            > are we this forgiving of Equifax when they oopsie our data?

            The kind of culture you're in favor of, blaming engineers for mistakes and punishing them, is exactly what makes the kind of Equifax mistake possible. Suddenly, people stop improving things and just do the minimum possible so they can keep their job, since anything else can cause a mistake that will cost you your next performance cycle (or even worse, your job).

            • stjohnswarts 1275 days ago
              I know I have certainly left cans of worms closed at various companies because I knew there was only risk and no reward, even though I would have loved to tackle the problem. Fix a major problem working late and weekends and get an attaboy at the weekly conference. Do a good job but introduce a bug that is relatively minor but causes a slight delay in deployment and get on the manager's shit list for the rest of your time there.
            • hashkb 1274 days ago
              I'm talking about blaming and punishing management, not engineers. I'm sorry that wasn't clear.
          • phist_mcgee 1275 days ago
            1. You seem to be projecting a lot of your own thoughts and biases onto this article, and this discussion.

            2. RTFM made sense when you wrote code in one terminal window and compiled it in another and then shipped a CD. There is no way you can RTFM for every tool you use at a modern software company. NO ifs ands or buts, it just is not happening. It sounds like you are looking at a nostalgic view of the past, and not understanding the context of the scope and scale of software that is built these days.

            3. Engineers should be encouraged to share their learnings with each other to collectively 'raise the tide'. I will never pooh pooh a development team wanting to share their learnings, even if you may or may not think it was a good idea, it may have helped their team or someone reading the article.

            4. We've been pushing crap to sell ads since advertising began, grow up and take a longer look, the technology has complicated it, but it's still the same as it always was.

            • unishark 1275 days ago
              I personally do think there is a systemic problem in companies trying to hire the bare minimum in skills/experience for technical roles and ending up with people operating right at the limit of their abilities (and intermittently being in over their heads).

              But I do agree RTFM is frustrating advice for a lot of things. Especially if you aren't going to use such information often enough, so you'll keep forgetting it anyway and have to start over with the manual each time.

              • corpMaverick 1274 days ago
                > But I do agree RTFM is frustrating advice for a lot of things. Especially if you aren't going to use such information often enough, so you'll keep forgetting it anyway and have to start over with the manual each time.

                This is related to one of my biggest frustrations in the last ten years. We used to master the tool set. But now there are so many tools, and some of them we don't use often enough. I find myself doing a lot of guessing, whereas before I knew what was going on at every step.

          • pavel_lishin 1275 days ago
            > The culture change in our industry, towards warm fuzzies and away from tech screens, results in calculable waste.

            I have not noticed, in the past decade, any move away from tech screens whatsoever.

            • uglycoyote 1274 days ago
              Are we talking about technological screens, like LCD screens here?
              • xdavidliu 1274 days ago
                I believe they mean screens in the sense of technical interviews to assess candidates' technical ability.
      • Aeolun 1275 days ago
        I don’t know, I consider myself fairly competent but I’d never even considered that. It’s just not so relevant until your repo is multiple gigabytes big.

        Still, I’ll see if it works for our pipelines, and we can get our clone from 20s to 1s

        • hashkb 1275 days ago
          Good teams profile everything. This team's only goal is to support other engineers. Build time is a huge issue for every ops team. Missing this for so long is wasted money that's easy to calculate. We can be nice to people while still having high standards. It's a missed opportunity for a deeper postmortem, and it's bland content at best.
          • jacquesm 1274 days ago
            I think you live in a highly theoretical parallel universe. The one I occupy is the one where 'good teams' profile those things that take too long.

            Take yourself as an example: in spite of the wide availability of free certificates you are still hosting your domain without using a secure transport layer. Some would take that as incompetence. Others would assume you have more stuff on your plate rather than that you don't have high standards.

            • sdoering 1274 days ago
              And others would assume that there are other philosophies out there regarding SSL everywhere. So the question is, whose POV has more validity and logical rigor attached to it. I actually can't see either side winning here on a purely logical level, only on an ideological level. At least as long as we are talking about consuming public information.

              Am I in favor of the aggressiveness of OP in other posts? No. Am I using SSL myself? Hell yes.

              Nonetheless, I understand that there are people who feel that consuming public information like on a private homepage is nothing that necessitates using SSL. Even if I myself have a different ideology/value set governing my decision.

              I once heard the comparison that it is like the difference of sending a letter and sending a picture postcard. Not sure if I buy into that, but I can't argue against it on a purely rational basis.

              • hashkb 1266 days ago
                Yes, it's by choice. It's read only, public information. We don't set cookies or anything.

                We take security very seriously. But we don't take anything too seriously.

                Edit: by the way, persuade me that there's an upside and I'll turn it on.

            • dmurray 1274 days ago
              > The one I occupy is the one where 'good teams' profile those things that take too long.

              Deciding which things take too long is profiling. Maybe you do it in your head or with pencil and paper instead of using a software approach but I think your position aligns with "good teams profile everything".

      • argc 1275 days ago
        This is neither incompetence nor surprising. Maybe you’ve only worked at large companies who have had time to optimize things for years (and even then, I see grotesque software decisions at my large company quite often). Try accepting that software is often written poorly optimized on the first pass, for good reason, and learn to celebrate the wins without needing to shame someone.
        • hashkb 1275 days ago
          This is Pinterest. Every org I've worked at has been smaller. There's space between shame and ignoring mistakes.

          The purpose of this post is not to educate. There's nothing in here that anyone can use to improve. It's just marketing.

          • phist_mcgee 1275 days ago
            Well I learned something from this article, and I thought I had a good handle on CI/CD, so either I am incredibly stupid and shouldn't be reading these 'nothing' articles, or maybe there is so much to learn it's impossible to know it all.
            • sytringy05 1275 days ago
              If you don't have massive repos, then this is the sort of thing that's not often a big problem. Also, if you are using things like GitLab runners, you might be in the same AZ, and even large repo clones are fast.

              And it is impossible to know it all; I like these articles just for the differing ways people work.

              • phist_mcgee 1275 days ago
                It sounds like GP has a bit of a chip on their shoulder when it comes to feeling like you're not worthy if you don't know everything.
          • stjohnswarts 1275 days ago
            Are you just an angry ex Pinterest employee? There is something to learn here and a reminder to pay more attention even when you're knee deep in other tasks. Besides the obvious feature/limitation of git that the author points out.
      • kevinventullo 1275 days ago
        A team discovers a major efficiency win requiring minimal engineering effort and your response is to... punish them?
      • swsieber 1275 days ago
        People praise my git skills at work (among other things) when they come to me for git help.

        My response is always the same: I've just run into these bugs more often than they.

      • Hnrobert42 1275 days ago
        Shocking? Incompetent? A hiring postmortem? Really? AYFKM?
      • ponker 1275 days ago
        They did a billion dollars of revenue last year, their management and hiring systems seem to be getting the job done.
        • sk5t 1275 days ago
          Yep--although multiple unpleasant experiences with pinterest have spurred me to permaban it from search engine results and smite it with network filters, somewhat wasteful CI/CD pipelines have clearly not prevented the company from flourishing.
      • fizixer 1275 days ago
        Fully agree, and the mindless HN downvote sheep bandwagon is in full effect.

        Working at Pinterest and acting humbled by learning basic stuff on the job?

        I play the world's smallest violin for how hard and stressful your work is. Poor babies.

        Somebody give this engineering team a participation trophy.

    • bald42 1274 days ago
      I don't get why they have to clone their repo frequently in the first place - it seems to me like brute-force usage of a version control system that's prone to high cost.
      • rightbyte 1274 days ago
        It is a nice and foolproof way to get a clean working environment: just download everything from nothing. And you want different working folders for different jobs anyway, so they don't mess with each other or build up state between jobs due to scripting mess-ups.
      • ForHackernews 1274 days ago
        I don't know about a big org like Pinterest, but it's pretty common for "clone the repo" to be the first step of a CI/CD pipeline when using something like CircleCI or GitlabCI.

        It's an easy (if inefficient) way to always get the latest changes and if you have disposable build-runners then it all gets thrown away at the end of the pipeline.

        • DougBTX 1274 days ago
          It is interesting that we trust our tools so little. A git hash is a pretty robust way to know whether the code in the repo is what it is supposed to be, so a "git fetch" rather than a fresh "git clone" should be safe, but we can't trust the build steps to not trash the build-runner so the entire thing needs to be thrown away.

          Edit: for context, I wrote this comment while waiting for `npm ci` to run. Its first step is to delete the node_modules folder, as otherwise it can't be trusted to update correctly.
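
          If you did trust the workspace, the reuse pattern is only a few commands; this is just a sketch, and the remote and branch names below are placeholders:

              git fetch origin master          # update instead of recloning
              git reset --hard FETCH_HEAD      # point the working tree at the fetched commit
              git clean -dffx                  # drop anything a previous build left behind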

          • ForHackernews 1274 days ago
            > we can't trust the build steps to not trash the build-runner so the entire thing needs to be thrown away.

            I think it's partly this, and partly that everything is shared infrastructure now. I don't want to pay to keep a machine up 24/7 just to use it to run a build for 10 minutes half a dozen times per day.

            So instead I lease time on shared hardware with ephemeral "containers" or "virtual machines" or whatever.

      • user5994461 1274 days ago
        Jenkins has a setting to keep the checkout directory (default) or to clear the directory between builds.

        At my last job, the default was letting broken changes pass the build: they'd break some step of the setup/run process that isn't run on a partial build. New joiners came in and they couldn't build because the build was broken.

        Had to fix it by setting up two jobs, one running from scratch (30 minutes) and one incremental (10 minutes). The build from scratch was catching a broken change or two every week.

      • mschuster91 1274 days ago
        Ephemeral CI runners. I have the same problem at work - 4GB repository that is redownloaded on every single pipeline run.

        Another reason (which is why we went for ephemeral runners in the first place...) is that if you have stuff that mounts a directory from the repository directory as a volume in a Docker container (e.g. for processing data), you may end up with the Docker container frying permissions in the repo folder (e.g. 0:0 owned files). Now, you can put a cleanup step as part of the CI (e.g. docker run --rm -v $(pwd):/mnt <image> sh -c 'chown -R $runner_uid:$runner_gid /mnt')... but unfortunately, Gitlab does not allow a "finally" step that always gets run, so if the processing fails, the build gets aborted, the server hosting the runner crashes, ... anything happens, the permissions will be fried, and a sysadmin will need to manually intervene.

        An ephemeral runner using docker:dind however? It simply gets removed.

      • mytailorisrich 1274 days ago
        In order to start with a clean slate and to guarantee state and the absence of artefacts from previous builds/pulls, it is common practice to start off with a clean directory.
  • segfaultbuserr 1275 days ago
    Better title: A one-line change decreased our "git clone" times by 99%.

    It's a bit misleading to use "build time" to describe this improvement, as it makes people think about build systems, compilers, header files, or cache. On the other hand, the alternative title is descriptive and helpful to all developers, not only just builders - people who simply need to clone a branch from a large repository can benefit from this tip as well.
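
    To illustrate, the fix amounts to passing an explicit refspec to git fetch. The article doesn't show the full command, so the exact flags and branch below are my guess at its shape:

        # before (assumed): default refspec, so every branch head gets fetched
        git fetch --no-tags --depth=50 origin
        # after (assumed): only the one branch the build actually needs
        git fetch --no-tags --depth=50 origin +refs/heads/master:refs/remotes/origin/master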

    • ma2rten 1275 days ago
      Right, from the article:

      "This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result."

      So the title is just completely wrong.

      • elwell 1274 days ago
        There's also this part of the article:

        "We found that setting the refspec option during git fetch reduced our build times by 99%."

        So, the article contains contradictions.

    • CGamesPlay 1275 days ago
      They set out to reduce build times, not to reduce git checkout times. It turns out that 99% of the entire build was spent downloading code.
      • Thorrez 1275 days ago
        Where does the article say "99% of the entire build was spent downloading code"?
        • Dylan16807 1274 days ago
          The title. If they reduced the build time by that much, then at least that much of the build time must have been spent downloading code.

          If the title is a lie (which it probably is), then nevermind that number, but it's clear where it came from.

          • mcherm 1274 days ago
            The text of the article clearly states that clone time was reduced by 99%.

            The only way build time could have been reduced by 99% is if every part of the build other than cloning is negligible. It is far more plausible to assume that the title is simply wrong.

        • colechristensen 1274 days ago
          It quotes a jenkins job going from 40 minutes to 30 seconds.
          • shawabawa3 1274 days ago
            They say "Cloning our largest repo, Pinboard, went from 40 minutes to 30 seconds"

            Presumably the build does more than just clone

      • lytedev 1275 days ago
        This isn't true either, as the article says that builds went from 40 minutes to 30 minutes. The time spent cloning was presumably about 10 minutes and came down very far, presumably by 99%.
        • Thorrez 1275 days ago
          > the article says that builds went from 40 minutes to 30 minutes.

          Where in the article does it say that? The article says this:

          > This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result. Cloning our largest repo, Pinboard went from 40 minutes to 30 seconds.

          Both of those sentences say the clone time was reduced by 99%. There are no percentage numbers given for how much the build time was reduced, nor any numbers about the total build time.

        • phildenhoff 1275 days ago
          It says from 40 minutes to 30 seconds, not minutes.
        • lytedev 1274 days ago
          I stand quite corrected. Sorry, all!
  • dada78641 1274 days ago
    This reminds me of my first programming job in 2005, working with Macromedia Flash. They had one other Flash programmer who only worked there every once in a while because he was actually studying in college, and he was working on some kind of project from hell that, among other problems, took about two minutes to build to SWF.

    Eventually they stopped asking him to come because he couldn't get anything done, and so I had a look at it. In the Movie Clip library of the project I found he had an empty text field somewhere that was configured to include a copy of almost the entire Unicode range, including thousands of CJK characters, so each time you built the SWF it would collect and compress numerous different scripts from different fonts as vectors for use by the program. And it wasn't even being used by anything.

    Once I removed that one empty text field, builds went down to about 3 seconds.

    • tetris11 1274 days ago
      I take it that this is not something he added himself, but was likely a catch-all default of textfields at the time?
      • omegote 1274 days ago
        Yep. In order to use non-standard fonts in Flash I recall you had to embed the fonts, even if the movie clip containing the textfield was not being used anywhere.
  • dusted 1274 days ago
    This is the most I've ever gotten out of pinterest, other than this, it's just the "wrong site that google turns up, that I can't use because it wants me to create an account just to watch the image I searched for"
    • saagarjha 1274 days ago
      Can we not do the thing where we pick an organization from an article and then bring up the most generic complaint you can about it in a way that is entirely irrelevant to the post? We get it, you don't like Pinterest showing up in search results, nobody does. But this has absolutely nothing to do with the article other than it being pattern matching on the word "Pinterest", which is about the least informative comment you can make aside from outright trolling or spam. There are threads that come up from time to time where such comments would be appropriate, if not particularly substantive.
      • dusted 1274 days ago
        I guess you're right. I've not noticed this being a topic before, and I should have spent more words telling that the article in question is actually quite interesting, it definitely made me consider our own Jenkins setup.
        • saagarjha 1274 days ago
          Thanks :) I don't want to make it seem like I'm after you in particular, it's just that you were the top comment in this thread and the time of night when I should have logged off and gone to bed is long past, so my patience for this was just a little thinner than it usually is. It's just that enough people have done this that I figured I might as well steal the second-to-top comment spot with this in the hopes that they might see it and not do it anymore.
      • amelius 1274 days ago
        So if Monsanto/Bayer had a post about their bio informatics stack, you'd expect nobody to complain about the company and its business practices?

        Sometimes the negative impact of a company is just more interesting to people than what the article brings to the table.

        • inopinatus 1274 days ago
          It’s not surprising when certain firms evoke a strong personal feeling, but it’d be terribly exhausting if every article about, say, React, attracted the annotation that Facebook is the Philip Morris of media. The subsequent discussion then tends toward the divisive and derisive rather than the illuminating and informative. Hard to tell anyone they should suppress what they feel, but overall I’d tip the balance towards “fewer like this please.”
          • krageon 1274 days ago
            I think it's valuable to keep saying it because otherwise we start thinking it's okay to fetishise a company's products just because they're technologically interesting. If a company made them on the back of incredibly shady and unethical dealings, they shouldn't be getting free advertising here.
            • 1123581321 1274 days ago
              Who here is fetishizing products because they learned something from the engineering blog? This is not happening.
        • saagarjha 1274 days ago
          I wouldn't expect it, because I have been here long enough to know that that is just not going to happen, but I would very much like it to be so, yes. Rehashing the same topic whenever you see something tangentially related is just a lazy karma grab, not an attempt at creating interesting, insightful conversation.

          Look, I get it, sometimes you want to rant about a company that you think is doing something you don't like: my point is that we have specific threads for them where such a comment could at least be on-topic. When you come to an article about Pinterest doing some git thing to make their builds faster and your comment is "they're ruining my search!", you're commenting at the level of someone who hasn't read the blog post.

        • read_if_gay_ 1274 days ago
          The point is it’s not directly relevant to the article, and on top of that GP’s particular complaint was especially generic. In this case Pinterest's negative impact is not that interesting and it’s constantly discussed too.
      • nabaraz 1274 days ago
        HN has always been very predictive.

        Praise Microsoft for turning the corner, Dislike Google for ads and snooping, Praise Apple for privacy, Dislike Zoom for privacy, Dislike Pinterest for middlewaring Google Image, and so on.

        • saagarjha 1274 days ago
          I'm not even complaining about Hacker News being predictive; we all know that it likes to have certain conversations and there is no stopping that. My only request is that this doesn't happen in every single thread regardless of whether it is relevant or not. (To be clear, I am "guilty" of the former myself; there are a handful of topics that I have a particular opinion about and I don't hesitate to share them even if I have mentioned them many times before. I just try to not bring them up in places where they clearly have no connection to what's being talked about.)
        • dctoedt 1274 days ago
          Friendly amendment: *predictable
      • alkonaut 1274 days ago
        Sorry no. If an article is paywalled, on Pinterest, or similar, then please let's discuss the source instead, even if it ruins the discussion, so people learn not to post such links.
        • fenomas 1274 days ago
          TFA isn't paywalled, or on Pinterest, or similar.
        • saagarjha 1274 days ago
          Paywall complaints are explicitly off-topic: https://news.ycombinator.com/item?id=10178989. I am not a moderator, but I think I've made it clear that I personally consider comments like the one I responded to be as well.

          FWIW, in all the years I have been on this site, I have seen this happen regularly and I have yet to see any reduction in such links or these kinds of discussions. Seeing as you've been here longer, I'd be curious to hear about why you might feel differently.

          • sercankd 1274 days ago
            We heard your complaint, but you are acting entitled now. People are free to register and free to comment; if you don't like it, downvote it. It is the top comment, which means it is being upvoted. Get over it.
            • saagarjha 1274 days ago
              I tend to downvote very rarely and only for clear violations for the rules, not for comments I don't like. Telling the author why you didn't like something they did often gets them to change or explain their behavior. Just because something is upvoted doesn't mean it is something that should be on Hacker News.
          • alkonaut 1274 days ago
            I just don’t mind repeating myself, whether it changes anything or not, I guess. Simply because discussing paywalled links or Pinterest links is invariably more interesting than whatever is found (or not found) when following those links.
    • csunbird 1274 days ago
      I am not sure why google does not penalize this behavior in their search ranking.
    • randunel 1274 days ago
      The most frequent search keyword that I use is "-pinterest"
      • amelius 1274 days ago
        Yes, there seems to be no way to make it clear to Google that we want to never see certain websites in our search results. Yet, Google claims they need our information to "improve our experience".
        • mcv 1274 days ago
          If Google wants my information to improve my experience, I'd love to be able to vote search results up or down. Or entire sites, like pinterest and content farms.
        • hnlmorg 1274 days ago
          For what it's worth, DDG image results don't get spammed by Pinterest. While my browsing is a drop in the ocean compared to Google's market share, using a Google competitor is as clear a signal as one can send that you're unhappy with the Google service.
        • andromeduck 1274 days ago
          -pinterest should be a search extension.
    • bufferoverflow 1274 days ago
      That's my experience too. Imagine how many views they have lost over the years, just because they require a login.

      And shame on you, Google, for playing along and indexing their shit, when it's not visible when I click through.

    • some_furry 1274 days ago
      This fact has forced people to write browser extensions to filter Pinterest out.

      I opt for the "teach non-tech people how to dork" route instead: https://soatok.blog/2020/07/21/dorking-your-way-to-search-re...

    • jeromenerf 1274 days ago
      This is one situation where a duckduckgo search is objectively of a better signal/noise ratio.
    • sercankd 1274 days ago
      Yeah i always believed it was some kind of lone evil ai that lives through search results.
      • segfaultbuserr 1274 days ago
        The worst experience is when you have found a dead link that contains useful information that exists only as a Pinterest snapshot while doing a web search...
    • Joker_vD 1274 days ago
      Y'know, I actually made a Pinterest account once because of one particular picture I really wanted. Guess what, even with an account you can't have it. Oh well, guess I'll just let it go.
    • syncsynchalt 1274 days ago
      They also created/maintain the kotlin linter, "ktlint".
  • mcv 1274 days ago
    On my first job, 20 years ago, we used a custom Visual C framework that generated one huge .h file that connected all sorts of stuff together. Amongst other things, that .h file contained a list of 10,000 const uints, which were included in every file, and compiled in every file. Compiling that project took hours. At some point I wrote a script that changed all those const uints to #define, which cut our build time to a much more manageable half hour.

    Project lead called it the biggest productivity improvement in the project; now we could build over lunch instead of over the weekend.

    If there's a step in your build pipeline that takes an unreasonable amount of time, it's worth checking why. In my current project, the slowest part of our build pipeline is the Cypress tests. (They're also the most unreliable part.)

    • ravishi 1274 days ago
      At my second job in the industry I worked on a Python project that had to be deployed in a kind of sandboxed production environment where we had no internet access.

      Deploys were painful, as any missing dependency had to be searched in our notebooks over 3G, then copied to an external storage, then plugged into a Windows machine, uploaded to the production server through SCP and then deployed manually over SSH. Sometimes we spent hours doing this again and again until all dependencies were finally resolved.

      I worked there for almost a year, did many cool gigs and learned a lot. But my most valuable contribution came when, at some point, tired of the unpredictable torture that the deploys were, I started researching solutions. I set up a pypi proxy on one of our spare office machines and routed all my daily package installs through it. Then I copied that entire proxy's content onto the production machine before every deploy, and voila, no more surprises.

      I left this job a few weeks later, but have heard that this solution was very useful for many devs that joined the team afterwards.

      • greesil 1274 days ago
        I suppose no Docker containers were allowed in prod either?
        • ravishi 1271 days ago
          Of course not. That was before docker, circa 2010. Our production environment was impossible to recreate.
    • renke1 1274 days ago
      > If there's a step in your build pipeline that takes an unreasonable amount of time, it's worth checking why. In my current project, the slowest part of our build pipeline is the Cypress tests. (They're also the most unreliable part.)

      Would you say the (slow and unreliable) Cypress tests are worth it still?

      • mcv 1274 days ago
        I don't know. We need some sort of e2e tests, and all e2e test frameworks are terrible in one way or another. Cypress is okay. I would prefer to only run it on production or the dev server and have alarms go off when they fail, but either the requirement is, or other developers have decided that it's necessary to pass all e2e tests before a feature branch can be merged into the master branch.

        And I get the reason for it; you don't want to accidentally merge breaking changes. But it does make our build pipelines very slow and unreliable.

        So are they worth it? I don't know. If I had my way, we'd only run them on master, and not make it a requirement for feature branches to pass them. Because if you fix one tiny thing, you now have to wait 15 minutes again for the Cypress tests to run. I think they'd be better in a different setup than what we're doing.

        • pavon 1274 days ago
          We had similar issues with integration tests, and made them a separate jenkins job that didn't trigger automatically, but gitlab was still configured to require them to pass for merge. We would kick it off manually only after all other code review was complete. Then the only cases where we had to re-run it were the same cases where it would have failed in master if we only ran the test there, but it saved us the hassle of reverting or feeling pressured to get hotfixes into master quickly.
        • smaps 1274 days ago
          Check out https://reflect.run/ as a replacement for Cypress. I started using it recently to do E2E testing at work in our staging environment to run a suite of tests before we move anything to production.

          So far it's been great and has saved a couple of releases in a month or so of use!

      • dzhiurgis 1274 days ago
        That's the nature of UI tests for the most part. IIRC Cypress tests are written with declarative tooling, which would make them even more unreliable and slow, albeit easier to fix.

        Personally I've recently started using Playwright and I'm quite happy with it. There was occasional misunderstanding of their API, but 95% of time it's great. Microsoft is kicking butt these days.

      • holtalanm 1274 days ago
        Cypress is horribly unreliable. We used to use it, and tests would pass, then fail on subsequent runs with no code changes, due to internal bugs within Cypress screenshot plugins, if I remember right.

        I have no idea if it is any better now, but we dropped it about 6 months ago in favor of pure Selenium C# for our UI tests.

        edit: a word

    • holtalanm 1274 days ago
      > In my current project, the slowest part of our build pipeline is the Cypress tests

      Oh man, I feel your pain.

      • Cthulhu_ 1274 days ago
        Personally I think longer tests (like a full Cypress run) should not be a barrier to merging in prod if they take more than 10 minutes, but should be run nightly or continuously in the background.

        I've not yet had the opportunity of having a large Cypress suite (working on it as we speak), but is it still more stable than e.g. Selenium is? Honestly 80% of issues we had with that were 'unstable' tests.

        • mcv 1274 days ago
          Exactly. I would much prefer a setup like that over our current rule that all cypress tests must pass before merging.

          A better rule might be that at least one unit or e2e test was added or updated to reflect the change in the code, and that that particular test succeeds. But run all the others on master.

          One advantage (or occasional disadvantage) of Cypress test before merging, is that there is someone clearly responsible for fixing it if a test fails. Problem is, sometimes the failing test has nothing to do with anything the creator of the pull request did. It's still a mystery how that's possible, but it happens. Hence my feeling that Cypress tests aren't very reliable. At least some of ours aren't.

        • holtalanm 1274 days ago
          Unfortunately, the issues we had with Cypress were with the framework itself, not the tests.

          I used to write automation, and I can say that Selenium tests can be written to be very stable. Just depends on how they are written.

  • aidanhs 1275 days ago
    I sympathise a lot with this post! Git cloning can be shockingly slow.

    As a personal anecdote, clones of the Rust repository in CI used to be pretty slow, and on investigating we found out that one key problem was cloning the LLVM submodule (which Rust has a fork of).

    In the end we put in place a hack to download the tar.gz of our LLVM repo from github and just copy it in place of the submodule, rather than cloning it. [0]

    Also, as a counterpoint to some other comments in this thread - it's really easy to just shrug off CI getting slower. A few minutes here and there adds up. It was only because our CI would hard-fail after 3 hours that the infra team really started digging in (on this and other things) - had we left it, I suspect we might be at around 5 hours by now! Contributors want to do their work, not investigate "what does a git clone really do".

    p.s. our first take on this was to have the submodules cloned and stored in the CI cache, then use the rather neat `--reference` flag [1] to grab objects from this local cache when initialising the submodule - incrementally updating the CI cache was way cheaper than recloning each time. Sadly the CI provider wasn't great at handling multi-GB caches, so we went with the approach outlined above.
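
    For the curious, that looked roughly like this (the cache and submodule paths here are illustrative):

        # borrow objects from a local CI cache instead of fetching them over the network
        git submodule update --init --reference /ci-cache/llvm-project src/llvm-project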

    [0] https://github.com/rust-lang/rust/blob/1.47.0/src/ci/init_re...

    [1] https://github.com/rust-lang/rust/commit/0347ff58230af512c95...

    • bertr4nd 1275 days ago
      > Contributors want to do their work, not investigate "what does a git clone really do".

      Exactly this. Especially if the repo and CI pipeline are complicated, it is incredibly easy to just assume “it’s slow” is a fact of life.

      And from the point of view of the dev-productivity team, well, they have tons of possible issues to deal with at any given time. Not just CI but the repos themselves, the build system, maybe IDEs, debuggers, ... Sure the fix ends up being easy but you have to know to go looking for it.

      • IggleSniggle 1275 days ago
        When you’ve got a billion other tasks to do, you might even know that it could be orders of magnitude faster and still not fix it, simply because of higher priority work.

        Frankly, I’d rather spend extra time trying to address problems/bugs/potential security holes in the actual shipped code than in fixing a poorly working CI pipeline...and I’m the kind of dev who gets really irritated by these problems. But you have to prioritize.

        Basically, barring “external” forces like cost overflow, customer unhappiness, or similar...stuff like that gets fixed at an equilibrium point between how much the problem hurts the dev, how adjacent to the codebase the devs current work is, and how interesting/irritating the dev finds the problem.

    • auscompgeek 1274 days ago
      Out of curiosity, why not use the submodule.<name>.shallow option in .gitmodules?
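
      Something like this, I mean (the submodule name is just an example):

          # record the preference in .gitmodules
          git config -f .gitmodules submodule.llvm-project.shallow true
          # a later init then clones that submodule at depth 1
          git submodule update --init llvm-project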
      • aidanhs 1270 days ago
        Primarily because, until you mentioned it now, I wasn't even aware it was an option!

        That said, I generally shy away from shallow clones and probably wouldn't use it here:

        - it's a trap for people who ever want to work in that repo normally (we use the trick for more than just LLVM)
        - I believe shallow clones, over time (e.g. for contributors), are less efficient than deep clones: I would expect shallow cloning to reuse fewer objects and benefit less from git's design. [0] describes a historic issue on this topic

        [0] https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...

  • sxp 1275 days ago
    > Even though we’re telling Git to do a shallow clone, to not fetch any tags, and to fetch the last 50 commits ...

    What is the reason for cloning 50 commits? Whenever I clone a repo off GitHub for a quick build and don't care about sending patches back, I always use --depth=1 to avoid any history or stale assets. Is there a reason to get more commits if you don't care about having a local copy of the history? Do automated build pipelines need more info?

    • mehrdadn 1275 days ago
      Some tools (like linters) might need to look at the actual changes that occurred for various reasons, such as to avoid doing redundant work on unmodified files. To do that, you need all the merge bases... which can present a kind of a chicken-and-egg problem because, to figure this out with git, you need the commits to be there locally to begin with. I'm sure you can find a way around it if you put enough effort into scripting against the remote git server, but you might need to deal with git internals in the process, and it's kind of a pain compared to just cloning the whole repo.
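
      The usual dance, for reference, looks something like this (branch names are placeholders):

          # deepen the shallow history so the merge base is (hopefully) present
          git fetch --deepen=100 origin
          # then diff against the base to find what the linter actually needs to look at
          git diff --name-only "$(git merge-base origin/master HEAD)"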
      • Dylan16807 1274 days ago
        If you're interested in metadata, you can use --filter=blob:none to get the commit history but without any file contents.
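
          i.e. roughly this (the URL is a stand-in, and the server has to support partial clone):

              # full commit history, but file contents are only fetched when something needs them
              git clone --filter=blob:none https://example.com/big-repo.git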
        • mehrdadn 1274 days ago
          Did not know, that's great, thanks! Seems this is a relatively recent feature?
    • MarkSweep 1275 days ago
      I can’t speak for the original post, but I’ve seen other people[1] increase the commit count because part of the build process looks for a specific commit to checkout after cloning. If you have pull requests landing concurrently and you only clone the most recent commit, there is a race condition between when you queue the build with a specific commit id and when you start the clone.

      All that being said, I don’t know why you would need you build agents to clone the whole damn repo for every build. Why not keep a copy around? That’s what TFS does.

      One other thing I've seen to reduce the Git clone bottleneck is to clone from Git once, create a Git bundle from the clone, upload the bundle to cloud storage, and then have the subsequent steps use the bundle instead of cloning directly. See these two files for the .NET Runtime repo[2][3]. I assume they do this because the clone step is slow or unreliable and then the subsequent moving around of the bundle is faster and more reliable. It also makes every node get the exact same clone (they build on macOS, Windows, and Linux).
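
      In rough strokes, that bundle flow is something like the following (names and URLs here are stand-ins, not what the linked scripts actually use):

          git clone --mirror https://example.com/repo.git repo.git
          git -C repo.git bundle create repo.bundle --all
          # upload repo.bundle to blob storage; each build agent then does:
          git clone repo.bundle work
          git -C work remote set-url origin https://example.com/repo.git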

      Lastly, be careful with the depth option when cloning. It causes a higher CPU burden on the remote. You can see this in the console output when the remote says it is compressing objects. And if you subsequently do a normal fetch after a shallow clone, you can cause the server to do ever more work[4].

      1: https://github.com/dotnet/runtime/pull/35109

      2: https://github.com/dotnet/runtime/blob/693c1f05188330e270b01...

      3: https://github.com/dotnet/runtime/blob/693c1f05188330e270b01...

      4: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...

      • pushrax 1275 days ago
        Also worth noting that git is pretty efficient at cloning a bunch of subsequent commits, due to delta encoding.

        edit: looks like git doesn't implement fetching thin packs when populating a shallow clone. It will still avoid fetching unnecessary packs, so the efficiency is still high for most software repositories.

        • senkora 1275 days ago
          Does git do delta encoding during clones? I know it doesn’t use deltas for most things.
          • pushrax 1275 days ago
            I am fairly sure it uses thin packs during a clone usually. Though I checked the docs at https://www.git-scm.com/docs/shallow, and it says:

            > There are some unfinished ends of the whole shallow business:

            > - maybe we have to force non-thin packs when fetching into a shallow repo (ATM they are forced non-thin).

    • globular-toast 1274 days ago
      Tags. All of my builds use `git describe` to get a meaningful version number for the build.
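
      Something along these lines (the tag format is just an example):

          git fetch --tags origin     # a --no-tags or very shallow fetch would break this
          git describe --tags         # e.g. v2.3.1-14-g1a2b3c4: last tag, commits since, short hash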
  • AdamJacobMuller 1275 days ago
    I expected this to be some micro-optimization of moving a thing from taking 10 seconds to 100ms.

    > Cloning our largest repo, Pinboard went from 40 minutes to 30 seconds.

    This is both very impressive and very disheartening. If a process in my CI was taking 40 minutes, I would have been investigating well before it reached a 40-minute delay.

    I don't mean to throw shade on the Pinterest engineering team, but it speaks to an institutional complacency with things like this.

    I'm sure everyone was happy when the clone took 1 second.

    I doubt anyone noticed when the clone took 1 minute.

    Someone probably started to notice when the clone took 5 minutes but didn't look.

    Someone probably tried to fix it when the clone was taking 10 minutes and failed.

    I wonder what 'institutional complacencies' we have. Problems we assume are unsolvable but are actually very trivial to solve.

    • nemothekid 1275 days ago
      I'm not sure this is complacency - this just seems like regular old tech debt. The build takes 40 minutes but everyone has other things to do and there is no time to tend to the debt. Then one day someone has some cycles and discovers a one line change fixes the underlying issue.

      I'm sure many engineering projects have similar improvements that just get a ticket/issue opened and never revisited due to the mountain of other seemingly pressing issues. From IPO to the start of the year Pinterest stock price had been trending downwards - I'm sure there was more external pressure to increase profitability than to fix CI build times. The stock has completely turned around since COVID, so I'm sure that changes things

      • mehrdadn 1275 days ago
        IMHO (from having addressed such CI issues personally on teams that otherwise wouldn't bother) it's likely due to other factors, like a lack of interest, being scared of breaking the build, not being terribly comfortable touching build scripts, or the inability to run scripts locally, than a genuine lack of time. The returns you can get can be ridiculously huge across the entire team compared to the hours you might spend, but I've found many people just aren't terribly interested in sitting down and digging into ugly scripts and pushing dozens of commits to figure out what might be slowing things down. And honestly, it's not exactly trivial to structure things in a way that's simultaneously both efficient and maintainable, especially if you're refactoring an existing system instead of starting from scratch, so that can be another turn-off.
        • MaulingMonkey 1275 days ago
          For me the biggest issue is that CI is often siloed to hell and back.

          Even when most of the rest of the engineering environment is fine, the build scripts and configuration often aren't under version control themselves, or are manually deployed - meaning any changes require access to carefully guarded server credentials. This may even be by design as a "security measure" - as if I didn't already have the ability to run arbitrary code on the build servers in question through unit tests etc. The gatekeepers in question are often an underfunded IT department that has too much on their plate already, and are underwhelmed by the idea of reviewing a bunch of changes to "legacy" code that they've somehow convinced themselves they'll rewrite "soon" that they don't directly benefit from anyways.

          And I find I can rarely run the scripts locally. They're also often hideously locked in to a specific CI solution that I can't locally install without a ton of work on my part to figure out the mess of undocumented dependencies, and rife with edge cases that I can't easily imitate on my dev machines.

          My preferred CI setups involve a single configuration file, checked into the same repository it's configuring CI for, that simply forwards to a low-dependencies script that works on dev machines. Getting there from an existing CI setup, however, can be quite the challenge.

        • Aloha 1275 days ago
          Or just creeping build time over years: "it's always taken a while, I guess it just takes longer now". You don't bother optimizing things until they cause you sufficient pain to optimize them.
        • scsilver 1274 days ago
          I can totally see a situation where the engineers who made the script are long gone, the new engineers are justifying their hiring by churning out features and trying not to break things, especially things they don't own and that affect everyone, like CI/CD, and that annoying but manageable 40-minute wait just gets put on the backlog, waiting for half a year until someone with just enough experience and frustration makes a push to management to dedicate a bit of time to diving into the issue.
        • rhizome 1275 days ago
          My assumption is that it's some or all of those, more than people thinking it's "fine"; that it's deficiencies more than complacencies.
      • zorked 1275 days ago
        Yup, it's all about incentives alignment. If you get promoted for shipping a feature but you don't get promoted for saving 40 minutes of everybody's time every day you will get a lot of features, delivered slowly.
        • pojzon 1275 days ago
          This is the kind of thinking I tried to sell in my corpo, where cloning the monorepo takes 30m and building this monstrosity takes 1.5h (the first time). Got scolded by management for saying that speed of changes should be more important than “looking busy” delivering stuff.
    • fn1 1274 days ago
      > I wonder what 'institutional complacencies' we have. Problems we assume are unsolvable but are actually very trivial to solve.

      I spend a lot of time optimizing builds, because the effect is a multiplicator for everything else in development.

      But it is not an easy task. One issue with performance-monitoring is that you have to carefully plan your work, or you will sit around and wait for results a lot:

      Try the build: 40 minutes. Maybe add profiling statements, because you forgot them: another 40 minutes. Change something and try it out: no change, 40 minutes. Find another optimization which decreases time locally and try it out: 39.5 minutes, because on the build-server that optimization does not work that well. etc.

      You just spent 160 minutes and shaved 0.5 minutes off the build.

      I'm not saying it's not worth it, but that line of work is not often rewarding.

      On the flip-side I once took two hours to write a java-agent which caches File.exists for class-loading and managed to decrease local startup time by 500% because the corporate virus-scanner got active less often.

    • innagadadavida 1275 days ago
      Considering the build host does this hundreds of times every day, a better solution would be to simply have a git repo cache locally; it should be secure and reliable given git's object store design?

      Any simple wrappers for git that can do this transparently?
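
      What I'm picturing is roughly the following (paths and URL made up), just wrapped up so builds do it transparently:

          # one periodically refreshed mirror per repo on the build host
          git clone --mirror https://example.com/repo.git /var/cache/git/repo.git
          # per-build clones then borrow objects from the mirror instead of the network
          git clone --reference /var/cache/git/repo.git https://example.com/repo.git workdir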

      • manojlds 1275 days ago
        Build servers don't git clone every time though. They do a git clean if needed, followed by a git fetch / git pull equivalent.

        GoCD for example maintains a single copy of the repo on the server for every pipeline that refers to it and the agents have the repos that they work on checked out. Any local changes or untracked files are by default cleaned. There are settings to force reclone etc, but it's not the default.

        • robjan 1274 days ago
          In many cases the build agent is a stateless container which is destroyed as soon as the build is finished. In cases like this the repo needs to be (shallow) cloned each time.
        • dagmx 1274 days ago
          That depends very heavily on the build infrastructure being used however
    • NikhilVerma 1274 days ago
      I doubt that they started off with a 40 mins delay. It probably crept slowly as the repo got bigger and no one noticed it because of the gentle gradient. And they didn't have the time/resources to look into it.
    • edoceo 1275 days ago
      You're confusing a full clone, which for a huge repo is expected to take a while, with the fix, which was to specify a single refspec so they don't fetch every branch in CI.
    • bluedino 1274 days ago
      People probably did complain, but they were met with, "We're cloning a 20GB repo! It's not going to happen in an instant!"
    • raverbashing 1274 days ago
      This is the real complacency

      Did someone really think "well it takes 40min, what can you do about it?" and just left it as such?

      I knew people who would have that mentality in companies that are not around anymore. Take it as you want.

      Yes, git is hard, but you know, maybe someone else has a better idea, or you can check SO, etc. (I don't even know why they were adding the refspecs there)

  • gpapilion 1275 days ago
    I’ve found as an industry we’ve moved to more complex tools, but haven’t built the expertise in them to truly engineer solutions using them. I think lots of organizations could find major optimizations, but it requires really learning about the technology you’re utilizing.
    • megameter 1275 days ago
      It's a natural tradeoff made when we ask for generality and flexibility: Doing that means implicitly saying "I want to do less implementing and more engineering" because a complex configurable dependency becomes an object of study in itself, something that needs empirical testing to use at its best.

      Versus the simple thing you would author yourself: if you know the engineering tradeoffs made at the per-line level you have a decent grasp of the performance and flexibility, but you are implementing it and debugging it.

    • pushrax 1275 days ago
      Also, profiling applications is surprisingly easy to learn. It boils down to looking at timestamps, and seeing what takes the longest. The majority of the effort is just figuring out where/how to get the timestamps you are looking for.
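
      Even something crude works as a first pass, assuming the build is driven by a plain shell script (timestamp each stage, then eyeball the gaps):

          date +%T   # stage start
          git fetch origin master
          date +%T   # stage end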

      I will add that I think software complexity is only going to continue increasing over the long term; it reduces in some domains, but expands in others as we develop more advanced systems. Some kind of analogy to entropy.

      • franciscop 1275 days ago
        Totally agree. Example: now that Node.js supports native `import` and `export` from modules I can see how many JS libraries will not need a transpilation step.

        On the other hand TS seems to be more and more popular, which requires a compilation step.

    • smitty1e 1275 days ago
      The whole point of being an Agile "generalizing specialist" is that one is a mile wide and an inch deep.
      • gpapilion 1275 days ago
        Which I think is a fair approach when you're early on. Once you have a dev efficiency team, you're no longer hiring generalists.
    • temporallobe 1275 days ago
      This, this, so much this. The more complexity we build into a system, the less we understand it, similar to how development frameworks create multiple layers of abstraction to the point where the developers have no idea what actual code the framework produces, much less how to fix it.
      • slx26 1274 days ago
        Yes, we probably need people to stop thinking about tools as if they "solved" problems; what they really do is "transform" them. Now instead of having to deal with the original problem, you only need to deal with part of it and part of the new problem of using the tool that's supposed to help you, plus any leaks you might have because tools rarely solve problems perfectly. It's a trade-off, and you need to be aware of these transformations.
      • SamuelAdams 1274 days ago
        Another way of looking at it is this is the current golden age of infosec.

        Think of all these complex systems developers and sysadmins need to maintain at a company. Then think of how well each person knows each technology. Most of them will be "T"-shaped, i.e. they know one tech well but are only surface-level on all the others.

        If I know several tools really well (or better than the company's sysadmins / devs) I can probably find some security issues with them.

    • jeffbee 1275 days ago
      We have not "as an industry" moved to git. There's a vocal subset of git fans, but it is by no means an industry standard.
      • wolco2 1275 days ago
        What industry are you part of?

        In many domains git has replaced other version control systems.

        I would love to see a new approach to version control. Things like subversion or mercurial have exposed too many drawbacks for them to win back industry.

        • joshuamorton 1274 days ago
          Google and Facebook both don't use git. Google uses a proprietary, perforce-esque system with multiple frontends, and Facebook uses Mercurial.

          Among startups, I'm sure git holds a near monopoly, but if you move into other parts of the industry, that monopoly loosens.

      • AMC11 1275 days ago
        Is Git not the most used VCS?
  • chinhodado 1275 days ago
    When I first joined one of my previous jobs, the build process had a checkout stage that blew away the git folder and checked out the whole repo from scratch every time (!). Since the build machine was reserved for that build job, I simply made some changes to do git clean -dfx && git reset --hard && git checkout origin/branch. It shaved off something like 15 minutes of the build time, which was about 50% of the total build time.
    • mikepurvis 1275 days ago
      It's frustrating how many ways there are for a git clone to get out of sync, especially when it's an automation-managed one that is supposed to be long-lived (think stuff like gracefully handling force-pushed branches and tags that are deleted). I've dealt with a bit of this with my company's Hound (code search engine) instance. Currently there's a big snarl of fallback logic in there that tries a shallow clone, but then unshallows and pulls refs if it can't find what it's looking for, culminating in this ridiculousness:

          git fetch --prune --no-tags --depth 1 origin +{ref}:remotes/origin/{ref}
      
      See the whole thing here: https://github.com/mikepurvis/hound/blob/6b0b44db489f9aeff39...

      The pipeline I manage spans many repos rather than a monorepo, and maintaining long-lived checkouts in this context is not really realistic, but what does work and is very fast is just grabbing tarballs: GitLab and GitHub both cache them, so they don't cost additional compute after the first time, and downloading them is strictly less transfer and fewer round trips than the git protocol.

      The only real cost is that anything at build time which needs VCS info (eg, to embed it in the binary) will need an alternate path, for example having it be able to be passed in via an envvar.
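
      For reference, one of those tarball grabs against GitHub looks roughly like this (repo name illustrative; GitLab has an equivalent archive endpoint):

          curl -sL https://github.com/example/repo/archive/refs/heads/master.tar.gz | tar -xz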

    • mixmastamyk 1275 days ago
      A new checkout is good practice. Using refspec and depth options can make it quick.
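
      For example, a shallow, single-ref checkout built from scratch can be as small as this (URL, branch, and depth illustrative):

          git init workdir && cd workdir
          git remote add origin https://github.com/example/repo.git
          git fetch --depth 50 --no-tags origin +refs/heads/master:refs/remotes/origin/master
          git checkout --force origin/master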
  • SamuelAdams 1274 days ago
    > In the case of Pinboard, that operation would be fetching more than 2,500 branches.

    Ok, I'll ask: why does a single repository have over 2,500 branches? Why not delete the ones you no longer use?

    • dahfizz 1274 days ago
      Where I work doesn't delete branches, because there is no reason to. Git branches have essentially zero overhead and deleting them is just extra complexity in the CI toolchain. Deleting branches also deletes context in some scenarios. When dealing with an old codebase it's nice to be able to check out the exact version of the code at some point without having to dig through the log to get hashes and then deal with a detached head.

      The example in the article is a bit of a special case. It is a huge, and old, monorepo. In the typical case, fetching everything and fetching master is equivalent because all commits in all branches make their way into master anyway. If you have a weird branching strategy where you maintain multiple, significantly diverged branches at once, but only care about one of those branches at build time, then this optimization would save you time.

      • rovr138 1274 days ago
        > Git branches have essentially zero overhead

        Based on the article linked here, they do.

        • dahfizz 1274 days ago
          > If you have a weird branching strategy where you maintain multiple, significantly diverged branches at once, but only care about one of those branches at build time, then this optimization would save you time.

          It's not the fact that they had lots of branches per se, it's the fact that they had lots of commits hanging out in the middle of nowhere.

      • richardwhiuk 1274 days ago
        If you are doing squash merges, git branches have a cost.
        • dahfizz 1274 days ago
          A git branch is literally a file with a commit hash in it. It's conceptually a pointer to a commit. Creating, destroying, and maintaining a branch has all the overhead of a ~40 byte file.
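
          For illustration (the hash is made up), a loose branch ref really is just a file containing a commit id:

              $ cat .git/refs/heads/master
              9fceb02d0ae598e95dc970b74767f19372d61af8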

          Squash merges leave a ton of commits just floating in your old branch. If you delete the branch (the 40B file), all those commits are still there. Doing lots of squash merges brings you into this case I mentioned:

          > If you have a weird branching strategy where you maintain multiple, significantly diverged branches at once, but only care about one of those branches at build time, then this optimization would save you time.

    • tetris11 1274 days ago
      If you have several releases with different targets, and want to make future security updates accessible to all
    • DoingIsLearning 1274 days ago
      They could already be doing that.

      That is, if we assume they copy Google's philosophy of a single monolithic repository.

      Pinterest has about 2000 employees; assuming 20% are active developers, that's about 400 people, which gives you roughly 6 branches per developer. That wouldn't be outrageous.

    • hn_throwaway_99 1274 days ago
      Because they use a monorepo. With monorepos at large companies the individual git repositories will be much larger and contain a ton more branches than if you have a repository-per-project model.
    • casperb 1274 days ago
      Probably because they have 1600 employees and the 2500 branches are the active ones.
    • est 1274 days ago
      monorepo culture.
  • jniedrauer 1275 days ago
    One of the (many) things that drives me batty about Jenkins is that there are two different ways to represent everything. These days the "declarative pipelines" style seems to be the first class citizen, but most of the documentation still shows the old way. I can't take the code in this example and compare it trivially to my pipelines because the exact same logic is represented in a completely different format. I wish they would just deprecate one or the other.
  • chrisweekly 1275 days ago
    I find the self-congratulatory tone in the post kind of off-putting, akin to "I saved 99% on my heating bill when I started closing doors and windows in the middle of winter."

    If your repos weigh in at 20GB in size, with 350k commits, subject to 60k pulls in a single day, having someone with half a devops clue take a look at what your Jenkinsfile is doing with git is not exactly rocket science or a needle in a haystack. (Here's hoping they discover branch pruning too; how many of those 2500 branches are active?)

    As a consultant I've seen plenty of appallingly poor workflows and practices, so this isn't all that remarkable... but for me the post seems kind of pointless.

    • paledot 1275 days ago
      Indeed. I wasn't aware of that specific git option, but a build pipeline with a checkout step taking FORTY MINUTES is unacceptable. Plenty of ways to solve that problem, but it's a problem that never should have made it into a critical workflow.

      I don't care for casting stones. It's clearly a big win, and you don't get numbers like that every day. But I feel like someone should've twigged to this much sooner.

  • YokoZar 1275 days ago
    Can someone explain the intended meaning behind calling six different repositories "monorepos"?

    It sounds to me like you don't have a monorepo at all and instead have six repositories for six project areas.

    • yen223 1275 days ago
      My interpretation is that each "monorepo" is a big git repository that consists of a collection of individually-deployed services, as opposed to having a single git repository per service.

      I do not know whether that's what the blog author meant by that though.

      • whatatita 1274 days ago
        I got that impression too. I can imagine the Pinterest monorepo, for example, has the website and server code together.

        Their iOS and Android repos may contain the code for multiple apps. Though, I'm not aware of which other apps Pinterest (the company) creates besides the obvious one.

  • muststopmyths 1274 days ago
    I'm a git noob, so I'm sorry if this sounds dumb but wouldn't

    git clone --single-branch

    achieve the same thing (i.e., check out only the branch you want to build)?

    Also, why would you not check out only one branch when doing CI?
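
    For a fresh clone, something along these lines would narrow things similarly (branch name and depth illustrative):

        git clone --single-branch --branch master --depth 50 https://github.com/example/repo.git

    The article's fix instead narrows an explicit git fetch with a refspec, which achieves the same effect when the checkout is fetch-based rather than clone-based.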

  • tracer4201 1275 days ago
    I truly appreciate articles like this; it's heartening to see other companies running into the kinds of issues I've run into or had to deal with, and even more so that their culture openly discusses and shares these learnings with the broader community.

    The most effective organizations I’ve worked at built mechanisms and processes to disseminate these kinds of learnings and have regular brown bags on how a particular problem was solved or how others can apply their lessons.

    Keep it up Pinterest engineering folks.

  • uglycoyote 1274 days ago
    He says that "Pinboard has more than 350K commits and is 20GB in size when cloned fully." I'm not clear though, exactly what "cloned fully" means in context of the unoptimized/optimized situation.

    He says it went from 40 minutes to 30 seconds. Does this mean they found a way to grab the whole 20GB repo in 30 seconds? seems pretty darn fast to grab 20GB, but maybe on fast internal networks?

    Or maybe they meant that it was 20GB if you grabbed all of the many thousands of garbage branches, when Jenkins really only needed to test "master", and finding a solution that allowed them to only grab what they needed made things faster.

    I'm also curious about the incremental vs "cloning fully" aspect of it. Does each run of Jenkins clone the repo from scratch or does it incrementally pull into a directory where it has been cloned before? I could see how in a cloning-from-scratch situation the burden of cloning every branch that ever existed would be large, whereas incrementally I would think it wouldn't matter that much.

    • degrews 1274 days ago
      > He says that "Pinboard has more than 350K commits and is 20GB in size when cloned fully." I'm not clear though, exactly what "cloned fully" means in context of the unoptimized/optimized situation.

      It probably means including all commits.

      It looks like they were successfully only pulling the last 50 commits, but they were doing that for each of 2500 branches. Now they are pulling only the most recent 50 commits for one branch.

  • bluedino 1274 days ago
    My similar story goes like this: We had CRM software that let you set up user-defined menu options. Someone at our organization decided to make a set of nested menu options where you could configure a product, with every possible combination being assigned a value!

    So if you had a large, blue second generation widget with a foo accessory and option buzz, you were value 30202, and if it was the same one except red, it was 26420...

    Every time the CRM software started up, it cycled through the options and generated a new XML file with all the results; this took about a minute and created something like a 60MB file.

    The fix was to basically version the XML file and the options definition file. If someone had already generated that file, just load the XML file instead of parsing and looping through the options file. Started up in 5 seconds!

    What was the excuse that it took so long in the first place? "The CRM software is written in Java, so it's slow."

  • saagarjha 1274 days ago
    Seems like there's a lot of hostility towards the title, which might be considered the engineering blog equivalent of clickbait. If the authors are around, the post was quite informative and interesting to read, but I'm sure it would have been much more palatable with a more descriptive title.

    But back on topic: does anyone have any insight into when git fetches things, and what it chooses to grab? Is it just "when we were writing git we chose these things as being useful to have a 'please update things before running this command' implicitly run before them"? For example, git pull seems to run a fetch for you, etc.

  • sambe 1274 days ago
    Ok, I'll ask the obvious question: why did setting the branches option to master not already do this?

    EDIT

    https://www.jenkins.io/doc/pipeline/steps/workflow-scm-step/ makes it sound like the branches option specifies which branches to monitor for changes, after which all branches are fetched. This still seems like a counter-intuitive design that doesn't fit the most common cases.

  • jtchang 1275 days ago
    This is good info. Need to check my own build pipelines now and see if we are just blindly cloning everything or not. 40 minutes to do a clone is a pretty long time to wait though.
  • quickthrower2 1275 days ago
    Parkinson's Law of builds: "work expands so as to fill the time available for its completion", or in this case the available time is the point at which people can't stand the build taking any longer. 30-60 minutes is normal because anything > 1 minute requires you to context-switch anyway, and > 60 minutes means you are now at risk of taking a day if you have the work queue of a 1-pizza team. So the [1..60] range causes a grumble, but nothing gets done.
  • nathan_f77 1274 days ago
    Is there any way to do this for GitLab CI [1]? I'm using GIT_DEPTH=1, but I'm not sure how to set refspecs. It's not too important right now since it only takes about 11 seconds to clone the git repo, but maybe it's a quick win as well.

    [1] https://docs.gitlab.com/ee/ci/large_repositories/

  • cma 1275 days ago
    > For Pinboard alone, we do more than 60K git pulls on business days.

    Can anyone explain this? Seems ripe for another 99% improvement even with hundreds of devs.

    • altdatathrow 1275 days ago
      An unhealthy obsession with CI/CD is the usual culprit.
  • timzaman 1274 days ago
    Misleading title. They reduced their clone time by 99%. Not their build time.
    • dastx 1274 days ago
      With a repo that is 20GB, I can imagine that could be 99% of the build time.
  • inopinatus 1274 days ago
    My CI servers have to build branches as well, though. A fresh clone for every build? No wonder it was slow, but even this solution seems inefficient. My preferred general solution is a persistent repository clone per build host, maintained by incremental fetch, and using git worktree add, not git clone, to check out each build.
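
    A minimal sketch of that pattern (paths and branch illustrative):

        # persistent clone on the build host, refreshed incrementally
        git -C /srv/ci/repo fetch --prune origin

        # cheap per-build checkout without another clone
        git -C /srv/ci/repo worktree add /tmp/build-1234 origin/master
        # ... run the build in /tmp/build-1234 ...
        git -C /srv/ci/repo worktree remove /tmp/build-1234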
  • Dylan16807 1274 days ago
    Well, good advice, and good for them, but

    > Cloning monorepos that have a lot of code and history is time consuming, and we need to do it frequently throughout the day in our continuous integration pipelines.

    No you don't!

    If removing per-build clones was the only way to speed things up, I'm absolutely sure you could figure out how with medium difficulty at most.

  • villgax 1275 days ago
    60K pulls per day for 100 commits in a day? What tests are being done that can't leverage earlier pulls?
  • ibains 1274 days ago
    This just shows how poor visibility into git is; I hope it gets better.

    Building a product with poor visibility and ridiculing users for not knowing internals is the worst practice in Computer Science.

    Hadoop did the same, and has set a record for the fastest software to become legacy.

    Super nice to see great comments here and the nice article.

  • mandeepj 1275 days ago
    Looks like Pinterest’s team is confused about Git branches. These are not real, full copies of the main branch like in SVN or TFS. A branch in the Git world is simply a pointer to a specific commit in the history.

    Having said that, happy to be proven wrong, and learn about it.

    • ekimekim 1275 days ago
      IIUC the issue here is the depth option - they're telling it to only fetch the last 50 commits, but they were fetching the last 50 commits from EACH branch. In other words, they were fetching all commits that are within 50 commits of any branch head. By restricting the branches, they drastically reduce the set of commits to fetch.
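
      In rough terms, the difference is between these two fetches (depth and branch illustrative):

          # default refspec: the last 50 commits reachable from EVERY branch head
          git fetch --depth 50 origin

          # restricted refspec: the last 50 commits reachable from one branch
          git fetch --depth 50 origin +refs/heads/master:refs/remotes/origin/master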
  • Scaevolus 1275 days ago
    For CI on large repos, you can do much better than this by using a persistent git cache. It takes a little finessing to destroy it if it's corrupt and avoid concurrent modifications, but it's extremely worth it.
    • hakre 1273 days ago
      You mean syncing to bare git repos on the CI nodes, and then in the build not using a WAN remote but just cloning from the bare repo with hard links and checking out?
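
      E.g., something like this (paths and URL illustrative)?

          # bare mirror kept on the CI node, refreshed periodically
          git clone --mirror https://github.com/example/repo.git /srv/ci/repo.git
          git -C /srv/ci/repo.git fetch --prune

          # per-build clone from the local path; git hard-links objects by default for local clones
          git clone /srv/ci/repo.git workdir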
  • yoz-y 1274 days ago
    Regarding the strife over the 99% claim: if the pull took 39.9 min (and thus the rest of the build took 0.1 min = 6 sec), then a 99% decrease in pull time would result in roughly a 99% decrease in total time, and you would get about 30 sec total in the end (rounding to 0 decimal places).

    Not that any of this is important for the article to be interesting. In a previous job we had to fight long pull times, and we quickly created a git repo for CI that would sit on a machine next to the CI server and would periodically pull from GitHub so the CI didn't have to pull over the Internet.

  • soulofmischief 1274 days ago
    The title is a bit of a misnomer, isn't it?

    > This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result.

    Sounds like it didn't reduce build times quite by 99%.

  • joshribakoff 1274 days ago
    Misleading title. They reduced git clone time 99%, not build times.
  • xattt 1275 days ago
    Will this mean even more Google image search spam‽
  • ar7hur 1274 days ago
    Alternative title: "How one line of code made our build time 100x what it should have been"
  • mister_hn 1274 days ago
    I'm not impressed by the author of the post, since this is documented in the plugin: you should not check out all the branches if you're not interested in them. The default behaviour, of course, is to get all of them.
  • JoeAltmaier 1274 days ago
    So git doesn't scale well with wide, deep source histories? That's a failing of git, I think, not of the engineers, who may even have written that line when the source base was far less gnarly.
  • cj 1274 days ago
    I once reduced the runtime of our test suite from 10 minutes to < 5 minutes by changing 2 characters in 1 line...

    The bcrypt work factor! It was originally 12; I reduced it to 1 (don’t worry, production is still 12).

  • nabaraz 1274 days ago
    Is it a common practice to clone the repo on every build (especially on web apps)? I just have Jenkins navigate to an app folder, run a few git commands (hard reset, pull), and build (webpack).
  • KnobbleMcKnees 1273 days ago
    The article is erroneous in many ways as others have described, but the main error I see is that it says 'git clone' is run before the fetch.

    It should be 'git init'

  • wruza 1274 days ago
    It is pinteresting that a webapp for making your image saving obsession easier to satisfy takes hundreds to thousands developer actions per day and repository sizes of tens of gigabytes.
  • jakub_g 1274 days ago
    Semi-related for JS developers: if you do `eslint` as part of your build, make sure `node_modules` (and `node_modules` in subfolders if you have a monorepo-ish setup) is excluded.
  • andrelaszlo 1274 days ago
    We recently reduced our build times by 5-10% or so by changing the default bcrypt iteration count (for tests). It also felt silly once we found it.
    • nathan_f77 1274 days ago
      Thanks so much for this tip! I just made this change and some of my tests are now much faster. Here's the result for one of the affected tests (averaged across 5 runs):

      Before: 1.39 seconds

      After: 0.62 seconds

      I have this default line in config/initializers/devise.rb:

          config.stretches = Rails.env.test? ? 1 : 11
      
      So hashing user passwords was already very fast. But I'm also manually calling BCrypt in some other places, so these calls are now much faster as well.
    • user5994461 1274 days ago
      You should consider doing the same thing in production.

      It's a trope at this point how the modern slow hashing algorithms are utterly misconfigured. Stopped counting how many times I've seen it.

      Take a whole second to compute a hash on the production machine because "hashing is supposed to be slow", noting the production server is a low-frequency Xeon that has many cores, but they're half as fast as your development machine with a 4GHz i7-9999.

      Hashing is supposed to take milliseconds, not seconds. If it's taking longer than 100 ms you need to make it faster.

      edit: found the problem, this bad stackoverflow answer that's been spreading bad recommendations for years https://security.stackexchange.com/questions/17207/recommend...

    • mytailorisrich 1274 days ago
      One thing to keep in mind is that this obviously changes the timing of your software with respect to production behaviour, which may or may not matter depending on what you are testing.
  • csours 1274 days ago
    Troubleshooting CI/CD feels like troubleshooting a printer: What the hell is it doing now and why is it doing that?!
  • fortran77 1274 days ago
    I'd rather Pinterest increased their build times by 99% so they could do less damage to search results.
  • leothekim 1274 days ago
    “We have six main repositories at Pinterest: Pinboard, Optimus, Cosmos, Magnus, iOS, and Android. Each one is a monorepo and houses a large collection of language-specific services.”

    What is an “iOS monorepo” supposed to be like?

  • TylerE 1275 days ago
    @Dang, can we get an edit?

    This did NOT slash build times 99%, but rather time to do a git pull.

    • lemax 1275 days ago
      If build includes a git pull, maybe it did.
      • hashkb 1275 days ago
        Nitpick... if 99% of your build time is consumed by preparing the workspace, that's the story. This isn't interesting to anyone who doesn't have that exact problem. Most people who click this won't find it interesting.
    • freedomben 1275 days ago
      From the article:

      > We found that setting the refspec option during git fetch reduced our build times by 99%.

      Seems pretty clear to me that build times were reduced by 99% as a result of cutting the git fetch times significantly (but the exact number is not given). The headline looks correct to me.

      • TylerE 1275 days ago
        FTA: "This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result"

        Unless their build is 100% git pull time, this did not reduce build time by 99%.

        • qzw 1275 days ago
          Exactly. The article makes both statements in different places, and they are contradictory. Kind of gives an impression of sloppiness.
        • Aeolun 1275 days ago
          To be fair, if their pull took 40 minutes, that’s a very real option :)
  • hansdieter1337 1275 days ago
    Misleading title. The build time was not decreased by 99%; only the git checkout step was.
  • s9w 1274 days ago
    clickbait
  • scoot_718 1275 days ago
    Who cares if all you're doing is optimizing the massive spam operation that is Pinterest? You've made garbage faster. Well done.
  • lerpapoo 1275 days ago
    tldr can i guess it was doing some extra network roundtrips or something?
  • smsm42 1274 days ago
    TLDR: they reduced "git clone" time on their massive monorepo by making it fetch only the master branch when building in Jenkins.