Launch HN: Speedscale (YC S20) – Automatically create tests from actual traffic

We’re Ken, Nate and Matt, co-founders of Speedscale (https://speedscale.com), a tool that automatically generates continuous integration (CI) tests from past traffic. Carefully scaling rollouts to ever-larger groups of customers is the safest deployment strategy, but it can take weeks. Even for elite DevOps organizations, up to 15% of changes to production can result in degraded service [1] [2].

We met as undergrads at Georgia Tech and come from a DevOps and operations background, so we’ve seen this firsthand. Each of us has over 15 years of experience building high-reliability systems, starting in the early days with satellite earth station monitoring. As interns we once wrote a bug that caused a 32-meter antenna to try to point down through the earth, almost flattening the building we were in. It was a great environment to learn about engineering reliability. We went on to apply that experience to monitoring Java app servers, SOA, SaaS observability, and cloud data warehouses. What if we could use a form of observability data to automatically test the reliability of new deployments before they hit production? That’s the idea that got us started on Speedscale.

Most test automation tools record browser interactions or use AI to generate a set of UI tests. Speedscale works differently: it captures API calls at the source using a Kubernetes sidecar [3] or a reverse proxy, so we can see all the traffic going in and out of each service, not just the UI. We feed the traffic through an analyzer process that detects calls to external services and emulates a realistic request and response -- even for authentication systems like OAuth =). Rather than guessing how users call your service, Speedscale reflects reality, because the data is collected from your live system. We call each interaction model a Scenario; Speedscale generates them without human effort, yielding an easily maintained, full-coverage CI test suite.
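
To make the capture side concrete, here is a minimal sketch of a recording reverse proxy in Go (illustrative only -- the address, port, and logging are assumptions, not our actual implementation):

    package main

    import (
        "bytes"
        "io"
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // The service under observation (assumed address).
        upstream, err := url.Parse("http://localhost:8080")
        if err != nil {
            log.Fatal(err)
        }
        proxy := httputil.NewSingleHostReverseProxy(upstream)

        // Record each request/response pair as it passes through.
        proxy.ModifyResponse = func(resp *http.Response) error {
            body, err := io.ReadAll(resp.Body)
            if err != nil {
                return err
            }
            // Put the body back so the client still receives it.
            resp.Body = io.NopCloser(bytes.NewReader(body))
            log.Printf("captured %s %s -> %d (%d bytes)",
                resp.Request.Method, resp.Request.URL.Path,
                resp.StatusCode, len(body))
            return nil
        }

        log.Fatal(http.ListenAndServe(":9090", proxy))
    }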

Scenarios run on demand or in your build pipeline: Speedscale inserts your container into an ephemeral environment and stresses it with different performance, regression, and chaos scenarios. If it breaks, you decide the alerting threshold. Speedscale is especially effective at ensuring compliance with subtle Service Level Objective (SLO) conditions like performance regressions [4].
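
As a rough sketch of what a performance-regression gate can look like in a pipeline (the threshold, sample data, and names here are illustrative assumptions, not our API):

    package main

    import (
        "fmt"
        "sort"
        "time"
    )

    // p95 returns the 95th-percentile latency of a replay run.
    func p95(latencies []time.Duration) time.Duration {
        sort.Slice(latencies, func(i, j int) bool {
            return latencies[i] < latencies[j]
        })
        idx := int(float64(len(latencies)) * 0.95)
        if idx >= len(latencies) {
            idx = len(latencies) - 1
        }
        return latencies[idx]
    }

    func main() {
        // Latencies from replaying the same Scenario against the last
        // good build (baseline) and the candidate build.
        baseline := []time.Duration{
            80 * time.Millisecond, 85 * time.Millisecond,
            90 * time.Millisecond, 95 * time.Millisecond,
            120 * time.Millisecond,
        }
        candidate := []time.Duration{
            82 * time.Millisecond, 88 * time.Millisecond,
            140 * time.Millisecond, 150 * time.Millisecond,
            300 * time.Millisecond,
        }

        // Fail the build if p95 latency regressed by more than 10%.
        if float64(p95(candidate)) > float64(p95(baseline))*1.10 {
            fmt.Println("FAIL: p95 latency regression exceeds 10%")
        } else {
            fmt.Println("PASS")
        }
    }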

We're not public yet, but we'd be happy to give you a demo if you contact us at hello@speedscale.com. We are also doing alpha customer deployments to refine our feature set and protocol support; if you have this problem, or have tried to solve it in the past, we would love to get your feedback. Eventually we’ll sell the service via a subscription model, but the details are still TBD. For the moment we’re mainly focused on making the product more useful and collecting feedback. Thanks!

[1] https://services.google.com/fh/files/misc/state-of-devops-20...

[2] https://aws.amazon.com/builders-library/automating-safe-hand...

[3] https://kubernetes.io/blog/2015/06/the-distributed-system-to...

[4] https://landing.google.com/sre/sre-book/chapters/service-lev...

137 points | by Inchull 1359 days ago

22 comments

  • kahrensdd 1359 days ago
    For a little background on the satellite antenna story: I was building a monitoring device driver for an Antenna Control Unit (ACU). It's like a 4-rack-unit computer with special hardware for talking to the antenna motors (azimuth, elevation and polarization). After we sent it a command, the device froze up, so we rebooted it. The CMOS battery was dead, so when it came back up the date was wrong, but I did not notice. I sent it a command to reposition and it began moving to point below the horizon... The bad date meant it had the wrong geolocation for itself. It turns out "below horizon" is really important, because the building was just a structure to hold up the antenna, and the dish was going to crash into the ground. Fortunately someone ran in and hit the STOP button while I was staring at the monitor. That day I learned that monitoring and alerting is important stuff.
    • nunez 1359 days ago
      Dead CMOS batteries suck.
  • d_watt 1359 days ago
    Using products like this in the past, I've run into a pretty simple issue:

    - Request 1 is a POST that generates a "todo" with a random id "5y22".

    - Request 2 is a GET for /todo/5y22.

    That works in production, but on replay of the traffic:

    - Request 1 generates a different random id, "86jj".

    - Request 2 is still a replayed GET for /todo/5y22, which now returns a 404.

    How does your tooling handle this nondeterminism in replays?

    • kahrensdd 1359 days ago
      For one thing, we look at both inbound and outbound traffic and treat them separately. That use case looks different depending on whether we are trying to "test" TODO, or the TODO service is a backend that your app relies upon.

      So if you mean that we want to "test" TODO, our analyzer looks for data in subsequent requests that was provided by the system-under-test (SUT) in previous responses. A common example of this is an HTTP cookie. The SUT gives us a session id through the Set-Cookie response header, so in a subsequent request we use the cookie from the app, not the one that was recorded. This is done in a general way to look for tokens.
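
      A stripped-down Go sketch of that substitution (the structures and names are invented for illustration, not our internals):

          package main

          import "fmt"

          // recordedCall is a toy stand-in for one captured request.
          type recordedCall struct {
              method, path, cookie string
          }

          func main() {
              // Recorded traffic still carries the capture-time session id.
              calls := []recordedCall{
                  {"POST", "/login", ""},
                  {"GET", "/profile", "session=recorded-id"},
              }

              live := "" // token handed back by the live system-under-test
              for _, c := range calls {
                  if c.path == "/login" {
                      // A real replay would parse the SUT's Set-Cookie
                      // response header here; we fake it for the sketch.
                      live = "session=live-id"
                  }
                  // Use the token the app just gave us, not the recorded one.
                  if c.cookie != "" && live != "" {
                      c.cookie = live
                  }
                  fmt.Printf("replay %s %s cookie=%q\n", c.method, c.path, c.cookie)
              }
          }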

      Of course nobody is perfect, so we'd love to see your real-world app and test our algorithms against it. :)

    • atombender 1359 days ago
      I don't think that qualifies as non-determinism. This is just dependencies between operations.

      Non-determinism would be, for example, something that's time-sensitive. If some result varies by time, then the only way to test it is to include time as a parameter. This can be complicated if the time variable plays into asynchronous updates (e.g. you want to test that a POST update worked, but it's actually eventually consistent). Caching (e.g. through Varnish or a CDN) would be another thing to make such tests much more complicated.
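
      The standard trick is to inject the clock as a parameter instead of calling time.Now directly, e.g. in Go (a generic sketch, nothing tool-specific):

          package main

          import (
              "fmt"
              "time"
          )

          // Clock abstracts time so a replay can pin it to capture time.
          type Clock interface {
              Now() time.Time
          }

          type fixedClock struct{ t time.Time }

          func (c fixedClock) Now() time.Time { return c.t }

          func isExpired(clk Clock, deadline time.Time) bool {
              return clk.Now().After(deadline)
          }

          func main() {
              captured := time.Date(2020, 8, 1, 12, 0, 0, 0, time.UTC)
              deadline := captured.Add(time.Hour)
              // Deterministic no matter when the test actually runs.
              fmt.Println(isExpired(fixedClock{t: captured}, deadline)) // false
          }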

      Another example is an API that has side effects. For example, a stock trading API might read real-time quotes from another service; a stock trade then alters the next quote.

  • pbiggar 1359 days ago
    This sounds great! We actually discussed doing this at the very start of CircleCI (we had a partnership with an exception handling service, but we never executed on it). Coincidentally, my current company, Dark (https://darklang.com) is based around a similar concept -- using live traffic as an assistant as you're writing code.
    • kahrensdd 1359 days ago
      For sure, we see a lot of synergy with CI systems. One of our alpha customers is using CircleCI (no surprise there). They have an issue where devs deploy services on top of each other in staging and accidentally take it down for their internal users. Speedscale lets them detect that a new build is not a good candidate to deploy to staging.

      Thanks for sharing the info about your project, I am checking it out on GitHub right now. :)

  • billyhoffman 1359 days ago
    Congratulations on the launch, Ken!

    Ken worked as a top SE at New Relic and has been involved in the Atlanta Web Performance meetup for many years. It was a wonderful surprise to see you all on HN.

    • kahrensdd 1359 days ago
      Thank you Billy, we are very excited to be making progress with Speedscale. I miss seeing you and the rest of the crew from the ATL Web Performance meetup.
  • decentCapitalYC 1345 days ago
    Hi Ken, I have been researching your project for a while and would love to talk before Demo Day starts. I will attend next Mon/Tue anyway, but in YC the timing of decisions is critical, and I want more time to dig into Speedscale. I really like how your solution can bring huge value to complicated production releases (how many times does SpaceX test even just combustion chamber pressure before the real launch? :). It seems like a truly systematic solution, given the three pillars you mentioned, and it goes under the hood (vs. recording; first principles win). I trust the 32m antenna lesson was scary enough to stick in memory. :) I have a physics background, and yes, you should definitely learn where the panic button is before poking around a sophisticated lab. :)

    Anyway, I tried emailing founders at your domain but got no response. Can you reply to my email? My handle is lli at my company domain (also in my profile). Would love a quick chat this week to get warmed up before Demo Day.

  • vii 1359 days ago
    Real traffic exposes wrong assumptions in code that cannot be caught with unit or integration testing. Awesome to automate this kind of testing. I encourage people to invest heavily in setting up artificial environments to replay historical data.

    One benefit of the artificial environment is efficiency (it can cover, e.g., a week of historical data); this also requires mocking out time with simulation/replay time. Integration with data platforms makes a big difference, as it means data scientists can help set up the right test scenarios.

  • nserrino 1359 days ago
    This is cool. Is the traffic curated in any way? For example, if the database isn't initialized, do you start with create requests before moving on to GETs for those IDs? Also, does this only support HTTP, or does it support other protocols as well?
    • mleray 1359 days ago
      Ultimately, the idea is to mock the database itself, so we just return whatever the real database returned during the recording. We don't have to run create commands because we aren't actually managing a real database's internal state; we "only" need to accurately return the responses the database gave the system-under-test for a particular GET sequence. During the alpha we are limiting support to HTTP/S, but protocols like MongoDB, Redis, MySQL, etc. are on the backlog. Until we have more database support we're asking alpha customers to deploy test data in a test database, which seems to be a fairly normal part of the CI process for big apps.
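
      At its core the mock is just a lookup from recorded request to recorded response; something like this sketch (the signature format is an assumption for illustration):

          package main

          import "fmt"

          func main() {
              // Responses captured during recording, keyed by a request
              // signature; no live database state is needed on replay.
              recorded := map[string]string{
                  "GET /todo/5y22": `{"id":"5y22","title":"ship alpha"}`,
              }

              if resp, ok := recorded["GET /todo/5y22"]; ok {
                  fmt.Println(resp)
              } else {
                  fmt.Println("no recorded response for this request")
              }
          }
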
  • techdragon 1359 days ago
    I love the look of this. I can't even remember all the times I've run across applications out in the wild that were built without any thought about testing, until long after the original devs are gone. All you have is production traffic and reverse engineering. It's a real pain, and I would love to have more tools to attack this problem when it comes up.

    So how open is that alpha? I’d really love to try this out.

    • kahrensdd 1359 days ago
      Reverse engineering protocols has morphed from a hobby into a full-time job. I've spent a lot of time going through logs, looking at data and knowing that I am not looking at the TCP-level request and response. So we are trying to solve that problem. :)

      Please send a note to hello@speedscale.com so we can get the details of your environment and determine if it is a fit for our alpha.

  • RabbitmqGuy 1359 days ago
    Is this like goreplay[1]? How are you different?

    1. https://github.com/buger/goreplay

    • kahrensdd 1359 days ago
      Yes, I really like goreplay, and it was an early inspiration for our sidecar. It uses gopacket (like tcpdump) to collect data; we currently take the TCP proxy route because we can use TLS libraries for HTTP/S support. There are similarities in that both can collect and replay data.

      One of the early observations from my co-founder Nate was that there are 3 key ingredients to testing in an SOA environment [1]:

      * Automation

      * Dependencies

      * Data

      While goreplay has a form of automation, it doesn't help you with dependencies (no mocking), and the data is either streamed or locked in a special file format. Of course, like any open source project, there are pieces of the solution, but you have to assemble them. For instance there is no UI, no reports, no pass/fail assertions, no integration with CI systems, etc. I'm by no means an expert; there is likely a path to combining it with other open source projects to fill those gaps.

      Our stuff isn't perfect, but we primarily see overlap between the automation capability of Speedscale and the goreplay project.

      [1] https://speedscale.com/2020/02/06/triplethreat/

      (edited formatting)

  • sramam 1359 days ago
    Congratulations on surviving the 32 meter odyssey and living to launch!

    Wondering how you deal with stateful services? You mention an "analyzer process" for external services; what about internal services?

    It seems this would work well at a single service level, but would it also be possible to apply the analysis at both a single service and a group of services? Some form of unit-testing and integration-testing at a service level...

    • kahrensdd 1359 days ago
      Awesome, thank you for the note. Fortunately there was a "stop antenna" button, which saved the day lol.

      We've been down the path of stateful services before and actually reflect the proper state in our responder. Because we control the test that is being played against the system-under-test, we understand the sequence and order of calls that will be made to the downstream system as well.

      In addition, the analysis actually captures all outbound services at once. We are able to identify each separate hostname that is invoked and mock them all out as a group. One of our first alphas was stunned that it auto-mocked 7 backend systems on the first try.
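
      Conceptually the responder is sequence-aware, so the same request can return different recorded responses as the replay advances; a toy version (structure and names invented for illustration):

          package main

          import "fmt"

          type responder struct {
              // Recorded responses in the order the backend produced them.
              responses map[string][]string
              cursor    map[string]int
          }

          func (r *responder) respond(req string) string {
              seq := r.responses[req]
              i := r.cursor[req]
              if i >= len(seq) {
                  return "unexpected call: no more recorded responses"
              }
              r.cursor[req] = i + 1
              return seq[i]
          }

          func main() {
              r := &responder{
                  responses: map[string][]string{
                      // The balance changes between the two recorded calls.
                      "GET /balance": {`{"balance":100}`, `{"balance":75}`},
                  },
                  cursor: map[string]int{},
              }
              fmt.Println(r.respond("GET /balance")) // first recorded state
              fmt.Println(r.respond("GET /balance")) // state after mutation
          }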

  • 2rsf 1359 days ago
    Sounds great. To use this in a financial environment it will need an option for data anonymization, though I'm not sure how you can identify what needs to be anonymized without human interaction.
    • mleray 1359 days ago
      You're spot on :). We got this feedback from one of our financial services alphas, so we built a DLP rules engine to cover it. That wasn't enough. So we offered to integrate with Google DLP. Still no. So in the end we settled on a split-plane architecture (similar to Databricks) so big customers can host their own data while we manage the control stack. It's not something we're doing during the alpha, but it's part of the plan. Would that work?
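
      To give a flavor of what a redaction rule does (a toy example; the real rules engine is more involved):

          package main

          import (
              "fmt"
              "regexp"
          )

          func main() {
              // Mask anything that looks like a 16-digit card number
              // before recorded traffic leaves the customer's side.
              card := regexp.MustCompile(`\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b`)
              payload := `{"card":"4111 1111 1111 1111","amount":42}`
              fmt.Println(card.ReplaceAllString(payload, "****"))
          }
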
      • 2rsf 1358 days ago
        I'm only the end user so I can't judge the details
  • jameslk 1359 days ago
    I had a similar idea, but from the frontend. I think it would solve a few edge cases where frontend logic is needed to complete full state-based interactions, but I think there are now some solutions out there for this. Good to see some traction here though, since automated testing and regression testing are still very difficult and manual to set up and keep updated. Lots of opportunity to make it less painful.
  • ab_goat 1359 days ago
    There's another company that has a bit of an earlier start doing this: ProdPerfect (www.prodperfect.com)

    "Reach new heights with weightless test automation. ProdPerfect is the first autonomous end-to-end (E2E) regression testing solution that continuously identifies, creates, maintains, and evolves E2E test suites via data-driven, machine-led analysis of live user traffic."

    • gkapur 1358 days ago
      These are very different; one is clickstream based and the other sits on the networking layer.
      • ab_goat 1350 days ago
        Whoops! Failed to read past the first paragraph. Also should have added a "similar" in there. Thanks for pointing this out.
  • scraig2020 1358 days ago
    Super excited about this. Data variance is hard to solve, and the bucket of data that you're helping folks access is impressive...
  • mrkurt 1359 days ago
    Can you run tests from different geos?
    • kahrensdd 1359 days ago
      You can run it in your own environment. If you're running Kubernetes, we provide an operator that orchestrates the test runs. If you are using Docker, we give you containers that you control with environment variables.

      Do you have more background on the multi-geo use case?

      • mrkurt 1359 days ago
        We run a multi-geo service (Fly.io). Replicating user load on distributed apps is hard.

        Containers with env vars are easy though!

        • kahrensdd 1359 days ago
          Yes, I recently went through a similar use case with one of my alpha users. They wanted to run the reverse proxy and playback as Docker containers spread through their environment. Will drop you an email with more info...
  • CodeNasty 1359 days ago
    Terrific stuff, and glad this is becoming industrialized at scale. Will keep you guys in mind as we expand.
  • samblr 1358 days ago
    Much needed! Congrats on the launch.
  • nunez 1359 days ago
    Congrats on the launch, y’all!!!
  • scraig2020 1358 days ago
    Sweeet... can't wait to get my hands on it and leverage it.
  • jmartens 1359 days ago
    Love it!
  • pixiemaster 1359 days ago
    does it work for gameserver-style multi-client websocket communication?
    • kahrensdd 1359 days ago
      One of our early inspirations was a video game company that would replay recorded gameplay against new builds of a game server. Our proxy assembles each request and response in the order they were received, even if they belong to different user sessions. For the alpha we are focused on API-type calls over HTTP/S, but a websocket implementation is definitely something we are tracking.
  • jtchang 1359 days ago
    Congrats on the launch!