Design Considerations for High-Throughput Cloud-Native RDBMS [pdf]

(allthingsdistributed.com)

90 points | by yarapavan 1925 days ago

4 comments

  • stevev 1925 days ago
    One of the biggest reasons we left Aurora for a plain RDS setup was that basic DDL operations like table ALTERs would fail on databases larger than 2 TB. Such operations required double the disk storage for the copy. If we wanted to stick with Aurora, we were forced into tools like GitHub's gh-ost or Percona's pt-online-schema-change. After thoroughly testing those tools, we found such an operation would take close to a month on large tables; plain RDS only took a few days. I recommend a pure RDS setup for databases larger than 1 TB.
    • willvarfar 1925 days ago
      This has always frustrated me.

      The big reason that DDL is slow is because these systems haven't tried to make it fast.

      This is, of course, a contrary opinion, so hear me out before judging me ;)

      My thinking is thus:

      There are lots of virtual storage engines in the MySQL world, such as FEDERATED, Spider, and MERGE. These abstract away more than one data source, and often the engine is smart enough to cope when the data sources don't have identical schemas.

      These virtual storage engines demonstrate that a layer of abstraction can cast queries across more than one non-identical table.

      So, either built in or as a storage-engine abstraction, databases _could_ support DDL changes by putting rows for each version of the schema into separate physical tables and transparently casting queries across them.
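      The idea above can be sketched with a plain view over per-version tables (using sqlite3 here for convenience; table and column names are invented for illustration):

```python
import sqlite3

# Hypothetical sketch: rows for each schema version live in their own physical
# table, and a view "casts" them into the newest schema, so adding a column
# never rewrites the old rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_v1 (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE orders_v2 (id INTEGER PRIMARY KEY, amount INTEGER, currency TEXT);

    -- the abstraction layer: one logical table spanning both schema versions
    CREATE VIEW orders AS
        SELECT id, amount, 'USD' AS currency FROM orders_v1  -- default filled on read
        UNION ALL
        SELECT id, amount, currency FROM orders_v2;          -- rows written post-ALTER
""")
conn.execute("INSERT INTO orders_v1 VALUES (1, 100)")
conn.execute("INSERT INTO orders_v2 VALUES (2, 250, 'EUR')")
print(conn.execute("SELECT id, currency FROM orders ORDER BY id").fetchall())
# → [(1, 'USD'), (2, 'EUR')]
```

      New writes would go to the newest versioned table, and a background job could gradually migrate old rows if desired.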

      Another approach is the one taken by the Postgres engine, where each row carries a version, so some DDL, such as adding a column with a NULL default, can be done instantly. (With a bit more thought, even non-NULL defaults could have been handled instantly; it's a shame they weren't.)
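      The per-row-version idea can be sketched like this (all names invented; this is a toy model of the technique, not Postgres code): columns added after a row was written are filled in from that version's default at read time, so no table rewrite is needed.

```python
# Toy sketch of per-row schema versioning: each stored row records the schema
# version it was written under; columns added in later versions are supplied
# from per-version defaults when the row is read.
DEFAULTS_BY_VERSION = {
    2: {"currency": "USD"},  # v2 added `currency` with a default
    3: {"discount": 0},      # v3 added `discount` with a default
}

def materialize(stored_row, row_version, current_version=3):
    """Return the row as it looks under the current schema."""
    row = dict(stored_row)
    for v in range(row_version + 1, current_version + 1):
        for col, default in DEFAULTS_BY_VERSION.get(v, {}).items():
            row.setdefault(col, default)
    return row

print(materialize({"id": 1, "amount": 100}, row_version=1))
# → {'id': 1, 'amount': 100, 'currency': 'USD', 'discount': 0}
```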

      So, blame the DB designers!

      • ddorian43 1925 days ago
        From a high level it looks doable, but once you go deep into the C/C++, it's not.
        • willvarfar 1925 days ago
          I poke around in the MySQL storage engine code, among other things, and it's entirely doable. It just needs to be a goal. I wish it were on my day job's list of things I can work on. I hope someone else has an itch that needs scratching.
    • sciurus 1925 days ago
      What flavor of Aurora were you using, and which RDS engine and version did you switch to?

      Within Amazon's Aurora and RDS product families (https://aws.amazon.com/rds/), there are

      * Amazon Aurora MySQL

* Amazon Aurora PostgreSQL

      * Amazon Aurora Serverless

      * Amazon RDS for MySQL

      * Amazon RDS for MariaDB

      * Amazon RDS for Oracle

      * Amazon RDS for SQL Server

      plus the variations in different versions (e.g. Amazon RDS for MySQL supports 5.5, 5.6, 5.7, and 8.0).

      Without knowing that, it's hard to understand how you solved your problem. E.g., if you switched from Aurora MySQL to RDS for MySQL, I'd expect you'd still need to use a tool like pt-osc and copy your tables.

      • stevev 1924 days ago
        We were on Aurora MySQL at the highest tier available. We switched to RDS MySQL 5.7, using AWS's database migration tool to move over.
        • hnmonkey 1923 days ago
          Hey stevev, I know it's been a while since you made this comment, but could you provide more details on the tool and the benefits you saw from the switch? I'm currently on Aurora MySQL with hundreds of GB and am feeling it in a not-great way. I'd love to get a feel for a better path forward.

          Thanks!

    • ravedave5 1925 days ago
      Interesting, that's a surprising oversight. We've just moved our system to MariaDB on RDS; I'm interested to see the performance, and hopefully I can convince some others to give Aurora a try.
    • tirumaraiselvan 1925 days ago
      > We were forced into tools like GitHub's gh-ost or Percona's pt-online-schema-change if we wanted to stick with Aurora

      What do you mean by this?

      • stevev 1924 days ago
        A simple table alter, such as adding or modifying a column on a big table, would break once the instance reached its maximum disk storage, because of the copy.

        These tools offered a way to alter the table without using all the disk space available on the instance, thus bypassing the storage issue.
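        Roughly, those tools work by building a "shadow" copy of the table and swapping it in. A minimal sketch of that approach (using sqlite3 and invented names; the real tools also replay concurrent writes via triggers or the binlog, which is omitted here):

```python
import sqlite3

def shadow_table_alter(conn, batch=1000):
    """Sketch of the shadow-table schema change used by gh-ost/pt-osc-style
    tools: create the altered table alongside the original, backfill it in
    small, throttleable batches, then atomically swap the names."""
    cur = conn.cursor()
    # the "altered" version of table t: a new column b with a default
    cur.execute("CREATE TABLE t_new (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER DEFAULT 0)")
    last = -1
    while True:
        rows = cur.execute(
            "SELECT id, a FROM t WHERE id > ? ORDER BY id LIMIT ?", (last, batch)
        ).fetchall()
        if not rows:
            break
        cur.executemany("INSERT INTO t_new (id, a) VALUES (?, ?)", rows)
        last = rows[-1][0]
    # real tools keep t_new in sync with ongoing writes before this swap,
    # and drop (or archive) the old table afterwards
    cur.execute("ALTER TABLE t RENAME TO t_old")
    cur.execute("ALTER TABLE t_new RENAME TO t")
    conn.commit()
```

        Because the copy happens in small batches under the tool's control, it can be paced, paused, and resumed, unlike a single monolithic in-place ALTER.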

  • ignoramous 1925 days ago
    The Aurora team at Amazon subsequently published Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

    https://dl.acm.org/citation.cfm?id=3196937

  • tirumaraiselvan 1925 days ago
    No mention of connection pooling? Does Aurora handle thousands of concurrent connections, considering it's cloud-native?
    • wgjordan 1925 days ago
      Yes, Aurora has connection pooling that allows it to scale to thousands of concurrent connections:

      > Amazon Aurora for MySQL comes with internal server connection pooling and thread multiplexing, which helps reduce contention and improve scalability when dealing with thousands of concurrent connections. You can configure each Aurora DB instance to allow up to 16,000 concurrent connections.

      https://aws.amazon.com/blogs/database/planning-and-optimizin...

      • etaioinshrdlu 1925 days ago
        I had an issue where Aurora MySQL would refuse connections above 1,000 even though max_connections was configured several times higher.

        Most of the connections were idle too.

        Any pooling they are doing is not very effective.

        The product is pretty disappointing.

    • kondro 1925 days ago
      Aurora is still ultimately a standard cluster of MySQL (or other) instances on the front end — it's just the storage engine that's significantly different.

      From the docs, the largest instance size defaults to the recommended maximum of 6,000 connections (16k hard maximum) per server (and only one writer server plus up to 15 read replicas): https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

  • hdpq 1925 days ago
    Curious: why is this paper only now making the front page of this site? It was published several years ago.
    • kondro 1925 days ago
      Probably renewed interest caused by the Mongo/DocumentDB drama.