One of the biggest reasons we left Aurora for a plain RDS solution was that basic DDL operations, such as table alters, would fail on databases larger than 2 TB: such operations required double the disk storage for the table copy. If we wanted to stick with Aurora, we were forced to tools like GitHub's gh-ost or Percona's pt-online-schema-change. After thoroughly testing these tools, we found it would take close to a month to perform such an operation on a large table, whereas plain RDS only took a few days. I recommend a pure RDS solution for databases larger than 1 TB.
The big reason that DDL is slow is that these systems haven't tried to make it fast.
This is, of course, a contrary opinion, so hear me out before judging me ;)
My thinking is thus:
There are lots of virtual storage engines in the MySQL world, such as 'federated', 'spider', and 'union'. These abstract away more than one data source, and the engine is often smart enough to cope when the data sources don't have identical schemas.
These virtual storage engines demonstrate that a layer of abstraction can cast queries across more than one non-identical table.
So, either built in or as a storage engine abstraction, databases _could_ support DDL changes by putting rows with each version of the schema into genuinely separate tables and casting queries across them transparently.
Another approach is the one taken by the Postgres engine, where each row carries a schema version, so some DDL, such as adding a column with a NULL default, can be done instantly. (With a bit more thought, even non-NULL defaults could have been handled instantly; it's a shame they weren't.)
I poke around in MySQL storage engine code, among other things, and it's entirely doable. It just needs to be a goal. I wish it were on my day-job list of things I can work on. I hope someone else has an itch that needs scratching.
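The "rows live in per-schema-version tables, with queries cast across them" idea above can be sketched with an ordinary view. This is only an illustration of the technique, using SQLite and made-up table/column names, not anything MySQL's engine API actually provides today:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Version 1 of the schema: no "email" column.
cur.execute("CREATE TABLE users_v1 (id INTEGER, name TEXT)")
cur.execute("INSERT INTO users_v1 VALUES (1, 'ada')")

# Version 2, after a hypothetical instant ADD COLUMN email: new rows land here,
# and no existing row in users_v1 is touched.
cur.execute("CREATE TABLE users_v2 (id INTEGER, name TEXT, email TEXT)")
cur.execute("INSERT INTO users_v2 VALUES (2, 'bob', 'bob@example.com')")

# The abstraction layer: one view pads old rows with the column default
# (NULL here), so readers never see that there are two tables.
cur.execute("""
    CREATE VIEW users AS
    SELECT id, name, NULL AS email FROM users_v1
    UNION ALL
    SELECT id, name, email FROM users_v2
""")

rows = cur.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'ada', None), (2, 'bob', 'bob@example.com')]
```

A real engine would also have to route writes to the newest version and fold old versions back in during background compaction, but the read side really is just this kind of union-with-casting.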
Hey stevev, I know it's been a while since you made this comment, but could you provide further details on the tool and the benefits you found from the switch? I'm currently on Aurora MySQL with hundreds of GB, and I'm feeling it in a not-great way. I'd love to get a feel for a better path forward.
Interesting, that's a surprising oversight. We've just moved our system to MariaDB on RDS; I'm interested to see the performance, and hopefully I can convince some others to give Aurora a try.
A simple table alter, such as adding or modifying a column on a big table, would break once the instance reached its maximum disk storage, because the alter copies the table.
These solutions offered a way to alter the table without using all the disk space available on the instance, thus bypassing the storage issue.
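The core trick behind online schema change tools like pt-osc and gh-ost is to build the altered table alongside the original and copy rows over in small primary-key-ranged batches, so the work proceeds incrementally. Here is a rough sketch of just that copy loop, using SQLite and illustrative names; the real tools also replay concurrent writes via triggers or the binlog, which this sketch omits entirely:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, payload TEXT)")
cur.executemany("INSERT INTO big VALUES (?, ?)", [(i, "x") for i in range(1, 1001)])

# Shadow table with the desired new schema (an extra column).
cur.execute(
    "CREATE TABLE big_new (id INTEGER PRIMARY KEY, payload TEXT, flag INTEGER DEFAULT 0)")

# Copy in primary-key-ordered chunks instead of one giant INSERT ... SELECT,
# so each step is small and can be throttled.
BATCH, last_id = 100, 0
while True:
    rows = cur.execute(
        "SELECT id, payload FROM big WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH)).fetchall()
    if not rows:
        break
    cur.executemany("INSERT INTO big_new (id, payload) VALUES (?, ?)", rows)
    last_id = rows[-1][0]

# Cutover: swap the shadow table in under the original name.
cur.execute("DROP TABLE big")
cur.execute("ALTER TABLE big_new RENAME TO big")

count, = cur.execute("SELECT COUNT(*) FROM big").fetchone()
print(count)  # 1000
```

Note this still needs roughly a second copy's worth of space for the shadow table while it runs; what the batching buys you is control over pacing and the ability to pause and resume.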
Yes, Aurora has connection pooling that allows it to scale to thousands of concurrent connections:
> Amazon Aurora for MySQL comes with internal server connection pooling and thread multiplexing, which helps reduce contention and improve scalability when dealing with thousands of concurrent connections. You can configure each Aurora DB instance to allow up to 16,000 concurrent connections.
Aurora is still ultimately a standard cluster of MySQL (or other) instances on the front end — it's just the storage engine that's significantly different.
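The pooling quoted above happens inside Aurora's server side, but the general idea is the same as any connection pool: hand a fixed set of connections out to many callers. As a rough, hand-rolled illustration only (using SQLite connections as stand-ins for real database connections):

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal fixed-size connection pool (illustrative, not production code)."""

    def __init__(self, size, factory):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        # Blocks when every connection is checked out, which is what caps
        # the number of concurrent connections at the pool size.
        return self._pool.get()

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(4, lambda: sqlite3.connect(":memory:"))
conn = pool.acquire()
result, = conn.execute("SELECT 1").fetchone()
pool.release(conn)
print(result)  # 1
```

In practice you would use an established pooler (e.g. your driver's built-in pool, or a proxy such as RDS Proxy) rather than rolling your own; the sketch just shows why pooling reduces contention when thousands of clients share far fewer real connections.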
So, blame the DB designers!
Within Amazon's Aurora and RDS product families (https://aws.amazon.com/rds/), there are
* Amazon Aurora MySQL
* Amazon Aurora PostgreSQL
* Amazon Aurora Serverless
* Amazon RDS for MySQL
* Amazon RDS for MariaDB
* Amazon RDS for Oracle
* Amazon RDS for SQL Server
plus the variations in different versions (e.g. Amazon RDS for MySQL supports 5.5, 5.6, 5.7, and 8.0).
Without knowing that, it's hard to understand how you solved your problem. E.g., if you switched from Aurora MySQL to RDS for MySQL, I expect you'd still need to use a tool like pt-osc and copy your tables.
Thanks!
What do you mean by this?
These solutions offered a way to alter the table without using all the disk space available on the instance, thus bypassing the storage issue.
https://dl.acm.org/citation.cfm?id=3196937
https://aws.amazon.com/blogs/database/amazon-aurora-under-th...
https://aws.amazon.com/blogs/database/amazon-aurora-under-th...
https://aws.amazon.com/blogs/database/amazon-aurora-under-th...
https://aws.amazon.com/blogs/database/amazon-aurora-under-th...
> Amazon Aurora for MySQL comes with internal server connection pooling and thread multiplexing, which helps reduce contention and improve scalability when dealing with thousands of concurrent connections. You can configure each Aurora DB instance to allow up to 16,000 concurrent connections.
https://aws.amazon.com/blogs/database/planning-and-optimizin...
Most of the connections were idle too.
Any pooling they are doing is not very effective.
The product is pretty disappointing.
From the docs, the largest instance size defaults to the recommended maximum of 6,000 connections (16k hard maximum) per server (and only one writer server plus 15 read replicas): https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...