This article was missing a few major points:
1. He doesn't say whether the RDS DB is Postgres or MySQL. MySQL connection times tend to be much shorter than Postgres connection times. With Postgres, it's a very good idea to use a pgbouncer instance to implement connection pooling and have the Lambdas connect to that.
2. As others point out here, in AWS it is a very bad idea to have your RDS DB outside a VPC. But once the DB is in a VPC, your Lambdas have to run in the VPC too, which basically makes cold start times completely untenable for a user-facing synchronous function. Amazon has promised to improve this situation; I'm not sure what the current status is.
3. The author points out some of this in his article, but caching your connection in global scope opens up a whole host of bugs that are very difficult to track down (e.g. lots of cached connections staying open and exhausting your DB connection limit, needing to handle dead connections, etc.)
In my opinion, and with a lot of experience on a high-traffic consumer site, it's a bad idea to cache connections in global scope. Either use MySQL or pgbouncer and connect in the function body.
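To make "connect in the function body" concrete, here is a minimal sketch. It assumes a Python handler and uses the standard-library sqlite3 module purely as a stand-in so that it runs anywhere; in practice the connect call would point at MySQL or a pgbouncer endpoint. DB_TARGET and the event shape are made up for illustration.

```python
import sqlite3
from contextlib import closing

# Hypothetical stand-in for a MySQL or pgbouncer endpoint (assumption).
DB_TARGET = ":memory:"


def handler(event, context=None):
    # Open the connection inside the handler and close it before returning,
    # so no connection outlives the invocation or lingers in a frozen container.
    with closing(sqlite3.connect(DB_TARGET)) as conn:
        row = conn.execute("SELECT ? + 1", (event["n"],)).fetchone()
    return {"result": row[0]}
```

The trade-off is that you pay the connect cost on every invocation, which is why cheap connects (MySQL) or a pooler in front (pgbouncer) matter here.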
We went down the Lambda route at work, and I think ultimately it's a poor fit for the type of service that is being described here.
Do your persistence elsewhere, probably in an API wrapped around RDS.
SSM or other secrets needing decryption and fetching at runtime should use the AWS recommended method of storing this data in the global scope (at least that's how it works in Node), but this should feel icky because it is icky.
Mixing statefulness into things that are inherently intended to be stateless probably indicates you've chosen the wrong tool for the job.
Each Lambda has some state which, if one is sensible, can be used to cache things. Having a DB connection cached is perfectly sensible, assuming there is some retry logic.
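Roughly what I mean by a cached connection with retry logic, as a sketch rather than production code. It assumes Python and uses sqlite3 as a stand-in driver so it is runnable; get_conn and run_query are hypothetical helper names, and the blind retry is only safe for idempotent statements such as reads.

```python
import sqlite3

# Module scope: survives across warm invocations of the same container.
_conn = None


def get_conn():
    """Return the cached connection, creating it on first use."""
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(":memory:")  # stand-in for the real DB (assumption)
    return _conn


def run_query(sql, params=(), retries=1):
    """Run an idempotent query, reconnecting if the cached connection is dead."""
    global _conn
    for attempt in range(retries + 1):
        try:
            return get_conn().execute(sql, params).fetchall()
        except sqlite3.Error:
            _conn = None  # drop the dead connection so the next attempt reconnects
            if attempt == retries:
                raise
```

With a real driver you would catch that driver's "connection lost" errors specifically rather than every database error.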
Making an API around RDS is just a boatload of tech debt when it's trivial to cache a connection object. Not only that, it will add latency, especially as the go-to API fabric of choice is HTTP. So a 20ms call (worst case) can easily balloon up to 80ms (average).
It's now possible to have SSM secrets referenced directly in Lambda environment variables, so I don't see the need for the complications of doing it yourself.
Lambda is just cgi-bin, for the 21st century. There is nothing magic to it.
After they fix VPC cold starts (the ENI issue) I hope we get Lambda lifecycle events: onDestroyed, onFrozen, onThawed, etc. You kind of get onCreated now if you put the code into global scope, but that's gross.
Yeah, I should have been clearer, it's not that it can't be done, it's just a bit messier than it seems at first blush.
The issue is that actions taken by stateful clients can't always be retried.
You can detect these kinds of errors, but it's kludgy because these drivers generally assume that sockets dying is relatively uncommon, and it's painful to test this behavior when FaaS doesn't give you direct control over it.
So you typically wind up either with a proxy, or you decorate any handlers to refresh the connection ahead of time to make sure you have a fresh object.
Agree, at the moment Lambda doesn't quite make it for directly calling RDS DBs, because of the connection problem and because your DB should be in a VPC, which has horrendous cold start implications for Lambda (it needs to attach an ENI in the invoke path, which takes many seconds).
Re: the connection issue, it would be great if there were a service or RDS feature that could do the DB connection pooling behind a standard AWS API, so that we don't have to manage connections in Lambda and so that we can query the DB from the public internet with standard AWS auth without having to expose the DB itself.
Why should the DB be in a VPC? I've seen their architecture guidance recommend that, and it often gets turned into gospel in projects. I wish AWS was culturally more pro-internet. VPCs, bastion hosts, etc. create more complexity and work, the cost of which could be more productively invested in other security.
Lambda is an incredibly interesting architecture, but it's pretty rare to find a use case that makes sense given its inherent limitations. There's always going to be a cold start; even if they manage to optimize the runtime cold start to near-zero, functions still have stuff they gotta do once they start.
Google's recent Cloud Run service feels like a much more generally useful serverless platform. Even AWS Fargate, despite not having the whole "spin the containers down when it's not serving requests" feature, is what I envision most customers adopting for serverless in the medium term (especially with the recent huge price reduction, it's actually competitive with EC2 instances now).
AWS needs to get Fargate support added to EKS. Stat. It was promised 18 months ago at re:Invent and is still listed in their "In Development" board on GitHub. That'll change the game for Kubernetes on AWS, because right now it's kind of rough.
A predictable near-zero cold start would be amazing for most applications. Currently, in the most extreme cases, we've seen 20-second cold starts, which just makes it seem like it's not ready for production.
Yes, but each database instance has a limited number of connections that can be open. You can use Lambda to handle as many web requests as you want, but if each Lambda invocation is creating a new database connection you've only shifted your bottleneck one layer down the stack.