Serious flaws in SQL (1990)

(dl.acm.org)

56 points | by lioeters 9 days ago

10 comments

  • forinti 9 days ago
    SQL needs to make you add a where clause to update and delete.

    If you really want to delete or update everything, then you should have to state it like this:

       delete all rows from customers;
    
       update all rows from items set price=0;
    • masto 9 days ago
      I spent a past life in a world where it was deemed necessary to perform this kind of interactive database surgery, and I used two techniques to try to mitigate the risk: I had my psql client set to disable autocommit (some people fear this because of leaving locks around, but you pick your poison), and my default user didn't have write access. Neither are perfect, but I ended up not being the person who accidentally deleted a whole table because, as they had explained to me with an annoying "I'm so macho" attitude, they always typed the semicolon first.
      • dools 9 days ago
        I had some similar strategies:

        1) Write the WHERE clause first

        2) Run a SELECT on that WHERE clause

        3) Edit the query to make it either update or delete as required

        Similarly, I only ever perform recursive filesystem commands using find and xargs. I run the find first, then hit the up arrow and pipe it to whatever I want to do (I learned that one from a very patient sysadmin after I accidentally chmod'd -R / 664)

        • arscan 9 days ago
          Been a few decades, but I was in a similar situation and came up with a similar approach. Run the select first and then swap in an update or delete.

          If the query was somewhat involved to determine what to delete, I’d write the select query to just select row IDs of rows for deletion (assuming a simple primary key), then wrap that with a “delete from table where ID in (select ID from …complex query…)”.

    • petepete 9 days ago
      MySQL has (or at least had) --i-am-a-dummy-mode which means you have to specify the rows or include a limit.

      https://dev.mysql.com/doc/refman/8.0/en/mysql-tips.html#safe...

      • jimbob45 9 days ago
        Had no idea this existed before this morning. T-SQL devs, if you're out there, please add something like this to MSSQL.
    • est 9 days ago
      Or add a mandatory `where` for delete and update queries.

      To delete/update all, add `where 1=1`

      --------------------------------

      p.s. In mysql I always add `limit 1` for every update. If `limit 99999` isn't enough you are probably doing it wrong.

      • chuckadams 9 days ago
        Jetbrains IDEs add a warning to updates or inserts without a "where" clause, and asks for confirmation when you try to run them.
      • jandrese 9 days ago
        Or just allow "limit unlimited" for those cases where you are working with lots of rows.
    • ZiiS 9 days ago
      Don't think it needs a new syntax, just that the WHERE clause is mandatory:

         DELETE FROM customers WHERE TRUE;
      • tonnydourado 7 days ago
        My first thought is that "where TRUE" is just a wonky thing to have to write, but on further reflection, maybe that's kinda the point. Where-less deletes should be harder to write, making it look stupid will probably help.
    • tomrod 9 days ago
      Most systems I work in raise a warning when an update or insert is missing a where clause.

      For all rows, we typically just use `where 1=1` to encompass all rows. Allows for cleaner readability too as each clause can be separated by an AND

    • 24t 9 days ago
      I find myself caught out more often by incomplete where clauses — paranoid dry runs with rollbacks have saved me more than once.

      Also fwiw DataGrip asks for confirmation before running an update or delete without a where clause

    • mrkeen 9 days ago
      I wouldn't mind if SELECT was guaranteed read-only as well. (rename SELECT INTO).
      • acimim_ha 9 days ago
        "SELECT * INTO table" is not ANSI standard.
    • randomdata 9 days ago
      The original relational calculus, created by the same person who wrote the linked paper, did not allow unbounded deletes. One has to wonder how SQL, which was based on his work, managed to screw things up so badly. These were already solved problems before SQL was created.
    • xnorswap 9 days ago
      I agree preventing DELETE and UPDATE without predicates would be a useful server configuration.

      And you don't even need special syntax, it just forces someone to write DELETE * FROM [customers] WHERE 1=1 to satisfy the need for a predicate.

    • ziml77 9 days ago
      I have Red Gate's SQL Prompt extension installed in SSMS. If I try to execute a delete or update without a where clause, it asks me to confirm that's what I want to do. It also clearly indicates in the editor that there's a problem. If I'm sure that's what I want to do and I want to avoid being warned I just add WHERE 1 = 1 to the end.

      It's a nice feature that adds a little bit of comfort (though I'm still wary enough to hesitate before running SQL data modifications against production)

    • Sakos 9 days ago
      I don't think this is something that can be solved at the language level. What's missing is a semantic, decently intelligent interface that can ask "hey, this is gonna delete everything in the table, is that what you wanted?" and it feels like we're slowly moving in that direction with the recent developments in AI.
    • cess11 9 days ago
      The difference in impact between

        DELETE * FROM tbl 
      
      and

        DELETE * FROM tbl WHERE id > 1
      
      isn't enough to warrant a change to the standard in my opinion.

      Data loss and corruption risks need other forms of mitigation and they'll cover the unconstrained cases as well.

    • bazoom42 9 days ago
      How about having sql statements default to “dry run”, only executing if an additional keyword (like “execute”) is added.
      • byteknight 9 days ago
        That becomes a non-boundary because you'd write it every time.
      • pak9rabid 9 days ago
        Isn't this something that explicit transactions (BEGIN/COMMIT/ROLLBACK) do? Many DB clients can be configured to have autocommit disabled, so that you're forced to either COMMIT or ROLLBACK each transaction.
  • mehulashah 9 days ago
    There was a lot of criticism of SQL back in the day, but it eventually became “inter-galactic data speak” as Stonebraker dubbed it. It’s hard to fight gravity when new languages are adopted.
    • randomdata 9 days ago
      There still is, but SQL is mostly used as a compiler target these days, with users using higher level languages (e.g. "ORMs") to hide the nastinesses. As such, its flaws don't impact all that many people.

      Which is a shame as it needn't be that way. A well designed relational calculus could avoid these abstractions that come with their own problems.

      • pulisse 9 days ago
        > SQL is mostly used as a compiler target these days, with users using higher level languages

        This is not remotely true of analysts and data engineers, who probably write most of the world's SQL.

        • megadal 9 days ago
          Have you seen the queries an ORM generates? Pretty sure they have data engineers beat in query volume, and there's more web developers using ORMs than data engineers.
          • randomdata 9 days ago
            But of the SQL queries written by people, those who write SQL queries write most of the world's SQL queries.
            • cies 9 days ago
              I think pulisse meant "compared to SQL queries written by proxy (e.g. jooq, linq, ORMs, etc.)

              I'm not sure it's true.

      • refset 9 days ago
        Relevant context & reading:

        > for most of the history of sql we did not know how to translate it to relational algebra, and now that we do know most databases still don't do it.

        https://www.scattered-thoughts.net/writing/unexplanations-sq...

        > None of this could be expressed in the original relational algebra, and once you add it all it's not obvious we should even still be calling this an algebra, let alone granting it any mathematical mystique. I'd settle for calling it the 'sql calculus'. Or 'the algebra formerly known as relational'.

        > It is still a reasonably good compiler IR though, and that's still the most useful way of thinking about it.

        https://www.scattered-thoughts.net/writing/unexplanations-re...

      • civilized 9 days ago
        I don't think a single one of these claims is true:

        1. SQL is not mostly used as a compiler target these days.

        2. ORMs do not hide SQL nastiness.

        3. Relational calculus, however well-designed, does not manage to avoid the practical problems of data management.

        • randomdata 9 days ago
          Leaving the following claims that you think are true:

          1. There is still a lot of criticism about SQL.

          2. Users use higher level languages (e.g. "ORMs") [without involving SQL?]

          3. Its flaws don't impact all that many people.

          4. It is a shame [that SQL is mostly used as a compiler target] as it needn't be that way.

          5. A well designed relational calculus could avoid these abstractions that come with their own problems.

          To be fair, it could be that you don't know what you think about those claims, but since, as the original link points out, SQL does not have a "don't know" representation...

          • smaudet 9 days ago
            > 1. There is still a lot of criticism about SQL.

            And rightly so, while you can write whole programs inside it, the debugger/tooling/syntax for most if not all implementations is some mix of inferior/more arcane/legitimately worse.

            > 2. Users use higher level languages (e.g. "ORMs") [without involving SQL?]

            In certain fields (web dev in particular) with "Enterprise" development mindsets, this is often true. ORMs make it eas(y|ier) to hide your SQL.

            At least, until you screw something up and end up either rolling back (expensive) or writing various code snippets (often in SQL) to undo changes. So even in the most common case, no you end up writing SQL if you are doing anything non-trivial or making trivial mistakes.

            As an aside, this more or less lines up with my experience, any time there's an additional target (SQL, Assembly, IL, make files from cmake, visual studio projects from random build system, autotools), usually the higher level tool doesn't save me from having to debug the lower level one. It usually just gets in my way and I curse the person who was "making my life easier".

            Source code in a common language does an unusually good job at this, actually, its a miracle I don't usually need to bust out reversing tools or stare at processor assembly. But these are the popular, well-trodden-path tools, the fringe ones usually are not worth the trouble. ORMs are just barely worth it, IMO.

            > 3. Its flaws don't impact all that many people.

            Hard disagree - I posit most tools worth using are more complicated than the "easy" case, which is a dictionary - inputs and outputs. A "tool" usually has several dictionaries, or several relational tables. So after your third relation, it becomes very easy to miss-design queries (maybe the tooling has just got way better? But that's been my experience, at least).

            And then, if its affecting most tools, its affecting those people who use those tools. So I'd argue the effect is actually very common, if only underappreciated ("I upgraded my PC to be 10x faster and things are fast now!" - used to be a common refrain, that's less true each year).

            > 4. It is a shame [that SQL is mostly used as a compiler target] as it needn't be that way.

            Agreed. Although normally in ORM situations, you'll find the complex/difficult queries hard coded (with parameters of course), so it is used (as not just a compiler target?).

            > 5. A well designed relational calculus could avoid these abstractions that come with their own problems.

            Maybe? Despite complaining, I'm not a database guru, however if you've been paying attention to AI at all, graph databases seem to appear to be superior, in function and form. Relational databases are good enough for many things, but I don't think they are the form most natural for data, just what we as humans like to see as reports.

            • da_chicken 9 days ago
              > Despite complaining, I'm not a database guru, however if you've been paying attention to AI at all, graph databases seem to appear to be superior, in function and form. Relational databases are good enough for many things, but I don't think they are the form most natural for data, just what we as humans like to see as reports.

              This is the same line the NoSQL crowd trotted out 20 years ago, complete with "they don't need joins!" as one of the commonly listed pros.

              The issue is that AI doesn't care about data integrity and consistency the way, say, a medical records database, financial database, or similar database does. Social media is largely the same way. Who cares if a random comment, post, or vote is lost? It's not ideal, but specific facts are not that important. Meanwhile, a lot of RDBMSs do store data where every fact in every field of every row is critical.

              We end up right back at the same point we were at with NoSQL. These alternative data stores make sense for their special purposes, but for general-purpose data storage of objects you can define the important properties of, the RDBMS works extremely well. The RDBMS is one of the oldest and most heavily tested technologies in all of computing. It's extremely unlikely that it will end up being replaced.

              • smaudet 9 days ago
                > Meanwhile, a lot of RDBMSs do store data where every fact in every field of every row is critical.

                Excellent point, however how many applications are truly critical? I didn´t say DMBS (R or otherwise) are useless - I would probably reach for one before a graph database. However I´m debating that we should all use relational algebra instead of SQL. It probably doesn´t really matter for most cases.

              • randomdata 9 days ago
                > The RDBMS is one of the oldest and most heavily tested technologies in all of computing.

                Im not so sure about that. Not since Postgres switched to SQL in 1995 has there even been any notable RDBMS in use.

                As Codd points out in the link, SQL is not relational. He literally invented the relational model, so he is kind of the authority.

                Maybe SQL is even better, but “R” it is not.

        • cies 9 days ago
          > 2. ORMs do not hide SQL nastiness.

          This is certainly true: the do not hide nastiness, they simply make dealing with the db more (much more) nasty.

          I mean: ORMs are now well known to "make the easy queries slightly more easy, while making intermediate queries really hard and complex queries impossible". So for anything slightly complex you still need to break out of your ORM to... yes... plain old SQL.

          I think the are of ORMs is over. It simply did not deliver.

          If a book on SQL is --say-- 100 pages, a book on Hibernate is 400 pages. So much to learn just to make the easy queries slightly easier to type? Just not worth it. You still need to know SQL if you use an ORM!

          I prefer jooq any day over ORMs. And dont get me started over what tools like Hasuna have to offer.

          There are also some languages (forgot the names) that are SQL-done-right. Select in the back, more type safe, more logic, more in the same steps as the query gets executed. These need to be adopted by PG and MySQL and we're good to go. (IMHO)

          https://www.jooq.org/

          https://hasura.io/

          • randomdata 9 days ago
            > the do not hide nastiness

            They do, though. Some of the problems Codd speaks of are not made possible in the high level language. Depends on implementation, of course, but as a rule.

            I mean, think about it: If you were building an ORM, why would you make the very same design mistakes?

            That's not to say you won't make all kinds of your own design mistakes (there is no ORM that isn't full of design mistakes), but I mean why make the design mistakes that are already well studied and you know to avoid?

      • Swizec 9 days ago
        > with users using higher level languages (e.g. "ORMs") to hide the nastinesses

        Maybe I’ve been doing it wrong my whole career because every project I’ve worked with the engineers eventually said ”Screw this leaky abstraction. Having to know SQL _and_ how the ORM works on top is too much, we’ll just put everything in an ORM.rawQuery(…)”

        • magicalhippo 9 days ago
          We prefer writing our own SELECTs so we have more predictable performance, also it makes debugging production issues easier.

          A poor ORM-generated query might be fast enough during testing but crashes and burns when it hits client's database. Not us, but a perfect example was the recent story discussed here[1] on optimizing an ORM-generated query to get 1200x speedup.

          [1]: https://news.ycombinator.com/item?id=40069583

        • hn_throwaway_99 9 days ago
          100%. I've come to believe that essentially every software engineer, if they write software long enough, comes to despise ORMs. As I always say "They make the easy stuff a little easier, and they make the harder stuff way harder". I have yet to see an ORM actually save me time in the long run; most of the time it just feels like I'm battling it.

          SQL definitely has its warts, but I don't feel like ORMs make it possible to ignore those warts at all. I'd love it if a better language like PRQL was accepted by DB engines and just compiled to SQL in the DB. Was a good blog post a couple years ago that I think highlights a lot of problems with ORMs and query builders: https://gajus.medium.com/stop-using-knex-js-and-earn-30-bf41...

          • smaudet 9 days ago
            I too think ORM was the wrong approach. I still enjoy SQLite - its not perfect, but every time I see it being used I get the warm and fuzzies.

            We should have just built better libraries around using/targeting one flavor (SQLite, e.g.), instead of wasting time with bad libraries that transformed one bit of bad code to another bit of bad code and ran it on a database engine that wasn't even doing it right...

            When you solve a problem, you solve it at its source, not five miles away with a teleprompter. If relational algebra is what we should all have been using, then relational algebra is what we should have been optimizing for, not overpriced SQL database engines.

            To shamelessly steal from another comment here: https://www.scattered-thoughts.net/writing/unexplanations-re... https://www.scattered-thoughts.net/writing/unexplanations-sq...

      • jandrewrogers 9 days ago
        I think you greatly overestimate how many people use ORMs instead of SQL. SQL usage is ubiquitous in my experience.

        That said, even SQL databases usually don't run SQL. They run a virtual machine against which SQL is compiled. A more enlightening question is why there are no alternative/better query languages compiled against the same target that SQL targets since that would be strictly additive.

      • vkazanov 9 days ago
        Unfortunately, this is wrong. Sql is everywhere in databases.

        I think it's horrible, it's an endless torture of the beautiful-beautiful relational algebra, but it is more popular than ever and there no way around it.

      • tpm 9 days ago
        It does not matter how well designed it would be, it would still remain a compile target for the folks of 'everything we write is in _one_language_'. You could expect a beginner Java developer to also write proper SQL, but we don't do that anymore, which leads us to suboptimal SQL written by ORMs. I don' really blame anyone, it's a natural development and can't be changed. People use tools they don't understand all the time.
      • dools 9 days ago
        Every time I've done a project using an ORM, I wound up either writing SQL directly or spending a long time trying to make the ORM do what I knew the SQL should do.
      • elevation 9 days ago
        Could you describe the features would you add to a data query language that would eliminate the need for an ORM?
        • randomdata 9 days ago
          To be clear, we're talking about what would more reasonably be called a query builder, but which for some reason has taken on the "ORM" monicker. Ever since duality views were invented, it's questionable how relevant ORMs (in the original sense) even are – maybe as a remaining stop gap for working with DB engines that haven't yet acquired such a feature.

          I'm not sure if "add" is the right framing, but I posit the reason people reach for query builders/ORMs is:

          1. Because SQL is not reasonably composable.

          2. Because SQL syntax is horrid.

          3. Because SQL implementations are terribly inconsistent.

          Adding some layer of protection above the pitfalls Codd speaks to in this paper is an additional benefit, but I don't expect that is a driving motivator; still something to address. I'm probably forgetting other things, but something that tackles those issues would go a long way in reliving any desire to add another abstraction, with its own set of problems, on top.

  • jroseattle 9 days ago
    Way back in the day, I picked up a cheap ($5!!!) copy of Readings in Database Systems from Michael Stonebraker. I found it fascinating to read the original papers that proposed concepts, and then to see those concepts implemented and become the norm.

    What I didn't expect was the amount of drama within the context of those papers. In Codd's original paper on relational theory in the 60s, he spends a chunk of time dedicated to talking about IBM as "the man". Hilarious.

    • commandlinefan 9 days ago
      > concepts implemented and become the norm

      I had a coworker some time in the late 90's who was reading the book "Design Patterns". We were working in Java at the time and he remarked how odd it seemed to see a book that was targeted at C++ programmers use terms like "interfaces" which were "Java concepts".

  • rbanffy 9 days ago
    Well... That didn't age well...

    In the end, we all learned to work with SQL, and to go around its idiosyncrasies. We even learned how to put objects in tables and to force trees into rectangles.

    • forgetfulness 9 days ago
      The author of the paper invented SQL
      • randomdata 9 days ago
        He invented Alpha as part of his relational theorem, although it was never implemented. QUEL, of the Ingres/Postgres lineage, was heavily influenced by the design of Alpha and is the closest thing we have to what he envisioned.

        SQL comes from Boyce and Chamberlain.

      • sixbrx 9 days ago
        invented the relational model rather
    • randomdata 9 days ago
      > In the end, we all learned to work with SQL, and to go around its idiosyncrasies.

      In the same way we've learned to work with C and all its memory idiosyncrasies. But let's keep that between you and me as the poor Rust crowd is going to cry when you tell them that their language is unnecessary.

      The problems Codd speaks of here are exactly the source of database-related bugs I continually see out in the wild. I'd expect to see it even more frequently, but a lot of developers have started using higher level languages like 'ORMs' that add some layer of protection from those issues. It aged quite well, frankly.

    • acimim_ha 9 days ago
      and we got "IS [NOT] DISTINCT FROM" in the meantime, for better or worse
  • sinuhe69 9 days ago
    It proves that hyperbole and over-dramatization are not a new phenomenon. Btw, what are physiological features? I guess the authors meant physical features, didn’t they?
    • evertheylen 9 days ago
      > As used here, the term "psychological" refers to what is often called the human-factors aspects of a language. The term "logical" refers to the logical power of a language, especially the power achievable without resorting to the usual programming tricks, such as iterative loops.

      It goes on to give an example of two queries which "one might reasonably expect" to be equivalent but give different results. (Personally, I disagree, but maybe I've been biased too much by SQL.)

    • housecarpenter 9 days ago
      The paper says psychological, not physiological.

      It looks like what Codd means by "psychological features" are features that are (or are intended to) make it easier for users to express what they want to do in the language, as opposed to "logical features" which are what actually enable things to do. It's similar to the concept of "syntactic sugar", I guess. In a well-designed language, there should be a relatively small, theoretically well-understood core of logical features, and the psychological features should be straightforwardly expressible in terms of those logical features. This makes it easier to understand how the language works, which to me, does seem like an important thing for a language (it makes it easier to learn, easier to optimize, easier to add new features to without causing surprising interactions, etc.)

      Codd's complaint is that SQL isn't like this, specifically with regard to the psychological feature of being able to nest queries. This was intended to be a more user-friendly alternative to using predicate logic. But instead of designing the nested queries feature as a syntactic sugar layer on top of an underlying predicate logic core, both features were implemented at the "core" level. The result is a hodgepodge language which isn't clearly based on either predicate logic or nested queries alone.

  • civilized 8 days ago
    Allowing duplicate rows is the only practical decision for the following reasons:

    1. Most of the ways you can mess up in SQL involve failing to respect unique keys in joins, and this can be done just as easily whether or not duplicate rows are allowed.

    2. Enforcing no duplicates would be very expensive.

  • owlstuffing 9 days ago
    It’s interesting that the relational model reflects interface inheritance as opposed to implementation inheritance. In my view it’s another indication that (true) delegation is generally a more suitable way to design software.
  • jokoon 9 days ago
    At least, refusing to use SQL gave me opportunities to understand how data structures work: B-tree, RB trees, hash tables, R-trees, Z order curve, quad tree, KD tree, BSP, grids, N-ary tree, geohash, etc etc
    • chuckadams 9 days ago
      Which ironically enough, probably makes you more qualified than most to write efficient SQL or even a SQL implementation.
  • da_chicken 9 days ago
    I mean, most of these have been addressed. 1990 was a very long time ago.

    > SQL permits duplicate rows in relations

    Indeed.

    One of the first things you learn is that, no matter what you're doing, all tables should have a primary key. It's simply too annoying otherwise. It's just as easy to make your own surrogate key in the case where there aren't enough natural identifiers, and anyone with any experience will do so. It's almost entirely a beginner trap.

    That said, I've seen so many applications get a simple many-to-many relation wrong for representing tags or categories on an object that I do understand where Codd's complaint is coming from.

    > it supports an inadequately defined kind of nesting of a query within a query;

    Thanks to the efforts of ANSI and ISO, this is much improved. It should be noted that 1990 is before the introduction of the LEFT JOIN syntax in SQL-92. At this time, the only JOIN syntax was the ANSI-89 "comma" join, and if you wanted an outer join you had to rely on vendor extensions. This only took about 20 years to resolve.

    The alternate IN syntax at the top of p382, for example, is now valid syntax in PostgreSQL and other RDBMSs. Further, the EXISTS operator allows this type of query to occur.

    > it does not adequately support three-valued logic, let alone four-valued logic.

    Additional operators have improved three-valued logic quite a bit.

    I'm not sure I've seen many systems that would require or benefit from four-valued logic. I'm not entirely sure what Codd is looking for here. I think he wants MAYBE for outer joins? I don't know. Indeed, most DB designs seem to go out of their way to ensure that unless you're dealing with outer joins that you're likely to only have two-valued logic. Three-valued logic is already too complicated for a lot of people.

    • jklowden 9 days ago
      > It's just as easy to make your own surrogate key in the case where there aren't enough natural identifiers

      A row with no natural is misdesigned. Proof: what does the row _mean_ if its values don’t distinguish it uniquely in the real world?

  • lukaseder 8 days ago
    Everyone has a different set of "serious" flaws to share.