Foundations of Databases (1995)

(webdam.inria.fr)

662 points | by tosh 1829 days ago

15 comments

  • craigkerstiens 1828 days ago
    I'd also highly recommend the redbook (readings in databases) which is by several luminaries in the database world including Peter Bailis, Joe Hellerstein, and Michael Stonebraker. They also updated it in recent years to ensure it was up to date for the latest research on databases as well - http://www.redbook.io/
  • acacar 1828 days ago
    My Ph.D. advisor gave me his copy when I was just getting started with my thesis, almost twenty years ago. I remember spending 6-7 very intense weeks with it. It gave me most of the mathematical/logical background specific to databases I needed to finish what was primarily a DB focused dissertation. I kept coming back to it all through my studies and later.

    If you ever need to design a logically sound query processor/optimizer and/or data storage layer, this is a book you should study.

    It still sits on my bookshelf, waiting for the student whose interest in databases is deeper than dujourDB, so that I can pass it down to them.

    • ativzzz 1828 days ago
      What are you working on now? If in industry, do you actively use what you learned in your PHD? Was it worth it?
      • acacar 1828 days ago
        Mostly bioinformatics these days. I’m in academia, and have been virtually all my career, so I can’t really speak to whether a Ph.D. is worth it in industry.
  • billfruit 1828 days ago
    Of all the database theory books I have seen, I found the C.J Date book(An Introduction to Database Systems) to be the most elegant one, though the edition I used attempted to use tutorialD than SQL. What I liked is that it brought out the beauty of relational algebra to the reader with great clarity.
    • thunderbong 1828 days ago
      Absolutely!

      It's still one of my favourite technical books.

      I read it when I started looking at RDBMS after having worked in NoSQL and I clearly recall how excited I was to learn that there is a full fledged system that I could utilize which would get rid of 90% of my code.

      I know this term is usually not used when it comes to technical books, but I found it to be quite a page turner!!

    • mistrial9 1828 days ago
      .. with a caveat of his efforts to remove NULL
      • j88439h84 1828 days ago
        That's definitely part of the elegance from my perspective.
        • geertj 1828 days ago
          I used to think this also. However after diving really deep into OLAP for business reporting over the last 6 months, I find NULL to be the single most powerful idea in SQL. Here’s some use cases for NULL that I find super useful:

          COUNT(case when <complex condition> then 1 end) as foo

          Count occurrences of complex condition over a row set. Works because COUNT ignores null.

          NVL(1/NULLIF(foo, 0), 0)

          Protect against division by zero. The expression 1/X is a simple example but this generalizes well. In fact, I’ve come to think that NULL turns any SQL value into a Monadic option type like those used functional languages and Rust.

          Note that I’m not saying the above cannot be done without NULL, just that NULL is a really useful tool in my experience.

          • lostmyoldone 1828 days ago
            Done a fair bit of BI and OLAP this last decade, and I still feel NULL cause more issues than it solves.

            For the counting case, sum(case when ... then 1 else 0 end) generally works as well, and generally att little to no cost.

            Having nullability as a concept in the query language isn't very problematic though, I might even consider it be helpful on average bas long as a view or table can never contain nulls

            Nullable columns in tables and queries is the problem. It boils down to how each nullable columns (in)effectively describes a schema equivalent to a table for each nullable columb, and one for the non-nullable, where the relationship between these tables are quite under constrained.

            Thus to ensure data quality and correctness you will have to query all nullable columns of interest to make sure that the various combinations of nulls in those columns doesn't interfere with what you are trying to report. You will also have to track that no component of the system suddenly starts reporting nulls in columns where it didn't use to, as that is equivalent to a schema change.

            This wouldn't be much of an issue if there were never more than one nullable column per table, or att the very least that only the absolute minimum of columns were nullable. This is however only very rarely the case, usually there are far too many nullable columns. To make reliable and stable reporting one then has to solve the under specification of the source system in the analysis and reporting pipeline for a class of issues that for the most part should be the responsibility of the source systems.

            Best use of nullls I know of, is using them to lower cost of table scans in Google BigQuery.

          • RmDen 1828 days ago
            Just be careful not to use * , count does not always ignore NULLS, if you specify a column, it ignores NULLS, if you use * then it counts it (this is on SQL Server btw... might be different on other platforms)

              create table foo(bar int)
              insert foo values(1),(null),(2)
            
              select count(*), count(bar) from foo
            
              3 2
          • tonyg 1828 days ago
            It's exactly unlike a monadic option type. It's much more like null is in Java, C# etc. Every type gets a NULL value here, where in ML/Haskell/etc use of Option/Maybe/etc is explicit.
            • quelltext 1828 days ago
              I believe they were well aware.of that but specifically pointing out similarity with Maybe in how NULL values just bubble up just like Nothing.

              This is very different from Java, where a chain of operation on null values typically very quickly leads to runtime exceptions.

          • nkozyra 1828 days ago
            Also that one purpose for NULL is "unknowable" or "undefined," which means you're jamming some other value there in its place.

            I know NULL causes a lot of heartache but it is a useful concept in DB design.

  • mitchtbaum 1828 days ago
    Chapter 3, 'The Structure of the Relational Model', has a helpful background on the origins of table-oriented relational algebra. Unfortunately, it misses the importance of graphs, which afaict, transcend the dependence on tables' bias for normalized topologies of pathologically regular 2D metric spaces, and it even reinforces their relevance by saying, "Intuitively, the data is represented in tables...". Whereas,

    > Graphs allow us to build complex, intuitive relationships between data points. Unlike many traditional methods of structuring data, which focus on the format of relationships, graphs focus on the topology of relationships in the dataset.

    https://blog.usejournal.com/data-modelling-with-graph-theory...

    When it comes to making database concepts more intuitive, I find visually rich representations very helpful, for example using geometric space-encoded logical relationships to aid diagrammatic reasoning.

    It goes along well with many other people's efforts to improve mathematical represenations and interfaces through intuitive visualizations, ie:

    https://intuitive-math.club/

    http://acko.net/

    https://www.reddit.com/r/mathgifs/

    https://www.reddit.com/r/visualizedmath/

    ...

  • mushishi 1828 days ago
    If you are curious how databases are implemented, here is a simplistic implementation in clear abstractions https://github.com/komu/ahwen

    It's a good idea to familiarise with the code in bottom-up order of layers.

    Readme says this: > The implementation takes heavy inspiration from Edward Sciore's SimpleDB, augmented by implementations of various exercises in his textbook Database Design and Implementation.

    • amelius 1828 days ago
      Nice. It would be interesting to have such an implementation for a distributed database as well.
  • fnord123 1828 days ago
  • bostonvaulter2 1828 days ago
    I see that this is now #3 on the homepage with 112 points but no comments yet. Are people already very familiar with this work? Can some expound on its merits?
    • bentona 1828 days ago
      I think that a common occurrence on HN, which I'm definitely a perpetrator of, is upvoting things you wish you had a better grasp on. Theory-heavy & math posts seem especially prone to this.
    • chongli 1828 days ago
      Perhaps people find it interesting but are too shy to leave the first comment. Like being the one to ask a question during a lecture, sometimes it's really hard to put yourself out there.
    • mj_olnir 1828 days ago
      I'm curious too. As a CS grad student with relatively little database knowledge, would this be a good text to start learning?
      • evil-olive 1828 days ago
        I haven't fully read it (yet), but based on a quick scan of a few chapters, it looks like a dense but well-written introduction.

        Some additional resources I'd recommend if you're interested:

        Designing Data-Intensive Applications is a fantastic place to start if you're interested in the intersection of databases & distributed systems:

        https://dataintensive.net/

        The Architecture of Open-Source Applications book has a fewer chapters on databases:

        https://aosabook.org/en/bdb.html

        https://aosabook.org/en/hdfs.html

        https://aosabook.org/en/nosql.html

        There's some fantastic documentation on Postgres and its internals:

        http://www.interdb.jp/pg/index.html

        https://momjian.us/main/presentations/internals.html

        https://www.postgresql.org/docs/current/internals.html

        • mj_olnir 1828 days ago
          Thanks so much! I'll definitely take a look into these.
      • jpamata 1828 days ago
        On the preface of the book (omitted from the ebook), it details several learning paths:

        The Classic: a basic database theory course covering the classical material would center around Parts B and C. Chapter 10 and parts of Chapter 9 of Part C are somewhat more advanced and could be skipped. If time allows, some of Chapter 12 and a selection of Part F might be covered.

        Feast of Query Languages: a course on the theory of query languages would start with a quick review of the basic material on classical languages (Part B), and continue with Parts D and E. If time allows, some material on languages for complex objects and object-oriented databases (Part F) could be covered.

        Gourmet Sampling of Database Theory: a course for people with theoretical appetites that emphasizes the specificity of database theory. Logicians wishing to explore the connection between finite-model theory and databases will be interested in Parts C and E. Those interested in descriptive complexity will find Part E closest to their hearts. Researchers in logic programming will prefer Part D, particularly Chapters 12, 13, and 15. People with a background in theoretical artificial intelligence will find Parts D and F of particular interest. Rule-based systems are related to Chapter 14 (see also parts of Chapter 22). Programming language people will be interested in much of the material on query languages, including Chapters 20 and 21 in Part F.

        Fast-Food Database Theory: a course for applied database students that is meant to provide informal exposure to some of the basic results of database theory. This would include assorted topics, with an informal presentation emphasizing the examples, results, and intuition provided in the text, rather than the proofs. A possible syllabus would include Part B; parts of Chapters 8, 9, and 11 in Part C; Chapter 12 and parts of Chapter 15 in Part D; and selected chapters of Part F.

        Numerous exercises have been included, and they are essentially of three categories. Routine exercises (unmarked) require easy manipulation of the basic concepts and results. Harder exercises are marked with a (*). Another category of exercises is meant to complement the material in the text and often contains results from related research articles. These exercises are usually on the hard side and some may constitute term projects. These exercises are marked with a ( ).

      • basetop 1828 days ago
        With databases or anything in general, I believe in doing first and then learning the principles.

        Play around with a small RDBMs like sqlite. Do a project around it. Look at the source maybe and then learn about the principles. Foundations of databases are based on discrete mathematics ( where relational comes from in RDBMs ) so a background in discrete math helps.

        For enterprise level stuff ( clustering, replication, distribution, etc ), it's more practical to learn on the job where you have access to the hardware, servers, etc.

      • wslh 1828 days ago
        This is a text for understanding the fundamentals of databases but might be too deep if you just want to use an existing database.
    • littlestymaar 1828 days ago
      This is really interesting. Now it's ranked #20 with more than 500 points, and it now has 36 comments. But none is actually about the link. People are giving advice on database books they read, and arguing about which one is the best, but it looks like nobody read this particular one. Yet it's massively upvoted.
      • Stratoscope 1828 days ago
        That's not surprising. People often upvote an interesting conversation regardless of the actual article link.

        In fact I added this thread to my favorites because the book references look useful.

    • pault 1828 days ago
      I think a lot of people use the upvote to bookmark a link. The favorite feature was rolled out much later and I've always used votes for that purpose.
    • a-saleh 1828 days ago
      I have seen it previously and it reads really well :-)
  • slyrus 1828 days ago
    I took Serge's class when he was on sabbatical (?) teaching at UC Berkeley back in the days of steam-powered web browsers. Was a great course but it felt like the theory wasn't _quite_ there yet to be the next big thing in databases.
  • aikah 1828 days ago
    Looks like a great textbook. I'm really looking for a book on relational database implementation itself, but I understand this is a topic so large that a single book probably could never cover it all.
  • megakluntjes 1828 days ago
    My Tip: Grab an old cheap copy of "Database Systems: The Complete Book" (by Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom).

    Here is an updated edition: https://www.amazon.com/dp/129202447X/

  • tonetheman 1828 days ago
    Another interesting link on making your own DB: https://cstack.github.io/db_tutorial/
  • commandlinefan 1828 days ago
    Damn, the PDF download is not working but it specifically says not to post the PDF anywhere else...
  • dogsh1t 1828 days ago
    For those that may be interested in "Foundations of Databases", also see C.J. Date's "Introduction to Database Systems" for a diatribe on why Nulls are evil.

    Update: Odd that I'm inexplicably shadowbanned. Guess the moderators at yc are flinching cowards...

  • dogsh1t 1828 days ago
    Are you a moderator here? Strange that I posted a recommendation for this book an hour ago on this thread before discovering I've somehow been shadowbanned. If you are a moderator, congrats for bravely re-purposing my post, if not, you'll never see this anyway...
    • dang 1828 days ago
      Obviously more than one HN user has heard of C.J. Date's work on databases.

      We banned you last year for egregiously violating the site guidelines, and told you why. When banned users create new accounts and show new signs of abusive behavior, we ban the new account too. The reason is obvious, but I'll explain anyway: when a banned user gives us reason to believe they've had a change of heart and will use HN as intended in the future, we're happy to unban them—but when they give us reason to believe the opposite, we do the opposite. Otherwise bannage would be a bit of a joke on a site when people can just create a new account in a few seconds (and many do).

      We detached this comment from https://news.ycombinator.com/item?id=19735388 and marked it off-topic.

      • anitil 1828 days ago
        Thank you for keeping this place civil - I know it musn't be fun.
      • Huwey_McAsshurt 1828 days ago
        "abusive" Care to elaborate on that overstatement, or are your personal histrionics beyond debate?
        • dang 1828 days ago
          In that case I was referring to https://news.ycombinator.com/item?id=17922994. When a new account posts like that and we have evidence linking it to a previously banned account, we usually take it as a sign that the user does not want to use HN as intended. In such cases we extend the previous ban to the new account, for what I'd hope would be obvious reasons.
    • AnaniasAnanas 1828 days ago
      > if not, you'll never see this anyway...

      All users can enable showdead.