41 comments

  • simonw 1286 days ago
    I spent a while looking at this today. It's really interesting.

    It's not based on SQLite at all (except for borrowing the SQLite shell implementation) but it looks very much like SQLite, in particular:

    - It's designed to work as an embedded library, eliminating the network overhead you usually get when talking to a database

    - Each database is a single file on disk

    - It ships as an "amalgamation" build - a single giant C++ file (SQLite is a single giant C file)

    Impressively, if you run "pip install duckdb" it Just Works - you can then "import duckdb" and start using it, with an interface that looks very similar to the sqlite3 module that ships with Python.
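
    A quick session looks something like this (typed from memory, so treat it as a rough sketch rather than gospel — table and column names are just made up for illustration):

        import duckdb

        con = duckdb.connect("demo.duckdb")   # or ":memory:" for a throwaway database
        con.execute("CREATE TABLE items (name VARCHAR, price DOUBLE)")
        con.execute("INSERT INTO items VALUES ('hammer', 9.95), ('nails', 2.50)")
        print(con.execute("SELECT name, price FROM items ORDER BY price").fetchall())
        con.close()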

    The key reason this exists is that it's a column store, with vectorized operations across columns - making it ideal for analytical workloads. This blog entry has some benchmarks that illustrate how well it works in that regard: https://uwekorn.com/2019/10/19/taking-duckdb-for-a-spin.html

    It's also backed up by some strong computer science. It's by the academic researchers behind MonetDB and includes implementations of a bunch of interesting papers: https://duckdb.org/docs/why_duckdb#standing-on-the-shoulders...

    It's a really interesting piece of software, and unlike many other "new databases" it feels like it fills a very genuine gap in my toolbox.

    • gijzelaerr 1285 days ago
      And MonetDB took some inspiration back from DuckDB to develop an embedded version also:

      https://github.com/MonetDBSolutions/MonetDBe-Python/

      Disclaimer: I'm working on this.

    • webmobdev 1285 days ago
      If everything you said is accurate, somebody should make another pitch for WebSQL ( https://en.wikipedia.org/wiki/Web_SQL_Database ) with a custom version of DuckDb!

      Every browser maker was interested in implementing it, but the W3C couldn't go ahead with it because everyone chose to implement it using SQLite, whereas the W3C required more than one independent back-end implementation to move forward.

      • wffurr 1285 days ago
        Having sqlite included in all browsers with standard bindings would have been great.
        • jansommer 1285 days ago
          It would! But it would also be a security nightmare. Exploit SQLite and you could use it in Safari, Chrome and Firefox - everybody could be affected.
          • andyfleming 1285 days ago
            How would that be different from someone editing something in window.localStorage?
            • jansommer 1285 days ago
              If that's all you need, it wouldn't make sense to include SQLite in the browser. If you're going to include a relational database you might as well use it
              • CRConrad 1285 days ago
                I think that meant: How would that be more of a security nightmare than some malicious site editing your browser's whatever.localstorage?
          • hajile 1284 days ago
            Ironically, all the IndexedDB implementations were just thin layers on top of SQLite last I checked.
            • WorldMaker 1284 days ago
              The original slow IndexedDB implementations were done that way, but my understanding is that anything fast and "modern" probably isn't in 2020, as most of the browsers moved to their own IndexedDB storage implementations that are not SQLite-based. IndexedDB has very different expected performance characteristics than a pure relational DB (as it is a key/value store with indexes, closer to most "NoSQL" DBs than to SQLite) and it did not (and does not) make long-term sense to use SQLite under the hood.
      • afiori 1284 days ago
        This could work if there is a subset of SQL where DuckDB and SQLite behave almost identically
        • webmobdev 1283 days ago
          I often wondered why someone didn't do it with Berkeley DB too? I know Oracle owns it, but it is open source and someone like Mozilla could have implemented it in their browser just to get the W3C rolling on WebSQL.
          • afiori 1283 days ago
            The problem with WebSQL would remain the same, it would be necessary to specify a subset of SQL as a web standard.
    • Abishek_Muthian 1286 days ago
      > - Each database is a single file on disk

      Are these architecture independent? i.e. If I create the database on x86_64 and move it to ARM64; would it work seamlessly?

      • mytherin 1286 days ago
        One of the authors here - portability of the storage is indeed one of our goals. We test that the same file can be loaded everywhere on different systems, including between ARM and x86.

        I should mention the storage is still very much a work-in-progress. We are actively working on redesigning the storage to add compression and other extra features, meaning the storage format can change and be incompatible between different DuckDB versions. We plan to stabilize the storage format with V1.0, after which every subsequent DuckDB version should be able to read database files created by previous versions.

        • snthpy 1285 days ago
          DuckDB looks very interesting and I'm quite excited to examine it more closely in the next few days!

          I just wanted to add to the discussion that an unchanging file format, or at least a backwards compatible one, is a key feature of sqlite. See for example Richard Hipp's comments here [1] (I think he also mentioned earlier in the talk that the file format has become a limiting factor now in terms of some of the refactoring that they can do). The file format therefore seems likely to be a major factor in the long term success of this project and I am glad to see that you are taking your time before settling on any architecture here.

          Given that you are targeting the data science and analytics space, what are your plans for integration with arrow and the feather file format? From a purely user/developer perspective, arrow's aim of shared memory data structures across different analytics tools seems like a great goal. I know Wes McKinney and Ursa Labs have also spent quite some time on the file storage part of this, see for example the Feather V2 announcement [2].

          What are your thoughts on the tradeoffs they considered and how do you see the requirements of DuckDB in relation to theirs?

          From the Carnegie Mellon DuckDB talk [3], I saw that you already have a zero-copy reader to the pandas memory data structures, so the vision I have is that DuckDB could be the universal SQL interface to arrow datasets which can then also be shared with more complex ML models. Is that something that we can hope for or are there obstacles to this?

          [1] https://youtu.be/Jib2AmRb_rk?t=3150

          [2] https://ursalabs.org/blog/2020-feather-v2/

          [3] https://www.youtube.com/watch?v=PFUZlNQIndo

          • mytherin 1285 days ago
            Thanks for your detailed reply! As you mentioned - lessons learned by SQLite here are indeed crucial. We are very carefully trying to craft a storage format before fixing it in-place, specifically to try to avoid these problems. Backwards compatibility is a must, but backwards compatibility to a sane format is massively preferable :) No doubt we will end up making some mistakes in hindsight, though.

            We already support reading from the arrow in-memory format [1], therefore it is already possible to use DuckDB to perform SQL-on-Arrow. More support there is definitely possible, and I think this is a very promising direction.

            [1] https://github.com/cwida/duckdb/pull/866
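
            Roughly, SQL-on-Arrow from the Python side looks something like this (untested off the top of my head; assuming register() accepts Arrow tables the same way it accepts Pandas DataFrames, and the table/column names are just examples):

                import duckdb
                import pyarrow as pa

                tbl = pa.table({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})

                con = duckdb.connect()            # in-memory DuckDB
                con.register("arrow_tbl", tbl)    # expose the Arrow table to SQL
                print(con.execute("SELECT sum(value) FROM arrow_tbl WHERE id > 1").fetchall())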

            • snthpy 1284 days ago
              Thanks. That's great and I see you even support the new Arrow streaming model!
          • legg0myegg0 1285 days ago
            It's also possible to return a result set as an Arrow table, so round trip SQL on Arrow queries is possible (Arrow to DuckDB to Arrow)! It's not 100% zero-copy for strings, but it should work pretty well!
        • Abishek_Muthian 1286 days ago
          Thanks for the confirmation on portability. I didn't see Go support on the home page; is official Go driver support planned as well?
          • mytherin 1286 days ago
            Neither Hannes nor I am very experienced with Go, so we are not the most suitable people to do that. We very much welcome external contributions, and someone did make a WIP Go driver [1] in the past.

            We also have a sqlite3-compatible C API that can be used to talk to DuckDB. This is actually how we use the sqlite3 shell: we take the SQLite shell directly but bind it to the DuckDB library using this API. It might be possible to use the Go SQLite driver in a similar manner to talk to DuckDB.

            [1] https://github.com/marcboeker/go-duckdb

          • sgt 1285 days ago
            And Dart bindings? That could be really interesting for Flutter developers.
        • StreamBright 1285 days ago
          Do you consider (or maybe you already have) the advanced columnar features? Run length encoding, dictionary encoding, etc. It would be great to see how those perform under these workloads.
          • mytherin 1285 days ago
            Definitely. We have quite detailed plans for compression :) more to follow when we get to it.
            • StreamBright 1285 days ago
              Excellent! Thanks for the update.
        • marton78 1285 days ago
          Couldn't you have used an already existing format for storage, e.g. Apache ORC?
          • mytherin 1285 days ago
            Those formats have a different use case from a database system - they are mostly designed for write once, read many times workloads. As far as I’m aware you cannot update them in an ACID compliant manner, for example, which makes it difficult to use them as backend for an ACID database system.
          • hfmuehleisen 1285 days ago
            Another DuckDB developer here, we support reading table data from Parquet files directly.
          • StreamBright 1285 days ago
            I am a big fan of those formats, but decoupling the actual storage features from the ecosystem is not a trivial task. I haven't looked at the C++ version of ORC for a while, but it used to be incomplete. Other than that, the solutions ORC uses to compress data are pretty amazing.
    • brian_herman 1285 days ago
      Yes, their testing folder is 26 MB while the source code of the database itself is only 5.6 MB. It looks like they took the same approach to testing as SQLite, too.
    • tmpz22 1286 days ago
      How might one go about using this in a language that’s not supported in the documentation such as golang?
      • mytherin 1286 days ago
        Generally you would need to write bindings for the system. Experimental DuckDB bindings for Go do exist [1], however, they are rather old and might not work anymore.

        We also have JDBC support, which might help for languages that know how to use that protocol. ODBC support is not implemented yet, but also planned.

        We also have a SQLite-compatible C API [2] that can potentially be used with an existing SQLite driver that targets the SQLite C API.

        [1] https://github.com/marcboeker/go-duckdb

        [2] https://github.com/cwida/duckdb/blob/master/tools/sqlite3_ap...

        • at6 1285 days ago
          Hi, author of the Go bindings here. I've just tested them and they are still working with the latest DuckDB version from master.
          • mytherin 1285 days ago
            Excellent, thanks again for writing them :)
    • haggy 1286 days ago
      I really appreciate the breakdown here but this comment smells like a giant plant for DuckDB.
      • Smerity 1286 days ago
        A giant plant as in someone from their team planning this out and then planting this as a comment..?

        simonw wrote Datasette[1] which makes extensive use of SQLite and acts as a viewer, allowing you to create interactive websites and APIs from arbitrary databases. He'd be a very long term plant and it instead seems far more likely he's interested in the possibilities of DuckDB compatibility within the context of Datasette and other similar projects.

        [1]: https://github.com/simonw/datasette

      • gregsadetsky 1286 days ago
        I don’t think that it’s reasonable to assume that Simon would risk his reputation by coming here to astroturf.
  • ksec 1285 days ago
    I really wish ALL software projects / frameworks / libraries would follow the lead here. Instead of your marketing page telling me what world-changing, awesome tech you have (which really is a consumer marketing strategy), just do it like DuckDB:

    When to use DuckDB; When not to use DuckDB

    • LockAndLol 1285 days ago
      I couldn't put my finger on why I felt like I had just read something useful, but that's it: clear, concise and straight-forward information to help you evaluate use.

      Only thing wrong I saw was the detection of the platform. It detected my linux box as "Unix" and proposed I download OSX libs.

    • FridgeSeal 1285 days ago
      We should make a list of technology that does this, because I know ClickHouse also has a reasonably detailed page on when to use it, when not to use it, and why. Postgres also has a very nice “dos and don'ts” wiki page.
    • fsloth 1285 days ago
      What an excellent point! I suppose the consumer marketing strategy comes from pure imitation - people look around them how "professional" products discuss themselves - with the aim of selling a product and service - and simply copy that.

      Algorithms textbooks don't create marketing blurbs for linked lists...

    • ComodoHacker 1285 days ago
      Looks like DuckDB doesn't seek investments. So they can afford to be developer-first. Not everyone can, though.
    • maigret 1284 days ago
      I wish every product would tell me what's in it for me. Saying it's world-changing doesn't mean it's useful or impactful for me, as a consumer or as a developer. Run-of-the-mill products usually brand themselves as "new" for differentiation, but genuinely new products stay away from this so as not to scare people. You usually notice when something is really "new" (i.e. a paradigm change) when the marketing loudly says "Don't worry, everything will stay the same as it was, no visible difference for you!"
  • adrianmonk 1286 days ago
    First of all, really neat project.

    There's an issue with the Installation section of the web page, though. Running Chrome on Linux, it says "System detected: Linux", which is right. But under "CLI", it offers me a download of duckdb_cli-osx-amd64.zip.

    Just in case it was just the zip file name that's wrong, I downloaded it, but the duckdb binary inside is a "Mach-O 64-bit x86_64 executable" according to the file command.

    The correct file is available if I click on Other Installations (https://duckdb.org/docs/installation/).

    Also, if I try it on running Firefox on Linux, it says "System detected: UNIX" and gives me the OS X download.

    • mytherin 1286 days ago
      Thanks for reporting that! We will have a look at fixing it :)
      • alexchamberlain 1285 days ago
        FWIW I got the same issue on Android, though offering me the "correct" link wouldn't do me much good either :)
  • viraptor 1286 days ago
    I was hoping from the title that it aims for Postgres SQL compatibility, but I can't find it explicitly mentioned in the docs. This makes me think I really want something like sqlite://memory which completely disregards speed or even persistence. Instead you could say, for example, "open an in-memory database that behaves like Postgres 9" and run your tests against it. With typical fixtures of 10 or so rows, you wouldn't even need query plans - just naive filter+copy over whole tables for most operations.

    Currently I'm using https://pypi.org/project/testing.postgresql/ which spawns a temporary server per test suite, which has quite a lot of overhead.

    • giancarlostoro 1286 days ago
      > SQL Parser: We use the PostgreSQL parser that was repackaged as a stand-alone library. The translation to our own parse tree is inspired by Peloton.

      Right at the bottom of this page:

      https://duckdb.org/docs/why_duckdb.html

      • viraptor 1286 days ago
        That's awesome news, thanks for that!
        • giancarlostoro 1286 days ago
          No problem, I was curious as well and went hunting it down. This project does seem interesting; I'm a huge fan of database systems like SQLite and H2. There's something wonderful about being able to copy a simple file and share it with other developers. This one seems rather promising as well.
          • dd36 1286 days ago
            That’s what made Excel so popular.
    • mytherin 1286 days ago
      One of the authors of DuckDB here: we use the PostgreSQL parser, and try to be generally compatible with the syntax used by SQLite, Postgres and MySQL. In some cases those are unfortunately mutually exclusive (e.g. null ordering - we have a PRAGMA to change this system-wide). Making the SQL dialect “as compatible as possible” with these systems is very much one of our goals, and if you find any incompatibilities feel free to open an issue :)
      • mst 1285 days ago
        I'd note that when moving from pg to $other the thing that really trips me up isn't the syntax changes, it's the lack of ARRAY and ROWTYPE.

        I'm not sure whether those are in scope for you but it'd be nice if the docs said "to be implemented" or "out of scope" somewhere ... and my apologies in advance if they do and I somehow missed it.

        • mytherin 1285 days ago
          We already have support for LIST and STRUCT (which I think are equivalent to ARRAY and ROW, respectively). There is still some functionality missing there (notably storage, and several functions) - but the base functionality is there and the remainder definitely will be added.
          • mst 1284 days ago
            Could only find a pull request on struct, and nothing at all on 'list' because I just got a billion 'select list' articles. Also nothing in the docs.

            Struct looks 10x uglier but is perfectly easy to generate, mind; the whole thing is just non-trivial to find :/

    • jrockway 1286 days ago
      I would just create a per-test-file database (and delete it the next time the test runs). The overhead is very small (compared to booting Postgres) and it works exactly like production because it is exactly production.

      In general, I am not comfortable using an "almost equivalent" database for tests. Every database engine has its quirks, and if you don't know what they are, your program that passes its tests only "almost" works in production, which is annoying.

    • chrisdalke 1286 days ago
      I would also love this, in the past I’ve used the H2 in-memory database for tests with Spring boot applications.

      The Hibernate ORM handles configuring the H2 schema but they’re not fully compatible, so it means I have to be careful to not rely on Postgres-specific features. I generally am not testing database logic, just need a working persistence layer to get the tests running, but an in-memory Postgres implementation would be amazing.

      • viraptor 1286 days ago
        Although it's something I'd love to write, supporting it over a long time as pure FOSS would probably be a sad experience. I wonder what model could be used here. Maybe something like patreon or "sponsor a feature" with an open-source project? FOSS core + proprietary Java / .NET bindings because corps can pay?
        • chrisdalke 1285 days ago
          Yeah, for sure this is the type of niche, high-effort work that would be used extensively by the closed-source world and yet struggle to find adequate support.
      • vips7L 1286 days ago
        IIRC h2 supports a postgres compatibility mode. Though I'm not sure how feature complete it is.

        http://h2database.com/html/features.html

    • c17r 1286 days ago
      Man, I’ve wanted the “compatible SQL engine with only RAM storage” for testing for YEARS. Closest I got was some shenanigans with MSSQL’s LocalDB.
      • mStreamTeam 1286 days ago
        I use docker to spin up new postgresql DBs on the fly
        • GordonS 1285 days ago
          Same here, and I run the same set of migrations that run in production. To be clear, this is only done once per test session, not for individual tests, and the tests are written in such a way that they don't interfere with each other.

          The overhead is actually pretty small, less than 10s. I'd say too much for unit tests, but well within the tolerable range for integration/functional tests. Compared with the time I'd spend hacking together some brittle and unrealistic in-memory alternative, I much prefer to use a real database.

        • toyg 1285 days ago
          I do that too but it’s a noticeable overhead, I guess it wouldn’t be great for mass-testing scenarios.
      • striking 1286 days ago
        RAM FS + fsync off?
        • viraptor 1286 days ago
          Those solutions still have a high overhead. There's ACID compliance, serialisation in memory, maintenance of indices, and many other layers. Compare it to an ideal testing solution with no initialisation cost and insert being literally: parse the query, add a new entry to a list, done.
          • RantyDave 1286 days ago
            But you want to be testing against something that is as close as possible to the deployment environment. So if that means acid, indices etc, then that's what it is.
            • viraptor 1286 days ago
              You can still do them in a trivial way that behaves like production. For example: if some column has a unique index, look at all rows and compare the new value. You don't need an actual index for it (and definitely not a fancy concurrent-access btree). For transactions/MVCC you can literally make a copy of everything.
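
              Something at roughly this level of naivety is what I have in mind (quick untested sketch, names made up):

                  class NaiveTable:
                      def __init__(self, unique_columns=()):
                          self.rows = []
                          self.unique_columns = unique_columns

                      def insert(self, row):
                          # "unique index" = scan every existing row and compare
                          for col in self.unique_columns:
                              if any(r[col] == row[col] for r in self.rows):
                                  raise ValueError(f"unique constraint violated on {col}")
                          self.rows.append(dict(row))

                      def snapshot(self):
                          # "transactions/MVCC" = literally copy everything up front...
                          return [dict(r) for r in self.rows]

                      def rollback(self, snapshot):
                          # ...and put the copy back if the transaction is abandoned
                          self.rows = snapshot
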
          • Avamander 1285 days ago
            Have you tried postgresql with libeatmydata?
      • xyzzy_plugh 1286 days ago
        I'd build this, but I'm not sure if I'd be able to get anyone to fund it.
    • dkarlovi 1285 days ago
      You shouldn't rely on any "just like X" systems for tests; you should use exactly X, with the same config and version as production.

      What you want is a Postgres server, but optimized for tests. What I've done is:

      1. One instance per pipeline

      2. Parallelize by having separate databases per process

      3. Build the schema once

      4. Before each test, purge the database (or run tests in transactions so you can just roll back at the end)

      5. Run in Docker, use tmpfs volumes for writes (no disk writes)

      It runs fairly quickly.
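
      For 2. and 4., the Python side is roughly this (a sketch, assuming pytest-xdist and psycopg2; hosts, names and credentials are placeholders):

          import os
          import psycopg2

          def connect_test_db():
              # one database per parallel worker, e.g. test_gw0, test_gw1, ...
              worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
              return psycopg2.connect(host="localhost", dbname=f"test_{worker}",
                                      user="test", password="test")

          def run_in_rollback_transaction(conn, test_fn):
              cur = conn.cursor()
              try:
                  test_fn(cur)        # the test does its inserts/queries here
              finally:
                  conn.rollback()     # throw away everything the test wrote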

  • awb 1286 days ago
    Cool project. Really appreciate the "When to use DuckDB" and "When to not use DuckDB" sections.

    Anyone using this in production? Also, any benchmarks vs. other DBs?

    • legg0myegg0 1285 days ago
      I work at a Fortune 100 company and we have this in production for our self-service analytics platform as a part of our data transformation web service. Each web request can do multiple pandas transformations, or spin up its own DuckDB or SQLite db and execute arbitrary SQL transformations. It fits our use case like a glove and is super fast to/from Pandas.
  • sradman 1285 days ago
    3-month old HN thread [1] with good information from the authors and others.

    “An embeddable column store for data science” (Pandas and dbplyr) would have been a good description at that time, but the addition of JDBC/Java support expands the use cases.

    [1] https://news.ycombinator.com/item?id=23287278

  • ryndbfsrw 1285 days ago
    The performance of DuckDB's predecessor, MonetDBLite, was/is stellar. DuckDB, when I tried it 6 months ago, was a bit behind in terms of CSV parsing and aggregation by groups, and didn't seem much faster than regular SQLite, so I didn't really jump at it. Does anyone here know if the performance has improved since then and how it compares to MonetDBLite?
    • mytherin 1285 days ago
      We are currently actively working on performance optimization, and have recently finished implementing parallelization. Expect more performance improvements soon :)
  • mastazi 1285 days ago
    I think the title chosen by the poster misses the mark a bit, the main point seems to be that it’s an OLAP DB optimised for analytics.
  • Piisamirotta 1285 days ago
    This is slightly off-topic, but do you guys have any good guides/best conventions for how to save timeseries data to SQLite or similar? I'm no DB expert, so I'm struggling with this. I have a dynamic number of measurements that usually share the same timestamp. So one table for timestamp_id, signal_id and value?
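
    i.e. roughly this kind of "long" layout is what I have in mind (just a sketch, not sure it's the right approach; names are made up):

        import sqlite3

        con = sqlite3.connect("measurements.db")
        con.executescript("""
            CREATE TABLE IF NOT EXISTS signal (
                signal_id INTEGER PRIMARY KEY,
                name      TEXT UNIQUE NOT NULL
            );
            CREATE TABLE IF NOT EXISTS measurement (
                ts        INTEGER NOT NULL,   -- unix epoch, shared by many signals
                signal_id INTEGER NOT NULL REFERENCES signal(signal_id),
                value     REAL NOT NULL,
                PRIMARY KEY (ts, signal_id)
            );
        """)
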
  • jononor 1285 days ago
    What would be the recommended way to migrate data from SQLite to DuckDB? Should I dump SQL from SQLite and load it in DuckDB? Or use Parquet, or CSV export/import? I think those interested in the tagline "like SQLite but more features" might be interested in a quick howto/guide like that.
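
    The naive route I'd probably try first is going through pandas, something like this (untested sketch; file names are placeholders and I'm assuming register()/unregister() on the DuckDB connection work as I remember):

        import sqlite3
        import duckdb
        import pandas as pd

        src = sqlite3.connect("old.sqlite")
        dst = duckdb.connect("new.duckdb")

        tables = src.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'")
        for (table,) in tables.fetchall():
            df = pd.read_sql_query(f"SELECT * FROM {table}", src)  # pull the whole table
            dst.register("df", df)
            dst.execute(f"CREATE TABLE {table} AS SELECT * FROM df")
            dst.unregister("df")
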
  • sterlinm 1285 days ago
    This talk they did with a database research group at CMU may be of interest as well. https://youtu.be/PFUZlNQIndo
  • microtherion 1285 days ago
    A suggestion for the web site: It would be good to mention the license more prominently. I had to click through to Github to find out what it was, and I believe this information would matter to more people than just me.
    • microtherion 1285 days ago
      Correction: As CRConrad, whose comment was inexplicably downvoted, pointed out, the MIT License IS mentioned on the front page (And the Wayback Machine proves it was mentioned before the HN post).
    • CRConrad 1285 days ago
      Dunno if this was put up in reaction to your comment, or you missed it, but looking at their front page right now it says “Permissive MIT License”.
  • jsumrall 1285 days ago
    Very cool project! Seems basically to be a column store and stores everything in a single file.

    That reminds me of LMDB, which is similar in some ways, except it's a key/value store. I very highly recommend checking it out too, to see which embeddable DB makes sense for your project.

    http://www.lmdb.tech/doc/

  • chrismorgan 1285 days ago
    What is it that makes a database an OLAP database rather than a generic relational database?

    Expressed otherwise: why would I choose DuckDB over SQLite, or SQLite over DuckDB?

    • FridgeSeal 1285 days ago
      OLAP databases can/are still Relational databases. The difference is that they’re optimised for different workloads.

      SQLite/MySQL/Postgres/MSSQL etc are all OLTP databases whose primary operation is based around operations on single (or few) rows.

      OLAP databases like ClickHouse/DuckDB, Monet, Redshift, etc are optimised for operating on columns and performing operations like bulk aggregations, group-bys, pivots, etc on large subsets or whole tables.

      If I was recording user purchases/transactions: SQLite.

      If I was aggregating and analysing a batch of data on a machine: DuckDB.

      I read an interesting engineering blog from Spotify(?) where their data pipeline, instead of passing around CSVs or rows of JSON, passed around SQLite databases: DuckDB would probably be a good fit there.

      • chrismorgan 1285 days ago
        I gather that there may be some differences in functionality offered, but that it’s probably not much. So I presume performance is the biggest area of difference. In that regard: how big are the differences in performance for each category’s representative workloads?
        • legg0myegg0 1285 days ago
          Before the latest optimization, and only using 1 core, vs. SQLite we were seeing 133x performance on a basic group by or join, and about 4x for a pretty complex query. It was roughly even with Pandas in performance, but it can scale to larger-than-memory data and now it can use multiple cores! As an example, I could build views from 2 Pandas DataFrames with 2 columns and 1 million rows each, join them, and return the 1 million row dataset back to Pandas in 2 seconds vs. 40 seconds with SQLite/SQLAlchemy... Pretty sweet. DuckDB is going to be even faster now, I bet!
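
          The shape of that test was roughly this (a sketch from memory, not the exact benchmark code; view names are arbitrary):

              import duckdb
              import numpy as np
              import pandas as pd

              n = 1_000_000
              left = pd.DataFrame({"id": np.arange(n), "x": np.random.rand(n)})
              right = pd.DataFrame({"id": np.arange(n), "y": np.random.rand(n)})

              con = duckdb.connect()
              con.register("l", left)
              con.register("r", right)
              joined = con.execute("SELECT l.id, l.x, r.y FROM l JOIN r USING (id)").fetchdf()
              print(len(joined))
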
          • jmatthews 1285 days ago
            Have you compared this to dask from a performance standpoint? That is the larger than memory solution we're currently using for analytics.
          • chrismorgan 1285 days ago
            How about the other way? When will SQLite perform better than DuckDB?
            • FridgeSeal 1285 days ago
              If your workload is lots of singular INSERTs or UPDATEs.
        • greggyb 1285 days ago
          I routinely see order of magnitude or greater improvements moving analytical workloads to compressed, columnstore engines vs rowstore engines.

          The shortest useful description I can give of what constitutes an analytical workload is that it is table-scan-heavy, and often concerned with a small subset of columns in a table.

        • FridgeSeal 1285 days ago
          To add to the other comment (albeit with different DB’s because I haven’t used DuckDB yet) comparing MSSQL and Clickhouse: similar hardware and same dataset, CH responds in ~30-60ms, MSSQL 300-400+ ms for simpler queries. 90-180ms vs several seconds, up to about 30s for more complex queries.

          I could add more indices to MSSQL and do all sorts of query optimisations, but out of the box the OLAP wins hands down for that workload.

  • 3np 1285 days ago
    Please think twice before choosing an embeddable database for hostable software. I see so many open-source projects that get hampered down the line because they chose SQLite and there's now a SPoF that would require a significant rewrite to eliminate - especially as SQLite, depending on configuration, either corrupts or becomes extremely slow when stored on distributed/network file systems. I assume this goes for DuckDB as well.

    Embeddable databases absolutely have legit use-cases, but I see them being used for projects where it'd have saved so much trouble to use a proper ORM (perhaps with sqlite support!) from the beginning.

    • brunoluiz 1285 days ago
      Not sure where an ORM would help in this case. If anything, it might get in the way sometimes (in terms of performance), as some queries might not be well optimised.

      But yeah, people shouldn't just "follow the hype". They should always ask why they would use this and not that.

      • 737maxtw 1285 days ago
        I think ORM might have been brought up because in many embedded DBs there is some level of language-native bindings that eliminates the need for an ORM. A couple of examples that come to mind would be mnesia (a KV store rather than SQL, but still an example) or some of the SQLite providers for C# which give you their own mini ORM (that you will have to refactor if you change out DBs, instead of using SQLite with an ORM from the get-go).

        My preference for ORMs is still on the 'give me a SQL DSL and map my objects' side. Let me use SQL as SQL, give me convenience in interfacing with the DB. Complex features like object tracking tend to be nice up front but require extra diligence down the line to avoid pitfalls, and tend to be risky from a standpoint of composability.

        Sorry. Almost went on an ORM rant there.

      • GordonS 1285 days ago
        I guess the thought behind using an ORM here was as an abstraction over the database/storage layer, so theoretically you could switch out the database easily.

        In practice though, real projects often have to sidestep the ORM and handwrite SQL, sometimes using database-specific features.

      • 3np 1285 days ago
        What I meant more specifically was a database-independent RDBMS ORM (e.g. EF Core, sqlalchemy) - for cases when there aren't resources or need for separate db driver implementations.
    • magicalhippo 1285 days ago
      > especially as sqlite depending on configuration either corrupts or becomes extremely slow when stored on distributed/network file systems

      I had Plex running via Docker, and the Plex configuration/metadata stored via NFS on my NAS. For quite a while seeking would be intermittently slow, or plain crash the Plex server requiring a restart of the container. Other weird issues as well.

      Finally I had enough and did some debugging and searching, and found that Plex uses sqlite in a way that is not compatible with networked storage. So I moved it to an iSCSI volume, mounted as a regular drive on the Docker host and it's been rock solid since.

      But yeah, didn't expect that.

      • mceachen 1285 days ago
        Yup. I was surprised and did the same sleuthing for PhotoStructure when libraries are stored on remote filesystems. I had to add a "local db replica" mode to work around this issue.

        When running PhotoStructure under docker, it turned out to be simpler to just assume the DB is stored on a remote filesystem, and enable this mode by default. When running outside a container I can detect if the host volume is remote automatically and set the mode appropriately.

        The reason why, other than missing file-locking semantics, seems to stem from WAL mode's use of an SHM file. Unsurprisingly, memory-mapped files don't work well on remote shares.

      • defnotashton2 1285 days ago
        You just saved me a lot of time. I'm in the middle of migrating my homelab from docker/swarm to kube and had not thought of this. Ty.
  • davidgaleano 1285 days ago
    Can multiple separate clients access the same single-file database at the same time?
    • hfmuehleisen 1285 days ago
      One of the DuckDB authors here. The answer is yes, but only if they all open it in read-only mode.
      • davidgaleano 1285 days ago
        What about one single client writing and multiple clients in read-only mode? Any problems with storing the file on network storage? Basically, how far can you push it before it is better to just use PostgreSQL?
  • torrance 1286 days ago
    Strict typing has been the one big feature missing from SQLite that this presumably brings.
    • gralx 1286 days ago
      Foreign keys enabled by default or by configuration setting would be another big feature missing from SQLite3, but I couldn't find any mention of foreign keys in DuckDB's documentation.
      • mytherin 1286 days ago
        Foreign keys are not currently supported in DuckDB, but we do plan to add support for them in the future. If you have a need for them, feel free to post a comment on the issue. We tend to prioritize implementing features that many people ask for :)
      • joppy 1286 days ago
        SQLite has foreign keys, but they are disabled by default for backwards-compatibility reasons. https://sqlite.org/foreignkeys.html
        • gralx 1286 days ago
          And it's not possible to enable them by default - there is no existing configuration setting to override that design choice. They hope to include the feature in SQLite4, as I understand it.
      • gigatexal 1285 days ago
        OLAP databases seem to omit foreign keys. Redshift does this, for example. They don't exist in BigQuery (but BQ pushes you towards a single-table design anyway, so moot point I guess).
      • Master_Odin 1286 days ago
        It doesn't look like it provides FKs, judging from the CREATE TABLE syntax. Just PKs, UNIQUE, and CHECK on an expression, similar to SQLite.
        • cryptonector 1286 days ago
          What? SQLite3 absolutely has FKs.
          • gralx 1286 days ago
            You just have to write

                PRAGMA foreign_keys = on;
            
            each and every single time you want to do an operation that requires an FK constraint, like `INSERT`.
            • mceachen 1285 days ago
              s/each and every single time/once when you instantiate a new database connection/

              ftfy.

              You really shouldn't be needing to create new connections often. Really only once for any given process.

              There are some 20 PRAGMAs that PhotoStructure sets for library databases, but it's only at process startup, and it takes a couple millis to run them all. It's wonderful to be able to configure stuff so easily.
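
              In Python terms it's just something like this, once, when the connection is opened (a sketch; the function name and the particular PRAGMA values are just examples):

                  import sqlite3

                  def open_library_db(path):
                      con = sqlite3.connect(path)
                      # run once per connection/process, not before every statement
                      con.execute("PRAGMA foreign_keys = ON")
                      con.execute("PRAGMA journal_mode = WAL")
                      con.execute("PRAGMA synchronous = NORMAL")
                      return con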

              I also think the design decision for backward compatibility (which meant this default is false) is absolutely defensible.

              • gralx 1285 days ago
                I'll defer to anyone with expertise.

                I'm modelling a program in Python on a work computer without administrative privileges, which means it has to stay in user space. Because I haven't decided on a final implementation language, I don't want to commit to any one language's memory model. Vanilla SQL is my solution there.

                My specification limits procedures to SQL as much as possible, using the implementation language only for user interaction with and computations on the database not possible in standard SQL. It minimizes use of the heap, which requires opening connections to the database frequently, but this is practical given the low performance requirements of the application.

                SQLite3 satisfies my spec except for the extra diligence required first to implement the foreign keys PRAGMA in order to maintain referential integrity, and second to remove it if an RDBMS closer to the SQL standard were eventually chosen.

                In a nutshell, my constraints are user space, limiting operations to standard SQL as much as possible, referential integrity, and minimal implementation language dependencies besides having an available SQL API. Given those constraints, would you recommend a different RDBMS? Or would you agree SQLite3 is my least worst option?

    • hfmuehleisen 1285 days ago
      Yes, DuckDB has strict typing.
  • KingOfCoders 1285 days ago
    Sorry for my perhaps stupid question,

    has anyone tried this with ESP32/Arduino? I would need an analytical DB for a hiking application (temperature/pressure data etc.)

    • Avamander 1285 days ago
      Ooh, this sounds like a fun use-case but a power loss sounds super painful.
      • KingOfCoders 1285 days ago
        Would also store on MicroSD.
        • jimsmart 1284 days ago
          That’s part of the problem. Power loss during SD card write will, more likely than not, cause corruption - FAT file systems do not have journaling.
          • KingOfCoders 1284 days ago
            Well yes. But then it's just hiking, not emergency services; SD cards also break, so I would need a dual slot etc.
  • aabbcc1241 1285 days ago
    Simple, embedded databases deserve more attention, especially those with native language support, e.g. map/filter/reduce methods on wrapped data that behave similarly to JavaScript arrays, and support for nested data structures (like auto join and normalize).

    (I made one using the Proxy API, called ref-db. The con is that it requires manual indexing to speed up lookups by non-primary keys.)

  • xiaodai 1286 days ago
    Say I want to fit a logistic regression. Can I do that in the database? Or do I have to extract the data out as a dataframe? Can I stream the data?
    • winrid 1286 days ago
      If you want to do linear regression aggregations with any DB, one thing you can do is store the coefficients and then aggregate them on request. You sacrifice some accuracy for speed.
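
      For plain linear regression the "coefficients" you store can just be the running sums, which any SQL engine can combine later. Roughly like this (untested sketch using DuckDB's Python API; table and column names are made up):

          import duckdb

          con = duckdb.connect()
          con.execute("CREATE TABLE points (grp VARCHAR, x DOUBLE, y DOUBLE)")
          con.execute("INSERT INTO points VALUES ('a', 1, 2.1), ('a', 2, 3.9), ('a', 3, 6.0)")

          # store the per-group sufficient statistics once...
          con.execute("""
              CREATE TABLE stats AS
              SELECT grp, count(*) AS n, sum(x) AS sx, sum(y) AS sy,
                     sum(x*y) AS sxy, sum(x*x) AS sxx
              FROM points GROUP BY grp
          """)

          # ...then derive slope/intercept on request without rescanning the raw rows
          slope, intercept = con.execute("""
              SELECT (n*sxy - sx*sy)   / (n*sxx - sx*sx) AS slope,
                     (sy*sxx - sx*sxy) / (n*sxx - sx*sx) AS intercept
              FROM stats WHERE grp = 'a'
          """).fetchone()
          print(slope, intercept)
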
  • cmrdporcupine 1285 days ago
    I love/hate it when I see pieces of software come out that would solve problems I had 10 or 15 years ago and had imperfect solutions for. This would have been great for a software project for call centre analytics I was employed to work on back in the mid-2000s. I believe I used HSQL (embedded Java DB) but it was really not the right tool. At the time I looked at MonetDB but its architecture wasn't right for what I was doing. This would have fit the niche perfectly.

    This is really neat. Wish I had a problem I could use it for now.

    • tracker1 1284 days ago
      I used embedded Firebird for a couple of projects in the early/mid 2000s myself. Wish it were a bit more popular.
  • ulanmaasan 1285 days ago
    So I tried a simple query in the live demo and it seems that the WHERE clause isn't filtering anything. Am I missing something?

      SELECT * FROM part WHERE p_size = 7;
    • mytherin 1285 days ago
      Looks like a bug, likely related to an optimizer since adding count(*) does produce the correct result. I will have a better look after the weekend. The demo runs a relatively old version so it’s possible it has already been fixed in the meantime. Thanks for the report!
    • mytherin 1284 days ago
      Should be fixed now. This was an old bug related to filter pushdown into scans. The demo now runs the latest version of the system. Thanks again for the report!
  • agustif 1285 days ago
    I was discussing with a coworker what would be the best option, embeddable or a service, for a database running behind a Windows GUI app.

    My bet was on embeddable, so SQLite. Is that the right choice?

    It seems a nightmarish scenario to have 500 Postgres instances running on client machines and being accessible from the internet, albeit with security enabled and such...

    Am I right?

    • alecco 1285 days ago
      Countless mobile apps use SQLite.
      • agustif 1285 days ago
        In this case, though, I'm talking about a very old Windows app with a COBOL backend.

        We want to move data from COBOL files to a database. We're making our own migration tools/scripts in Python (mainly in order to rename weirdly named tables/columns to saner ones), and we can target any SQL flavour/database we want to insert that data into... the question is what's the better call here...

        SQLite seems good enough, at least until we can move our whole app to a more web-friendly environment, at which point something like Postgres might make more sense? But that could take a couple of years... (we're a small team/company in a very specific niche/locale)

        • 737maxtw 1285 days ago
          SQLite works well for app files. Bentley uses it for all sorts of stuff and I had way fewer corrupt files in their format than Autodesk's. :)

          Do note that SQLite itself doesn't have the kindest handling of date/times; you are at the mercy of your data provider (unless you are going native), especially when timezones are involved.

          If it's running on Windows machines you could consider SQL Server LocalDB (it runs in user mode, and I believe you are allowed to distribute it as such in an app). But it doesn't sound like that would make sense from what I've read of your case and tech stack.

          Sqlite is a good start because even if you are wrong, it should be pretty dang easy to migrate to whatever is right. :)

          • agustif 1285 days ago
            First of all thank you for all this info.

            Actually SQL Server was one of the other contenders, but I disregarded it due to it not being open source and having some hard caps in the free license (I think it's 10GB)...

            But I didn't know about SQL Server LocalDB, so we will look into that!

  • kfk 1285 days ago
    How does DuckDB compare to AWS Athena or Spectrum? Both services offer SQL directly on Parquet files. I guess you would say DuckDB is not distributed, but if you create one on the fly when the user needs to run SQL on the Parquet files, you'd be in a similar use case?
  • stormdennis 1285 days ago
    So could the DuckDB/PostgreSQL combination function in a manner analogous to Firebird, which has both embedded and server modes? I'd find that really exciting.
  • akkishore 1285 days ago
    Can we create a view over multiple parquet files?
  • dcl 1284 days ago
    I will definitely look at using this for some personal projects. The R API is greatly appreciated. Thanks for alerting me to it.
  • rilut 1284 days ago
    Really want to use this for EDA. I hope it has support for client apps like DBeaver or something
  • pknerd 1285 days ago
    Any GUI Client to explore data?
  • stormdennis 1285 days ago
    This sounds great. I wonder if there are plans for an ODBC driver at any point?
  • zmmmmm 1285 days ago
    Amazing work!

    It was asking a lot, but it's still a :-( that JSONB and range datatypes are missing.

  • threcius 1286 days ago
    Good name.
  • xgenecloud 1285 days ago
    Kudos, much needed. Neat documentation. Would love to try with JS APIs.
  • unnameduser1 1285 days ago
    Can someone recommend what's the best way to connect to an in-memory DuckDB instance from PHP (php_fpm) and get results back in a usable format?

    E.g. it would be used for offline statistical analysis.

    Alternatively, is there any way to export CLI SQL results as CSV?

  • secondcoming 1285 days ago
    Is this backward compatible with SQLite?
    • jononor 1285 days ago
      The on-disk format is different, it seems. But there is an implementation of the SQLite C API for interacting with the database. So it depends on at which level / for what purpose you want compatibility.
  • mraza007 1285 days ago
    Just curious about these: do you think embeddable databases are common in car navigation systems?
  • RileyJames 1285 days ago
    This sounds like a solution I’d use. Embedded. Simplifies simple sites.
  • apexol 1285 days ago
    Looks interesting! Are there any .NET Core specific drivers available?
  • sovasovad11 1285 days ago
    Neat project... does it support full-text queries?
  • Sm0keDev 1285 days ago
    Amazing ...