Temporal Tables PostgreSQL Extension

(github.com)

153 points | by jka 1113 days ago

14 comments

Codesleuth 1111 days ago
I was part of a team at NearForm using this for a project on an EC2 instance. In order to move to AWS RDS we had to recreate the functionality of temporal_tables as a PostgreSQL function, rather than extension.
When we switched, we found that although there were minor bugs, we didn't have any noticeable loss of performance and we have used it ever since for many projects.
https://github.com/nearform/temporal_tables
If you're also limited by cloud services and the extensions limitations, this is a great solution.
[-]
- take-five 1111 days ago
  We also started with "temporal_tables" C-extension, and also had to switch to NearForm's version written in PL/pgsql to be able to migrate our database to RDS.
  Didn't notice any performance drop.
  It also much easier to install (just run an SQL-script, no need to restart the DB) and it works everywhere, on all PostgreSQL versions. Great piece of work!
- antman 1111 days ago
  I see you have a “nochecks” version, what would be its use case?
  [-]
  - luto 1111 days ago
    It says right there in the readme: performance.
    > This version is 2x faster than the normal one, but more dangerous and prone to errors.
adamfeldman 1111 days ago
See also: recent threads on bitemporal history:
Bitemporal history https://news.ycombinator.com/item?id=26735260
Crux: Open-source document database with bi-temporal graph queries https://news.ycombinator.com/item?id=23493163
schlowmo 1111 days ago
For everyone using MariaDB instead of PostgreSQL, a similiar feature called "System-Versioned Tables" exists since MariaDB 10.3 [0].
We're using this feature in production as part of a financial forecast application. This makes it possible to forecast spendings/revenues based on data from different points in time. For example the users can generate a forecast in december based on data as valid as in june and can compare how good their forecast model predicted the reality at the end of the year.
This works quite well but there are some pain points:
1. System-period (transaction time) data is immutable and can't be changed with queries. If "faulty" (in the sense of "technically correct" but not reflecting reality) data is imported to the database, the database will return this faulty data for this "as of" date no matter what and there is no easy way to drop this data later.
2. This is the main reason why logical backups (mysqldump) for historical data are not possible, since there is no way to write that data back with queries. This should be possible in the future but the correpondig (critical) bug ticket wasn't solved in almost 3 years.[1] You can use mariabackup (physical backups of the underlying file-system storage) instead but this requires console-access to the database server which isn't always allowed in corporate environments.
3. I'm not aware of any ORM framework which supports System-versioned tables (neither MariaDB nor PostgreSQL) so you have to write your own solution on top of an existing ORM or use raw queries.
[0] https://mariadb.com/kb/en/system-versioned-tables/ [1] https://jira.mariadb.org/browse/MDEV-16029
mildbyte 1111 days ago
We're building something that partially overlaps with this at Splitgraph [0] (co-founder here). Instead of bitemporal tables, we're using columnar storage (cstore_fdw) + delta compression as a storage backend and support Git-like operations (commit, checkout, etc...). We also let you build datasets with a Dockerfile-like language as well as share them with other Splitgraph peers or the public Splitgraph catalog.
[0] https://www.splitgraph.com/docs/getting-started/introduction
[-]
- jka 1111 days ago
  Very nice work, thanks for sharing!
  And a slightly off-topic observation: excellent choice of parsimonious as a grammar parser :)
- jerryluc 1111 days ago
  edit: wrong thread.
phonon 1111 days ago
How does this compare to https://github.com/scalegenius/pg_bitemporal ?
neolog 1111 days ago
How does this compare to https://github.com/xocolatl/periods ?
hmsimha 1111 days ago
I'm currently getting started with TimescaleDB for storing lots of time-series data (all the public data from cryptocurrency exchanges).
Turns out, there's a bit of weird stuff required in order to collect orderbook data, which involves recording updates, but also the 'full state' (or at least as much as I can get) periodically. The current plan is to add an interface between the DB and the consumers that makes it work a bit like a temporal table (if we're storing the full state at the beginning of every hour, the previous state along with any updates should expire at the end of the hour).
If postgres could also leverage the temporal tables extension we might prevent needing to store a lot of entries that 'invalidate' previous entries, but then I imagine there's internal overhead with the temporal tables extension as it manages expiry time.
I'm wondering, are there any notes on using the temporal tables extension with the timescaledb extension? Wondering if that's something worth investigating.
arecurrence 1111 days ago
I've used this a couple times both with the regular implementation and the version that can run in AWS managed databases.
It is surprisingly useful. Many use cases benefit from having a full history that can be instantiated at any time. In particular, you can apply migrations to these tables which means your old data is still in the correct format when you need it.
cpursley 1111 days ago
Interesting, looks like this could be used to build an activity / audit log.
Can any of you folks who have implemented something like that in Postgres recommend best practices, how to model it, tools, etc?
runeks 1111 days ago
What problem does this solve?
[-]
- schlowmo 1111 days ago
  (Disclaimer: I'm coming from MariaDBs temporal table feature but this is basically the same in PostgreSQL)
  Temporal tables adding an additional "time axis" to SQL databases. A "valid from" and "valid to" field is added to each row and the SQL syntax is extended for allowing two new types of queries:
  1. Query the data as of a specific point in time. E.g. "show all customers as of April 9th 2021". This not only limits WHICH customers you see, but also select the exact state of each customer as valid at this point in time. For example if the adress of a customer changed on April 10th, you will get the old address.
  2. Query all versions of a specific entity. This makes tracking of changes or time series analysis possible.
  This feature is transparent, so if you use "normal" SQL queries, you keep getting the current state of the data. I don't know if this is true for this PostgreSQL extension, but MariaDB even hides the "valid from" and "valid to" columns and only show them when you explicitly select them.
  Additionally there a two types of "valid from"/"valid to" data, which can exist at the same time:
  1. Application period: Those validity dates represent a period in the real world. If we stay at the customer address example, they can express "a customer informed me, that their addess will change on April 15th, so the current address is valid until then and the new address is valid from then".
  2. System period (also called transaction time): Those validity dates are a kind of technical period. They represent when data changed in the database, e.g. the exact point in time when an UPDATE query was executed.
  [-]
  - magicalhippo 1111 days ago
    > Those validity dates represent a period in the real world.
    We have tons of that at work, so this feature would have been nice. Currency exchange rates, dozens of official code lists (including countries!), VAT registration status of companies.
    If a user makes a change to a declaration submitted at an earlier date, then the data from the original submission date must be used, so we need to keep all this around.
    Alas, not using PostgreSQL.
    [-]
    - girvo 1111 days ago
      MariaDB and SQL Server both have equivalent features, for what it’s worth.
      [-]
      - magicalhippo 1111 days ago
        Thanks, nice to know!
        I know we have plans to start adding SQL Server support this year (due to customer demand), and we might end up doing a full transition. Stuff like this certainly doesn't make that case weaker.
- wenc 1111 days ago
  In short, it lets you time-travel to retrieve prior states in tabular data.
  One useful example is continuous forecasting. Every forecast is done with a certain moving window of historical training data, and each time the forecast is run, this training data is different.
  In order to evaluate/backtest the performance of the forecasting algorithm over time, you need to be able to retrieve all the previous training datasets from the database, calculate a performance metric, and then aggregate. This lets you do that on a continuous basis in the database itself, without dumping each training set to secondary storage each time.
  Also, in many real life applications, historical data isn’t immutable — data corrections can arrive after the fact that will change state, so it’s usually not sufficient to just use a simple WHERE clause to retrieve a particular time range. This is where time travel becomes really useful.
- ComodoHacker 1111 days ago
  System period and versioning can solve several problems, like change auditing, failures investigation, human errors correction.
  App period solves the problem of https://en.wikipedia.org/wiki/Slowly_changing_dimension
mch82 1111 days ago
Is this extension, or the function recommended by Codesleuth, a good solution for creating “diffs” that summarize how the values of a record have changed from time A to time B?
Codesleuth’s comment: https://news.ycombinator.com/item?id=26768220
Tajnymag 1111 days ago
That looks fantastic. I'll definitely give it a go for a test project. It could save quite a lot of time with creating versioning logic.
beeforpork 1111 days ago
Why do you need the versioning_trigger on 'INSERT' also? There is no original row to archive in this case, or what does this do?
[-]
- karavelov 1111 days ago
  I guess they have to populate the time range column.
- jerryluc 1111 days ago
  Probably because of:
```
  ON CONFLICT DO UPDATE
```
  See: https://www.postgresql.org/docs/9.5/sql-insert.html
starfox64_ 1111 days ago
Does anybody know of an equivalent for MongoDB?