Decentralized decision-making processes are difficult, and this piece (through a 'human example story') exposes the assumptions about shared state that make in-person processes seem easier than they actually are.
The solution of recording a relativistic epoch for each voting member (a name plus a 'year', i.e. a version) seems like part of the full solution. An enumeration of the current members (with their versions), plus a careful protocol for modifying membership (add, remove, and increment version), also seems to be required.
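A minimal sketch of that membership record (all names here are hypothetical): each voting member is tracked as a (name, version) pair, and the only legal mutations are add, remove, and increment-version, with an enumeration that lets two nodes check whether they agree on the electorate.

```python
# Sketch of a versioned membership set: name -> version ("year"/epoch).
class Membership:
    def __init__(self):
        self._members = {}  # name -> version

    def add(self, name):
        if name in self._members:
            raise ValueError(f"{name} is already a member")
        self._members[name] = 0  # new members start at version 0

    def remove(self, name):
        del self._members[name]

    def increment(self, name):
        # Bump a member's epoch, e.g. after recovery or reconfiguration.
        self._members[name] += 1

    def snapshot(self):
        # Enumeration of current members with versions; two nodes can
        # compare snapshots to detect disagreement about the electorate.
        return frozenset(self._members.items())

m = Membership()
m.add("alice")
m.add("bob")
m.increment("alice")
print(m.snapshot())  # frozenset of (name, version) pairs
```

Comparing `snapshot()` values before counting votes is the cheap way to notice that two nodes are talking about different electorates.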
The race described here sounds pretty similar to a type of cache consistency race: Client X reads a complex value from a database, does some calculation, and writes the result in a cache. Client Y updates something that should invalidate the calculation. When Client Y tries to invalidate the cache, they don't yet see Client X's entry, which will then appear slightly later. Now a stale value persists in the cache.
The assumption that, after Y's invalidation, the cache cannot contain a value that missed Y's update is incorrect.
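The race is easy to reproduce in a toy model (the names and values below are made up for illustration): X's cache fill and Y's invalidation are two independent steps, and nothing orders them, so Y's delete can land before X's stale write.

```python
# Toy reproduction of the read-compute-cache race described above.
db = {"price": 100}
cache = {}

# Client X reads the source value and computes a derived result...
x_read = db["price"]
x_result = x_read * 2

# ...meanwhile Client Y updates the source and invalidates the cache.
db["price"] = 150
cache.pop("price_doubled", None)  # invalidation finds nothing to delete

# Client X's cache write arrives *after* Y's invalidation.
cache["price_doubled"] = x_result

# The stale derived value now persists, disagreeing with the source.
print(cache["price_doubled"])  # 200 (stale); the source implies 300
```

Fixes generally involve making the write conditional on a version or lease (e.g. compare-and-set on an epoch read before the computation) rather than an unconditional put.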
Byzantine Paxos ;)
It does make me wonder how much failure we design to tolerate. Many modern systems are designed for five-nines availability - 99.999% of the time.
A partition like the one described here is rather unlikely, and carrying more data on every message might be pretty inefficient for the 99.999%+ of cases (not every failure will result in a partition like this).