If you're super distrustful you could argue that you should never store a timestamp with a signup event, because it could potentially reveal a user's identity...
Here's a crazy thought: what if you did this?
1. You fire off a default first event, say "init". On the server you generate a PGP key pair, store the private key with the init event, and return the public key
2. Second event (first real event) is fired by the website owner and encrypted with the PGP public key from step 1
3. On the server you try to decrypt event #2 with all available active private keys (stored with init events)
4. Once a key that decrypts it is found, you link the 2nd event to the 1st event, delete the private key of the 1st event, generate a new PGP key pair, store the private key with the 2nd event, and return the new public key
5. Third event is encrypted with the public key from step 4 and...
No need to store timestamps, and all traffic is encrypted. Now, how do you make step 3 fast?
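One way to avoid the trial decryption in step 3 is to let the client present something the server can look up directly. Here's a rough sketch of that variant. Note that it swaps the PGP key pairs for random one-time tokens and leaves out the encryption entirely, so it is not the scheme above, only the same chaining idea. The names (`EventChain`, `record`, `pending`) are made up for illustration.

```python
import hashlib
import secrets
from typing import Optional


class EventChain:
    """Links events into chains with one-time tokens and no timestamps."""

    def __init__(self) -> None:
        self.events = []   # each event: {"name": str, "prev": index or None}
        self.pending = {}  # sha256(token) -> index of the newest event in a chain

    @staticmethod
    def _digest(token: str) -> str:
        # Only the hash is stored, so the stored value cannot be replayed.
        return hashlib.sha256(token.encode()).hexdigest()

    def record(self, name: str, token: Optional[str] = None) -> str:
        """Store an event, link it to its predecessor, return the next token."""
        prev = None
        if token is not None:
            # Dictionary lookup instead of trying every stored key;
            # pop() makes the token single-use (like deleting the old key).
            prev = self.pending.pop(self._digest(token), None)
            if prev is None:
                raise KeyError("unknown or already used token")
        self.events.append({"name": name, "prev": prev})
        new_token = secrets.token_urlsafe(32)
        self.pending[self._digest(new_token)] = len(self.events) - 1
        return new_token


chain = EventChain()
t1 = chain.record("init")
t2 = chain.record("pageview", t1)
t3 = chain.record("signup", t2)
```

The token plays the same role the public key does in the steps above: it points at exactly one previous event, exactly once. Whether that counts as "no ID" is the same question either way.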
And how would it look from a privacy perspective if I set a cookie for 90 days? I can't link this to any personal information, and my customers will only see my tool, where they can see the conversions (they don't get access to the "link" in the tables above).
I don't want to use a session cookie with an ID to link all events. I don't want any ID because I could potentially link those IDs together in the back end based on IP (I don't, but I want people not to have to trust me). I want to make sure I don't get any data that my system could misuse.
I think maybe a good way to go is a compromise: since you're already making an effort to protect privacy without people needing to trust you, that's already a good start. But maybe you need some kind of ID to tie behavior together, so you do record one temporarily, until you've processed it into an aggregate (anonymized individual behaviors).
Basically, train a machine learning model on the data of individuals. You don't want to overfit, because an overfit model could be de-anonymized, but a slightly underfit model could capture most of the important patterns while throwing out most of the identifying aspects.
The hard part then becomes finding a way to demonstrate that this is actually happening, so that you can be trusted. Unfortunately I can't think of a provable way, since you pretty much either can track users by IDs or you can't. And if you can, then trust has to be assumed.
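To make the compromise concrete, here's a minimal sketch of "record a temporary ID, fold it into an aggregate, then forget it". It uses plain counting of page-to-page transitions rather than an actual machine learning model, and the `min_count` threshold is only a crude stand-in for "slightly underfit", not a privacy guarantee. `SessionAggregator` and its method names are hypothetical.

```python
from collections import Counter, defaultdict


class SessionAggregator:
    """Keeps per-session data only until the session is folded into totals."""

    def __init__(self, min_count: int = 5) -> None:
        self.min_count = min_count
        self._open = defaultdict(list)  # temporary ID -> paths seen so far
        self._transitions = Counter()   # (from_path, to_path) -> count

    def track(self, temp_id: str, path: str) -> None:
        self._open[temp_id].append(path)

    def close(self, temp_id: str) -> None:
        """Fold one session into the aggregate and drop its temporary ID."""
        paths = self._open.pop(temp_id, [])
        # Count consecutive pairs; the ID itself is never stored in the totals.
        self._transitions.update(zip(paths, paths[1:]))

    def report(self) -> dict:
        """Return only transitions seen at least min_count times."""
        return {
            pair: n
            for pair, n in self._transitions.items()
            if n >= self.min_count
        }
```

Rare transitions are still counted internally here and only hidden from the report, which is exactly the kind of thing an outsider would have to take on trust.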
I think the problem is that, while you have near-total control over the information you collect and can carefully consider its interactions, you have no control over how your information interacts with other publicly available information. For example, the famous AOL de-anonymisation (https://techcrunch.com/2006/08/06/aol-proudly-releases-massi...) did not (I think it is accurate to say) rely on any metadata attached to the queries, only on the queries themselves.
> The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.
I don't think I have a similar issue with page views of one website in one session. I strip all query params and only save the hostname and path of the URL. I think it's nearly impossible to ever link that to a user. Maybe if you have very few users, but even then you don't get personal info.
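Roughly, the stripping step looks like this. It's a sketch using Python's standard `urllib.parse`, not the actual code, and `strip_url` is a hypothetical name. In the browser the equivalent would be to send only `location.hostname` and `location.pathname`.

```python
from urllib.parse import urlsplit


def strip_url(url: str) -> str:
    """Keep only hostname and path; drop scheme, port, credentials,
    query string and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased, without port or user:pass@
    path = parts.path or "/"     # treat an empty path as the root
    return f"{host}{path}"


cleaned = strip_url("https://Example.com:8080/pricing?token=abc&q=secret#top")
```

Anything that lives in the path itself, such as a user name or a reset token, passes through unchanged.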
We are going way off topic here. Too bad there is not even one answer to my question:
> Do you think this is acceptable from a privacy perspective?
But back to your point. Query params usually contain tokens, search queries, and IDs. This is not so much the case for paths. I think you agree with that. But indeed, paths can contain sensitive information too.
How would you prevent that data from being sent to my server?