What is a visitor in a privacy-focused analytics tool? Can we have a returning visitor when it is not tied to identity over time and across visits? How can we even interpret these numbers?
Let's summarize the ladder of identity on the web: Logged in user > Persistent identity (e.g. cookie) > Ephemeral identity (e.g. 24h hash) > no identity.
Privacy-focused tools seem to provide the last ones while promoting the same advantage as the first ones! Coolness over effectiveness?
What's all the fuss really about?
It's worth noting that the question is not about all the different kinds of statistics these tools can provide without relying on a cookie, but about the legitimacy and relevance of the visitor-related statistics (e.g., new, returning, etc.).
Personally, I have a background in metrics and reporting tools. At previous companies, I've been tasked with finding and explaining 0.3% differences between two reports, and I've had cookie-related (or timezone-related) code reviewed by other engineers. With millions of dollars at stake, PowerPoint meetings, and investor or financial documentation, it makes sense to question every definition and the whole data pipeline.
> Coolness over effectiveness? What's all the fuss really about?
Ok, I admit, there's a bit of a coolness factor. Paying $25/month to a small bootstrapped company (with a great podcast) beats feeding data to an ever-growing global player (Google).
The privacy part, compared to other tools, comes from the fact that it's self-hosted, so no data is shared with 3rd parties, which is the best way to achieve data privacy. You can detect returning visitors in various ways; one option in userTrack is to store the hash of the visitor's IP + user-agent string. It is not 100% accurate: if the visitor updates his browser or his IP changes, he will be considered a new user. If the user is logged in, you can tag each session with his username or user ID.
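To make the IP + user-agent idea concrete, here is a minimal sketch. It is not userTrack's actual code: the function name `visitor_id`, the `|` separator, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def visitor_id(ip: str, user_agent: str) -> str:
    """Hypothetical fingerprint: SHA-256 over the IP and user-agent string."""
    payload = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Example values only (203.0.113.0/24 is a documentation address range).
first = visitor_id("203.0.113.7", "Mozilla/5.0 Firefox/118.0")
again = visitor_id("203.0.113.7", "Mozilla/5.0 Firefox/118.0")
updated = visitor_id("203.0.113.7", "Mozilla/5.0 Firefox/119.0")

print(first == again)    # True: same IP and user agent, same ID
print(first == updated)  # False: a browser update looks like a new visitor
```

The second comparison is the inaccuracy mentioned above: any change to either input produces an unrelated hash, so the visitor is counted as new.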
Also keep in mind that fully persistent identities rarely exist (unless the user is logged in), as cookies can be cleared at any point or simply be blocked/reset by the browser on each visit.
PS: I do agree that many privacy-focused tools are also not really private, because they are still a 3rd party aggregating data across the web.
[0]: https://www.usertrack.net/
Nowadays, privacy is a pretty convoluted word. I like to consider it from the point of view of the most impacted actor, the end-user. And from his perspective, you remain a third party as far as his data is concerned. The sole fact that the tool is self-hosted cannot be a guarantee of privacy, though stronger privacy is more likely when the number of third parties is small.
Therefore, with your tool:
- Either you have an identity (i.e., hash(x,y,z)) that is persisted over time (notwithstanding its accuracy).
- Or you have an identity that is forgotten after a certain period of time (e.g., 24h).
In the first case, it cannot be considered a privacy-focused tool, and in the second case, it has the same shortcomings I've described in the original question.
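The two cases can be sketched as follows. This is an illustration under stated assumptions, not any tool's real implementation: the helper names are hypothetical, and the calendar date stands in for the rotating salt (a real tool would more likely use a random salt discarded at rotation time).

```python
import hashlib
from datetime import date


def persistent_id(ip: str, user_agent: str) -> str:
    """Case 1: the same inputs give the same ID on every day."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


def ephemeral_id(ip: str, user_agent: str, day: date) -> str:
    """Case 2: the day is mixed in, so the ID changes every 24h."""
    payload = f"{day.isoformat()}|{ip}|{user_agent}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def returning_count(ids_before: set[str], ids_after: set[str]) -> int:
    """Visitors seen in both periods, judged only by ID equality."""
    return len(ids_before & ids_after)


# One and the same visitor, seen on two consecutive days (example values).
ip, ua = "203.0.113.7", "Mozilla/5.0 Firefox/118.0"
day_one, day_two = date(2024, 1, 1), date(2024, 1, 2)

persistent_returning = returning_count(
    {persistent_id(ip, ua)}, {persistent_id(ip, ua)}
)
ephemeral_returning = returning_count(
    {ephemeral_id(ip, ua, day_one)}, {ephemeral_id(ip, ua, day_two)}
)

print(persistent_returning)  # 1: the visitor is recognised across days
print(ephemeral_returning)   # 0: the same visitor is counted as new again
```

With the rotating ID, a "returning visitor" can only mean "returned within the same window", which is the shortcoming raised in the original question.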
---
It is crucial to note that the question is about the quality of users' metrics in privacy-focused tools.
There ain't no such thing as a free lunch. End-user privacy comes at the expense of actionable metrics. Furthermore, at best, people using these tools are not aware of the shortcomings and the risk of misleading numbers. At worst, these very concerns are kept out of the marketing speeches of these tools to minimize their real impact.
The above is an opinion, and I would like to debate it: my possible misunderstanding of these tools, and possible solutions.
I do think there is a HUGE difference between centralizing data from many users across the web and sharing it with 3rd parties for marketing or intelligence goals AND simply tracking app stats in order to improve the user experience. The main privacy issue about 3rd party cookies and tracking is that a specific user can be targeted by entity B based on the actions he did on a property owned by a different entity A, without the user ever coming in direct contact with B.
I understand that your main concern is not privacy, but the usefulness of such privacy-friendly stats. Analytics is a very complex space, and even if you collect all the data in the world, the best decisions usually cannot be derived directly from the data itself. Also, in most cases companies are not looking to spend a vast amount of time and resources to find the "best" way forward; instead, they want to collect just enough data to help them improve their business in some way.
So, for example, if just by approximately knowing the top referring domain for converting users you can spend more on marketing towards those specific users and increase your sales, the analytics tool has already proved to be useful.
I believe that most of those basic privacy-focused tools already clearly state that they are a "simple" alternative and that they offer only basic stats. To be honest, in many cases, most users have no idea how to use Google Analytics to drill down into data and take relevant actions, so much of the data being collected is never used.
To sum it up: yes, privacy-friendly tools might offer fewer stats but for the majority of the users using those platforms those basic stats are enough.
> I do think there is a HUGE difference...
This is off-topic and a matter of perspective.
> I understand that your main concern is not privacy, but the usefulness of such privacy-friendly stats.
It's not about my privacy concerns. It's about the purpose, legitimacy, and effectiveness of a feature. Showing a count of unique/returning visitors is simply a lie; privacy friendliness apart.
> Analytics is a very complex space...
This is off-topic and a matter of approach and experience.
> Privacy-focused tools already clearly state that they are a "simple" alternative and that they offer only basic stats.
This is the central subject of the discussion. They all promote simplicity and coolness. However, simple != erroneous. You can be simple and provide correct information, or not provide it at all. Otherwise, there is a problem of ethics and liability.
How can we trust tools supposed to handle our online privacy while at the same time the same tools are pretending something that is not true?
And please, don't get me wrong. It's all about the metric on users. In effect, the same tools without these particular metrics may have an audience striving for such simplicity. However, with these invalid metrics baked in, it seems to be a more opportunistic move than a privacy-focused one.
> simple != erroneous
What type of analytics are 100% accurate? I am almost certain almost ANY analytics tool is more accurate than the most popular Google Analytics (mostly because it is blocked by adblockers and being so popular it's used as a spam medium). So, I wouldn't really bash any of the simple analytics for being erroneous when GA is the "most" erroneous out of all, yet it is still the most used.
I still believe my initial point stands: having somewhat accurate analytics is usually good enough to make good business decisions in most cases, and if you can do that while being more privacy-friendly, why not?
It's not about accuracy. In the original question, I've depicted a simple ladder of identity, and we both agree that the most accurate counter would be the counter of logged-in users. Anything after is by definition less accurate. In addition, the argument about adblockers is a matter of popularity and time. Take, for example, the many contribs to open blocklist projects trying to blacklist the most popular privacy-focused tools. How ironic! Finally, I won't go into the GA bashing game, at least out of respect for those who work on and with it every day, and because it's out of scope here.
So what can we compare? How can we conclude? Let's focus on the very constituent of a unique/returning user metric.
It's all about semantics. We must first agree on the semantics and then compare tools. And we both know what the semantics of a unique/returning user are and what they cannot be.
Clearly, there is no privacy-focused tool, to the extent of my knowledge, that can or does provide a unique/returning users metric. The problem, though, is that all of them advertise the opposite, sometimes even viciously.
Any other discussion going beyond the semantics feels like this: I wanted orange juice, yet you served me a blend containing no orange while either trying to convince me that orange isn't that good or misleading me by loudly advertising that the blend contains orange.
The metric is pretty simple, the number of unique visitors. And the setup is simple, too, with no identity whatsoever. But are they compatible? This is the question.
Your approach is interesting. You take a stance: let's have precise metrics when users consent and don't consider those who didn't give their consent. This approach relies on the underlying assumption that those who do not give their consent represent a minority and thus don't have a perceivable impact on the overall statistics. And this very assumption may be false.
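A small arithmetic sketch shows why that assumption matters. All numbers below are hypothetical and invented purely for illustration; they are not measurements from any site or tool.

```python
def returning_share(returning: int, total: int) -> float:
    """Fraction of visitors that are returning."""
    return returning / total


# Hypothetical traffic: 60% of visitors consent, 40% decline,
# and the two groups behave differently.
consenting = {"total": 600, "returning": 300}
declining = {"total": 400, "returning": 40}

# What the tool reports when it only sees consenting visitors.
measured = returning_share(consenting["returning"], consenting["total"])

# What the figure would be if everyone could be counted.
actual = returning_share(
    consenting["returning"] + declining["returning"],
    consenting["total"] + declining["total"],
)

print(measured)  # 0.5
print(actual)    # 0.34
```

The gap only stays negligible if the declining group is small or behaves like the consenting one, and the tool has no way to check either condition from the data it collects.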