Ask HN: Privacy-focused or useless analytics tools?

The number of privacy-focused analytics tools is ever-growing, and the contrast between their freshness and the advertised evilness of legacy tools is appealing. Rightfully, all of them try to limit identity to promote privacy. While that makes sense at first sight, questions arise about the usefulness of the resulting statistics.

What is a visitor in a privacy-focused analytics tool? Can we have a returning visitor when it is not tied to identity over time and across visits? How can we even interpret these numbers?

Let's summarize the ladder of identity on the web: Logged in user > Persistent identity (e.g. cookie) > Ephemeral identity (e.g. 24h hash) > no identity.

Privacy-focused tools seem to provide the last ones while promoting the same advantage as the first ones! Coolness over effectiveness?

What's all the fuss really about?

It's worth noting that the question is not about all the different kinds of statistics these tools can provide without relying on a cookie, but about the legitimacy and relevance of the visitor-related statistics (e.g., new, returning, etc.).

8 points | by fsenart 1084 days ago

3 comments

  • mtmail 1084 days ago
    As a small company using one of the providers (https://usefathom.com/), it offers us choice. We don't want to build and run our own analytics, and we don't need the vast features of Google Analytics or similar either. Even without the privacy benefit, the set of metrics we get now is acceptable. We don't have, e.g., a growth manager who needs to report these, or investors who question them or make decisions when one metric looks off. Legitimacy didn't come up once during our migration, and we didn't look too closely to compare to our previous provider (Google Analytics). It's not the core of your question, but there are use cases where fewer metrics and less accuracy are good enough.

    Personally I have a background in metrics and reporting tools. I've been tasked to find and explain 0.3% differences between two reports, or had cookie-related (or timezone-related) code reviewed by other engineers at previous companies. With millions of dollars at stake, PowerPoint meetings, or investor or financial documentation, it makes sense to question every definition and the whole data pipeline.

    > Coolness over effectiveness? What's all the fuss really about?

    Ok, I admit, there's a bit of coolness factor. Paying $25/month to a small bootstrapped company (with a great podcast) beats feeding data to an ever growing global player (Google).

    • fsenart 1084 days ago
      Thank you so much for your thorough answer. I can totally understand what you say about helping seeds grow and the "analytics fatigue" when it comes to legacy tools. I also understand that you are not much interested in how your visitor base or visit cohorts evolve. Beyond that, please share any hints you have about how you might estimate these metrics with your current tools.
  • XCSme 1083 days ago
    I am creating a "privacy-focused" analytics tool[0], that actually provides useful stats.

    The privacy part, compared to other tools, comes from the fact that it's self-hosted, so no data is shared with 3rd parties, which is the best way to achieve data privacy. You can detect returning visitors in various ways; an option in userTrack is to store the hash of the visitor's IP + user-agent string. It is not 100% accurate, and if the visitor updates his browser or his IP changes, he will be considered a new user. If the user is logged in, you can tag each session with his username or user ID.
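
    As a rough illustration of the hash-based approach described above (a sketch under assumptions, not userTrack's actual code; the function names, the SHA-256 choice, and the sample IP and user-agent values are all made up for the example):

```python
import hashlib


def visitor_id(ip: str, user_agent: str) -> str:
    """Derive a pseudonymous ID from IP + user-agent (hypothetical helper)."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


def record_visit(seen: set, ip: str, user_agent: str) -> bool:
    """Record a visit and report whether this ID was already seen."""
    vid = visitor_id(ip, user_agent)
    returning = vid in seen
    seen.add(vid)
    return returning


seen_ids = set()
first = record_visit(seen_ids, "203.0.113.7", "ExampleBrowser/1.0")
second = record_visit(seen_ids, "203.0.113.7", "ExampleBrowser/1.0")
# Same person after a browser update: the hash changes, so they look new.
after_update = record_visit(seen_ids, "203.0.113.7", "ExampleBrowser/1.1")
```

    Here the second visit is flagged as returning, while the visit after the browser update is counted as a new visitor, which is the inaccuracy mentioned above.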

    Also keep in mind that fully persistent identities rarely exist (unless the user is logged-in), as the cookies can be cleared at any point or simply be blocked/reset by the browser on each visit.

    PS: I do agree that many privacy-focused tools are also not really private, because they still are a 3rd-party aggregating data across the web.

    [0]: https://www.usertrack.net/

    • fsenart 1083 days ago
      Thank you very much for your comment.

      Nowadays, privacy is a pretty convoluted word. I like to consider it from the point of view of the most impacted actor, the end-user. And from his perspective, you remain a third party as long as his data is concerned. The sole fact that the tool is self-hosted cannot be a guarantee of privacy. Though, it's more likely to achieve stronger privacy if the number of third parties is small.

      Therefore, with your tool:

      - Either you have an identity (i.e., hash(x,y,z)) that is persisted over time (notwithstanding its accuracy).

      - Or you have an identity that is forgotten after a certain period of time (e.g., 24h).

      In the first case, it cannot be considered a privacy-focused tool, and in the second case, it has the same shortcomings I've described in the original question.
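
      To make the two cases concrete, here is a minimal sketch. It assumes the ephemeral variant mixes a salt that changes every day into the hash; the helper names, the salts, and the visit data are invented for illustration and do not describe any particular tool.

```python
import hashlib


def persistent_id(ip: str, user_agent: str) -> str:
    """Case 1: the same inputs always map to the same ID."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


def ephemeral_id(ip: str, user_agent: str, daily_salt: str) -> str:
    """Case 2: the ID is only stable while the daily salt is unchanged."""
    data = f"{daily_salt}|{ip}|{user_agent}"
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# One person visiting on two consecutive days (made-up data).
visits = [
    ("day-1", "203.0.113.7", "ExampleBrowser/1.0"),
    ("day-2", "203.0.113.7", "ExampleBrowser/1.0"),
]
# Stand-ins for random salts that would be discarded after each day.
salts = {"day-1": "salt-a", "day-2": "salt-b"}

persistent_uniques = len({persistent_id(ip, ua) for _, ip, ua in visits})
ephemeral_uniques = len(
    {ephemeral_id(ip, ua, salts[day]) for day, ip, ua in visits}
)
```

      With the persistent hash the two visits collapse into one unique visitor who can be recognized as returning. With the rotating salt the same person shows up as two unrelated visitors over the two days, so a multi-day "unique" or "returning" count cannot be derived from these IDs.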

      ---

      It is crucial to note that the question is about the quality of users' metrics in privacy-focused tools.

      There ain't no such thing as a free lunch. End-user privacy comes at the expense of actionable metrics. Furthermore, at best, people using these tools are not aware of the shortcomings and the risk of misleading numbers. At worst, these very concerns are kept out of the marketing speeches of these tools to minimize their real impact.

      Above is an opinion, and I would like to debate about it. About my possible misunderstanding of these tools. About possible solutions.

      • XCSme 1082 days ago
        I do agree that user privacy is a very broad concept.

        I do think there is a HUGE difference between centralizing data from many users across the web and sharing it with 3rd parties for marketing or intelligence goals AND simply tracking app stats in order to improve the user experience. The main privacy issue about 3rd party cookies and tracking is that a specific user can be targeted by entity B based on the actions he did on a property owned by a different entity A, without the user ever coming in direct contact with B.

        I understand that your main concern is not privacy, but the usefulness of such privacy-friendly stats. Analytics is a very complex space, and even if you collect all the data in the world, usually the best decisions cannot be derived directly from the data itself. Also, in most cases companies are not looking to spend a vast amount of time and resources to find the "best" way forward; instead they want to collect just enough data to help them improve their business in some way.

        So, for example, if just by approximately knowing the top referring domain for converting users you can spend more on marketing towards those specific users and increase your sales, the analytics tool has already proved to be useful.

        I believe that most of those basic privacy-focused tools already clearly state that they are a "simple" alternative and that they offer only basic stats. To be honest, in many cases, most users have no idea how to use Google Analytics to drill down into data and take relevant actions, so much of the data being collected is never used.

        To sum it up: yes, privacy-friendly tools might offer fewer stats but for the majority of the users using those platforms those basic stats are enough.

        • fsenart 1082 days ago
          I hope you don't mind if we stay focused on the original question, even though all the subjects are interesting per se.

          > I do think there is a HUGE difference...

          This is off-topic and a matter of perspective.

          > I understand that your main concern is not privacy, but the usefulness of such privacy-friendly stats.

          It's not about my privacy concerns. It's about the purpose, legitimacy, and effectiveness of a feature. Showing a count of unique/returning visitors is simply a lie; privacy friendliness apart.

          > Analytics is a very complex space...

          This is off-topic and a matter of approach and experience.

          > Privacy-focused tools already clearly state that they are a "simple" alternative and that they offer only basic stats.

          This is the central subject of the discussion. They all promote simplicity and coolness. However, simple != erroneous. You can be simple and provide correct information, or not provide it at all. Otherwise, there is a problem of ethics and liability.

          How can we trust tools that are supposed to handle our online privacy when those same tools are claiming something that is not true?

          And please, don't get me wrong. It's all about the metric on users. In effect, the same tools without these particular metrics may have an audience striving for such simplicity. However, with these invalid metrics baked in, it seems to be a more opportunistic move than a privacy-focused one.

          • XCSme 1082 days ago
            I am not going to contradict you regarding the opportunistic nature of many of the privacy-focused analytics tools showing up lately.

            > simple != erroneous

            What type of analytics are 100% accurate? I am almost certain almost ANY analytics tool is more accurate than the most popular Google Analytics (mostly because it is blocked by adblockers and being so popular it's used as a spam medium). So, I wouldn't really bash any of the simple analytics for being erroneous when GA is the "most" erroneous out of all, yet it is still the most used.

            I still believe my initial point stands, in that having somewhat accurate analytics is usually good enough to take good business decisions in most cases, and if you can do that while being more privacy-friendly, why not?

            • fsenart 1082 days ago
              We may converge. Though, let's not end up in an ideological battle. Thank you for the ride so far.

              It's not about accuracy. In the original question, I've depicted a simple ladder of identity, and we both agree that the most accurate counter would be the counter of logged-in users. Anything after that is by definition less accurate. In addition, the argument about adblockers is a matter of popularity and time: take, for example, the many contributions to open blocklist projects trying to blacklist the most popular privacy-focused tools. How ironic! Finally, I won't go into the GA bashing game, at least out of respect for those who work on and with it every day, and because it's out of scope here.

              So what can we compare? How can we conclude? Let's focus on the very constituent of a unique/returning user metric.

              It's all about semantics. We must first agree on semantics and then compare tools. And we both know what the semantics of a unique/returning user are and what they cannot be.

              Clearly, there is no privacy-focused tool, to the extent of my knowledge, that can or does provide a unique/returning users metric. Though, the problem is that all of them advertise the opposite, sometimes even viciously.

              Any other discussion going beyond the semantics feels like I wanted orange juice; still, you provided me with a blend containing no orange while trying to either convince me that orange isn't that good or mislead me by advertising loudly that the blend contains orange.

  • elevate_lsk 1084 days ago
    We at Splitbee (https://splitbee.io) are trying to solve this with a hybrid approach. We allow people to track visitors without a cookie, and if they can get consent later on, we can stick them with a cookie. Generally it depends on what data you want to get out of your analytics tool. For a ton of websites this data is more than enough.
    • fsenart 1084 days ago
      Thank you for your insights.

      The metric is pretty simple, the number of unique visitors. And the setup is simple, too, with no identity whatsoever. But are they compatible? This is the question.

      Your approach is interesting. You take a stance: let's have precise metrics when users consent and don't consider those who didn't give their consent. This approach relies on the underlying assumption that those who do not give their consent represent a minority and thus don't have a perceivable impact on the overall statistics. And this very assumption may be false.
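
      A small worked example of how that assumption can fail, using entirely made-up numbers (the visitor counts, the consent split, and the variable names are assumptions chosen only to show the arithmetic):

```python
# Hypothetical audience: 1000 real visitors, 300 of whom give consent.
consenting_total = 300
non_consenting_total = 700

# Hypothetical behavior: consenting visitors come back more often.
consenting_returning = 150       # 50% of consenting visitors
non_consenting_returning = 140   # 20% of non-consenting visitors

# What the tool can measure: only visitors who consented.
measured_returning_rate = consenting_returning / consenting_total

# What is actually true across the whole audience.
true_returning_rate = (consenting_returning + non_consenting_returning) / (
    consenting_total + non_consenting_total
)

print(f"measured: {measured_returning_rate:.2f}, true: {true_returning_rate:.2f}")
```

      In this scenario the dashboard would report a 50% returning rate while the real figure is 29%. The gap comes from the non-consenting group being both large and different in behavior, which is exactly what cannot be checked from the consented data alone.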