It provides a rationale for why GoatCounter exists and comments on _why not_ other solutions like Fathom, Open Web Analytics, KISSS, Ackee, Countly, analysing log files, Google Analytics, statcounter.com, Simple Analytics, getinsights.io, and plausible.io.
I can't say that I agree with that rationale. One of his major requirements is:
> Have a “strong” copyleft, including the so-called “network protection”, which mandates that people submit changes even if they operate the code as a service (rather than sending people binaries).
However, the EUPL allows you to redistribute under other "compatible" licenses, most of which don't provide that "network protection". Effectively, the EUPL is only as strong as the weakest "compatible" license listed in its appendix.
These "compatible" licenses wouldn't otherwise be compatible, except that the EUPL explicitly allows re-licensing to them instead.
First time I've ever seen a comment about accessibility on the homepage of a mainstream product like this.
As a blind developer this was just awesome, made me really feel like somebody out there is listening.
Cheers. Right now a11y support is a bit like IE11 support: it should work, but it's tested only sporadically since it's rather time-consuming, and as a solo dev time is rather precious. I'm also not blind myself or even an a11y expert, so there will probably be some issues I'm just unaware of. I would really appreciate feedback on it, so do get in touch if you have issues.
Also, a11y support isn't just for "blind" or "disabled" users; it tends to make the page better for all users. This applies to everything really; for example while being able to tell coins apart by touch is critical for you, it's also pretty convenient for me at times, so this kind of coin design is better for everyone.
> Also, a11y support isn't just for "blind" or "disabled" users; it tends to make the page better for all users.
Yes! Although I'm not "officially" disabled, I'm "blind" to my screen when I'm driving (you'll be happy to know), I'm half-blind to a message that pops on my screen when I'm drying off from a shower (no glasses in the shower is my motto), I'm "mute" when surrounded by strangers on a train, my fingers can't operate a mouse or keyboard when I'm doing dishes, etc., etc.
We're ALL disabled, and our circumstances change over the very short to the very long term as well. Having things designed with flexible interface options was one of the original goals of the web. Some of us remember that before CSS, the publisher was supposed to specify semantics and the user was supposed to specify presentation. I don't think we should go to that extreme, but I'd like to see our browsers, tools, and frameworks designed to make multi-UI flexibility easier and more common.
One tool I'd suggest looking into when getting started is Accessibility Insights for Web. A team at Microsoft developed a free, OSS browser extension for automatically detecting the most common accessibility issues on your site: https://accessibilityinsights.io/docs/web/overview
Disclaimer: I do work at Microsoft, but my only affiliation with Accessibility Insights is as a happy customer :)
Modern FE tooling, for all the hate it (sometimes deservedly) gets, has some pretty great a11y stuff: e.g. my React projects use `eslint-plugin-jsx-a11y`, and DevTools/Lighthouse support accessibility audits...
Thanks for the goaccess.io suggestion. GoatCounter looked really cool but their crazy decision to go with a GPL-style license for code you run on your website is a pretty serious deal-breaker for me. GoAccess looks much more usable.
It's crazy because the use of the EUPL license forces all your customers to copyleft their website code, which almost no business wants to do, blocking any potential customers from adopting your product. No adoption means no usage, no pull requests, and no revenue. You are free to license your code however you want, but I think you'll find the tremendous effort you put into developing this great project will end up being essentially unused by others simply because of your licensing choice. Monetizing open source projects is incredibly difficult, and simply pasting a GPL or EUPL license text into the project doesn't make them easier to monetize, it makes them harder to monetize.
That is not my interpretation of the EUPL, which defines "Derivative" as software "based upon the Original Work or modifications thereof". I don't think that including this could reasonably be considered as such.
I could add a clause about it to make it unambiguous, perhaps, but it strikes me as rather redundant as it seems fairly clear to me, unless I missed something?
> Monetizing open source projects is incredibly difficult, and simply pasting a GPL or EUPL license text into the project doesn't make them easier to monetize
Sure, I don't disagree with that. But as mentioned non-copyleft includes the risk of a certain kind of abuse that I don't really want to take, either.
You have to do what you feel is best in the face of uncertainty, just as potential adopters of your software have to do what they feel is best in the face of uncertainty about the detailed legal interpretation of how GPL-like language applies to libraries included by or bundled with a website. The interpretation of what is or is not a derivative work in this context is a subject that is legitimately complex enough to be the domain of actual lawyers and actual court cases, not of armchair opinions from developers. Even a tiny bit of uncertainty over whether one's entire web operations might end up GPL'd or EUPL'd is more legal risk than 99% of your potential customers will be willing to take on. A paragraph of "explanation" written by a non-lawyer and posted next to the formal license is not going to reassure your customers as to how a court will interpret the formal license. But again, how you choose to license your software is your choice, just as whether to allow EUPL'd or GPL'd software into their website is your customers' choice. The business of software is hard, frequently much harder than the writing of software.
I find it interesting how vehemently folks will argue about hypothetical legal issues with licenses like EUPL, when we have actual real-world examples of companies taking advantage of liberally licensed software projects at the authors' expense.
Have you read the copyright license for google analytics? Their license and EULA has not been tested in court either. There is nothing that proves that google can't claim copyright infringement for all sites using their web products.
"You will not (and You will not allow any third party to) (i) copy, modify, adapt, translate or otherwise create derivative works of the Software"
As you wrote, the interpretation of what is or is not a derivative work in this context is a subject that is legitimately complex. It hasn't been tested, beyond the fact that the companies behind a third of the largest websites have had their lawyers green-light the use of software with such language in the license. So far, the bet that a website does not constitute a derivative work of the analytics software it uses is holding.
Again, it's your call. You'll hear lots of input from customers and potential customers. Some of it you should listen to, others you should not. The one thing that's rarely worth doing is trying to convince an individual customer they're wrong, because even if they have a demonstrable misconception it doesn't scale to try to convince your customers one-by-one that they are wrong. You need to do that at scale, and sometimes that means buying into how they view your product even if it's not how you view it. In the meantime I, like many others, will continue to not integrate GPL code into my website, even if google analytics uses the words derivative product in their EULA.
Note that it was someone else who replied to your last comment, not me.
I actually have a local branch that I made after your last comment to change the license of count.js to MIT, but then I thought about it some more and wasn't sure if that was the right thing to do. My concern is that "EUPL with clarifications/exceptions" would be more complex than "just EUPL".
While "telling customers they're wrong" would not be good, changing things on a whim after a single complaint would not be best for the product, either.
Also, providing feedback by calling stuff "crazy" is probably not the best way to get people to listen ;-)
Some irrationality will always exist, and some people can afford to have a phobia of a software license if they work in an industry with little or no competition. It's similar to the trade-off in using Google Analytics, where some companies can afford to let Google data-mine their traffic in return for analytics, while others either consider their user data too valuable to give away or have legal obligations that prevent them from sending it to a third party outside their jurisdiction and control.
There is a growing trend in the EU that handing over traffic data is not really acceptable (or legal) if it represents personal data. Here in Sweden there were a lot of embarrassing leaks where classified information got mishandled and government contracts with IBM broke the law as data left the country. Just a few months later a major medical scandal happened where audio recordings of patients slipped out and medical confidentiality was broken. The costs are climbing, and together with GDPR this is really pushing demands that data not leave the country.
Naturally, one can always spend the money to develop a custom system, but then we come back to the problem of competition and budget constraints. It is not easy to get such projects green-lighted, especially if some engineer comes along and suggests that they can just use some free software and put that developer time toward more important things.
If this tool could also show you anonymized aggregates of click trails through your site, I'd be down in a heartbeat. Raw visitor counts and some metrics on where they came from are sometimes useful, but for a web app the much more useful info comes from the pathways people take through the app.
I applaud anyone who makes an effort to avoid third-party analytics products. Analytics in general does nothing to help the users of your site while pushing additional work onto the client and leaking information.
But looking at data is fun, so I ended up creating my own super-light counter that I run on my site so I can see hits. My goal was to store as little information as possible: only hit counts are stored, and no cookies are used at all.
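A counter like that can be very small. The sketch below is an assumption about how one might build it, not the commenter's actual code: it keeps an in-memory count per (day, path) and nothing else, so there is no IP, user agent, or cookie to leak. `HitCounter` and its methods are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from datetime import date


class HitCounter:
    """Stores only a count per (day, path): no IPs, no cookies, no user agents."""

    def __init__(self) -> None:
        self._counts: Counter[tuple[date, str]] = Counter()

    def record(self, path: str, day: date | None = None) -> None:
        # Drop the query string so ?utm_source=... variants count as one page.
        clean_path = path.split("?", 1)[0] or "/"
        self._counts[(day or date.today(), clean_path)] += 1

    def hits(self, path: str, day: date) -> int:
        # Counter returns 0 for unseen keys without inserting them.
        return self._counts[(day, path)]
```

In practice the counts would need to be persisted (say, a table with `day, path, count` columns); the point of the design is that the schema has no column that could identify a visitor.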
> Analytics in general does nothing to help the users of your site
I find this really interesting, and am amazed that more sites and services don't surface some of their analytics data to users. Look at the success of yearly "wrap up" campaigns (disclosure: I work for a company with one of the most famous versions of that mechanic, but don't work on it).
You'll get users opting into some data and tracking if there's some tangible benefit to them on the other end. It seems like people love learning about their usage of products, and there's a lot of data that people would be happy to share if they got some benefit too.
For example, I know Google tracks when I click a link in a SERP - but now that they surface the "you've visited this X times, last time on Y", I'd happily opt into that data collection because of the pseudo-utility/interest factor of it.
I think about this often while I work on small data sets and reporting, mostly lead and customer data (think PPC reporting or CAC:LTV reports), and I have a couple of theories.
The one that seems most natural is that organizations don't want people to know how much data they have on them. If too much of it was customer-facing and not wrapped up in a cool "2019 Wrap Up" video, then pressure would mount to be even more transparent, and eventually accountable for, the data organizations collect.
I think there are a few others, like the value to the bottom line that it offers. Most companies optimize heavily there so the only real applications are the ones that would like to drive more revenue, such as "Only 2 seats left!" or "Last One In Stock!" messaging based on urgency and fear. One-dimensional stuff.
I also look at it from the resources perspective. I think lots of companies are spending time and resources pretty poorly. Companies I've worked with outside of startups often forget how and why they make money and end up spending lots of resources on things that might not matter. Service professionals, for example, usually rely on a network connection like the local Chamber of Commerce for business. Despite 80%+ of business coming through that channel, they insist on trying social media or PPC ads instead of doubling down or identifying a similar network when they explore growth. This is natural ignorance that they can learn to overcome.
I really hope we get more data-sourced initiatives in the future. I use a few apps that do a little bit of it but leave a lot to be desired: Goodreads, Strava, Nike Run Club, Spotify, Audible, Kindle, & YouTube come to mind.
My dream is to have a Life Dashboard. I had designed it with some of these apps in mind, but the APIs and the output I'd get weren't enough to keep pursuing it when life got busy.
I like seeing general analytics about a site because I am nosy so I enjoy seeing "This blog post was visited 300 times" type information.
But I would hate to see "You, personally, have visited this site 14 times" start cropping up, because it would remind me how much information about me is available. Intellectually I know this data exists in Google Analytics, but actually seeing it would creep me out.
I used to keep a cookie, used only client-side, that held the previous pages the visitor had visited. It grew a menu in the sidebar. I never got around to it, but it could be interesting to generate a tiny tag cloud for the visited pages and, say, three article suggestions based on those. I didn't build it because "visited = interesting content" doesn't ring true to me. It's more of a top 10 of clickbait headlines.
Any site is constantly being accessed by bots, only some of whom announce themselves in the user agent. Some are deliberately designed to mimic human browsing and you can only tell by carefully following their access pattern.
The most obvious one I miss (I currently only use server-side analytics) is something like screen size. You can do user-agent sniffing to _guess_ the size of a mobile device, but it doesn't tell you whether or not you can stop wasting time making your content responsive on a tiny screen that no one uses anymore.
Are you talking about mobile? Sure, there are fewer pixels, but the size of my phone (i.e. the size of my pocket) is between 15 (laptop) and 25 times (desktop screen) smaller in area. My pocket will always be smaller than my arms or my desk, so a different site presentation is in order.
The disadvantage there is that sometimes you want to give the user a totally different site if their client is mobile. CSS media queries are indiscriminate in that a smaller browser window may trigger the “mobile” CSS. Likewise, many tablets have similar screen sizes to some laptops, yet often you don’t want to present the same UI to a tablet and a laptop.
So, so many bots. 100x might be hyperbole but only just.
We host websites and the bots are super annoying, because even the well-behaved ones throttle requests per domain, which means they just hit all of our customers at once. If our cache architecture were a little more rotten, like I’ve seen on other jobs, then bot-driven evictions would get ugly, instead of just spiking our traffic, increasing our overhead, and making it harder to get clear metrics.
I agree with you, however the only popular static site host (other than having your own vps or whatever) to add an analytics solution that I've seen is Netlify. I use it and I'm pretty happy with it so far, but it's very barebones and pretty expensive ($9/month for just the analytics)
A few reasons have been pointed out by others, but let me include another.
Others have pointed out:
- Client-side SPAs sometimes don't hit server logs
- Some static sites are hosted in places where you don't have access to the logs (GitHub Pages, Netlify, etc.)
- Bots are sometimes defeated by a simple JS file
This seems like a perfect solution for a portfolio, blog, or new project. I like that it's open-source, lightweight, and has a self-hosted option!
Elevator pitch from `rational.markdown`
GoatCounter aims to give meaningful privacy-friendly web analytics for business purposes, while still staying usable for non-technical users to use on personal websites. The choices that currently exist are between freely hosted but with problematic privacy (Google Analytics), hosting your own complex software or paying $19/month (Matomo), or extremely simplistic "vanity statistics" (Fathom).
GoatCounter attempts to strike a good balance between various interests. Major features include a free hosted version so people can easily add analytics to their personal website, an easy-to-run self-hosted option, an intuitive user interface, and meaningful statistics that go beyond "vanity stats" but still respect your users' privacy.
Looks nice. I’m somewhat surprised we haven’t seen an obvious alternative to Google Analytics yet. It’s got a wide and deep surface area, but it feels like for the majority of e.g. B2B SaaS apps there’s a much simpler solution to be built. Something that covers mainline scenarios like:
- what channels / sites / campaigns is my traffic coming from?
- what pages are people landing on?
- what pages are driving conversions?
- what do my conversion goals look like (percentage and total conversions)?
GoatCounter doesn't do a lot of these things ... yet, but it's definitely planned.
I'm a little bit hesitant to look too much at GA, since I don't want to just make an "open source GA". At a lot of jobs I've worked at, we were essentially just "making a shit copy of a shit product", to put it crudely. I really want to avoid doing that.
So the way may be quite different, but the goal of providing meaningful business insights is definitely there.
I used it for a while, and found the UX really hard. YMMV, but I wasn't happy with it.
Also, the hosted version isn't free, and self-hosting is also comparatively expensive (vs. free) and time-consuming. IMHO any serious GA alternative should have a free hosted option. I wrote about that a bit more in depth yesterday over here: https://lobste.rs/s/ooag4u/goatcounter_1_0_release#c_o76csv
In the past, many companies wanted to do "free analytics for people", and we don't see them anymore; you can't compete with Google just by offering a free option. Google runs at scale, so they can keep it free and sell or use the data from it.
If you want to build analytics software on moral grounds for privacy and stuff you will just bleed out or just run very niche or indie business. It's great for nomadic makers, but not for serious business.
Look at Matomo, Simple Analytics, or Fathom. They are all great (besides Matomo), but they can't compete in any market other than small business. And yes, I know that Matomo has enterprise clients, but they are also small compared to GA. :)
Want to compete with them? Have a great plan and support from a major search engine like DDG. If not, then you can make another Mixpanel (which is great!).
Then you've got to give people a server-side component they can run, so that the infrastructure (not to mention security/privacy) load is handled by the right party. I personally think it's horrible that it's become industry standard to sell out customers like this.
How badly do you want to avoid tracking your users?
It's hard to get statistics on conversion when you don't even track users across pages on your own site. It looks like GoatCounter can't show you unique visitors or how they move around on your site because it doesn't track them. There are no cookies on the main page! This seriously limits the kinds of features that can be implemented.
Yeah, I'll add some sort of "cookie tracking". It's just not done yet. There is some prior art at Fathom and Simple Analytics on how to do that while still preserving anonymity. I'm not entirely sure yet which approach I'll take.
I might make it an optional feature, too. Again, I need to look into it in depth.
There are a zillion-and-one things to do, and thus far other things have taken priority :-)
You can't optimise conversion if you remove the ability to track users. If you see 8 hits on page A and 4 on page B, is it that 4 people left or that the first 4 people visited the page twice before they moved to page B? You can't tell without the unique cookie.
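To make that ambiguity concrete, here is a small sketch with made-up data: two scenarios that produce identical per-page hit counts but different numbers of distinct visitors. The `visitor_id` values stand in for a hypothetical anonymous per-site cookie; they are illustrative, not something GoatCounter records.

```python
from collections import Counter


def hits(views):
    """Per-page hit counts: all a cookie-less counter can see."""
    return Counter(page for _visitor, page in views)


def unique_visitors(views):
    """Per-page count of distinct visitor IDs (needs some per-visitor identifier)."""
    visitors_by_page = {}
    for visitor, page in views:
        visitors_by_page.setdefault(page, set()).add(visitor)
    return {page: len(visitors) for page, visitors in visitors_by_page.items()}


# Each view is (visitor_id, page); visitor_id is a hypothetical anonymous cookie value.
# Scenario 1: 8 people view A once, and 4 of them go on to B.
left_early = [(f"v{i}", "A") for i in range(8)] + [(f"v{i}", "B") for i in range(4)]
# Scenario 2: 4 people view A twice, and all 4 go on to B.
reloaded = [(f"v{i}", "A") for i in range(4)] * 2 + [(f"v{i}", "B") for i in range(4)]

print(hits(left_early) == hits(reloaded))  # True: hit counts can't tell them apart
print(unique_visitors(left_early), unique_visitors(reloaded))  # {'A': 8, 'B': 4} {'A': 4, 'B': 4}
```

Hits alone show 8 on A and 4 on B in both cases, so the A-to-B conversion could be 50% or 100%; only a per-visitor identifier separates the two.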
That said, having a unique but anonymous cookie within a single site isn't the end of the world but Google only provide GA for free because it is useful for determining other things.
The main reason is that it took tens of years to build Google Analytics as it is and Google has the advantage to be able to provide more information about the user such as demographic data (gender, age, etc) since they have all the data in-place.
Having said that, there is no need to create an exact copy of Google Analytics because most of the people probably use only 20% of the features anyway. Each business has its own use-case and data source so it would be much more convenient to ingest all the raw event data into your data warehouse either using third-party tools such as Segment or open-source tools such as Snowplow and Rakam. This is the only way to have full control over your data.
1. If you don't want to store sensitive user-data, just don't send it to your servers.
2. Create the reports either using SQL or something like Rakam that provides you an interface similar to Amplitude / Mixpanel but on top of your data-warehouse so that you don't need to share your data with a third party service.
Can it also run in "server log parsing" mode so that no JS scripts are required? This makes it even easier to avoid processing personally identifiable information (if you don't collect the full IP) and it works even if people have JS disabled (or if the clients are automated scripts).
No, but this wouldn't be too hard to add. I'm not planning to work on it soon, but I'll be happy to review PRs/provide guidance on how to build it.
I will add docs on how to run it in "server mode", where instead of using a JS script you add an HTTP request in your app's middleware. This is an idea I had the other day and I did some research on it; it should work quite well (I haven't started work on it yet, though).
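As a rough illustration of the "server mode" idea, here is a minimal Python WSGI middleware that reports each request from the server rather than from a JS script, passing along only the path and referrer. `counting_middleware` and the `record(path, referrer)` hook are hypothetical: the comment doesn't specify what request GoatCounter's server mode would make, so the hook is left for you to fill in (write to a database, forward to an analytics endpoint, etc.).

```python
from typing import Callable


def counting_middleware(app, record: Callable[[str, str], None]):
    """Wrap a WSGI app so each request is reported server-side, with no JS.

    `record(path, referrer)` is a hypothetical hook; what it does with the
    data (store it, forward it to an analytics service) is up to you.
    """

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        referrer = environ.get("HTTP_REFERER", "")
        try:
            record(path, referrer)
        except Exception:
            # A failing analytics hook should never break the actual page.
            pass
        return app(environ, start_response)

    return wrapped
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, counting_middleware(app, record))` will serve the wrapped app. Swallowing exceptions from `record` is a deliberate choice: a broken analytics hook shouldn't take the page down with it.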
There's no definitive answer to this, as it would take a court ruling (which hasn't happened), but my own understanding (I am not a lawyer, but I deal with GDPR/CCPA professionally) does not match the "You don't need to deal with GDPR" pitch of these services.
Say you run a SAAS and install this on your marketing site. You're still sending IP address and potentially identifiable information to a third party processor.
We (on HN) don't consider IP addresses as PII, but from a purely practical standpoint ad data brokers are selling/bidding on IP addresses all the time which makes them more than nothing.
You (as the controller) also need to validate that processors (services that you are using) are in fact doing what they're saying. You'd still need a Data Processing Agreement in place with GoatCounter because otherwise Goat could start collecting additional information without your knowledge, start generating more metadata (GeoIP/Company) from the IP, etc.
I'm just saying it's not only about not collecting data, but the processes that surround it and safeguarding users and their privacy.
Nice work. I'm currently in the process of building something similar to this and Fathom - mostly out of curiosity - I will ping a link up here when it's ready for a test-drive.
The current setup is similar to Fathom's in that I temporarily track a user session by generating a unique hash for the user; then, if that hash has already been seen in the past 30 minutes, we move the hash to the latest page view and increment a pages-viewed counter for the session. We can't tell which pages you've been on in the past, only that you started a session at time X and viewed N pages, with the last view at time Y.
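A minimal sketch of that session scheme, under assumptions the comment doesn't spell out: the hash is SHA-256 over a secret salt, the IP, and the user agent; the salt is something you would rotate (e.g. daily) so hashes can't be linked over long periods; and sessions live in memory. `SessionTracker`, `Session`, and `SESSION_TIMEOUT` are hypothetical names. No page paths are stored, matching the "can't tell which pages you've been on" property.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass
class Session:
    started: datetime
    last_seen: datetime
    pageviews: int = 1


class SessionTracker:
    def __init__(self, salt: bytes) -> None:
        # Assumed: a secret salt rotated periodically (e.g. daily) so the same
        # visitor can't be linked across long periods via their hash.
        self._salt = salt
        self._sessions: dict[str, Session] = {}

    def _key(self, ip: str, user_agent: str) -> str:
        h = hashlib.sha256(self._salt)
        h.update(ip.encode())
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
        h.update(user_agent.encode())
        return h.hexdigest()

    def hit(self, ip: str, user_agent: str, now: datetime) -> Session:
        """Record a page view; only the hash, start, last-seen time and count are kept."""
        key = self._key(ip, user_agent)
        session = self._sessions.get(key)
        if session is not None and now - session.last_seen <= SESSION_TIMEOUT:
            session.last_seen = now
            session.pageviews += 1
        else:
            session = Session(started=now, last_seen=now)
            self._sessions[key] = session
        return session

    def purge(self, now: datetime) -> None:
        """Forget expired sessions so hashes aren't kept longer than needed."""
        self._sessions = {
            k: s for k, s in self._sessions.items() if now - s.last_seen <= SESSION_TIMEOUT
        }

    def active_sessions(self) -> int:
        return len(self._sessions)
```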
Incidentally, I had a PoC for the data ingest running as a Cloudflare Worker using their KV storage. What could be interesting about that is that there'd be zero third-party widget code to inject into a webpage: You'd log the pageview in the worker and pass on the request.
But the market of WordPress users who want to paste a few lines of JS snippet into their site would be lost. And it would add a few hundred milliseconds to each request you want to log.
The only thing that stops me from switching from Google is pricing. I have 2 niche blogs (does that count as commercial?) as a side project that barely makes any money and paying $180/year is completely unrealistic. I would switch without blinking with a more friendly pricing.
I'm not too fussy about it; if you have a small side project with a reasonable amount of traffic that earns you a little bit of pocket money then that's just "personal" as far as I'm concerned. It's really hard to codify these kind of things, so I just made it "commercial/non-commercial" which is simple and clear.
Hit me up on email and we can arrange something: firstname.lastname@example.org
It was actually written as an alternative to Fathom! My original plan was to contribute to Fathom until it suited my needs, but the Open Source version of Fathom is on indefinite hiatus and the maintainers are working on a (closed) rewrite. I decided that starting from scratch would be better, as there were some things I would have preferred to do fundamentally differently. What I really wanted was to add analytics to another idea I was working on, and while it's a cool idea that everyone I pitched it to seems to like, it doesn't have any good monetisation options, so I decided to work on this first.
What's wrong with just parsing the webserver logfiles like we used to do (I still do this)? Is that too old-fashioned now or something? Doesn't require any account or paying anything or setting any cookies (and therefore you don't need to have that annoying cookie warning your users hate so much).
Yeah, I did that (I also had to anonymise the IPs), but someone decided to run a continuous pen-test against the site over the course of days (I swear they forgot to turn it off or something), and tragically I had to rework my infrastructure to accommodate much more traffic than expected (BIG BIG BIIIIIIIG log files), as it made some of my processing fall over.
It's nice if someone has already done some of that sort of stuff for you.
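For the log-parsing approach, here is a sketch meant to cope with very large files: it reads the log one line at a time (memory grows with the number of distinct paths, not the file size), truncates IPs before anything is stored, and counts successful GET requests per path. It assumes lines start with the Common Log Format fields, which Apache's and nginx's `combined` format also begin with. `anonymize_ip`, `anonymize_line`, and `count_hits` are hypothetical helpers, and the /24 (IPv4) and /48 (IPv6) truncation lengths are an assumption, not a legal standard.

```python
from __future__ import annotations

import ipaddress
import re
from collections import Counter
from typing import Iterable

# Matches the leading Common Log Format fields; `combined` lines start the same way.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def anonymize_ip(ip: str) -> str:
    """Zero the host part: keep /24 for IPv4, /48 for IPv6 (assumed lengths)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False).network_address)


def anonymize_line(line: str) -> str:
    """Rewrite the leading IP of a log line; leave unparseable lines unchanged."""
    ip, sep, rest = line.partition(" ")
    try:
        return anonymize_ip(ip) + sep + rest
    except ValueError:
        return line


def count_hits(lines: Iterable[str]) -> Counter[str]:
    """Count successful GETs per path, consuming the input one line at a time."""
    hits: Counter[str] = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or m["method"] != "GET" or not m["status"].startswith("2"):
            continue
        hits[m["path"].split("?", 1)[0]] += 1
    return hits
```

With a real file, `with open("access.log") as f: print(count_hits(f).most_common(10))` streams through it without loading the whole log into memory.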
I've been a happy user since you demoed this here last August or so. Have also been pitching it to some friends for their personal sites. I donated for the free tier. Really appreciated your writeup and rationale for providing a free alternative to GA. Thank you for your efforts!
There is no API, and unless a lot of people ask for it I'm not planning to build it any time soon either. Making an API isn't too hard, but providing a stable API would slow down development quite a bit as this project is really still in its early days. I don't think it's worth it right now. Sorry :-(
Better data export facilities is something that I intend to do soon-ish, and you could build your own UI with that, but it'll be based on data/DB sync rather than querying an API, which is quite a different workflow.
How do you know it doesn't need a consent notice? Anything that tracks people uniquely, via a personal identifier, is in-scope and counts as personal data. Is the assumption here that it counts as 'legitimate interest'?