I have presented the 5 fundamental axioms that define web analytics in a previous blog and have also given examples of how analysts and tools attempt to skirt these assumptions. I maintain that many of the differences and confusions that arise from comparing vendors come from deviations in addressing these five assumptions. The following gives a more detailed illustration of this point.
One of the most fundamental, and one would expect least challenged, would be determining from whom client data is being collected (Axiom 2). Here is a blog that announces a feature that was added to Google Analytics in September 2009 that allows users to now incorporate unique visitors as segments in their custom reports. The author, Justin, rightly indicates why this is such a F***ing big deal. Previously these counts only showed up in one report, Absolute Unique Visitors, as though they could not be more unique. He is too kind in excusing GA for not providing this feature earlier, arguing that tracking unique visitors “takes a lot of data processing power.” The implication is that one of the fundamental assumptions of web analytic collection, the ability to uniquely identify and segment visitors, can be skirted if it is expensive or resource intensive. Really there is no excuse even for a “free” offering, much less for an alleged “enterprise” offering.
The more interesting point is actually in the comments, where a commenter wants to know what the difference is between visitors and unique visitors, since in his case the unique visitor count is greater than the visitor count. Justin replies that maybe there is a confusion between visits and visitors. The commenter says nope, here is the data:
Unique Visitors 141533.
Absolute Unique Visitor count is greater than Visitors??? This sort of brings into question the meaning of absolute.
The confusion is that, more often than not, users of the tool wonder how this can be and what they did wrong. It's as though they fell through a rabbit hole and ended up in the world of web analytics where logic has been turned on its head. No wonder analytics seems difficult to comprehend. In this case it is.
What is happening here? The basic rule of thumb is that if you measure the same thing in two different ways you will get two different answers. Here there are two different answers, therefore two different definitions of visitor. What are the possible ways that a visitor can be defined? One clue comes from Google Help. There are two ways that a visitor can be identified and counted: using cookies or combining IP Address + user-agent. The latter is essentially a cookies-disabled identifier that can be formed and applied on the fly for anyone and any user-agent. The Unique Visitor count is based upon a global unique identifier (GUID) that separates the user-agents that may be rolled up in the visitor identifier.
To make things even a little more interesting (meaning even more confusing), the help results include a paper from Avinash Kaushik from over 4 years ago, Standard Metrics Revisited: #1 : Visitors. Avinash is an important contributor to our understanding of web analytics and in this paper argues:
It is a disservice to the world that so many names exist for the same metric. When we mention visits and unique visitors depending on the tool it is either the same thing or not, it confuses decision makers and sometimes it means that if we rip out Google Analytics and replace it with CoreMetrics we have to unlearn old definitions for the same thing. Quite sub optimal.
Avinash Kaushik, Occam’s Razor, 11 September 2006
I believe that names are in most cases different because the metrics are different, and problems arise when the same metric is implemented in different ways by different tools, but I see his point. Interestingly, CoreMetrics has a very strong implementation of unique visitors that handles in-network and out-of-network identification continuity as well as in-network cookie churn, whereas GA apparently has a different take that we are discussing here. The difference is not necessarily in the definition but in the implementation.
In this blog Avinash suggests that visitor be the number of sessions within a report period identified by a transient cookie and unique visitors be identified by a permanent cookie. I would argue, as does one of the commenters, that the former is actually a visit and the latter a unique visitor. A visitor in general without adjectives and qualifiers would include visitors with permanent cookies and those unable to receive cookies, or more accurately cookies-enabled visitors vs cookies-disabled visitors. With cookie identification, the two kinds of visitors should add up to the total number of visitors, with far more uncertainty in the number of cookies-disabled visitors. In all cases without exception total visitors will always be greater than unique visitors and the ratio gives a metric for the efficacy of the identification process.
So what is happening in the GA data? Clearly visitors and visits are counted differently, with the latter over 2 times the visitor count, meaning that a visitor on average visits the site 2.3 times during the reporting period. Also clearly there is no positive number that can be added to the unique visitor count that results in the total visitor count. So I would conclude that visitor counts and unique visitor counts are completely unrelated. What has happened is that GA uses the cookies-disabled (IP-Address + user-agent) identifier over all visitors for the visitor count, and the unique cookie for the unique visitors, where cookies-disabled visitors have been removed (because there is no cookie identifier). The difference we are seeing is the difference between these two identification systems, where some of the unique visitors get rolled up into the same IP-Address + user-agent identifier. There is no way of determining how many visitors have disabled cookies, which would be a useful measure to monitor.
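The rollup described above can be illustrated with a small Python sketch. The hit records, field names, and addresses are invented for illustration, and the two counting functions only model the hypothesis argued here; they are not GA's actual implementation.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    cookie_id: Optional[str]  # None when the browser refuses cookies
    ip: str
    user_agent: str


def count_by_cookie(hits):
    """Count distinct persistent cookies; cookieless hits are dropped."""
    return len({h.cookie_id for h in hits if h.cookie_id is not None})


def count_by_ip_ua(hits):
    """Count distinct IP + user-agent pairs over all hits."""
    return len({(h.ip, h.user_agent) for h in hits})


# Hypothetical data: three cookie visitors behind one proxy with the
# same browser build, one other cookie visitor, one cookieless visitor.
hits = [
    Hit("c1", "10.0.0.1", "Firefox"),
    Hit("c2", "10.0.0.1", "Firefox"),
    Hit("c3", "10.0.0.1", "Firefox"),
    Hit("c4", "10.0.0.2", "Safari"),
    Hit(None, "10.0.0.3", "Safari"),
]

cookie_count = count_by_cookie(hits)  # 4 distinct cookies
ip_ua_count = count_by_ip_ua(hits)    # 3 distinct IP + user-agent pairs
print(cookie_count, ip_ua_count)
```

In this toy log the cookie-based count exceeds the IP + user-agent count even though the latter covers every hit, which is the same inversion the commenter reported.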
So what is illustrated here? First I would direct the reader to the Web Analytics Association (WAA) Standard (pdf here) where visits (sessions) and unique visitors are defined with a good set of notes and questions to ask your vendor. Essentially the user of the analytics data needs to understand this basic concept in detail for their specific tool, since every metric derived will trace back to this measure. Counts that are session based should be defined as visits (not visitors), and cookie-less identification should be separate from unique visitor identification. My preference is that the number of visitors include both unique visitors and cookies-disabled visitors, so that unique visitors will always be less than the number of total visitors and one can determine the number and behavior of cookies-disabled visitors. This is especially true if you plan to do testing or adaptive optimization that requires periods longer than a session to complete. If you plan to do predictive analytics and targeting, then you need visitor identifications of even longer duration. If you believe that cookie churn is a major issue, then measure and monitor it.
For most of what you will want to do with regard to site and channel optimization, start with unique visitors and segment from there to get the most highly correlated data. Again, people argue that cookies identify browsers and not users. Matt Belkin, VP of Best Practices at Omniture, has even argued that Unique Visitors is a flawed metric. I don’t agree with Mr. Belkin, especially in preferring session metrics over visitor metrics (see Visitors vs Visits), but that can be a topic of another discussion. If your site has log-in then you can track individual users, resulting in data streams that are even more correlated to individual behavior. However, even in these cases you still need to track users from when they come to the site anonymously to when they log in.
Your ability to maintain visitor identities among these three regimes will be a significant part of your analytic planning, and hopefully the vendor you select can provide help in this by supporting setup for first party cookies; properly linking identifiers to handle cookie churn and seamless movement between anonymous and user-ids; and providing monitoring of users that have disabled cookies or user-agents that don’t support cookies. As I have noted before, some vendors take this process more seriously than others.
For a major vendor such as Google Analytics to provide the capability to segment unique visitors years after the original tool has been introduced argues for vigilance in determining how well your analytic tool adheres to the fundamental axioms of web analytics. Otherwise, the reports will be terribly confused and the user will feel much like Alice in Wonderland: “It would be nice if things would make sense.”
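As a rough sketch of the accounting preferred above, the following Python counts cookie-identified visitors and cookies-disabled visitors separately and adds them to get the total. The record layout, the function name `visitor_summary`, and the sample data are assumptions made for illustration, not any vendor's schema or API.

```python
def visitor_summary(hits):
    """Split visitors into cookie-identified and cookies-disabled groups.

    Each hit is a dict with 'cookie_id' (None if cookies are disabled),
    'ip', and 'user_agent'. Cookieless visitors are approximated by
    their distinct IP + user-agent pairs.
    """
    unique = len({h["cookie_id"] for h in hits if h["cookie_id"]})
    disabled = len(
        {(h["ip"], h["user_agent"]) for h in hits if not h["cookie_id"]}
    )
    total = unique + disabled
    return {
        "unique_visitors": unique,
        "cookies_disabled_visitors": disabled,
        "total_visitors": total,
        # Share of visitors that the cookie could identify
        "identified_share": unique / total if total else 0.0,
    }


# Hypothetical hits: three cookie visitors (one seen twice) and one
# cookieless visitor seen twice.
hits = [
    {"cookie_id": "c1", "ip": "10.0.0.1", "user_agent": "Firefox"},
    {"cookie_id": "c1", "ip": "10.0.0.1", "user_agent": "Firefox"},
    {"cookie_id": "c2", "ip": "10.0.0.1", "user_agent": "Firefox"},
    {"cookie_id": "c3", "ip": "10.0.0.2", "user_agent": "Safari"},
    {"cookie_id": None, "ip": "10.0.0.3", "user_agent": "Safari"},
    {"cookie_id": None, "ip": "10.0.0.3", "user_agent": "Safari"},
]

summary = visitor_summary(hits)
print(summary)
```

Counted this way, unique visitors can never exceed total visitors, and the cookies-disabled count is available as its own number to monitor over time.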
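One simple way to picture the linking of anonymous and logged-in identifiers is sketched below in Python. The event format, the `anon:` and `user:` prefixes, and the function `resolve_identities` are hypothetical; a real vendor implementation would be considerably more involved, and this sketch only recovers from cookie churn when the visitor logs in again under the new cookie.

```python
def resolve_identities(events):
    """Map each event to a stable visitor key.

    events: list of (cookie_id, user_id) tuples, where user_id is None
    while the visitor is anonymous. Any cookie that was ever seen with
    a log-in is attributed to that user, including its earlier
    anonymous events.
    """
    cookie_to_user = {}
    for cookie_id, user_id in events:
        if user_id is not None:
            cookie_to_user[cookie_id] = user_id

    keys = []
    for cookie_id, _ in events:
        if cookie_id in cookie_to_user:
            keys.append(f"user:{cookie_to_user[cookie_id]}")
        else:
            keys.append(f"anon:{cookie_id}")
    return keys


# Hypothetical stream: alice browses anonymously on cookie c1, logs in,
# later returns with a new cookie c2 (churn) and logs in again; c3
# never logs in.
events = [
    ("c1", None),
    ("c1", "alice"),
    ("c2", "alice"),
    ("c3", None),
]

keys = resolve_identities(events)
distinct_visitors = len(set(keys))
print(keys, distinct_visitors)
```

Here three cookies collapse to two visitors, because both of alice's cookies and her pre-log-in activity resolve to the same key.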
I have specifically called out Google Analytics here, but my experience with Yahoo Analytics is the same with respect to visitor counts. I am shocked and dismayed that this issue still persists to this day. Whether you are new to analytics or a veteran, if it does not make sense, go back to basics and ask the “simple” questions again so that you know precisely what is being measured. Don’t assume the vendors have it right. Insist on logical answers without the spin.