Building Reports that Cover All Behavior
In my line of work consulting with various customers I have been able to see common threads in how different companies approach web analytics and apply it to their business. One puzzling aspect that is immediately obvious but takes longer to understand is how many reports seem to cover only a small portion of web behavior, with most visitors and visits pushed into the general category of ‘None’. None means that the visitor does not belong to any of the categories for the dimension being reported.
A Puzzling Predicament
What is puzzling is: how do such reports provide useful information on visitor behavior? The next question that comes to mind is who those people in ‘None’ are, or for the person who implements the report: did I set up the metric correctly? In brief, how many of those counted as None should be Something? This has led me to the concept of “None is not an Option” reporting, meaning None is not a category of behavior but an indication of analytic failure or incompleteness.
As a consultant I can typically give advice but too often cannot work through the details implied in the advice. In my current position I have had the opportunity to be both the designer and the implementer and to put this concept into practice. As with most theory, implementation presents its own challenges and insights. Here I will share what I have learned.
Because we will be dealing with implementation I will focus on a specific tool – Adobe (Omniture) SiteCatalyst 14. The concepts can be applied to Google Analytics, though it has a different scheme for defining custom variables or data dimensions. Google forces one to group variables into a limited number of tag plans (called slots), so one has to consider up front the strategic implications of the data being collected.
With SiteCatalyst one can quickly get into the mode of treating page properties (sprops) or event dimensions (eVars) as simple fields that one collects and then stitches together later. The unfortunate outcome of this approach is that one quickly runs out of the custom variables that Omniture allows, though at first they may seem inexhaustible. On top of this comes the rude awakening that the ability to relate these fields in reporting is limited by the number of correlations or full relations that allow one to pivot or drill down from one dimension to another.
When I confront a customer with the fragmented segmentation and non-strategic value of their SiteCatalyst reports, the response is “We can get most of what we want from Discover”. Discover is Omniture’s data warehouse exploration tool that removes most of the irritating limits of SiteCatalyst and allows one to cross correlate all dimensions. With SiteCatalyst 15 many (not all) of these limitations go away, so one will be tempted to continue this approach without further reflection.
The problem is that though Discover and SiteCatalyst 15 remove limitations on how variables can be correlated and queried, they do not address the underlying concern – fragmented segments. These are segments that may or may not relate to each other but that clearly represent categories most visitors do not belong to. This shows up as a large percentage of visits or visitors attributed to “None”. On a practical level, these variables waste a perfectly good slot that could be used to collect data that covers more visitors. So we have to do some heavy data minding in SiteCatalyst before we can do data mining in Discover.
Analytic Capabilities as Strategic Plans
Clearly there are situations where the data collected will apply only to a small portion of visitors or to specific tasks that the visitor performs (acquisition, onboarding, marketing, etc.). As an analytic solution designer one is attempting to deal with a large number of demands from different business units for data specific to their special needs. Without a strategic plan or master solution one is forced to service these requests one by one, leading to ad hoc solutions for each case. This may satisfy the customer’s immediate need but rarely provides a lasting solution, because there is always the next question that arises from the data.
The Fundamental Question
As a data architect (web analytic professionals are called upon to perform many roles) one has to consider, once the requested data is collected, how it will be used and integrated into the business process. So a good follow-up question to a customer who presents a list of data fields to be collected is: if you had this data, what would you expect to find, and what decisions would you expect to make from the findings?
The back and forth in addressing this question will usually lead to the development of a web analytic capability that addresses a business need rather than describes an application or sequence of pages. Since business processes are more formalized than applications, there is a good chance that the resulting analytic capability can be applied in other aspects of the business and across all the web properties of the company.
How Analytic Capabilities Pop Up
Analytic capabilities are the foundation of all current web analytic tools. Web analytic providers have from the very beginning looked for common needs across all their customers to provide reports that meet those needs. Whether it is tracking marketing campaigns to the customer’s site, tracking shopping cart activity across the various stages of the acquisition process, or simply evaluating the effectiveness of forms, most tools provide “canned” reports that address these needs. More recently, special report suites have popped up to assist companies in evaluating their effectiveness in social media marketing!
Plug-In or Customize?
This is all well and good if your business fits the model represented by the analytic capability and the reports satisfy your business needs. In application, however, one still needs to adapt the capability to the specific needs of the business. So, to hedge the market, web analytic providers offer custom variables and reports that allow businesses to define their own analytic capabilities.
Omniture’s offering is particularly good in this respect, allowing even standard packages such as the shopping cart to be customized with variable and event names matching the customer’s specific requirements. They also provide a rather large collection of ‘plug-ins’ that add specific capabilities to their standard instrumentation (collection) script. One that I would highly recommend is their Channel Manager plug-in, which results in an All Sources Report for all external referrals (paid and non-paid) to a site!
Up and Down Sides of Flexibility
Such flexibility provides great opportunity, but with the very real downside of providing enough rope to hang yourself. When using custom variables and reporting one must maintain the discipline of identifying analytic capabilities that address general needs and can be adapted to specific business requirements.
For example, the company I am working with now has an acquisition process but does not follow a shopping cart model. The events and special variables associated with the shopping cart model can be adapted to a degree. There are products and purchases of products, but the business does not refer to them as products and purchases; it has its own business terms.
Special KPIs (success events) and segments (eVars) must be defined which may provide answers on overall performance, but the next questions of course are “Why are the KPIs what they are?” and “Can we improve them?”. In this case, that means tying funnel reports to pathing reports and ensuring drill down (correlation) between KPIs and segments.
The result has been a collection of web analytic capabilities designed specifically for the business needs but sufficient to cover all acquisition behaviors and visitors in SiteCatalyst (focused on understanding normal behavior and daily monitoring) and Discover (focused on… well… discovery).
How to Define and Design a Web Analytic Capability
As was mentioned above, developing analytic capabilities is a discipline that has to be carried over from the web analytic providers to custom reporting. As a discipline, we should be able to describe the various activities involved in its development. To this end, let us start by formalizing the concept of a Web Analytic Capability.
A Web Analytic Capability has a well defined purpose that is usually expressed in business (not application) questions that can be addressed by the capability. This is then translated into sets of reports and the data that must be collected to support those reports. Another essential element that adds power to the reports is the set of decisions or actions that can be taken based upon the data and/or reports. In other words, the design of actionable reports.
Business Requirements without Analytic Jargon
The requirements are gathered independently of the specific analytic tool’s jargon. In the case of Omniture, talk of eVars, sprops, success events, correlations, relations, pathing, etc. must be expunged from the conversation. Campaigns, calls to action, key performance indicators and channels may be appropriate depending on the business. The discussion is with business owners who make decisions and data analysts who support those decisions. Therefore the requirements should be expressed in their business terms and translated into web analytic jargon later, during implementation.
Universal Rather Than Particular Solutions
In the solution design you are attempting to develop a capability that is “universal”. Here universal means capabilities that cover all visitors or segments of visitors. If the request is to track a specific link, the response should be: why not track all links and provide a means of selecting the specific links the customer is interested in? If the request is to segment visitors by specific actions, provide a mechanism to expand and identify other actions in the future. In the case of the Channel Manager mentioned above, if the request is to track marketing campaigns to your site, why not track all of the sources to the site, providing an All Sources Report that tracks everything?
In the back and forth between need and solution, the result should be an analytic capability that is understood by both the business and the solution designer, to the extent that the capability can be used to meet business needs beyond the specific concern of the immediate project.
Universality’s Relation to None
One of the implications of universality is that all visitors are categorized and None is not an option. Even if the visitor does not currently belong to any of the specific segments of interest, place them into a segment such as ‘Unknown’, ‘Currently Not Categorized’ or ‘Not of Interest at this Moment’. Anything but ‘None’ or ‘Unspecified’, which are the terms SiteCatalyst uses to bucket all the counts that cannot be attributed to a specific classification or variable.
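As a minimal sketch of this rule (the function and segment names here are hypothetical illustrations, not part of the SiteCatalyst API), the instrumentation can fall back to an explicit bucket whenever a visitor does not match a known segment, so the variable is never left unset:

```javascript
// Hypothetical helper: map a visitor attribute to a segment,
// falling back to an explicit bucket instead of leaving the
// variable unset (which SiteCatalyst would report as "None").
function assignSegment(value, knownSegments) {
  return knownSegments.indexOf(value) !== -1
    ? value
    : "Currently Not Categorized";
}

// In page instrumentation the result would be assigned to an eVar,
// e.g. s.eVar5 = assignSegment(visitorType, ["Prospect", "Customer"]);
```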
This “Unknown” category serves two important purposes. First, it identifies the scope of the specific parameter or segment: that is, the total number of opportunities to collect the data compared to the specific data of interest. Second, it separates opportunity from the non-opportunity represented by ‘None’. In other words, ‘None’ comes to represent a failure to collect the desired data, and hence becomes a valuable diagnostic tool for ensuring quality data.
Even if you don’t expect to collect the specific data for all visitors, there are likely cases where you do expect the data to be a necessary prerequisite for a later user action or event. In these cases the KPI should have no attributions to ‘None’, and if some are found you have your first clue that something is wrong and a cause to track down.
On the other side, the ‘Not Categorized’ category can be correlated to other dimensions, allowing for the identification of new segments that can be tracked, eventually moving visitors into other categories and providing more insight into the process being observed.
The Concept Put To Practice
By defining web analytic capabilities you will progress from simply collecting behavior data to understanding how behavior should be reflected in the data collected. The capability is sufficiently abstract to provide a guide for addressing not only a specific problem but the entire class of problems it represents. This in turn provides a framework for categorizing behavior and actions that is more complete and where None is not an option.
Get the Questions Straight
For example, a frequent request starts with “We have a form that the customer must complete …”. One might respond with “Form Analysis”, but as the discussion evolves the issues take on much greater scope, with questions more probing than “Where is the user having trouble and abandoning the form?” Not that this is an unimportant question, it is just not the first question. The larger question is task completion: of those who start a task (form, wizard, web flow), how many complete it? Or task relevancy: is the task we gave them the one they wanted to perform?
What type of task are they performing? Acquisition (Cart Checkout) and Onboarding (Registration) are tasks where the main objective is getting the visitor from Start to Finish as quickly and simply as possible, without diverting and distracting the customer with other tasks. Service tasks, on the other hand, are dynamic: the user has more freedom to perform many actions within the task process. These are user convenience features that need to be tracked and understood.
So looking at the task as a process where the form may be an element, the more relevant and immediate questions are:
- How and where do users enter the task?
- What scenarios can the user perform within the task process?
- What has been accomplished when the task is completed?
- Finally, where and why do users abandon the task once started?
Start with the First Question
The first question is addressed by tracking both external and internal channels, so that when a user starts a task we know how they came to the site and what they viewed prior to starting the task. One may go a step further and provide click placement tracking to understand where on the referring page the user clicked to start the task. Here the “None is Not an Option” solution is obvious: track all sources coming to the site (Paid AND Organic) and track all channels within the site. If you are doing click placement tracking, consider tracking all links. When this is accomplished, None means a channel or action that is not tracked.
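The all-sources idea can be sketched as a simple classification function. The rules and channel names below are illustrative assumptions for the sketch, not the actual Channel Manager plug-in logic: every entry is classified by referrer and query string, with an explicit catch-all so no visit falls into None.

```javascript
// Illustrative all-sources channel detection (hypothetical rules,
// not the Channel Manager plug-in). "cid" is an assumed campaign
// query parameter for this example.
function detectChannel(referrer, queryString) {
  if (/[?&]cid=/.test(queryString || "")) return "Paid Campaign";
  if (!referrer) return "Direct or Bookmark";
  if (/\.(google|bing|yahoo)\./.test(referrer)) return "Organic Search";
  return "Other Referrer"; // explicit catch-all, never left as None
}
```

With this shape, a “None” in the channel report can only mean a visit the instrumentation never saw, which is exactly the diagnostic signal we want.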
Triage Business from Analytics
The second question is more specific and takes more effort to analyze and abstract into an analytic capability. The business owner wants to know everything, including the significance of every field entered and action taken. So triage must be done to separate what represents business concerns from what reflects user behavior. The data for the latter can be collected with web analytics, with the former best left to back end transaction engines, CRM and other business data warehouses. The issue for the solution designer is establishing a way to connect the data on the back end so as to “color” business facts with user behavior data.
Establish Realistic Expectations
Categorizing the scenarios performed within a task (especially a service-oriented task) has additional challenges. Is this the first time the visitor has performed the task? If they are returning, how do their previous visits affect the current visit? When is the intent of the visitor revealed – at the start, during, or at completion? If they select certain options or perform specific actions within the task, is it truly of strategic importance to capture these machinations, or can this be part of what is reported at the completion of the task?
There are many more questions that can be asked, but from a web analytic capability perspective, the NINAO (None Is Not an Option) questions are:
- At the start of the task, have I captured everything I know about the user that will define or affect the user experience during the task process?
- At the completion of the task, have I captured everything that was accomplished during the process, ideally from the confirmation of successful completion?
- For actions within a task process, how can these best be captured: milestone (step) events, click events, or changes in content generating page view events (even if the action does not result in an HTML page transition)?
Mockup Reports to Confirm Design
Once you have defined what you believe covers all entries and exits from the process, cross check this with your measures (events) and how they will be segmented and broken down. This exercise will result in mockup reports that can be reviewed with the business owner and analysts.
In the back and forth exchange, the reports can be tuned and simplified so that what is important is clear and presented succinctly, not encumbered by user machinations that divert attention from what needs to be understood and followed at a high level.
Now the Other Stuff
Now you can consider Form Analysis, Fallout Reports and Page Pathing Reports that will help answer the questions arising from the reports you have defined for task analysis. From my experience, I provide Funnel (or Waterfall) Reports that give business owners a means of understanding overall performance, which can then be broken down and analyzed at the next level by data analysts using these more detailed reports.
From Solution Design to Implementation
Once the solution design is in place, you can consider how the design will be implemented in a particular web analytic tool. The solution design should identify how the data will be collected and the options for configuring the web analytic capability to a specific solution, in this case a task. The analytic capability will have to make clear to the web designer or application developer what is expected for the data collection.
For example, with Form Analysis it is expected that the web designer uses the standard HTML elements that make up a form. But with Dojo and other web toolkits this is not as straightforward, so the standard web analytic provider form analysis plug-in will not work. Even if it does, how are the forms identified for analysis, and how are the fields named for reporting?
Application Execution Context API
This leads to identifying and formalizing the API between the web application and web analytics, and how the web analytics gets access to the application execution context (AEC) on the server or client. With standard HTML the AEC is the Document Object Model (DOM) and the HTML event model, but in most cases this is not sufficient to collect all the data required, so there must be a mechanism to bring the relevant business data into the web analytic stream. Traditionally this is done by including static analytic tags on each page that the web analytic instrumentation collects and sends to the web analytic provider (WAP).
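One common pattern for such static tags, sketched here with hypothetical names (this is not a standard API), is to have each page publish its piece of the AEC as a plain data object that the instrumentation script then maps onto tag variables:

```javascript
// Hypothetical static tag block published by the page: the page
// exposes its application execution context as a plain object.
var pageContext = {
  pageName: "Task:Registration:Start",
  taskName: "Registration",
  visitorType: "Prospect"
};

// The instrumentation maps the context onto tag variables, with
// explicit fallbacks so nothing ends up reported as None.
function mapContextToTags(context) {
  return {
    pageName: context.pageName,
    taskName: context.taskName || "Unknown",
    visitorSegment: context.visitorType || "Currently Not Categorized"
  };
}
```

The design choice here is that the developer only supplies facts about the page; the analytic implementation owns the mapping to variables, keeping the tag plan in one place.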
Dynamic Instrumentation and Data Collection
However, more and more it is necessary to dynamically collect and process the data on the client, and at times to maintain state between page transitions in the web analytic instrumentation. How this is addressed is a much larger topic beyond the scope of this post. It is in this environment of static tags, however, that we often must consider our tag plans.
When considering the number and configuration of variables and properties that make up a tag plan for a web analytic capability, one has to consider how these will all be brought together in the various reports to the customer. The application or web developer is usually willing to provide fields of data that describe a page or context for an action, but it is the web analytic implementation that must stitch these various fields into meaningful tag variable assignments.
Now the Trouble Begins
For example, in the task web analytic capability defined above, each task must clearly be identified by a task name that can be attributed to Task Start and Task Completion events. For SiteCatalyst this means an eVar that can be attributed to Start and Completion success events assigned to the appropriate pages – the start form and the confirmation page. Now we have different segments that we need to attribute to these same events but that must also relate to the task name. Of course each task name will have its own list of scenario segments as well as task completion outcomes.
Now the vagaries of SiteCatalyst come into play. I can give the Task Name eVar full relations and then define eVars for each of the segments (or stitch everything together in Discover, where all eVars have full relations). Then from a Task Name report I can break down the results by any of the segments (eVars) I have defined. With this approach I quickly run out of eVars. Even if I am ‘strategic’ in my eVar assignments such that multiple tasks share the same segment, the eVar will typically cover only a portion of the visitors, introducing fragmentation and None as an option for those segments.
A Way Out of Trouble
Fortunately SiteCatalyst provides an alternative solution – classification. This is a great feature that exploits the Fifth Axiom of Web Analytics: “Data that is different is counted as different.” In this case, the Task Name eVar includes not only the Task Name but everything we know about the user at the start of the task as well as at the completion of the task!
Using Omniture’s SAINT, these values become keys that can be classified on the back end to represent all of the segments identified in my solution design. So instead of dozens of eVars I have one, and I now have reports that drill down from the task name. With additional discipline, all the tasks can be reported and broken down in the same report!
The trick here is to declare the eVar with each event (they are, after all, “event variables”, meaning variables specific to the event). Classification then ties together the segments across both events. For example, task_name | user_scenarios is declared at Task Start and task_name | user_scenarios | task_outcomes is declared at Task Completion. Through classification, the task_name and user_scenarios known at the start can be attributed to Task Completion along with the task_outcomes.
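In instrumentation terms the two declarations might be built like this. This is a sketch: the pipe separator and field names follow the example above, and the success event numbers in the comments are hypothetical, not a fixed SiteCatalyst convention.

```javascript
// Build the piled-on eVar value declared with each success event.
function taskStartKey(taskName, userScenario) {
  return [taskName, userScenario].join("|");
}
function taskCompletionKey(taskName, userScenario, taskOutcome) {
  return [taskName, userScenario, taskOutcome].join("|");
}

// At Task Start (hypothetical event numbers and eVar slot):
//   s.events = "event1";
//   s.eVar10 = taskStartKey("Registration", "First Visit");
// At Task Completion:
//   s.events = "event2";
//   s.eVar10 = taskCompletionKey("Registration", "First Visit", "Completed");
```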
Now Pile On
The implementation approach is straightforward, using what I call Piling On. All the parameter values that one would normally assign to different variables in a static tag (or the AEC) are concatenated together (with proper separators) into one value assigned to an eVar or sprop. The point is to make the key differentiate all the possible variations in segments you need.
Once you have defined the classification structure in SiteCatalyst (Admin > Conversion > Classification), you can retrieve the template and key values from SAINT in a tab delimited file; separate them into your desired segments via Excel (using those clever separators you inserted at collection); and then upload back to SAINT. SAINT then classifies all of the data collected retroactively (meaning all the data, all the way back to the beginning), so you can change or modify segments through all report periods.
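The splitting step can just as easily be scripted instead of done in Excel. A minimal sketch follows, with column names taken from the task example above and ‘Unclassified’ as an illustrative fallback bucket for malformed or short keys:

```javascript
// Split an exported pile-on key into classification columns.
// Missing positions fall back to an explicit "Unclassified"
// bucket rather than being left empty.
function classifyKey(key) {
  var parts = key.split("|");
  return {
    taskName: parts[0] || "Unclassified",
    userScenario: parts[1] || "Unclassified",
    taskOutcome: parts[2] || "Unclassified"
  };
}
```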
The result is that every visitor and visit gets assigned to the proper segments, the segments are correctly correlated, and None is Not an Option!!! In fact, None in this case means keys (eVar values) that have not been classified. So occasionally you need to retrieve the unclassified keys from SAINT (using the Null field option in the Export), rinse and repeat.
“None” is a telling symptom of a problem, and “None is Not an Option” encapsulates an approach to diagnosing and resolving it. When reports are designed such that None is Not an Option and “None” still appears in a report, it becomes an important tool for diagnosing the problem. The resulting reports not only cover all user behaviors but, with discipline, address multiple questions and, with extra diligence, support multiple actions.
In the case of Omniture’s SiteCatalyst, the reports are available to a wider audience than Discover can reach. Though most serious analysts work exclusively within Discover to avoid the rather arbitrary limits of SiteCatalyst, providing strong reports in SiteCatalyst makes the work performed in Discover even more powerful.
First, fragmented segments that were likely loosely coupled are now tightly correlated in both environments, allowing drill-down in SiteCatalyst that can be further broken down in Discover. Second, insights discovered in Discover can be more directly related to SiteCatalyst reports, allowing business owners to more quickly grasp and exploit the insight. Finally, the data mining is more effective once the data has been minded, because the data already carries the facts (keys) rather than requiring them to be stitched together from potentially disparate fields. Hence, Mind Before You Mine.