Visitors vs. Visits

One would think that Visitors and the Visits they make would go hand in hand, but as implemented in many web analytics solutions that is not always true. With visitor metrics we are trying to understand how many people are coming to our site and who they are. With visits we are trying to understand patterns in the visitor’s activity to answer when and how many times. Visits, also referred to as Sessions, have their place in web analytics but not as prominently as one might think. When sessions become the central focus of analytics and reporting, they distort how visitor behavior is viewed and understood. Caution: This is a no spin zone! Let us take a new and honest look at this fundamental concept in web analytics.

Motivation

Sessions arise from a natural desire to understand how visitors are engaged with a web site. It goes beyond counting events to attempting to answer how long a visitor stays and interacts with the web content. In the real world, observing real visitors in a lab for example, we would start our clocks when the visitor came to the web site, then note how long they stayed on each page and stop the clock when the visitor left the site, moved to another task, or got up to get a cup of coffee. In the reality of web analytics, very little of this is directly observable. In particular, we miss the moment when the visitor’s attention is diverted elsewhere, which is what actually ends the session.

To handle the end of session problem we use a technique originally used in processing web logs. We look for a gap of 30 minutes of inactivity and then declare the last page viewed before the gap as the session exit page. The Web Analytics Association (WAA) Standards define session duration as measured from the time of the first page of the session (called the session entry page) to the time of the last page viewed by a visitor during the session. This metric assumes contiguous activity by the visitor: the time spent on each page is the time between one page start and the next, and the last page (the session exit page) has no duration.
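The timeout rule and the duration definition above can be sketched in a few lines of Python. This is a minimal sketch, not any tool's actual implementation: the function names, the `(timestamp, url)` tuple layout, and the choice that a gap of more than 30 minutes (rather than exactly 30) starts a new session are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # inactivity timeout described in the text


def sessionize(page_views, gap=SESSION_GAP):
    """Split one visitor's (timestamp, url) page views into sessions.

    Assumption: a gap strictly greater than `gap` starts a new session.
    """
    sessions = []
    for ts, url in sorted(page_views):
        if sessions and ts - sessions[-1][-1][0] <= gap:
            sessions[-1].append((ts, url))   # still within the current session
        else:
            sessions.append([(ts, url)])     # first view, or the timeout elapsed
    return sessions


def session_duration(session):
    """Entry page start to exit page start; the exit page contributes zero."""
    return session[-1][0] - session[0][0]


def session_path_length(session):
    """Number of page views within the session."""
    return len(session)
```

Note that a session with a single page view comes out with a duration of zero, which is exactly the limitation discussed in the bounce section.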

Another related measure is the session path length, which is the number of page views within a session. I also like to count the number of unique page or content loads during a session. These are detected when the hash-code of the Referer-URL and Request-URL pair changes. I call this the content path length since it can apply to the main topic, the marketing message, or the ad creatives served on a page. It gives an indication of the actual content that a visitor views vs the number of times the visitor views the same content within a session. The ratio of the session path length to the content path length indicates the potential churn in users navigating the web site (how often the visitor has to return to previously loaded content).
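As a rough illustration of the two path lengths, the sketch below hashes each Referer-URL and Request-URL pair and counts the distinct hashes. Counting distinct pairs is one reading of "unique content loads"; the function names and the use of SHA-1 are assumptions, not something the text prescribes.

```python
import hashlib


def pair_hash(referer_url, request_url):
    """Hash-code for one Referer-URL / Request-URL pair (SHA-1 is an arbitrary choice)."""
    return hashlib.sha1(f"{referer_url}\n{request_url}".encode("utf-8")).hexdigest()


def path_lengths(session):
    """session: list of (referer_url, request_url) tuples in view order.

    Returns (session path length, content path length).
    """
    session_length = len(session)
    content_length = len({pair_hash(ref, req) for ref, req in session})
    return session_length, content_length


def churn_ratio(session):
    """Session path length divided by content path length; 1.0 means no repeats."""
    session_length, content_length = path_lengths(session)
    return session_length / content_length if content_length else 0.0
```

A ratio well above 1.0 would suggest the visitor keeps reloading content already seen in the session.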

Session length and duration are useful standard measures when comparing engagement among web sites that may use different analytic tools, since even the simplest tool can make these measurements. The average number of visits per visitor within a reporting period is a key metric for gauging loyalty. Likewise, plotting visit returns against time of day and day of week may expose habits in visitor behavior that should be understood and used to enhance user experience. These are all valuable metrics associated with visit patterns. However, as a source of insight into the visitor’s true engagement there are a number of shortcomings, and it is useful to have these in mind before drawing conclusions and deciding on actions.

Bounce and Single Page Sessions

The first problem is that the last page of a session is assigned zero duration for the view. As a result a session that has one page view (session-length = 1) has a session duration of zero. The standard definition of a bounce, a visitor coming to a website and immediately leaving the site, is a single page session of zero duration. When the WAA Standards Committee was attempting to define Bounce and Bounce Rate, they tried to find consensus among all of the web analytic tools rather than define a standard that none of the tools had implemented. There were no tools that measured the page view duration of a single page session. Hence Bounce is equivalent to Single Page Session counts in the standard. The single page view, which is both a session entry and exit page, is attributed the bounce. The ratio of the page’s bounce count to its number of session entry page counts is the bounce rate for the page.
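The bounce rate definition above reduces to two counters per page. A minimal sketch, assuming each session is simply a list of page URLs in view order (the function name and data layout are illustrative):

```python
from collections import Counter


def bounce_rates(sessions):
    """sessions: list of sessions, each a list of page URLs in view order.

    Returns {page: bounce count / session entry count} for every entry page.
    """
    entries = Counter()
    bounces = Counter()
    for pages in sessions:
        if not pages:
            continue
        entries[pages[0]] += 1          # page served as a session entry page
        if len(pages) == 1:
            bounces[pages[0]] += 1      # single page session counts as a bounce
    return {page: bounces[page] / count for page, count in entries.items()}
```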

Bounce rates are markers, like when in my last checkup my High sensitivity CRP was twice what it should be and the report says I am at high risk. High risk of what? The doctor goes through a list of things that can elevate the measure and then says it is also a marker for risk of heart disease. Great! Is there another measure that can better quantify the risk? Bounce as it is defined is just a marker for a potential problem. Is there another measure that can better quantify the risk? Let’s see.

Like the doctor above, we need to list all of the situations (behaviors) that can give rise to what we are observing. Some situations may be good behaviors and others bad behaviors we want to change. For example, someone coming to a site from a search, getting the information that they needed (an indication of search relevance) and leaving would be a good thing. Another user coming to the site, seeing that this was not what she wanted and pressing the back button before the page even completes loading would be a bad thing. Two things that would be good to know in these cases are how they came to the site and how long they stayed. Also, for those that stayed, were they happy with what they found (satisfaction)? Bounce is more a direct measure of relevance than satisfaction. Relevance is meeting the expectations the visitor had when he or she clicked the link, by maintaining the contextual promise implicit in the referring link. Satisfaction is a measure of whether the visitor, once considering the content, eventually found what they were seeking. Let us deal with the additional factors that distinguish our two cases one at a time. As you build out all of the various possibilities you may uncover other cases you want to distinguish, such as long load times or click-throughs to other sites.

Finding where the bounces came from is easy regardless of the analytic tool. For the pages with high bounce rates, segment the bounce counts by referrer domain or type of channel (search, email, internal, etc.) to see if one of the domains or channels pops out. Here you might see that a disproportionate share of bounces comes from specific referrals, indicating a potential problem of content relevance for that specific channel. However this would not be the whole story. As in the search channel case, we would expect single page sessions. So how long was the page viewed?
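Segmenting bounces by referrer domain can be sketched as follows. The input layout of `(landing_page, referrer_url)` pairs and the `"(direct)"` label for an empty referrer are assumptions for illustration; a real tool would supply its own export format.

```python
from collections import Counter
from urllib.parse import urlparse


def bounces_by_referrer(bounced_views, page):
    """Count bounces on `page` by referrer domain.

    bounced_views: iterable of (landing_page, referrer_url) pairs, one per
    single page session. An empty referrer is labeled "(direct)" here.
    """
    counts = Counter()
    for landing_page, referrer_url in bounced_views:
        if landing_page == page:
            domain = urlparse(referrer_url or "").netloc or "(direct)"
            counts[domain] += 1
    return counts
```

Calling `most_common()` on the result lists the domains in descending order of bounce count, which makes a dominant channel easy to spot.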

To measure page duration, one needs to instrument the client browser to report events. There are three events that are important: page load start (when the browser begins to download the page HTML header), page load end (the onLoad() event handler on the Body tag indicating that all content has been loaded and rendered), and page unload (the onUnload() event handler on the Body tag indicating that the currently viewed page is about to be replaced by another page view). All these events can be observed and reported by an instrumentation script.

Bounce or Opportunity

If you instrument to capture and report the onLoad() event, then page views that don’t have this event most likely were interrupted, were incompletely rendered, or are a true bounce back before the page could complete the load. To further differentiate these cases, it is helpful to also capture the interval in the browser between page load start and load end. Both these events can be captured and reported efficiently and reliably, so this forms a method for differentiating zero session duration bounces from single page sessions with greater than zero duration while at the same time determining whether there are load problems for the page. As a point of interest, using the page load event is how Yahoo! qualifies for the IAB client side impression counts standard, where ad impressions are counted only for completely rendered page views on actual browser user agents. This involves billions of page views per day with imperceptible impact on user experience, so there should be little concern from IT. That is not to say there won’t be, but that’s another topic.
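On the reporting side, the distinction drawn above can be sketched as a small classifier over the two load events. The record type, the labels, and the 5 second slow-load threshold are all hypothetical; they only illustrate the idea that a missing load-end event marks a likely bounce or interruption, while a present one yields a measurable load interval.

```python
from dataclasses import dataclass
from typing import Optional

SLOW_LOAD_SECONDS = 5.0  # hypothetical threshold, not taken from the text


@dataclass
class SinglePageView:
    load_start: float                 # seconds; page load start event
    load_end: Optional[float] = None  # seconds; onLoad event, None if never reported


def classify(view, slow_after=SLOW_LOAD_SECONDS):
    """Return (label, load_seconds) for a single page session."""
    if view.load_end is None:
        # No onLoad report: interrupted, incompletely rendered, or a true bounce back.
        return ("bounce_or_interrupted", None)
    load_seconds = view.load_end - view.load_start
    label = "slow_load" if load_seconds > slow_after else "loaded"
    return (label, load_seconds)
```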

One can also capture and report the page unload event, however this measure is reported less reliably because the browser’s shutdown of the page at times prevents the event from being sent. Though it may be worth collecting this data for any number of reasons, in this case an alternative would be to address visitor satisfaction. There are a number of ways of gauging satisfaction. One way is to simply ask whether the information was helpful, or give visitors an opportunity to rank the information. Another is to add a call to action that is appropriate for the content to encourage them to make it a two page session. For search channels, perhaps provide alternative choices that the visitor can select to further focus their search, leading them to specific information and providing better insight into the intention of the visitor. In other words, if the visitor has allowed the page to complete loading then the single page session is an opportunity rather than a bounce. Be careful that visitors don’t come to believe you are gaming them, or your bounce rate will go up.

Session exit or continuation?

What happens if the visitor never really left the site but continued on after doing a number of other activities, including viewing other sites? How would this show up in the data stream? At some later time, more than 30 minutes on, there will be a session start page that has the previous session exit page as its referrer. By the external/internal boundary axiom (Axiom 4) this is an internal referrer page URL, and the session is a continuation of a previous session that timed out. To be fair one should now decrement the session exit page count for the referring URL.
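The correction described above can be sketched as follows. This assumes one visitor's sessions are available in time order as lists of `(referrer_url, request_url)` pairs with full URLs, and that "internal" simply means the referrer host matches a given site host; both are simplifying assumptions, and the function names are illustrative.

```python
from collections import Counter
from urllib.parse import urlparse


def is_internal(referrer_url, site_host):
    """True when the referrer lies inside the site boundary (host match only)."""
    return urlparse(referrer_url or "").netloc == site_host


def corrected_exit_counts(sessions, site_host):
    """sessions: one visitor's sessions in time order, each a non-empty list of
    (referrer_url, request_url) pairs. Returns exit counts per request URL
    after undoing exits that were really timed-out continuations.
    """
    exits = Counter()
    for session in sessions:
        first_referrer = session[0][0]
        if is_internal(first_referrer, site_host) and exits[first_referrer] > 0:
            exits[first_referrer] -= 1   # that earlier "exit" was only a timeout
        exits[session[-1][1]] += 1       # last page viewed is the session exit page
    return +exits                        # drop entries whose count fell to zero
```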

Typically this is not done in these measures. If you go to the page entry and exit reports in the session path group, these should be able to stitch together the sessions so that one gets a better view of how visitors actually exit the page and the actual count of page exits.

If you want to drive yourself really crazy, compare the session exit page counts for a page with the actual exit counts in the entry / exit page reports. They should be different because they are measured in two different ways. If not, then you need to look further and determine whether the entry / exit page report only covers session paths, which means that continuation is not considered, or whether the tool has made corrections to the session exit page counts to cover session continuation. If the latter case is true or the counts are different, then remove the session reports and rely on the path reports. Having both sets of reports will lead to confusion, particularly if the counts are different. No one likes seeing the same thing with two different counts.

If the WAP is super conscientious in enforcing session rules and maintaining consistency among the reports, then go look for another solution, for you will not get useful insight into visitor path behavior with that tool. The primary reason is that sessions are an artificial construction that breaks the trail of evidence necessary to follow visitor behavior.

Alternative Kinds of Sessions

A method that groups pages and page views by visitor and time is called a sessionization method. There are many that can and in some cases should be considered. For example, a web site session may track a visitor through all the properties of the site, while a property session could be either the property’s contribution to the web site session or its own path from another property. As the visitor winds in and out of a property during a site session, is it more important to understand the first time they come to the property (session entry page) or every time they come to the property from another property (introduction)?

If your site implements sessions where visitors must re-authenticate themselves after a suspension of activity, should the analytic timeout be 30 minutes or a different interval? In this case perhaps the analytic session should be the same as the site session.

One session that is useful in tracking how content is loaded or cached is the browser session. This is a session cookie that is assigned on the browser and removed when the browser is closed. The motivation here is that one can construct an accurate stack of the content that has been loaded from the web site and cached by the browser. Once the content stack has been established for a visitor then any of the content can be brought up by the visitor in any order through all of the various means of navigation available to the user – tabs, history, navigation, plug-ins, etc. This forms a session path that is more related to the site architecture than the user’s behavior and therefore aggregates better to understand site navigation issues independent of the near infinite ways that visitors can view content. Of course the browser session can become rather long in duration. The longest duration I found was over 500 days traced to a library in Dade County!

Session Entry Pages or Introductions?

One has to ask, is it really important to know the session entry page or is there something else more important that is captured by this measure? One of the reasons why the session entry page is important is that the visitor had to either select the page from their bookmarks, type in the URL, or come to the site from another site. So capturing that the user had to perform an intentional act to come to your site and initiate a session (not absentmindedly navigate through cached pages) is an important observation. This however can only be done using browser sessions, not standard sessions.

What is important to capture is how the visitor came to the site and what did they do while on the site. For many sites where the visitor is expected to come and then leave, the session entry page represents the crossing of the web site boundary. As to where they came from and their intent, that is captured in the request header for the entry page that includes the referring page URL, user-agent, cookies as well as the request URL. This is called an introduction when the referring page URL is external to the requested URL (via Axiom 4). If the visitor remains within the site for the entire session, then the introduction and session entry page are the same. But who comes and navigates through sites like that any more?

Would it not be more important to capture and order every introduction regardless of how it occurred during a session? Typically web analytic solutions do not track introductions out of the box. Referring domain reports are typically session based and linked to the session entry page. But you can declare introductions as events and treat them as conversions within your own custom reports. Then, to answer the question as it pertains to your site, you can count the number of introductions (external to internal boundary crossings) that occur during a session. Take the ratio of introductions to the number of sessions initiated by external referrals (removing session continuations with internal referrals). A ratio equal to 1.0 indicates that Introductions and Session Entry Pages align and your tool should work just fine, but if it is greater than 1.0 then you are literally missing opportunities to target your customers. (If it is less than 1.0, your data is not conforming to the fourth axiom.)
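The ratio described above can be sketched like this. It assumes sessions are lists of `(referrer_url, request_url)` pairs with full URLs, treats any referrer whose host differs from the site host (including an empty referrer from a bookmark or typed URL) as external, and uses illustrative function names throughout.

```python
from urllib.parse import urlparse


def is_external(referrer_url, site_host):
    """Assumption: a referrer is external when its host differs from the site host.
    An empty referrer (bookmark, typed URL) is therefore treated as external too.
    """
    return urlparse(referrer_url or "").netloc != site_host


def introduction_ratio(sessions, site_host):
    """Introductions divided by sessions initiated by an external referral.

    sessions: list of sessions, each a list of (referrer_url, request_url).
    Returns None when no session was initiated by an external referral.
    """
    introductions = 0
    external_sessions = 0
    for session in sessions:
        if not session:
            continue
        if is_external(session[0][0], site_host):
            external_sessions += 1   # continuations with internal referrers are excluded
        introductions += sum(
            1 for referrer_url, _ in session if is_external(referrer_url, site_host)
        )
    return introductions / external_sessions if external_sessions else None
```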

Touch Point Opportunity Funnel

Each introduction is an opportunity to market, inform, and engage the visitor. It is a touch point between the channel and your site or property. The stack of introductions for each visitor from the first time they came to the site to when they converted to a valued customer by purchasing or registering or performing the goal objective of your site represents the touch point path to the goal. If you relate these touch points to the events that define your conversion funnel then you can begin to understand how specific content and channels contribute to visitor’s path to conversion.

An important overall metric to compute is the average number of introductions per conversion. Research indicates that on average a user will perform 7 to 14 searches before committing. You can capture how many of these come to your site by segmenting by channel. If the channel has a length near one, then perhaps you are over-optimizing, limiting the channel to sure bets or closing terms. If a channel seems to convert but the number is larger, then perhaps the channel contributes early in the visitor path and there is an opportunity to widen that part of the funnel to increase conversions at the end of the funnel.
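One possible way to compute this metric is sketched below. It assumes each converting visitor is represented by the ordered list of channels for the introductions that preceded conversion, and it reads "channel length" as the average number of introductions from that channel among the converting visitors who used it. That reading, the data layout, and the function name are assumptions.

```python
from collections import defaultdict


def avg_introductions_to_conversion(visitor_paths):
    """visitor_paths: one list per converting visitor, holding the channel of
    each introduction up to conversion, in order.

    Returns (overall average, {channel: average introductions per visitor
    who used that channel}).
    """
    if not visitor_paths:
        return 0.0, {}
    overall = sum(len(path) for path in visitor_paths) / len(visitor_paths)
    per_channel = defaultdict(list)
    for path in visitor_paths:
        for channel in set(path):
            per_channel[channel].append(path.count(channel))
    averages = {ch: sum(counts) / len(counts) for ch, counts in per_channel.items()}
    return overall, averages
```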

There are a great number of strategies you can pursue when this view of the data is available. This can be implemented in all of the major web analytic solutions but does require defining conversion rules, event to event attribution rules and custom reports. Once this is accomplished you will have alternative session reports where sessions are initiated by introductions and last until the next introduction. Events that occur within such a session are attributed to the introduction, which gives the visitor state at the time of the session start. In this scheme, the introduction becomes the precursor of the events within the session and, in aggregate over many visitors, gives the likelihood of the types of events occurring within the session or, in other words, the conversion rate for the channel or marketing campaign. You will also have for each visitor the stack of introductions leading to conversion, which supports more sophisticated attribution and segmentation analysis.
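The introduction-based sessions described above can be sketched with a simple event stream. The `("introduction", channel)` and `("event", name)` tuple layout, the `"purchase"` goal name, and the function names are hypothetical; real tools express this through their own conversion and attribution rules.

```python
from collections import Counter


def introduction_sessions(stream):
    """Split one visitor's time-ordered stream into sessions that start at each
    introduction and last until the next one.

    stream: list of ("introduction", channel) or ("event", name) tuples.
    Returns a list of (channel, [event names attributed to that introduction]).
    """
    sessions = []
    for kind, value in stream:
        if kind == "introduction":
            sessions.append((value, []))
        elif sessions:                      # events before any introduction are ignored
            sessions[-1][1].append(value)
    return sessions


def conversion_rate_by_channel(streams, goal="purchase"):
    """Share of introduction sessions per channel that contain the goal event."""
    started = Counter()
    converted = Counter()
    for stream in streams:
        for channel, events in introduction_sessions(stream):
            started[channel] += 1
            if goal in events:
                converted[channel] += 1
    return {channel: converted[channel] / count for channel, count in started.items()}
```

Keeping the per-visitor list returned by `introduction_sessions` also preserves the stack of introductions for later attribution analysis.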

Most web analytic solutions today have converted to this view of sessions in support of search marketing, but only recently, and only within their search marketing tools. So if you find that your web analytic tools are focused primarily on session measures and metrics that make it difficult to track all introductions, then see if the latest upgrade corrects the problem; implement your own set of custom session reports; or look to other solutions that can handle reporting and analysis of custom dimensions and attributes.

Again I must say that my views on this may be controversial, or at least contrarian to what is normally presented. I know this in part because of time spent with all the major web analytic providers discussing these issues as they pertained to Yahoo! marketing channels including search and display. Also, in discussions with the Web Analytics Association (WAA) Standards Group we have tried to identify the common ground of web analytic solutions, and have had to spend much time on session based measures and metrics.

However the purpose of this series is to present how behavior is extracted from the data and look behind the measures and metrics that have become standard practice. It is important that any Web Analytic Professional acting as the scout or tracker for marketing or business have an understanding of how to bring out the most reliable information from all the evidence in the data itself regardless of these standard practices. The intent is to build best practices. These unfortunately don’t come right out of the box.
