Web Analytics

Digital Marketing Tutorial

Executive Summary

The phrase “Web Analytics” can refer to a range of topics, from putting a simple page counter on a few key pages of your Web site to using sophisticated Web server log analysis software to study how visitors navigate your site.

There are basically two approaches to collecting on-site Web analytics data. The first, “page tagging”, uses a small bit of JavaScript code placed on each Web page to notify a third-party server when a page has been viewed by a Web browser. An example of the use of JavaScript-based page tagging is the Google Analytics tool we’ll discuss in some detail below.

The second and more traditional approach to Web analytics is “log file analysis”, in which the log files that Web servers use to record all server transactions are analyzed to measure Web site traffic. This topic is also discussed in some detail below in the “Log File Analysis” section.

Overview

The following section was taken from the Wikipedia “Web Analytics” entry:

On-site web analytics technologies

Many different vendors provide on-site web analytics software and services. There are two main technological approaches to collecting the data. The first method, log file analysis, reads the log files in which the web server records all its transactions. The second method, page tagging, uses JavaScript on each page to notify a third-party server when a page is rendered by a web browser. Both collect data that can be processed to produce web traffic reports.

In addition, other data sources may be added to augment the data: for example, e-mail response rates, direct mail campaign data, sales and lead information, user performance data such as click heat mapping, or other custom metrics as needed.

Web server log file analysis

Web servers record some of their transactions in a log file. It was soon realised that these log files could be read by a program to provide data on the popularity of the website. Thus arose web log analysis software.

In the early 1990s, web site statistics consisted primarily of counting the number of client requests (or hits) made to the web server. This was a reasonable method initially, since each web site often consisted of a single HTML file. However, with the introduction of images in HTML, and web sites that spanned multiple HTML files, this count became less useful. The first true commercial Log Analyzer was released by IPRO in 1994.
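The distinction between raw hits and page views can be made concrete with a short sketch. The following Python snippet (assuming Apache-style Common Log Format lines; real log layouts vary) counts both for a tiny log sample:

```python
import re

# Minimal Common Log Format parser (assumed log layout; real servers vary).
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')

def count_hits_and_page_views(lines):
    """Count raw hits vs. page views (page requests only) in a log sample."""
    hits = page_views = 0
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue
        hits += 1
        path = m.group(4)
        # A "page view" counts only requests for pages, not images or assets.
        if path.rsplit(".", 1)[-1] not in ("gif", "jpg", "png", "css", "js"):
            page_views += 1
    return hits, page_views

sample = [
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '1.2.3.4 - - [10/Oct/2000:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512',
]
print(count_hits_and_page_views(sample))  # (2, 1)
```

Two raw hits, but only one page view: the image request is an artifact of the page, not a separate act of reading.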

Two units of measure were introduced in the mid-1990s to gauge more accurately the amount of human activity on web servers: page views and visits (or sessions). A page view was defined as a request made to the web server for a page, as opposed to a graphic, while a visit was defined as a sequence of requests from a uniquely identified client that expired after a certain amount of inactivity, usually 30 minutes. Page views and visits are still commonly displayed metrics, but are now considered rather unsophisticated measurements.
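The visit definition above can be sketched in a few lines of Python; the 30-minute timeout and the (client, timestamp) input shape are the only assumptions:

```python
from datetime import datetime, timedelta

# A visit ("session") is a run of requests from one client with no gap
# longer than 30 minutes -- the conventional inactivity timeout.
SESSION_TIMEOUT = timedelta(minutes=30)

def count_visits(requests):
    """requests: list of (client_id, datetime) tuples (hypothetical shape)."""
    last_seen = {}   # client_id -> time of that client's previous request
    visits = 0
    for client, when in sorted(requests, key=lambda r: r[1]):
        prev = last_seen.get(client)
        if prev is None or when - prev > SESSION_TIMEOUT:
            visits += 1  # first request, or gap too long: a new visit begins
        last_seen[client] = when
    return visits

t0 = datetime(2009, 6, 1, 12, 0)
reqs = [("a", t0), ("a", t0 + timedelta(minutes=5)),   # same visit
        ("a", t0 + timedelta(minutes=50)),             # >30 min gap: new visit
        ("b", t0 + timedelta(minutes=10))]             # different client
print(count_visits(reqs))  # 3
```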

The emergence of search engine spiders and robots in the late 1990s, along with web proxies and dynamically assigned IP addresses for large companies and ISPs, made it more difficult to identify unique human visitors to a website. Log analyzers responded by tracking visits by cookies, and by ignoring requests from known spiders.
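Ignoring known spiders usually amounts to a User-Agent check against a list of robot signatures. A minimal Python sketch, with an illustrative (not exhaustive) signature list:

```python
# Drop requests whose User-Agent matches a known robot signature before
# counting human activity. This signature list is illustrative only.
KNOWN_BOTS = ("googlebot", "bingbot", "slurp", "crawler", "spider")

def is_human(user_agent):
    """Return True if the User-Agent string shows no known bot signature."""
    ua = user_agent.lower()
    return not any(bot in ua for bot in KNOWN_BOTS)

agents = [
    "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
]
print([is_human(ua) for ua in agents])  # [True, False]
```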

The extensive use of web caches also presented a problem for log file analysis. If a person revisits a page, the second request will often be retrieved from the browser’s cache, and so no request will be received by the web server. This means that the person’s path through the site is lost. Caching can be defeated by configuring the web server, but this can result in degraded performance for the visitor to the website.

Page tagging

Concerns about the accuracy of log file analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method, page tagging or ‘Web bugs’.

In the mid-1990s, Web counters were commonly seen — these were images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s this concept evolved to include a small invisible image instead of a visible one, and, by using JavaScript, to pass along with the image request certain information about the page and the visitor. This information can then be processed remotely by a web analytics company, and extensive statistics generated.
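The information the JavaScript passes along rides on the invisible image’s query string. The Python sketch below shows what such a beacon URL looks like; the endpoint and parameter names are hypothetical, not any vendor’s actual API:

```python
from urllib.parse import urlencode

# What a page tag effectively does: request a tiny invisible image whose URL
# carries page and visitor details as query parameters. The host and the
# parameter names here are hypothetical, not any real vendor's API.
def beacon_url(page_title, page_url, screen_size, referrer):
    params = urlencode({
        "t": page_title,   # page title
        "u": page_url,     # page URL
        "s": screen_size,  # visitor's screen size
        "r": referrer,     # referring page, if any
    })
    return "https://stats.example.com/pixel.gif?" + params

print(beacon_url("Home", "http://example.com/", "1280x800", ""))
```

The analytics server never needs the image to mean anything; it simply logs each request for `pixel.gif` along with its query parameters and the visitor’s cookie.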

The web analytics service also manages the process of assigning a cookie to the user, which can uniquely identify them during their visit and in subsequent visits.

With the increasing popularity of Ajax-based solutions, an alternative to the invisible image is to implement a callback to the server from the rendered page. In this case, when the page is rendered in the web browser, a piece of Ajax code calls back to the server and passes information about the client, which can then be aggregated by a web analytics company. This approach is limited by browser same-origin restrictions on which servers can be contacted with XmlHttpRequest objects.

Log file analysis vs page tagging

Both log file analysis programs and page tagging solutions are readily available to companies that wish to perform web analytics. In some cases, the same web analytics company will offer both approaches. The question then arises of which method a company should choose. There are advantages and disadvantages to each approach.

Advantages of log file analysis

The main advantages of log file analysis over page tagging are as follows:

  • The web server normally already produces log files, so the raw data is already available. To collect data via page tagging requires changes to the website.
  • The web server reliably records every transaction it makes. Page tagging relies on the visitors’ browsers co-operating, which a certain proportion may not do (for example, if JavaScript is disabled, or a hosts file prohibits requests to certain servers).
  • The data is on the company’s own servers, and is in a standard, rather than a proprietary, format. This makes it easy for a company to switch programs later, use several different programs, and analyze historical data with a new program. Page tagging solutions involve vendor lock-in.
  • Logfiles contain information on visits from search engine spiders. Although these should not be reported as part of human activity, this is useful information for search engine optimization.

Advantages of page tagging

The main advantages of page tagging over log file analysis are as follows:

  • The JavaScript is automatically run every time the page is loaded. Thus there are fewer worries about caching.
  • It is easier to add additional information to the JavaScript, which can then be collected by the remote server. For example, information about the visitors’ screen sizes, or the price of the goods they purchased, can be added in this way. With logfile analysis, information not normally collected by the web server can only be recorded by modifying the URL.
  • Page tagging can report on events which do not involve a request to the web server, such as interactions within Flash movies, partial form completion, mouse events such as onClick, onMouseOver, onFocus, onBlur etc.
  • The page tagging service manages the process of assigning cookies to visitors; with log file analysis, the server has to be configured to do this.
  • Page tagging is available to companies who do not have access to their own web servers.

============================

Page (JavaScript) Tagging / Google Analytics

YouTube Video: “Authors@Google – Avinash Kaushik”. Avinash Kaushik is the Google Analytics Evangelist and author of “Web Analytics 2.0”. This is an excellent and painless (although a bit dated) overview of Web Analytics by one of the true innovators in this field. The video is 55 minutes long, so I suggest viewing and absorbing it in 10-15 minute segments.

============================

Page Tagging

A great way to gain an understanding of page tagging – the technology as well as how it’s implemented – is to take a very close look at Google (Web) Analytics.

In order to do this, I’m going to assume that you have a Web site (personal or business) and that you know how to access and modify its pages. If you do not have these skills or this access, the person or company who manages the site for you can easily follow these directions.

The quickest way to get up to speed on Google Analytics is to take the tour of their free service. Click on the following link (make sure your sound is turned on) to begin the tour:

Google Analytics Overview

Now let’s go to the Google (Web) Analytics Support Center and follow their instructions for creating, installing and using their Analytics tools:

Google Analytics Support Center (www.google.com/support/analytics/)

============================

Log File Analysis

What I describe here is how I used to use the Unica NetTracker “On Demand” Web server log analysis service (Unica is now part of IBM) to monitor key Web site traffic statistics on a monthly basis, as well as several other page-specific parameters I monitored using this service. (We switched to Google Analytics on this site in mid-2009.)

You can get a comprehensive, linked list of commercially available Web server log analysis software packages on the following Open Directory (dmoz.org) page:

Computers: Software: Internet: Site Management: Log Analysis: Commercial

I oversee the marketing activity on a fairly large commercial Web site. For a fee, the Web hosting company for this site used to provide Web server log analysis using NetTracker. Here’s a list of the statistics I kept track of on a month-to-month basis:

Monthly Statistics:

  • Number of pages viewed
  • Number of estimated visits
  • Number of unique visitors

Weekly, Daily Statistics (Monthly Averages):

  • Number of pages viewed per day
  • Number of pages viewed per visit
  • Length of visit (minutes)
  • Number of visits per day
  • Number of visits per week
  • Ave. # unique visitors per day
  • Ave. # new visitors per day
  • Ave. # repeat visitors per day
  • Ave. visitor repeat rate

All of these statistics are generated by the NetTracker “On Demand” Service as it analyzes the Web site server logs (“Web logs”) for this site.

Quite simply, what I am looking for in this set of statistics on a month-to-month basis is growth – growth in the number of pages viewed, in the total number of visitors, in the number of unique visitors, in the number of new visitors, in the average amount of time spent on the site, and so on. (Note: much of this measuring is made possible by placing “tags” or “cookies” on your Web site visitors’ computers when they first visit the site, so these stats can be skewed if visitors regularly delete those cookies. Because deleting cookies is not yet a common practice for most computer owners, the month-to-month comparisons I use are valid for the type of growth patterns I’m looking for.)

Another important use of this Web server log analysis software/service is the ability to observe Web site traffic on a page-by-page basis. Let me give you a recent example.

On this particular Web site that we used to monitor with NetTracker, the most important visitor activity we encourage is requesting more detailed information on the highlighted products. Visitors who want to do this – as you would expect most would – are asked to complete a specific Web page-based form. Upon completion, they click the “Submit” button, the form is sent by email to our email server, and a “Thank You” page is displayed so the visitor knows the form was successfully submitted.

Of course, we are never satisfied with the total number of Web site visitors who complete this process, so we used NetTracker analysis results to compare the number of visits to the “Request More Info” Web page with the number of “visits” to the “Thank You for Requesting More Info” page. What we saw was the typical “shopping cart abandonment” pattern you have probably read about with online e-commerce sites. In other words, the Web site content was compelling enough to get a fair number of visitors to go to the “Request More Info” form, but based on the number of “Thank You” page displays, we could see that many visitors “abandoned” the “Request More Info” exercise before completing it.
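The comparison we made reduces to a one-line calculation: the fraction of form viewers who never reached the confirmation page. A Python sketch with illustrative counts:

```python
# Funnel comparison: visits to the request form vs. visits to the thank-you
# page. The page names and counts here are illustrative, not real data.
def abandonment_rate(form_views, thank_you_views):
    """Fraction of form viewers who never reached the confirmation page."""
    if form_views == 0:
        return 0.0
    return 1 - thank_you_views / form_views

print(abandonment_rate(400, 100))  # 0.75
```

With these illustrative numbers, three out of four visitors who opened the form abandoned it before submitting.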

Based on this analysis, we guessed that one of the main causes of this visitor pattern was that too many of the fields on the Info Request Form had to be filled in before the form was considered complete and could be submitted. We cut the number of required fields by more than half – the form still requested the same information, but only certain fields were mandatory. Since making that change, the number of completed forms submitted has gone up significantly.

Another important use of this software/service is helping you understand where your visitors are coming from and how they got to your site. In particular, you can not only see which search engines are sending you the most traffic, but you can also see what search phrases were used by those visitors to find your site.

Finally, because this NetTracker “On Demand” Web server log analysis service was “hosted” by the same company that hosts the Web site, I could access NetTracker statistics from any computer that has Internet/Web access.

Digital Marketing Tutorial Table of Contents Page