Real Data Is Always the Best When It Comes to Analysis and Benchmarking

I’m a big fan of Google Analytics. Nay, I’m a huge fan.

It’s become a key tool for measuring all sorts of data for my university’s website. They weren’t tracking much with it before I got here, save general traffic and trends. Now just about everything we do has a campaign attached to it, as well as keywords and a medium. It really has allowed us to see, especially with our online advertising efforts, what’s working and what’s not.
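For anyone who hasn’t tagged links this way, here is a rough sketch of what attaching a campaign, medium and keyword to a URL can look like. It uses the standard `utm_source`, `utm_medium`, `utm_campaign` and `utm_term` query parameters that Google Analytics reads. The `tag_url` helper, the example.edu address and the campaign values are all made up for illustration; this is not our actual setup.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_url(url, source, medium, campaign, term=None):
    """Append Google Analytics campaign parameters to a URL.

    Hypothetical helper: the name and argument order are assumptions.
    Any query string already on the URL is preserved.
    """
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),      # where the link lives, e.g. a newsletter
        ("utm_medium", medium),      # the channel, e.g. email or cpc
        ("utm_campaign", campaign),  # the campaign name
    ]
    if term:
        query.append(("utm_term", term))  # keyword, mostly for paid search
    return urlunparse(parts._replace(query=urlencode(query)))


# Made-up example values
tagged = tag_url(
    "https://www.example.edu/admissions",
    source="newsletter",
    medium="email",
    campaign="fall-open-house",
    term="campus visit",
)
print(tagged)
```

Every visit that arrives through a link tagged like this shows up in the campaign reports under those values, which is what makes it possible to compare one ad or email against another.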

This data is shaping our goals and processes as we redesign and roll out a CMS. Real data, such as what you get from Google Analytics or other tools such as Webalizer that parse your actual server logs, is always the best tool to use.

If you use other sites like Alexa or Compete as your traffic monitoring tools, you’re getting a decent idea of traffic, but it’s not real data. Why? Because those sites use a small sample of the total web, not all the visits your site gets.

Alexa, for example, gets its data from users who have installed its toolbar. I haven’t installed that toolbar. Have you? It follows that Alexa uses a relatively small sample to make its best guesses as to how the rest of the web surfs. Same goes for Compete.com. A small, self-selected sample doesn’t yield accurate data.

Here’s why this data is inherently flawed.

This past Friday, I went to Alexa and got the traffic rankings for this website as well as my university. First, my blog:

Alexa gives this site a rank of 107,164 and says it’s the 54,714th most visited site in the US. Not bad, eh?

Now let’s look at the stats for John Carroll University. Alexa gives it a rank of 220,515 and says it’s the 59,545th most visited site on the web.

Wait a minute. That’s not right. This site absolutely does not get more visitors and page views than John Carroll University. I know because the real data proves it. JCU gets twenty to thirty times the traffic this blog does.
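How can a ranking get this so backwards? One plausible explanation is that the people in the panel don’t browse like everyone else. Here is a toy sketch of that effect. Every number in it is invented for illustration: I don’t know Alexa’s actual panel size or method beyond the toolbar, and the `panel_share` values simply assume that toolbar users are far more common among a web blog’s readers than among a university’s visitors.

```python
# Hypothetical numbers throughout; none of these come from Alexa or real logs.
# True visits per period: the university gets 25x the blog's traffic.
true_visits = {"blog": 1_000, "university": 25_000}

# Assumed fraction of each site's visits made by people in the panel.
# Web professionals (likely toolbar users) are over-represented on the blog.
panel_share = {"blog": 0.05, "university": 0.001}


def panel_counts(visits, share):
    """Visits the panel actually observes for each site."""
    return {site: round(visits[site] * share[site]) for site in visits}


def ranking(counts):
    """Sites ordered from most to least visited."""
    return sorted(counts, key=counts.get, reverse=True)


observed = panel_counts(true_visits, panel_share)

print("Real ranking: ", ranking(true_visits))
print("Panel ranking:", ranking(observed))
print("Panel counts: ", observed)
```

With these made-up figures the panel sees 50 blog visits and only 25 university visits, so it ranks the blog first even though the university has 25 times the real traffic. The sample isn’t just small, it’s skewed.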

Compete gets it a little better, but it’s still a guesstimate, since it’s not measuring real traffic.

One other site that people use is Quantcast. It too makes its own best guess, but with a twist. If you want, you can sign up with Quantcast and install a little JavaScript tracking bug on your site, and Quantcast will measure real traffic data for it. Here’s the Quantcast report for this site, with real data.

The point is, if you want to benchmark your site against other sites, use real data. I know that we don’t always have access to web stats and data from, say, our respective comparison group of schools, but that’s changing. The first place this is changing is in Google Analytics.

From inside Google Analytics, you can do some benchmarking of your site’s traffic against other sites of similar size. Google describes it as:

Based on the number of visits each site receives, sites of similar sizes are grouped together under three classifications: small, medium, and large. This way, you can compare yourself to other similarly sized sites. You are not able to view benchmarking data for sites in other size classifications.

This type of benchmarking is somewhat helpful, but there’s no telling what type of sites you’re being compared with. They could be blogs, corporate sites or e-commerce stores. What we need is to be able to drill down.

If you open the category list on the benchmarking page, you can drill down a few levels to select education. To see the “colleges and universities” section, choose Law and Government, then Education and then Colleges and Universities. Once you do that, you will see benchmarking data that’s quite a bit more relevant. You can now benchmark against college sites of a similar size. You don’t know who they are, or if they’re in your comparison group, but it’s better than using Alexa’s or Compete’s best guess.

The other big thing I’m looking forward to is Karine Joly’s Higher Ed Analytics project, which she’ll be kicking off at the EduWeb conference this week. She’ll be collecting key metrics from higher ed institutions and sharing that data. That means we’ll be able to benchmark against real data from other schools, not best guesses. I sat in on a talk that Karine gave about this a few weeks ago at the EduComm conference, and I’m really excited to see what comes of it.