Good Analytics Starts with Good Data
Analytics have long been an important part of content management, with companies like Omniture and Quantcast building nice businesses helping media sites understand how people are finding and using their sites. In contrast, the analytics focused on the revenue side of the business, particularly display advertising, are surprisingly primitive.
One big reason for this is how much data is thrown away. Take DoubleClick DART for Publishers (DFP), for example. If you are a DFP user, you have access to some very basic information, like yesterday’s delivery count for each ad. But if you want to know which zones of your site a run-of-site ad was delivered on, you’re out of luck: they don’t keep that around. If you want that kind of data, you need to turn on “data transfer” and analyze the logs yourself, which, despite their protestations of “it’s your data”, is an extra-cost service (more on that below). And here’s a little-known fact about DoubleClick’s data transfer: it still doesn’t give you all the data. The DFP ad server strips out any publisher-specific tag attributes that aren’t used to target an ad. So, if you have your site divided into categories, with a key-value in the tag specifying the category, even with the detailed logs you still have no idea which categories a run-of-site ad ran on. They do have a workaround: duplicate your entire tag into a new parameter, “u=”, with a new set of delimiters, and then re-parse it on the other side. In summary: re-tag your entire site to get back “your data” that you are paying them to give you. The other ad servers are not much better.
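To make the “u=” workaround concrete, here is a minimal Python sketch of the round trip: packing the tag’s key-values into one string on the page side, then re-parsing that string out of the logs. The delimiters (“|” between pairs, “-” between key and value), the function names, and the sample values are all hypothetical; the real delimiters depend on what your ad server allows in that parameter.

```python
from collections import Counter

# Hypothetical delimiters: chosen only so they do not clash with the
# ";" and "=" characters the ad tag itself already uses.
PAIR_SEP = "|"
KV_SEP = "-"


def encode_u(key_values):
    """Pack the tag's key-values into a single string for the u= parameter."""
    return PAIR_SEP.join(f"{k}{KV_SEP}{v}" for k, v in key_values.items())


def decode_u(u_value):
    """Re-parse a u= value from the logs back into a dict of key-values."""
    result = {}
    for pair in u_value.split(PAIR_SEP):
        key, sep, value = pair.partition(KV_SEP)
        if sep:  # skip empty or malformed fragments
            result[key] = value
    return result


def impressions_by_key(u_values, key):
    """Count log rows by one key (for example, site category)."""
    return Counter(decode_u(u).get(key, "(none)") for u in u_values)


# Made-up u= values standing in for one run-of-site ad's log rows.
sample_rows = ["cat-sports|pos-top", "cat-news|pos-top", "cat-sports|pos-side"]
print(impressions_by_key(sample_rows, "cat"))
```

A scheme this simple breaks if a key or value can itself contain a delimiter, so a real implementation would need escaping or a stricter naming convention.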
If you ask your ad serving provider for the raw logs, they will claim that this is a lot of data, so they need to charge you for storage, bandwidth, etc. Let’s take a look at cost for a minute. If you are a pretty large site, you might generate 10GB of logs a day. For a reasonable comp, let’s look at what it would cost to use Amazon S3 for this service. 10GB of data transfer each day is $1/day to get it in, and $1.70/day to get it back out. Storing each day’s 10GB for 30 days would be another $1.50. That is $4.20 for each day’s logs, or roughly $126 over a month, so call it about $150/mo to move and keep 30 days of data, at the top end, with a healthy margin for Amazon. Remember that you pay extra to Amazon for the flexibility of scaling up and down easily; a dedicated hosting center would almost certainly be cheaper. So, if your ad serving provider is charging you much more than that, you should push back. (One DoubleClick customer I know was quoted $3500/mo for about 100MB of data per day; they negotiated it down but are still paying way too much.) Memo to ad server companies: storage is a lot cheaper than it was 10 years ago.
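The arithmetic is easy to check with a short Python sketch. The per-GB rates below are the ones implied by the figures in this post ($1 in and $1.70 out per 10GB, $1.50 to hold 10GB for a month), not current Amazon pricing, and the function name is just illustrative.

```python
# Rates implied by the post's own numbers, in dollars per GB.
TRANSFER_IN = 0.10   # $1.00 to move 10GB in
TRANSFER_OUT = 0.17  # $1.70 to move 10GB back out
STORAGE = 0.15       # $1.50 to keep 10GB for roughly a month


def monthly_cost(gb_per_day, days=30):
    """Cost to move each day's logs in and out once and keep each for ~30 days."""
    per_day = gb_per_day * (TRANSFER_IN + TRANSFER_OUT + STORAGE)
    return per_day * days


large_site = monthly_cost(10)    # the 10GB/day site from the example
small_feed = monthly_cost(0.1)   # about 100MB/day, treated as 0.1GB

print(f"10GB/day:  ${large_site:.2f} per month")
print(f"100MB/day: ${small_feed:.2f} per month")
```

On these assumptions the 10GB/day site comes to $126 a month, and a 100MB/day feed to a little over a dollar, which is the number to keep in mind next to a $3500/mo quote.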
Ad sales and operations groups are starting to realize that they need detailed analytics to do their jobs. The first step of that is good underlying data, and getting that is way too hard. This needs to change.
August 25th, 2008 at 4:18 pm
Agreed.
A huge number of sites depend on Google Analytics, which barely captures 70% of the impressions. Other services have similar problems.
With multi-tab, multi-window, panel, and media use of browsers, it's becoming harder to gather relevant data. My estimates show that most sites are reporting 50% too many unique users (UU). On top of that, half the hits have little relevance to true use.
Big problem when the fundamental data is questionable.