April 22nd, 2009, by alex

Using the Google Analytics API - getting total number of page views

At long last, Google released the Google Analytics API.  The timing couldn’t be better, since I was just trying to get to some information through screen scraping… which is never fun.

The API is pretty easy to use, and other than a typo which slowed me down way too much, it didn’t take long to write a simple PHP script to get the total number of page views across all my Analytics profiles.  This is a quick tutorial for using the API for this simple purpose.  Also check out the official API documentation.

The basic steps involved are:

1. Authenticate the user and get a one-time token from Google
2. Exchange the one-time token for a session token, which does not expire
3. Retrieve and parse a list of the user’s Google Analytics accounts and profiles
4. Retrieve and parse the page view count for each profile
5. Done!

1. Authenticate the user and get a one-time token from Google

The first step is authenticating the user.  Google offers several authentication methods, and the simplest to use for my purposes is AuthSub, which asks the user to log in on Google’s site and then sends the user, along with an authentication token, back to my script. This means I never have to directly handle the user’s login and password.  The link presented to the user can be something like this:

<a href="https://www.google.com/accounts/AuthSubRequest?next=http://www.alexc.me/pageviewcounts.php
&amp;scope=https://www.google.com/analytics/feeds/
&amp;secure=0&amp;session=1">Click here to authenticate through Google.</a>

(My blog mangles some of this code; until I get it sorted out, there’s a complete version of the script at the end of this post.)

The “next” parameter in this link specifies where the user should be forwarded after authenticating - here, I set it to the URL of my script. Google forwards to the address in the “next” parameter, and adds a “token” parameter with the authentication token. This means the user will be sent back to something like

http://www.alexc.me/pageviewcounts.php?token=CNK******__8B
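As a rough sketch, the link can also be assembled in PHP with http_build_query, so that the “next” and “scope” values are URL-encoded properly. The function name build_authsub_url is my own, made up for illustration; the endpoint and parameters are the ones from the link above.

```php
<?php
// Hypothetical helper (not part of the original script): builds the
// AuthSubRequest link so that each parameter value is URL-encoded.
function build_authsub_url($next, $scope)
{
	$params = array(
		'next'    => $next,   // where Google sends the user back
		'scope'   => $scope,  // which feeds the token may read
		'secure'  => 0,       // same value as in the link above
		'session' => 1,       // allow exchange for a session token
	);
	// Pass '&' explicitly so the separator does not depend on php.ini.
	return 'https://www.google.com/accounts/AuthSubRequest?' . http_build_query($params, '', '&');
}

$url = build_authsub_url(
	'http://www.alexc.me/pageviewcounts.php',
	'https://www.google.com/analytics/feeds/'
);

// Escape the URL for use inside an HTML attribute.
echo '<a href="' . htmlspecialchars($url) . '">Click here to authenticate through Google.</a>';
```

When Google sends the user back, the one-time token would then be available to the script as $_GET['token'].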

The authentication token is used by adding an “Authorization” field to the header of all the GET or POST requests sent to Google’s API. A simple way to do this in PHP, using the cURL library, is:

	function make_api_call($url, $token)
	{
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, $url);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		// One header per array element; cURL adds the line ending itself
		$curlheader = array(sprintf('Authorization: AuthSub token="%s"', $token));
		curl_setopt($ch, CURLOPT_HTTPHEADER, $curlheader);
		$output = curl_exec($ch);
		curl_close($ch);
		return $output;
	}
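The function above returns whatever cURL hands back, including false on a network failure. A slightly more defensive variant might look like the sketch below. The names make_api_call_checked and authsub_header are hypothetical, and treating anything other than HTTP 200 as an error is my assumption rather than something the API docs spell out here.

```php
<?php
// Sketch only: the same request as make_api_call(), plus basic error checks.
function make_api_call_checked($url, $token)
{
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_HTTPHEADER, array(authsub_header($token)));
	$output = curl_exec($ch);
	$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
	$error  = curl_error($ch);
	curl_close($ch);

	if ($output === false) {
		throw new RuntimeException('cURL error: ' . $error);
	}
	if ($status != 200) {
		// Assumption: any non-200 status is treated as a failure.
		throw new RuntimeException('Unexpected HTTP status: ' . $status);
	}
	return $output;
}

// Builds the Authorization header line, with no trailing newline.
function authsub_header($token)
{
	return sprintf('Authorization: AuthSub token="%s"', $token);
}
```

Callers would wrap the call in try/catch and decide whether to retry or show an error.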

2. Exchange the one-time token for a session token, which does not expire

Now, the token returned above is only valid for one API call. Our script will make several, so the next step is to exchange the one-time token for a session token, which does not expire. This can only happen if the “session=1” parameter was set in the original URL which sent the user to Google.

	function get_session_token($onetimetoken) {
		$output = make_api_call("https://www.google.com/accounts/AuthSubSessionToken", $onetimetoken);

		if (preg_match("/Token=(.*)/", $output, $matches))
		{
			$sessiontoken = $matches[1];
		} else {
			echo "Error authenticating with Google.";
			exit;
		}
		return $sessiontoken;
	}
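The token extraction can be tried without calling Google at all. The sketch below pulls the token out of a key=value response body and stops at the end of the line, so a trailing carriage return is not captured. The helper name extract_session_token and the sample response body are made up for illustration.

```php
<?php
// Hypothetical helper: pulls the token out of a key=value response body.
function extract_session_token($body)
{
	// [^\r\n]+ stops at the end of the line, so a trailing "\r" is not captured.
	if (preg_match('/^Token=([^\r\n]+)/m', $body, $matches)) {
		return $matches[1];
	}
	return null;
}

// Made-up response body for illustration; a real token is much longer.
$sample = "Token=abc123\r\nExpiration=20991231T000000Z\r\n";
var_dump(extract_session_token($sample));           // string(6) "abc123"
var_dump(extract_session_token("Error=BadToken"));  // NULL
```

Returning null instead of calling exit leaves the caller free to choose how to report the failure.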

We now have everything we need to start using the Google Analytics API!

3. Retrieve and parse a list of the user’s Google Analytics accounts and profiles

Since the user can have a number of different accounts and profiles, and we need to know the IDs for these profiles before we can do anything, the first API call should retrieve the list of accounts and profiles:

		$accountxml = make_api_call("https://www.google.com/analytics/feeds/accounts/default", $sessiontoken);

As specified in the Google Analytics API docs, this should return an XML response similar to the following:

<?xml version="1.0" ?>
<feed xmlns='http://www.w3.org/2005/Atom'
  xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'>
  <id>http://www.google.com/analytics/feeds/accounts/liz@gmail.com</id>
  <updated>2008-09-13T16:12:49.000-07:00</updated>
  <title type="text">Account list for liz@gmail.com.</title>
  <link href="http://www.google.com/analytics/feeds/accounts/liz@gmail.com"
        rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml"/>
  <link href="http://www.google.com/analytics/feeds/accounts/liz@gmail.com"
        rel="self" type="application/atom+xml"/>
  <author>
    <name>Google Analytics</name>
  </author>
  <generator version="1.0">Google Analytics</generator>
  <openSearch:totalResults>4</openSearch:totalResults>
  <openSearch:startIndex>1</openSearch:startIndex>
  <openSearch:itemsPerPage>4</openSearch:itemsPerPage>
  <entry>
    <id>http://www.google.com/analytics/feeds/accounts/ga:4321</id>
    <updated>2008-09-03T10:55:54.000-07:00</updated>
    <title type="text">Darcy's Blog</title>
    <link href="http://www.google.com/analytics/feeds/accounts/liz%40gmail.com"
          rel="self" type="application/atom+xml"/>
    <dxp:property name='ga:accountId' value='12345'/>
    <dxp:property name='ga:accountName' value='Pride and Prejudice'/>
    <dxp:property name='ga:profileId' value='4321'/>
    <dxp:property name='ga:webPropertyId' value='UA-12345-1'/>
    <dxp:tableId>ga:4321</dxp:tableId>
  </entry>
  <entry>
    <id>http://www.google.com/analytics/feeds/accounts/ga:5555</id>
    <updated>2008-09-03T10:55:54.000-07:00</updated>
    <title type="text">Jane's Blog</title>
    <link href="http://www.google.com/analytics/feeds/accounts/liz%40gmail.com"
          rel="self" type="application/atom+xml"/>
    <dxp:property name='ga:accountId' value='12345'/>
    <dxp:property name='ga:accountName' value='Pride and Prejudice'/>
    <dxp:property name='ga:profileId' value='5555'/>
    <dxp:property name='ga:webPropertyId' value='UA-12345-2'/>
    <dxp:tableId>ga:5555</dxp:tableId>
  </entry>
  <entry>
    <id>http://www.google.com/analytics/feeds/accounts/ga:2222</id>
    <updated>2007-02-14T14:10:07.000-08:00</updated>
    <title type="text">Austen's Most-Adored Website</title>
    <link href="http://www.google.com/analytics/feeds/accounts/liz%40gmail.com"
          rel="self" type="application/atom+xml"/>
    <dxp:property name='ga:accountId' value='54321'/>
    <dxp:property name='ga:accountName' value='Jane Austen'/>
    <dxp:property name='ga:profileId' value='2222'/>
    <dxp:property name='ga:webPropertyId' value='UA-54321-1'/>
    <dxp:tableId>ga:2222</dxp:tableId>
  </entry>
  <entry>
    <id>http://www.google.com/analytics/feeds/accounts/ga:3333</id>
    <updated>2007-02-14T14:10:07.000-08:00</updated>
    <title type="text">The Jane Austen Bookstore</title>
    <link href="http://www.google.com/analytics/feeds/accounts/liz%40gmail.com"
          rel="self" type="application/atom+xml"/>
    <dxp:property name='ga:accountId' value='54321'/>
    <dxp:property name='ga:accountName' value='Jane Austen'/>
    <dxp:property name='ga:profileId' value='3333'/>
    <dxp:property name='ga:webPropertyId' value='UA-54321-2'/>
    <dxp:tableId>ga:3333</dxp:tableId>
  </entry>
</feed>

This can be processed through whatever XML means you’re comfortable with. I’m using the following PHP code to extract the parts I need into an array:

	function parse_account_list($xml)
	{
		$doc = new DOMDocument();
		$doc->loadXML($xml);
		$entries = $doc->getElementsByTagName('entry');
		$i = 0;
		$profiles = array();
		foreach($entries as $entry)
		{
			$profiles[$i] = array();

			$title = $entry->getElementsByTagName('title');
			$profiles[$i]["title"] = $title->item(0)->nodeValue;

			$entryid = $entry->getElementsByTagName('id');
			$profiles[$i]["entryid"] = $entryid->item(0)->nodeValue;

			$properties = $entry->getElementsByTagName('property');
			foreach($properties as $property)
			{
				if (strcmp($property->getAttribute('name'), 'ga:accountId') == 0)
					$profiles[$i]["accountId"] = $property->getAttribute('value');

				if (strcmp($property->getAttribute('name'), 'ga:accountName') == 0)
					$profiles[$i]["accountName"] = $property->getAttribute('value');

				if (strcmp($property->getAttribute('name'), 'ga:profileId') == 0)
					$profiles[$i]["profileId"] = $property->getAttribute('value');

				if (strcmp($property->getAttribute('name'), 'ga:webPropertyId') == 0)
					$profiles[$i]["webPropertyId"] = $property->getAttribute('value');
			}

			$tableId = $entry->getElementsByTagName('tableId');
			$profiles[$i]["tableId"] = $tableId->item(0)->nodeValue;

			$i++;
		}
		return $profiles;
	}
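If all you need is the table ID for each profile, a shorter version is possible. The sketch below maps each profile title to its dxp:tableId using the same DOMDocument approach. The function name list_table_ids is hypothetical, and the xmlns:dxp URI in the sample feed is my assumption, since the sample above does not declare it. The parser needs the prefix declared to match the prefixed elements by local name.

```php
<?php
// Sketch: map each profile title to its table ID.
// list_table_ids is a hypothetical name, not part of the original script.
function list_table_ids($xml)
{
	$doc = new DOMDocument();
	$doc->loadXML($xml);

	$ids = array();
	foreach ($doc->getElementsByTagName('entry') as $entry) {
		$title   = $entry->getElementsByTagName('title')->item(0);
		$tableId = $entry->getElementsByTagName('tableId')->item(0);
		// Skip entries that are missing either element.
		if ($title !== null && $tableId !== null) {
			$ids[$title->nodeValue] = $tableId->nodeValue;
		}
	}
	return $ids;
}

// Trimmed-down feed for illustration; the xmlns:dxp URI is an assumption.
$sample = <<<XML
<?xml version="1.0" ?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <title type="text">Darcy's Blog</title>
    <dxp:tableId>ga:4321</dxp:tableId>
  </entry>
  <entry>
    <title type="text">Jane's Blog</title>
    <dxp:tableId>ga:5555</dxp:tableId>
  </entry>
</feed>
XML;

print_r(list_table_ids($sample));
```

Keying by title is only for readability here; titles are not guaranteed to be unique, so real code would probably key by table ID instead.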

4. Retrieve and parse the page view count for each profile

All that’s left now is going through each profile and getting the number of page views. This is done through a call to https://www.google.com/analytics/feeds/data. The parameters of interest are “ids”, which should be set to the value of the dxp:tableId node in the above XML; “metrics=ga:pageviews”, which specifies that we’re interested in page views; and “start-date” and “end-date”, which bound the reporting period. There is more information on the possible parameters in the API docs.
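One way to put these parameters together is http_build_query, which takes care of the encoding and joins the pairs with a plain “&”. This is only a sketch; the helper name build_data_url is made up for illustration.

```php
<?php
// Hypothetical helper: builds the data feed URL for one profile.
function build_data_url($tableId, $startDate, $endDate)
{
	$params = array(
		'ids'        => $tableId,        // e.g. "ga:4321", from dxp:tableId
		'metrics'    => 'ga:pageviews',
		'start-date' => $startDate,      // YYYY-MM-DD
		'end-date'   => $endDate,        // YYYY-MM-DD
	);
	// Pass '&' explicitly so the separator is never HTML-escaped.
	return 'https://www.google.com/analytics/feeds/data?' . http_build_query($params, '', '&');
}

echo build_data_url('ga:4321', '2007-06-01', '2009-04-21');
```

The colons come out as %3A in the resulting URL, which is the standard URL encoding of the same value.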

I am simply making one API call per profile to keep the code simple. Google currently has a limit of 100 requests every 10 seconds, though, so production code should handle accounts with large numbers of profiles. I haven’t yet tried specifying more than one profile ID in the “ids” parameter.
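A very simple way to stay under a limit like that is to space the calls out evenly. The sketch below works out the pause from the limit and sleeps between requests. The function name and the example table IDs are hypothetical, and even spacing is just one possible approach, not something the API requires.

```php
<?php
// Sketch: space requests evenly so that roughly no more than $maxRequests
// are sent in any $windowSeconds. Names here are hypothetical.
function throttle_delay_us($maxRequests, $windowSeconds)
{
	return (int) ceil($windowSeconds * 1000000 / $maxRequests);
}

$delay = throttle_delay_us(100, 10); // 100000 microseconds = 0.1 s

$tableIds = array('ga:4321', 'ga:5555'); // example IDs from the feed above
foreach ($tableIds as $i => $tableId) {
	if ($i > 0) {
		usleep($delay); // pause between calls, not before the first one
	}
	// ... make the API call for $tableId here ...
}
```

Since each request also takes some time to complete, the actual rate ends up a little below the limit.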

		$totalviews = 0;

		foreach($profiles as $profile)
		{
			// For each profile, get number of pageviews
			$requrl = sprintf("https://www.google.com/analytics/feeds/data?ids=%s&metrics=ga:pageviews&start-date=2007-06-01&end-date=2009-04-21", $profile["tableId"]);
			$pagecountxml = make_api_call($requrl, $sessiontoken);

			$doc = new DOMDocument();
			$doc->loadXML($pagecountxml);

			$metrics = $doc->getElementsByTagName("metric");
			// Guard against responses that contain no metric node
			$views = ($metrics->length > 0) ? (int) $metrics->item(0)->getAttribute('value') : 0;
			$totalviews = $totalviews + $views;

			echo $profile["title"] . ": " . number_format($views) . "<br />";
		}

		echo "Total views: " . number_format($totalviews);
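The parsing step can be pulled into a small function and tried against a canned response. The sketch below also covers the case where the body is not XML at all, for example a plain-text error message. The helper name extract_pageviews and the sample response are made up for illustration, and the xmlns:dxp URI in the sample is my assumption.

```php
<?php
// Hypothetical helper: returns the first metric value in a data feed
// response, or 0 when the response has no metric element.
function extract_pageviews($xml)
{
	$doc = new DOMDocument();
	// @ suppresses the parser warning for bodies that are not XML.
	if (!@$doc->loadXML($xml)) {
		return 0;
	}
	$metric = $doc->getElementsByTagName('metric')->item(0);
	return ($metric === null) ? 0 : (int) $metric->getAttribute('value');
}

// Made-up, heavily trimmed response; the xmlns:dxp URI is an assumption.
$sample = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <dxp:metric name="ga:pageviews" value="1234"/>
  </entry>
</feed>
XML;

var_dump(extract_pageviews($sample));          // int(1234)
var_dump(extract_pageviews('Invalid token'));  // int(0)
```

Returning 0 silently keeps the total going, but real code would probably want to log the failure as well.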

5. Done!

And that’s it! With some minor tweaks, the entire script is:

Right-click and “save as”

You can try the live version here:

Clicky clicky

Note that this will send you to a page on Google asking you to grant a session token to the script, and it will show the information in YOUR Google Analytics account. The script doesn’t actually store the token, but if it did, that would give it unlimited read access to the data in your Google Analytics account until you revoke the token from your Google Accounts page. If you’re not comfortable with this, download the script from the link above instead, and run it on your own server.

Coming up next: converting this script into a WordPress plugin.

