This week I'm just thinking out loud about what you do about stats and digital collections, particularly if your collection is dispersed throughout the web and not necessarily housed locally.
The Association of Research Libraries (ARL) stats have come up once again this year. They are a big deal if you want to maintain your standing for bragging rights and accreditation, but as usual they are a challenge. Historically, ARL stats were about how many books and journals your library held. Now, entering the digital realm, we are asked how many files we have. Part of the challenge is that we never know what they are going to ask for, since the questions can change from year to year, so we end up throwing numbers together in a near panic. But how meaningful are those numbers? How do we even determine what counts as a file, and how many there really are?
Currently we disseminate a lot of material via web services such as the
Internet Archive,
YouTube,
Vimeo,
iTunes U, and of course provide syndication through our blog and
Feedburner (for podcast optimization). We also use
Google Analytics to track statistical information about viewership and hits on our web pages and catalog. However, that only works for things we control, and only if we have already hooked them into Google Analytics and let it run over time.
Each of these services provides its own form of statistical feedback. Google owns YouTube and Feedburner, and both provide dynamic graphic maps of users and views. Feedburner shows you subscriber info, such as which tools people use to subscribe (Google Feedfetcher, iTunes, Sage, etc.), and gives you a rough sketch of your traffic from the past day to the past year. Vimeo gives you a basic view count and, like YouTube, will even tell you who is subscribed to your materials through their interface.
The Internet Archive also provides a download count on your item page, but that doesn't always reflect the number you see in the browse interface. I've noticed that the numbers for files that filter from the Internet Archive through Feedburner's RSS engine don't always match either. So if you use an outside party to host and help serve your materials, how accurate are their statistics? Are they padding or hiding results? Is the statistical analysis outdated?
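One way to at least compare numbers yourself is to pull download counts straight from the Internet Archive's Advanced Search endpoint rather than eyeballing item pages. Here's a minimal sketch; "our-collection" is a hypothetical collection name, and the JSON is an abridged sample of the shape the endpoint returns, not real data.

```python
import json
from urllib.parse import urlencode

# Build an Advanced Search query against archive.org asking for
# per-item download counts. "our-collection" is hypothetical.
params = urlencode({
    "q": "collection:our-collection",
    "fl[]": "identifier,downloads",
    "rows": 50,
    "output": "json",
})
url = "https://archive.org/advancedsearch.php?" + params

# Abridged sample of the JSON shape the endpoint returns:
sample = json.loads("""
{"response": {"numFound": 2, "docs": [
  {"identifier": "item-one", "downloads": 312},
  {"identifier": "item-two", "downloads": 87}
]}}
""")

# Tally downloads across the collection so we have our own number
# to hold up against whatever the browse interface shows.
total = sum(doc.get("downloads", 0) for doc in sample["response"]["docs"])
print(total)  # 399 for the sample above
```

Fetching `url` periodically and logging the totals over time would at least give you your own baseline to compare against the interface counts.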
On the simple matter of file counts, how do you handle derivatives? At a basic local level you might have an archival TIFF or JP2 file with a corresponding set of transport versions. Do you count them all, or just the ones made public? If the files are essentially the same content but modified for playback, are they counted together or separately? Our files loaded externally to the Internet Archive also have format derivatives created by the IA itself. Do we count each of those? And what about the separate files we additionally load to iTunes U?
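Whatever answer you settle on, it helps to be able to report both numbers: distinct items and total files including derivatives. A rough sketch, assuming (hypothetically) that derivatives share a filename stem with their archival master:

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical file listing: archival masters plus access derivatives.
files = [
    "oralhistory01.tif",   # archival master
    "oralhistory01.jp2",   # access derivative
    "oralhistory01.pdf",   # access derivative
    "lecture02.wav",       # archival master
    "lecture02.mp3",       # access derivative
]

# Group by stem so each intellectual item is counted once,
# with its derivatives tallied under it.
items = defaultdict(list)
for name in files:
    items[PurePath(name).stem].append(PurePath(name).suffix)

distinct_items = len(items)                       # 2 items
total_files = sum(len(v) for v in items.values())  # 5 files
print(distinct_items, total_files)
```

Keeping both counts means you can answer either version of the question when the ARL form changes from year to year.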
We were recently asked about a particular digital collection we put online several years ago as part of a collaborative grant with GWLA. That would have been no problem, except that after we contributed our share of content (housed on a local server) we did nothing to promote it and certainly were not tracking searches within the collection. There was no way for us to tell what was going on with it. Our catalog had a link to the material, but it pointed only to the consortium's main web page. (Note: the consortium uses OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting, to pull our materials into their search engine.) We can see whether anyone searched for the materials via our Google Analytics tracking of catalog searches, but as far as that goes it shows a big zero. So as far as we can tell, the collection is a vastly unused assortment of documents.
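For anyone unfamiliar with how that harvesting works: OAI-PMH is just HTTP requests with a `verb` parameter, returning XML. A minimal sketch, where "repository.example.edu" is a hypothetical stand-in for the consortium's actual endpoint and the XML is an abridged sample of a ListRecords response:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# A hypothetical OAI-PMH ListRecords request; the harvester hits an
# endpoint like this to pull our metadata into its search engine.
base = "https://repository.example.edu/oai"
url = base + "?" + urlencode({"verb": "ListRecords",
                              "metadataPrefix": "oai_dc"})

# Abridged sample of the XML shape a ListRecords response carries:
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

# Pull out the record identifiers the harvester would see.
ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
ids = [h.text for h in
       ET.fromstring(sample).findall(".//oai:identifier", ns)]
print(ids)
```

The catch, as we learned, is that harvesting only moves metadata outward; it tells you nothing about who searched or viewed the records on the other end unless the harvester shares its logs.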
On a related note, we are also unable to retrieve statistics for the files we house in iTunes U. The iTunes U servers are owned and operated by Apple, but the client brands our "channel" with a specific look and feel developed by ASU programmers. Unfortunately, they have not had dedicated resources to hook into the APIs that would give us stats.
Statistical analysis of digital collections is far, far more complex than counting bound materials on a shelf and the number of times they are checked out. We have to take into consideration servers, hits, views, downloads, searches, and file derivatives just to skim the surface. What does it all really mean, and what information really gives you the best feedback about your collections and their usefulness? The lesson learned here: when you are designing and developing your online collections, know that at some point you are going to be asked for stats, and you had better have a system in place to provide real data. Of course, anticipating exactly what those questions will be is key.