Capturing, Using, and Analyzing Data

Capturing Data Panel

Capturing Data Panel (L-R): Lei Jin, Dana Thomas, David Stern

Lei Jin, Electronic Resources Librarian, and Dana Thomas, Assessment Librarian, of Ryerson University described an attempt to link demographic information to database use by means of OCLC’s EZProxy system, which reports library usage for off-campus users (Ryerson is largely a commuter school). They wanted to identify usage patterns by database, affiliation, and other characteristics. They began by evaluating commercial EZProxy analyzers but decided that these were not satisfactory for their purposes, so they partnered with the university’s computer science department, registrar’s office, and computing services to produce the data. In busy months on campus, over 5 gigabytes of data had to be analyzed, cleaned, and anonymized in a complex and tedious process. Finally, these reports were generated:

Reports Generated

Here is a sample of the results:

Database Usage by Faculty

This graph shows that business users are using resources at a lower rate relative to the size of the department. Other analyses showed the number of hits by faculty and department, the top databases (vendors and publishers) used by department, and the average number of sessions by department. Some results were to be expected, but there were some surprises; for example, business students made heavier use of Ebrary than nursing students did. Here are the overall results of the project, which provided guidance for future marketing efforts by the library:

Overall Project Results

Because of the volume of data and the number of providers, it was not possible to extract detailed usage information for the specific books and journal articles used. Challenges of the project included messy data, the difficulty of mapping vendor URLs to resources, the interface design of the reports, and data residing in multiple systems within the university.
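To make the cleaning and anonymizing step more concrete, here is a minimal Python sketch of how proxy log lines might be parsed, pseudonymized, and counted by faculty and resource. It is not the Ryerson team's code. The log layout (an NCSA-style line), the lookup tables HOST_TO_RESOURCE and USER_TO_FACULTY, the salt, and all host and user names are assumptions made up for illustration; a real project would draw the affiliation data from the registrar and the URL mapping from the library's own resource list.

```python
import hashlib
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed NCSA-style layout: ip - user [timestamp] "METHOD url PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)

# Hypothetical lookup tables (illustrative values only).
HOST_TO_RESOURCE = {
    "ebooks.example.com": "Ebook platform",
    "journals.example.org": "Journal platform",
}
USER_TO_FACULTY = {"jdoe": "Business", "asmith": "Nursing"}

# Hypothetical secret salt; keep it out of the published reports.
SALT = b"replace-with-a-secret-value"


def pseudonymize(user):
    """Replace a username with a short salted hash so reports carry no names."""
    return hashlib.sha256(SALT + user.encode("utf-8")).hexdigest()[:12]


def parse_line(line):
    """Return a cleaned record for one log line, or None if the line is malformed."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # messy data: skip lines that do not fit the assumed layout
    user = match.group("user")
    host = urlsplit(match.group("url")).hostname or ""
    return {
        "user_id": pseudonymize(user),
        "faculty": USER_TO_FACULTY.get(user, "Unknown"),
        "resource": HOST_TO_RESOURCE.get(host, "Unmapped: " + host),
    }


def hits_by_faculty_and_resource(lines):
    """Count hits for each (faculty, resource) pair across the given log lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["faculty"], record["resource"])] += 1
    return counts
```

Hosts that are missing from the mapping table are reported as "Unmapped" rather than dropped, which gives a running list of vendor URLs that still need to be matched to a resource.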

David Stern, Library Director, Saint Xavier University, described how personal and organizational repositories were built. Considerations include:

Considerations in Building a Knowledge Database

Tools for capturing personal data have long existed on phones and other platforms, but linking them together was a challenge. For research, the complexity increases. Now we have more integrated support that provides seamless functionality, for example Zotero or IFTTT (If This Then That). IFTTT permits linking to other tools to bring the data into a single tool. Organizations have scaling problems; they need controlled vocabularies and taxonomies, and files shared on a single server.

Some useful tools:

  • Zotero can capture citations, full-text articles, web pages, images, sound files, and some personal feeds, then annotate records with tags, thus producing an integrated repository.
  • Diigo can create groups of URLs and tag and share them.
  • Outwit captures URL links, images, or text, creates a spreadsheet of the links, and downloads the linked files. It can also scrape unstructured data from a website and generate a spreadsheet from it. If desired, it can learn from the data.
  • Contactout is a simple browser extension that helps you find email addresses and phone numbers of anyone on LinkedIn.

RDA triplets can be used to build a relational database for analyzing the data downloaded with these tools, and visualizations can be created to present the data that has been mined.
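As a rough illustration of the idea, the Python sketch below stores subject-predicate-object statements in a single relational table and runs one simple analysis over them. It assumes the triplets take that three-part form; the table name, the predicate names (such as "hasTag"), and the sample values are hypothetical and do not come from the talk.

```python
import sqlite3


def build_triple_store(triples):
    """Load (subject, predicate, object) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def count_by_object(conn, predicate):
    """Count how often each object value occurs for the given predicate."""
    return conn.execute(
        "SELECT object, COUNT(*) FROM triples "
        "WHERE predicate = ? "
        "GROUP BY object ORDER BY COUNT(*) DESC, object",
        (predicate,),
    ).fetchall()
```

Counts of this kind, for instance the number of captured items per tag, are the sort of summary that could then be fed to a charting tool.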

Don Hawkins
Conference Blogger
