
IRStats2

Statistical tools for EPrints


API

This section presents a few examples of how to get data out of IRStats2, for instance to embed statistics on pages or to reuse them in analysis scripts.

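There are two ways to get data out: from Perl scripts through the main API (see "Data from scripts") and by embedding views in web pages (see "Embedding data").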

Core concepts

Datatype

Which data to provide. IRStats2 allows the processing of any data on your repository. Its typical use is usage statistics, so this is the main dataset, but data on deposits, open access, full text, etc. are also processed. Some repositories even include citation counts from Scopus.

Main datatypes used in the examples below include "downloads" and "deposits".

Sets

By default, IRStats2 returns data over the entire repository, i.e. the entire set of eprints is assumed. You can, however, restrict which "set" to use: the publications of an author, of a university division, of a subject, etc.

Dates and ranges

You can also restrict by dates or by a range. By default, all stats are returned without any date restriction.

Dates can be written as YYYYMMDD, YYYY-MM-DD or YYYY/MM/DD (e.g. 20140101, 2013-11-04). Dates are passed as a hash with two keys, from and to. Either key can be omitted to mean "from that date onwards" or "up to that date".
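To make the accepted notations concrete, here is a minimal JavaScript sketch that normalises the three date formats to YYYYMMDD and builds a structure with optional from and to keys. The helper names normalizeDate and buildDates are hypothetical and not part of IRStats2. The sketch only illustrates the formats described above, not how IRStats2 parses them internally.

```javascript
// Hypothetical helper (not part of IRStats2): accept YYYYMMDD, YYYY-MM-DD or
// YYYY/MM/DD and return the compact YYYYMMDD form.
function normalizeDate(value) {
    // \2 forces the second separator to match the first one ("", "-" or "/")
    var match = /^(\d{4})([-\/]?)(\d{2})\2(\d{2})$/.exec(value);
    if (!match) {
        throw new Error('Unsupported date format: ' + value);
    }
    return match[1] + match[3] + match[4];
}

// Hypothetical helper: build a dates structure; either bound may be omitted.
function buildDates(from, to) {
    var dates = {};
    if (from) { dates.from = normalizeDate(from); }
    if (to) { dates.to = normalizeDate(to); }
    return dates;
}

// Both bounds, written in two different notations:
var firstQuarter = buildDates('2012-01-01', '2012/03/31');
// firstQuarter is { from: '20120101', to: '20120331' }

// Only an upper bound ("up to that date"):
var untilMarch = buildDates(null, '20120331');
// untilMarch is { to: '20120331' }
```

Normalising to one notation before building the structure keeps the rest of a script independent of how the user typed the date.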

Ranges follow a %d%c format, with "now" (today) as the upper limit. For instance, 6m means the last six months.

Only "m" (months), "d" (days) or "y" (years) may be used, so 12m is the same as 1y.
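The range format can be sketched in a few lines of JavaScript. The helper rangeToDates below is hypothetical (it is not part of IRStats2) and assumes that a range simply subtracts the given number of days, months or years from "now". IRStats2's own date arithmetic may differ in detail.

```javascript
// Hypothetical helper (not part of IRStats2): turn a "%d%c" range such as
// "6m" into explicit from/to dates, taking `now` as the upper limit.
function rangeToDates(range, now) {
    var match = /^(\d+)([dmy])$/.exec(range);
    if (!match) {
        throw new Error('Invalid range: ' + range);
    }
    var amount = parseInt(match[1], 10);
    var from = new Date(now.getTime());
    if (match[2] === 'd') {
        from.setUTCDate(from.getUTCDate() - amount);
    } else if (match[2] === 'm') {
        from.setUTCMonth(from.getUTCMonth() - amount);
    } else {
        from.setUTCFullYear(from.getUTCFullYear() - amount);
    }
    return { from: formatDate(from), to: formatDate(now) };
}

// Format a Date as YYYYMMDD (UTC).
function formatDate(date) {
    // toISOString() yields "YYYY-MM-DDTHH:mm:ss.sssZ"; keep the date part
    return date.toISOString().slice(0, 10).replace(/-/g, '');
}

var now = new Date(Date.UTC(2014, 5, 15)); // 15 June 2014
var lastTwelveMonths = rangeToDates('12m', now);
var lastYear = rangeToDates('1y', now);
// both are { from: '20130615', to: '20140615' }
```

Note that JavaScript's Date rolls over at month ends (31 March minus one month becomes 3 March), so this is an illustration of the format rather than a reference implementation.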

Groupings

This tells IRStats2 how to group data. It is generally only used for queries such as "give me the top eprints" or "give me the top authors".

A grouping set to "eprint" returns the top eprints; set to "authors", it returns the top authors, and so on. The grouping must be a valid set, except when it is "eprint".

Misc

You can limit the number of records returned, where this is relevant. A query for the total downloads since the beginning of time returns a single data row (the count), so a limit makes no difference there. For a query such as the top authors, however, a limit of 10 returns only the first 10 authors.

You can also ask IRStats2 to return certain data fields in queries. For top eprints, you generally want the "eprintid" field. To draw timeline graphs (e.g. the evolution of downloads over time), you want the "datestamp" field. More examples are shown below.

Data from scripts

Main API


# get the IRStats2 handler, required to query IRStats2
my $handler = $repo->plugin( "Stats::Handler" );

# ask IRStats2 to show debug statements (SQL queries)
$handler->debug(1);

# Create a Context object
my $ctx = $handler->context( { datatype => "downloads" } );

# Retrieve data rows
my $data = $handler->data( $ctx )->select();

# How many rows returned:
printf "I got %d data rows back\n", $data->count;

# Get stats for divisions "uos-ecs":
$ctx->set( { set_name => 'divisions', set_value => 'uos-ecs' } );

# Get stats over the last 6 months:
$ctx->dates( { range => '6m' } );

# Get stats between 1st January 2012 and 31st March 2012:
$ctx->dates( { from => '20120101', to => '20120331' } );

# Data may be exported (see Stats/Export/ for a list of currently supported plug-ins):
my $export = $repo->plugin( "Stats::Export::CSV" );
$data->export( { export_plugin => $export } );

Full Examples

These are not complete scripts: they assume you can write the beginning of a Perl script and that you have already instantiated the Stats Handler (see above) as $handler.

# How many downloads in total over the entire repository

my $ctx = $handler->context( { datatype => "downloads" } );
printf "I got %d downloads\n", $handler->data( $ctx )->select->sum_all;

# How many downloads in 2013 over the entire repository

my $ctx = $handler->context( { datatype => "downloads", range => "2013" } );
printf "I got %d downloads\n", $handler->data( $ctx )->select->sum_all;

# The top 5 EPrints over the entire repository

my $ctx = $handler->context( { grouping => "eprint", datatype => "downloads" } );

my $stats = $handler->data( $ctx )->select( fields => ["eprintid"], limit => 5 );

foreach( @{ $stats->data } )
{
        printf "EPrint %d got %d downloads\n", $_->{eprintid}, $_->{count};
}

# The top 10 Subjects (let's assume LoC) for deposits (not downloads!!)

my $ctx = $handler->context( { set_name => "subjects", datatype => "deposits" } );

my $stats = $handler->data( $ctx )->select( fields => ["set_value"], limit => 10 );

my $i = 1;
foreach( @{ $stats->data } )
{
        printf "%d) %s with %d items deposited\n", $i++, $_->{set_value}, $_->{count};
}

# The top 5 downloaded EPrints for LoC Subject "D1"

my $ctx = $handler->context( { set_name => "subjects", set_value => 'D1', datatype => "downloads" } );

my $stats = $handler->data( $ctx )->select( fields => ["eprintid"], limit => 5 );

my $i = 1;
foreach( @{ $stats->data } )
{
        printf "%d) EPrint %d with %d downloads\n", $i++, $_->{eprintid}, $_->{count};
}

Embedding data

This is similar to retrieving data from scripts (cf. section above) but with a few extra options:

Each View plug-in also has a number of options of its own. See the examples below.

Graphs

The typical example is to embed the global downloads graph. This is usually the first displayed item on the IRStats2 main report page (/cgi/stats/report).

This inserts the downloads graph into the "mygraph" div element. Note that it uses the supplied "irstats2_googlegraph" CSS class.

Graph options:
- graph_type: either "column" or "area"
- show_average: either 1 or 0 - whether to display the average on the graph
- date_resolution: either "year", "month" or "day" - groups data by year, month or day (be careful: selecting "day" may generate LOTS of data points)


<div id="mygraph" class="irstats2_googlegraph"/>

<script type="text/javascript">
document.observe("dom:loaded",function(){
         new EPJS_Stats_GoogleGraph( { 
                'context': { 'datatype': 'downloads' }, 
                'options': { 'graph_type': 'column', 'container_id': 'mygraph', 'view': 'Google::Graph', 'show_average': '1', 'date_resolution': 'month' } 
        });
});
</script>
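As a variation, the sketch below builds the argument object for a graph restricted to one set and one range. It assumes that the embedding context accepts the same set_name, set_value and range keys as the script context described earlier, which is worth checking against your installed version. The helper monthlyDownloadsGraph is hypothetical and only assembles a plain object.

```javascript
// Hypothetical helper (not part of IRStats2): build the argument object for
// EPJS_Stats_GoogleGraph from a container id and extra context keys.
function monthlyDownloadsGraph(containerId, context) {
    var ctx = { 'datatype': 'downloads' };
    // copy caller-supplied context keys (assumed: set_name, set_value, range)
    Object.keys(context || {}).forEach(function (key) {
        ctx[key] = context[key];
    });
    return {
        'context': ctx,
        'options': {
            'container_id': containerId,
            'view': 'Google::Graph',
            'graph_type': 'column',
            'show_average': '1',
            'date_resolution': 'month'
        }
    };
}

// Downloads for the "uos-ecs" division over the last 6 months
var config = monthlyDownloadsGraph('mygraph', {
    'set_name': 'divisions',
    'set_value': 'uos-ecs',
    'range': '6m'
});
```

The resulting object could then be passed as new EPJS_Stats_GoogleGraph( config ) inside the "dom:loaded" handler, in the same way as the literal object in the example above.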

Tables

The example below displays the top 5 downloaded eprints in the repository.

This inserts the top table into the "mytable" div element. Note that it uses the supplied "irstats2_table" CSS class.

Table options:
- top: the top "thing" to display - similar to the "grouping" parameter when using scripts
- limit: the maximum number of items to retrieve
- show_count: 1 or 0 - display the counts or not
- show_order: 1 or 0 - display the ordering (1,2,3...) or not
- show_more: 1 or 0 - show the "show more" options or not (to retrieve more results)
- human_display: 1 or 0 - separate thousands with a comma (as done in English): 10000 becomes 10,000

<div id="mytable" class="irstats2_table"/>

<script type="text/javascript">
document.observe( "dom:loaded", function() {

        new EPJS_Stats_Table( {
                'context': { 'datatype': 'downloads' },
                'options': { 'container_id': 'mytable', 'top': 'eprint', 'view': 'Table', 'limit': '5' }   
        } );

});
</script>

Misc

Graphs and Tables are the most common displays, but there are a few others which are left for you to explore. The JavaScript classes are in 90_irstats2.js and the associated Perl classes are in Stats/View/

Views prefixed with "Google" are rendered by the Google Chart JavaScript library. Important note: no data is sent to Google!! The data is, instead, drawn by the browser client using SVG.