Strategies for Updating Citation Data [Updated]

To give you a better idea of how we try to keep citation data up to date, I describe here the types of queries that the LIVE SHINE plugin sends to GS on the user’s behalf to update the citation data. Please send questions and suggestions.

The plugin sends two types of queries: Venue Queries (V-Queries) and Paper Queries (P-Queries).

V-Queries are queries sent to GS using a *conference name* and a *year interval* as arguments. The conference name is pre-stored in SHINE’s DB and corresponds to the name used in GS metadata records for the conference. In general, it is the same name we use as the conference title in SHINE’s DB, but this is not always the case; in fact, we can change the conference name used to issue V-queries. The year interval is the one selected by the user through the interface.

When the plugin receives the result of a V-query, it extracts the snippets from the answer pages and matches the title of each paper against the titles of the papers we have in the database for the same conference. It then updates the citation count of each matched paper in SHINE’s DB.

An important issue here is that GS returns only the 1000 top-ranked papers, divided into answer pages with 20 snippets each. Thus, the plugin browses 50 answer pages, making 50 requests to GS. The limit of 1000 answers applies to each query sent to GS and is therefore independent of the year interval: it is the same for 1 year and for 10 years! Also, the ranking tends to favor the most cited papers. As a consequence, for conferences with many papers it is hard to get the complete set of papers updated (but they will be, eventually; see P-queries below).

Notice that a V-query with a shorter year interval can potentially update a larger share of the papers in that interval. For instance, in our database, ICSE has 1466 papers from 2005 to 2015. A single V-query covering this whole interval will update at most 1000 of the 1466 papers. V-queries with shorter intervals, say 3 years, can therefore update more of the papers in the interval. For instance, all 65 papers in the DB for ICSE from 2008 to 2010 are likely to be covered in this example. In the extreme case, one-year V-queries, one for each year from 2005 to 2015, can be used.

Notice that V-queries are only issued if more than 10% of the papers covered by the user query have not been updated for at least 2 months.
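The arithmetic behind V-queries can be sketched in a few lines of Python. This is only an illustration, not the plugin's actual code: the function names, the inclusive treatment of year ranges, the reading of "2 months" as 60 days, and the choice to count never-updated papers as stale are all assumptions made for the example.

```python
from datetime import date

GS_MAX_RESULTS = 1000  # per-query cap described above
GS_PAGE_SIZE = 20      # snippets per answer page


def pages_per_query(max_results=GS_MAX_RESULTS, page_size=GS_PAGE_SIZE):
    """Number of answer pages (one request each) needed to exhaust one V-query."""
    return -(-max_results // page_size)  # ceiling division


def split_interval(first_year, last_year, span):
    """Split an inclusive year range into consecutive sub-ranges of at most `span` years.

    Hypothetical helper: each returned (start, end) pair would be one V-query.
    """
    if span < 1:
        raise ValueError("span must be at least 1 year")
    return [
        (start, min(start + span - 1, last_year))
        for start in range(first_year, last_year + 1, span)
    ]


def should_issue_v_query(last_updated, today, stale_days=60, threshold=0.10):
    """Return True if more than `threshold` of the papers are stale.

    `last_updated` holds one entry per paper covered by the user query:
    a date, or None if the paper was never updated (assumed stale here).
    `stale_days=60` is an assumed reading of "2 months".
    """
    if not last_updated:
        return False
    stale = sum(
        1 for d in last_updated
        if d is None or (today - d).days >= stale_days
    )
    return stale / len(last_updated) > threshold


if __name__ == "__main__":
    print(pages_per_query())               # 50
    print(split_interval(2005, 2015, 3))   # [(2005, 2007), (2008, 2010), (2011, 2013), (2014, 2015)]
```

Each sub-range gets its own 1000-result budget, which is why splitting a long interval can reach papers that a single query over the whole interval would leave out.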

P-queries are queries sent to GS using a single paper title. Currently, the plugin issues P-queries for the subset of the papers shown in the interface that is selected by the filters the user has marked.
