|
Overview
OAI is becoming widely accepted, and many archives are already
OAI-compliant or soon will be. A federated search service as
efficient as Google, providing a unified interface to all of
these libraries, would be useful to a wide audience. Google
does an incredible job of providing discovery services for the
'shallow' web to the general public; we envision a sustainable,
free discovery service of similar quality for students and
researchers, covering parts of the 'deep' web. The parts of the
deep web we refer to in this vision are digital libraries and
collections that expose their metadata using OAI-PMH (the Open
Archives Initiative Protocol for Metadata Harvesting). A
high-performance federated
search service that exploits the resources of a Grid will make
available a large amount of information that is distributed
among heterogeneous digital libraries. A user will be able to
retrieve a research paper, a preprint, a technical report, an
image of a great painting, or a performance of a musical piece
within a few seconds from thousands of libraries scattered all
over the world. As part of this project we propose to build a
testbed in which three grid nodes perform the high-latency tasks
of harvesting and indexing from three data providers. We will
also use the grid to transmit the resulting indices and metadata
to a small cluster of three search engine nodes, each of which
will search one or more of the indices it receives from the
harvesting nodes.
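The search side of the testbed can be pictured as a scatter-gather step: the query goes to every search node in parallel, and the partial rankings are merged into one result list. The following is only a minimal sketch of that idea. The names `SearchNode` and `federated_search` are hypothetical, the in-memory term-count index merely stands in for a Lucene index shipped from a harvesting node, and raw term counts stand in for real relevance scoring.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


class SearchNode:
    """Hypothetical search-cluster node holding the indices it has received."""

    def __init__(self, name):
        self.name = name
        # term -> Counter mapping record_id to term frequency
        self.postings = defaultdict(Counter)

    def load_index(self, records):
        """Index (record_id, text) pairs; a stand-in for receiving a built index."""
        for record_id, text in records:
            for term in text.lower().split():
                self.postings[term][record_id] += 1

    def search(self, query, limit=10):
        """Return up to `limit` (score, record_id) pairs, best first."""
        scores = Counter()
        for term in query.lower().split():
            scores.update(self.postings.get(term, {}))
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [(score, record_id) for record_id, score in ranked[:limit]]


def federated_search(nodes, query, limit=10):
    """Query every node in parallel, then merge the partial rankings."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(lambda node: node.search(query, limit), nodes))
    merged = [hit for partial in partials for hit in partial]
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return merged[:limit]


if __name__ == "__main__":
    # Three nodes, mirroring the three-node search cluster of the testbed.
    nodes = [SearchNode(f"search-{i}") for i in range(1, 4)]
    nodes[0].load_index([("a:1", "grid computing for digital libraries")])
    nodes[1].load_index([("b:1", "metadata harvesting"), ("b:2", "grid grid scheduling")])
    nodes[2].load_index([("c:1", "image of a great painting")])
    for score, record_id in federated_search(nodes, "grid"):
        print(score, record_id)
```

Each node returns only its own top hits, so the merge step handles at most `limit` entries per node regardless of index size. Scores computed against separate indices are comparable here only because they are plain term counts; a real deployment would need to decide how to reconcile scores across indices.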
We will develop the software tools to:
- Adapt existing OAI-PMH harvesting (Arc) and Lucene indexing
  software to the grid
- Deploy a cluster for parallel, high-performance search based
  on the Lucene engine
- Develop software support to move indices and metadata between
  low- and high-latency nodes
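To make the harvesting task concrete, here is a rough sketch of the OAI-PMH request loop that a harvesting node would run against one data provider. It is not Arc's code: the function names, the `fetch` callback, and the `example.org` identifiers are assumptions made for illustration. It relies only on standard OAI-PMH 2.0 behaviour: the `ListRecords` verb, the `oai_dc` metadata prefix, and the `resumptionToken` used to page through large result sets.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical response page, trimmed to the elements the sketch reads.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so follow-up
    requests carry only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    path = f"{OAI_NS}ListRecords/{OAI_NS}record/{OAI_NS}header/{OAI_NS}identifier"
    identifiers = [el.text for el in root.findall(path)]
    token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    # An absent or empty resumptionToken means the list is complete.
    has_token = token_el is not None and token_el.text and token_el.text.strip()
    return identifiers, (token_el.text.strip() if has_token else None)


def harvest(base_url, fetch):
    """Yield record identifiers page by page; `fetch(url)` returns XML text."""
    token = None
    while True:
        url = list_records_url(base_url, resumption_token=token)
        identifiers, token = parse_list_records(fetch(url))
        yield from identifiers
        if not token:
            break
```

Passing `fetch` in as a parameter keeps the loop independent of how a grid node actually performs the HTTP request, and makes it possible to exercise the paging logic against canned responses such as `SAMPLE_PAGE`.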
|