OAI-PMH Feed Agglomeration
Description of problem
The National Library of Australia (NLA) can only harvest a single OAI-PMH feed from us. They can accept this as a feed or as a single file for upload using Ceres. The NLA process is described in the attached document.
We have multiple repositories that we wish to expose, and each repository creates its own OAI-PMH feed. To provide a single harvestable feed, we need to produce one agglomerated feed for the NLA's post-processing services.
Approaches
- Use a feed concatenation tool such as Moai
- Develop an in-house script to combine feeds. Such a script must be flexible and able to provide multiple feeds from multiple inputs. Since the work is essentially XML feed manipulation and concatenation, it should be relatively straightforward.
Solution
Develop a Java servlet that performs a periodic HTTP GET on each collection's OAI-PMH feed, then does a simple text concatenation and re-exports the combined feed.
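For illustration only, the sketch below shows the fetch-and-concatenate step in plain Java. It assumes each source repository answers a standard verb=ListRecords request and that the record blocks can be spliced by simple string handling; the class and method names are hypothetical and are not the actual FeedAggregatorServlet code.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Instant;
    import java.util.List;

    /** Illustrative sketch: fetch several OAI-PMH ListRecords responses and
     *  splice their <record> elements into one combined response. */
    public class FeedConcatenator {

        private static final HttpClient CLIENT = HttpClient.newHttpClient();

        /** Fetch one source feed as raw text. */
        static String fetch(String oaiBaseUrl) throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(oaiBaseUrl + "?verb=ListRecords&metadataPrefix=oai_dc"))
                    .GET()
                    .build();
            return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }

        /** Extract everything between <ListRecords> and </ListRecords>. */
        static String recordsOf(String oaiResponse) {
            int start = oaiResponse.indexOf("<ListRecords>");
            int end = oaiResponse.indexOf("</ListRecords>");
            if (start < 0 || end < 0) {
                return "";                      // no records, or an OAI-PMH error response
            }
            return oaiResponse.substring(start + "<ListRecords>".length(), end);
        }

        /** Concatenate the record blocks from every source feed into one envelope. */
        static String aggregate(List<String> sourceFeedUrls) throws Exception {
            StringBuilder out = new StringBuilder();
            out.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
               .append("<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\">\n")
               .append("<responseDate>").append(Instant.now()).append("</responseDate>\n")
               .append("<ListRecords>\n");
            for (String url : sourceFeedUrls) {
                out.append(recordsOf(fetch(url)));
            }
            out.append("</ListRecords>\n</OAI-PMH>\n");
            return out.toString();
        }
    }

A deployed servlet would additionally schedule the GETs periodically, cache the aggregated result rather than fetching on every request, and handle OAI-PMH error responses; the sketch omits those concerns.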
Advantages
- Relatively simple and quick to implement
- Original OAI-PMH feeds remain available for separate harvest
- No requirement for a dedicated VM for the agglomeration service
Disadvantages
- Potential lack of flexibility if we wish to do multiple agglomerations providing separate feeds; specifying a suitable URL allows a custom agglomeration
- Need to run multiple servlets to generate multiple feeds; custom URL specification bypasses this, though separate servlets may still be appropriate for performance reasons
- The out-of-the-box servlet did not support resumption tokens; it needs to be modified to buffer results and respond to a resumptionToken (see the sketch after this list)
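One possible shape for the resumption-token modification is sketched below: the aggregated records are buffered in memory and handed out in fixed-size pages, with a token that simply marks where the next page starts. The class name, page size and token scheme are assumptions for illustration, not the servlet's actual implementation.

    import java.util.List;
    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    /** Illustrative sketch of resumption-token support: aggregated records are
     *  buffered in memory and returned in fixed-size pages; a token encodes
     *  where the next page starts. Names and sizes are hypothetical. */
    public class ResumptionBuffer {

        private static final int PAGE_SIZE = 100;

        // token -> offset of the next record to return
        private final Map<String, Integer> pendingTokens = new ConcurrentHashMap<>();
        private final List<String> bufferedRecords;   // already-aggregated <record> blocks

        public ResumptionBuffer(List<String> bufferedRecords) {
            this.bufferedRecords = bufferedRecords;
        }

        /** Build the ListRecords body starting at the offset identified by the
         *  token (null or unknown token means "start from the beginning"). */
        public String page(String resumptionToken) {
            int from = resumptionToken == null
                    ? 0
                    : pendingTokens.getOrDefault(resumptionToken, 0);
            int to = Math.min(from + PAGE_SIZE, bufferedRecords.size());

            StringBuilder body = new StringBuilder("<ListRecords>\n");
            for (String record : bufferedRecords.subList(from, to)) {
                body.append(record).append('\n');
            }
            if (to < bufferedRecords.size()) {
                // More records remain: mint a fresh token pointing at the next page.
                String next = UUID.randomUUID().toString();
                pendingTokens.put(next, to);
                body.append("<resumptionToken completeListSize=\"")
                    .append(bufferedRecords.size())
                    .append("\" cursor=\"").append(from).append("\">")
                    .append(next).append("</resumptionToken>\n");
            } else if (resumptionToken != null) {
                // Final page of a multi-page sequence: OAI-PMH expects an empty token element.
                body.append("<resumptionToken/>\n");
            }
            if (resumptionToken != null) {
                pendingTokens.remove(resumptionToken);   // tokens are single-use here
            }
            return body.append("</ListRecords>").toString();
        }
    }

Tokens here are single-use and held in memory, which is adequate for a periodically re-aggregated buffer but would need persistence if the servlet were restarted mid-harvest.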
Technical
- The servlet is currently located at http://dspace.anu.edu.au/FeedAggregatorServlet/
- Input feeds are arbitrarily numbered 1, 2, 3, and so on
- To display the first three feeds, use http://dspace.anu.edu.au/FeedAggregatorServlet/1/2/3.xml; to display only feeds 1 and 3, use http://dspace.anu.edu.au/FeedAggregatorServlet/1/3.xml, and so on. The XML produced is automatically validated. A minimal client example follows the list.
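As a quick way to check the endpoint, the hypothetical client below requests the agglomeration of feeds 1 and 3 and prints the HTTP status and the start of the returned XML; it is only an illustration and not part of the servlet.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    /** Hypothetical check: fetch the agglomeration of feeds 1 and 3 and print
     *  the status code and the first few hundred characters of the XML. */
    public class AggregatedFeedCheck {
        public static void main(String[] args) throws Exception {
            String url = "http://dspace.anu.edu.au/FeedAggregatorServlet/1/3.xml";
            HttpResponse<String> response = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body()
                    .substring(0, Math.min(500, response.body().length())));
        }
    }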