Archiving Blogs
Conceptually simple.
- listen to RSS feed for each blog being archived
- use workflow pipeline to:
  - convert text to PDF file
  - include source URL of blog post and other provenance information
  - auto-ingest PDF with provenance information into a repository
  - create one collection of PDFs per blog
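The feed-listening and provenance steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it uses only the Python standard library, assumes an RSS 2.0 feed, and the names `SAMPLE_FEED`, `parse_feed`, `provenance_record`, and `build_collection` are hypothetical. PDF conversion and repository ingest depend on whichever tools are chosen, so they appear only as a comment.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical sample standing in for a fetched RSS 2.0 document.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/blog</link>
    <item>
      <title>First post</title>
      <link>http://example.org/blog/first-post</link>
      <pubDate>Thu, 15 Jul 2010 06:14:00 GMT</pubDate>
      <description>Hello, archive.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(feed_xml):
    """Return (blog_title, list of post dicts) from an RSS 2.0 string."""
    channel = ET.fromstring(feed_xml).find("channel")
    blog_title = channel.findtext("title", default="")
    posts = []
    for item in channel.findall("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "text": item.findtext("description", default=""),
        })
    return blog_title, posts


def provenance_record(blog_title, post, retrieved_at=None):
    """Collect the provenance fields to store alongside the archived file."""
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    return {
        "blog": blog_title,
        "title": post["title"],
        "source_url": post["url"],
        "published": post["published"],
        "retrieved": retrieved_at.isoformat(),
    }


def build_collection(feed_xml):
    """Return one collection (blog title plus records) for a single blog."""
    blog_title, posts = parse_feed(feed_xml)
    # A real pipeline would convert post["text"] to PDF and ingest it here.
    return {"blog": blog_title,
            "records": [provenance_record(blog_title, p) for p in posts]}
```

Calling `build_collection(SAMPLE_FEED)` yields a single collection for "Example Blog" holding one record whose `source_url` points back at the original post.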
Lateral thought
- use EPUB as archival format
- encode RDF triples for authorship and provenance information (see http://ptsefton.com/2010/07/07/awe-presentation-for-open-repositories-2010.htm for examples)
page revision: 0, last edited: 15 Jul 2010 06:14