The National Library of New Zealand and British Library have announced the release of the Web Curator Tool as an open-source project. The tool, and its manuals, FAQs, mailing lists, source code, developer documentation, and other information, including a presentation, are available from http://webcurator.sourceforge.net/.
The Web Curator Tool manages the selective web harvesting process. It is designed for non-technical users in libraries and other collecting institutions who need to capture web material for archival purposes. The tool’s workflow encompasses the following tasks:
- Harvest authorisation: seeking and recording permission to harvest web material, and to make it accessible to the general public.
- Selection and scoping: determining what material should be harvested, be it a web site, a web page, a partial web site, a group (or collection) of web sites, or any combination of these.
- Scheduling: determining when a harvest should occur, and when it should be repeated.
- Description: describing harvests with basic Dublin Core metadata and other specialized fields (or by providing a reference to an external catalogue).
- Harvesting: the Web Curator Tool downloads the selected web material at the appointed time using the Internet Archive’s Heritrix web crawler. Each installation can have multiple harvesters on different machines, each of which can perform several harvests simultaneously.
- Quality Review: tools are provided for making sure the harvest worked as expected, and correcting simple harvest errors.
- Endorsing and submitting: if the harvest was a success, it is endorsed then submitted to an external digital archive.
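The workflow above can be pictured as a target moving through an ordered sequence of stages. The following Python sketch is purely illustrative: the stage names, class, and fields are assumptions for this example and do not reflect the Web Curator Tool's actual data model or API.

```python
from dataclasses import dataclass, field

# Hypothetical stage names mirroring the workflow described above;
# not the Web Curator Tool's real state machine.
STAGES = [
    "authorised",   # permission to harvest and publish recorded
    "scoped",       # target material and crawl boundaries chosen
    "scheduled",    # harvest date (and any repeat interval) set
    "harvested",    # crawler (e.g. Heritrix) has fetched the material
    "reviewed",     # quality review passed, simple errors corrected
    "endorsed",     # approved as a successful harvest
    "submitted",    # handed off to the external digital archive
]

@dataclass
class HarvestTarget:
    seed_url: str
    dublin_core: dict = field(default_factory=dict)  # title, creator, date, ...
    stage: str = "authorised"

    def advance(self) -> str:
        """Move the target to the next workflow stage, in order."""
        i = STAGES.index(self.stage)
        if i + 1 >= len(STAGES):
            raise ValueError("workflow already complete")
        self.stage = STAGES[i + 1]
        return self.stage

target = HarvestTarget("http://example.org/", {"title": "Example site"})
while target.stage != "submitted":
    target.advance()
print(target.stage)  # submitted
```

The point of the linear stage list is that a harvest cannot be endorsed or submitted before it has been authorised, scoped, harvested, and reviewed, which is the ordering the tool's workflow enforces.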
[See my post on web archiving and quality assessment tools]