ANDS strategic requirements

Strategic Aims

  • enable access to research data hosted at ANU
        • identify research data sets
        • identify appropriate access mechanisms based on dataset characteristics
        • align with other ANDS initiatives (especially CSIRO)
        • separate from standard ANU repository solutions for
              • scholarly outputs (e.g. scholar's keep)
              • university archive
                    • Noel Butlin
              • E Press
              • ANU theses online
              • oddities hosted on campus
                    • Obituaries Australia
                    • any special collections
        • develop suitable infrastructure plan informed by
              • data management plan
              • metadata management plan

Data management

  • each entity (faculty, school, college) to develop a research data curation plan
        • data to be hosted on a platform appropriate to data use (disk, MAID, tape)
        • data to be retrievable on request
              • via institutional metadata repository
              • institutional repository refers to data by URL
        • plan to state the period for which data is to remain available
              • align with overseas best practice (NSF, JISC)
        • plan to state access restrictions
        • plan to describe data curation techniques to be used to maintain integrity of data set
        • plan to identify funding model for data curation and migration
  • ANU central repositories to be treated as if simply another entity data store
  • allow colleges to host own specialist repositories for subject specific collections
        • require enforcement of metadata standards
        • require federation with other repositories
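As an illustration of what a single dataset entry in a curation plan might record, the following is a minimal sketch only. The class name, field names and access levels are assumptions made for the example and are not taken from any ANDS or ANU specification.

```python
from dataclasses import dataclass
from datetime import date

STORAGE_TIERS = {"disk", "MAID", "tape"}


@dataclass
class CurationPlanEntry:
    """One dataset as described in an entity's curation plan (hypothetical fields)."""
    title: str
    owner_entity: str      # faculty, school or college responsible for the data
    data_url: str          # the repository holds this link, not the data itself
    storage_tier: str      # "disk", "MAID" or "tape"
    deposited: date
    retention_years: int   # period the plan commits to keeping the data available
    access: str = "open"   # assumed levels: "open", "restricted", "closed"

    def __post_init__(self):
        if self.storage_tier not in STORAGE_TIERS:
            raise ValueError(f"unknown storage tier: {self.storage_tier}")

    def available_until(self) -> date:
        target_year = self.deposited.year + self.retention_years
        try:
            return self.deposited.replace(year=target_year)
        except ValueError:
            # Deposited on 29 February and the target year is not a leap year.
            return self.deposited.replace(year=target_year, day=28)

    def is_retrievable(self, on: date, requester_cleared: bool = False) -> bool:
        # Outside the stated availability period nothing is promised.
        if on > self.available_until():
            return False
        if self.access == "open":
            return True
        return self.access == "restricted" and requester_cleared
```

An entry with `retention_years=5` deposited on 1 March 2010 would report 1 March 2015 from `available_until()`, and a "restricted" entry is only retrievable when the requester has been cleared.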

Metadata management

  • use of standards
  • metadata schema to be discipline appropriate
  • repository front end
      • allows search of content available according to metadata
      • allows harvesting of metadata (OAI, Google Scholar)
            • harvesting of data allows content reuse
      • repository to federate with other ANU repositories
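Metadata harvesting over OAI-PMH could look roughly like the sketch below. It builds a `ListRecords` request URL and parses Dublin Core (`oai_dc`) records from a response. The base URL, the sample response and the function names are invented for illustration, and the sketch ignores resumption tokens and error responses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract identifier, title and data links from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            # dc:identifier is assumed here to carry the URL of the data itself.
            "links": [e.text for e in rec.iterfind(".//dc:identifier", NS)],
        })
    return records


# Hypothetical response from a made-up repository, used instead of a network call.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:identifier>https://data.example.org/ds/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repo.example.org/oai"))
    print(parse_records(SAMPLE))
```

Because each harvested record carries a link rather than the data, a harvester of this kind only ever moves metadata between repositories.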

Architecture

  • Split model
        • data hosted separately from metadata
        • objects/datasets ingested on a one by one basis or by harvesting from existing repository
  • Software
        • build around Fedora
              • institutional standard
              • permits alignment with CSIRO
                    • allow common metadata schema for astronomical and environmental data
              • well supported
              • in use by NLA
              • flexible extensible metadata schema
        • repository as meta repository - only handles links to data sources
              • need to understand federation mechanisms more fully (OAI-PMH)
  • Hardware
        • standard server hardware
        • Red Hat Linux
        • reasonably high performance required
        • copy best practice (various national libraries)
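The split model, with the repository acting as a meta repository that holds only links, can be sketched as follows. This is a toy in-memory model under stated assumptions: the class and function names are hypothetical, it does not use the Fedora API, and the federation shown is a plain fan-out search rather than any particular protocol.

```python
class MetaRepository:
    """Holds metadata and links only; the datasets stay on their own hosts."""

    def __init__(self, name):
        self.name = name
        self._records = {}  # record id -> metadata dict including "data_url"

    def ingest(self, record_id, metadata, data_url):
        # One-by-one ingest: store the metadata plus a pointer to the data.
        self._records[record_id] = {**metadata, "data_url": data_url}

    def ingest_harvested(self, harvested):
        # Bulk ingest of (record_id, metadata, data_url) tuples taken from
        # an existing repository. Returns the number of records ingested.
        count = 0
        for record_id, metadata, data_url in harvested:
            self.ingest(record_id, metadata, data_url)
            count += 1
        return count

    def resolve(self, record_id):
        # The repository never serves data; it hands back where the data lives.
        return self._records[record_id]["data_url"]

    def search(self, term):
        term = term.lower()
        return [
            (self.name, record_id)
            for record_id, metadata in sorted(self._records.items())
            if term in metadata.get("title", "").lower()
        ]


def federated_search(repositories, term):
    """Query each repository in turn and concatenate the hits."""
    hits = []
    for repo in repositories:
        hits.extend(repo.search(term))
    return hits
```

A central repository and a college repository built this way can be searched together with `federated_search([central, college], "rainfall")`, while each dataset remains on its original host.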

Project plan

  • Initial
        • scoping, develop formal project plan with timeline and resources allocated
  • Discovery
        • identify datasets
        • identify appropriate metadata schema
        • identify any required crosswalks
  • Prototype
        • select appropriate subset to provide first pass feasibility study
              • include ANU centrally hosted and college/research centre hosted data
              • include data sources on campus network and elsewhere
        • build initial solution and refine offering
              • use as an exercise to build expertise and team
  • Production
        • expand range of data sources offered
        • transition from development to service
        • develop long term service strategy independent of the resourcing of individual data curation strategies
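A crosswalk of the kind to be identified during Discovery is essentially a field mapping between schemas. The sketch below maps a made-up discipline schema onto Dublin Core element names. The source field names are hypothetical; a real crosswalk would be agreed with the data owners.

```python
# Hypothetical discipline-specific field names mapped to Dublin Core elements.
CROSSWALK = {
    "dataset_name": "title",
    "principal_investigator": "creator",
    "collection_date": "date",
    "keywords": "subject",
    "location": "coverage",
}


def apply_crosswalk(record, crosswalk):
    """Translate a record into the target schema.

    Returns (mapped, unmapped): mapped holds lists of values per target
    element, unmapped holds source fields with no crosswalk entry so they
    can be reviewed rather than silently dropped.
    """
    mapped, unmapped = {}, {}
    for field_name, value in record.items():
        target = crosswalk.get(field_name)
        if target is None:
            unmapped[field_name] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped
```

Returning the unmapped fields separately makes gaps in the crosswalk visible early, which fits the prototype's purpose as a feasibility study.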

Issues

  • Does DoI wish to become a provider of archival data storage on a fee for service basis?
        • how does this align with our storage strategy?
        • what is the role of cloud based storage in this?
  • Identification of researchers past and present
  • Restrictions/access control to sensitive data
  • Do we wish to cache frequently accessed but slow responding datasets (effectively build a data mirror)?
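To make the caching question concrete, here is a minimal sketch of a time-limited mirror placed in front of a slow data source. The class name, the `fetch` callable and the expiry policy are assumptions for illustration only; a production mirror would also need storage limits and access control.

```python
import time


class DatasetMirror:
    """Serve a cached copy of a dataset until it is older than max_age_seconds."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable taking a URL, returning content
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._cache = {}             # url -> (fetched_at, content)

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]
        # Missing or stale: go back to the slow source and refresh the copy.
        content = self._fetch(url)
        self._cache[url] = (now, content)
        return content
```

Repeated calls to `get` within the expiry window reach the original source only once, which is the effect a data mirror is meant to provide.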