One of the major deliverables of the RDS Omics project is to build a data management platform. This platform will be utilised for all streams of data generated through the BPA Sepsis project. The primary activities of this data management platform are:
- Securely hold primary and processed (from analysis applications) data and metadata
- Manage (organize, protect, provide) data and metadata
- Provide a means to select data based on metadata and other characteristics of the content
- Inter-operate with the GVL to:
  - Provide a means to supply selected data to GVL applications
  - Provide a means to receive processed data from GVL applications and store and manage it
- Inter-operate with the BPA repository at the CCG
- Inter-operate with international repositories (for collection publishing and re-use)
- Provide a means to directly access data (independent of using the GVL for analysis)
During the discoverability/requirements gathering phase, which ran from December 2015 to April 2016, the project team identified requirements for a data management platform through discussions with BPA Sepsis project generators.
Three options, Mediaflux, MyTardis and GenomeSpace, were then scored against these requirements. Mediaflux was selected for the RDS Omics project for several reasons:
- Mediaflux, a core product of Arcitecta, is a metadata-focused, highly capable and secure data management platform. It can be used for curating, managing, protecting, and controlling all types of data, through all phases of the data life cycle.
- There has been substantial national investment in Mediaflux, with servers operating in many Australian research environments. In particular, the Research Data Storage Infrastructure (RDSI) project invested in unlimited license Mediaflux systems at eight different nodes across Australia, including the RDS Omics partners (VicNode, Intersect and QCIF). The VicNode Mediaflux service is operated by The University of Melbourne’s Research Platform Services, which provides centrally provisioned storage, data management, compute (cloud and HPC) and training.
- Through extant skills and experience at the nodes, we can leverage Mediaflux rapidly to create the specialised interfaces and capabilities.
- Mediaflux is highly extensible and inter-operable.
- Mediaflux scales up in all dimensions as required. In particular, it has capability around big data.
- As the project matures, distributed capability will become relevant, and Mediaflux has sophisticated federative (distributed) capability.
The data management platform is currently being developed by teams at VicNode, Intersect and QCIF. If you have any questions please email us at firstname.lastname@example.org.