GenomeSpace is data-centric middleware that lets researchers and general users access their omics data from the platform through a web-based file system interface and analyse it with a range of genomics- and proteomics-focussed tools, including (but not limited to) the microbial GVL. GenomeSpace can be considered a true data-centric ‘virtual laboratory’ that will allow general users to store and access their data and data analyses on cloud resources, linking seamlessly to the microbial GVL analysis environment as well as to other tools.
It was developed at the Broad Institute of MIT and Harvard and is used extensively around the world for research and in industry. One of our contributions from the RDS Omics project has been to extend GenomeSpace so that RDS Swift storage can be used, a capability that has been requested for GenomeSpace for some time.
In some ways GenomeSpace can be considered analogous to Google Drive, in that it provides a web-accessible file system view of cloud-based data, to which a variety of tools can be applied. Files can be uploaded to GenomeSpace or imported from a number of sources. The DMP is one of those sources.
GenomeSpace has several defining characteristics:
- Highly flexible. Different data sources can be connected; files can be uploaded through a simple web interface and manipulated via simple file system functions.
- Highly accessible. GenomeSpace is web-based middleware that exposes a user’s research data in the context of a palette of analysis and visualisation tools.
- User-centric. Users manage their own data and construct their own virtual analysis environment.
The screenshot below depicts a typical GenomeSpace interface, in which the user’s data is visible through a simple file system interface and a suite of tools (including the GVL) appears as options across the top. Users can send data to and from tools, and upload and download data to and from their data portal. GenomeSpace does not itself store data; rather, it exposes cloud data stores through a web interface and brokers direct data transfers between those stores and analysis tools or platforms.
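The brokering role described above can be illustrated with a minimal Python sketch. It is not GenomeSpace's actual API: the class names (`SwiftStore`, `WebTool`, `Broker`), the method names, and the example URLs are all hypothetical. The sketch assumes only that a data store can hand out a direct link to a file and that a tool can be launched with such a link, so the broker itself never holds file contents.

```python
from dataclasses import dataclass
from typing import Dict, Protocol
from urllib.parse import urlencode


class DataStore(Protocol):
    """Anything that can hand out a direct link to a file it holds."""

    def direct_url(self, path: str) -> str: ...


class AnalysisTool(Protocol):
    """Anything that can be asked to fetch a file from a link."""

    def launch_url(self, file_url: str) -> str: ...


@dataclass
class SwiftStore:
    """Hypothetical stand-in for an object-store container."""

    base_url: str

    def direct_url(self, path: str) -> str:
        return f"{self.base_url}/{path.lstrip('/')}"


@dataclass
class WebTool:
    """Hypothetical stand-in for a web-launchable analysis tool."""

    endpoint: str

    def launch_url(self, file_url: str) -> str:
        # The tool receives a link to the data, not the data itself.
        return f"{self.endpoint}?{urlencode({'file': file_url})}"


class Broker:
    """Keeps registries of stores and tools; holds no file contents."""

    def __init__(self) -> None:
        self.stores: Dict[str, DataStore] = {}
        self.tools: Dict[str, AnalysisTool] = {}

    def mount(self, name: str, store: DataStore) -> None:
        self.stores[name] = store

    def register_tool(self, name: str, tool: AnalysisTool) -> None:
        self.tools[name] = tool

    def send_to_tool(self, store_name: str, path: str, tool_name: str) -> str:
        """Return a launch link so the tool fetches directly from the store."""
        file_url = self.stores[store_name].direct_url(path)
        return self.tools[tool_name].launch_url(file_url)


broker = Broker()
broker.mount("swift", SwiftStore("https://swift.example.org/v1/acct/omics"))
broker.register_tool("gvl", WebTool("https://gvl.example.org/import"))
print(broker.send_to_tool("swift", "reads/sample1.fastq", "gvl"))
```

Because stores and tools are registered independently, adding a new store or a new tool in this sketch requires no change to the other side.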
On the design of the platform
The major components of the RDS Omics platform are loosely coupled by design. This is a direct result of the driver to provide a general platform for Omics analysis, of which the SEPSIS reference data is one use case.
The SEPSIS reference data, made available through the DMP, should be considered only one of a number of potential data sources for analysis, and the microbial GVL only one of a number of potential analysis tools. GenomeSpace middleware fits this paradigm precisely.
As and when other omics data sets are ingested, a data model and associated metadata will need to be defined for each data set. Once the metadata and data are in place, all the existing tools and workflows within the microbial GVL can be used.
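As a rough illustration of defining a data model for a newly ingested data set, the Python sketch below checks records against a list of required metadata fields before the data set is treated as ready for analysis. The field names, the `DatasetModel` class, and both helper functions are hypothetical; they do not reflect the DMP's actual schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class DatasetModel:
    """Hypothetical data model: a name plus the metadata fields it requires."""

    name: str
    required_fields: FrozenSet[str]


def missing_fields(model: DatasetModel, record: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or empty in one metadata record."""
    return sorted(
        f for f in model.required_fields if record.get(f) in (None, "")
    )


def ready_for_analysis(model: DatasetModel, records: List[Dict[str, Any]]) -> bool:
    """True only if there is at least one record and every record is complete."""
    return bool(records) and all(not missing_fields(model, r) for r in records)


# Illustrative field names only (assumed, not taken from the platform).
model = DatasetModel(
    name="example-omics-dataset",
    required_fields=frozenset({"sample_id", "organism", "file_path"}),
)
incomplete = {"sample_id": "s1", "organism": "Example organism"}
complete = dict(incomplete, file_path="reads/s1.fastq")

print(missing_fields(model, incomplete))
print(ready_for_analysis(model, [complete]))
```

Keeping the model as plain data means a new data set needs only a new `DatasetModel` definition; the checking code stays unchanged.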