EP2DC Documentation Page
EP2DC allows, for the first time, datasets to be stored in a remote data store alongside the article they pertain to. EP2DC comprises several services that allow users to upload data alongside their EPrints article submission. It then allows users to find data via the MDC (Materials Data Centre -
www.materialsdatacentre.com) service. The project is designed around REST interfaces and is extensible to other data centres.
This is a federated system that requires service software at each end:
- Data Centre. This holds your datasets. In this case we have implemented a basic Materials Data Centre built on top of Windows SharePoint Services, a subset of Microsoft SharePoint 2007 that is built into Windows Server 2003 and 2008 (http://technet.microsoft.com/en-us/windowsserver/sharepoint/default.aspx). You can install your own instance on top of Windows Server by downloading the MDC software on this site.
- EPrints. The document repository used here is EPrints (www.eprints.org). We have developed a plugin for EPrints that allows you to submit datasets at the same time as submitting your article, and also allows you to view datasets related to articles in an EPrints repository.
MDC API layer
The MDC API layer provides a REST interface for uploading datasets to, and downloading them from, the data centre. The services are described at https://mdc-s1.soton.ac.uk:8080/mdc_bin/EP2DC/ServiceDocumentation.aspx and in the ServiceDocumentation.htm file attached here. Access to the test MDC server currently requires either a University of Southampton account or a guest account, which can be obtained by emailing Dr Kenji Takeda.
EPrints plugin
The EPrints plugin for EP2DC is described on the EPrints website at http://wiki.eprints.org/w/EP2DCOverview
Installing and using the MDC layer
To use this service, a data centre must be available. Here we have set up a materials data centre on top of Microsoft SharePoint; the full procedure is described below.
- Install Windows SharePoint Services. The EP2DC service requires the following SharePoint features:
- One document library
- One custom content type
For our example, we created a SharePoint site collection using the “Blank site” template (STS#1) which looks like this:

You may prefer to use a different template to get some of the other features in SharePoint such as announcements lists, calendars and contacts list. We don’t have any of those in this site. You can also customise the pages and styles to make it look less like SharePoint if required.
We then created two document libraries, mdcdata and mdcschemas. mdcdata stores the files passed by EPrints, and mdcschemas holds schemas. If the schemas are held on a publicly accessible site, the mdcschemas document library is redundant.
Here are the document libraries after creation:

The next requirement is to create a custom content type for materials tests. In later versions of the MDC, custom content types could be one of the techniques used in searches to find related data. We currently have only one custom content type, called MatDBTest, because our use case only involves XML data that matches the MatDB schema. The custom content type has been created at the site level, as shown below. It is based on the Document content type.

The custom content type needs to be enabled for use in the mdcdata document library, along with the built-in content type called “Link to a document”, which is used to link to other datasets and external pages. This is done through the document library settings. Note that you may first need to enable content types on the document library by clicking on Advanced Settings and enabling Allow management of content types.

When XML data is uploaded, it is validated against a schema. The schema to use is declared at the top of the XML file, as in the following example:
...<MatDB:MatDB xsi:schemaLocation="http://odin.jrc.ec.europa.eu https://odin.jrc.ec.europa.eu/MatDBXMLschema/matdb.xsd" xmlns:MatDB="http://odin.jrc.ec.europa.eu" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">...
As you can see, the MDC will try to validate against the schema stored at https://odin.jrc.ec.europa.eu/MatDBXMLschema/matdb.xsd, which is a schema for describing engineering materials test data, developed by domain experts and available in the public domain. If a schema is not available online, one can be stored in the data centre in the mdcschemas document library, and its URI can be referenced in the XML file.
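To illustrate how the schema reference can be read from an uploaded file, here is a minimal Python sketch that extracts the namespace/URI pairs from the root element's xsi:schemaLocation attribute. It is not the MDC's actual implementation, which is not shown here. The function name `schema_locations` is hypothetical, and the sample element is closed as an empty element purely so that it parses on its own. The sketch only locates the schema; it does not perform the validation itself.

```python
import xml.etree.ElementTree as ET

# Standard namespace for XML Schema instance attributes such as schemaLocation.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return {namespace: schema URI} from the root element's xsi:schemaLocation.

    Hypothetical helper for illustration; not part of the MDC API.
    """
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes in "{namespace}name" form.
    raw = root.get("{%s}schemaLocation" % XSI_NS, "")
    tokens = raw.split()
    # The attribute is a whitespace-separated list of namespace/URI pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


# Root element from the example above, closed as an empty element (assumption)
# so that the snippet is self-contained.
sample = (
    '<MatDB:MatDB xsi:schemaLocation="http://odin.jrc.ec.europa.eu '
    'https://odin.jrc.ec.europa.eu/MatDBXMLschema/matdb.xsd" '
    'xmlns:MatDB="http://odin.jrc.ec.europa.eu" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>'
)

print(schema_locations(sample))
```

A file that points at a schema held in the mdcschemas document library would yield that library URI instead of the odin.jrc.ec.europa.eu one.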
Here are a couple of schemas stored in the mdcschemas document library. Note the Upload button which is used to send a file to the document library:

The XML would then reference the schema as follows:
<MatDB:MatDB xsi:schemaLocation="http://odin.jrc.ec.europa.eu https://mdc-s1.soton.ac.uk/mdcschemas/matdb-v3.xsd" xmlns:MatDB="http://odin.jrc.ec.europa.eu" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
The EP2DC plug-in can call the MDC service to upload files. A container must be created for every publication. Because EPrints does not have a name for the container, it is created with a GUID to guarantee uniqueness. Here are some uploads created during testing. The folder names could be anything and do not need to be GUIDs, but they do need to be unique:
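Generating a GUID-based container name needs very little code. The Python sketch below shows one way to do it with the standard uuid module; the function name `new_container_name` is hypothetical, and the EP2DC plug-in's own code may generate its GUIDs differently.

```python
import uuid


def new_container_name():
    """Return a GUID string usable as a unique container (folder) name.

    Hypothetical helper for illustration; not taken from the EP2DC plug-in.
    """
    # uuid4() draws a random 128-bit identifier, so collisions between
    # publications are vanishingly unlikely.
    return str(uuid.uuid4())


name = new_container_name()
print(name)
```

Any other naming scheme would work equally well, provided no two publications can receive the same name.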

After the EP2DC plug-in has created a container, it can upload files to it. When an XML file is uploaded, the MDC recognises it and, provided validation is successful, stores it in the document library as a MatDBTest file (notice the Content Type).

The EP2DC plug-in can request information from the MDC via the Web service, such as:
- A URI for an uploaded item given an MDC ID
- Items that are related to a given item, which is currently determined by comparing content types
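The content-type comparison behind the second request can be sketched in a few lines of Python. This is an illustrative model only: the `Item` class, its field names, and the sample filenames are assumptions, and the real service works against the SharePoint document library rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for an entry in the mdcdata document library."""
    mdc_id: str        # assumed identifier field
    filename: str
    content_type: str  # e.g. "MatDBTest" or "Link to a document"


def related_items(target, library):
    """Return items sharing the target's content type, excluding the target."""
    return [
        item for item in library
        if item.content_type == target.content_type
        and item.mdc_id != target.mdc_id
    ]


# Invented sample data for illustration.
library = [
    Item("1", "tensile-test-a.xml", "MatDBTest"),
    Item("2", "tensile-test-b.xml", "MatDBTest"),
    Item("3", "external-page", "Link to a document"),
]

print([item.filename for item in related_items(library[0], library)])
```

Because matching relies on content type alone, every MatDBTest file counts as related to every other one; a finer-grained comparison would need additional metadata.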
Here is an example of how related data is displayed in the EPrints interface. Currently only the filenames of the items are displayed, but this could be any information available. This section is just HTML, so the list of related items could be replaced by graphics or any useful feedback that could be generated from the uploaded file.