Uploading a Dataset

  1. To upload a dataset, you need to sign in to the Bio2vec application.
  2. Click the Account menu button at the top left and select the My Datasets option. It displays the list of datasets uploaded by the user.
  3. Click the Create Dataset button. It displays the Create Dataset form.
  4. Creating a dataset is a two-step process: filling in the fields that describe the dataset, then uploading the dataset file.
  5. The dataset description fields are as follows:
    Field Name | Required | Description
    Name | required | Name of the dataset
    Description | optional | Describe the dataset. What features are encoded?
    Measurement Technique | optional | What kind of method was used to generate the embeddings?
    Original Dataset | optional | Link to the original dataset
    Original Description | optional | Short description of the original dataset
    Evaluated In | optional | Experiment in which the embeddings are evaluated (URL to OpenML)
    Creators | optional | URLs, ORCID iDs, or text, separated by commas
    Contributors | optional | URLs, ORCID iDs, or text, separated by commas
    Publisher | optional | URL
    Keywords | optional | Types of entities used in the dataset
    Citation | optional | A published reference for the dataset
  6. Fill in the description fields and click the Create button.
  7. Once the dataset is created, the dataset upload form is displayed.
  8. Fill in the following fields and click the Save button to upload the selected dataset file:
    Field Name | Required | Description
    Version | required | Version in the format major.minor.patch
    Embeddings file | required | The embeddings file to upload, in .tsv or .tsv.gz format. The file format is described in the Embeddings File Format section below.
    License | required | Select a license from the available license types
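
The Version and Embeddings file constraints in the upload form can be checked locally before uploading. The following is a minimal sketch and not part of Bio2vec: the function names are hypothetical, and it assumes that each version part is a plain non-negative integer (for example 1.0.0), which the form description does not state explicitly.

```python
import re

# Assumption: major, minor and patch are plain non-negative integers.
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")

# File extensions named in the upload form description.
ALLOWED_SUFFIXES = (".tsv", ".tsv.gz")


def is_valid_version(version: str) -> bool:
    """Return True if the string looks like major.minor.patch."""
    return VERSION_RE.fullmatch(version) is not None


def has_allowed_suffix(filename: str) -> bool:
    """Return True if the file name ends in .tsv or .tsv.gz."""
    return filename.lower().endswith(ALLOWED_SUFFIXES)
```

For example, is_valid_version("1.0.0") is accepted and is_valid_version("1.0") is rejected; the server may apply stricter or looser rules than this sketch.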

Embeddings File Format

Bio2vec defines a common format for embeddings files. It accepts an embeddings file in either tsv (tab-separated values) or tsv.gz (gzip-compressed tsv) format. The columns and their example values are described below:

Field Name | Required | Description | Example
Entity ID | required | Identifier of the entity or object for which the vector is computed | http://purl.obolibrary.org/obo/HP_0410289
Label | optional | Display label for the entity. Including labels in the source file lets users search for entities by label in the Bio2vec dataset browser | Hypoamylasemia
Alternative IDs | optional | Comma-separated list of alternate IDs of the entity or term, if any | obo:HP_0410289
Synonyms | optional | Comma-separated list of synonyms for the entity or term | Decreased circulating amylase level
Type | optional | Type of the entity or term. Bio2vec uses type information to display more meaningful visualizations of entities and to let users search for similar entities by type through SPARQL queries | http://phenomebrowser.net/Phenotype
Vector Elements | required | Comma-separated list of the computed vector coordinates (elements) | 0.2888383,-0.38942948,-0.09588139,-0.1310513
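
As an illustration of the column layout described in this section, the sketch below builds one row from the example values and shows how such rows could be written to, and read back from, a tsv.gz file. It is a hedged example rather than an official Bio2vec tool: the function names are hypothetical, and it assumes that columns appear in the order listed, that optional fields are left empty when unknown, and that the file has no header row (none of which is confirmed by the text).

```python
import csv
import gzip


def format_row(entity_id, vector, label="", alternative_ids=(),
               synonyms=(), entity_type=""):
    """Build one row; list-valued fields are joined with commas."""
    if not entity_id:
        raise ValueError("Entity ID is required")
    if not vector:
        raise ValueError("Vector Elements is required")
    return [
        entity_id,
        label,
        ",".join(alternative_ids),
        ",".join(synonyms),
        entity_type,
        ",".join(repr(float(x)) for x in vector),
    ]


def write_embeddings(path, rows):
    """Write rows to a gzip-compressed, tab-separated file (no header row)."""
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


def read_embeddings(path):
    """Yield (entity_id, vector) pairs from a file written by write_embeddings."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            yield row[0], [float(x) for x in row[5].split(",")]


# Example row built from the values shown in this section.
row = format_row(
    "http://purl.obolibrary.org/obo/HP_0410289",
    [0.2888383, -0.38942948, -0.09588139, -0.1310513],
    label="Hypoamylasemia",
    alternative_ids=["obo:HP_0410289"],
    synonyms=["Decreased circulating amylase level"],
    entity_type="http://phenomebrowser.net/Phenotype",
)
```

Calling write_embeddings("embeddings.tsv.gz", [row]) would then produce a compressed file with one tab-separated line. Because individual synonyms or alternative IDs are joined with commas, a value that itself contains a comma would be ambiguous in this layout.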