Uploading a Dataset
- To upload a dataset, you need to sign in to the Bio2vec application.
- Click the Account menu button at the top left and select the My Datasets option. It displays the list of datasets uploaded by the user.
- Click the Create Dataset button. It displays the Create Dataset form.
- Creating a dataset is a two-step process: filling in the fields that describe the dataset, then uploading the dataset file.
- The dataset description fields are listed in the first table below.
- Fill in the description fields and click the Create button.
- Once the dataset is created, the dataset upload form will be displayed.
- Fill in the upload fields listed in the second table below and click the Save button to upload the selected dataset file.
Field Name | Required | Description |
---|---|---|
Name | required | Name of the dataset |
Description | optional | Describe the dataset. What features are encoded? |
Measurement Technique | optional | What kind of method was used to generate embeddings? |
Original Dataset | optional | Link to the original dataset |
Original Description | optional | A short description of the original dataset |
Evaluated In | optional | An experiment in which the embeddings are evaluated (URL to OpenML) |
Creators | optional | URLs, ORCIDs, or free text, separated by commas |
Contributors | optional | URLs, ORCIDs, or free text, separated by commas |
Publisher | optional | URL |
Keywords | optional | Types of entities used in the dataset |
Citation | optional | A published reference for the dataset |
Field Name | Required | Description |
---|---|---|
Version | required | Version number in the format major.minor.patch |
Embeddings file | required | The embeddings file to upload, in *.tsv or *.tsv.gz format. The file layout is described in the Embeddings File Format section below. |
License | required | Select a license from the available license types |
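The Version and Embeddings file constraints above can be checked locally before submitting the form. The following is a minimal sketch, not part of Bio2vec itself: the function names are hypothetical, and it assumes that major.minor.patch means three dot-separated non-negative integers and that file extensions are matched case-sensitively.

```python
import re

# Assumption: "major.minor.patch" means three dot-separated non-negative integers.
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")

# Extensions named in the upload form description.
ACCEPTED_EXTENSIONS = (".tsv", ".tsv.gz")


def is_valid_version(version: str) -> bool:
    """Return True if `version` has the shape major.minor.patch, e.g. "1.0.0"."""
    return VERSION_RE.fullmatch(version) is not None


def has_accepted_extension(filename: str) -> bool:
    """Return True if `filename` ends with .tsv or .tsv.gz (case-sensitive)."""
    return filename.endswith(ACCEPTED_EXTENSIONS)
```

For example, `is_valid_version("1.0.0")` is true while `is_valid_version("1.0")` is not. The server may apply stricter or looser rules than this sketch.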
Embeddings File Format
Bio2vec defines a common format for uploading embeddings files. It accepts an embeddings file in either tsv (tab-separated values) or tsv.gz (compressed tsv) format. The columns and their example values are described below:
Field Name | Required | Description | Example |
---|---|---|---|
Entity ID | required | Identifier for the entity or object for which the vector is computed | http://purl.obolibrary.org/obo/HP_0410289 |
Label | optional | Display label for the entity. This field is optional, but including labels in the source file allows users to search for entities by label through the Bio2vec dataset browser | Hypoamylasemia |
Alternative IDs | optional | Alternative IDs of the entity or term, if any | obo:HP_0410289 (“,”-separated list) |
Synonyms | optional | Synonyms for the entity or term | Decreased circulating amylase level (“,”-separated list) |
Type | optional | Type of the entity or term. Adding type information is useful because Bio2vec uses it to display more meaningful visualizations of entities, and it allows users to search for similar entities by type through SPARQL queries | http://phenomebrowser.net/Phenotype |
Vector Elements | required | List of comma-separated computed vector coordinates or elements | 0.2888383,-0.38942948,-0.09588139,-0.1310513 (“,”-separated list) |
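As an illustration of the layout above, here is a minimal Python sketch that writes rows to a .tsv or .tsv.gz file. The function names are hypothetical. It assumes the columns appear in the order listed in the table, that optional fields are left empty when absent, and that no header row is written; none of these points are confirmed by the description above, so check them against an accepted file before relying on this.

```python
import csv
import gzip


def format_row(entity_id, vector, label="", alt_ids=(), synonyms=(), entity_type=""):
    """Build one row in the assumed column order:
    Entity ID, Label, Alternative IDs, Synonyms, Type, Vector Elements."""
    if not entity_id:
        raise ValueError("Entity ID is required")
    if not vector:
        raise ValueError("Vector Elements is required")
    return [
        entity_id,
        label,
        ",".join(alt_ids),    # ","-separated list
        ",".join(synonyms),   # ","-separated list
        entity_type,
        ",".join(repr(float(x)) for x in vector),
    ]


def write_embeddings(path, rows):
    """Write rows (dicts of format_row keyword arguments) as tab-separated
    values, gzip-compressed when the path ends with .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(format_row(**row))


if __name__ == "__main__":
    write_embeddings(
        "embeddings.tsv.gz",
        [
            {
                "entity_id": "http://purl.obolibrary.org/obo/HP_0410289",
                "label": "Hypoamylasemia",
                "alt_ids": ["obo:HP_0410289"],
                "synonyms": ["Decreased circulating amylase level"],
                "entity_type": "http://phenomebrowser.net/Phenotype",
                "vector": [0.2888383, -0.38942948, -0.09588139, -0.1310513],
            }
        ],
    )
```

Because list-valued fields are joined with commas, an alternative ID or synonym that itself contains a comma would be ambiguous in this layout; the sketch does not attempt to escape such values.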