Introduction
Cytoscape is an integrated network visualization tool and analysis platform 1, 2. Within its common workflows, identifier mapping remains a challenge when working with biological data from different sources. This problem has been addressed by the BridgeDb project 3, which created clients and services to translate between various identifiers. The original BridgeDb app 4 for Cytoscape was written to provide an exhaustive set of functions to match the full capabilities of BridgeDb. Though this provided the needed functionality, its basic usage was unnecessarily complex. The idmapper app is a simpler alternative, providing a subset of critical features through a simplified interface bundled into Cytoscape. Now, without any installation or configuration, Cytoscape users can right-click on a table header to map that column’s data to a different namespace (Figure 1). Although the breadth of coverage is smaller than that of the full-featured BridgeDb app, it still covers over a dozen identifier data sources, including Ensembl, Entrez Gene, HGNC, KEGG, Uniprot and various species-specific sources. Because idmapper supports Cytoscape’s new CyREST interface, identifier mapping can be included in scripted workflows and driven from R or Python programs.
Figure 1.
Simplified dialog for ID Mapping.
Four options are presented to the user when accessing idmapper from within the Cytoscape GUI, each with common default or inferred values to reduce the number of steps required of the user.
Implementation
Inferring the data source
From within Cytoscape, a user initiates an ID mapping operation by right-clicking on the header of a column containing identifiers in the Table Panel. In the most common cases, idmapper can guess the type of identifier from its format. Table 1 shows the supported data sources and example identifier formats. The app looks at the first ten entries and chooses the data source whose corresponding regular expression matches them. The number of identifiers sampled is set by a static variable called N_Iterations. The algorithm for inferring the data source is implemented in IdGuess.java.
Table 1.
Supported Data Sources.
Currently supported identifier databases, their BridgeDb system codes, their species specificity and an example identifier.
Data Source | Code | Species | Example |
---|---|---|---|
Ensembl | En | Any | ENSG00000139618 |
Entrez Gene | L | Any | 11234 |
FlyBase | F | Drosophila | FBgn0011293 |
HGNC | H | Homo sapiens | DAPK1 |
KEGG Genes | Kg | Any | syn:ssr3451 |
MGI | M | Mus musculus | MGI:2442292 |
miRBase | Mbm | Any | MIMAT0000001 |
RGD | R | Rattus norvegicus | 2018660 |
SGD | D | Saccharomyces | S000028457 |
TAIR | A | Arabidopsis | AT1G01030 |
UniGene | U | Any | Hs.553708 |
Uniprot | S | Any | P62158 |
WormBase | W | Caenorhabditis | WBGene00000001 |
ZFIN | Z | Danio rerio | ZDB-GENE-041118-11 |
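The sampling approach described above can be sketched in Python. This is only an illustration: the regular expressions below are assumptions derived from the example identifiers in Table 1, the majority-vote rule and the function name guess_source are hypothetical, and the actual expressions and selection logic in IdGuess.java may differ.

```python
import re
from collections import Counter

# Illustrative patterns only, written to fit the Table 1 examples;
# they are not copied from IdGuess.java.
PATTERNS = {
    "Ensembl": re.compile(r"^ENS[A-Z]*G\d{11}$"),
    "FlyBase": re.compile(r"^FBgn\d{7}$"),
    "MGI": re.compile(r"^MGI:\d+$"),
    "TAIR": re.compile(r"^AT[1-5CM]G\d{5}$"),
    "WormBase": re.compile(r"^WBGene\d{8}$"),
    "Entrez Gene": re.compile(r"^\d+$"),
}

N_ITERATIONS = 10  # number of leading entries sampled, as described in the text


def guess_source(identifiers, n=N_ITERATIONS):
    """Return the source matching most of the first n identifiers, or None."""
    votes = Counter()
    for ident in identifiers[:n]:
        for source, pattern in PATTERNS.items():
            if pattern.match(ident.strip()):
                votes[source] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(guess_source(["ENSG00000139618", "ENSG00000012048"]))
print(guess_source(["YDL194W"]))
```

Format-based guessing has inherent limits: purely numeric identifiers such as the Entrez Gene and RGD examples in Table 1 cannot be told apart by pattern alone, which is one reason the inferred source can be overridden by the user.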
Cytoscape tasks
There are two different tasks supported by the idmapper app. ColumnMappingTask is activated by the right-click mouse event on a table header. It infers the current table and column from the information that comes from the mouse event. In order to support automation, we added MapColumnCommandTask as an analog that is exposed specifically for Commands and CyREST access. These tasks eventually result in the same algorithms being invoked.
Use cases
Cytoscape graphical user interface (GUI)
The idmapper app provides the same basic functionality as the BridgeDb app with less fuss. Users do not have to install it, launch it, make configuration decisions or think about which database they are accessing. The app comes bundled with every Cytoscape release. As such, its usage via the interactive GUI is documented in the Cytoscape manual, http://manual.cytoscape.org/en/stable/Node_and_Edge_Column_Data.html#mapping-identifiers.
To map an identifier from one source to another, right-click on the column header of your identifier. Select the option to Map Column to bring up the idmapper dialog (Figure 1).
The idmapper dialog presents a few choices the user can override before performing ID mapping. The default Species is determined by the previous selection made per network, providing a "smart and sticky" behavior. The available choices for the identifier data sources are determined by the species. The Map from data source is automatically selected based on an inspection of the first ten identifiers found in the column the user clicked on. This can easily be overridden using the pull-down menu. The To data source must be selected by the user; Ensembl is presented by default. Finally, the Force single checkbox offers to simplify the results of ID mapping by ignoring one-to-many cases and keeping only the first result. If the option is off, a list of results will appear in the column. The setting can be changed by toggling the checkbox.
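The effect of the Force single option can be illustrated with a short Python sketch. The function name collapse_results and the identifiers in the example are made up for illustration; this shows the behavior as described above, not the app's own code.

```python
def collapse_results(mapped, force_single):
    """Shape mapping results the way the Force single option is described.

    mapped: dict of source identifier -> list of target identifiers.
    With force_single on, only the first target is kept (None if there is none);
    with it off, the full list of targets is kept.
    """
    out = {}
    for source_id, targets in mapped.items():
        if force_single:
            out[source_id] = targets[0] if targets else None
        else:
            out[source_id] = list(targets)
    return out


# Placeholder identifiers, not real database entries.
example = {"source_1": ["target_A", "target_B"], "source_2": ["target_C"]}
print(collapse_results(example, force_single=True))
print(collapse_results(example, force_single=False))
```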
Cytoscape command line interface
The command interface does not use the same tasks as the GUI. In the GUI use case, the app knows the current context of where the command was activated, i.e., the network, table and column. This information must be provided explicitly as parameters to the command interface to perform the same operation. Thus, in addition to species, mapFrom, mapTo and forceSingle, the command line operation of idmapper also requires networkName, table and columnName (see next section for more details).
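As a rough sketch of what such a command looks like, the Python helper below assembles a command string from the parameters named above. The helper name build_map_column_command is hypothetical, and the name="value" argument syntax is the general Cytoscape command convention assumed here; optional arguments left as None are simply omitted.

```python
def build_map_column_command(column_name, species, map_from, map_to,
                             network_name=None, table=None, force_single=None):
    """Assemble an 'idmapper map column' command string (illustrative helper)."""
    args = [
        ("columnName", column_name),
        ("forceSingle", force_single),
        ("mapFrom", map_from),
        ("mapTo", map_to),
        ("networkName", network_name),
        ("species", species),
        ("table", table),
    ]
    parts = ["idmapper map column"]
    for key, value in args:
        if value is None:
            continue  # leave optional arguments out entirely
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f'{key}="{value}"')
    return " ".join(parts)


print(build_map_column_command("name", "Yeast", "Ensembl", "Entrez Gene",
                               table="node", force_single=True))
```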
Cytoscape automation
In the scripting environment, idmapper provides all of its functionality in a single call ( Figure 2). This means that identifier mapping can be incorporated into Cytoscape automation workflows with a single additional command.
Figure 2.
Swagger documented function.
The functionality of idmapper is contained in a single function: map column.
The map column function takes the following parameters:
columnName (string): Specifies the column name where the source identifiers are located
forceSingle (string, optional): When multiple identifiers can be mapped from a single term, this forces a singular result
mapFrom (string): Specifies the data source describing the existing identifiers
mapTo (string): Specifies the data source identifiers to be returned as a result in a new column
networkName (string, optional): Which network is used in the mapping.
species (string): The common or Latin name of the species to which the identifiers apply, e.g., Human, Homo sapiens, Mouse, Mus musculus, Rat, Rattus norvegicus, Frog, Xenopus tropicalis, Zebra fish, Danio rerio, Fruit fly, Drosophila melanogaster, Mosquito, Anopheles gambiae, Arabidopsis, Arabidopsis thaliana, Yeast, Saccharomyces cerevisiae, E. coli, Escherichia coli, Tuberculosis, Mycobacterium tuberculosis, Worm, Caenorhabditis elegans
table (string, optional): Which table is used as the source of the identifiers, e.g., "node" for the default node table
With Cytoscape running, the map column function can be called from any scripting environment or programming language that supports REST calls. For R and Python scripts, there are dedicated packages that make this even easier. The RCy3 package wraps this command in an R function called mapTableColumn to conform to other table functions (https://www.bioconductor.org/packages/release/bioc/html/RCy3.html). The py2cytoscape library similarly provides this command as a Python function, cyclient.idmapper.map_column (https://github.com/cytoscape/py2cytoscape).
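For environments without a dedicated package, a plain REST call might look like the following Python sketch, which uses only the standard library. The base address http://localhost:1234/v1 is CyREST's usual default and the /commands/idmapper/map column path follows the general CyREST commands layout; both are assumptions here and should be checked against the Swagger documentation of the running Cytoscape instance. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CYREST_BASE = "http://localhost:1234/v1"  # assumed default CyREST address


def make_map_column_request(params, base_url=CYREST_BASE):
    """Build (but do not send) a POST request for the map column command."""
    url = base_url + "/commands/idmapper/" + urllib.parse.quote("map column")
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request; requires Cytoscape to be running with CyREST enabled."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_map_column_request({
    "columnName": "name",
    "species": "Human",
    "mapFrom": "Ensembl",
    "mapTo": "Entrez Gene",
    "table": "node",
})
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the parameters easy to inspect before anything is submitted to Cytoscape; calling send(request) performs the actual mapping.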
A sample script demonstrates how to map identifiers via RCy3, covering the most common use cases ( https://github.com/cytoscape/RCy3/blob/master/vignettes/Identifier-mapping.Rmd).
Case 1: Species-specific considerations
The Yeast Perturbation sample network provided with Cytoscape can be loaded from the Starter Panel and provides gene identifiers of the form “YDL194W”. These are actually Ensembl-supported identifiers for Yeast, distinct from the typical “ENSXXXG00000123456” form as presented in Table 1. This presents a special case that users will need to be aware of when selecting species and source database or mapFrom in the GUI. In terms of automation, you could generate a new column of Entrez Gene IDs in this network with these calls:
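A minimal Python sketch of such a call is given here, expressed as a CyREST GET URL. It assumes the yeast identifiers are stored in the node table's name column, that CyREST is listening at its usual default address, and that the command is reachable under /commands/idmapper/map column; the helper map_column_url is hypothetical.

```python
import urllib.parse


def map_column_url(params, base="http://localhost:1234/v1"):
    """Build a CyREST GET URL for the map column command (illustrative)."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    command = urllib.parse.quote("map column")
    return f"{base}/commands/idmapper/{command}?{query}"


yeast_params = {
    "columnName": "name",      # assumed column holding IDs such as YDL194W
    "species": "Yeast",
    "mapFrom": "Ensembl",      # yeast ORF names are treated as Ensembl IDs
    "mapTo": "Entrez Gene",
}

url = map_column_url(yeast_params)
print(url)
# With Cytoscape running, urllib.request.urlopen(url) would submit the command.
```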
Case 2: From proteins to genes
When working with protein interaction networks, for example those from the STRING database (see https://apps.cytoscape.org/apps/stringapp), you may want to translate to gene identifiers. The idmapper app supports this case as well, but one should be aware of the assumptions involved when making this translation. Since most genes encode multiple proteins, you may have many-to-one mappings in your results. For all human networks imported from STRING using the StringApp 5, the following commands will perform an ID mapping from Uniprot-TrEMBL (proteins) to Ensembl (genes):
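As a sketch in Python, the parameters for this mapping and a small check for many-to-one results could look as follows. The column name stringdb::canonical name is an assumption about where the StringApp stores UniProt accessions, the helper many_to_one is hypothetical, and the identifiers in the example are placeholders rather than real database entries.

```python
from collections import defaultdict

# Parameters for the map column command; submit them through Commands or CyREST.
string_params = {
    "columnName": "stringdb::canonical name",  # assumed UniProt accession column
    "species": "Human",
    "mapFrom": "Uniprot-TrEMBL",
    "mapTo": "Ensembl",
}


def many_to_one(protein_to_gene):
    """Return genes that more than one protein identifier was mapped to."""
    by_gene = defaultdict(list)
    for protein, gene in protein_to_gene.items():
        if gene is not None:  # skip proteins without a mapped gene
            by_gene[gene].append(protein)
    return {gene: sorted(proteins)
            for gene, proteins in by_gene.items() if len(proteins) > 1}


# Placeholder mapping result for illustration only.
result = {"prot_A1": "gene_A", "prot_A2": "gene_A", "prot_B1": "gene_B", "prot_C1": None}
print(many_to_one(result))
```

Checking for shared targets after mapping makes it explicit which nodes would collapse onto the same gene if the network were later merged or annotated at the gene level.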
Limitations
The idmapper app provides easy access to a critical subset of ID mapping functionality originally covered by the BridgeDb app. When users run into the limitations of idmapper, they still have the option of installing and using the full-featured BridgeDb app from https://apps.cytoscape.org/apps/bridgedb. Examples of limitations include the lack of support for additional species or data sources. The BridgeDb app includes more of both, as well as the means to access custom data sources.
Data and software availability
1. Software available from the Cytoscape App Store: https://apps.cytoscape.org/apps/idmapper
2. Latest source code: https://github.com/cytoscape/idmapper
3. Archived source code as at the time of publication: https://doi.org/10.5281/zenodo.1246814 6
4. License: Apache License, Version 2.0
Copyright: © 2018 Treister A and Pico AR. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Identifier Mapping, the association of terms across disparate taxonomies and databases, is a common hurdle in bioinformatics workflows. The idmapper app for Cytoscape simplifies identifier mapping for genes and proteins in the context of common biological networks. This app provides a unified interface to different identifier resources accessible through a right-click on the table's column header. It also provides an OSGi programming interface via Cytoscape Commands and CyREST that can be utilized for identifier mapping in scripts and other Cytoscape apps, and supports integrated Swagger documentation.