Make a table with ligands and receptors in R with OmnipathR

2020-11-24

Curated lists of genes help computational biologists focus their analyses on a subset of genes that might be important for a research question. For example, we might want to focus on the genes encoding the signals and receptors for cell-to-cell communication. OmnipathR is a new R package that provides access to a vast database of genes called OmniPath, organized and curated by the Saez-Rodriguez Lab. Let’s try to use OmnipathR to create a simple table with ligands and receptors.

Install OmnipathR

Let’s get the latest development version from GitHub. For me, that was 1.3.7 when I wrote this post.

devtools::install_github("saezlab/OmnipathR")

Make a table of ligands and receptors

Load the package.

library(OmnipathR)
library(dplyr)

OmniPath has a lot of annotations and functions for accessing them.

Let’s try the import_intercell_network() function.

icn <- OmnipathR::import_intercell_network()

The returned icn object is a data frame with 30,265 rows and 44 columns. It’s a lot of information!

Download the table

Download the entire table with 30,265 rows:

💾 omnipath-intercell-network.tsv (12.2 Mb)
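A file like this can be written from a data frame with base R's `write.table()`. The sketch below is only an illustration: `icn_toy` is a small hypothetical stand-in for `icn` so that the code runs without downloading anything, and the output path in a temporary directory is an assumption.

```r
# Hypothetical stand-in for `icn`, with a few of its columns
icn_toy <- data.frame(
  source_genesymbol = c("NLGN3", "AGRN"),
  target_genesymbol = c("NRXN1", "MUSK"),
  n_references = c(0L, 3L),
  stringsAsFactors = FALSE
)

# Write a tab-separated file without row names or quotes
out_file <- file.path(tempdir(), "omnipath-intercell-network.tsv")
write.table(icn_toy, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the round trip
icn_back <- read.delim(out_file, stringsAsFactors = FALSE)
```

With the real data, you would pass `icn` instead of `icn_toy`.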

Explore the first 100 records

Let’s have a look at the first 100 records:
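One simple way to pull out the first 100 records is `head()`. This is a minimal sketch on a hypothetical stand-in table (`icn_toy`, with made-up gene names); with the real data, `head(icn, 100)` would do the same thing.

```r
# Hypothetical stand-in for `icn`: 250 fake records
icn_toy <- data.frame(
  source_genesymbol = paste0("SRC", seq_len(250)),
  target_genesymbol = paste0("TGT", seq_len(250)),
  stringsAsFactors = FALSE
)

# Keep the first 100 rows; head() returns fewer if the table is shorter
first_100 <- head(icn_toy, 100)
nrow(first_100)
```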

There are 44 columns to explore:

sort(colnames(icn))
##  [1] "aspect_intercell_source"                       
##  [2] "aspect_intercell_target"                       
##  [3] "category_intercell_source"                     
##  [4] "category_intercell_target"                     
##  [5] "category_source_intercell_source"              
##  [6] "category_source_intercell_target"              
##  [7] "consensus_direction"                           
##  [8] "consensus_inhibition"                          
##  [9] "consensus_score_intercell_source"              
## [10] "consensus_score_intercell_target"              
## [11] "consensus_stimulation"                         
## [12] "curation_effort"                               
## [13] "database_intercell_source"                     
## [14] "database_intercell_target"                     
## [15] "dip_url"                                       
## [16] "entity_type_intercell_source"                  
## [17] "entity_type_intercell_target"                  
## [18] "genesymbol_intercell_source"                   
## [19] "genesymbol_intercell_target"                   
## [20] "is_directed"                                   
## [21] "is_inhibition"                                 
## [22] "is_stimulation"                                
## [23] "n_references"                                  
## [24] "n_resources"                                   
## [25] "parent_intercell_source"                       
## [26] "parent_intercell_target"                       
## [27] "plasma_membrane_peripheral_intercell_source"   
## [28] "plasma_membrane_peripheral_intercell_target"   
## [29] "plasma_membrane_transmembrane_intercell_source"
## [30] "plasma_membrane_transmembrane_intercell_target"
## [31] "receiver_intercell_source"                     
## [32] "receiver_intercell_target"                     
## [33] "references"                                    
## [34] "scope_intercell_source"                        
## [35] "scope_intercell_target"                        
## [36] "secreted_intercell_source"                     
## [37] "secreted_intercell_target"                     
## [38] "source"                                        
## [39] "source_genesymbol"                             
## [40] "sources"                                       
## [41] "target"                                        
## [42] "target_genesymbol"                             
## [43] "transmitter_intercell_source"                  
## [44] "transmitter_intercell_target"

Show how many references support each record

There are 13,126 records with 0 references, and 17,139 records with at least 1 reference.
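Counts like these can be computed directly from the `n_references` column. The sketch below uses a tiny hypothetical column (`icn_toy`) in place of the real table, so its numbers will not match the ones above.

```r
# Hypothetical stand-in for the n_references column of `icn`
icn_toy <- data.frame(n_references = c(0L, 0L, 1L, 3L, 10L))

# Count records with zero references and with at least one reference
n_zero <- sum(icn_toy$n_references == 0)
n_some <- sum(icn_toy$n_references >= 1)
c(zero = n_zero, at_least_one = n_some)
```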

This histogram summarizes the number of records (y-axis) with each number of supporting references (x-axis).

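library(ggplot2)

# Histogram of reference counts per record.
# The log10 y-axis keeps the rare records with many references visible.
ggplot(icn) +
  geom_histogram(aes(n_references)) +
  scale_y_continuous(trans = "log10")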

[Figure: histogram of n_references (x-axis) versus number of records (y-axis, log10 scale)]

I’m not sure what the ID numbers mean, but each entry in the references column looks like a resource name and an ID number joined by a colon, with entries separated by semicolons:

icn$references[which(icn$n_references > 2)[1:5]]
## [1] "Baccin2019:1006133;Baccin2019:98194281;CellPhoneDB:22353464;HPRD:11006133;LRdb:11;LRdb:9819428;NetPath:11006133;SIGNOR:16140393;SPIKE:11006133;SPIKE:17537801"                         
## [2] "Baccin2019:1006133;Baccin2019:98194281;CellPhoneDB:22353464;HPRD:11006133;LRdb:11;LRdb:9819428;NetPath:11006133;SIGNOR:16140393;SPIKE:11006133;SPIKE:17537801"                         
## [3] "Baccin2019:1006133;Baccin2019:98194281;CellPhoneDB:22353464;HPRD:11006133;LRdb:11;LRdb:9819428;NetPath:11006133;SIGNOR:16140393;SPIKE:11006133;SPIKE:17537801"                         
## [4] "Baccin2019:10958687;BioGRID:9244302;CellPhoneDB:22353464;HPRD:10958687;LRdb:10958687;NetPath:10958687;SIGNOR:16140393;SIGNOR:23111325"                                                 
## [5] "Baccin2019:11006133;Baccin2019:12482954;CancerCellMap:11006133;CellPhoneDB:22353464;HPRD:11006133;HPRD:12482954;LRdb:11;LRdb:12482954;NetPath:11006133;SIGNOR:11006133;SIGNOR:16140393"
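If the strings follow a `resource:ID` pattern separated by semicolons, which is an assumption based only on the output above, they can be split into a small table with base R. This sketch parses the fourth string shown above.

```r
# One reference string copied from the output above (record 4)
refs <- paste0(
  "Baccin2019:10958687;BioGRID:9244302;CellPhoneDB:22353464;",
  "HPRD:10958687;LRdb:10958687;NetPath:10958687;",
  "SIGNOR:16140393;SIGNOR:23111325"
)

# Split on ";" to get entries, then on ":" to separate name from ID
entries <- strsplit(refs, ";", fixed = TRUE)[[1]]
parts <- strsplit(entries, ":", fixed = TRUE)
ref_table <- data.frame(
  resource = vapply(parts, `[`, character(1), 1),
  id = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

# Number of distinct IDs across all resources
n_unique_ids <- length(unique(ref_table$id))
```

Several resources often point to the same ID, so the number of distinct IDs can be smaller than the number of entries.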

Filter the table

Let’s filter the table to source-target pairs where the consensus score for the source is greater than 4. Then we can take a subset of the columns, to simplify the table.

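# Keep pairs whose source has a consensus score above 4,
# then keep four columns and drop duplicate rows
omni <- icn %>%
  dplyr::filter(consensus_score_intercell_source > 4) %>%
  dplyr::select(
    target_genesymbol,
    source_genesymbol,
    is_stimulation,
    consensus_score_intercell_source
  ) %>%
  dplyr::distinct()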
head(omni)
## # A tibble: 6 x 4
##   target_genesymbol source_genesymbol is_stimulation consensus_score_intercell_…
##   <chr>             <chr>                      <int>                       <int>
## 1 NRXN1             NLGN3                          0                           5
## 2 NRXN2             NLGN3                          1                           5
## 3 NRXN3             NLGN3                          1                           5
## 4 NRXN1             NLGN3                          1                           5
## 5 MUSK              AGRN                           1                           5
## 6 LRP4              AGRN                           0                           5

Suppose we have a few genes of interest:

my_genes <- c(
  "CD274", "CXCL1", "CXCL13", "CXCR3", "CXCR5"
)

Are the genes in this table?

Yes, and it looks like CXCR3 and CXCR5 are labeled as “target” genes:

my_genes[my_genes %in% omni$target_genesymbol]
## [1] "CXCR3" "CXCR5"

CD274, CXCL1, and CXCL13 are labeled as “source” genes:

my_genes[my_genes %in% omni$source_genesymbol]
## [1] "CD274"  "CXCL1"  "CXCL13"
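A natural next step is to pull out the rows where a gene of interest appears as either source or target. Here is a minimal sketch of that idea; `omni_toy` is a hypothetical stand-in for `omni`, and its pairs are invented for illustration rather than taken from OmniPath.

```r
# Hypothetical stand-in for `omni` (pairs invented for illustration)
omni_toy <- data.frame(
  source_genesymbol = c("CXCL13", "CD274", "AGRN"),
  target_genesymbol = c("CXCR5", "PDCD1", "MUSK"),
  stringsAsFactors = FALSE
)
my_genes <- c("CD274", "CXCL1", "CXCL13", "CXCR3", "CXCR5")

# Keep rows where either partner is a gene of interest
hits <- omni_toy[
  omni_toy$source_genesymbol %in% my_genes |
    omni_toy$target_genesymbol %in% my_genes,
]
hits
```

With the real table, replacing `omni_toy` with `omni` should return every filtered pair that involves one of the genes of interest.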

Learn more

Please see the OmniPath website for more details: https://omnipathdb.org/

There is a lot to explore.

© 2025 Kamil Slowikowski