Make a table with ligands and receptors in R with OmnipathR

Curated lists of genes help computational biologists focus their analyses on the subset of genes that might be important for a research question. For example, we might want to focus on the genes that encode the signals and receptors for cell-to-cell communication. OmnipathR is a new R package that provides access to OmniPath, a large database of molecular interactions and gene annotations organized and curated by the Saez-Rodriguez Lab. Let’s try to use OmnipathR to create a simple table with ligands and receptors.
Install OmnipathR
Let’s get the latest development version from GitHub. For me, that was 1.3.7 when I wrote this post.
# install.packages("devtools")  # run this first if devtools is not installed
devtools::install_github("saezlab/OmnipathR")
Make a table of ligands and receptors
Load the packages.
library(OmnipathR)
library(dplyr)
OmnipathR has many functions for accessing OmniPath’s interactions and annotations.
Let’s try the import_intercell_network() function.
icn <- OmnipathR::import_intercell_network()
The returned icn object is a data frame with 30,265 rows and 44 columns (at the time of writing). It’s a lot of information!
Download the table
Download the entire table with 30,265 rows:
💾 omnipath-intercell-network.tsv (12.2 MB)
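If you would rather save your own copy of a query, a data frame can be written to a TSV file with base R’s write.table(). Here is a minimal sketch, using a small hypothetical stand-in for icn (toy_icn, with made-up reference counts) so that it runs without downloading anything:

```r
# Toy stand-in for `icn` (hypothetical values, for illustration only).
# On the real data, pass `icn` instead of `toy_icn`.
toy_icn <- data.frame(
  source_genesymbol = c("NLGN3", "AGRN"),
  target_genesymbol = c("NRXN1", "MUSK"),
  n_references = c(3L, 1L),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header, no quotes, and no row names.
tsv_path <- file.path(tempdir(), "omnipath-intercell-network.tsv")
write.table(toy_icn, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to check that the round trip preserves the table.
reloaded <- read.delim(tsv_path, stringsAsFactors = FALSE)
reloaded
```

Setting quote = FALSE keeps the file easy to open in other tools, but it assumes that no field contains a tab or a newline.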
Explore the first 100 records
Let’s have a look at the first 100 records, for example with View(head(icn, 100)).
There are 44 columns to explore:
sort(colnames(icn))
## [1] "aspect_intercell_source"
## [2] "aspect_intercell_target"
## [3] "category_intercell_source"
## [4] "category_intercell_target"
## [5] "category_source_intercell_source"
## [6] "category_source_intercell_target"
## [7] "consensus_direction"
## [8] "consensus_inhibition"
## [9] "consensus_score_intercell_source"
## [10] "consensus_score_intercell_target"
## [11] "consensus_stimulation"
## [12] "curation_effort"
## [13] "database_intercell_source"
## [14] "database_intercell_target"
## [15] "dip_url"
## [16] "entity_type_intercell_source"
## [17] "entity_type_intercell_target"
## [18] "genesymbol_intercell_source"
## [19] "genesymbol_intercell_target"
## [20] "is_directed"
## [21] "is_inhibition"
## [22] "is_stimulation"
## [23] "n_references"
## [24] "n_resources"
## [25] "parent_intercell_source"
## [26] "parent_intercell_target"
## [27] "plasma_membrane_peripheral_intercell_source"
## [28] "plasma_membrane_peripheral_intercell_target"
## [29] "plasma_membrane_transmembrane_intercell_source"
## [30] "plasma_membrane_transmembrane_intercell_target"
## [31] "receiver_intercell_source"
## [32] "receiver_intercell_target"
## [33] "references"
## [34] "scope_intercell_source"
## [35] "scope_intercell_target"
## [36] "secreted_intercell_source"
## [37] "secreted_intercell_target"
## [38] "source"
## [39] "source_genesymbol"
## [40] "sources"
## [41] "target"
## [42] "target_genesymbol"
## [43] "transmitter_intercell_source"
## [44] "transmitter_intercell_target"
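Most of these columns come in pairs: 28 of the 44 columns form 14 pairs ending in _intercell_source and _intercell_target, which appear to describe the source and target partner of each interaction, while the remaining 16 describe the interaction itself. A small base R sketch that separates them, run here on a hand-picked subset of the names above (on the real data you would use colnames(icn)):

```r
# A few of the column names printed above (a subset, for illustration).
cols <- c(
  "aspect_intercell_source", "aspect_intercell_target",
  "secreted_intercell_source", "secreted_intercell_target",
  "is_stimulation", "n_references", "source_genesymbol"
)

# Columns describing the source partner and the target partner.
source_cols <- grep("_intercell_source$", cols, value = TRUE)
target_cols <- grep("_intercell_target$", cols, value = TRUE)

# Everything else describes the interaction itself.
edge_cols <- setdiff(cols, c(source_cols, target_cols))

# Strip the suffix to get the annotation names, and check that
# every source annotation has a matching target annotation.
annotation_names <- sub("_intercell_source$", "", source_cols)
all_paired <- identical(annotation_names, sub("_intercell_target$", "", target_cols))
annotation_names
```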
Show how many references support each record
There are 13,126 records with 0 references, and 17,139 records with at least 1 reference.
This histogram summarizes the number of records (y-axis) with each number of supporting references (x-axis).
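library(ggplot2)
# One bar per number of references; log10 y-axis so small and large counts both show
ggplot(icn) +
  geom_histogram(aes(n_references), binwidth = 1) +
  scale_y_log10() +
  labs(x = "Number of references", y = "Number of records")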
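The two counts at the top of this section add up to all 30,265 rows (13,126 + 17,139). A sketch of how such counts can be computed without a plot, shown on a small hypothetical vector rather than the real column (on the real data, use icn$n_references):

```r
# Toy stand-in for icn$n_references (hypothetical values, not OmniPath data).
n_references <- c(0L, 0L, 1L, 3L, 10L, 0L, 2L)

# Records with no supporting reference vs. at least one.
n_zero <- sum(n_references == 0, na.rm = TRUE)
n_some <- sum(n_references >= 1, na.rm = TRUE)

# Records per number of references, i.e. the bar heights of a
# histogram with one bin per reference count.
ref_counts <- table(n_references)
ref_counts
```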
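Each entry in the references column has the form database:ID, and entries are separated by semicolons. The IDs look like PubMed identifiers (PMIDs), although a few, such as LRdb:11, seem to be truncated. Here are the references for the first five records with more than two references: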
icn$references[which(icn$n_references > 2)[1:5]]
## [1] "Baccin2019:1006133;Baccin2019:98194281;CellPhoneDB:22353464;HPRD:11006133;LRdb:11;LRdb:9819428;NetPath:11006133;SIGNOR:16140393;SPIKE:11006133;SPIKE:17537801"
## [2] "Baccin2019:1006133;Baccin2019:98194281;CellPhoneDB:22353464;HPRD:11006133;LRdb:11;LRdb:9819428;NetPath:11006133;SIGNOR:16140393;SPIKE:11006133;SPIKE:17537801"
## [3] "Baccin2019:1006133;Baccin2019:98194281;CellPhoneDB:22353464;HPRD:11006133;LRdb:11;LRdb:9819428;NetPath:11006133;SIGNOR:16140393;SPIKE:11006133;SPIKE:17537801"
## [4] "Baccin2019:10958687;BioGRID:9244302;CellPhoneDB:22353464;HPRD:10958687;LRdb:10958687;NetPath:10958687;SIGNOR:16140393;SIGNOR:23111325"
## [5] "Baccin2019:11006133;Baccin2019:12482954;CancerCellMap:11006133;CellPhoneDB:22353464;HPRD:11006133;HPRD:12482954;LRdb:11;LRdb:12482954;NetPath:11006133;SIGNOR:11006133;SIGNOR:16140393"
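To look at the references more systematically, each string can be split into a database and an ID. A minimal base R sketch: parse_references() is a hypothetical helper written for illustration (not an OmnipathR function), and it assumes a single non-missing string. The input is the fourth value printed above:

```r
# One value of icn$references, copied from the output above.
refs <- paste0(
  "Baccin2019:10958687;BioGRID:9244302;CellPhoneDB:22353464;HPRD:10958687;",
  "LRdb:10958687;NetPath:10958687;SIGNOR:16140393;SIGNOR:23111325"
)

# Hypothetical helper: split one "database:ID;database:ID;..." string
# into a two-column data frame.
parse_references <- function(x) {
  entries <- strsplit(x, ";", fixed = TRUE)[[1]]
  parts <- strsplit(entries, ":", fixed = TRUE)
  data.frame(
    database = vapply(parts, `[`, character(1), 1),
    id = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

ref_table <- parse_references(refs)

# How many distinct IDs support this record, and from how many databases?
n_ids <- length(unique(ref_table$id))
n_databases <- length(unique(ref_table$database))
ref_table
```

In this example, several databases cite the same ID (10958687), so the 8 entries point to only 5 distinct IDs. Records without references may need special handling before applying the helper to the whole column.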
Filter the table
Let’s filter the table to source-target pairs where the consensus score for the source is greater than 4. Then we can keep a subset of the columns to simplify the table.
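omni <- icn %>%
  # keep interactions whose source has a consensus score above 4
  dplyr::filter(consensus_score_intercell_source > 4) %>%
  # keep the gene symbols, whether the interaction is stimulatory, and the score
  dplyr::select(
    target_genesymbol,
    source_genesymbol,
    is_stimulation,
    consensus_score_intercell_source
  ) %>%
  # drop duplicate rows
  dplyr::distinct()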
head(omni)
## # A tibble: 6 x 4
##   target_genesymbol source_genesymbol is_stimulation consensus_score_intercell_…
##   <chr>             <chr>                      <int>                       <int>
## 1 NRXN1             NLGN3                          0                           5
## 2 NRXN2             NLGN3                          1                           5
## 3 NRXN3             NLGN3                          1                           5
## 4 NRXN1             NLGN3                          1                           5
## 5 MUSK              AGRN                           1                           5
## 6 LRP4              AGRN                           0                           5
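Note that NLGN3 → NRXN1 appears twice in this output (rows 1 and 4). Removing duplicate rows only drops rows that are identical in every selected column, and these two differ in is_stimulation. If you want one row per source-target pair, you can collapse the table. A sketch in base R using aggregate(), run on the six rows printed above (typed in by hand); taking the maximum of is_stimulation marks a pair as stimulatory if any of its records is:

```r
# The six rows printed by head(omni), typed in by hand.
omni_head <- data.frame(
  target_genesymbol = c("NRXN1", "NRXN2", "NRXN3", "NRXN1", "MUSK", "LRP4"),
  source_genesymbol = c("NLGN3", "NLGN3", "NLGN3", "NLGN3", "AGRN", "AGRN"),
  is_stimulation = c(0L, 1L, 1L, 1L, 1L, 0L),
  consensus_score_intercell_source = rep(5L, 6),
  stringsAsFactors = FALSE
)

# One row per source-target pair; is_stimulation becomes 1 if any
# record for that pair is stimulatory, and 0 otherwise.
pairs <- aggregate(
  is_stimulation ~ source_genesymbol + target_genesymbol,
  data = omni_head,
  FUN = max
)
pairs
```

With dplyr, the same idea is group_by(source_genesymbol, target_genesymbol) followed by summarise(is_stimulation = max(is_stimulation)).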
Suppose we have a few genes of interest:
my_genes <- c(
"CD274", "CXCL1", "CXCL13", "CXCR3", "CXCR5"
)
Are the genes in this table?
Yes, and it looks like CXCR3 and CXCR5 are labeled as “target” genes:
my_genes[my_genes %in% omni$target_genesymbol]
## [1] "CXCR3" "CXCR5"
CD274, CXCL1, and CXCL13, on the other hand, are labeled as “source” genes:
my_genes[my_genes %in% omni$source_genesymbol]
## [1] "CD274" "CXCL1" "CXCL13"
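The two %in% checks can be combined into a single summary. This also reveals genes that appear on both sides, which is easy to miss when comparing two separate lists. Here is a sketch in base R. gene_role() is a hypothetical helper, and toy_omni is a made-up three-row table for illustration, not OmniPath output (on the real data, pass omni instead):

```r
# Made-up three-row table for illustration only (not OmniPath output).
toy_omni <- data.frame(
  source_genesymbol = c("CXCL13", "CD274", "NLGN3"),
  target_genesymbol = c("CXCR5", "PDCD1", "NRXN1"),
  stringsAsFactors = FALSE
)

my_genes <- c("CD274", "CXCL1", "CXCL13", "CXCR3", "CXCR5")

# Hypothetical helper: label each gene by its role(s) in the table.
gene_role <- function(genes, tbl) {
  is_source <- genes %in% tbl$source_genesymbol
  is_target <- genes %in% tbl$target_genesymbol
  role <- ifelse(is_source & is_target, "both",
          ifelse(is_source, "source",
          ifelse(is_target, "target", "absent")))
  setNames(role, genes)
}

roles <- gene_role(my_genes, toy_omni)

# Keep the rows where either partner is one of our genes.
my_pairs <- toy_omni[
  toy_omni$source_genesymbol %in% my_genes |
    toy_omni$target_genesymbol %in% my_genes,
]
roles
```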
Learn more
Please see the OmniPath website for more details: https://omnipathdb.org/
There is a lot to explore.