Find enriched GO terms

Create the Gene to GO Term mapping

find_enriched_go_terms(
  assignments,
  gene_id_to_go,
  ontology = "BP",
  weighted = FALSE,
  node_size = 10
)

create_go_term_mapping(genes, gene_col = "refseq_mrna")

Arguments

assignments	named logical vector defining the gene subset to be tested for enrichment of GO terms. The names of the vector should be the gene names. Elements that are TRUE make up the gene cluster being tested.
gene_id_to_go	list mapping each gene ID to its GO terms, in the format required by topGO (see `topGOdata-class`). `create_go_term_mapping` can construct such a list from a data frame.
ontology	string, optional, default: "BP". Specifies which ontology to use (passed to the `ontology` argument when creating a new `topGOdata` object). Can be 'BP', 'CC', or 'MF'. See `topGOdata-class`.
weighted	boolean, optional, default: FALSE. Whether to use the weighted algorithm or not in `runTest`.
node_size	integer, optional, default: 10. Consider only GO terms with at least node_size annotated genes; passed to the `nodeSize` argument of `topGOdata-class`.
genes	data frame with two required columns. The first gives the gene names, and its column name is given by the argument `gene_col`. The other column must be named "go_id" and gives the GO id of the gene. A gene may map to multiple GO ids; each gene-to-GO mapping is a separate row, so a gene can appear in multiple rows of the input.
gene_col	the name of the column of the `genes` data frame that contains the gene identifiers. The default is "refseq_mrna".
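
The reshaping from a long data frame to a gene-to-GO list can be illustrated with base R. This is only a sketch of the kind of list topGO expects (a named list with one character vector of GO ids per gene); it is not taken from the package source, and the gene identifiers below are hypothetical placeholders.

```r
# Toy long-format annotation table: one row per (gene, GO id) pair.
# The gene identifiers are made-up placeholders for illustration.
genes <- data.frame(
  refseq_mrna = c("NM_A", "NM_A", "NM_B", "NM_C", "NM_C", "NM_C"),
  go_id = c("GO:0008376", "GO:0016020", "GO:0016020",
            "GO:0005794", "GO:0016740", "GO:0008376"),
  stringsAsFactors = FALSE
)
gene_col <- "refseq_mrna"

# Group GO ids by gene: a named list with one character vector per gene.
gene_to_go <- split(genes$go_id, genes[[gene_col]])

# Drop repeated annotations within a gene (an illustrative choice,
# not necessarily what create_go_term_mapping does).
gene_to_go <- lapply(gene_to_go, unique)

str(gene_to_go)
```

The names of the resulting list are the gene identifiers, which is why the names of `assignments` need to match them.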

Value

find_enriched_go_terms returns a data frame of results in the format of GenTable, with an additional column of adjusted p-values.

create_go_term_mapping returns a list giving the gene to GO id in the format required by topGOdata-class.

Details

find_enriched_go_terms is a wrapper for running a GO enrichment analysis via the package topGO. It creates a topGOdata-class object, runs the function runTest to test for enrichment using the statistic="fisher" option, and then runs GenTable. It then post-processes the results, returning only GO terms that satisfy both of the following:

The BH-adjusted p-value, computed with p.adjust, is less than 0.05
The GO term is enriched, i.e. the number of genes from the GO term found in the subset is greater than expected
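
The two filtering criteria can be sketched in base R on a small, made-up results table. The column names mirror those printed in the example output of this page; the values and GO ids are fabricated placeholders for illustration only, and the code is an assumption about how such a filter could be written, not the package's actual implementation.

```r
# Hypothetical GenTable-style results. The p-value column is stored as
# character here because GenTable can return formatted strings.
results <- data.frame(
  GO.ID = c("GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"),
  Annotated = c(40L, 25L, 60L, 15L),
  Significant = c(9L, 1L, 3L, 6L),
  Expected = c(2.1, 1.3, 3.2, 0.8),
  resultFisher = c("0.0004", "0.62", "0.55", "0.003"),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment of the raw Fisher p-values.
results$resultFisher_padj <- p.adjust(as.numeric(results$resultFisher),
                                      method = "BH")

# Keep terms that are both significant after adjustment and enriched,
# i.e. observed more often in the subset than expected.
keep <- results$resultFisher_padj < 0.05 &
  results$Significant > results$Expected
enriched <- results[keep, , drop = FALSE]

print(enriched)
```

A term with a small p-value but fewer genes than expected would be dropped by the second condition, since it is depleted rather than enriched.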

Examples

data(exampleData)
head(testGenesGO) #gives the mapping of genes to GO
#>        go_id  refseq_mrna
#> 1 GO:0008376 NM_001001566
#> 2 GO:0032580 NM_001001566
#> 3 GO:0016020 NM_001001566
#> 4 GO:0016021 NM_001001566
#> 5 GO:0016740 NM_001001566
#> 6 GO:0005794 NM_001001566
geneId2Go <- create_go_term_mapping(testGenesGO)
# create fake assignment of genes to group based on TRUE/FALSE values
inGroup <- rep(FALSE, nrow(testData))
inGroup[1:10] <- TRUE
names(inGroup) <- names(geneId2Go)
find_enriched_go_terms(inGroup, geneId2Go)
#> 
#> Building most specific GOs .....
#> 	( 2212 GO terms found. )
#> 
#> Build GO DAG topology ..........
#> 	( 5277 GO terms and 12059 relations. )
#> 
#> Annotating nodes ...............
#> 	( 337 genes annotated to the GO terms. )
#> 
#> 			 -- Weight Algorithm -- 
#> 
#> 		 The algorithm is scoring 678 nontrivial nodes
#> 		 parameters: 
#> 			 test statistic: fisher : ratio
#> 
#> 	 Level 13:	4 nodes to be scored.
#> 
#> 	 Level 12:	9 nodes to be scored.
#> 
#> 	 Level 11:	14 nodes to be scored.
#> 
#> 	 Level 10:	21 nodes to be scored.
#> 
#> 	 Level 9:	38 nodes to be scored.
#> 
#> 	 Level 8:	60 nodes to be scored.
#> 
#> 	 Level 7:	98 nodes to be scored.
#> 
#> 	 Level 6:	140 nodes to be scored.
#> 
#> 	 Level 5:	140 nodes to be scored.
#> 
#> 	 Level 4:	89 nodes to be scored.
#> 
#> 	 Level 3:	47 nodes to be scored.
#> 
#> 	 Level 2:	17 nodes to be scored.
#> 
#> 	 Level 1:	1 nodes to be scored.
#> [1] GO.ID             Term              Annotated         Significant      
#> [5] Expected          resultFisher      resultFisher_padj
#> <0 rows> (or 0-length row.names)
