Find enriched GO terms

Create the gene-to-GO-term mapping

find_enriched_go_terms(
  assignments,
  gene_id_to_go,
  ontology = "BP",
  weighted = FALSE,
  node_size = 10
)

create_go_term_mapping(genes, gene_col = "refseq_mrna")

Arguments

assignments

boolean named vector defining the subset of genes to test for enrichment of GO terms. The names of the vector should be the gene names. Elements that are TRUE make up the gene cluster being tested.

gene_id_to_go

named list mapping each gene ID to its GO IDs, as required by topGO (see topGOdata-class). create_go_term_mapping can construct such a list from a data frame.

ontology

string, optional, default: 'BP'. Specifies which ontology to use (passed to the ontology argument when creating the topGOdata object). Can be 'BP' (biological process), 'MF' (molecular function), or 'CC' (cellular component). See topGOdata-class.

weighted

boolean, optional, default: FALSE. Whether to use the weighted algorithm or not in runTest.

node_size

integer, optional, default: 10. Only GO terms with at least node_size annotated genes are considered; passed to the nodeSize argument when creating the topGOdata object (see topGOdata-class).

genes

data frame with two required columns. One column gives the gene names; its name is set by the argument gene_col. The other column must be named "go_id" and give the gene's GO ID. Each mapping of a gene to a GO ID is a separate row, so a gene that maps to several GO IDs appears in several rows.

gene_col

name of the column of genes that contains the gene identifiers. Default: "refseq_mrna".

Value

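find_enriched_go_terms returns a data frame in the format of GenTable, with an added column of BH-adjusted p-values (resultFisher_padj in the Examples), restricted to the GO terms that pass the filters described in Details.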

create_go_term_mapping returns a named list mapping each gene to its GO IDs, in the format required by topGOdata-class.
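The reshaping that create_go_term_mapping is described as doing, from one row per (gene, GO ID) pair to one list entry per gene, can be sketched in base R with split(). This is a minimal sketch under that assumption, not the package's own implementation; sketch_go_mapping and the toy gene names are hypothetical.

```r
# Hypothetical helper: a base-R sketch of the long-to-list reshaping that
# create_go_term_mapping is documented to perform. Not the package's own code.
sketch_go_mapping <- function(genes, gene_col = "refseq_mrna") {
  stopifnot(is.data.frame(genes), all(c(gene_col, "go_id") %in% names(genes)))
  # split() groups the go_id values by gene, giving one character vector per gene
  split(as.character(genes$go_id), as.character(genes[[gene_col]]))
}

# Toy input (made-up gene names): one row per (gene, GO ID) pair,
# so a gene with several GO IDs repeats across rows
toy <- data.frame(
  go_id       = c("GO:0008376", "GO:0032580", "GO:0016020", "GO:0016020"),
  refseq_mrna = c("NM_A", "NM_A", "NM_A", "NM_B"),
  stringsAsFactors = FALSE
)
mapping <- sketch_go_mapping(toy)
str(mapping)  # list with entries NM_A (3 GO IDs) and NM_B (1 GO ID)
```

The names of the resulting list are the gene identifiers, which is why the Examples set names(inGroup) from names(geneId2Go).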

Details

find_enriched_go_terms is a wrapper for running a GO enrichment analysis with the package topGO. It creates a topGOdata-class object, calls runTest with the statistic = "fisher" option to test for enrichment, and summarizes the results with GenTable. It then post-processes the results, returning only GO terms that satisfy both of the following:

  1. The Benjamini-Hochberg (BH) adjusted p-value, computed with p.adjust, is less than 0.05.

  2. The GO term is enriched, i.e. the number of genes from the GO term found in the subset is greater than expected.
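The two filters above can be sketched as follows, assuming a GenTable-style data frame with the Significant, Expected and resultFisher columns shown in the Examples output, and assuming resultFisher is a character column that may contain strings such as "< 1e-30". filter_enriched and the toy table are hypothetical illustrations, not the package's code.

```r
# Hypothetical sketch of the documented post-processing; not the package's code.
# Assumes a GenTable-style data frame with Significant, Expected and a
# character resultFisher column (tiny values may appear as "< 1e-30").
filter_enriched <- function(tab, alpha = 0.05) {
  # Drop a leading "<" so every p-value parses as a number
  p <- as.numeric(sub("^<\\s*", "", tab$resultFisher))
  # Benjamini-Hochberg adjustment across all tested terms
  tab$resultFisher_padj <- p.adjust(p, method = "BH")
  # Keep terms that are significant after adjustment AND over-represented
  keep <- tab$resultFisher_padj < alpha & tab$Significant > tab$Expected
  tab[keep, , drop = FALSE]
}

# Toy GenTable-like input (made-up values)
tab <- data.frame(
  GO.ID        = c("GO:toy1", "GO:toy2", "GO:toy3", "GO:toy4"),
  Significant  = c(8L, 1L, 6L, 0L),
  Expected     = c(2.1, 3.0, 1.5, 0.4),
  resultFisher = c("< 1e-30", "0.001", "0.004", "0.9"),
  stringsAsFactors = FALSE
)
res <- filter_enriched(tab)
res$GO.ID  # "GO:toy1" "GO:toy3"
```

In this toy table, GO:toy2 has a small adjusted p-value but fewer significant genes than expected, so the second criterion removes it; GO:toy4 fails the p-value cutoff.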

See also

create_go_term_mapping, find_enriched_pathway, GenTable, runTest, topGOdata-class, p.adjust

Examples

data(exampleData)
head(testGenesGO) # gives the mapping of genes to GO
#>        go_id  refseq_mrna
#> 1 GO:0008376 NM_001001566
#> 2 GO:0032580 NM_001001566
#> 3 GO:0016020 NM_001001566
#> 4 GO:0016021 NM_001001566
#> 5 GO:0016740 NM_001001566
#> 6 GO:0005794 NM_001001566
geneId2Go <- create_go_term_mapping(testGenesGO)
# create a fake assignment of genes to a group based on TRUE/FALSE values;
# one element per gene in the mapping, so the names line up with geneId2Go
inGroup <- rep(FALSE, length(geneId2Go))
inGroup[1:10] <- TRUE
names(inGroup) <- names(geneId2Go)
find_enriched_go_terms(inGroup, geneId2Go)
#>
#> Building most specific GOs .....
#>   ( 2212 GO terms found. )
#>
#> Build GO DAG topology ..........
#>   ( 5277 GO terms and 12059 relations. )
#>
#> Annotating nodes ...............
#>   ( 337 genes annotated to the GO terms. )
#>
#>   -- Weight Algorithm --
#>
#>   The algorithm is scoring 678 nontrivial nodes
#>   parameters:
#>     test statistic: fisher : ratio
#>
#>   Level 13: 4 nodes to be scored.
#>   Level 12: 9 nodes to be scored.
#>   Level 11: 14 nodes to be scored.
#>   Level 10: 21 nodes to be scored.
#>   Level 9: 38 nodes to be scored.
#>   Level 8: 60 nodes to be scored.
#>   Level 7: 98 nodes to be scored.
#>   Level 6: 140 nodes to be scored.
#>   Level 5: 140 nodes to be scored.
#>   Level 4: 89 nodes to be scored.
#>   Level 3: 47 nodes to be scored.
#>   Level 2: 17 nodes to be scored.
#>   Level 1: 1 nodes to be scored.
#> [1] GO.ID             Term              Annotated         Significant
#> [5] Expected          resultFisher      resultFisher_padj
#> <0 rows> (or 0-length row.names)