| xGRviaGenomicAnno | R Documentation | 
xGRviaGenomicAnno conducts region-based enrichment
analysis for the input genomic region data (genome build hg19), using
genomic annotations (eg active chromatin, transcription factor binding
sites/motifs, conserved sites). Enrichment analysis is based on a
binomial test that estimates the significance of overlaps at the
base resolution, at the region resolution or at the hybrid resolution.
A test background can be provided; by default, the annotatable bases will be
used.
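To make the difference between counting overlaps in regions and in bases concrete, here is a minimal base-R sketch. It is not XGR's implementation: the helper names `count_region_overlaps` and `count_base_overlaps` and all coordinates are hypothetical, the intervals sit on a single chromosome, are 1-based and inclusive, and neither set of regions is assumed to overlap itself.

```r
# Toy intervals on one chromosome (1-based, inclusive); values are illustrative only
data_regions <- data.frame(start = c(100, 300, 900), end = c(199, 349, 949))
anno_regions <- data.frame(start = c(150, 320), end = c(250, 400))

# Region resolution: number of input regions touching any annotation region
count_region_overlaps <- function(dat, anno) {
  hit <- vapply(seq_len(nrow(dat)), function(i) {
    any(dat$start[i] <= anno$end & dat$end[i] >= anno$start)
  }, logical(1))
  sum(hit)
}

# Base resolution: number of input bases covered by annotation
# (assumes annotation regions do not overlap each other)
count_base_overlaps <- function(dat, anno) {
  total <- 0
  for (i in seq_len(nrow(dat))) {
    shared <- pmin(dat$end[i], anno$end) - pmax(dat$start[i], anno$start) + 1
    total <- total + sum(pmax(shared, 0))
  }
  total
}

n_regions <- count_region_overlaps(data_regions, anno_regions)  # 2 regions
n_bases <- count_base_overlaps(data_regions, anno_regions)      # 50 + 30 = 80 bases
```

At the hybrid resolution, the input data would be counted as regions (as in `count_region_overlaps`) while annotation and background are measured in bases.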
xGRviaGenomicAnno(data.file, annotation.file = NULL, background.file =
NULL,
format.file = c("data.frame", "bed", "chr:start-end", "GRanges"),
build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
resolution = c("bases", "regions", "hybrid"),
background.annotatable.only = T, p.adjust.method = c("BH", "BY",
"bonferroni", "holm", "hochberg", "hommel"), GR.annotation = c(NA,
"Uniform_TFBS", "ENCODE_TFBS_ClusteredV3",
"ENCODE_TFBS_ClusteredV3_CellTypes", "Uniform_DNaseI_HS",
"ENCODE_DNaseI_ClusteredV3", "ENCODE_DNaseI_ClusteredV3_CellTypes",
"Broad_Histone", "SYDH_Histone", "UW_Histone", "FANTOM5_Enhancer_Cell",
"FANTOM5_Enhancer_Tissue", "FANTOM5_Enhancer_Extensive",
"FANTOM5_Enhancer",
"Segment_Combined_Gm12878", "Segment_Combined_H1hesc",
"Segment_Combined_Helas3", "Segment_Combined_Hepg2",
"Segment_Combined_Huvec",
"Segment_Combined_K562", "TFBS_Conserved", "TS_miRNA", "TCGA",
"ReMap_Public_TFBS", "ReMap_Public_mergedTFBS",
"ReMap_PublicAndEncode_mergedTFBS", "ReMap_Encode_TFBS",
"Blueprint_BoneMarrow_Histone", "Blueprint_CellLine_Histone",
"Blueprint_CordBlood_Histone", "Blueprint_Thymus_Histone",
"Blueprint_VenousBlood_Histone", "Blueprint_DNaseI",
"Blueprint_Methylation_hyper", "Blueprint_Methylation_hypo",
"EpigenomeAtlas_15Segments_E029", "EpigenomeAtlas_15Segments_E030",
"EpigenomeAtlas_15Segments_E031", "EpigenomeAtlas_15Segments_E032",
"EpigenomeAtlas_15Segments_E033", "EpigenomeAtlas_15Segments_E034",
"EpigenomeAtlas_15Segments_E035", "EpigenomeAtlas_15Segments_E036",
"EpigenomeAtlas_15Segments_E037", "EpigenomeAtlas_15Segments_E038",
"EpigenomeAtlas_15Segments_E039", "EpigenomeAtlas_15Segments_E040",
"EpigenomeAtlas_15Segments_E041", "EpigenomeAtlas_15Segments_E042",
"EpigenomeAtlas_15Segments_E043", "EpigenomeAtlas_15Segments_E044",
"EpigenomeAtlas_15Segments_E045", "EpigenomeAtlas_15Segments_E046",
"EpigenomeAtlas_15Segments_E047", "EpigenomeAtlas_15Segments_E048",
"EpigenomeAtlas_15Segments_E050", "EpigenomeAtlas_15Segments_E051",
"EpigenomeAtlas_15Segments_E062"), verbose = T,
RData.location = "http://galahad.well.ox.ac.uk/bigdata")
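Several character arguments in the usage above list all allowed values as their default. In R this conventionally means the first value is used when the argument is not supplied, typically resolved with `match.arg()`. The sketch below illustrates that idiom only; `pick_resolution` is a hypothetical function, and whether xGRviaGenomicAnno resolves its defaults exactly this way is an assumption.

```r
# Hypothetical function mirroring the style of the usage signature above
pick_resolution <- function(resolution = c("bases", "regions", "hybrid")) {
  # match.arg() returns the first choice when the argument is left at its default
  match.arg(resolution)
}

default_choice <- pick_resolution()           # first listed value
explicit_choice <- pick_resolution("hybrid")  # explicitly chosen value
```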
| data.file | an input data file, containing a list of genomic regions to test. If the input file is formatted as a 'data.frame' (specified by the parameter 'format.file' below), the first three columns correspond to the chromosome (1st column), the starting chromosome position (2nd column), and the ending chromosome position (3rd column). If the format is indicated as 'bed' (browser extensible data), it is the same as the 'data.frame' format but the position is a 0-based offset from the chromosome position. If the genomic regions provided are not ranges but single positions, the ending chromosome position (3rd column) may be omitted. If the format is indicated as "chr:start-end", instead of using the first 3 columns, only the first column will be used and processed. If the file also contains other columns, these additional columns will be ignored. Alternatively, the input can be the content itself, assuming that the input file has already been read. Note: the file should use the tab delimiter as the field separator between columns. | 
| annotation.file | an input annotation file containing genomic annotations for genomic regions. If the input file is formatted as a 'data.frame', the first four columns correspond to the chromosome (1st column), the starting chromosome position (2nd column), the ending chromosome position (3rd column), and the genomic annotations (eg transcription factors and histones; 4th column). If the format is indicated as 'bed', it is the same as the 'data.frame' format but the position is a 0-based offset from the chromosome position. If the format is indicated as "chr:start-end", the first two columns correspond to the chromosome:start-end (1st column) and the genomic annotations (eg transcription factors and histones; 2nd column). If the file also contains other columns, these additional columns will be ignored. Alternatively, the input can be the content itself, assuming that the input file has already been read. Note: the file should use the tab delimiter as the field separator between columns. | 
| background.file | an input background file containing a list of genomic regions as the test background. The file format is the same as 'data.file'. By default, it is NULL, meaning all annotatable bases (ie non-redundant bases covered by 'annotation.file') are used as background. However, if only one annotation (eg only a transcription factor) is provided in 'annotation.file', the background must be provided. | 
| format.file | the format for input files. It can be one of "data.frame", "chr:start-end", "bed" and "GRanges" | 
| build.conversion | the conversion from one genome build to another. The conversions supported are "hg38.to.hg19" and "hg18.to.hg19". By default it is NA (no need to do so) | 
| resolution | the resolution of overlaps being tested. It can be one of "bases" at the base resolution (by default), "regions" at the region resolution, and "hybrid" at the base-region hybrid resolution (that is, data at the region resolution but annotation/background at the base resolution). If regions being analysed are SNPs themselves, then the results are the same even when choosing this parameter as either 'bases' or 'hybrid' or 'regions' | 
| background.annotatable.only | logical to indicate whether the background is further restricted to annotatable bases (covered by 'annotation.file'). In other words, if the background is provided, the background bases are those after being overlapped with annotatable bases. Notably, if only one annotation (eg only a transcription factor) is provided in 'annotation.file', it should be false | 
| p.adjust.method | the method used to adjust p-values. It can be one of "BH", "BY", "bonferroni", "holm", "hochberg" and "hommel". The first two methods "BH" (widely used) and "BY" control the false discovery rate (FDR: the expected proportion of false discoveries amongst the rejected hypotheses); the last four methods "bonferroni", "holm", "hochberg" and "hommel" are designed to give strong control of the family-wise error rate (FWER). Notes: FDR is a less stringent condition than FWER | 
| GR.annotation | the genomic regions of annotation data. By default, it is 'NA' to disable this option. Pre-built genomic annotation data are detailed in the section 'Note'. Beyond pre-built annotation data, the user can specify customised input. To do so, first save your RData file (a list of GR objects, each being a GR object corresponding to an annotation) to your local computer. Then, set "GR.annotation" to your RData file name (with or without extension), and specify the path to your RData file in "RData.location". Note: you can also load your customised GR object directly | 
| verbose | logical to indicate whether messages will be displayed on the screen. By default, it is set to TRUE (as in the usage above) so that messages are displayed | 
| RData.location | the characters to tell the location of built-in
RData files. See  | 
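The three tabular input formats described for 'data.file' can be illustrated with a small base-R sketch. The regions below are toy values, and the BED conversion follows the standard BED convention (start minus one, end unchanged), which is an assumption about how the 0-based offset mentioned above is applied.

```r
# Toy regions in 'data.frame' format: chromosome, start, end (values are illustrative)
df <- data.frame(chr = c("chr1", "chr2"),
                 start = c(1001L, 5001L),
                 end = c(2000L, 5001L),
                 stringsAsFactors = FALSE)

# 'chr:start-end' format: a single column of strings
chr_start_end <- sprintf("%s:%d-%d", df$chr, df$start, df$end)

# 'bed' format: same columns, but the start is a 0-based offset
# (assumption: standard BED convention, i.e. start minus one, end unchanged)
bed <- transform(df, start = start - 1L)
```

Since the input can also be the content itself, a data frame such as `df` could be passed as `data.file` together with `format.file="data.frame"`.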
a data frame with 9 columns (the explanations below are based on results at the 'hybrid' resolution):
name: the annotation name
nAnno: the number of bases covered by that annotation. If
the background is provided, they are also restricted to the background
nOverlap: the number of regions overlapped between input
regions and annotation regions. If the background is provided, they are
also restricted to the background
fc: fold change
zscore: z-score
pvalue: p-value
adjp: adjusted p-value, that is, the p-value after being
adjusted for multiple comparisons
expProb: the probability of expecting bases overlapped
between background regions and annotation regions
obsProb: the probability of observing regions overlapped
between input regions and annotation regions
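The columns above can be related to one another with a textbook binomial calculation. The sketch below is an assumption about how such statistics are typically derived, not a copy of XGR's internal code; the counts `n_data`, `n_background`, `n_anno` and `n_overlap` are hypothetical toy values.

```r
# Hypothetical counts for three annotations (toy values for illustration only)
n_data <- 200                        # input regions tested
n_background <- 1e6                  # background bases
n_anno <- c(50000, 20000, 100000)    # background bases covered by each annotation
n_overlap <- c(30, 4, 22)            # input regions overlapping each annotation

exp_prob <- n_anno / n_background    # expected probability of overlap
obs_prob <- n_overlap / n_data       # observed proportion of overlap
fc <- obs_prob / exp_prob            # fold change

# z-score from the normal approximation to the binomial
zscore <- (n_overlap - n_data * exp_prob) /
  sqrt(n_data * exp_prob * (1 - exp_prob))

# One-sided binomial p-value: P(X >= n_overlap)
pvalue <- stats::pbinom(n_overlap - 1, size = n_data, prob = exp_prob,
                        lower.tail = FALSE)

# Adjust for multiple comparisons (FDR via "BH", FWER via "bonferroni")
adjp_bh <- stats::p.adjust(pvalue, method = "BH")
adjp_bonf <- stats::p.adjust(pvalue, method = "bonferroni")

res <- data.frame(fc, zscore, pvalue, adjp_bh, adjp_bonf)
```

The "BH" adjusted values are never larger than the "bonferroni" ones, which reflects the statement that FDR is a less stringent condition than FWER.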
The genomic annotation data are described below according to the data
sources and data types.
1. ENCODE Transcription Factor ChIP-seq data
Uniform_TFBS: a list (690 combinations of cell types and
transcription factors) of GenomicRanges objects; each is a GR object
containing uniformly identified peaks per cell type per transcription
factor.
ENCODE_TFBS_ClusteredV3: a list (161 transcription
factors) of GenomicRanges objects; each is a GR object containing
clustered peaks per transcription factor, along with a meta-column
'cells' telling cell types associated with a clustered peak.
ENCODE_TFBS_ClusteredV3_CellTypes: a list (91 cell types)
of a list (transcription factors) of GenomicRanges objects. Each cell
type is a list (transcription factor) of GenomicRanges objects; each is
a GR object containing clustered peaks per transcription factor.
2. ENCODE DNaseI Hypersensitivity site data
Uniform_DNaseI_HS: a list (125 cell types) of
GenomicRanges objects; each is a GR object containing uniformly
identified peaks per cell type.
ENCODE_DNaseI_ClusteredV3: a GR object containing
clustered peaks, along with a meta-column 'num_cells' telling how many
cell types are associated with a clustered peak.
ENCODE_DNaseI_ClusteredV3_CellTypes: a list (125 cell
types) of GenomicRanges objects; each is a GR object containing
clustered peaks per cell type.
3. ENCODE Histone Modification ChIP-seq data from different sources
Broad_Histone: a list (156 combinations of cell types and
histone modifications) of GenomicRanges objects; each is a GR object
containing identified peaks per cell type and per histone modification.
This dataset was generated from ENCODE/Broad Institute.
SYDH_Histone: a list (29 combinations of cell types and
histone modifications) of GenomicRanges objects; each is a GR object
containing identified peaks per cell type and per histone modification.
This dataset was generated from ENCODE/Stanford/Yale/Davis/Harvard.
UW_Histone: a list (172 combinations of cell types and
histone modifications) of GenomicRanges objects; each is a GR object
containing identified peaks per cell type and per histone modification.
This dataset was generated from ENCODE/University of Washington.
4. FANTOM5 expressed enhancer atlas
FANTOM5_Enhancer_Cell: a list (71 cell types) of
GenomicRanges objects; each is a GR object containing enhancers
specifically expressed in a cell type.
FANTOM5_Enhancer_Tissue: a list (41 tissues) of
GenomicRanges objects; each is a GR object containing enhancers
specifically expressed in a tissue.
FANTOM5_Enhancer_Extensive: a list (5 categories of
extensive enhancers) of GenomicRanges objects; each is a GR object
containing extensive enhancers. They are:
"Extensive_ubiquitous_enhancers_cells" for ubiquitous enhancers
expressed over the entire set of cell types;
"Extensive_ubiquitous_enhancers_organs" for ubiquitous enhancers
expressed over the entire set of tissues;
"Extensive_enhancers_tss_associations" for TSS-enhancer
associations (RefSeq promoters only); "Extensive_permissive_enhancers"
and "Extensive_robust_enhancers" for permissive and robust enhancer
sets.
FANTOM5_Enhancer: a list (117 cell
types/tissues/categories) of GenomicRanges objects; each is a GR
object.
5. ENCODE combined (ChromHMM and Segway) Genome Segmentation data
Segment_Combined_Gm12878: a list (7 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the cell line GM12878 (a lymphoblastoid cell
line).
Segment_Combined_H1hesc: a list (7 categories of segments)
of GenomicRanges objects; each is a GR object containing segments per
category in the cell line H1-hESC (H1 human embryonic stem cells).
Segment_Combined_Helas3: a list (7 categories of segments)
of GenomicRanges objects; each is a GR object containing segments per
category in the cell line HeLa S3.
Segment_Combined_Hepg2: a list (7 categories of segments)
of GenomicRanges objects; each is a GR object containing segments per
category in the cell line HepG2 (liver hepatocellular carcinoma).
Segment_Combined_Huvec: a list (7 categories of segments)
of GenomicRanges objects; each is a GR object containing segments per
category in the cell line HUVEC (Human Umbilical Vein Endothelial
Cells).
Segment_Combined_K562: a list (7 categories of segments)
of GenomicRanges objects; each is a GR object containing segments per
category in the cell line K562 (human erythromyeloblastoid leukemia
cell line).
6. Conserved TFBS
TFBS_Conserved: a list (245 PWM) of GenomicRanges objects;
each is a GR object containing human/mouse/rat conserved TFBS for each
PWM.
7. TargetScan miRNA regulatory sites
TS_miRNA: a list (153 miRNA) of GenomicRanges objects;
each is a GR object containing miRNA regulatory sites for each miRNA.
8. TCGA exome mutation data
TCGA: a list (11 tumor types) of GenomicRanges objects;
each is a GR object containing exome mutations across tumor patients of
the same tumor type.
9. ReMap integration of transcription factor ChIP-seq data (publicly available and ENCODE)
ReMap_Public_TFBS: a list (395 combinations of GSE studies
and transcription factors and cell types) of GenomicRanges objects;
each is a GR object containing identified peaks per GSE study per
transcription factor per cell type.
ReMap_Public_mergedTFBS: a list (131 transcription factors
under GSE studies) of GenomicRanges objects; each is a GR object
containing merged peaks per transcription factor.
ReMap_PublicAndEncode_mergedTFBS: a list (237
transcription factors under GSE studies and ENCODE) of GenomicRanges
objects; each is a GR object containing merged peaks per transcription
factor.
ReMap_Encode_TFBS: a list (155 transcription factors under
ENCODE) of GenomicRanges objects; each is a GR object containing
identified peaks per transcription factor.
10. Blueprint Histone Modification ChIP-seq data
Blueprint_BoneMarrow_Histone: a list (132 combinations of
histone modifications and samples) of GenomicRanges objects; each is a
GR object containing identified peaks per histone per sample (from bone
marrow).
Blueprint_CellLine_Histone: a list (38 combinations of
histone modifications and cell lines) of GenomicRanges objects; each is
a GR object containing identified peaks per histone per cell line.
Blueprint_CordBlood_Histone: a list (126 combinations of
histone modifications and samples) of GenomicRanges objects; each is a
GR object containing identified peaks per histone per sample (from cord
blood).
Blueprint_Thymus_Histone: a list (5 combinations of
histone modifications and samples) of GenomicRanges objects; each is a
GR object containing identified peaks per histone per sample (from
thymus).
Blueprint_VenousBlood_Histone: a list (296 combinations of
histone modifications and samples) of GenomicRanges objects; each is a
GR object containing identified peaks per histone per sample (from
venous blood).
11. BLUEPRINT DNaseI Hypersensitivity site data
Blueprint_DNaseI: a list (36 samples) of GenomicRanges
objects; each is a GR object containing identified peaks per sample.
12. BLUEPRINT DNA Methylation data
Blueprint_Methylation_hyper: a list (206 samples) of
GenomicRanges objects; each is a GR object containing hyper-methylated
CpG regions per sample.
Blueprint_Methylation_hypo: a list (206 samples) of
GenomicRanges objects; each is a GR object containing hypo-methylated
CpG regions per sample.
13. Roadmap Epigenomics Core 15-state Genome Segmentation data for primary cells (blood and T cells)
EpigenomeAtlas_15Segments_E033: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E033 (Primary T cells
from cord blood).
EpigenomeAtlas_15Segments_E034: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E034 (Primary T cells
from peripheral blood).
EpigenomeAtlas_15Segments_E037: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E037 (Primary T helper
memory cells from peripheral blood 2).
EpigenomeAtlas_15Segments_E038: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E038 (Primary T helper
naive cells from peripheral blood).
EpigenomeAtlas_15Segments_E039: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E039 (Primary T helper
naive cells from peripheral blood).
EpigenomeAtlas_15Segments_E040: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E040 (Primary T helper
memory cells from peripheral blood 1).
EpigenomeAtlas_15Segments_E041: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E041 (Primary T helper
cells PMA-I stimulated).
EpigenomeAtlas_15Segments_E042: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E042 (Primary T helper
17 cells PMA-I stimulated).
EpigenomeAtlas_15Segments_E043: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E043 (Primary T helper
cells from peripheral blood).
EpigenomeAtlas_15Segments_E044: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E044 (Primary T
regulatory cells from peripheral blood).
EpigenomeAtlas_15Segments_E045: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E045 (Primary T cells
effector/memory enriched from peripheral blood).
EpigenomeAtlas_15Segments_E047: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E047 (Primary T killer
naive cells from peripheral blood).
EpigenomeAtlas_15Segments_E048: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E048 (Primary T killer
memory cells from peripheral blood).
EpigenomeAtlas_15Segments_E062: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E062 (Primary
mononuclear cells from peripheral blood).
14. Roadmap Epigenomics Core 15-state Genome Segmentation data for primary cells (HSC and B cells)
EpigenomeAtlas_15Segments_E029: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E029 (Primary
monocytes from peripheral blood).
EpigenomeAtlas_15Segments_E030: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E030 (Primary
neutrophils from peripheral blood).
EpigenomeAtlas_15Segments_E031: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E031 (Primary B cells
from cord blood).
EpigenomeAtlas_15Segments_E032: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E032 (Primary B cells
from peripheral blood).
EpigenomeAtlas_15Segments_E035: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E035 (Primary
hematopoietic stem cells).
EpigenomeAtlas_15Segments_E036: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E036 (Primary
hematopoietic stem cells short term culture).
EpigenomeAtlas_15Segments_E046: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E046 (Primary Natural
Killer cells from peripheral blood).
EpigenomeAtlas_15Segments_E050: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E050 (Primary
hematopoietic stem cells G-CSF-mobilized Female).
EpigenomeAtlas_15Segments_E051: a list (15 categories of
segments) of GenomicRanges objects; each is a GR object containing
segments per category in the reference epigenome E051 (Primary
hematopoietic stem cells G-CSF-mobilized Male).
xEnrichViewer
## Not run: 
# Load the XGR package and specify the location of built-in data
library(XGR)
RData.location <- "http://galahad.well.ox.ac.uk/bigdata_dev"

# Enrichment analysis for GWAS SNPs from ImmunoBase
## a) provide input data
data.file <- "http://galahad.well.ox.ac.uk/bigdata/ImmunoBase_GWAS.bed"

## b) perform enrichment analysis using FANTOM expressed enhancers
eTerm <- xGRviaGenomicAnno(data.file=data.file, format.file="bed",
GR.annotation="FANTOM5_Enhancer_Cell", RData.location=RData.location)

## c) view enrichment results for the top significant terms
xEnrichViewer(eTerm)

## d) barplot of enriched terms
bp <- xEnrichBarplot(eTerm, top_num='auto', displayBy="fc")
bp

## e) save enrichment results to the file called 'Regions_enrichments.txt'
output <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp",
details=TRUE)
utils::write.table(output, file="Regions_enrichments.txt", sep="\t",
row.names=FALSE)

## End(Not run)