Introduction
The Glue Grant Human Transcriptome Array (GG-H) is the result of a collaboration between the Stanford Genome Technology Center, Wing Wong’s lab at Stanford, Affymetrix Inc. and the Inflammation and Host Response to Injury program (“Glue Grant”). The array has been comprehensively designed to interrogate various aspects of the transcriptome, including gene expression, alternative splicing, detection of coding SNPs and non-coding transcription. With a protocol tailored to work efficiently with small amounts of total RNA, the array provides a high-throughput but low-cost platform for clinical genomic studies.
Affymetrix is expected to make the GG-H array available commercially in January 2013. The commercial version of the GG-H array is named the Human Transcriptome Array (HTA).
Array Components and Probe Design
Various components of the array and their probe design strategies are summarized in the following table and illustrated in the figure.
Array Components | Number of Targets | Number of Probes | Design |
---|---|---|---|
Gene exons | 315,123 | 3,292,929 | On average ten probes per exon (~119 probes per gene), selected for high thermodynamic scores, uniqueness and even spread across the target |
Exon-exon junctions | 260,488 | 1,060,703 | Four probes per junction at (-3, -1, +1, +3) relative to the splice site |
Coding SNPs and DMET variations | 89,782 | 982,941 | Six probes per allele at the -4, 0 and +4 positions on each of the two strands relative to the SNP |
Non-coding functional RNA (f-ncRNA) | 730 | 5,869 | Ten probes per ncRNA, selected for high thermodynamic scores, uniqueness and even spread across the target |
Non-coding antisense expression (as-ncRNA) | 50,783 | 563,097 | Probes selected at a density of one probe per 50 bp of UTR, with a minimum of six probes per region |
Un-annotated transcribed units (UTU) | 49,957 | 488,581 | Ten probes per UTU, selected for high thermodynamic scores, uniqueness and even spread across the target |
Other probes including controls | | 498,840 | Designed for quality control of the assay, background modeling, estimation of cross-hybridization, and monitoring of ribosomal RNA |
Total | | 6,892,960 | |
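The counts in the table can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 498,840 in the control row is a probe count rather than a target count, which is consistent with the stated total. The dictionary names are illustrative only.

```python
# Probe counts per array component, copied from the table above.
# Assumption: the 498,840 in the control row is a probe count.
PROBES = {
    "Gene exons": 3_292_929,
    "Exon-exon junctions": 1_060_703,
    "Coding SNPs and DMET variations": 982_941,
    "f-ncRNA": 5_869,
    "as-ncRNA": 563_097,
    "UTU": 488_581,
    "Other probes including controls": 498_840,
}

# Target counts; the control row has no target count in the table.
TARGETS = {
    "Gene exons": 315_123,
    "Exon-exon junctions": 260_488,
    "Coding SNPs and DMET variations": 89_782,
    "f-ncRNA": 730,
    "as-ncRNA": 50_783,
    "UTU": 49_957,
}


def total_probes(probes):
    """Sum the probe counts over all components."""
    return sum(probes.values())


def probes_per_target(probes, targets):
    """Average number of probes per target for each component with targets."""
    return {name: probes[name] / targets[name] for name in targets}


if __name__ == "__main__":
    print("total probes:", total_probes(PROBES))
    for name, ratio in probes_per_target(PROBES, TARGETS).items():
        print(f"{name}: {ratio:.1f} probes per target")
```

The per-target averages come out close to the design figures in the last column, for example roughly ten probes per exon and four per junction.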
Library Files, Annotation and Database
To support different kinds of analyses using the GG-H array, we have developed a set of library and annotation files. The most important ones are summarized in the following table. In addition, a comprehensive database (http://gluegrant1.stanford.edu/~DIC/db) is available for querying array design and annotation information. Users can use the database to generate customized library and annotation files.
File Name | File Type | Description | Download |
---|---|---|---|
hGlue2_0.r1.clf | CEL Layout File (CLF) | CLF along with PGF make up the core chip layout information for our array. The CLF contains the mapping of probe IDs to x/y positions in the CEL file. | hGlue2_0.r1.core.tar.gz |
hGlue2_0.r1.pgf | Probe Grouping File (PGF) | PGF along with CLF make up the core chip layout information for our array. The PGF groups specific probes (by probe ID) into probesets. | |
hGlue2_0.r1.antigenomic.bgp | BackGround Probes (BGP) | The BGP file lists what probes (by probe ID) are to be used in various background correction methods (e.g. GCBG method). | |
hGlue2_0.r1.qcc | Quality Control Content (QCC) | The QCC file lists probes serving various quality control purposes. | |
hGlue2_0.r1.pgf.tbl | Tab-delimited | The file is used by the GlueQC package for the quality control summary. | |
hGlue2_0.r1.PSR.ps | Probeset List (PS) | The PS file lists probeset IDs for Probe Selection Regions (PSRs). | |
hGlue2_0.r1.TC.mps | Meta Probeset List (MPS) | The MPS file is used to group individual PSR (exon) level probesets into Transcript Cluster (gene) level meta probesets. | |
hGlue2_0.r1.TC_Annot.csv | Gene Annotation File | The annotation file links transcript cluster (gene) to chromosomal position information, gene information, functional annotation (gene ontology and pathway) and other information in public databases | |
hGlue2_0.r1.ASS | Alternative Splicing Structure (ASS) | The ASS file provides the alternative splicing structure based on design-time knowledge. It describes how exons and junctions are connected in a transcript cluster. | |
hGlue2_0.r1.Probe.BED | BED File | Genome coordinate file for probes on hg18 | |
hGlue2_0.r1.PSR.BED | BED File | Genome coordinate file for Probe Selection Regions (PSRs) on hg18 | |
hGlue2_0.r1.TC.BED | BED File | Genome coordinate file for Transcript Clusters (TCs) on hg18 | |
hGlue2_0.r1.gene info Gene Ontology.xls | dChip Library File | Gene ontology frequency summary for GG-H genes | hGlue2_0.r1.dChip.tar.gz |
hGlue2_0.r1.gene info.xls | dChip Library File | Gene annotation information for GG-H genes | |
hGlue2_0.r1.genome info.xls | dChip Library File | Genome coordinate information for GG-H genes | |
component.ontology; function.ontology; process.ontology | dChip Library File | Cellular component, molecular function and biological process ontology mapping for GG-H genes | |
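As an illustration of how the MPS file groups PSR probesets into transcript clusters, the sketch below parses a small in-memory example. It assumes the usual Affymetrix MPS layout (comment lines starting with `#`, then four tab-separated columns named `probeset_id`, `transcript_cluster_id`, `probeset_list` and `probe_count`); this should be confirmed against the actual hGlue2_0.r1.TC.mps file. All IDs in the sample are made up.

```python
import csv
import io

# Made-up example in the assumed MPS layout; the IDs are hypothetical.
SAMPLE_MPS = (
    "# example comment line\n"
    "probeset_id\ttranscript_cluster_id\tprobeset_list\tprobe_count\n"
    "TC0001\tTC0001\tPSR0001 PSR0002 PSR0003\t30\n"
    "TC0002\tTC0002\tPSR0004\t10\n"
)


def read_mps(handle):
    """Map each transcript cluster ID to its list of PSR probeset IDs."""
    # Skip comment lines so DictReader sees the column header first.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {
        row["transcript_cluster_id"]: row["probeset_list"].split()
        for row in reader
    }


clusters = read_mps(io.StringIO(SAMPLE_MPS))

if __name__ == "__main__":
    for cluster_id, psrs in clusters.items():
        print(cluster_id, len(psrs), "PSRs")
```

For a real file, the same function could be called on an open file handle instead of the `io.StringIO` wrapper.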
Analysis Pipeline and Software
To support routine analyses of the GG-H array, we have established a basic pipeline for quality control, calculation of expression indices and detection of alternative splicing. For the other components of the array, the analysis methods are still exploratory and highly customized.
Analysis | Software | Description | Download |
---|---|---|---|
Quality control | GlueQC (requires APT and R Bioconductor) | Assess array quality through exploratory plots and summary statistics | GlueQC website |
Expression indices calculation | Affymetrix Power Tools (APT); JETTA | Background correction, normalization and calculation of exon or gene expression matrices | APT website |
Detection of alternative splicing | Junction and Exon array Toolkits for Transcriptome Analysis (JETTA) | Detection of alternatively spliced exons with or without supporting junctions | JETTA website |
High-level exploratory analysis | dChip | Clustering of gene expression and enrichment analysis of ontologies, pathways and genome locations | dChip website |
Visualization | UCSC genome browser | Visualize probes, exons and genes on the genome browser | UCSC genome browser |
1. Quality control
Ensuring high data quality is crucial to genomic studies. GlueQC starts from CEL files and checks several quality scores to filter out outlier arrays. The quality statistics include probe-level foreground and background signal, area under the curve (AUC) using Norm Exons and Norm Introns as positive and negative controls respectively, probeset presence calls, and between-array correlation at both the exon and gene level.
To run the script:
Rscript GlueQC.R celpath=CEL_PATH outpath=OUTPUT_PATH libpath=LIB_PATH
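To make the AUC statistic concrete, the sketch below computes it as the probability that a positive-control signal exceeds a negative-control signal, which is equivalent to the area under the ROC curve. This is a generic illustration and not GlueQC's own code. The intensity values and variable names are hypothetical.

```python
def auc(positive, negative):
    """Area under the ROC curve for two sets of control signals.

    Computed as the fraction of (positive, negative) pairs in which the
    positive value is larger, with ties counted as one half.
    """
    if not positive or not negative:
        raise ValueError("both control sets must be non-empty")
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


# Hypothetical log2 intensities for one array.
norm_exons = [9.1, 8.4, 10.2, 7.9]    # positive controls
norm_introns = [5.0, 6.1, 8.0, 4.7]   # negative controls

if __name__ == "__main__":
    print(auc(norm_exons, norm_introns))
```

An AUC near 1 means the positive controls are well separated from the negative controls, while a value near 0.5 means the array does not distinguish them. The pairwise loop is quadratic and intended for clarity; a rank-based formulation would scale better for large control sets.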
2. Expression indices calculation
Low-level analysis of microarray data includes background correction, normalization and calculation of exon- or gene-level expression indices. Here we show examples of low-level analyses using APT.
To calculate gene-level expression using APT rma-sketch:
apt-probeset-summarize -a rma-sketch -c hGlue2_0.r1.clf -p hGlue2_0.r1.pgf -b hGlue2_0.r1.antigenomic.bgp -m hGlue2_0.r1.TC.mps -o gene_expr *.CEL
To calculate exon-level expression using APT rma-sketch:
apt-probeset-summarize -a rma-sketch -c hGlue2_0.r1.clf -p hGlue2_0.r1.pgf -b hGlue2_0.r1.antigenomic.bgp -s hGlue2_0.r1.PSR.ps -o exon_expr *.CEL
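The expression matrix written by APT can then be loaded for downstream work. The sketch below assumes the summary output is a tab-delimited text file with `#`-prefixed header lines, a `probeset_id` column and one column per CEL file; the exact file name and header contents depend on the APT version and should be checked in the output directory. The sample content, IDs and values are made up.

```python
import csv
import io

# Made-up example in the assumed summary layout; not real APT output.
SAMPLE_SUMMARY = (
    "#%example-header=value\n"
    "probeset_id\tsample1.CEL\tsample2.CEL\n"
    "TC0001\t7.25\t7.5\n"
    "TC0002\t10.0\t9.5\n"
)


def read_summary(handle):
    """Return (sample_names, {probeset_id: [expression values]})."""
    # Drop the '#'-prefixed header lines before parsing the table.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    values = {
        row[0]: [float(x) for x in row[1:]]
        for row in reader
        if row
    }
    return samples, values


samples, values = read_summary(io.StringIO(SAMPLE_SUMMARY))

if __name__ == "__main__":
    print(samples)
    print(values)
```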
JETTA is also capable of performing low-level analyses. Please refer to its dedicated website for instructions (JETTA website).
3. Alternative splicing using JETTA
With the addition of junction probes, GG-H can improve the accuracy of alternative splicing detection. To include junctions in alternative splicing analysis, we have developed Junction and Exon array Toolkits for Transcriptome Analysis (JETTA), an integrated software tool for expression index calculation and alternative splicing analysis. Please refer to its dedicated website for instructions (JETTA website).
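For intuition about what an exon-level splicing signal looks like, the sketch below computes a generic splicing index: the exon expression normalized by the gene expression, compared between two sample groups on the log2 scale. This is a common textbook measure and is not JETTA's algorithm; it also ignores junction probes. All values and names are hypothetical.

```python
from statistics import mean


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Generic splicing index on log2-scale expression values.

    Each argument is a list with one value per sample. The exon level is
    normalized by the gene level within each sample, averaged per group,
    and the group B average is subtracted from the group A average.
    """
    norm_a = mean([e - g for e, g in zip(exon_a, gene_a)])
    norm_b = mean([e - g for e, g in zip(exon_b, gene_b)])
    return norm_a - norm_b


# Hypothetical log2 expression values: two samples per group.
exon_a, gene_a = [6.0, 6.5], [8.0, 8.5]
exon_b, gene_b = [7.5, 8.0], [8.0, 8.5]

if __name__ == "__main__":
    print(splicing_index(exon_a, gene_a, exon_b, gene_b))
```

A negative value here suggests the exon is relatively less included in group A than in group B, given the same gene-level expression. A real analysis would also need a statistical test across samples.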
4. High-level exploratory analysis using dChip
Biologists are often interested in clustering and functional enrichment analysis at the gene level. For this purpose, we provide a set of library files to support these kinds of analyses using dChip. Please refer to the dChip website for more instructions on how to run dChip (dChip website).
Protocol
The GG-H protocol is based on the Ambion Inc./Applied Biosystems kit (cat# 4411974) and has been specially modified to work efficiently with small amounts of starting material. It uses two rounds of single-strand cDNA synthesis to amplify mRNA, and Affymetrix GeneChip WT terminal labeling technology to label the fragmented cDNA for hybridization. The detailed protocol can be found here (GG-H protocol).
Availability
The array platform has been deposited in NCBI GEO under GPL11319. An example data set is accessible at GSE26072 (and GSE26109 for the RNA-Seq data used in the paper).
The GG-H array can be ordered from Affymetrix as a custom array. For more information, please contact dbowe@stanford.edu.
Reference
Xu W, Seok J, Mindrinos MN, Schweitzer AC, Jiang H, Wilhelmy J, Clark TA, Kapur K, Xing Y, Faham M, Storey JD, Moldawer LL, Maier RV, Tompkins RG, Wong WH, Davis RW, Xiao W; Inflammation and Host Response to Injury Large-Scale Collaborative Research Program. Human transcriptome array for high-throughput clinical studies. Proc Natl Acad Sci U S A. 2011 Mar 1;108(9):3707-12. doi: 10.1073/pnas.1019753108. Epub 2011 Feb 11.
Questions and Comments
For questions and comments, please join our discussion group at http://groups.google.com/group/GGHarray.
Last modified 12/22/2012. Webmaster: weihongx@stanford.edu