AnyExpress: an integrated toolkit for the analysis of cross-platform gene expression data

Getting started

Install and configure

1. Start the Bash shell
$ bash

2. Download the AnyExpress executables, unzip the archive on your local computer, and note the current path to the AnyExpress directory (it is used in step 4 below).
$ unzip
$ pwd

3. Open the .bash_profile file
$ vi ~/.bash_profile

4. Add the following four lines to the .bash_profile file
export PATH

5. Reload the .bash_profile file
$ source ~/.bash_profile

6. Verify the environment variable and the path setting
$ env | grep ANYEXPRESS

Get the example data

Download the example data, unzip it, and copy all the files into the ‘examples’ directory.
$ cd $ANYEXPRESS_HOME/examples

Build a reference target

We define a target as a biologically meaningful expression unit against which tags (probes or reads) are matched by their genomic positions. Each target has five attributes: chromosome, strand, start position, end position, and identifier. AnyExpress accepts targets as a .BED file in which these five fields are separated by tabs. A separate .BED file is needed for each unique combination of the following:

  • Organism (e.g. human, mouse, Arabidopsis)
  • Expression unit (e.g. gene, isoform, exon)
  • Reference source (e.g. RefGene, Ensembl, UCSC KnownGene, MGI)
  • Build version (e.g. year 2004, 2006, 2009)

For example, the following four human .BED files are considered different targets:
  • Human_Ensembl_Feb2009_isoform.BED
  • Human_UCSCKnownGene_Feb2009_gene.BED
  • Human_RefSeq_Feb2009_gene.BED
  • Human_RefSeq_Mar2006_gene.BED
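
Position-based matching of tags to targets can be pictured with the minimal Python sketch below. It is not AnyExpress code. It assumes the five tab-separated fields appear in the order listed above (chromosome, strand, start, end, identifier) and that coordinates are inclusive, and it assigns a tag to every target on the same chromosome and strand that it overlaps. Check the column order of your own files, since standard UCSC BED places start and end before the name and strand. The functions `parse_targets` and `match_tag` and the gene names are hypothetical.

```python
from typing import List, NamedTuple


class Target(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int
    name: str


def parse_targets(lines: List[str]) -> List[Target]:
    """Parse tab-separated target lines.

    ASSUMPTION: fields are ordered chromosome, strand, start, end, identifier.
    """
    targets = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}: {line!r}")
        chrom, strand, start, end, name = fields
        targets.append(Target(chrom, strand, int(start), int(end), name))
    return targets


def match_tag(targets: List[Target], chrom: str, strand: str, start: int, end: int) -> List[str]:
    """Return identifiers of targets overlapping the tag (inclusive coordinates assumed)."""
    return [
        t.name
        for t in targets
        if t.chrom == chrom and t.strand == strand and start <= t.end and t.start <= end
    ]


# Hypothetical targets: two overlapping genes on '+' and one on '-'.
example = [
    "chr1\t+\t1000\t5000\tGENE_A",
    "chr1\t+\t4500\t9000\tGENE_B",
    "chr1\t-\t1000\t5000\tGENE_C",
]
targets = parse_targets(example)
print(match_tag(targets, "chr1", "+", 4600, 4625))  # ['GENE_A', 'GENE_B']
```

A tag that falls where two targets overlap is assigned to both here. A real tool may instead discard such ambiguous tags, so treat this rule as one possible choice.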

To build a target in AnyExpress, issue the following command:
$ anyexpress BuildTarget /path_to/human_refGene_2010May_gene.BED

You can build multiple targets with a single run of BuildTarget:
$ anyexpress BuildTarget /path_to/target_A.BED /path_to/target_B.BED /path_to/target_C.BED

To try this on the example data, type the commands below:

$ cd $ANYEXPRESS_HOME/examples
$ anyexpress BuildTarget Hg19_RefSeq2010_Gene.BED

Build an exclusion feature

An exclusion feature is a biologically meaningful entity against which tags (probes or reads) are compared so that matching tags can be removed. Exclusion features let users filter out undesirable tags. Previous studies have shown that low-quality microarray probes negatively affect the measurement of gene expression abundance and, consequently, the interpretation of the results. For example, a SNP within a probe sequence can cause incorrect estimation of mRNA abundance, so a user may want to eliminate such probes in certain research contexts. Note: this tool can build more than one exclusion feature at a time.
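
The filtering idea can be sketched in Python as removing every tag whose genomic interval overlaps an exclusion feature, such as a single-base SNP. This is an illustration under stated assumptions, not AnyExpress's implementation: overlap on the same chromosome is assumed to be the exclusion criterion, coordinates are treated as inclusive, and the probe names and positions are hypothetical.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end); inclusive coordinates are an assumption of this sketch
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def filter_tags(tags: Dict[str, Interval], exclusion: List[Interval]) -> Dict[str, Interval]:
    """Return only the tags that overlap none of the exclusion features."""
    return {
        tag_id: interval
        for tag_id, interval in tags.items()
        if not any(overlaps(interval, feature) for feature in exclusion)
    }


# Hypothetical 25-base probes and one single-base SNP used as an exclusion feature.
probes = {
    "probe_1": ("chr5", 100, 124),
    "probe_2": ("chr5", 200, 224),
}
snps = [("chr5", 110, 110)]
print(sorted(filter_tags(probes, snps)))  # ['probe_2']
```

The pairwise scan is quadratic and only suits small inputs. With millions of probes and SNPs, a sorted or indexed interval lookup per chromosome would be the practical choice.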

To build an exclusion feature in AnyExpress, issue the following command:
$ anyexpress BuildExclusionFeature /path_to/dbsnp129-single.BED

You can build multiple exclusion features with a single run of BuildExclusionFeature:
$ anyexpress BuildExclusionFeature /path_to/ex_feature_A.BED /path_to/ex_feature_B.BED /path_to/ex_feature_C.BED

To try this on the example data, type the commands below:

$ cd $ANYEXPRESS_HOME/examples
$ anyexpress BuildExclusionFeature human_snp130_nonsynon.BED

Bind Affymetrix .cel files

We define the input format for closed-platform samples in AnyExpress as a single column-bound, tab-delimited text file in which the first column holds the probe identifier and each subsequent column holds the measurement values of one sample. This is a common data format for microarrays on non-Affymetrix platforms. On Affymetrix platforms, however, each sample is stored in its own .cel file, and these files must be bound column-wise before the main analysis. AnyExpress provides a scalable binding tool, AnyExpress BindAffyCel, which creates a single column-bound file from a large number of Affymetrix .cel files.
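
Parsing .cel files is beyond a short example, but the column-binding step itself can be sketched in Python, assuming each sample's probe values have already been extracted into a probe-to-value map. The function `bind_columns` and the header label `probe_id` are hypothetical. The sketch also assumes that all samples share the same probe set, as samples from one array platform normally do.

```python
from typing import Dict, List


def bind_columns(samples: Dict[str, Dict[str, float]]) -> List[str]:
    """Column-bind per-sample probe->value maps into tab-delimited lines.

    The first column is the probe identifier; each following column holds one
    sample. All samples are assumed to share the same probe set.
    """
    names = list(samples)
    probes = list(samples[names[0]])  # probe order taken from the first sample
    for name in names[1:]:
        if set(samples[name]) != set(probes):
            raise ValueError(f"sample {name!r} has a different probe set")
    lines = ["\t".join(["probe_id"] + names)]
    for probe in probes:
        values = [f"{samples[name][probe]:g}" for name in names]
        lines.append("\t".join([probe] + values))
    return lines


# Two hypothetical samples measured on the same two probes.
samples = {
    "sample1": {"p1": 120.5, "p2": 80.0},
    "sample2": {"p1": 98.0, "p2": 101.25},
}
print("\n".join(bind_columns(samples)))
```

Rejecting mismatched probe sets, rather than silently filling gaps, keeps every row of the bound file complete.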

To bind Affymetrix .CEL files in AnyExpress, put all .cel files of the same platform into a single directory and issue the following command:
$ anyexpress BindAffyCel /path_to_CEL_files output_bound.txt

To try this on the example data, type the commands below:

$ cd $ANYEXPRESS_HOME/examples
$ anyexpress BindAffyCel . output_bound.txt

Normalize a probe-by-sample microarray file

Quantile normalization is a rank-preserving transformation that gives all samples an identical distribution of measurement values after processing. The column-bound file can be used directly as input to anyexpress Combine, but it is highly recommended to first perform between-sample normalization on this data to remove systematic bias and enable fair comparison among samples.
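
The textbook quantile-normalization procedure is sketched below in plain Python: sort each sample, average the values at each rank across samples, then give every value the average for its rank. This illustrates the general technique rather than the exact code behind NormalizeColumnBoundSamples. The function name and sample data are hypothetical, and ties are broken by row position, a simplification that some implementations replace by averaging tied ranks.

```python
from typing import List


def quantile_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize equally long sample columns.

    Each value is replaced by the mean, across samples, of the values sharing its
    rank, so every output column has the same distribution. Ties are broken by
    row position.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")
    # Mean of the k-th smallest value across samples, for k = 0..n-1.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # row indices, smallest value first
        out = [0.0] * n
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical samples (columns) measured on the same three probes.
s1 = [5.0, 2.0, 3.0]
s2 = [4.0, 1.0, 6.0]
print(quantile_normalize([s1, s2]))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Both output columns contain exactly the values 1.5, 3.5 and 5.5, while each probe keeps its original rank within its sample.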

To perform quantile normalization on a probe-by-sample text file from a single microarray platform in AnyExpress, issue the following command:
$ anyexpress NormalizeColumnBoundSamples probe_by_sample.txt output.txt

To try this on the example data, type the commands below:

$ cd $ANYEXPRESS_HOME/examples
$ anyexpress NormalizeColumnBoundSamples MarioniAffy.bound output_norm.txt

Combine cross-platform files

During Combine, a summarization file is created for each platform in the Summarize step; the per-platform summarization files are then merged into a single gene-by-sample text file in the Join step.
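
The two steps can be illustrated with a Python sketch. The summarization statistic (the mean of all tags matched to a target) and the join rule (keep only genes present on every platform) are assumptions chosen for illustration. This section does not state which statistic or join AnyExpress uses. Function names and data are hypothetical, tag-level values are assumed to be already computed for each platform, and sample names are assumed to be unique across platforms.

```python
from statistics import mean
from typing import Dict, List

Table = Dict[str, Dict[str, float]]  # row id -> {sample name -> value}


def summarize(tag_to_target: Dict[str, str], values: Table) -> Table:
    """Collapse tag-level values to one value per target and sample (mean of the tags)."""
    grouped: Dict[str, Dict[str, List[float]]] = {}
    for tag, target in tag_to_target.items():
        for sample, value in values[tag].items():
            grouped.setdefault(target, {}).setdefault(sample, []).append(value)
    return {t: {s: mean(v) for s, v in per_sample.items()} for t, per_sample in grouped.items()}


def join(summaries: List[Table]) -> Table:
    """Merge per-platform summaries into a gene-by-sample table, keeping shared genes."""
    shared = set.intersection(*(set(s) for s in summaries))
    table: Table = {}
    for gene in sorted(shared):
        row: Dict[str, float] = {}
        for summary in summaries:
            row.update(summary[gene])  # sample names assumed unique across platforms
        table[gene] = row
    return table


# Hypothetical closed-platform (array) and open-platform (sequencing) inputs.
affy = summarize({"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"},
                 {"p1": {"array1": 10.0}, "p2": {"array1": 14.0}, "p3": {"array1": 3.0}})
seq = summarize({"r1": "GENE_A", "r2": "GENE_C"},
                {"r1": {"lane1": 7.0}, "r2": {"lane1": 2.0}})
print(join([affy, seq]))  # {'GENE_A': {'array1': 12.0, 'lane1': 7.0}}
```

An inner join drops genes measured on only some platforms. An outer join with missing values would keep them, at the cost of incomplete rows.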

To combine both open- and closed-platform gene expression data, issue the following command.
$ anyexpress Combine -c /path_to/U133A.BED /path_to/4samples.txt -o /path_to/SRR002322.BED -t human_refGene_2010May -e multiTarget dbsnp129-single -p /path_to/projectName

Detailed instructions for Combine are illustrated in the figure below.

To try this on the example data, type the commands below:

$ cd $ANYEXPRESS_HOME/examples
$ anyexpress Combine -c Affymetrix_HGU133PLUS2.BED MarioniAffy.norm -o SRR002322.BED SRR002324.BED -t Hg19_RefSeq2010_Gene -p ./myProject

Draw a coverage plot

To draw a plot, the user supplies five parameters: the project directory, chromosome, strand (‘forward’ or ‘backward’), start position, and end position. Based on the combined results in the project, each open-platform dataset and each sample of a closed platform is drawn as a separate track (row) in the output .bedGraph file. The user then uploads this file to the UCSC Genome Browser through a web browser.
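
A .bedGraph file is plain text. Each track begins with a `track type=bedGraph` line, followed by data lines of the form `chrom start end value` with 0-based, half-open coordinates. bedGraph has no strand column. The Python sketch below writes one track per platform or sample to show the shape of such a file. It is not the Plot implementation, and the function `to_bedgraph`, the track names and the coverage values are hypothetical.

```python
from typing import Dict, List, Tuple

# One coverage segment on a chromosome: (start, end, value), 0-based half-open.
Segment = Tuple[int, int, float]


def to_bedgraph(chrom: str, tracks: Dict[str, List[Segment]]) -> str:
    """Render one bedGraph track per platform or sample as a single text file."""
    lines = []
    for name, segments in tracks.items():
        lines.append(f'track type=bedGraph name="{name}"')
        for start, end, value in segments:
            lines.append(f"{chrom}\t{start}\t{end}\t{value:g}")
    return "\n".join(lines) + "\n"


# Hypothetical coverage values inside the example region on chr5.
tracks = {
    "SRR002322": [(150395999, 150396100, 12.0)],
    "array_sample1": [(150396000, 150396025, 8.5)],
}
print(to_bedgraph("chr5", tracks), end="")
```

Several tracks can share one file because each data block is introduced by its own track line, which lets the browser display them as separate rows.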

To draw a coverage plot (.bedGraph format), issue the following command.
$ anyexpress Plot projectName chrom strand start end

To try this on the example data, type the commands below:

$ cd $ANYEXPRESS_HOME/examples
$ anyexpress Plot ./myProject chr5 forward 150395999 150410551

Upload the plot file (‘coveragePlot.bedGraph’ in the ‘results’ directory of the project directory ‘./myProject’) as a custom track in the UCSC Genome Browser (see the figure below).