Data analysis/Data mining

Workflow of Affymetrix GeneChip data analysis
Our global gene expression services offer Affymetrix GeneChip Whole-Transcript Expression arrays and 3’ IVT Expression arrays. 200 ng of total RNA is used for the labelling reaction in accordance with the manufacturer's protocol. The hybridization cocktail is loaded into the array cartridge and hybridized in the Hybridization Oven for 16 hours at 45 °C. Washing steps are fully automated (Fluidics Station 450).
For scanning we use the Affymetrix GeneChip Scanner 3000 7G System. Primary data are collected by the Affymetrix GeneChip Command Console (AGCC) software. The DAT file (image file) contains the raw data; the AGCC software aligns a grid on the image file and generates the CEL file. CEL files can be used for further quality control (QC) and statistical analysis.
Quality Control
For Quality Control analysis we use Affymetrix Expression Console (EC) software. CEL files are imported into the EC software, Robust Multichip Analysis (RMA) with quantile normalization is performed, and CHP files are generated. CHP files contain the RMA-normalized data.
Data quality is determined by checking the internal spike-in controls (bac spikes and poly-A controls) and the signal distribution of the arrays (relative box plot).
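The quantile normalization step mentioned above can be illustrated with a minimal Python sketch. It is not the Expression Console implementation: RMA also includes background correction and probe-set summarization, which are omitted here, and ties are broken by position for simplicity. The function name and the toy intensities are assumptions made for illustration.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize a list of equally long sample columns.

    Each value is replaced by the mean of the values that share its rank
    across all samples. Ties are broken by position (a simplification).
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_rows)]
    normalized = []
    for col in columns:
        # Row indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

# Toy data: three samples (arrays), four probes each. Values are invented.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After normalization every sample has the same set of values, only in a different order, which is why the box plots of normalized arrays line up.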
Statistical Analysis
GeneSpring 7.3 and/or 11.5 software (Agilent Technologies) is used to analyze the Affymetrix GeneChip data. CEL files are imported into GeneSpring using the RMA or GCRMA algorithm. Baseline transformation steps may be applied to the samples: to the median of all samples, or to the median/mean of a specific sample.
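As a rough illustration of the "median of all samples" baseline transformation, the sketch below subtracts each gene's median across samples from its log2 values. It is an assumption about what the transformation amounts to on log-scale data, not GeneSpring's code; the gene names and values are hypothetical.

```python
from statistics import median

def baseline_to_median(expression):
    """Subtract each gene's median across samples from its log2 values."""
    transformed = {}
    for gene, values in expression.items():
        baseline = median(values)
        transformed[gene] = [v - baseline for v in values]
    return transformed

# Hypothetical log2 expression values, one list entry per sample.
log2_expression = {
    "gene_a": [7.0, 8.0, 9.0],
    "gene_b": [5.5, 5.0, 5.0],
}
centered = baseline_to_median(log2_expression)
```

After this step a value of 0 means "at the gene's median level", so the later filter on normalized values between -0.5 and 0.5 selects genes that stay close to their own baseline.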
Determination of expressed genes
- Add the sample parameters
- Create an interpretation to define the experimental groups
- Determine expressed genes: filter on expression, keeping raw values between the 20th and 100th percentile
- Determine non-changing and changing genes: filter on expression, keeping normalized values between -0.5 and 0.5 (non-changing genes). Removing the non-changing genes from the list of genes with raw values between the 20th and 100th percentile gives the changing genes
- Statistical test: parametric/non-parametric test with/without multiple testing correction (Benjamini-Hochberg correction) → differentially expressed genes
- Data interpretation: pathway analysis/Gene Ontology analysis
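The filtering and correction steps in the list above can be sketched in Python as follows. This is a simplified illustration, not GeneSpring's implementation: it assumes one raw summary value per gene for the percentile filter, treats a gene as non-changing only if all of its normalized values lie within the range, and uses hypothetical function names and invented data.

```python
def percentile_rank(values):
    """Return each value's percentile (0-100) within the list."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, i in enumerate(order):
        ranks[i] = 100.0 * (position + 1) / n
    return ranks

def expressed(raw, lower=20.0, upper=100.0):
    """Indices of genes whose raw signal lies in the given percentile range."""
    pct = percentile_rank(raw)
    return {i for i, p in enumerate(pct) if lower <= p <= upper}

def changing_genes(raw, normalized, low=-0.5, high=0.5):
    """Expressed genes minus genes whose normalized values all stay in [low, high]."""
    non_changing = {
        i for i, vals in enumerate(normalized)
        if all(low <= v <= high for v in vals)
    }
    return expressed(raw) - non_changing

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented example: six genes, two samples of normalized values each.
raw = [10, 50, 200, 400, 800, 1600]
normalized = [[0.1, -0.1], [0.2, -0.3], [1.2, -0.9],
              [0.0, 0.4], [-0.7, 0.6], [0.3, 2.0]]
changing = changing_genes(raw, normalized)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes whose adjusted p-value falls below the chosen cut-off would then be reported as differentially expressed.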
The workflow of the 384-well TLDA Micro Fluidic Card analysis
1. Basic analysis
Objective: to check the operability of the cards.
After each run we check that amplification has taken place and that the curve intensities of the assays are as expected. If a card shows a problem of any sort, or one of the samples does not work well, we inform our partner immediately so that the research plan can be changed in time.
When all data are collected, we create a detailed basic analysis. Its steps are as follows:
- Importing the data from each card into the SDS software and running the analysis
- Visually checking every row for the presence and shape of the amplification curves
- Recording the positions of aberrantly amplified assays
- Performing the analyses and exporting the data to txt format
- Creating an Excel document from the txt file
- Picking out the "Undetermined" assays. These are the so-called "failed assays", whose failure may have several explanations:
a) "Failed assays 1": some assays on some plates produce aberrant signals when analyzed together. Their dRn value is extremely high, usually over 100, but no Ct value can be assigned to these wells. For further analyses it is strongly recommended to delete these Ct values.
b) "Failed assays 2": dRn < 1. The aberrant amplification curves of these assays are readily detected when the cards are analyzed individually. These assays are likely to be defective. For further analyses it is strongly recommended to delete these Ct values.
Be careful with assays that have generally low dRn values, even if they have a Ct value. Weak amplification is clearly visible on their amplification curves and may result in an incorrect measurement.
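The dRn-based rules above could be applied to exported well data roughly as in the following Python sketch. The layout of the SDS txt export is not described here, so the example works on already-parsed (assay, Ct, dRn) tuples; the function name, the labels and the sample wells are hypothetical, and the thresholds (dRn over 100, dRn below 1) are the rules of thumb given in the text.

```python
def classify_well(ct, delta_rn, high_drn=100.0, low_drn=1.0):
    """Label a well using the dRn rules of thumb.

    ct is None when the software reports the well as "Undetermined".
    """
    if delta_rn > high_drn:
        return "failed_1"      # aberrantly high signal
    if delta_rn < low_drn:
        return "failed_2"      # weak or absent amplification
    if ct is None:
        return "undetermined"  # no Ct, but dRn gives no obvious reason
    return "ok"

# Invented wells: (assay name, Ct or None, dRn).
wells = [
    ("assay_1", 24.3, 3.2),
    ("assay_2", None, 250.0),
    ("assay_3", None, 0.4),
    ("assay_4", 35.1, 0.8),
]
labels = {name: classify_well(ct, drn) for name, ct, drn in wells}
# Keep Ct values only for wells that pass the check.
usable_ct = {name: ct for name, ct, drn in wells if labels[name] == "ok"}
```

Note that assay_4 is discarded even though it has a Ct value, in line with the caution about assays with low dRn.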
2. Normalization
Objective: to normalize the raw Ct values obtained from the basic analysis using appropriate normalizing (housekeeping) genes.
Since it is usually unpredictable which genes are most suitable for normalization in a given experimental system, we recommend using a minimum of 3-5 genes. Information about the most commonly used normalizing genes is available on ABI's website.
To select the normalizing genes we use the algorithm published in Andersen et al. (2004), Cancer Research 64, 5245-5250.
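Once the normalizing genes have been chosen, the normalization itself can be sketched as a ΔCt calculation: each target's Ct minus the mean Ct of the reference genes in the same sample. The Python example below shows only this step under that assumption. It does not implement the selection algorithm of Andersen et al., and the gene names and Ct values are hypothetical.

```python
from statistics import mean

def delta_ct(ct_values, reference_genes):
    """Return target Ct minus the mean Ct of the reference genes (ΔCt)."""
    reference_level = mean(ct_values[g] for g in reference_genes)
    return {
        gene: ct - reference_level
        for gene, ct in ct_values.items()
        if gene not in reference_genes
    }

# Hypothetical Ct values for one sample; ref_1..ref_3 are the chosen
# normalizing (housekeeping) genes.
sample_ct = {
    "ref_1": 18.0,
    "ref_2": 17.0,
    "ref_3": 19.0,
    "target_a": 24.5,
    "target_b": 16.0,
}
normalized_ct = delta_ct(sample_ct, ["ref_1", "ref_2", "ref_3"])
```

Averaging several reference genes reduces the influence of any single gene that turns out to be regulated in the experimental system, which is the reason for recommending at least 3-5 of them.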
3. Biostatistics
Objective: to identify the significantly different genes from the normalized gene expression data.
The biostatistical analysis follows the principles described in the Integrated Databases part. The results are presented in an easily comprehensible form and, depending on our partner's needs, can be prepared for publication and consultation.