With Illumina GoldenGate and Infinium chemistries and Affymetrix Axiom chemistry, investigators have a wide range of genotyping study design options. We currently offer production-scale linkage, GWAS and custom SNP genotyping services that use LIMS tracking, robotic automation and strict QC standards.
Depending on access pathway, we include at no additional cost:
Clustering and Calling Genotype Data
For Illumina SNP genotyping services, genotype cluster definitions are determined using the Illumina GenTrain algorithm version 1.0 contained in GenomeStudio software. For Affymetrix genotyping services, Axiom Analysis Suite is used. We first use the software to determine cluster boundaries from a project's own samples. Sample call rates and quality metrics are then evaluated, and a small portion of samples is marked for exclusion from the project release because of poor data quality (call rate generally below 97-98% for genomic DNAs). For Illumina projects, once poor-quality experiments have been excluded, the clustering algorithm is run again to determine the final cluster positions. Accurate clustering depends on including only high-quality raw data.
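The sample exclusion step described above can be illustrated with a minimal Python sketch. It is not GenomeStudio or Axiom Analysis Suite code: the genotype coding (0/1/2, with `None` for a no-call), the function names and the default threshold of 0.97 are assumptions chosen for illustration, and in practice other quality metrics are evaluated alongside call rate.

```python
from typing import Dict, List, Optional, Tuple

# Assumed coding: 0/1/2 = copies of the B allele, None = no-call.
Genotypes = Dict[str, List[Optional[int]]]


def sample_call_rate(calls: List[Optional[int]]) -> float:
    """Fraction of markers with a genotype call for one sample."""
    if not calls:
        return 0.0
    called = sum(1 for g in calls if g is not None)
    return called / len(calls)


def split_by_call_rate(
    genotypes: Genotypes, threshold: float = 0.97
) -> Tuple[List[str], List[str]]:
    """Return (passing, excluded) sample IDs for a hypothetical threshold."""
    passing: List[str] = []
    excluded: List[str] = []
    for sample_id, calls in genotypes.items():
        if sample_call_rate(calls) >= threshold:
            passing.append(sample_id)
        else:
            excluded.append(sample_id)
    return passing, excluded


if __name__ == "__main__":
    demo = {
        "S1": [0, 1, 2, 1],
        "S2": [0, None, None, 1],  # half the markers are no-calls
        "S3": [2, 2, 1, 0],
    }
    keep, drop = split_by_call_rate(demo)
    print("retain for re-clustering:", keep)
    print("mark for exclusion:", drop)
```

In a workflow like the one described, only the retained samples would be passed to the second clustering run.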
Linkage and Custom Studies
For custom genotyping projects, a technical SNP filter is applied to the data and a percentage of the data is manually reviewed, with the percentage depending on the number of markers. Manually reviewed clusters are adjusted as necessary, using HapMap replicate and relationship status as a guide. Intensity data is released for any SNP that is technically filtered.
For linkage studies, a subset of linkage markers is chosen from the Illumina Core Array marker set, and that subset is manually reviewed. Manually reviewed clusters are adjusted as necessary, using HapMap replicate and relationship status as a guide. Intensity data is released for all SNPs on the array.
GWAS cluster definitions are determined with the same procedures, with some modifications: a lower genotyping quality score is tolerated; manual review is done only for XY, Y and mitochondrial SNPs; and a SNP “technical filter” is applied to the GWAS data, designed to remove genotypes only for markers that are complete assay failures. For any study that contains low-frequency variants (exome array content or low-MAF SNPs), CIDR performs additional manual review of some SNPs based on flags obtained from zCall.* Because the goal for dbGaP posting is to post the data in a very raw form, aggressive genotype “dropping” is not performed.
Released Genotyping Data
SNP genotyping data released back to our investigators includes:
GWAS Data Cleaning
For many GWAS-level studies, additional assistance with post-release data processing is provided by personnel from the University of Washington Genetics Coordinating Center (UW GCC). This group assists the PI with data cleaning and posting of datasets to dbGaP, as well as imputation to the 1000 Genomes Project (1KGP).
The GWAS data cleaning process typically focuses first on resolving any sample identity problems identified at release (gender, Mendelian inconsistencies and cryptic relatedness issues). Samples that should be removed from some analyses but may be retained in the posting to dbGaP, such as unexpected relatives, are also identified. Chromosome anomalies are identified, and genotypes within an anomalous region are filtered out. Batch effects (samples processed together, DNA source or extraction method, substudy/site) are checked, and differences in ethnicity are evaluated and controlled for in analysis. PCA is used to identify ethnic outliers and to calculate eigenvectors that adjust for population stratification in association analyses. SNP filters are developed, including filters for missing data, duplicate and Mendelian errors, minor allele frequency and Hardy-Weinberg equilibrium. A relatively simple association (“pre-compute”) analysis is performed to determine whether there is a problematic level of genomic inflation, which would suggest false positives. The pre-compute also lets investigators who access the data verify that they downloaded the data, merged the genotype and phenotype datasets and applied the filters correctly, by reproducing the pre-compute results. A QC report describing the dataset and the results of the data cleaning process is prepared for inclusion on dbGaP. In addition, UW personnel impute the genotypic data to 1KGP and post the results on dbGaP.
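As a rough illustration of the kinds of quantities behind these SNP filters and the genomic inflation check, here is a minimal Python sketch. It is not the UW GCC pipeline: the genotype coding (0/1/2, with `None` for missing), the function names, the use of a simple 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium (rather than an exact test) and the absence of any filter thresholds are all assumptions made for illustration.

```python
import math
import statistics
from typing import Dict, List, Optional

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def snp_metrics(calls: List[Optional[int]]) -> Dict[str, float]:
    """Missing rate, minor allele frequency and HWE chi-square p-value for one SNP.

    Assumed coding: 0 = AA, 1 = AB, 2 = BB, None = missing genotype.
    """
    called = [g for g in calls if g is not None]
    n = len(called)
    if n == 0:
        return {"missing_rate": 1.0, "maf": 0.0, "hwe_p": 1.0}
    missing_rate = 1.0 - n / len(calls)
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the A allele
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        hwe_p = 1.0  # monomorphic SNP: the test is not informative
    else:
        observed = (n_aa, n_ab, n_bb)
        expected = (n * p * p, 2 * n * p * q, n * q * q)
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
        hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return {"missing_rate": missing_rate, "maf": maf, "hwe_p": hwe_p}


def genomic_inflation(chi2_stats: List[float]) -> float:
    """Genomic inflation factor: median 1-df association statistic / expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    balanced = [0] * 25 + [1] * 50 + [2] * 25   # genotype counts match HWE
    all_het = [1] * 100                         # extreme heterozygote excess
    print(snp_metrics(balanced))
    print(snp_metrics(all_het))
```

An inflation factor close to 1 is what would be expected when test statistics follow the null distribution; values well above 1 are the kind of signal the pre-compute analysis is meant to reveal.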
* zCall: a rare variant caller for array-based genotyping: genetics and population analysis. Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M, O'Dushlaine C, Moran JL, Chambert K, Stevens C; Swedish Schizophrenia Consortium; ARRA Autism Sequencing Consortium, Sklar P, Hultman CM, Purcell S, McCarroll SA, Sullivan PF, Daly MJ, Neale BM.
See our Services page for a full description of what is included for all study types.
See our Sample Requirements page for detailed specifications for each product.