

General Genotyping Information

With Illumina Infinium and Affymetrix Axiom chemistries, investigators have a wide range of genotyping study design options available to them. We currently offer production-scale linkage, GWAS and custom SNP genotyping services, all of which use LIMS tracking, robotic automation and strict QC standards.



Depending on access pathway, we include at no additional cost:




Clustering and Calling Genotype Data


For Illumina SNP genotyping services, genotype cluster definitions are determined using the Illumina GenTrain algorithm version 1.0 contained in GenomeStudio software. For Affymetrix genotyping services, Axiom Analysis Suite is used. We first use the software to determine cluster boundaries from a project's own samples. Sample call rates and quality metrics are then evaluated, and a small portion of samples is marked for exclusion from the project release because of poor data quality (for genomic DNAs, generally a call rate below 97-98%). For Illumina projects, the clustering algorithm is run again after the poor-quality experiments are excluded, to determine the final cluster positions. Accurate clustering depends on including only high-quality raw data.
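The sample-level call rate check described above can be sketched in a few lines of Python. This is only an illustration, not the GenomeStudio or Axiom Analysis Suite procedure: the data layout (a dictionary of per-sample genotype calls with `None` for a no-call), the function names, and the single 0.98 cutoff are all assumptions chosen to mirror the "97-98%" guideline in the text.

```python
from typing import Dict, List, Optional, Tuple

# Assumed cutoff for illustration; the text gives "generally less than 97-98%".
CALL_RATE_THRESHOLD = 0.98


def sample_call_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of SNPs with a non-missing call (None marks a no-call)."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def split_by_call_rate(
    samples: Dict[str, List[Optional[str]]],
    threshold: float = CALL_RATE_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Return (kept sample IDs, excluded sample IDs) based on call rate."""
    keep: List[str] = []
    exclude: List[str] = []
    for sample_id, calls in samples.items():
        if sample_call_rate(calls) >= threshold:
            keep.append(sample_id)
        else:
            exclude.append(sample_id)
    return keep, exclude


# Toy data: S2 is missing half of its calls and is flagged for exclusion.
samples = {
    "S1": ["AA", "AB", "BB", "AA"],
    "S2": ["AA", None, None, "AB"],
}
keep, exclude = split_by_call_rate(samples)
```

In a two-pass workflow like the one described, only the samples in `keep` would be passed to the second clustering run.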



Linkage and Custom Studies

For custom genotyping projects, a technical SNP filter is applied to the data, and a percentage of the data, which depends on the number of markers, is manually reviewed. Manually reviewed clusters are adjusted as necessary, using HapMap replicate and relationship status as a guide. Intensity data is released for any SNP that is technically filtered.

For linkage studies, a subset of linkage markers is chosen from the Illumina Core Array marker set, and that subset is manually reviewed.  Manually reviewed clusters are adjusted as necessary, using HapMap replicate and relationship status as a guide.  Intensity data is released for all SNPs on the array.



GWAS Studies

GWAS cluster definitions are determined using the same procedures, with some modifications. A lower genotyping quality score is tolerated, manual review is done only for XY, Y and mitochondrial SNPs, and a SNP “technical filter” is applied to the GWAS data that is designed to remove genotypes only for markers that are complete assay failures. For any study that contains low-frequency variants (exome array content or low-MAF SNPs), CIDR performs additional manual review of some SNPs based on flags obtained from zCall.* For dbGaP posting, the aim is to post the data in a very raw form, so aggressive genotype “dropping” is not performed.




Released Genotyping Data


SNP genotyping data released back to our investigators includes:




GWAS Data Cleaning


For many GWAS-level studies, we provide additional assistance with post-release data processing. This includes helping the PI with data cleaning, with posting of datasets to dbGaP, and with imputation to the 1000 Genomes Project (1KGP).


The GWAS data cleaning process typically focuses first on resolving any sample identity problems identified at release (gender, Mendelian inconsistencies and cryptic relatedness issues). Samples that should be removed from some analyses but may be retained as part of the posting to dbGaP, such as unexpected relatives, are also identified. Batch effects (samples processed together, DNA source or extraction method, substudy/site) are checked, and differences in ethnicity are evaluated and controlled for in analysis. PCA is used to identify ethnic outliers and to calculate eigenvectors that adjust for population stratification in association analyses. SNP filters are developed, including filters on missing data, duplicate and Mendelian errors, minor allele frequency and Hardy-Weinberg equilibrium. A relatively simple association (“pre-compute”) analysis is performed to determine whether there is a problematic level of genomic inflation, which would suggest false positives. The pre-compute also allows investigators who access the data to verify that they downloaded the data, merged the genotype and phenotype datasets and applied the filters correctly, by reproducing the pre-compute results. A QC report describing the dataset and the results of the data cleaning process is prepared for inclusion on dbGaP. Data will be imputed and the results posted to dbGaP.
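As a rough illustration of the SNP-level quantities named above, the sketch below computes a missing-data rate, minor allele frequency and a Hardy-Weinberg p-value from genotype counts, and a genomic inflation factor from association test statistics. It is a minimal example under stated assumptions, not the actual cleaning pipeline: the function names are hypothetical, the Hardy-Weinberg test uses the 1-degree-of-freedom chi-square approximation (an exact test may be preferred in practice), and the inflation factor is taken as the median observed chi-square statistic divided by the median of the chi-square distribution with 1 degree of freedom. No filter thresholds are shown because the text does not state them.

```python
import math
import statistics
from typing import Sequence, Tuple

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def snp_metrics(
    n_aa: int, n_ab: int, n_bb: int, n_missing: int
) -> Tuple[float, float, float]:
    """Return (missing rate, minor allele frequency, HWE chi-square p-value)."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    missing_rate = n_missing / n_total if n_total else 1.0
    if n_called == 0:
        return missing_rate, 0.0, 1.0

    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0:
        # Monomorphic SNP: the HWE test is not informative.
        return missing_rate, 0.0, 1.0

    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n_called, 2 * p * q * n_called, q * q * n_called)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return missing_rate, maf, p_value


def genomic_inflation(chi2_stats: Sequence[float]) -> float:
    """Genomic inflation factor from 1-df association chi-square statistics."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Genotype counts in exact Hardy-Weinberg proportions give a p-value of 1.
missing_rate, maf, hwe_p = snp_metrics(n_aa=25, n_ab=50, n_bb=25, n_missing=0)
```

An inflation factor close to 1 is the usual expectation when there is no systematic inflation of the test statistics; values well above 1 are the kind of signal the pre-compute analysis is meant to reveal.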



* zCall: a rare variant caller for array-based genotyping: genetics and population analysis. Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M, O'Dushlaine C, Moran JL, Chambert K, Stevens C; Swedish Schizophrenia Consortium; ARRA Autism Sequencing Consortium, Sklar P, Hultman CM, Purcell S, McCarroll SA, Sullivan PF, Daly MJ, Neale BM.










See our Services page for a full description of what is included for all study types



See our Sample Requirements page for detailed specifications for each product