Investigators have a wealth of study design options available to them using next-generation sequencing technologies. We currently offer production-scale whole-genome, whole-exome and custom targeted sequencing services that use LIMS tracking, robotic automation, strict QC standards and automated primary, secondary and tertiary analyses.
CIDR uses the Illumina HiSeq platform and either Agilent SureSelect or NimbleGen SeqCap target enrichment chemistries.
Sequencing Release Formats
The QC report is a per-sample report that contains more than 100 QC metrics, including sequencing completeness and coverage/depth information (such as mean coverage and % of targeted bases covered at 20X), quality measures against a high-density SNP array (heterozygote sensitivity, homozygote and heterozygote concordance with the SNP array), variant call quality measures (% in dbSNP and Ti/Tv ratio) and a number of laboratory-oriented measures (% duplicates, library size, insert size, etc.). Cross-sample contamination is estimated using VerifyBamID. The QC report is used in-house to monitor QC metrics in real time and to evaluate data quickly.
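To make a few of these metrics concrete, the following is a minimal sketch of how mean coverage, % of targeted bases at 20X and the Ti/Tv ratio can be computed. It is not the CIDRSeqSuite implementation; the function names and the input shapes (a list of per-base depths, a list of reference/alternate allele pairs) are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# substitutions; every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def coverage_summary(depths: Sequence[int], min_depth: int = 20) -> Dict[str, float]:
    """Summarize per-base depths over the targeted bases (hypothetical helper)."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean_coverage": sum(depths) / len(depths),
        "pct_at_min_depth": 100.0 * covered / len(depths),
    }


def titv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ti/Tv ratio for (ref, alt) pairs, assumed to be biallelic SNVs."""
    ti = tv = 0
    for ref, alt in snvs:
        if (ref.upper(), alt.upper()) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


summary = coverage_summary([10, 20, 30, 40])
ratio = titv_ratio([("A", "G"), ("C", "T"), ("A", "C")])
```

In this toy input, three of the four targeted bases reach 20X and two of the three SNVs are transitions.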
ANNOVAR is incorporated in our CIDRSeqSuite pipeline and is used to annotate variants from VCF files. A merged per-sample ANNOVAR report includes information from databases such as RefGene, UCSC, Ensembl, dbSNP131&132, OMIM and the NHGRI GWAS catalogue. Global and population-specific non-reference allele frequencies are provided from Complete Genomics, the 1000 Genomes Project and the NHLBI Exome Sequencing Project (ESP), and prediction scores are provided from SIFT, PolyPhen and dbNSFP.
CIDR can provide single-sample and/or multi-sample VCF files. SNVs and small indels are called using the GATK (version 3+) HaplotypeCaller joint-calling gVCF workflow across all samples sequenced for a project. Depending on the size of the project, variant filtering is performed using either GATK's Variant Quality Score Recalibration (VQSR) protocol or "hard" cut-offs.
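As an illustration of what filtering with "hard" cut-offs means, the sketch below applies fixed thresholds to a variant's INFO annotations and returns a VCF-style FILTER value. The thresholds and filter names are assumptions for illustration only: they follow GATK's generic starting points for SNVs (QD < 2.0, FS > 60.0, MQ < 40.0) and are not the cut-offs CIDR applies to any project.

```python
from typing import Dict

# (filter name, INFO annotation, predicate that is True when the site FAILS).
# Hypothetical values for illustration; not CIDR's actual thresholds.
HARD_FILTERS = [
    ("QD2", "QD", lambda v: v < 2.0),    # low variant quality relative to depth
    ("FS60", "FS", lambda v: v > 60.0),  # strong strand bias
    ("MQ40", "MQ", lambda v: v < 40.0),  # poor mapping quality
]


def filter_status(info: Dict[str, float]) -> str:
    """Return 'PASS' or a ';'-joined list of failed filters, as in a VCF FILTER field.

    In this sketch, annotations missing from `info` are simply not evaluated.
    """
    failed = [
        name
        for name, key, fails in HARD_FILTERS
        if key in info and fails(info[key])
    ]
    return ";".join(failed) if failed else "PASS"


good_site = filter_status({"QD": 12.3, "FS": 1.2, "MQ": 60.0})
bad_site = filter_status({"QD": 1.5, "FS": 75.0, "MQ": 60.0})
```

Unlike VQSR, which trains a model on the annotations of the whole call set, each threshold here is applied independently, which is why hard cut-offs remain usable when a project has too few variants to train on.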
The rawest form of data released is the BAM or CRAM file. We prefer to release lossless CRAM files generated using SAMtools 1.3.2.
In addition, we release the genotypes obtained from a high density SNP array (whole exome and whole genome services) or a 96-SNP barcode (custom sequencing service).
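These array genotypes are also the reference for the QC report's heterozygote sensitivity and concordance measures. The sketch below shows one plausible way to compute such measures; the definitions used here, and the encoding of genotypes as alternate-allele counts keyed by site ID, are assumptions for illustration and may differ from what CIDRSeqSuite reports.

```python
from typing import Dict, Optional

# Site ID -> alternate-allele count (0, 1 or 2), or None for a no-call.
Genotypes = Dict[str, Optional[int]]


def compare_to_array(seq: Genotypes, array: Genotypes) -> Dict[str, float]:
    """Compare sequencing genotypes against array genotypes (hypothetical helper).

    Assumed definitions:
      het_sensitivity - fraction of ALL array-heterozygous sites that are also
                        called heterozygous by sequencing (no-calls count as misses)
      het_concordance / hom_concordance - fraction of matching genotypes among
                        sites called on both platforms
    """
    het_sites = het_detected = 0
    compared = {"het": 0, "hom": 0}
    matched = {"het": 0, "hom": 0}

    for site, array_gt in array.items():
        if array_gt is None:
            continue  # no array genotype to compare against
        seq_gt = seq.get(site)
        if array_gt == 1:
            het_sites += 1
            het_detected += int(seq_gt == 1)
        if seq_gt is None:
            continue  # no sequencing call: excluded from concordance
        kind = "het" if array_gt == 1 else "hom"
        compared[kind] += 1
        matched[kind] += int(seq_gt == array_gt)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "het_sensitivity": ratio(het_detected, het_sites),
        "het_concordance": ratio(matched["het"], compared["het"]),
        "hom_concordance": ratio(matched["hom"], compared["hom"]),
    }


array_calls = {"rs1": 1, "rs2": 1, "rs3": 0, "rs4": 2, "rs5": None}
seq_calls = {"rs1": 1, "rs2": None, "rs3": 0, "rs4": 1, "rs5": 1}
metrics = compare_to_array(seq_calls, array_calls)
```

Keeping sensitivity and concordance separate distinguishes sites that sequencing failed to call at all from sites where it made a call that disagrees with the array.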
Although our exploration of new analysis software and new algorithms is ongoing, we use an established analysis pipeline, CIDRSeqSuite, for processing sequence data. The CIDRSeqSuite pipeline is updated frequently to enhance the quality of alignment and variant calling, and details are included with the release.
Please inquire if you have specific questions related to sequencing release formats.