The following protocol summaries describe the generation of paired-end sequencing reads from 16S or 18S PCR amplicons of approx. 400-500 bp, carrying multiple barcodes (i.e., "indices"), on the Illumina MiSeq (300+300 bp reads with ~100-200 bp overlap). They assume up to 384 inputs per run (380 samples + 4 PCR controls). These protocols are a synthesis of multiple sources in the current scientific literature, but draw mainly from the following sources:
- Comeau AM, Douglas GM, Langille M. 2017. Microbiome Helper: A custom and streamlined workflow for microbiome research. mSystems, 2:e00127-16. [overview of the entire IMR wet-lab and bioinformatics pipeline]
- Comeau AM, Li WK, Tremblay JE, Carmack EC, Lovejoy C. 2011. Arctic Ocean microbial community structure before and after the 2007 record sea ice minimum. PLoS ONE, 6:e27492. [initial 16S V6-V8 and 18S V4 primer design/sequences]
- Walters W, et al. 2015. Improved bacterial 16S rRNA gene (V4 and V4-5) and fungal Internal Transcribed Spacer marker gene primers for microbial community surveys. mSystems, 1:e00009-15. [16S V4-V5 primer design/sequences]
- Op De Beeck M, Lievens B, Busschaert P, Declerck S, Vangronsveld J, Colpaert JV. 2014. Comparison and validation of some ITS primer pairs useful for fungal metabarcoding studies. PLoS ONE, 9:e97629. [ITS2 primer design/sequences]
- Earth Microbiome Project (EMP) at www.earthmicrobiome.org/protocols-and-standards/ [blocking protocol for eukaryote contaminants]
- Human Microbiome Project (HMP) at hmpdacc.org/resources/tools_protocols.php [general considerations/extraction for stool samples]
Sample collection issues can be very specific to your sample type (i.e., the medium you are sampling) - many of our IMR projects so far focus on water column "environmental" samples or fecal matter. For the former, varying volumes of water are typically collected onto filters and frozen at -20°C or -80°C in a storage buffer. For the latter, fresh fecal pellets are collected from mice, or human stool is sampled, and then frozen immediately at -20°C or -80°C (without buffer). All samples are kept frozen until the extraction step below. For transcriptomics approaches with environmental, lab culture or biopsy samples (not recommended for stool), samples must be flash-frozen in liquid nitrogen immediately after collection.
DNA/RNA is extracted from the samples using the method/kit appropriate to the specific samples - you may choose a method based on your own experience with your samples, or you may wish to follow our preferred extraction methods. There is no general consensus in the literature on choice of kits, other than that all of them affect the final profiles to some degree and that bead-beating is most probably essential for difficult materials (and/or for samples containing Gram-positive bacteria and protist cysts) - we have evaluated and are currently using the MO BIO PowerFecal DNA Kit with mouse pellets and human fecal samples. We are also testing various DNA kits for use with human urine. We have not yet ventured into RNA kits for metatranscriptomes, but this is something that we will be examining in the near future. Quantification and quality checks are done (via NanoDrop or PicoGreen/Qubit) to verify success. Optional: A gel can be run to verify integrity (generally unnecessary for PCR-only studies, but required for shotgun metagenomic sequencing).
Amplicon fragments are PCR-amplified from the DNA in duplicate, using separate template dilutions (generally 1:1 & 1:10) and the high-fidelity Phusion polymerase. A single round of PCR is done using "fusion primers" (Illumina adaptors + indices + specific regions) targeting either the 16S V6-V8 (Bacteria/Archaea; ~440-450 bp), 16S V4-V5 (primarily Bacteria; ~410 bp), 18S V4 (Eukarya; ~440 bp) or ITS2 (Fungi; variable length, avg. ~350 bp) regions, with multiplexing that allows up to 380 samples to be run. PCR products are verified visually by running them on a high-throughput Hamilton Nimbus Select robot using Coastal Genomics Analytical Gels. Any samples with failed PCRs (or spurious bands) are re-amplified under optimized PCR conditions to produce correct bands, so that the sample plate is complete before continuing. The PCR reactions from the same samples are pooled in one plate, then cleaned up and normalized using the high-throughput Charm Biotech Just-a-Plate 96-well Normalization Kit. The (up to) 380 samples are then pooled to make one library, which is quantified fluorometrically before sequencing.
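As a rough check on how much the two 300 bp reads overlap for each target region, the short Python sketch below subtracts the amplicon length from the combined read length. It assumes the quoted lengths are the sequenced fragment lengths and uses midpoints where a range is given; the function and variable names are illustrative only and are not part of our pipeline.

```python
# Read length of each read in a 300+300 bp paired-end MiSeq run.
READ_LENGTH_BP = 300

# Approximate amplicon lengths quoted in the text
# (assumption: midpoint used where a range is given).
AMPLICON_LENGTHS_BP = {
    "16S V6-V8": 445,
    "16S V4-V5": 410,
    "18S V4": 440,
    "ITS2 (average)": 350,
}


def expected_overlap(amplicon_bp, read_bp=READ_LENGTH_BP):
    """Number of bases covered by both reads of a pair.

    A value of zero or less means the two reads do not meet,
    so the pair could not be stitched into one full-length read.
    """
    # The overlap can never be longer than the amplicon itself.
    return min(amplicon_bp, 2 * read_bp - amplicon_bp)


overlaps = {
    region: expected_overlap(length)
    for region, length in AMPLICON_LENGTHS_BP.items()
}

for region, overlap in overlaps.items():
    print(f"{region}: ~{overlap} bp overlap")
```

Longer amplicons leave less overlap for stitching, which is why a fragment approaching 600 bp would not be a good fit for this read length.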
Microbial (or mtDNA) genomes and community metagenomes are prepared using either: 1) the Illumina Nextera Flex kit for MiSeq+NextSeq, which requires a very small amount of starting material (1 ng) as it is a PCR-based library preparation procedure; or 2) the PacBio SMRTbell Template Prep kit for the Sequel, which requires much more HMW DNA since it is not PCR-based. For the Illumina method, samples are "tagmented" (enzymatically "sheared" and tagged with adaptors), PCR-amplified while adding barcodes, purified using columns or beads, normalized using Illumina beads or manually, then pooled for loading onto the MiSeq or NextSeq. For the PacBio method, a large amount of HMW DNA is mechanically sheared (Covaris gTubes), optionally size-selected, repaired, converted into SMRTbell libraries, cleaned up and normalized, then pooled for loading onto the Sequel.
We are currently in the process of evaluating Illumina (Ribo-Zero + TruSeq Stranded mRNA LT) vs. NuGEN (Ovation Complete Prokaryotic RNA-Seq) kits for the production of sufficiently rRNA-depleted libraries for RNA-Seq. Briefly, after rRNA depletion, remaining mRNAs are converted to cDNA in a way that maintains stranded information, tagged with adaptors+barcodes, PCR amplified, purified using columns or beads, normalized using beads or manually, then pooled for loading onto the MiSeq or NextSeq.
Amplicon samples, small metagenomic sets, and small genomes are run on our Illumina MiSeq using 300+300 bp paired-end V3 chemistry, which allows for overlap and stitching together of paired amplicon reads into one full-length read of higher quality. Output is generally ~20-22 million raw reads and ~13 Gb of sequence, or ~50,000 reads per sample for 380 amplicons. De novo genomes are run on our PacBio Sequel using v3 chemistry, which outputs roughly 30-40 Gb of sequence per cell of variable (long) lengths. For larger metagenomic/metatranscriptomic projects, we run on our hospital-shared Illumina NextSeq 550 using 150+150 bp paired-end "high output" chemistry, generating up to ~400 million raw reads and ~120 Gb of sequence.
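The yield figures above can be cross-checked with some simple arithmetic. The Python sketch below assumes that each "raw read" corresponds to one read pair (so it contributes two reads' worth of bases), which is the interpretation that reproduces the quoted Gb values; the function names are illustrative only.

```python
def yield_gb(read_pairs, read_length_bp):
    """Total sequence in gigabases, assuming two reads per pair."""
    return read_pairs * 2 * read_length_bp / 1e9


def reads_per_sample(read_pairs, n_samples):
    """Average raw read pairs per sample if the run is shared evenly."""
    return read_pairs // n_samples


# MiSeq, 300+300 bp: the lower and upper ends of the quoted raw read range.
miseq_low = yield_gb(20_000_000, 300)
miseq_high = yield_gb(22_000_000, 300)

# NextSeq 550, 150+150 bp "high output": the quoted upper limit.
nextseq_max = yield_gb(400_000_000, 150)

# Raw depth per sample for a full run of 380 amplicon samples.
depth_low = reads_per_sample(20_000_000, 380)
depth_high = reads_per_sample(22_000_000, 380)

print(f"MiSeq yield: {miseq_low:.1f}-{miseq_high:.1f} Gb")
print(f"NextSeq yield: up to {nextseq_max:.1f} Gb")
print(f"Raw reads per sample: {depth_low}-{depth_high}")
```

These are raw averages under an even split; reads lost during quality filtering, and uneven representation between samples, will typically lower the usable depth of any given sample.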
Details of our amplicon and metagenomics pipelines are available at https://github.com/mlangill/microbiome_helper/wiki, but the following is a summary of the major deliverables clients will receive (analyses will require a "mapping file" from the clients containing any relevant metadata for the study):
16S/18S/ITS Amplicon Analysis
- Final ASV/OTU tables in text, BIOM and STAMP formats
- Accompanying QIIME-formatted mapping/metadata file
- FASTA file of representative sequences (one per ASV/OTU)
- Phylogenetic tree of ASVs/OTUs placed within reference sequences
- Taxonomic assignment files at various levels (e.g., phylum, genus)
- Alpha-diversity rarefaction plots + statistics
- Beta-diversity UniFrac plots
- Logfiles from the various major steps in the QC process
- Functional prediction files generated from PICRUSt2.0 (if requested)
Metagenomic Analysis
- FASTA files of the final sequences screened to remove human (or other) contaminants (available upon request)
- Taxonomic composition of the samples from MetaPhlAn 2.0 (text and STAMP files)
- Stratified (by taxa) and unstratified functional prediction files generated from HUMAnN 2.0 (text and STAMP files) for individual UniRef90 gene families and MetaCyc pathways
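Since the analyses above depend on a client-supplied "mapping file", the following Python sketch shows one way to sanity-check such a file before submitting it. It assumes a tab-separated, QIIME 1-style layout in which the first column is headed #SampleID and sample IDs contain only letters, digits and periods; the metadata columns in the example (SampleType, Treatment, Description) are hypothetical, and the exact columns we need for your project should be confirmed with us.

```python
import csv
import io
import re

# Assumed QIIME 1-style rule: sample IDs use letters, digits and periods only.
VALID_SAMPLE_ID = re.compile(r"^[A-Za-z0-9.]+$")


def check_mapping_file(handle):
    """Return a list of problems found in a tab-separated mapping file."""
    rows = list(csv.reader(handle, delimiter="\t"))
    if not rows or not rows[0] or rows[0][0] != "#SampleID":
        return ["first column header must be #SampleID"]

    header = rows[0]
    problems = []
    seen = set()
    # Data rows start on line 2 of the file.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} columns, found {len(row)}"
            )
            continue
        sample_id = row[0]
        if not VALID_SAMPLE_ID.match(sample_id):
            problems.append(f"line {line_no}: invalid sample ID {sample_id!r}")
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample ID {sample_id!r}")
        seen.add(sample_id)
    return problems


# Hypothetical example with two deliberate mistakes:
# an underscore in one sample ID and a repeated sample ID.
example = (
    "#SampleID\tSampleType\tTreatment\tDescription\n"
    "Mouse.01\tfecal\tcontrol\tcage_A\n"
    "Mouse.02\tfecal\tantibiotic\tcage_B\n"
    "Mouse_03\tfecal\tantibiotic\tcage_B\n"
    "Mouse.01\tfecal\tcontrol\tcage_A\n"
)

problems = check_mapping_file(io.StringIO(example))
for problem in problems:
    print(problem)
```

To check a real file, pass an open file handle instead of the in-memory example, for instance `check_mapping_file(open("mapping.txt", newline=""))`, where `mapping.txt` is a placeholder name.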
Custom Bioinformatics: Additional bioinformatic analyses can be requested at an hourly rate or through research collaboration. Please contact us for more details.