The following protocol summaries cover the generation and sequencing of WGS/metagenome libraries on the Illumina NextSeq, and of PCR amplicons with multiple barcodes (i.e., "indices") either on the Illumina MiSeq in paired-end mode (amplicons of approx. 400-500 bp; 300+300 bp reads with ~100-200 bp overlap) or as longer reads on the PacBio Sequel. They assume an input of up to 384 (380 samples + 4 PCR controls) for MiSeq or 48 (47 samples + 1 PCR control) for Sequel. These protocols are a synthesis of multiple sources in the current scientific literature. They are summarized (primarily the amplicon protocols) in the "IMR paper" below that outlines Microbiome Helper, as well as on our GitHub site for Microbiome Helper, which lays out all the steps to follow for both the bioinformatics processing and the step-by-step wet-lab protocols (soon to be converted to Protocols.io):
Sample collection issues can be very specific to your sample type (i.e., the medium you are sampling). Many of our IMR projects so far focus on water/soil "environmental" samples or fecal matter. For the former, varying volumes of water are typically collected onto filters, or a few grams of soil are collected into tubes, and frozen at -20°C or -80°C, preferably dry without a storage buffer. For the latter, fresh fecal pellets are collected from mice, or human stool is sampled, and then frozen immediately at -20°C or -80°C (again without buffer). All samples are kept frozen until the extraction described below. For transcriptomics approaches on environmental, lab culture or biopsy samples (not recommended for stool), samples must be flash-frozen in liquid nitrogen immediately after collection.
DNA/RNA is extracted from the samples using the method/kit appropriate to the specific samples. You may choose this yourself for your own samples, since you have experience with them, or you may wish to follow our preferred extraction methods. There is no general consensus in the literature on the choice of kits, other than that all of them affect the final profiles to some degree and that bead-beating is most probably a must for difficult materials (and/or with Gram-positives and protist cysts). We have evaluated and are currently using the QIAGEN PowerFecal DNA Kit with mouse pellets and human fecal samples; the QIAGEN PowerSoil DNA Kit for soil particles; and the QIAGEN PowerWater DNA Kit for water filters. We also tend to use the QIAGEN PowerFecal DNA Kit as a general kit for swab and saliva extractions, as it has good inhibitor removal and overall performance, as well as the necessary bead-beating (note that swab samples are notoriously difficult/variable in output due to low biomass). We have also tested various DNA kits with human urine, without much success due to its low target biomass. We have not yet ventured into RNA kits for metatranscriptomes, but we will be examining this in the near future. Quantification and quality checks are done (primarily via PicoGreen/Qubit, and also NanoDrop) to verify success. Optional: a gel can be run to verify integrity (generally unnecessary for PCR-only studies, but required for WGS sequencing).
Amplicon fragments are PCR-amplified from the DNA in duplicate, from separate template dilutions (generally 1:1 and 1:10), using the high-fidelity Phusion polymerase. A single round of PCR is done using "fusion primers" (Illumina adaptors + indices + specific regions) targeting various sub-regions of the 16S (Bacteria/Archaea), 18S (Eukarya, incl. Fungi) or ITS2 (Fungi only) regions, with multiplexing that allows up to 380 samples to be run. Preparation for PacBio Sequel is essentially the same, except that full-length 16S/18S/ITS fusion primers (PacBio adaptors + barcodes + specific regions) are used instead. PCR products are verified visually by running them on a high-throughput Hamilton Nimbus Select robot using Coastal Genomics Analytical Gels. The PCR reactions from the same samples are pooled in one plate, then cleaned up and normalized using the high-throughput Charm Biotech Just-a-Plate 96-well Normalization Kit. Up to 380 (Illumina) or 47 (PacBio) samples are then pooled to make one library, which is quantified fluorometrically before sequencing.
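The sample counts quoted above (380 samples + 4 PCR controls = 384 for MiSeq) correspond to four 96-well plates. The following is a minimal sketch of such a layout, not our actual plate map: reserving well H12 on each plate for the PCR control, the row-major fill order, and the names `plate_wells`, `layout` and `PCR_control` are all assumptions made for illustration.

```python
from itertools import product

ROWS = "ABCDEFGH"
COLS = range(1, 13)


def plate_wells():
    """Return the 96 well names of a standard plate in row-major order (A1..H12)."""
    return [f"{row}{col}" for row, col in product(ROWS, COLS)]


def layout(sample_ids, n_plates=4, control_well="H12"):
    """Assign samples to (plate, well) positions.

    Assumption (not from the protocol): one well per plate, `control_well`,
    is reserved for the PCR control, leaving 95 sample wells per plate.
    """
    wells = [w for w in plate_wells() if w != control_well]
    capacity = n_plates * len(wells)
    if len(sample_ids) > capacity:
        raise ValueError(f"{len(sample_ids)} samples exceed capacity of {capacity}")

    assignment = {}
    for i, sample_id in enumerate(sample_ids):
        plate, idx = divmod(i, len(wells))
        assignment[(plate + 1, wells[idx])] = sample_id
    for plate in range(1, n_plates + 1):
        assignment[(plate, control_well)] = "PCR_control"
    return assignment


# Hypothetical sample names S001..S380
samples = [f"S{i:03d}" for i in range(1, 381)]
plan = layout(samples)
print(len(plan))          # total occupied wells across the four plates
print(plan[(1, "A1")])    # first sample on plate 1
```

With these assumptions, 95 sample wells x 4 plates gives the 380-sample limit, and a single half-filled plate (`n_plates=1` with 47 samples) would mirror the PacBio case.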
Currently Available Amplicon Targets/Primers (recommended sets in bold)
|Primer Set Coverages (SILVA TestPrime, 0-2 mismatches) (a)|
|Primer Targets||Region(s)||Forward Primer||Reverse Primer||Source(s)||Archaea||Bacteria||Cyanos||Eukarya||mtDNA||chlDNA|
|Illumina MiSeq short variable region targets|
|Universal||V4-V5 (b)||515FB = GTGYCAGCMGCCGCGGTAA||926R = CCGYCAATTYMTTTRAGTTT||Parada 2015 / Walters 2015||81-93%||85-95%||85-94%||81-94%||57-77%||81-93%|
|Archaea-specific||V6-V8||A956F = TYAATYGGANTCAACRCC||A1401R = CRGTGWGTRCAAGGRGCA||Comeau 2011||71-82%||-||-||0-89%||0-1%||0-1%|
|Bacteria-specific||V6-V8||B969F = ACGCGHNRAACCTTACC||BA1406R = ACGGGCRGTGWGTRCAA||Comeau 2011||0-14%||72-83%||66-88%||0-1%||14-75%||47-87%|
|Eukaryote-specific||V4||E572F = CYGCGGTAATTCCAGCTC||E1009R = AYGGTATCTRATCRTCTTYG||Comeau 2011||-||-||-||54-92%||1%||1%|
|Fungi-specific||ITS2 (c)||ITS86(F) = GTGAATCATCGAATCTTTGAA||ITS4(R) = TCCTCCGCTTATTGATATGC||Op De Beeck 2014||n/a||n/a||n/a||n/a||n/a||n/a|
|Bacteria-specific||V1-V3 (d)||27Fmod = AGRGTTTGATCMTGGCTCAG||519R = GWATTACCGCGGCKGCTG||Kim 2013 / Lane 1985||-||73-93%||44-89%||-||9-75%||24-87%|
|Bacteria-specific ("Illumina")||V3-V4||341F = CCTACGGGNGGCWGCAG||805R = GACTACHVGGGTATCTAATCC||Illumina / Klindworth 2013||0-90%||83-95%||71-93%||-||11-54%||49-90%|
|Bacteria+Archaea-specific ("EMP")||V4 (e)||515FB = GTGYCAGCMGCCGCGGTAA||806RB = GGACTACNVGGGTWTCTAAT||Walters 2015||84-96%||84-95%||76-92%||0-19%||52-89%||64-89%|
|Cyano-specific||V3-V4 (f)||CYA359F = GGGGAATYTTCCGCAATGGG||CYA781R = GACTACWGGGGTATCTAATCCCWTT||Nübel 1997||-||2-5%||58-88%||-||1-3%||35-79%|
|PacBio Sequel entire region targets|
|Bacteria-specific||Full 16S||27F(Paliy) = AGRGTTYGATYMTGGCTCAG||1492R = RGYTACCTTGTTACGACTT||Paliy 2009 / Lane 1991||-||72-83%||72-87%||-||42-74%||65-84%|
|Eukaryote-specific||Full 18S (g)||NSF4/18 = CTGGTTGATYCTGCCAGT||EukR = TGATCCTTCTGCAGGTTCACCTAC||Hendriks 1989 / Medlin 1988||0-14%||-||-||82-92%||2%||1%|
|Fungi-specific||Full ITS||ITS1FKYO2 = TAGAGGAAGTAAAAGTCGTAA||ITS4KYO1 = TCCTCCGCTTWTTGWTWTGC||Toju 2012||n/a||n/a||n/a||n/a||n/a||n/a|
a) Green = good coverage; Yellow = moderate coverage; Red = poor/trace coverage. Chloroplast DNA coverage applies to Cyano-derived chloroplasts only.
b) The V4-V5 primer set is our overall recommended set, but it should not be used for bacterial diversity in samples with substantial eukaryote "host/associated" contamination. Use the V6-V8 sets instead, although these are still susceptible to Euk mito contamination.
c) Although both ITS1 and ITS2 primers have substantial Ascomycete or Basidiomycete biases, ITS2 appears to be the more widely recommended of the two regions, with higher coverage of other phyla.
d) The 519R primer with this sequence is sometimes called "mod" or "modbio", but the stated sequence here is the actual Lane original sequence.
e) Note that this is the newer EMP set; the older/original primers are 515F (GTGCCAGCMGCCGCGGTAA) + 806R (GGACTACHVGGGTWTCTAAT). These primers are not recommended for 2x300 bp sequencing (which we always perform), as the amplicon is too small; use V4-V5 instead.
f) The Cyano primers in our hands tend to perform less well in PCR+sequencing than the other sets listed here, but are the only alternative for more Cyano-specific studies. However, note that they will not give more specific annotations than the other bacterial amplicons.
g) The listed coverage for this primer set is from only the NSF4/18 forward primer values as there is insufficient coverage of the EukR region in SILVA to get reliable results.
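The primer sequences in the table use IUPAC degenerate base codes (e.g., Y = C or T, M = A or C, N = any base). As a rough illustration of what those codes mean, the sketch below counts how many concrete oligos a degenerate primer represents and checks whether a primer matches a given sequence exactly. It is an assumption-laden toy: the function names are hypothetical, only perfect matches are considered, and it does not reproduce the SILVA TestPrime coverage values (which allow 0-2 mismatches).

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct non-degenerate oligos encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def primer_regex(primer):
    """Compile a regex where each IUPAC code becomes a character class."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer))


def matches(primer, target):
    """True if `target` (plain A/C/G/T) is exactly one of the primer's variants."""
    return primer_regex(primer).fullmatch(target) is not None


FWD_515FB = "GTGYCAGCMGCCGCGGTAA"   # newer EMP forward primer (from the table)
REV_806RB = "GGACTACNVGGGTWTCTAAT"  # newer EMP reverse primer (from the table)

print(degeneracy(FWD_515FB))  # Y and M each cover 2 bases -> 4 variants
print(degeneracy(REV_806RB))  # N (4) x V (3) x W (2) -> 24 variants

# One concrete variant of the original 515F (GTGCCAGCMGCCGCGGTAA, with M = A)
print(matches(FWD_515FB, "GTGCCAGCAGCCGCGGTAA"))
```

The last call shows that the newer 515FB still covers variants of the original 515F, since its added Y position includes the original C.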
Microbial (or mtDNA) genomes and community metagenomes are prepared either using: 1) the Illumina Nextera Flex kit for MiSeq+NextSeq, which requires a very small amount of starting material (1 ng) as it is a PCR-based library preparation procedure; or 2) the PacBio SMRTbell Express Template Prep kit for the Sequel, which requires much more HMW DNA since it is not PCR-based. For the Illumina method, samples are "tagmented" (enzymatically "sheared" and tagged with adaptors), PCR amplified while adding barcodes, purified using columns or beads, normalized using Illumina beads or manually, then pooled for loading onto the MiSeq or NextSeq. For the PacBio method, a large amount of HMW DNA is mechanically sheared (Covaris gTubes), optionally size-selected, repaired, converted into SMRTbell libraries (covalently closed circles), cleaned-up and normalized, then pooled for loading onto the Sequel.
We recently completed our evaluation of multiple RNA-Seq kits with rRNA depletion and have selected the QIAseq FastSelect depletion + Stranded Total RNA kits for the production of sufficiently rRNA-depleted libraries (using three rRNA modules = Bacteria + Plant[covers Protists] + Fungi). Briefly, pure RNA samples are rRNA depleted, then the remaining mRNAs are converted to cDNA in a way that maintains stranded information, tagged with Illumina adaptors+barcodes, PCR amplified, purified using beads, then normalized for pooling + loading onto the NextSeq.
Amplicon samples and small genomes are run on our Illumina MiSeq using 300+300 bp paired-end V3 chemistry, which allows paired amplicon reads to overlap and be stitched together into one full-length read of higher quality. Output is generally ~20-22 million raw reads and ~13 Gb of sequence, or ~50,000 reads per sample for 380 amplicons. Long amplicons and de novo genomes are run on our PacBio Sequel using v3 chemistry, which outputs roughly 30-40 Gb of sequence per cell in reads of variable (long) lengths. For larger metagenomic/metatranscriptomic projects, we run on our Illumina NextSeq 550 using 150+150 bp paired-end "high output" chemistry, generating up to ~400 million raw reads and ~120 Gb of sequence.
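The figures above can be sanity-checked with some simple arithmetic. This is a back-of-the-envelope sketch only: it assumes that the quoted "raw reads" are counted as read pairs (so each contributes two reads' worth of bases) and that samples are pooled perfectly evenly, neither of which is stated explicitly. The function names are hypothetical.

```python
def overlap_bp(amplicon_len, read_len=300):
    """Expected overlap between forward and reverse reads for one amplicon."""
    return 2 * read_len - amplicon_len


def yield_gb(read_pairs, read_len):
    """Total sequence in Gb, assuming each counted read is a pair of read_len reads."""
    return read_pairs * 2 * read_len / 1e9


def reads_per_sample(total_reads, n_samples):
    """Average depth per sample under perfectly even pooling (an idealization)."""
    return total_reads / n_samples


# MiSeq 300+300 bp: amplicons of 400-500 bp leave ~200-100 bp of overlap
for length in (400, 500):
    print(length, overlap_bp(length))

# MiSeq: 20-22 million raw reads over 380 samples, before any QC losses
for total in (20e6, 22e6):
    print(round(reads_per_sample(total, 380)), round(yield_gb(total, 300), 1))

# NextSeq 550 high output: ~400 million reads at 150+150 bp
print(round(yield_gb(400e6, 150)))
```

Under these assumptions the raw per-sample depth comes out slightly above the quoted ~50,000 reads; uneven pooling means individual samples will vary around that figure.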
Details of our amplicon and metagenomics pipelines are available at https://github.com/mlangill/microbiome_helper/wiki, but the following is a summary of the major deliverables clients will receive (analyses will require a "mapping file" from the clients containing any relevant metadata for the study):
16S/18S/ITS Amplicon Analysis
- Final ASV/OTU tables in text, BIOM and STAMP formats
- Accompanying QIIME-formatted mapping/metadata file
- FASTA file of representative sequences (one per ASV/OTU)
- Phylogenetic tree of ASVs/OTUs placed within reference sequences
- Taxonomic assignment files at various levels (ex: phylum, genus, etc.)
- Alpha-diversity rarefaction plots + statistics
- Beta-diversity UniFrac plots
- Logfiles from the various major steps in the QC process
- Functional prediction files generated with PICRUSt2 (if requested)
Metagenomic Analysis
- FASTA files of the final sequences screened to remove human (or other) contaminants (available upon request)
- Taxonomic composition of the samples from Kraken2 (text and STAMP files)
- Stratified (by taxa) and unstratified functional prediction files generated from MMseqs2+Kraken2 (text and STAMP files) for individual UniRef90 gene families and MetaCyc pathways
Custom Bioinformatics: Additional bioinformatic analyses can be requested at an hourly rate or through research collaboration. Please contact us for more details.