Protocols

The following protocol summaries cover the generation and sequencing of WGS/metagenome libraries (Illumina NextSeq or PacBio Sequel 2) and of PCR amplicons with multiple barcodes (i.e., "indices"), sequenced either on the Illumina MiSeq in paired-end mode (amplicons of approx. 400-500 bp, read as 300+300 bp with ~100-200 bp overlap) or as longer reads on the PacBio Sequel 2. They assume an input of up to 384 reactions for the MiSeq (380 samples + 4 PCR controls), 192 for the Sequel 2 (190 samples + 2 PCR controls) or 92 metagenomes for the NextSeq 2000. These protocols are a synthesis of manufacturers' guidelines and our current scientific research and development. They are summarized (primarily the amplicon protocols) in the "IMR paper" that outlines Microbiome Helper, as well as on our GitHub site for Microbiome Helper, which outlines all the steps to follow for the bioinformatics processing, and on our Protocols.io site, which contains the step-by-step wet-lab protocols.
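The 100-200 bp overlap quoted above is simple arithmetic: two 300 bp reads cover 600 bp, so an amplicon of length L overlaps by about 600 - L bp, and an amplicon shorter than a single read lets each read run through into the opposite adaptor. A minimal Python sketch; the function names are hypothetical, and it ignores read trimming and the minimum overlap a given stitching tool requires:

```python
def pe_overlap(amplicon_len: int, read_len: int = 300) -> int:
    """Expected overlap (bp) between the forward and reverse reads of one amplicon.

    Assumes both reads have the same length and the amplicon length includes
    the primers; a value <= 0 means the reads cannot be stitched together.
    """
    return 2 * read_len - amplicon_len


def check_amplicon(amplicon_len: int, read_len: int = 300) -> str:
    """Classify an amplicon length for a 2 x read_len paired-end run."""
    overlap = pe_overlap(amplicon_len, read_len)
    if amplicon_len < read_len:
        # Each read runs past the far primer into the opposite adaptor.
        return "too short: reads run through into the adaptor"
    if overlap <= 0:
        return "too long: reads do not overlap"
    return f"ok: ~{overlap} bp overlap"


for length in (400, 500):
    print(length, check_amplicon(length))
# 400 ok: ~200 bp overlap
# 500 ok: ~100 bp overlap
```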

Sample Collection

Sample collection issues can be very specific to your sample type (i.e., the medium you are sampling); many of our IMR projects so far focus on "environmental" water/soil samples or on fecal matter. For the former, varying volumes of water are typically collected onto filters, or a few grams of soil are collected into tubes, and then frozen at -20°C or -80°C, preferably dry without storage buffer. For the latter, fresh fecal pellets are collected from mice, or human stool is sampled, and then frozen immediately at -20°C or -80°C (again without buffer). All samples are kept frozen until extraction (below). For transcriptomic approaches with environmental, lab culture or biopsy samples (not recommended for stool), samples must be flash-frozen in liquid nitrogen immediately after collection.

DNA/RNA Extraction

DNA/RNA is extracted from the samples using the method/kit appropriate to the specific sample type. You may choose this yourself, since you have experience with your own samples, or you may wish to follow our chosen extraction methods. There is no general consensus in the literature on the choice of kits, other than that all of them affect the final profiles to some degree and that bead-beating is most likely essential for difficult materials (and/or for Gram-positive bacteria and protist cysts). We have evaluated and currently use the QIAGEN PowerFecal DNA Kit for mouse pellets and human fecal samples, the QIAGEN PowerSoil DNA Kit for soil particles, and the QIAGEN PowerWater DNA Kit for water filters. We also tend to use the QIAGEN PowerFecal DNA Kit as a general kit for swab and saliva extractions, as it has good inhibitor removal and overall performance, as well as the necessary bead-beating (note that swab samples are notoriously difficult/variable in output due to their low biomass). We have also tested various DNA kits with human urine, without much success due to its low target biomass. We have not yet ventured into RNA kits for metatranscriptomes, but this is something we will be examining in the near future. Quantification and quality checks are done (primarily via PicoGreen/Qubit, plus NanoDrop) to verify success. Optional: a gel can be run to verify DNA integrity (generally unnecessary for PCR-only studies, but required for WGS).
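The kit choices above amount to a lookup by sample type. A minimal Python sketch; the sample-type labels and the `choose_kit` name are hypothetical, and urine maps to `None` because none of the kits tested so far worked well for it:

```python
from typing import Optional

# Extraction kits named in the text, keyed by (hypothetical) sample-type labels.
EXTRACTION_KITS = {
    "mouse fecal pellet": "QIAGEN PowerFecal DNA Kit",
    "human stool": "QIAGEN PowerFecal DNA Kit",
    "swab": "QIAGEN PowerFecal DNA Kit",    # low biomass: expect variable yields
    "saliva": "QIAGEN PowerFecal DNA Kit",
    "soil": "QIAGEN PowerSoil DNA Kit",
    "water filter": "QIAGEN PowerWater DNA Kit",
    "human urine": None,                    # no tested kit has worked well so far
}


def choose_kit(sample_type: str) -> Optional[str]:
    """Return the kit used for a sample type, or None if none is established."""
    return EXTRACTION_KITS.get(sample_type.strip().lower())


print(choose_kit("Soil"))   # QIAGEN PowerSoil DNA Kit
```

Unknown sample types also return `None`, which here simply means "no established choice; discuss with us".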

Library Preparation

→ 16S/18S/ITS Amplicons

Amplicon fragments are PCR-amplified from the DNA in duplicate, using separate template dilutions (generally 1:1 and 1:10), with the high-fidelity Phusion Plus polymerase. A single round of PCR is done using "fusion primers" (Illumina adaptors + indices + target-specific regions) targeting various sub-regions of the 16S (Bacteria/Archaea), 18S (Eukarya, incl. Fungi) or ITS2 (Fungi only) genes, with multiplexing that allows up to 380 samples to be run together. Preparation for the PacBio Sequel 2 is essentially the same, except that full-length 16S/18S/ITS fusion primers (PacBio barcodes + target-specific regions) are used instead. PCR products are verified visually by running them on Coastal Genomics Analytical Gels on a high-throughput Hamilton Nimbus Select robot. The duplicate PCR reactions from each sample are pooled in one plate, then cleaned up and normalized using the high-throughput Charm Biotech Just-a-Plate 96-well Normalization Kit. Up to 380 (Illumina) or 190 (PacBio) samples are then pooled to make one library, which is quantified fluorometrically before sequencing.
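A fluorometric reading of the final pool (ng/µL) is usually converted to molarity before the pool is diluted to the instrument's loading concentration. A minimal sketch using the standard approximation of 660 g/mol per base pair of double-stranded DNA; the 550 bp fragment length, the 4 nM target and the 20 µL final volume are illustrative assumptions, not values from this protocol (check the instrument's own loading guide):

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration to molarity (nM).

    Uses the common approximation of 660 g/mol per base pair of dsDNA:
    nM = (ng/uL * 1e6) / (660 * fragment length in bp).
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def dilution_volumes(stock_nM: float, target_nM: float, final_ul: float) -> tuple[float, float]:
    """C1*V1 = C2*V2: return (library uL, diluent uL) to reach target_nM."""
    if target_nM > stock_nM:
        raise ValueError("stock is more dilute than the target")
    lib_ul = target_nM * final_ul / stock_nM
    return lib_ul, final_ul - lib_ul


# Hypothetical pool: 5 ng/uL measured fluorometrically, ~550 bp fragments.
stock = ng_per_ul_to_nM(5.0, 550)
print(round(stock, 2))                 # 13.77
lib, diluent = dilution_volumes(stock, 4.0, 20.0)
print(round(lib, 2), round(diluent, 2))  # 5.81 14.19
```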

Currently Available Amplicon Targets/Primers (recommended sets in bold)

Primer set coverages (SILVA TestPrime, 0-2 mismatches) (a): last six columns
Primer Targets | Region(s) | Forward Primer | Reverse Primer | Source(s) | Archaea | Bacteria | Cyanos | Eukarya | mtDNA | chlDNA

Illumina MiSeq short variable region targets

Standard rRNA targets

Universal | V4-V5 (b) | 515FB = GTGYCAGCMGCCGCGGTAA | 926R = CCGYCAATTYMTTTRAGTTT | Parada 2015 / Walters 2015 | 81-93% | 85-95% | 85-94% | 81-94% | 57-77% | 81-93%
Archaea-specific | V6-V8 | A956F = TYAATYGGANTCAACRCC | A1401R = CRGTGWGTRCAAGGRGCA | Comeau 2011 | 71-82% | - | - | 0-89% | 0-1% | 0-1%
Bacteria-specific | V6-V8 | B969F = ACGCGHNRAACCTTACC | BA1406R = ACGGGCRGTGWGTRCAA | Comeau 2011 | 0-14% | 72-83% | 66-88% | 0-1% | 14-75% | 47-87%
Eukaryote-specific | V4 | E572F = CYGCGGTAATTCCAGCTC | E1009R = AYGGTATCTRATCRTCTTYG | Comeau 2011 | - | - | - | 54-92% | 1% | 1%
Fungi-specific | ITS2 (c) | ITS86(F) = GTGAATCATCGAATCTTTGAA | ITS4(R) = TCCTCCGCTTATTGATATGC | Op De Beeck 2014 | n/a | n/a | n/a | n/a | n/a | n/a
Bacteria-specific | V1-V3 (d) | 27Fmod = AGRGTTTGATCMTGGCTCAG | 519R = GWATTACCGCGGCKGCTG | Kim 2013 / Lane 1985 | - | 73-93% | 44-89% | - | 9-75% | 24-87%
Bacteria-specific ("Illumina") | V3-V4 | 341F = CCTACGGGNGGCWGCAG | 805R = GACTACHVGGGTATCTAATCC | Illumina / Klindworth 2013 | 0-90% | 83-95% | 71-93% | - | 11-54% | 49-90%
Bacteria+Archaea-specific ("EMP") | V4 (e) | 515FB = GTGYCAGCMGCCGCGGTAA | 806RB = GGACTACNVGGGTWTCTAAT | Walters 2015 | 84-96% | 84-95% | 76-92% | 0-19% | 52-89% | 64-89%
Cyano-specific | V3-V4 (f) | CYA359F = GGGGAATYTTCCGCAATGGG | CYA781R = GACTACWGGGGTATCTAATCCCWTT | Nübel 1997 | - | 2-5% | 58-88% | - | 1-3% | 35-79%

Metabarcoding targets

All Metazoans | COI | mlCOIintF-XT = GGWACWRGWTGRACWITITAYCCYCC | jgHCO2198 = TAIACYTCIGGRTGICCRAARAAYCA | Wangensteen 2018 / Geller 2013 | n/a | n/a | n/a | n/a | n/a | n/a
Fish | 12S | MiFishU-F = GTCGGTAAAACTCGTGCCAGC | MiFishU-R = CATAGTGGGGTATCTAATCCCAGTTTG | Miya 2015 | n/a | n/a | n/a | n/a | n/a | n/a

PacBio Sequel entire region targets

Standard rRNA targets

Bacteria-specific | Full 16S | 27F(Paliy) = AGRGTTYGATYMTGGCTCAG | 1492R = RGYTACCTTGTTACGACTT | Paliy 2009 / Lane 1991 | - | 72-83% | 72-87% | - | 42-74% | 65-84%
Eukaryote-specific | Full 18S (g) | NSF4/18 = CTGGTTGATYCTGCCAGT | EukR = TGATCCTTCTGCAGGTTCACCTAC | Hendriks 1989 / Medlin 1988 | 0-14% | - | - | 82-92% | 2% | 1%
Fungi-specific | Full ITS | ITS1FKYO2 = TAGAGGAAGTAAAAGTCGTAA | ITS4KYO1 = TCCTCCGCTTWTTGWTWTGC | Toju 2012 | n/a | n/a | n/a | n/a | n/a | n/a
Notes/Details:
a) Green = good coverage; Yellow = moderate coverage; Red = poor/trace coverage. Chloroplast DNA coverage applies to Cyano-derived chloroplasts only.
b) The V4-V5 primer set is our overall recommended set, but it should not be used to assess bacterial diversity in samples with substantial eukaryotic "host/associated" contamination (use the V6-V8 sets instead; however, these are still susceptible to contamination from eukaryotic mitochondria).
c) Although both ITS1 and ITS2 primers have substantial Ascomycete or Basidiomycete biases, the ITS2 region appears to be the more widely recommended of the two, with the highest coverage of other phyla.
d) The 519R primer with this sequence is sometimes called "mod" or "modbio", but the sequence stated here is actually Lane's original sequence.
e) Note that this is the newer EMP set, the older/original primers being 515F (GTGCCAGCMGCCGCGGTAA) + 806R (GGACTACHVGGGTWTCTAAT). These primers are not recommended for 2x300 bp sequencing (which we always perform), as the amplicon (~290 bp) is too short for 300 bp reads; use V4-V5 instead.
f) In our hands, the Cyano primers tend to perform less well in PCR and sequencing than the other sets listed here, but they are the only alternative for more Cyano-specific studies. Note, however, that they will not give more specific annotations than the other bacterial amplicons.
g) The listed coverage for this primer set is based only on the NSF4/18 forward primer values, as there is insufficient coverage of the EukR region in SILVA to get reliable results.
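Many primers in the table are degenerate: IUPAC codes such as Y (C/T), M (A/C) or N (any base) mean each "primer" is really a mixture of oligos. A minimal Python sketch to count and expand those variants and to reverse-complement a primer (reverse primers are written 5' to 3' on the opposite strand, so the site they match in a forward read is their reverse complement). The function names are hypothetical, and treating inosine (I) like N is a simplifying assumption:

```python
from itertools import product
from math import prod

# IUPAC nucleotide codes -> the bases each one matches. Inosine ("I", used in
# the COI primers) is treated like N here, as an approximation for counting.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "I": "ACGT",
}
# Complement of each code (e.g. R=A/G <-> Y=C/T, K=G/T <-> M=A/C, B <-> V, D <-> H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVNI", "TGCAYRSWMKVHDBNI")


def degeneracy(primer: str) -> int:
    """Number of distinct oligos encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """All concrete sequences a degenerate primer stands for."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in primer.upper()))]


def revcomp(seq: str) -> str:
    """Reverse complement, keeping IUPAC ambiguity codes valid."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(degeneracy("GTGYCAGCMGCCGCGGTAA"))   # 515FB: 4
print(degeneracy("GGACTACNVGGGTWTCTAAT"))  # 806RB: 24
```

For example, 806RB encodes 24 variants (N x V x W = 4 x 3 x 2), compared with 4 for 515FB.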

→ (Meta)genomes ("Shotgun")

Microbial (or mtDNA) genomes and community metagenomes are prepared using either: 1) the Illumina Nextera Flex kit (now called the DNA Prep kit) for the MiSeq and NextSeq, which requires a very small amount of starting material (1 ng) as it is a PCR-based library preparation procedure; or 2) the PacBio SMRTbell Prep Kit 3.0 for the Sequel, which requires more high-molecular-weight (HMW) DNA since it is not PCR-based. For the Illumina method, samples are "tagmented" (enzymatically "sheared" and tagged with adaptors), PCR-amplified while adding barcodes, purified using columns or beads, normalized using Illumina beads or manually, then pooled for loading onto the MiSeq or NextSeq. For the PacBio method, the HMW DNA is mechanically sheared (Covaris g-TUBEs), optionally size-selected, repaired, converted into SMRTbell libraries (covalently closed circles), cleaned up and normalized, then pooled for loading onto the Sequel.
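When libraries are normalized manually rather than with beads, each one is typically added to the pool in an equal molar amount. A minimal sketch, assuming each library's molarity has already been measured (e.g., via an ng/µL to nM conversion); the library names and the 20 fmol per library are hypothetical:

```python
def equimolar_volumes(lib_nM: dict[str, float], fmol_each: float) -> dict[str, float]:
    """Volume (uL) of each library that contributes `fmol_each` femtomoles.

    Uses 1 nM = 1 fmol/uL, so volume (uL) = fmol / concentration (nM).
    """
    volumes = {}
    for name, conc in lib_nM.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volumes[name] = fmol_each / conc
    return volumes


def pooled_nM(lib_nM: dict[str, float], volumes: dict[str, float]) -> float:
    """Molarity of the final pool: total fmol divided by total volume (uL)."""
    total_fmol = sum(lib_nM[name] * vol for name, vol in volumes.items())
    return total_fmol / sum(volumes.values())


# Hypothetical libraries, already converted to nM.
libs = {"S1": 10.0, "S2": 4.0, "S3": 2.5}
vols = equimolar_volumes(libs, fmol_each=20.0)
print(vols)                   # {'S1': 2.0, 'S2': 5.0, 'S3': 8.0}
print(pooled_nM(libs, vols))  # 4.0
```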

→ (Meta)transcriptomes ("RNA-Seq")

We recently completed our evaluation of multiple RNA-Seq kits with rRNA depletion and have selected the QIAseq FastSelect depletion + Stranded Total RNA kits to produce sufficiently rRNA-depleted libraries (using three rRNA-removal modules: Bacteria, Plant [which also covers protists] and Fungi). Briefly, pure RNA samples are rRNA-depleted, then the remaining mRNAs are converted to cDNA in a way that preserves strand information, tagged with Illumina adaptors and barcodes, PCR-amplified, purified using beads, and normalized for pooling and loading onto the NextSeq.

Next-Generation Sequencing

Amplicon samples and small genomes are run on our Illumina MiSeq using 300+300 bp paired-end V3 chemistry, which allows overlapping paired amplicon reads to be stitched together into one full-length read of higher quality. Output is generally ~20-22 million raw reads and ~13 Gb of sequence, i.e., ~50,000 reads per sample for 380 amplicon samples. Long amplicons and de novo genomes are run on our PacBio Sequel 2 using the new 8M chips, which output roughly 240-320 Gb of sequence per cell at variable (long) read lengths. For larger metagenomic/metatranscriptomic projects, we use our Illumina NextSeq 2000 with 150+150 bp paired-end "high output" chemistry, generating up to ~1.2 billion raw PE reads and ~345 Gb of sequence.
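The per-sample figures above are back-of-the-envelope divisions of run output by the number of samples. A small sketch, assuming the quoted read counts are clusters (read pairs), which is consistent with ~22 million x 600 bp being ~13 Gb; the function names are hypothetical, and real per-sample depth varies with pooling balance and QC losses:

```python
def run_yield_gb(clusters: float, read1_bp: int, read2_bp: int) -> float:
    """Total sequence (Gb) from a paired-end run: clusters x (read1 + read2)."""
    return clusters * (read1_bp + read2_bp) / 1e9


def reads_per_sample(clusters: float, n_samples: int) -> float:
    """Average read pairs per sample if the pool were perfectly balanced."""
    return clusters / n_samples


print(run_yield_gb(22e6, 300, 300))        # MiSeq 2 x 300 bp: 13.2
print(round(reads_per_sample(20e6, 380)))  # 380 amplicon samples: 52632
print(round(reads_per_sample(1.2e9, 92)))  # NextSeq 2000, 92 metagenomes
```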

Bioinformatics Analyses

Details of our amplicon and metagenomics pipelines are available at https://github.com/mlangill/microbiome_helper/wiki, but the following is a summary of the major deliverables clients will receive (analyses will require a "mapping file" from the clients containing any relevant metadata for the study):
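Since every analysis depends on the client's mapping file, a quick pre-check can catch common problems before submission. A minimal sketch with a hypothetical `validate_mapping` function that checks only a few things: tab separation, a recognized sample-ID header in the first column (a subset of the headers QIIME 2 accepts, compared case-insensitively here for simplicity), unique non-empty IDs, and rows whose cell count differs from the header (a conservative check of our own). Loading the file in QIIME 2 itself (e.g., with `qiime metadata tabulate`) remains the authoritative test:

```python
import csv
import io

# Subset of the ID-column headers QIIME 2 recognizes, lower-cased for this check.
ID_HEADERS = {"id", "sampleid", "sample id", "sample-id", "#sampleid"}


def validate_mapping(text: str) -> list[str]:
    """Return a list of problems found in a tab-separated mapping file."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [r for r in reader if any(cell.strip() for cell in r)]  # drop blank lines
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if header[0].strip().lower() not in ID_HEADERS:
        problems.append(f"first column header {header[0]!r} is not a sample-ID header")
    seen = set()
    # n counts non-blank lines, with the header as line 1.
    for n, row in enumerate(body, start=2):
        if row[0].startswith("#"):
            continue  # comment/directive rows such as "#q2:types"
        sample_id = row[0].strip()
        if not sample_id:
            problems.append(f"row {n}: empty sample ID")
        elif sample_id in seen:
            problems.append(f"row {n}: duplicate sample ID {sample_id!r}")
        seen.add(sample_id)
        if len(row) != len(header):
            problems.append(f"row {n}: expected {len(header)} cells, got {len(row)}")
    return problems


print(validate_mapping("sample-id\tsite\nS1\tgut\nS2\tsoil\n"))  # []
```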

16S/18S/ITS Amplicon Analysis

  • Final ASV tables in text, BIOM and STAMP formats
  • Accompanying QIIME2-formatted mapping/metadata file
  • FASTA file of representative sequences (one per ASV)
  • Phylogenetic tree of ASVs placed within reference sequences
  • Taxonomic assignment files at various levels (ex: phylum, genus, etc.)
  • Alpha-diversity rarefaction plots + statistics
  • Beta-diversity UniFrac plots
  • Logfiles from the various major steps in the QC process
  • Functional prediction files generated from PICRUSt2.0 (if requested)
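Rarefaction subsamples a sample to a fixed read depth (without replacement) so that alpha-diversity values are comparable across samples; a rarefaction plot repeats this at increasing depths. A minimal, self-contained sketch of the idea with a hypothetical four-ASV sample; it is not the QIIME 2 implementation, and expanding reads into a list is fine for illustration but memory-hungry for large tables:

```python
import math
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> Counter:
    """Subsample `depth` reads without replacement from one sample's ASV counts."""
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the sample's total reads")
    return Counter(random.Random(seed).sample(pool, depth))


def observed_asvs(counts) -> int:
    """Number of ASVs with a non-zero count."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over non-zero ASVs."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


sample = {"ASV1": 500, "ASV2": 300, "ASV3": 150, "ASV4": 50}  # hypothetical counts
for depth in (10, 100, 1000):
    sub = rarefy(sample, depth)
    print(depth, observed_asvs(sub), round(shannon(sub), 3))
```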

Metagenomics/Metatranscriptomics Analysis

  • FASTA files of the final sequences screened to remove human (or other) contaminants (available upon request)
  • Taxonomic composition of the samples from Kraken2 (text and STAMP files)
  • Stratified (by taxa) and unstratified functional prediction files generated from MMseqs2+Kraken2 (text and STAMP files) for individual UniRef90 gene families and MetaCyc pathways
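In a stratified table, each function's abundance is broken down by the taxa contributing it, while the unstratified table reports one value per function. For gene-family abundances, the unstratified value is the sum over all strata (including any unclassified portion), as in this minimal sketch with hypothetical family and taxon names. Pathway-level tables can be computed differently by some tools, so they need not sum exactly:

```python
from collections import defaultdict

# Hypothetical stratified rows: (gene family, taxon, abundance).
stratified = [
    ("family_A", "Bacteroides", 12.0),
    ("family_A", "Escherichia", 3.0),
    ("family_B", "Bacteroides", 5.0),
]


def unstratify(rows) -> dict[str, float]:
    """Collapse a stratified table to per-function totals by summing over taxa."""
    totals = defaultdict(float)
    for function, _taxon, abundance in rows:
        totals[function] += abundance
    return dict(totals)


print(unstratify(stratified))  # {'family_A': 15.0, 'family_B': 5.0}
```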

Custom Bioinformatics: Additional bioinformatic analyses can be requested at an hourly rate or through research collaboration. Please contact us for more details.