Protocols

The following protocol summaries cover the generation and sequencing of WGS/metagenome libraries (Illumina NextSeq or PacBio Vega) and of PCR amplicons with multiple barcodes (i.e., "indices"). Amplicons are sequenced either on the Illumina MiSeq i100 in paired-end mode, giving merged fragments of approx. 400-500 bp (300+300 bp with ~100-200 bp overlap), or as longer reads on the PacBio Vega. The protocols assume an input of up to 384 libraries for the MiSeq (380 samples + 4 PCR controls), 192 for the Vega (190 samples + 2 PCR controls) or 92 metagenomes for the NextSeq 2000. These protocols are a synthesis of manufacturers' guidelines and our current scientific research and development. They are summarized (primarily the amplicon protocols) in the "IMR paper" that outlines Microbiome Helper, on our GitHub site for Microbiome Helper, which outlines all the steps to follow for the bioinformatics processing, and on our Protocols.io site containing the step-by-step wet-lab protocols.

Sample Collection

Sample collection issues can be very specific to your sample type (i.e., the medium which you are sampling) - many of our IMR projects so far focus on water/soil "environmental" samples or fecal matter. For the former, varying volumes of water are typically collected onto filters or a few grams of soil are collected into tubes and frozen at -20°C or -80°C, preferably dry without a storage buffer. For the latter, fresh fecal pellets are collected from mice or human stool is sampled and then frozen immediately at -20°C or -80°C (again without buffer). All samples are kept frozen until extraction below. For transcriptomics approaches for environmental, lab culture or biopsy samples (not recommended for stool), samples must be flash-frozen in liquid nitrogen immediately after collection.

DNA/RNA Extraction

DNA/RNA is extracted from the samples using the method/kit appropriate to the specific samples - you may choose this yourself for your own samples since you have experience with them, or you may wish to follow our preferred extraction methods. There is no general consensus in the literature on choice of kits, other than that all of them affect the final profiles to some degree and that bead-beating is most probably a must for difficult materials (and/or with Gram-positive bacteria and protist cysts). We have currently evaluated and are using the QIAGEN PowerFecal DNA Kit with mouse pellets and human fecal samples; the QIAGEN PowerSoil DNA Kit for soil particles; and the QIAGEN PowerWater DNA Kit for water filters. We also tend to use the QIAGEN PowerFecal DNA Kit as a general kit for swab and saliva extractions, as it has good inhibitor removal and overall performance, as well as the necessary bead-beating (note that swab samples are notoriously difficult/variable in output due to low biomass). We have also tested various DNA kits for use with human urine without much success due to the low target biomass of these samples. We have not yet ventured into RNA kits for metatranscriptomes, but this is something that we will be examining in the near future. Quantification and quality checks are done (primarily via PicoGreen/Qubit, and also NanoDrop) to verify success. Optional: A gel can be run to verify integrity (generally unnecessary for PCR-only studies, but required for WGS sequencing).

Library Preparation

→ 16S/18S/ITS Amplicons

Amplicon fragments are PCR-amplified from the DNA in duplicate, with separate template dilutions (generally 1:1 & 1:10), using the high-fidelity Phusion Plus polymerase. A single round of PCR is done using "fusion primers" (Illumina adaptors + indices + specific regions) targeting various sub-regions of the 16S (Bacteria/Archaea), 18S (Eukarya, incl. Fungi) or ITS2 (Fungi only) genes, with multiplexing which allows up to 380 samples to be run. Preparation for PacBio Vega is essentially the same, except full-length 16S/18S/ITS fusion primers (PacBio barcodes + specific regions) are used instead. PCR products are verified visually by running them on Coastal Genomics Analytical Gels using a high-throughput Hamilton Nimbus Select robot. The PCR reactions from the same samples are pooled in one plate, then cleaned up and normalized. Up to 380 (Illumina) or 190 (PacBio) samples are then pooled to make one library, which is quantified fluorometrically before sequencing.

Currently Available Amplicon Targets/Primers (recommended sets in bold)

					Primer Set Coverages (SILVA TestPrime, 0-2 mismatches)^a
Primer Targets	Region(s)	Forward Primer	Reverse Primer	Source(s)	Archaea	Bacteria	Cyanos	Eukarya	mtDNA	chlDNA
Illumina MiSeq short variable region targets
Standard rRNA targets
Universal	V4-V5^b	515FB = GTGYCAGCMGCCGCGGTAA	926R = CCGYCAATTYMTTTRAGTTT	Parada 2015 / Walters 2015	81-93%	85-95%	85-94%	81-94%	57-77%	81-93%
Archaea-specific	16S V6-V8	A956F = TYAATYGGANTCAACRCC	A1401R = CRGTGWGTRCAAGGRGCA	Comeau 2011	71-82%	-	-	0-89%	0-1%	0-1%
Bacteria-specific	16S V6-V8	B969F = ACGCGHNRAACCTTACC	BA1406R = ACGGGCRGTGWGTRCAA	Comeau 2011	0-14%	72-83%	66-88%	0-1%	14-75%	47-87%
Bacteria-specific	16S V3-V4	341F = CCTACGGGNGGCWGCAG	805R = GACTACHVGGGTATCTAATCC	Illumina / Klindworth 2013	0-90%	83-95%	71-93%	-	11-54%	49-90%
Eukaryote-specific	18S V4	E572F = CYGCGGTAATTCCAGCTC	E1009R = AYGGTATCTRATCRTCTTYG	Comeau 2011	-	-	-	54-92%	1%	1%
Fungi-specific	ITS2^c	ITS86(F) = GTGAATCATCGAATCTTTGAA	ITS4(R) = TCCTCCGCTTATTGATATGC	Op De Beeck 2014	n/a	n/a	n/a	n/a	n/a	n/a
Bacteria-specific	16S V1-V3^d	27Fmod = AGRGTTTGATCMTGGCTCAG	519R = GWATTACCGCGGCKGCTG	Kim 2013 / Lane 1985	-	73-93%	44-89%	-	9-75%	24-87%
Bacteria+Archaea-specific ("EMP")	16S V4^e	515FB = GTGYCAGCMGCCGCGGTAA	806RB = GGACTACNVGGGTWTCTAAT	Walters 2015	84-96%	84-95%	76-92%	0-19%	52-89%	64-89%
Cyano-specific	16S V3-V4^f	CYA359F = GGGGAATYTTCCGCAATGGG	CYA781R = GACTACWGGGGTATCTAATCCCWTT	Nübel 1997	-	2-5%	58-88%	-	1-3%	35-79%
Arbuscular Mycorrhiza (AMF)-specific	18S V4^g	WANDA = CAGCCGCGGTAATTCCAGCT	AML2 = GAACCCAAACACTTTGGTTTCC	Lee 2008 / Dumbrell 2011	-	n/a	n/a	n/a	n/a	n/a
Metabarcoding targets (require preamplification)
All Metazoans	COI	mlCOIintF-XT = GGWACWRGWTGRACWITITAYCCYCC	jgHCO2198 = TAIACYTCIGGRTGICCRAARAAYCA	Wangensteen 2018 / Geller 2013	n/a	n/a	n/a	n/a	n/a	n/a
Fish	12S	MiFishU-F = GTCGGTAAAACTCGTGCCAGC	MiFishU-R = CATAGTGGGGTATCTAATCCCAGTTTG	Miya 2015	n/a	n/a	n/a	n/a	n/a	n/a
PacBio Vega entire region targets
Standard rRNA targets
Archaea-specific	Full 16S	Arch21Ftrim = TCCGGTTGATCCYGCCGG	A1401R = CRGTGWGTRCAAGGRGCA	Reysenbach 2000 / Comeau 2011	58-91%	-	-	0-6%	-	-
Bacteria-specific	Full 16S	27F(Paliy) = AGRGTTYGATYMTGGCTCAG	1492R = RGYTACCTTGTTACGACTT	Paliy 2009 / Lane 1991	-	72-83%	72-87%	-	42-74%	65-84%
Eukaryote-specific	Full 18S^h	NSF4/18 = CTGGTTGATYCTGCCAGT	EukR = TGATCCTTCTGCAGGTTCACCTAC	Hendriks 1989 / Medlin 1988	0-14%	-	-	82-92%	2%	1%
Fungi-specific	Full ITS	ITS1FKYO2 = TAGAGGAAGTAAAAGTCGTAA	ITS4KYO1 = TCCTCCGCTTWTTGWTWTGC	Toju 2012	n/a	n/a	n/a	n/a	n/a	n/a

Notes/Details:
a) Green = good coverage; Yellow = moderate coverage; Red = poor/trace coverage. Chloroplast DNA coverage applies to Cyano-derived chloroplasts only.
b) The V4-V5 primer set is our overall recommended set, but should not be used for bacterial diversity in samples with substantial eukaryote "host/associated" contamination (use the V6-V8 sets instead, although these are still susceptible to Euk mito contamination).
c) Although both ITS1 and ITS2 primers have substantial Ascomycete or Basidiomycete biases, the ITS2 region seems to be the more recommended region of the two with highest coverage of other phyla.
d) The 519R primer with this sequence is sometimes called "mod" or "modbio", but the stated sequence here is the actual Lane original sequence.
e) Note this is the newer EMP set, the older/original primers being 515F (GTGCCAGCMGCCGCGGTAA) + 806R (GGACTACHVGGGTWTCTAAT). These primers are not recommended for 2x300 bp sequencing (which we usually perform) as the amplicon is too small - use V4-V5 instead.
f) The Cyano primers in our hands tend to perform less well in PCR+sequencing than the other sets listed here, but are the only alternative for more Cyano-specific studies. However, note that they will not give more specific annotations than the other bacterial amplicons.
g) As AMF abundance appears to be quite variable, depending on geography/latitude, preamplification is recommended (but not essential) to get reliable results.
h) The listed coverage for this primer set is from only the NSF4/18 forward primer values as there is insufficient coverage of the EukR region in SILVA to get reliable results.
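
The primers in the table above are written with IUPAC ambiguity codes (e.g. Y = C or T, M = A or C). As a rough illustration of what those codes mean in practice, the following Python sketch expands a degenerate primer into a regular expression, counts how many distinct oligos it represents, and searches a sequence for an exact match. The function names are hypothetical, inosine (I, used in the COI primers) is assumed here to pair with any base, and only perfect matches are found, whereas the coverages in the table allow 0-2 mismatches. Reverse primers would also need to be reverse-complemented before searching the same strand, which this sketch does not do.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
    # Assumption: inosine is treated as matching any base.
    "I": "[ACGT]",
}


def primer_to_regex(primer):
    """Translate a degenerate primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def degeneracy(primer):
    """Number of distinct non-degenerate oligos the primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base].strip("[]"))
    return count


def find_primer(primer, sequence):
    """Return the 0-based start of the first exact match, or -1 if none."""
    match = re.search(primer_to_regex(primer), sequence.upper())
    return match.start() if match else -1


if __name__ == "__main__":
    fwd_515fb = "GTGYCAGCMGCCGCGGTAA"  # 515FB from the table above
    print(primer_to_regex(fwd_515fb))
    print(degeneracy(fwd_515fb))
```

For example, one concrete variant of the older 515F primer quoted in note e), GTGCCAGCAGCCGCGGTAA, is matched by the 515FB pattern because Y covers C and M covers A.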

→ (Meta)genomes ("Shotgun")

Microbial (or mtDNA) genomes and community metagenomes are prepared either using: 1) the Illumina Nextera Flex kit for MiSeq+NextSeq (now called DNA Prep kit), which requires a very small amount of starting material (1 ng) as it is a PCR-based library preparation procedure; or 2) the PacBio SMRTbell Prep Kit 3.0 for the Vega, which requires more HMW DNA (90% >7 kb) since it is not PCR-based. For the Illumina method, samples are "tagmented" (enzymatically "sheared" and tagged with adaptors), PCR amplified while adding barcodes, purified using columns or beads, normalized using Illumina beads or manually, then pooled for loading onto the MiSeq or NextSeq. For the PacBio method, the HMW DNA is mechanically sheared (if needed), optionally size-selected, repaired, converted into SMRTbell libraries (covalently closed circles), cleaned-up and normalized, then pooled for loading onto the Vega.

→ (Meta)transcriptomes ("RNA-Seq")

We recently completed our evaluation of multiple RNA-Seq kits with rRNA depletion and have selected the Zymo-Seq RiboFree Total RNA Library kit for the production of sufficiently rRNA-depleted libraries (using universal Bacteria/Archaea + Eukarya removal). Briefly, pure RNA samples are rRNA depleted, then the remaining mRNAs are converted to cDNA in a way that maintains stranded information, tagged with Illumina adaptors+barcodes, PCR amplified, purified using beads, then normalized for pooling + loading onto the NextSeq.

Next-Generation Sequencing

Amplicon samples and small genomes are run on our Illumina MiSeq i100 using 300+300 bp XLEAP chemistry, which allows for overlap and stitching together of paired amplicon reads into one full-length read of higher quality. Output is generally ~30-32 million raw reads and ~20 Gb of sequence = ~75,000 reads per sample for 380 amplicons. Long amplicons and de novo genomes are run on our PacBio Vega using SMRTcells, which output roughly 20 Gb of HiFi sequence per cell of variable (long) lengths. For larger metagenomic/metatranscriptomic projects, we run on our Illumina NextSeq 2000 using 150+150 bp paired-end chemistry (usually on P3 cells), generating up to ~1.2 billion raw PE reads and ~360 Gb of sequence.
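
The throughput figures above can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation with hypothetical function names. It assumes that the quoted "raw reads" are read pairs (clusters), that reads are divided evenly among libraries, and that every read reaches full length, none of which holds exactly on a real run, so the results are upper bounds of the same order as the quoted values.

```python
def reads_per_sample(total_reads, n_libraries):
    """Average reads per library, assuming a perfectly even pool."""
    return total_reads / n_libraries


def yield_gb(read_pairs, read_length_bp):
    """Sequence yield in gigabases for paired-end reads of equal length."""
    return read_pairs * 2 * read_length_bp / 1e9


if __name__ == "__main__":
    # MiSeq i100, 300+300 bp: lower end of the quoted 30-32 million reads,
    # spread over a full plate of 384 libraries (380 samples + 4 controls).
    print(f"{reads_per_sample(30_000_000, 384):,.0f} reads per library")
    print(f"{yield_gb(30_000_000, 300):.1f} Gb")

    # NextSeq 2000, 150+150 bp: up to ~1.2 billion read pairs.
    print(f"{yield_gb(1_200_000_000, 150):.0f} Gb")
```

Under these assumptions, 30 million read pairs over 384 libraries gives about 78,000 reads per library and 18 Gb, close to the ~75,000 reads and ~20 Gb quoted, and 1.2 billion pairs at 150+150 bp gives the quoted ~360 Gb.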

Bioinformatics Analyses

Details of our amplicon and metagenomics pipelines are available at https://github.com/mlangill/microbiome_helper/wiki, but the following is a summary of the major deliverables clients will receive (analyses will require a "mapping file" from the clients containing any relevant metadata for the study):

→ 16S/18S/ITS Amplicon Analysis

Final ASV tables in text, BIOM and STAMP formats
Accompanying QIIME2-formatted mapping/metadata file
FASTA file of representative sequences (one per ASV)
Phylogenetic tree of ASVs placed within reference sequences
Taxonomic assignment files at various levels (ex: phylum, genus, etc.)
Alpha-diversity rarefaction plots + statistics
Beta-diversity UniFrac plots
Logfiles from the various major steps in the QC process
Functional prediction files generated from PICRUSt2.0 (if requested)
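
Since the amplicon analyses start from a client-supplied mapping file, it can help to sanity-check that file before submission. The following Python sketch is one possible check, not our pipeline's own validation: it assumes a tab-separated file whose first column is headed "sample-id" or "#SampleID" (other headers that QIIME 2 may accept are not handled here), and the metadata column names in the example are purely hypothetical.

```python
import csv
import io

# Assumption: only these two ID column headers are recognized by this sketch.
ID_HEADERS = {"sample-id", "#SampleID"}


def check_mapping(tsv_text):
    """Return a list of human-readable problems found in a mapping file."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or not rows[0] or rows[0][0] not in ID_HEADERS:
        return ["first column header must be one of: "
                + ", ".join(sorted(ID_HEADERS))]

    header = rows[0]
    problems = []
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if not row or all(cell.strip() == "" for cell in row):
            continue  # skip blank lines
        if row[0].startswith("#"):
            continue  # skip comment/directive lines after the header
        sample_id = row[0].strip()
        if not sample_id:
            problems.append(f"line {line_no}: empty sample ID")
        elif sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample ID {sample_id}")
        seen.add(sample_id)
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} columns, "
                            f"found {len(row)}")
    return problems


if __name__ == "__main__":
    # Hypothetical example with two metadata columns.
    example = (
        "sample-id\tsample_type\ttreatment\n"
        "S1\tsoil\tcontrol\n"
        "S2\twater\ttreated\n"
    )
    print(check_mapping(example))
```

An empty list means no problems were found by these particular checks; it does not guarantee the file satisfies every requirement of the downstream tools.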

→ Metagenomics/Metatranscriptomics Analysis

FASTA files of the final sequences screened to remove human (or other) contaminants (available upon request)
Taxonomic composition of the samples from Kraken2 (text and STAMP files)
Stratified (by taxa) and unstratified functional prediction files generated from MMseqs2+Kraken2 (text and STAMP files) for individual UniRef90 gene families and MetaCyc pathways

→ Custom Bioinformatics

Additional bioinformatic analyses can be requested at an hourly rate or through research collaboration. Please contact us for more details.