The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific

PLoS Biol. 2007 Mar;5(3):e77.

PMID: 17355176
PMCID: PMC1821060
DOI: 10.1371/journal.pbio.0050077

Douglas B Rusch et al. PLoS Biol. 2007 Mar.

Affiliation

J. Craig Venter Institute, Rockville, Maryland, United States of America.

Abstract

The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand-km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome (despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique); (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
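The "unique at a 98% sequence identity cutoff" statistic can be illustrated with a toy calculation. The sketch below is not the GOS pipeline: it assumes reads are already trimmed to equal length and compared position by position without gaps, whereas real reads would need local alignment over their overlap. The function names `ungapped_identity` and `fraction_unique` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by position."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def fraction_unique(seqs: list[str], cutoff: float = 98.0) -> float:
    """Fraction of sequences with no other sequence at or above `cutoff` percent identity."""
    has_partner = [False] * len(seqs)
    # All-against-all comparison: fine for a toy example, quadratic in the number of reads.
    for i, j in combinations(range(len(seqs)), 2):
        if ungapped_identity(seqs[i], seqs[j]) >= cutoff:
            has_partner[i] = has_partner[j] = True
    return sum(not p for p in has_partner) / len(seqs)
```

For example, with three 50-bp toy reads where two differ at a single position (98% identity), only the third read counts as unique, so `fraction_unique` returns 1/3.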

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

**Figure 1. Sampling Sites**
Microbial populations were sampled from locations in the order shown. Samples were collected at intervals of approximately 200 miles (320 km) along the eastern North American coast, through the Gulf of Mexico, and into the equatorial Pacific. Samples 00 and 01 identify sets of sites sampled as part of the Sargasso Sea pilot study [19]. Samples 27 through 36 were collected off the Galapagos Islands (see inset). Sites shown in gray were not analyzed as part of this study.

**Figure 2. Fragment Recruitment Plots**
The horizontal axis of each panel corresponds to a 100-kb segment of genomic sequence from the indicated reference microbial genome. The vertical axis indicates the sequence identity of an alignment between a GOS sequence and the reference genomic sequence, ranging from 100% (top) to 50% (bottom). Individual GOS sequencing reads are colored to reflect the sample from which they were isolated; geographically nearby samples have similar colors (see Poster S1 for key). Each organism shows a distinct pattern of recruitment reflecting its origin and relationship to the environmental data collected during the course of this study. (A) P. ubique HTCC1062 recruits the greatest density of GOS sequences of any genome examined to date. The GOS sequences show geographic stratification into bands, with sequences from temperate water samples off the North American coast having the highest identity (yellow to yellow-green colors). At lower identity, sequences from all the marine environments could be aligned to HTCC1062. (B) P. marinus MIT9312 recruits a large number of GOS sequences into a single band that zigzags between 85% and 95% identity on average. These sequences are largely derived from warm-water samples in the Gulf of Mexico and eastern Pacific (green to greenish-blue reads). (C) P. marinus MED4 recruits largely the same set of reads as MIT9312 (B), though the sequences that form the zigzag recruit at a substantially lower identity. A small number of sequences from the Sargasso Sea samples (red) are found at high identity. (D) P. marinus NATL2A recruits far fewer sequences than any of the preceding panels. Like MED4, it recruits a small number of high-identity sequences from the Sargasso samples. (E) P. marinus MIT9313 is a deep-water, low-light–adapted strain of *Prochlorococcus*. GOS sequences were recruited almost exclusively at low identity, in vertical stacks that correspond to the locations of conserved genes.
On the left side of this panel is a very distinctive pattern of recruitment that corresponds to the highly conserved 16S and 23S rRNA gene operon. (F) P. marinus CCMP1375, another deep-water, low-light–adapted strain, does not recruit GOS sequences at high identity; only stacks of sequences corresponding to the locations of conserved genes are seen. (G) *Synechococcus* WH8102 recruits a modest number of high-identity sequences, primarily from the Sargasso Sea samples. A large number of moderate-identity matches from the Pacific and hypersaline lagoon (GS33) samples are also visible. (H) *Synechococcus* CC9605 recruits largely the same sequences as *Synechococcus* WH8102 but was isolated from Pacific waters. GOS sequences from some of the Pacific samples recruit at high identity, while sequences from the Sargasso and hypersaline lagoon samples (bluish-purple) were recruited at moderate identities. (I) *Synechococcus* CC9902 is distantly related to both of the preceding *Synechococcus* strains. Although this strain also recruits largely the same sequences as the WH8102 and CC9605 strains, those sequences recruit at significantly lower identity. (J–O) Fragment recruitment plots to extreme assemblies seeded with phylogenetically informative sequences. Using this approach, it is possible not only to assemble contigs with strong similarity to known genomes but also to identify contigs from previously uncultured genomes. In each case a 100-kb segment from an extreme assembly is shown, and each plot shows a distinct pattern of recruitment that distinguishes it from the others. (J) Seeded from a Prochlorococcus marinus-related sequence, this contig recruits a broad swath of GOS sequences corresponding to those that form the zigzag on the P. marinus MIT9312 recruitment plot (see [B] or Poster S1 for comparison). (K–L) Seeded from SAR11 clones, these contigs show significant synteny to the known P. ubique HTCC1062 genome.
(K) is strikingly similar to previous recruitment plots to the HTCC1062 genome (see [A] or Poster S1). In contrast, (L) identifies a different strain that recruits high-identity GOS sequences primarily from the Sargasso Sea samples (red). (M–O) These three panels show recruitment plots to contigs belonging to the uncultured *Actinobacter, Roseobacter,* and SAR86 lineages.
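The raw material of a fragment recruitment plot is a set of read-to-reference alignments reduced to (position, identity, sample) points. A minimal sketch, assuming a hypothetical `Alignment` record produced by some upstream aligner (the caption does not specify one) and computing identity as identical positions divided by alignment length:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alignment:
    """One GOS read aligned to a reference genome (hypothetical record layout)."""
    read_id: str
    sample: str       # sample label, e.g. "GS33"
    ref_start: int    # 0-based start of the alignment on the reference
    matches: int      # identical aligned positions
    aligned_len: int  # alignment columns, including mismatches and gaps


def recruitment_points(alignments, seg_start, seg_len=100_000, min_identity=50.0):
    """Return sorted (x, percent identity, sample) points for one plotted reference segment."""
    points = []
    for aln in alignments:
        if not seg_start <= aln.ref_start < seg_start + seg_len:
            continue  # alignment starts outside the plotted 100-kb window
        identity = 100.0 * aln.matches / aln.aligned_len
        if identity >= min_identity:  # the plots' y-axis spans 50% to 100%
            points.append((aln.ref_start - seg_start, identity, aln.sample))
    return sorted(points)
```

Plotting the returned points with position on x, identity on y, and color keyed to sample yields a panel like those above; the 50% floor mirrors the plots' vertical axis rather than any documented recruitment threshold.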

**Figure 3. Population Structure and Variation as Revealed by Phylogeny**
Phylogenies were produced using neighbor-joining. There is significant within-clade variation, as well as an absence of strong geographic structure, among variants of SAR11 (P. ubique HTCC1062) and P. marinus MIT9312: similar reads are not necessarily from similar locations, and reads from similar locations are not necessarily similar. (A) Geographic distribution of SAR11 proteorhodopsin variants. Key to coloration: blue, Pacific; pink, Atlantic. (B) Geographic distribution of *Prochlorococcus* variants. Key to coloration: blue, Pacific; pink, Atlantic. (C) Origins of spectral tuning of SAR11 proteorhodopsins. Reads are colored according to whether they contain the L (green) or Q (blue) variant at the spectral tuning residue described in the text. The choice of tuning residue is lineage-restricted, but each variant must have arisen on two separate occasions.

**Figure 4. Categories of Recruitment Metadata**
The recruitment metadata distinguishes eight general categories based on the relative placement of paired-end sequencing reads (mated reads) recruited to a reference sequence, compared with their known orientation and separation on the clone from which they were derived. Assuming the orientation is correct, two mated reads can be recruited closer together than, farther apart than, or within the distance expected given the size of the source clone; these pairs are categorized as “short,” “long,” or “good,” respectively. Alternatively, the mated reads may be recruited in a mis-oriented fashion, which takes precedence over separation; these reads are categorized as “normal,” “anti-normal,” or “outie.” There are two further categories. “No mate” indicates that no mated read was available for recruitment, possibly due to sequencing error. Perhaps the most useful of the recruitment categories, “missing” mates indicate that although a mated sequence was available, it was not recruited to the reference. “Missing” mates identify breaks in synteny between the environmental data and the reference sequence.
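The category rules above amount to a small decision procedure. The sketch below assumes that mates on a clone are expected to face each other (forward read on the left, reverse read on the right), that a clone-size window of mean ± k standard deviations defines “good,” and that “normal” and “anti-normal” denote both mates on the forward or both on the reverse strand. The caption does not define these conventions, so treat them, and the names `Hit` and `classify_mates`, as illustrative assumptions.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Placement of one recruited read on the reference (hypothetical layout)."""
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def classify_mates(read: Hit, mate: Optional[Hit], mate_sequenced: bool,
                   mean_insert: float, sd_insert: float, k: float = 3.0) -> str:
    """Assign one of the eight recruitment-metadata categories to a recruited read."""
    if not mate_sequenced:
        return "no mate"   # no mated read exists to recruit
    if mate is None:
        return "missing"   # mate exists but did not recruit to this reference
    left, right = sorted([read, mate])  # order the pair by start coordinate
    # Orientation is checked first: a mis-oriented pair is not scored on distance.
    if left.strand == "-" and right.strand == "+":
        return "outie"     # reads face away from each other
    if left.strand == right.strand:
        return "normal" if left.strand == "+" else "anti-normal"  # assumed convention
    # Expected orientation (reads facing each other): compare the implied clone size.
    size = right.end - left.start
    if size < mean_insert - k * sd_insert:
        return "short"
    if size > mean_insert + k * sd_insert:
        return "long"
    return "good"
```

As in the caption, orientation is tested before separation, so a mis-oriented pair is never scored as short or long.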

**Figure 5. Fragment Recruitment at Sites of Rearrangements**
Environmental sequences recruited near breaks in synteny have characteristic patterns of recruitment metadata. Indeed, each of five basic rearrangements (i.e., insertion, deletion, translocation, inversion, and inverted translocation) produced a distinct pattern when examining the recruitment metadata. Here, example recruitment plots for each type of rearrangement have been artificially generated. The “good” and “no mate” categories have been suppressed. In each case, breaks in synteny are marked by the presence of stacks of “missing” mate reads. The presence or absence of other categories distinguishes each type of rearrangement from the others.
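Because breaks in synteny show up as stacks of “missing”-mate reads, a first-pass screen can simply count those reads in windows along the reference. This is a hedged sketch, not the authors' method: the fixed, non-overlapping windows and the `window` and `min_reads` values are hypothetical, and distinguishing the five rearrangement types would additionally require the other metadata categories.

```python
from __future__ import annotations

from collections import Counter


def missing_mate_stacks(missing_positions, window=1_000, min_reads=5):
    """Start coordinates of windows holding at least `min_reads` 'missing'-mate reads.

    missing_positions: reference coordinates of reads whose mate did not recruit.
    """
    counts = Counter(pos // window for pos in missing_positions)  # window index -> count
    return sorted(w * window for w, n in counts.items() if n >= min_reads)
```

Candidate windows returned here would then be inspected for the accompanying categories that distinguish an insertion from, say, an inversion.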

**Figure 6. Examples of Chimeric Extreme Assemblies**
(A) Fragment recruitment to an extreme assembly contig indicates the assembly is chimeric between two organisms, based on dramatic shifts in density of recruitment, level of conservation, and sample distribution. (B) Fragment recruitment to a SAR11-related extreme assembly. Changes in color, density, and vertical location toward the top of the figure indicate transitions among multiple subtypes of SAR11.

**Figure 7. Fragment Recruitment Plots to 20-kb Segments of SAR11-Like Contigs Show That Many SAR11 Subtypes, with Distinct Distributions, Can Be Separated by Extreme Assembly**
Each segment is constructed of a unique set of GOS sequencing reads (i.e., no read was used in more than one segment). Segments are arbitrarily labeled (A–X) for reference in Figure 8.

**Figure 8. Phylogeny of GOS Reads Aligning to P. ubique HTCC1062 Upstream of 16S Gene Indicates That the Extreme Assemblies in Figure 7 Correspond to Monophyletic Subtypes**
Coloring of branches indicates that the corresponding reads align at >90% identity to the extreme assembly segments shown in Figure 7; colored labels (A–X) correspond to the labels in Figure 7, indicating the segment or segments to which reads aligned.

**Figure 9. Presence and Abundance of Dominant Ribotypes**
The relative abundance of various ribotypes (rows) in each filter (columns) is represented by the area of the corresponding spot (if any). Each listed ribotype satisfied the following criteria in at least one filter: it was among the five most abundant ribotypes detected in the shotgun data, and it was represented by at least three sequencing reads. Relative abundance is based on the total number of 16S sequences in a given filter. Order and grouping of filters are based on the clustering of genomic similarity shown in Figure 11. Ribotype order was determined by similarity of sample distribution. A marked contrast between temperate and tropical groups is visible. Estuarine samples GS11 and GS12 contained a mix of ribotypes seen in freshwater and temperate marine samples, while samples from nonmarine habitats or larger filter sizes were pronounced outliers. The presence of large amounts of *Burkholderia* and *Shewanella* in one Sargasso Sea sample (GS00a) makes this sample look much less like other Sargasso and tropical marine samples than it otherwise would. Note that 16S read counts are not a direct measure of cell abundance, since 16S genes can be present in multiple copies per genome.
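The two inclusion criteria for listed ribotypes, and the relative-abundance denominator, can be written out directly. In this sketch, per-filter 16S read counts per ribotype are assumed to be available as `collections.Counter` objects; the function names are hypothetical, and ties at the fifth rank are broken by first insertion, a case the caption does not address.

```python
from __future__ import annotations

from collections import Counter


def dominant_ribotypes(filters: dict[str, Counter], top_n: int = 5,
                       min_reads: int = 3) -> set[str]:
    """Ribotypes ranked in the top `top_n` with at least `min_reads` reads in at least one filter."""
    selected: set[str] = set()
    for counts in filters.values():
        # most_common orders equal counts by first insertion (simplification for ties)
        for ribotype, n in counts.most_common(top_n):
            if n >= min_reads:
                selected.add(ribotype)
    return selected


def relative_abundance(counts: Counter) -> dict[str, float]:
    """Fraction of a filter's 16S reads assigned to each ribotype."""
    total = sum(counts.values())
    return {ribotype: n / total for ribotype, n in counts.items()}
```

Because the denominator is all 16S reads in the filter, the resulting fractions inherit the multicopy caveat noted at the end of the caption.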

**Figure 10. Similarity between Samples in Terms of Shared Genomic Content**
Genomic similarity, as described in the text, is an estimate of the amount of the genetic material in two filters that is “the same” at a given percent identity cutoff: not the amount of sequence in common in a finite dataset, but rather the amount in the total set of organisms present on each filter. Similarities are shown for 98% identity. (A) Hierarchical clustering of samples based on pairwise similarities. (B) Pairwise similarities between samples, represented as a symmetric matrix of grayscale intensities; a darker cell in the matrix indicates greater similarity between the samples corresponding to the row and column, with row and column ordering as in (A). Groupings of similar filters appear as subtrees in (A) and as darker squares of two or more adjacent rows and columns in (B). Colored bars highlight groups of samples described in the text; labels are approximate characterizations rather than being strictly true of every sample in a group.
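Panel (A) clusters samples from their pairwise similarities. The caption does not state the linkage rule, so the sketch below assumes average linkage (the mean pairwise similarity between members of two clusters) and takes the similarity matrix as given; estimating genomic similarity itself, as described in the text, is not shown. `average_linkage` is a hypothetical name.

```python
from __future__ import annotations


def average_linkage(names: list[str], sim: list[list[float]]):
    """Agglomerative clustering of samples from a symmetric similarity matrix.

    Average linkage: the similarity between two clusters is the mean pairwise
    similarity of their members. Returns the dendrogram as nested tuples of names.
    """
    # cluster id -> (subtree, indices of member samples)
    clusters = {i: (name, [i]) for i, name in enumerate(names)}

    def avg_sim(a: int, b: int) -> float:
        ma, mb = clusters[a][1], clusters[b][1]
        return sum(sim[i][j] for i in ma for j in mb) / (len(ma) * len(mb))

    next_id = len(names)
    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(x, y) for k, x in enumerate(ids) for y in ids[k + 1:]]
        a, b = max(pairs, key=lambda p: avg_sim(*p))  # most similar pair merges first
        merged = (clusters[a][0], clusters[b][0])
        members = clusters[a][1] + clusters[b][1]
        del clusters[a], clusters[b]
        clusters[next_id] = (merged, members)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

Reading the nested tuples from the inside out gives the merge order; each inner tuple corresponds to a subtree of the kind that appears as a darker square in a similarity matrix ordered by the tree.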

**Figure 11. Sample Similarity at 90% Identity**
Similarity between samples in terms of shared genomic content, as in Figure 10, except that a 90% identity cutoff was used; this cutoff has proven reasonable for separating some moderately diverged subtypes.

**Figure 12. Distribution of Common Proteorhodopsin Variants across GOS Samples**
The leucine (L) and methionine (M) variants absorb maximally in the green spectrum (Oded Beja, personal communication), while the glutamine (Q) variant absorbs maximally in the blue spectrum. The relative abundance of each variant is shown as a percentage (x-axis) per sample (y-axis). Total abundance of all variants, in read equivalents normalized by the abundance of recA protein, is shown on the right side of the y-axis. The L and Q variants show a nonrandom distribution: the L variant is abundant in temperate Atlantic waters close to the U.S. and Canadian coast, whereas the Q variant is abundant in warmer waters farther from land. The M variant is moderately abundant in a wide range of samples, with no obvious geographic or environmental association.
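The two quantities plotted per sample, variant percentages and a total normalized by recA, reduce to simple arithmetic. A sketch assuming per-sample read equivalents for each variant and for recA are already tallied (the tallying itself is not shown, and `variant_profile` is a hypothetical name):

```python
from __future__ import annotations


def variant_profile(variant_reads: dict[str, float], reca_reads: float):
    """Per-sample variant percentages plus total variant abundance normalized by recA.

    variant_reads: read equivalents per proteorhodopsin variant, e.g. {"L": 12, "M": 3, "Q": 5}.
    reca_reads: read equivalents of recA in the same sample.
    """
    total = sum(variant_reads.values())
    percentages = {v: 100.0 * n / total for v, n in variant_reads.items()}
    return percentages, total / reca_reads
```

For a sample with 12, 3, and 5 read equivalents of L, M, and Q and 10 of recA, this gives 60%, 15%, and 25%, and a normalized total of 2.0.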

Comment in

Global ocean sampling collection.
Parthasarathy H, Hill E, MacCallum C. PLoS Biol. 2007 Mar;5(3):e83. doi: 10.1371/journal.pbio.0050083. PMID: 17355178.
Untapped bounty: sampling the seas to survey microbial biodiversity.
Gross L. PLoS Biol. 2007 Mar;5(3):e85. doi: 10.1371/journal.pbio.0050085. Epub 2007 Mar 13. PMID: 20076663.

Cited by

VIROME: a standard operating procedure for analysis of viral metagenome sequences.
Wommack KE, Bhavsar J, Polson SW, Chen J, Dumas M, Srinivasiah S, Furman M, Jamindar S, Nasko DJ. Stand Genomic Sci. 2012 Jul 30;6(3):427-39. doi: 10.4056/sigs.2945050. Epub 2012 Jul 27. PMID: 23407591.
Diversity of viral photosystem-I psaA genes.
Hevroni G, Enav H, Rohwer F, Béjà O. ISME J. 2015 Aug;9(8):1892-8. doi: 10.1038/ismej.2014.244. Epub 2014 Dec 23. PMID: 25535938.
New abundant microbial groups in aquatic hypersaline environments.
Ghai R, Pašić L, Fernández AB, Martin-Cuadrado AB, Mizuno CM, McMahon KD, Papke RT, Stepanauskas R, Rodriguez-Brito B, Rohwer F, Sánchez-Porro C, Ventosa A, Rodríguez-Valera F. Sci Rep. 2011;1:135. doi: 10.1038/srep00135. Epub 2011 Oct 31. PMID: 22355652.
Not All Particles Are Equal: The Selective Enrichment of Particle-Associated Bacteria from the Mediterranean Sea.
López-Pérez M, Kimes NE, Haro-Moreno JM, Rodriguez-Valera F. Front Microbiol. 2016 Jun 22;7:996. doi: 10.3389/fmicb.2016.00996. eCollection 2016. PMID: 27446036.
Novel N4 Bacteriophages Prevail in the Cold Biosphere.
Zhan Y, Buchan A, Chen F. Appl Environ Microbiol. 2015 Aug;81(15):5196-202. doi: 10.1128/AEM.00832-15. Epub 2015 May 29. PMID: 26025897.

References

1. Whitman WB, Coleman DC, Wiebe WJ. Prokaryotes: The unseen majority. Proc Natl Acad Sci U S A. 1998;95:6578–6583.
2. Beja O, Koonin EV, Aravind L, Taylor LT, Seitz H, et al. Comparative genomic analysis of archaeal genotypic variants in a single population and in two different oceanic provinces. Appl Environ Microbiol. 2002;68:335–345.
3. DeLong EF, Pace NR. Environmental diversity of bacteria and archaea. Systematic Biol. 2001;50:1–9.
4. Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, et al. Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proc Natl Acad Sci U S A. 2006;103:12115–12120.
5. Garrity GM. Bergey's manual of systematic bacteriology. New York: Springer-Verlag; 2001.
