Introduction
The parallels between phenotype and lineage diversification, and the underlying genetic mechanisms of rapid phenotypic change during divergence, are topics that continue to receive considerable attention in evolutionary biology (Schluter ; Gillespie ; Nosil ; Nadeau and Jiggins ; Devitt et al. ; Conte et al. ; Kronforst and Papa ). Geographic variation in insect coloration, particularly with respect to mimicry, has provided some of the most intriguing examples of the ecological and evolutionary drivers of phenotypic divergence and convergence among closely related populations and species (Bates ; Müller ; Williams ; Kronforst and Papa ). These systems can be especially useful as a means of elucidating the roles of genetic isolation and exchange among lineages (Mallet and Joron ; Nosil ; Hines et al. ; Heliconius Genome Consortium ), both of which may be important for the distribution of adaptive variation. For example, divergent phenotypes may stem from accumulation of unique genetic variants among populations due to reproductive barriers, drift, and selective pressures that maintain differentiation in distinct environments (Nosil et al. ). Conversely, the spread of beneficial alleles through hybridization and selection may contribute adaptive phenotypic evolution across species (Seehausen ; Gompert et al. ; Mallet ; Hines et al. ). Understanding the relative contributions of divergence and admixture in populations that are in the process of diversifying may help reveal the evolutionary forces at play in the origins of phenotypic variation like coloration.
Bumble bees commonly exhibit extraordinary color‐pattern variation (Plowright and Owen ; Williams ) that provides opportunities for investigating the interaction of phenotypic polymorphism and diversification in closely related lineages. Convergence of sympatric species on similar pigmentation patterns has produced more than 30 “Müllerian mimicry complexes” globally (Plowright and Owen ; Williams ). Intraspecific populations in different geographic regions may evolve different color patterns to match local mimicry complexes, while multiple species in the same region phenotypically converge (Williams ; Owen et al. ; Hines and Williams ), and untangling the relationships between genetic divergence and phenotype can thus be a challenge. Indeed, associations between phylogeny and color at deeper timescales are often weak (Cameron et al. ; Hines and Williams ), suggesting rapid evolution of pigmentation changes that might be attributed to the relatively small set of “color‐pattern elements” (Rapti et al. ) and associated genes. However, lineage associations with color are found at lower phylogenetic levels (Duennes et al. ; Hines and Williams ; Lozier et al. ), which may ultimately be most useful for illuminating the processes involved in pigmentation evolution.
Bombus (Pyrobombus) bifarius Cresson is among the more dramatic cases of divergence in abdominal coloration in North American bumble bees (Stephen ; Williams et al. ) (Fig. ). In the USA, two major color forms belong to two of the main North American regional color pattern groups (Plowright and Owen ; Williams ); populations in the easternmost parts of the species range in the southern Colorado Rocky Mountains, primarily in Colorado (CO) and southeastern Wyoming (WY), exhibit bright red hair color on the second and third abdominal tergites (red‐banded), while those in the westernmost Pacific region have black colored hairs (black‐banded). Geographically intermediate populations have much more variable pigmentation, usually with some degree of mixed coloration between the more extreme red‐ and black‐banded forms (Lozier et al. ; Williams et al. ) (Fig. ). Subspecific names have been applied to reflect geographic color‐pattern differences (Stephen ): B. bifarius bifarius Cresson corresponds to the red‐banded form occurring primarily in easternmost populations (henceforth bifarius‐E), whereas B. bifarius nearcticus Handlirsch is more common and widespread, ranging from the black‐banded forms of the western‐most US (nearcticus‐W) but also including more intermediate mixed‐color forms from centrally located populations (nearcticus‐C) (Stephen ).
The evolutionary status of B. bifarius lineages remains unclear (Lozier et al. ; Williams et al. ). Microsatellites show structuring consistent with both phenotype and subspecies designations, distinguishing the eastern red‐banded bifarius‐E populations from the remaining nearcticus populations that show more complex patterns of genetic and phenotypic variation but generally cluster together (Lozier et al. , ). However, fairly weak differentiation, mixed assignment of individuals to multiple genetic groups, and gradients in both allele frequency and pigmentation through intermediate populations all suggest the possibility of ongoing gene flow between two major population groups (Lozier et al. , ). The relatively small numbers of markers analyzed may have limited the resolution of analyses, however, and the apparent connectivity could stem from weak drift in large regional bumble bee populations, combined with the high variability of microsatellites. Resolving relationships among color forms will be a key for revealing the evolutionary changes that contribute to phenotype in this species.
In this study, we clarify the relationships among lineage and color‐pattern differentiation across the major red‐vs‐black B. bifarius pigmentation groups of the contiguous USA, employing two complementary genomic approaches. Given the diversity of next generation sequencing methods available for evolutionary analysis, our goal was to strengthen conclusions by incorporating multiple sequencing and processing strategies, while also taking advantage of the unique information that might be provided by different genomic data sources. We perform transcriptome sequencing (RNAseq) (Renaut et al. ; Geraldes et al. ; De Wit et al. ; De Wit and Palumbi ) and double digest restriction enzyme associated DNA (ddRAD) sequencing (Davey et al. ; Peterson et al. ) to generate markers. ddRAD offers large numbers of short sequences adjacent to restriction cut sites throughout the genome and should reflect a relatively random sample of genomic variation. RNAseq generates large numbers of SNPs in expressed gene regions that may reveal distinct evolutionary patterns in protein coding regions compared to ddRAD sequences. We sequence representative samples of phenotypically distinguishable bees from CO (bifarius‐E) through Utah (UT) and WY (nearcticus‐C), and Oregon (OR) (nearcticus‐W) to investigate phylogeographic relationships among extreme and intermediate color forms and evaluate whether existing subspecies names accurately reflect distinct evolutionary lineages. We utilize this large number of loci to test whether divergence among color forms is limited or widespread across the genome, and to examine admixture hinted at by previous studies to determine whether phenotypically intermediate nearcticus‐C bees exhibit any hybrid ancestry between the more discrete bifarius‐E and nearcticus‐W phenotypes.
Methods
Samples
Microsatellite studies (Lozier et al. , ) provide basic knowledge of range‐wide population structure of B. bifarius in the western USA. We shift the focus here to increased genome‐scale sampling to explore relationships among select representatives from the two extreme black‐ and red‐banded B. bifarius color forms and geographically and phenotypically intermediate bees (Fig. ). Flying worker bees were sampled by sweep netting across a color pattern and spatial transect from Oregon, Wyoming, Utah, and Colorado (Fig. D; Table ), and pooled into western (OR), central (WY/UT), and eastern (CO) populations based on geography and phenotype. With the focus of this study on expanded genomic coverage for coarse‐scale comparison of divergence and admixture across major lineages, rather than a fine‐scale analysis of population structure, samples should be representative of regional lineages for such “subspecies‐level” questions (Lozier et al. , ). Samples were collected into RNAlater (Invitrogen, Carlsbad, CA), stored at 4°C overnight, transferred to dry ice, and stored at −80°C. RNA was isolated with RNeasy kits (Qiagen, Valencia, CA) or trizol (Invitrogen) from head and thorax tissues. DNA isolations for ddRAD sequencing (Peterson et al. ) were performed following a modified DNeasy protocol (Lozier ).
Collection site information, including the population and phenotypic groupings used for population genetic analyses, and sample sizes (N) for RNA sequencing (RNAseq) and double digest restriction site associated DNA tag (ddRAD) sequencing used in analyses, after removal of putative siblings (see )Site | State | County | Latitude | Longitude | B. bifarius subsp. (T2‐3 color) | N RNAseq | N ddRAD |
CO01 | Colorado | Boulder | 39.994 | −105.418 | bifarius‐E (red) | 1 | 1 |
CO02 | Colorado | Boulder | 39.940 | −105.555 | bifarius‐E (red) | 2 | 8 |
CO03 | Colorado | Routt | 40.349 | −106.750 | bifarius‐E (red) | 2 | 1 |
CO04 | Colorado | Routt | 40.448 | −106.800 | bifarius‐E (red) | 1 | 1 |
OR01 | Oregon | Klamath | 42.396 | −122.201 | nearcticus‐W (black) | 2 | 8 |
OR02 | Oregon | Clackamas | 45.301 | −121.769 | nearcticus‐W (black) | 1 | 1 |
OR03 | Oregon | Clackamas | 45.331 | −121.708 | nearcticus‐W (black) | 1 | 1 |
OR04 | Oregon | Clackamas | 45.256 | −121.712 | nearcticus‐W (black) | 2 | 5 |
OR09 | Oregon | Jackson | 42.073 | −122.754 | nearcticus‐W (black) | 1 | 1 |
UT01 | Utah | Cache | 41.735 | −111.823 | nearcticus‐C (mixed) | 1 | 1 |
WY01 | Wyoming | Lincoln | 42.559 | −110.895 | nearcticus‐C (mixed) | – | 1 |
WY03 | Wyoming | Teton | 43.657 | −110.797 | nearcticus‐C (mixed) | 3 | 7 |
WY04 | Wyoming | Teton | 43.657 | −110.790 | nearcticus‐C (mixed) | 3 | 4 |
WY06 | Wyoming | Teton | 43.669 | −110.821 | nearcticus‐C (mixed) | 1 | 1 |
2T2‐3 coloration = primary pigment classification for the second and third abdominal tergites
Bumble bees are social and sampling sibs can produce artificial population structure, although collecting sister workers is relatively rare (Cameron et al. ) and we attempted to maximize independence of samples by collecting small numbers of bees from multiple localities to reflect each region (Table ). Thus after preliminary SNP calling using all sequenced individuals (using methods below), we estimated pairwise relatedness within bifarius‐E, nearcticus‐W, and nearcticus‐C using VCFtools (Yang et al. ; Danecek et al. ). Relatedness estimates were ~1.0 for self and ~0.0 for the majority of non‐self pairs. A small number of non‐self pairs exhibited relatedness near or greater than 0.5, and as they were always from the same collection area, they were considered putative sisters, and we retained only one sample from each pair (RNAseq: only one bee removed from nearcticus‐C; RADseq: 1 removed from bifarius‐E, 2 from nearcticus‐W, 2 from nearcticus‐C). Ultimately, a total of 22 (including a single B. vosnesenskii worker as an out‐group) and 41 unrelated diploid workers were included in RNAseq and ddRAD experiments, respectively (Table ), numbers that should prove sufficient for determining major lineage relationships with the large number of markers afforded by genome‐scale data (Felsenstein ; Novembre & Ramachandran ; Rheindt et al. ).
RNAseq analysis
Transcriptome libraries were generated by the Hudson Alpha Institute for Biotechnology (Huntsville, AL) from 500 ng total RNA, and run on an Illumina HiSeq 2000 with 100 bp paired‐end chemistry, producing 17.2 × 106 (1.9 × 106 S.D.) reads per individual (Appendix S1). Trim Galore! (
RNAseq variant detection
SNPs were called once for both B. bifarius and B. vosnesenskii, and then separately with B. vosnesenskii excluded. Samtools (mpileup, bcftools, and varFilter; Li et al. ) was used for variant calling (Quinn et al. ; Singhal ), ignoring indels and SNPs within 4 bp of a gap. To further minimize artifacts associated with RNAseq SNP calling, we used vcftools to filter variants. The final data sets comprised 91,282 SNPs that were variable across the three species and 46,259 SNPs that were variable within B. bifarius alone, each with ≥12× coverage per sample (more stringent, e.g., 20×, filters did not substantially alter mean SNP depth or final results), tested for excess heterozygosity (e.g., Taylor et al. ) in either bifarius‐E or nearcticus‐W (nearcticus‐C populations were not screened to allow for the possibility of admixture generating many heterozygotes), and with ≤20% missing data. Although linkage is weak in bees (Whitfield et al. ; Stolle et al. ; Sadd et al. ), for certain analyses where unlinked SNPs would be beneficial, the data set was thinned to one variant per 20 kb. For some analyses, SNPs were classified as synonymous or nonsynonymous from B. impatiens 2.0 genome annotations using SNPeff (Cingolani et al. ). With the caveat that the B. impatiens draft is not perfectly annotated and annotations may not always transfer across species, SNPs falling outside of any B. impatiens exons or that produced SNPeff warnings (e.g., from incomplete or incorrect annotations) were excluded from classification; visual inspections of alignments suggested that remaining sites appeared accurately classified.
Analysis of RNAseq variants
Data manipulation for some analyses relied on scripts from De Wit et al. () and PGDSpider (Lischer and Excoffier ). To identify signatures of average differentiation and admixture at the individual level, we performed principal components analysis (PCA) on RNAseq SNPs using smartpca and twstats (Patterson et al. ; Price et al. ) after removing all missing data. To quantify average differentiation at the population level, pairwise population FST [SNPs with minor allele frequency (MAF) > 0.05] was calculated with Arlequin 3.5 (Excoffier and Lischer ), with significance tested by 10,000 permutations. To distinguish whether the patterns of differentiation or admixture were restricted or widespread across the genome, SNP‐specific FST values, sliding window analysis, and allele frequencies were calculated in vcftools. As a final test to determine whether the intermediate phenotype of nearcticus‐C could be explained by admixture between the more discrete color forms, three‐population tests of admixture in nearcticus‐C were conducted in Treemix 1.2 (Pickrell and Pritchard ) for the same data sets using the f3 statistic (Reich et al. ; Patterson et al. ) with the form f3(nearcticus‐C; bifarius‐E, nearcticus‐W).
To better describe phylogeographic relationships and quantify relative divergence among lineages, we performed two phylogenetic analyses on the RNAseq SNPs. First, SNPs were concatenated for maximum likelihood analysis with RAxML 8.0.20 (Stamatakis ) with 100 bootstrap replicates. To take advantage of SNP‐only models in RAxML, sites were converted to FASTA format, heterozygous sites coded as missing data, and invariant sites removed (Final N = 52,081 SNPs) as described in the manual. As suggested for SNP‐only data in RAxML, we used the general time reversible model, a gamma model of rate heterogeneity and ascertainment correction, and B. vosnesenskii and B. impatiens as out‐groups. The final tree was largely unaffected by the exact method of sample processing (e.g., removing heterozygous sites, including all SNPs vs. thinned data) or the model employed. Forcing tree‐like structures is not always ideal for individuals, and concatenation violates the reality that independent gene trees underlie unlinked loci, but phylogenetic analysis can provide an overall distance among individuals and populations with respect to B. impatiens and B. vosnesenskii (e.g., Rheindt et al. ; Soria‐Carrasco et al. ). Such problems can also be alleviated by modeling population‐level relationships and evolutionary parameters within a coalescent framework (Bryant et al. ; Fujita et al. ). Thus, we used the Bayesian coalescent species tree method SNAPP (Bryant et al. ), which is designed to estimate relationships from unlinked bi‐allelic data, to gauge relative divergence times. We utilized only synonymous sites (likely to be closest to neutral) with no missing data from the unlinked 20 kb‐thinned SNP set. This considerably improved the performance of SNAPP without sacrificing the inferred topology, which, as for RAxML, was stable in a variety of test data sets with greater or fewer numbers of loci (not shown). SNPs were coded as “0 = homozygous derived allele”, “1 = heterozygote”, and “2 = homozygous B. impatiens allele.” We estimated the u and v parameters from the data (Drummond and Bouckaert ). Other parameters were set as default, except for exploring sensitivity to the gamma distribution priors on θ = 3Neμ (for haplodiploids) by adjusting the α and β parameters to account for a range of possible population sizes (e.g., Rheindt et al. ). Because of demographic parameter conversion uncertainty, only relative divergences are presented, but topologies were unaffected by parameter settings. For default priors, 5 × 105 generations (with 10% burn‐in) was sufficient to produce effective sample sizes of 200 in Tracer 1.6 (Rambaut and Drummond ); however, alterations to default priors required increases to >1.5 × 106 generations in some runs. Results were qualitatively similar regardless of run length.
As a final test of the relative contributions from divergence and admixture to the evolution of B. bifarius lineages, we made use of the inferred phylogenetic relationships for demographic modeling with ∂a∂i v1.7 (Gutenkunst et al. ) under a three‐population isolation with migration model allowing population size changes and symmetrical migration. The folded site frequency spectra were estimated with the full B. bifarius‐only SNP set projected down to 12 sampled chromosomes per population. No parameter rescaling was performed, as we were only interested in comparisons of relative migration rates between the three populations.
Double digest RADtag (ddRAD) sequencing
ddRAD libraries and sequencing
ddRAD sequencing allowed increased sampling for each lineage and provides a complementary source of genomic data to RNAseq that should more randomly sample the genome. We prepared libraries following Peterson et al. (). Genomic DNA (250 ng) was digested with 1 μL each EcoRI (100,000 U/μL) and MspI (20,000 U/μL microliter) (New England Biolabs, Ipswitch, MA) following manufacturer protocol. Digested samples were cleaned with Agencourt Ampure XP beads (Beckman Coulter, Danvers, MA). Ligation of double stranded adapters with unique inline barcodes followed Peterson et al. (). Barcoded samples were pooled and 225–375 bp regions were selected using a Pippin Prep (Sage Science) 2% agarose gel cassette and confirmed via PCR and electrophoresis. Size‐selected fragments were purified with M‐270 Streptavidin Dynabeads (Invitrogen). To incorporate primer regions for Illumina sequencing libraries, the size‐selected product was split into five replicates and amplified with Phusion High‐Fidelity DNA polymerase (manufacturer's protocol) for 12 cycles. Products were pooled and Ampure purified. The library was sequenced on the Illumina MiSeq platform (150 bp chemistry) at Hudson Alpha.
ddRAD sequence processing
Sequences were demultiplexed and cleaned using Stacks v1.23 (Catchen et al. ) process_radtags. Reads with multiple errors in the barcode and cut site and with average Phred quality score <10 within a sliding window (default 15% of total read length) were discarded. Adapter sequences were identified and removed and all reads trimmed to 135 bp (shorter reads excluded). We elected to construct ddRAD loci de novo and treat them as random genomic marker data, as is common in diverse genome reduction genotyping studies (e.g., Emerson et al. ; Baxter et al. ; Gompert et al. ; Catchen et al. ; Benestan et al. ), and to provide a contrasting approach for comparison to the B. impatiens‐aligned RNAseq data. Loci were created with the Stacks denovo_map pipeline using only the EcoRI end fragment. Removing the MspI read excludes some data, however, using only a single end produced sufficient markers for our goals, simplified bioinformatics in the presence of both overlapping and non‐overlapping read pairs, and reduced the numbers of closely‐spaced SNPs that should provide minimal novel information. Stacks implements approaches to filter erroneous genotypes during de novo alignment (Catchen et al. ), and although we did not use B. impatiens for alignments, we did employ the genome post hoc to optimize parameters for excluding artifacts such as over‐merged (e.g., potential repeat elements and paralogs) and under‐merged (e.g., highly divergent alleles) loci. Following an approach similar to (Catchen et al. ), we tested different parameters for mismatches allowed between stacks within and among individuals to optimize merging of loci, and ran 10 batches ranging from 1 to 10 allowed mismatches. For each parameter set, we used Stacks populations to identify loci present in >50% of the samples, and BLASTN (E‐value of 1e‐10) to align against the B. impatiens 2.0 genome (Sadd et al. ) in Geneious v7 (Biomatters;
Calculation of FST, f3, and allele frequencies used Arlequin, Treemix, and vcftools, as above. To test for admixture, major groups of populations were identified using STRUCTURE (Falush et al. ) with a data set filtered to contain only one SNP per ddRAD locus to minimize within‐locus linkage (7248 SNPs). We used the admixture model with 100,000 iterations (burn‐in of 25,000 iterations). We allowed the number of populations (K) to range from 2 to 5 and performed 2 replicate iterations each. Results were processed with STRUCTURE HARVESTER (Earl and VonHoldt ) to determine the optimal value of K with ΔK (Evanno et al. ). We used SPLITSTREE v4.13.1 (Huson and Bryant ) to construct a phylogenetic network, implementing the neighbor‐net (ordinary least squares variance) and equal angle algorithms, using uncorrected p‐distances with heterozygous ambiguities averaged and normalized, and 1,000 bootstrap replicates (>75% shown). Analysis with ∂a∂i was performed as above with a projection to 16 alleles per population.
Results
Sequencing summary
Approximately 2 × 108 100 bp RNAseq read pairs were included for samples in our analysis, with an average of 97% passing quality control and, of these, an average of 76% of reads per sample aligned to the B. impatiens genome for final analysis (Appendix S1). RNAseq‐derived SNPs that passed quality filters were sequenced to an average of 66× coverage per site per sample (Appendix S2), with samples receiving fairly even coverage ranging from an average of 48×–78× per SNP. Approximately 3 × 107 ddRAD reads were obtained (Appendix S3). After processing and removal of putative siblings and quality control filtering, an average of 91% of these 135 bp reads were retained, with a mean of 31× after removing loci with >20% missing data (N = 25,308; Appendix S2).
Phylogenetic and ancestry clustering analyses
Phylogenetic analysis of concatenated RNAseq B. bifarius SNP data (Fig. A) produces two major groups: one monophyletic group comprising red‐banded bifarius‐E bees, and the second comprising nearcticus (100% support). Apart from a single sample, nearcticus‐W and nearcticus‐C form their own groups, however, this split is not well supported. SNAPP produces a similar population tree (Fig. B), with bifarius‐E sister to the more closely related nearcticus‐W and nearcticus‐C, with all nodes supported with a posterior probability of 1.00. Absolute divergence and θ estimates were sensitive to θ prior parameters (α and β), but, as observed in other studies (Rheindt et al. ), topology and support were identical across test runs and divergence time ratios for clades were relatively similar (e.g., Appendix S4). The greatest uncertainty surrounds divergences of B. impatiens and B. vosnesenskii, as expected given the more limited population sampling of these lineages. However, the B. bifarius population tree and divergence time ratio is similar with out‐group removal (not shown). Notably, the divergence time confidence region for bifarius‐W and the two nearcticus lineages does not overlap that within nearcticus. Split network analysis of ddRAD SNPs also produces two major clades (Fig. C), one containing only bifarius‐E and one containing members of nearcticus‐W and nearcticus‐C with substantial distance between the two clades (RNAseq produced similar results; Appendix S5).
Clustering analyses also divide B. bifarius into two major groups, with no clear evidence of mixed ancestry between bifarius and nearcticus populations (Fig. ). The first axis (eigenvalue = 12.6) in the RNAseq PCA (Fig. A) separates B. vosnesenskii from B. bifarius and the second (eigenvalue = 2.6) separates bifarius‐E and nearcticus W + C (P < 0.001). The third axis (eigenvalue = 0.4) separates nearcticus‐W from nearcticus‐C (P < 0.001), with some weak subdivisions among nearcticus‐W into northern and southern OR clusters also observed, a pattern that may suggest within‐subspecies isolation by distance (IBD). A similar pattern appears in a PCA using the 20 kb distance‐filtered SNP set, but with weaker within‐nearcticus resolution (Appendix S6). The ddRAD STRUCTURE analysis (Fig. B) identifies two clusters (Fig. B, Appendix S7), with nearcticus‐W + nearcticus‐C populations in one cluster, and bifarius‐E in another, with no admixed genotypes. Setting K = 3 produces no additional structuring. A second STRUCTURE analysis was performed only for nearcticus samples, but no additional subdivision was detected (Appendix S7). STRUCTURE analysis for RNAseq SNPs produced essentially the same K = 2 pattern as the ddRAD data, failing to show any within‐nearcticus separation (not shown). Demographic modeling with ∂a∂i generally supports the distinction between bifarius and nearcticus groups, with near‐zero estimates of contemporary migration relative to the high levels between nearcticus populations (only ~1% of that estimated between the two nearcticus populations) (Appendix S8).
Population genetics
Population genetic analyses support the split between bifarius and nearcticus bees revealed by the tree‐based and clustering analyses. Red‐banded bifarius‐E are equally divergent from intermediate nearcticus‐C and black‐banded nearcticus‐W for both RNAseq and ddRAD SNPs (Fig. ). For the RNAseq SNPs, derived (i.e., non‐B. impatiens) allele frequency (DAF) differences between bifarius‐E and both nearcticus populations are strongly correlated (Fig. A; Pearson's r = 0.93). Both pairs exhibit an excess of SNPs with fixed allele frequency differences compared to those observed for nearcticus‐W versus nearcticus‐C (Fig. C), which show a relative deficit of large DAF differences and a much higher frequency of SNPs with little DAF difference. There was no pattern with respect to fixation of ancestral (i.e., B. impatiens reference) or derived alleles between nearcticus and bifarius‐E, with similar numbers of SNPs in both categories (e.g., 709 fixed ancestral and 753 fixed derived alleles in the bifarius‐E versus nearcticus‐W comparison). Similar correlations for differences in global MAF are apparent for the ddRAD SNPs (Fig. B; Pearson's r = 0.89). A lower frequency of fixed‐divergence SNPs between bifarius‐E and the nearcticus populations is observed in the ddRAD than in the RNAseq data; however, there is still a substantial excess of SNPs with high‐frequency differences and a deficit of SNPs with low‐frequency differences, relative to that between the two nearcticus populations (Fig. D). Consequently, in both RNAseq and ddRAD data average, FST is low between the nearcticus populations and high in bifarius‐E versus nearcticus pairs (Table ). The nearcticus‐W versus nearcticus‐C FST values are comparably low for both SNP sets; however, FST values for bifarius‐E versus nearcticus comparisons are notably higher for the RNAseq SNPs, particularly for nonsynonymous RNAseq SNPs. A randomly thinned SNP set also exhibits somewhat elevated FST's, however, suggesting this increase may be a random effect from reducing the number of SNPs analyzed. Notably, regardless of filtering or data set, very little variation in FST for the two nearcticus populations is observed, with most variability apparent in the bifarius–nearcticus comparisons.
No. SNPs1 | nearcticus‐W versus nearcticus‐C | nearcticus‐W versus bifarius‐E | nearcticus‐C versus bifarius‐E | f3 (nearcticus‐C; bifarius‐E, nearcticus‐W)2 | |
RNAseq‐all SNPs | |||||
All Sites | 20,170 | 0.015*** | 0.449*** | 0.445*** | 0.002 |
Synonymous Sites | 7450 | 0.013*** | 0.444*** | 0.440*** | 0.002 |
Non‐synonymous Sites | 2016 | 0.021*** | 0.597*** | 0.592*** | 0.003 |
RNAseq‐20 kb thinned | |||||
All Sites | 2251 | 0.018*** | 0.538*** | 0.529*** | 0.002 |
ddRAD‐SNPs | |||||
All SNPs | 7948 | 0.019*** | 0.243*** | 0.238*** | 0.001 |
Single SNP per locus | 2115 | 0.019*** | 0.258*** | 0.252*** | 0.001 |
1Single‐nucleotide polymorphism with >0.05 minor allele frequencies were used in FST calculations. Otherwise data sets are as described in .
2A negative f3 would indicate admixture in nearcticus‐C from both nearcticus‐W and bifarius‐E, while a positive value is expected under no mixture; statistical tests of f3 are only informative for negative values, but in no instance supported negative f3 values.
***Significantly greater than zero, P < 0.001.
The f3‐statistics are positive in all data sets, providing no support for admixture in nearcticus‐C leading to deviations from a three‐population tree (Table ). To further explore for signals of admixture, however, we examined FST at SNPs that were diagnostic (FST = 1) for the hypothesized “parental” populations (bifarius‐E vs. nearcticus‐W) in the two remaining population pairs. For the 1341 diagnostic RNAseq SNPs, mean FST = 0.002 for nearcticus‐W versus nearcticus‐C and 0.991 for bifarius‐E versus nearcticus‐C (and 0.003 and 0.992, respectively, for diagnostic ddRAD SNPs). Thus, SNPs that are fixed between the two extreme color forms are also at or near fixation between bifarius‐E and nearcticus‐C. Taking advantage of the B. impatiens genome alignment, we examined whether RNAseq SNPs with high frequency differentiation were restricted within the genome with sliding window analysis across scaffolds (non‐overlapping window size = 20 kb). Regions of high differentiation (windows with FST ≥ 0.9) are scattered throughout the B. impatiens scaffolds; 196 windows across 101 scaffolds show high differentiation for bifarius‐E versus nearcticus‐W, 186 windows across 98 scaffolds show high differentiation for bifarius‐E versus nearcticus‐C, and no windows exhibit high divergence for the two nearcticus populations (Appendix S9).
Discussion
The origins of insect color polymorphism is a topic of immense interest, especially the mechanisms by which intraspecific population diverge, while interspecific populations converge, on particular phenotypes, and the role such patterns have on diversification (Mallet and Joron ; Williams ; Jiggins ; Hines et al. ; Kronforst and Papa ). As Stephen () noted regarding B. bifarius, “this highly variable species has undoubtedly caused more consternation to bumble bee taxonomists than any other species in western North America.” Morphological analyses of B. bifarius have been confounded by intermediate phenotypes that lie between the typical extreme red‐banded (B. b. bifarius) and black‐banded (B. b. nearcticus) color forms (Stephen ; Lozier et al. ; Williams et al. ) in the western USA. Although challenging for taxonomists, the relationships among both discrete and more intermediate color forms have the potential to be of great utility for understanding the strength of and mechanisms underlying genetic isolation and the correlations with phenotypic variation.
Previous microsatellite‐based studies indicated a moderate level of structure for two major B. bifarius groups, but with most individuals, especially geographic and phenotypic intermediates, exhibiting detectable contributions from both lineages that suggested ongoing gene flow (Lozier et al. , ). However, results here provide no clear evidence for extensive genomic introgression as a major explanation for intermediate phenotypes. Instead, B. b. bifarius (red‐banded bifarius‐E) appear as a clearly divergent lineage from the more phenotypically variable B. b. nearcticus from OR (black‐banded nearcticus‐W) and from WY or UT (mixed colored nearcticus‐C), while the latter show a high genome‐wide similarity irrespective of color‐pattern variability. Across classes of SNPs, allele frequencies were similar within nearcticus, despite the geographic distance between sites, and high in both nearcticus‐bifarius comparisons, a pattern inconsistent with IBD or other types of pervasive admixture between the phenotypic extremes in more central localities (Fig. ; Table ). The f3 statistics are consistent with a three‐population tree without admixture, and individual‐based analyses from the SNP sets support FST patterns (Fig. ), grouping nearcticus‐W + nearcticus‐C separate from bifarius‐E, with no evidence of hybrid genotypes.
Phylogenetic analysis likewise groups genotypes into two major clades consistent with existing subspecies classification, with weaker within‐nearcticus division, and provides some perspectives on the seemingly deep nucleotide divergence between the main B. bifarius lineages. We did not conduct a formal dating analysis at this point given challenges in estimating heterogeneous evolutionary rates across classes of SNPs. However, a divergence from B. impatiens approximately ~6.2 mya from calibrated phylogenetic analysis (Hines ,b) applied to the coalescent tree here would suggest a rough mean divergence of the major B. bifarius lineages of ~0.96–1.2 mya (converted from range of means using different SNAPP priors; Appendix S4). Because of uncertainty in mutation rate variation and parameter settings, model simplicity, and limited out‐group sampling, this approximation should be viewed with caution. However, combined with other results that provide little evidence for admixture, including PCA, STRUCTURE, and the large percentage of fixed‐divergence SNPs, a mid‐late Pleistocene B. bifarius split is likely to be reasonable, even if the exact date is uncertain. The basic patterns of subspecific divergence and relationships developed here lay the groundwork for future studies to test specific phylogeographic hypotheses and refine divergence time estimates.
Possible signatures of adaptive divergence or introgression?
Highly divergent regions in both bifarius vs. nearcticus comparisons were distributed across genomic scaffolds, suggesting extensive divergence across the genome rather than localized differentiation that might indicate islands of recent adaptive differentiation. These SNPs may represent a mixture of genome regions targeted by divergent selection, but to a large extent likely reflect neutral differentiation from population history and random drift (Nosil et al. , ), which will make identifying the signatures of selection through standard “outlier” divergence mapping a challenge. However, while the broadly comparable patterns of structure for RNAseq SNP classes and ddRAD sites suggests general inferences about differentiation and admixture are robust, the increased differentiation at non‐synonymous SNPs could point to the action of divergent selection in genic regions. Future research will examine possible explanations for the differences in the degree of differentiation among different SNP classes, in particular focusing on regions containing many high‐divergence SNPs that may reveal signatures of positive selection.
Given these patterns of parallel genome‐wide divergence between bifarius‐E and both nearcticus‐W and nearcticus‐C, however, additional research will be needed to understand the balance between selection and reproductive isolation in maintaining the pigmentation dimorphism in B. bifarius. Previous results suggest that hot and dry low elevation landscapes act to drive differentiation at a regional scale in B. bifarius (Lozier et al. ), and it is likely that differential selection on pigmentation during allopatric divergence could have driven initial adaptive differentiation in the two extreme color forms. However, it is possible that color patterns could have played a more direct role in reproductive isolation (Jiggins et al. ), and as with other mimicry systems multiple hypotheses are plausible (Brown et al. ; Mallet and Singer ; Mallet and Joron ). The lack of clear genetic divergence between nearcticus populations certainly raises questions as to why bees with partial red‐banded pigmentation dominate many parts of the B. b. nearcticus range (Fig. ). In general, red‐black coloration in Bombus is regulated as part of the melanin biosynthesis pathway (Hines ; Rapti et al. ). Certain red/black Bombus color polymorphisms, most notably in the western NAm species B. melanopygus, seem compatible with single‐locus Mendelian inheritance (Owen and Plowright ), but for species like B. melanopygus, binning individuals into classes is relatively straightforward, there is little population structure, and the morphs are at least partly sympatric (Owen et al. ). Color in B. bifarius may be more complex, involving both historical phylogeographic isolation and a range of phenotypic intermediates. A simple single‐locus system may thus not explain pigmentation in this case, and more complex regulation of expression within the melanin pathway may be at play (Hines ), including regulatory mutations or environmental effects that affect the timing of melanization.
Previous data originally led us to hypothesize an admixture contribution for the mixed nearcticus‐C phenotype, which would have led to clear hypotheses to detect loci involved in determining red pigmentation. In this scenario, an admixture outlier mapping approach could have provided clues about regions involved in pigmentation through adaptive introgression between lineages (i.e., evidenced by rare loci with bifarius‐E alleles at high frequencies in nearcticus‐C) (Buerkle and Lexer ; Crawford and Nielsen ; Hedrick ; Pallares et al. ). Such patterns have been observed, for example, at mimetic loci in Heliconius (Heliconius Genome Consortium ; Pardo‐Diaz et al. ; Martin et al. ). However, no obvious candidate regions with unusually large similarity between nearcticus‐C and bifarius‐E were identified; sites that were highly differentiated between the more discrete color morphs were equally divergent between bifarius‐E and nearcticus‐C. For both RNAseq and ddRAD data sets, some individual SNPs showed slightly greater similarity in allele frequencies between bifarius‐E and nearcticus‐C, but a similar number of loci show equally high similarity for bifarius‐E and nearcticus‐W (Fig. ), a pattern expected under divergence and genetic drift. Such patterns will make it challenging to distinguish any potentially adaptive introgression from background noise with population genetics alone.
In light of the unexpectedly large level of background divergence and lack of any clear admixture, we thus elected not to pursue genome “outlier” scans to avoid the risk of false positives owing to population structure. However, lack of evidence for obvious signatures of adaptation does not mean such loci do not exist. Transcriptomes of workers will lack genes expressed only in developing bees, and ddRAD data may be insufficiently dense for mapping large, outbred populations. In Heliconius mimicry complexes, only markers closely linked to localized genome changes that regulate red coloration exhibit evolutionary histories reflecting phenotype, with most genome regions reflecting demography (Hines et al. ). Denser genomic coverage may thus be needed for fine mapping color‐determining loci in B. bifarius. Together with such population genomic approaches, developing laboratory crosses may prove especially valuable for determining the degree to which the divergence observed here reflects pre‐mating barriers to reproduction, the phenotypes observed in true hybrids, and ultimately for linkage analyses of pigmentation genes.
Differences between NGS and microsatellite conclusions
As population genomic studies accumulate, it will be interesting to observe how frequently studies exhibit subtle deviations from previous hypothesis such as those observed here. Instead of reflecting substantial admixture, microsatellites may have poor statistical power for resolving the strength of population structure in B. bifarius. The microsatellites used previously have high polymorphism (mean gene diversity = 0.77), limiting the degree of possible differentiation that can be detected by FST or analogous measures (e.g., Jost ; Meirmans and Hedrick ). Indeed, differentiation statistics that account for polymorphism (e.g., Jost's D; Jost ) were somewhat closer to levels reported here (Lozier et al. ). Further, the high diversity is indicative of large populations (or high mutation rates) that may be influenced by incomplete lineage sorting, homoplasy, or other factors that could limit differentiation, and could be overcome by the large number of SNPs examined here even with smaller sample sizes. Alternatively, signatures of hybridization may be complex (e.g., Gompert et al. ; Cahill et al. ; Lindtke and Buerkle ), and it is possible that our focus on extensive genomic sampling of representative exemplar populations could oversimplify the system. Strong genetic clustering can, for example, be observed with incomplete sampling in IBD populations (Meirmans ). The degrees of differentiation here argue against this (see above), and bees from widely separated sites within population groups cluster together as well as individuals from the same locality (Figs. and ). The samples thus appear representative of broader regional diversity, and the main conclusions are not likely driven by local population structure or uneven sampling, but instead by deeper phylogenetic structuring. Nonetheless, admixture is a complex process, and although finer‐scale “genomics‐quality” samples were not available for this study, increased spatial sampling may ultimately reveal localized hybridization. This study demonstrates, however, that admixture is not sufficiently extensive to leave clear signatures across the genome in all intermediate populations, even where mixed‐color phenotypes might suggest a history of possible hybridization. It will likely be necessary to combine both population genomic and spatially intensive sampling to ultimately resolve such discrepancies.
Conclusions
This study highlights the novel insights that can come from expanding genomic data sets for resolving evolutionary relationships among closely related lineages and illustrates that different types of genetic data may lead to similar but subtly different conclusions. Genome‐wide patterns suggest a history of long isolation between major B. bifarius lineages, which strengthens previous data on the role of geographic and climatic barriers to dispersal as a driver of population structure, but suggests even less gene flow than previously hypothesized. Results thus hold significance for future taxonomic work, but also for understanding the processes involved in maintaining divergence in widespread species. More intensive morphological analysis, mating experiments, and analysis of other color variants, especially individuals with partially red coloration within the B. b. nearcticus range (Stephen ), should be conducted to ultimately determine the need for taxonomic revision, the relative rates at which species and phenotypes diverge, and the population and genome‐level mechanisms underlying reproductive isolation (e.g., Lindtke and Buerkle ). We are only beginning to understand the phylogeographic relationships within this remarkably complex species, and adaptive pigmentation in bumble bees generally. Together with laboratory crosses, denser genome sequencing, and functional genomics studies, the more refined understanding of relationships among B. bifarius lineages presented here will provide the framework for developing general hypotheses for the evolutionary processes underlying red‐black polymorphism in bumble bees that can be tested in other species.
Acknowledgements
We thank the U Alabama College of Arts and Sciences and the National Science Foundation (DEB‐1457645 to JDL, MED, and JPS) for funding. We thank P. Scott, J. Fierst, and S. Duncan for valuable comments on the manuscript and H. Hines for discussions on color patterns. We thank A. Shahan and G. Unbehaun for assisting with sample preparation and analysis, and the National Parks Service for providing access to Grand Teton National Park for specimen acquisition (permit GRTE‐2012‐SCI‐0036 to MED).
Data Accessibility
DRYAD submission (
Conflict of Interest
None declared.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2016. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Geographic variation in insect coloration is among the most intriguing examples of rapid phenotypic evolution and provides opportunities to study mechanisms of phenotypic change and diversification in closely related lineages. The bumble bee Bombus bifarius comprises two geographically disparate color groups characterized by red‐banded and black‐banded abdominal pigmentation, but with a range of spatially and phenotypically intermediate populations across western North America. Microsatellite analyses have revealed that B. bifarius in the USA are structured into two major groups concordant with geography and color pattern, but also suggest ongoing gene flow among regional populations. In this study, we better resolve the relationships among major color groups to better understand evolutionary mechanisms promoting and maintaining such polymorphism. We analyze >90,000 and >25,000 single‐nucleotide polymorphisms derived from transcriptome (
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details
1 Department of Biological Sciences, University of Alabama, Tuscaloosa, Alabama
2 Department of Zoology & Physiology and Program in Ecology, University of Wyoming, Laramie, Wyoming
3 USDA‐ARS Pollinating Insect Research Unit, Utah State University, Logan, Utah