Bison: mitochondrial genomics

=== Synapomorphies in bison, yak and cattle mitochondrial proteomes ===


 
What makes a bison a bison? Synapomorphies (derived characters) are those amino acids invariant within bison but differing from the ancestral value as determined by yak, cow, water buffalo and other pecoran ruminant outgroups. To collect these across the entire mitochondrial proteome (13 proteins, 3790 amino acids in bison), 80 complete mitochondrial genomes were aligned at the protein level, with 102 sites of interest then extracted (below) and grouped by type.


It is important here to include all available bison, yak, cattle and water buffalo data so that bona fide synapomorphies -- rather than sub-clade features or one-off mutations -- are properly defined. (Here 15 cattle genomes and 11 cattle-bison hybrids were chosen from the many available.)


Yak and cattle synapomorphies provide important context to bison so they too are displayed along with synapomorphies applicable to the yak-bison ancestral divergence node. Note cattle have more synapomorphic sites than bison and yak put together, possibly a byproduct of breeding for certain features during their long history of domestication. Sub-clade features of bison and yak, considered above, are also shown along with a couple of bison + yak + cattle synapomorphies relative to water buffalo (which can be determined with less precision because only 4 genomes are available).
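The definition above can be sketched in a few lines of Python. This is only an illustration on toy sequences, not the procedure actually used for the 80 genomes: the function name `synapomorphic_sites` is hypothetical, and the ancestral value is simplified here to "the residue shared by every outgroup sequence", which is an assumption.

```python
def synapomorphic_sites(ingroup, outgroup):
    """Return (1-based position, ancestral, derived) for alignment columns
    that are invariant in the ingroup, invariant in the outgroup, and differ
    between the two. Treating a unanimous outgroup residue as the ancestral
    value is a simplifying assumption of this sketch."""
    sites = []
    for i in range(len(ingroup[0])):
        in_residues = {seq[i] for seq in ingroup}
        out_residues = {seq[i] for seq in outgroup}
        if len(in_residues) == 1 and len(out_residues) == 1 and in_residues != out_residues:
            sites.append((i + 1, out_residues.pop(), in_residues.pop()))
    return sites


# Toy aligned peptides (not real data): position 6 is fixed as A in the
# ingroup and as V in the outgroups; position 2 varies within the ingroup
# and so is a sub-clade feature, not a synapomorphy.
bison = ["MTNIRA", "MSNIRA", "MTNIRA"]
outgroups = ["MTNIRV", "MTNIRV"]
print(synapomorphic_sites(bison, outgroups))  # [(6, 'V', 'A')]
```

Adding more ingroup sequences can only remove sites from the result, which is why broad sampling matters for separating bona fide synapomorphies from sub-clade features.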


These changes may represent adaptive variations that swept across the entire population because of their selective value. Alternatively, they might represent maladaptive alleles that nonetheless rose to high frequency because historic bottlenecks accidentally favored a small subpopulation of animals carrying a mutation (much as seen above for V98A and N10I in bison sub-clades). Another option is simply neutral drift that fixed a particular amino acid from the normal reduced alphabet at that position. Only the maladaptive alleles have relevance to conservation genomics management.


One striking feature of the table is the apparent over-representation of amino acids from the first two columns of the genetic code, especially methionine. Many variations are known for the mitochondrial genetic code, so this raises concerns that bison, yak or cattle might use a slightly different translation table (making the synapomorphies into artifacts), perhaps only at certain sites along certain proteins. This scenario would be consistent with the many oddities in bison mitochondrial tRNAs reported by the Derr group. Recall here that protein sequences have not been experimentally determined but rather inferred from dna (the exceptions being bovine cytochrome b and cytochrome oxidases used in X-ray crystallography -- which agree with standard table translation).
 
 


The first five columns of the table below are sorted by over-representation of synapomorphies. For example, ATP8 has five of these in a very short sequence of 66 residues. While ND5 has more synapomorphies (25), it is a much longer protein at 606 residues and so has a lower density of synapomorphies than ATP8 (1.5 vs 2.8). The next three columns show the composition of the bison mitochondrial proteome, sorted by decreasing occurrence. Note the very low abundances of charged amino acids, attributable to the many transmembrane domains which utilize apolar amino acids. Overall, the abundances of amino acids do not resemble those of nuclear-encoded cytoplasmic proteins. The last columns show abundances taken from the large synapomorphy table below. The top line of that table was provided by the Yellowstone bison sequence and dots indicate identity relative to it.
Bison Synapomorphy Table
Gene  #AAs  Syn  %syn   %AA  Over   AA  Freq     %   AA  Freq
      3790  102   100   100             3790   100    .  3683
ATP8    66    5   4.9   1.7   2.8    L   593  15.7    T   589
CYTB   379   14  13.7   6.9     2    I   328   8.7    I   487
ATP6   226   11  10.8     6   1.8    T   308   8.1    V   478
ND3    115    5   4.9     3   1.6    S   274   7.2    M   414
ND5    606   25  24.5    16   1.5    M   265     7    L   292
ND2    347   10   9.8   9.2   1.1    A   250   6.6    A   277
ND6    175    5   4.9   4.6   1.1    F   244   6.4    Y   260
ND4    459   12  11.8  12.1     1    G   219   5.8    S   172
ND4L    98    2     2   2.6   0.8    P   191     5    P    89
COX3   260    4   3.9     6   0.7    V   185   4.9    F    81
ND1    318    5   4.9   8.4   0.6    N   164   4.3    H    73
COX2   227    3   2.9  13.6   0.2    Y   132   3.5    N    35
COX1   514    1     1    10   0.1    W   104   2.7
                                     H    98   2.6
                                     K    99   2.6
                                     E    95   2.5
                                     Q    87   2.3
                                     D    69   1.8
                                     R    63   1.7
                                     C    22   0.6
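The density comparison between ATP8 and ND5 can be reproduced with a short calculation. The sketch below assumes the "Over" column is the gene's share of all synapomorphies divided by its share of all residues; that reading is inferred from the ATP8 and ND5 figures quoted in the text rather than stated explicitly, and the function name is hypothetical.

```python
TOTAL_RESIDUES = 3790        # bison mitochondrial proteome length, from the text
TOTAL_SYNAPOMORPHIES = 102   # sites of interest, from the text


def over_representation(length, synapomorphies):
    """Share of all synapomorphies divided by share of all residues.
    Values above 1 mean the gene carries more sites than its length predicts.
    This definition is an assumption inferred from the quoted figures."""
    syn_share = synapomorphies / TOTAL_SYNAPOMORPHIES
    length_share = length / TOTAL_RESIDUES
    return syn_share / length_share


for gene, length, syn in [("ATP8", 66, 5), ("ND5", 606, 25)]:
    print(gene, round(over_representation(length, syn), 1))
# ATP8 2.8
# ND5 1.5
```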
         (The table will paste properly into common desktop spreadsheets if all spaces are replaced with tabs:)
  .gene.. A A C N N N N A N C N C A A C C N N C N N N A N N N N N N A N N N N N C N A N N N C N C C N N N N N N N N N A N N A N N N A N C N N A N N C N N C C N C N N N N C N A C A C C N N A A C C N N N N N N N N N C N  
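For readers who prefer to script the space-to-tab conversion rather than use an editor, a minimal sketch follows. The helper name `spaces_to_tabs` is hypothetical. Note that it collapses runs of spaces, so it suits rows like the one above where every cell is filled, and would shift columns in rows that contain empty cells.

```python
def spaces_to_tabs(line):
    """Split a row on any run of whitespace and rejoin the cells with tabs."""
    return "\t".join(line.split())


row = "  .gene.. A A C N N N N A"
print(repr(spaces_to_tabs(row)))
```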

Revision as of 12:40, 26 December 2010

Introduction to bison and yak conservation genomics

Bison and wild yak are but two of many genomically endangered species impacted by past and present human activities: historic population bottlenecks (from overhunting and take of habitat) and unnatural selection from uninformed culls, loss of best bulls to trophy hunting, gender imbalance practices, interference with predator selection, competition for forage, breeding opportunities, and disease resistance, selection for docility, and introgression from inbred domestic animals unfit for the wild.

Entire mitochondrial genomes on a population level first became available in December 2010, with several dozen sequences now available for both bison and its sister species yak. A nuclear genome for cattle is available now and one is underway in China for yak, with a genome expected for bison by 2016. In the meantime, a very extensive SNP bead chip allows querying of the nuclear genome on a herd scale. Thus for the first time, it has become possible to consider the genetic status of the herd and make rational conservation management decisions. It is the genome that must be conserved -- humps and shaggy appearance will follow.

The expected genetic impacts of a population expanding out from a severe bottleneck include undesirably high frequencies -- or even total fixation -- of maladaptive amino acid alleles originally present as rare mutations in the founder population (eg all wolves on Isle Royale descend from two that crossed a rare ice bridge in 1949, the mitochondria descending from a wolf-coyote hybrid; their idiosyncratic genomes providing the new allele frequencies). Deleterious mutations can be reliably identified and distinguished from adaptive change or desirable miscellaneous genetic diversity by the comparative genomics techniques described below, provided sufficient sampling data is available.

Mitochondria encode 13 distinct proteins central to energy metabolism. Well-studied human and canine mitochondrial diseases associated with specific mutations in these proteins give rise to clinical conditions, typically exercise intolerance for cytochrome b. A polymorphism at any site in this gene can currently be compared across 1300 mammalian species with individual animal multiplicities bringing the total feasible comparison to over 12,600 sequences.

With such an incredible data set and a fully resolved mammalian phylogenetic tree, the admissible amino acid spectrum (reduced alphabet) is defined to very high sensitivity. Although the function of each residue is seldom entirely known, this reduced alphabet has already been thoroughly vetted by a hundred million years of placental evolution and suffices to evaluate variation in moderately conserved genes. It is not currently possible to attain cutting edge sensitivity for nuclear encoded proteins because data from only 55 vertebrates might typically be available.
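As a rough illustration of the reduced alphabet idea, the sketch below tallies the residues seen at one alignment column and asks whether a candidate residue is among them. The function names and the `min_count` threshold (used to discount residues seen only once, which may be sequencing errors or sporadic mutations) are assumptions of this sketch, not part of the analysis described here.

```python
from collections import Counter


def reduced_alphabet(column):
    """Count the residues observed at one alignment column.
    Gap ('-') and identity ('.') symbols are ignored."""
    return Counter(residue for residue in column if residue not in "-.")


def is_admissible(residue, column, min_count=2):
    """True if the residue occurs at least min_count times in the column.
    The threshold is a hypothetical guard against one-off entries."""
    return reduced_alphabet(column)[residue] >= min_count


# Toy column: one residue per species at a single position (not real data).
column = "VVVVVVIVVV"
print(reduced_alphabet(column))
print(is_admissible("A", column))  # False: alanine never observed here
```

With thousands of sequences per column, a residue that never appears despite a high mitochondrial mutation rate is a strong candidate for being deleterious.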

However mitochondrial inheritance has nine complexities that strongly affect conservation genomics:

  • mitochondrial dna is maternally inherited, meaning that any and all mutations in bull mitochondrial dna are lost in their descendants, but any hybrid resulting from cow introgression retains strictly cattle mitochondrial dna that persists indefinitely without any prospects for dilution by back-breeding within a bison herd.
  • although mitochondrial dna is present in very high copy number in bovine oocytes, the strands are effectively non-recombining, meaning no prospects exist for compiling good variations from the multiple haplotypes present in an individual animal (heteroplasmy).
  • mitochondrial dna can be erratically replicated (not proportionally to haplotype abundance), allowing copy number of mutation-bearing mitochondrial dna to surge (or fall) unpredictably relative to residual wildtype haplotypes both in oocytes and somatic cell lineages.
  • mitochondria segregate stochastically at cell division, both in the haplotypes and in the number of mitochondria inherited by daughter cells.
  • mitochondrial dna sequences at GenBank do not describe the germline oocyte haplotype proportions but rather are taken from leucocytes, skin or muscle whose polymorphisms are not necessarily applicable to germline inheritance. It is only when the same mutation surfaces in multiple animals in a phylogenetically coherent clade that sporadic mutations (possibly somatic) can be distinguished from stably heritable mutational haplotypes (where a given mutational haplotype has expanded to become the only haplotype present).
  • in the heteroplasmic case, only the predominant haplotype in the tissue sampled will get reported, even though the dire nature of some mutations and the essentiality of the cytochrome b imply internal compensation by unreported wildtype haplotypes must be occurring at some level.
  • functional compensation could conceivably occur within a single mitochondrion carrying multiple haplotypes, one of them wildtype, or for that matter via an allele of an imported nuclear gene serving as part of the mitochondrial oxidative phosphorylation complex. The bc1 complex involves 11 gene products, with all but cytochrome b nuclear encoded.
  • though the mutation rate in mitochondria is high and hotspots may exist, actual homoplasy (recurrent mutation) is rare. That is, it is fairly uncommon for the same amino acid substitution to occur in the oocyte, much less surface from low heteroplasmy to full heritability. This can be seen both from the low occurrence of the same mutation across tens of thousands of sequences and from human mitochondrial disease statistics. However it is not unusual for multiple haplotypes to wax and wane across a species divergence, with sampling artifacts then picking one of these out in preference to others, giving the appearance of a fixed substitution.
  • lineage sorting of haplotypes at the time of speciation is quite different from nuclear genes because of heteroplasmic persistence, making the determination, indeed definition, of ancestral state quite difficult, though that remains important in establishing the fully functional amino acid alphabet at a given position.
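The unpredictable rise and fall of a mutant haplotype described in the list above can be illustrated with a toy simulation. This is a sketch under strong assumptions, none taken from the source: each generation simply resamples a fixed number of mtDNA copies from the previous generation's mutant fraction (binomial sampling), with no selection and no unequal amplification. The function and parameter names are hypothetical.

```python
import random


def segregate(mutant_fraction, copies, generations, seed=0):
    """Track the mutant haplotype fraction through repeated random sampling.
    Each generation draws `copies` genomes, each mutant with probability
    equal to the current mutant fraction. Returns the fraction history."""
    rng = random.Random(seed)
    fraction = mutant_fraction
    history = [fraction]
    for _ in range(generations):
        mutants = sum(1 for _ in range(copies) if rng.random() < fraction)
        fraction = mutants / copies
        history.append(fraction)
    return history


# A small sampled copy number (a bottleneck) lets the fraction wander widely;
# both values here are arbitrary illustration settings.
print(segregate(mutant_fraction=0.2, copies=20, generations=15, seed=1))
```

Once the fraction reaches 0 or 1 it stays there, which mirrors the case where one haplotype has expanded to become the only haplotype present.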

The complexities of heteroplasmy make sequence data difficult to interpret and inheritance of mitochondrial polymorphisms problematic to predict, much less affect by management. Cattle have been specifically studied, with those results probably transferable to bison and yak. However mitochondrial disease has proven exceedingly difficult to understand even in human and cannot be treated.

"Heteroplasmy is the presence of a mixture of more than one type of an organellar genome within a cell or individual. It is a factor for the severity of mitochondrial diseases, since every eukaryotic cell contains many hundreds of mitochondria with hundreds of copies of mtDNA, it is possible and indeed very frequent for mutations to affect only some of the copies, while the remaining ones are unaffected."

GS Michaels 1982: "Restriction endonuclease analysis and direct nucleotide sequencing of bovine mitochondrial DNA have revealed a high apparent rate of sequence divergence between maternally related individuals. Oocytes had 260,000 dna genomic copies per cell, whereas primary bovine tissue culture cells contained only 2,600 copies. These experiments ... are consistent with models which generate mitochondrial DNA polymorphisms by unequal amplification of mitochondrial genomes within an animal."

"Mitochondrial diseases arise frequently: 1 in 4000 individuals is at risk of developing a mitochondrial disease sometime in their lifetime. Half of those affected are children who show symptoms before age five, and approximately 80% of them will die before age 20. The mortality rate is roughly that of cancer... The mutation rate of the mitochondrial genome is 10–20 times greater than of nuclear DNA, and mtDNA is more prone to oxidative damage than is nuclear DNA. Mutations in human mtDNA cause premature aging, severe neuromuscular pathologies and maternally inherited metabolic diseases, and influence apoptosis."

An alarming situation has arisen in North American bison at position 98 of cytochrome b. A majority of animals sampled to date (17 of 33, none hybrids, one from Yellowstone) have alanine at this position, a seemingly innocuous but -- as shown below -- clearly deleterious change from wildtype valine, which is otherwise invariant here throughout mammals and indeed vertebrates (ie unchanged over a hundred billion years of observed branch length). Note canine spongiform leukoencephalomyelopathy arises from the very similar V98M.

The single basepair change resulting in V98A suffices to define the same two major clades of bison established by whole genome comparison of 16,322 bp. The A98 mutation evidently arose in a single female bison, expanded over time to become the sole haplotype in female descendants who then provided -- through the bottleneck effect -- all mitochondria of the vast dispersed herd corresponding to the A98 clade. The rise of a maladaptive allele is not surprising in view of human interference with natural selection.
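Sorting animals into the two clades by a single diagnostic residue is straightforward to script. The sketch below groups named sequences by the amino acid at a chosen 1-based position; the function name, the sample names and the toy peptides are hypothetical stand-ins, with position 3 playing the role of position 98.

```python
from collections import defaultdict


def split_by_residue(sequences, position):
    """Group sequence names by the residue found at a 1-based position."""
    clades = defaultdict(list)
    for name, seq in sequences.items():
        clades[seq[position - 1]].append(name)
    return dict(clades)


# Toy data only: real use would pass full-length CYTB and position=98.
animals = {"animal1": "MTVIR", "animal2": "MTAIR", "animal3": "MTAIR"}
print(split_by_residue(animals, position=3))
# {'V': ['animal1'], 'A': ['animal2', 'animal3']}
```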

The V98A mutation is considered in greater detail below, along with 5 mostly deleterious sporadic haplotypes of lower frequency (lesser concern) and 10 other sites where all bison sequences differ from the ancestral amino acid at the time of divergence from yak. These latter probably reflect heteroplasmic lineage sorting of haplotype frequencies, though two substitutions, I316M and M353L, are of some concern.

While only one of the 13 mitochondrial proteins is considered below, disturbing findings have been reported for bison mitochondrial tRNAs. This raises the question of the current severity of genetic burden of all endangered wildlife species, not just bison. It may prove very difficult to recover these species to their previous adaptive genetics.

Management options for mitochondrial dna genetics begin with hybrids. Here bison or yak with cattle mitochondria will also have cattle nuclear gene introgression (whose dilutional state today depends on subsequent backcrossing history); this follows without testing for all progeny. Bison residing in fenced preserves are not commonly limited in population size by predation, disease or winter starvation. Since vegetation productivity will only sustainably support a certain population, removal of surplus animals could emphasize female hybrids, as any desirable authentic nuclear genome diversity can be carried forward by bulls (provided the herd is not gender-imbalanced). Because of cheap and reliable testing, cattle mitochondrial introgression may soon be a thing of the past for confined herds under conservation-minded management.

Given the complexities of mitochondrial inheritance however, even in pure bison no selective breeding strategy may exist should multiple mitochondrial genes have widespread adverse polymorphisms, perhaps leaving all surviving haplotypes adversely affected one way or another. Needless to say, alleles of the 20,000 nuclear genes have to be considered at the same time.

Phylogeny: bison and yak are sister species

Bison genomics is best considered within its phylogenetic context. This means first of all parallel consideration of its sister species (nearest living relative) the yak. Although not tropical, both species were dramatically affected by closing of the Darien gap in Panama at 2.5 million years and ensuing unstable climatic change. This led to Pleistocene ice ages: episodic glacial barriers isolating regional herds yet promoting repeated dispersion across Beringia as sea levels fell. Those events manifest today as deep bifurcations of the mitochondrial phylogenetic tree of both species.

BisonPhylo.jpg

However a broader phylogenetic perspective is also essential to provide the outgroup sequences that influence ancestral sequence reconstruction. Here the evolutionary history of cetartiodactyls has taken decades to sort out: the position of whales, once controversial, has been settled (sister, together with hippopotamus, of Ruminantia), as has the non-intuitive branching order of pigs and llamas (Camelidae are basal).

Within pecoran ruminants, difficulty arises not so much from conflicts between fossil morphology and molecular trees but rather from rapid radiation of species (hard polytomy), only recently resolved (we hope and assume below) with the bovine SNP bead chip. This samples nuclear genes vastly better than homoplasy-prone microsatellites and sidesteps limitations of mitochondrial inheritance.

In the figure at left, JE Decker et al evaluated 52,356 sites across the nuclear genome not only of cows but throughout ruminants. The resulting tree (antelopes,(giraffes,(deer,((gazelles,sheep),bovinae)))) is critical to understanding the evolution of mitochondrial proteins and evaluating amino acid substitutions -- which are of grave concern for conservation of bottlenecked species such as bison and yak.
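The parenthesized tree above is written in Newick-style notation, where each pair of parentheses encloses the descendants of one node. A minimal parser sketch, handling only names, commas and parentheses (no branch lengths, no trailing semicolon), turns it into nested tuples so that the branching order can be inspected programmatically. The function names are hypothetical.

```python
def parse_newick(text):
    """Parse a names-only Newick string into nested tuples of leaf names."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]            # a leaf name

    return node()


def leaves(tree):
    """Flatten a parsed tree into its leaf names, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("(antelopes,(giraffes,(deer,((gazelles,sheep),bovinae))))")
print(tree[0])        # antelopes
print(leaves(tree))
```

In this representation the first element of each tuple is the lineage that branches off at that node, so outgroups for bovinae can be read off from the outside in.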

Notice that Linnaean taxonomy requires substantial revision according to the tree below -- genera such as Bos, Tragelaphus and Gazella are inconsistent with it. This could be remedied for bison by either placing them in Bos or putting yak, gaur and banteng in the genus Bison. Here the position of gaur and banteng has less bootstrap support than other nodes and has long been contentious. The position of kouprey and mithun (gayal), Bos sauveli and Bos frontalis, was not analyzed with the bead chip.

There may not be any simplistic nomenclatural resolution because of male introgression as illustrated in european bison (wisent) and zebu cattle. The speciation process is far messier than suggested by bifurcating tree nodes. For example, subsequent to some measure of genomic divergence, wandering bulls from one population can join another or mixed herds of wild taurine cows form. While this does not affect mitochondrial lineages, it does result in periodic introgressions into the nuclear genomes. Since Holocene domestication, cattle have hybridized with aurochs, yak and bison, indicating full speciation barriers still do not exist. Polymorphic alleles represented in an ancestral population at various frequencies may sort out differently in descendant lineages, though this plays out quite differently for nuclear and mitochondrial genomes.

The data situation is otherwise very favorable with over 214 mammalian species having sequenced mitochondrial genomes, with high multiplicities for some individual species such as eland, cow, bison and yak. Individual genes such as CYTB may have extensive additional data from targeted studies. However all data, especially fragmentary older GenBank entries, must be carefully screened for errors and implausible sequence anomalies.
The table below makes no nomenclature proposals whatsoever but simply describes the heuristic terminology adopted here -- driven by that used, right or wrong, at GenBank Taxonomy -- because only that can be used to restrict the blast searches necessary for comparative genomics. The PubMed column lists identifiers for the 72 associated article abstracts; free full text is available for 27 of them.

Acronym Species         Common           Mito CYTB NucG  PubMed
bosSau Bos sauveli      (kouprey)           0    5    0  15522811 16439342 17848372
bosFro Bos frontalis    (mithun gayal)      0   16    0  20331596 20433524 18244904 17560527 
bosGau Bos gaurus       (gaur)              0   17    0  19436739 19777782 19367625 17986322
bosJav Bos javanicus    (banteng)           2   39    0  18937038 18937038 17614913 16922247 12522420
bosTau Bos taurus       (cattle)          168  500    1  19603063 19484124 19393053 19393048 19393045 20347826
bosPri Bos primigenius  (auroch)            1   17    0  18199470 19456314 20346116
bosInd Bos indicus      (zebu)              3  387    0  12648092 19436739 19770222 20597883 18467841 12399392

bosGru Bos grunniens    (yak)              72   53    *  19917041 17257194 18439980 16942892 12137333
bisBis Bison bison      (plains bison)     33    7    0  20870040 20637048 19414501
bisAth Bison athabascae (woods bison)       2    3    0  20808568 18191321 
bisBon Bison bonasus    (wisent eurobison)  4    9    0  14739241 19623210 17177698 15125253 14703870
bisPri Bison priscus    (steppe bison)      0    0    0  15567864 20409351 20212118 18653730 18199470
bisAnt Bison antiquus   (ancient bison)     0    0    0  17256570  9826742 17686730

bubBub Bubalus bubalis  (water buffalo)     4  342    0  17459014 15621663 11212504 19140976 19462514 19207933
synCaf Syncerus caffer  (cape buffalo)      0   10    0  10603253  9126673  9987926 17313588 17459014 14715223

traScr Tragelaphus scriptus (eland)         0  172    0  10222159  7723053 17520013
traSpp Tragelaphus others (7 spp eland)     0    7    0  10380679

bseTra Boselaphus tragocamelus (nilgai)     0    3    0  10603253  17158073

In the table, sequence availability counts do not include poor quality fragments or inadvertent hybrid data, eg 13 nominal Bos frontalis entries are instead introgressions from Bos indicus and misplaced at GenBank.

Yak nuclear genome sequencing is in progress at Beijing Genomics Institute. Other cetartiodactyl genomes in progress include Camelus bactrianus and Ovis aries with Camelus dromedarius and Pantholops hodgsonii completed but not released. Other relevant genomes said to be underway include Bubalus bubalis, Addax nasomaculatus, Muntiacus muntjak, Hippopotamus amphibius, and Balaena mysticetus. Cow, pig, sheep, and vicuna genomes have long been available for blast search.

These additional genomes would allow fossil nuclear numts to contribute to understanding of mitochondrial gene evolution, making the mitochondrial proteome of ancestral species such as Leptobos (last common ancestor to cattle, bison and yak) easy to work out. Note too that the mitochondrial genome, although not targeted, gets sequenced to very high multiplicity as a byproduct. To date, such projects have produced single mitochondrial genomes. This however is surely wrong in view of the prevalence of heteroplasmy: most species host a population of significantly different mitochondrial genomes. Thus these genome projects are a golden opportunity to characterize mitochondrial genome diversity within single species.

GenBank sequences are often retrieved blindly and run through extensive software pipelines to provide some conclusion. However it is imperative to manually curate accessions prior to analysis because a certain percentage of legacy entries are completely inappropriate. This ranges from attribution to the wrong species, gross and subtle sequence errors, reduced reliability at sequence termini, redundant entries, unpublishable submissions from third-world countries, mixups of mitochondrial and nuclear dna, lab dna contamination, text processing mishaps during the submission process, to outright data fraud. Below, bison and yak and their contextual species are considered individually.

  • Both complete and fragmentary aurochs (Bos primigenius) accessions condense to two sequences sufficient to represent all GenBank aurochs data on 8 Dec 10 namely ACE76876 ADE05539 which differ as I4F T23A V372I (latter two changes are sporadic for ACE76876). Aurochsen became extinct in 1627 due to overhunting and the loss of habitat. Their mitochondrial genome still persists in a few Italian and Korean cows.
  • The nine GenBank sequences for european bison (wisent) condense to a single representative sequence, for example ADF29596. Here it must be noted that ADQ12704 has a terrible sequencing error introducing ETTAEF for VNYGWI -- unfortunately this sequence has been used uncritically in published analyses. CAA75238 is also defective distally, a poor quality sequence from 2005 that was never published. Blast shows beyond any doubt that the known wisent sequences are not remotely affiliated with bison but instead are Bos taurus (not even aurochs Bos primigenius). It has long been suspected that wisent originated from a bison bull naturally crossed with a taurine cow. It follows wisent mitochondrial genomes will not be terribly informative for bison or yak.
  • The extinct steppe bison, Bison priscus, has no protein sequences among its 298 GenBank entries, only control regions. Complete mitochondrial genomes from this species would be very informative -- evidently the dna is readily collected.
  • The kouprey Bos sauveli has five CYTB sequences but only one full length, AAV51239. Two fragmentary entries are polymorphic relative to this at T248I, namely ABB88561 ABN73101. Here care must be taken as kouprey bull x banteng cow hybrids are known, causing confusion as to kouprey status as distinct wild cow species.
  • Domestic cattle have a vast amount of sequence data, much breed-specific. The detailed anomalies of inbred animals are not especially informative to wild bison or yak. Since however many of the observed cattle substitutions are radical chemical changes at highly conserved sites in a vital enzyme, the question arises as to how these animals survived to adulthood. The answer is probably heteroplasmy, with late onset, that is compensation via wildtype mitochondrial dna that persists in some mitochondria to some extent. Exercise intolerance -- a common outcome of human cytochrome b deficiency -- would hardly be noticed in a cow prior to the animal's arrival at a slaughterhouse. Two CYTB sequences have very high multiplicity, represented by AAM12814 AAW78524 at 208 and 71 copies. The latter differs at V356I I372V. The single-site polymorphisms shown below arise in AAV88174 BAC54760 AAZ16727 AAW83829 AAT80776 AAV88122 AAS93073 AAZ16896 AAT80776 AAZ95368 AAZ95339 AAS93061 AAM08329 AAZ95354 AAZ16545 AAZ95379 AAZ95338 AAZ95338 ABV70594 AAW78531 AAZ95334 AAZ95385 AAZ95378 ABV70763 ACQ73865 ACQ73761 AAZ95359 AAZ17091 BAA07016 AAV88161 ABV70555 AAZ95331 AAV88135 AAZ95405 AAZ95389 AAV88187 AAZ95386 AAZ16688 AAZ95339 AAQ06605 BAC20256 ACQ73813 AAW78527 respectively.
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAILRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASVLYFLLILVLMPTAGTIENKLLKW
................T..........................................................D...W...................................................L.........................................................K.........................................................N..............................................V.....S....................................................................S.........
.................L...........................................................M............................................................................................N..................T...............................................................N...............................................T....................................................................AV.......
......................T......................................................V...................................................................................................................T....................................................................................................T.........................T...S....T.........................I...............V.......
..............................S..............................................V....................................................................................................................I.................................................................................................................................S....T.........................I.......................
................................................................................C.............................................................................................................................I.......................................................................................................................T....................................................
.........................................................................................V....................................................................................................................I....L.....................................................................................................................................S.................................
...................................................................................................G..................................................................................................................A....................................................................................................................................T...............................
......................................................................................................N...............................................................................................................A......................................................................................................................................F.............................
...............................................................................................................A......................................................................................................M............................................................................................................................................I.......................
.....................................................................................................................L.................................................................................................N.........F.........................................................................................................................................................
.......................................................................................................................................................................................................................................T...................................................................................................................................................
  • The gayal (or mithun) Bos frontalis has 28 full length CYTB sequences. These fall into two very distinct groups, suggesting introgression of female mitochondria from another species. According to blastp, this species is Bos taurus or Bos indicus, a conclusion also reached for Yunnan gayal. Here ABO07421 most parsimoniously represents the first group should that be desired, with ACF17717 BAJ05325 identical, ABO07426 differing by a sporadic L376V, and ACN12147 differing by F296L and K375N. A derived subgroup has I356V and I372V and sporadic A291V, namely ABO07428 ABO07427 ABO07425 ABO07422 ACF17716 ABO07419.
  • The second group of 16 near-identical gayal sequences can be represented by ABO07423. This set contains four sporadic mutations N3S D252N F276L I298L and two sites of shared polymorphism with the first group, T232A K375N. It differs consistently from the first group at 6 sites, I39V V215A A232T A302I A327T L357M and so bears much closer relationships -- given the strong conservation of this protein -- to Bos gaurus and secondarily Bos javanicus (2 and 4 differences respectively) than to Bos taurus or Bos indicus (6 differences at best). Only this second group is usefully included as an outgroup to yak and bison.
  • Thus the first choice for Bos frontalis conservation genomics -- based solely on CYTB -- involves animals represented by the second group ABO07424 ACF17720 BAJ05320 ABO07423 ABO07420 ACF17718 ABS18292 AAV51237 BAJ05321 BAJ05322 ABS18291 with possible inclusion of ABO07418 ACF17719 for diversity but not sporadics BAJ05323 BAJ05324 ACM24710 unless other considerations warrant it. Based on skimpy GenBank entries, these animals are called Dulong cattle in China but mithun in Myanmar and Bhutan. This is apparently corroborated by a 2010 study utilizing 16S mitochondrial rRNA. Nuclear genes also are very important to consider.
  • Bos gaurus has 17 entries, including 3 where a nucleotide sequence was submitted without a translation (causing protein queries to miss them). After observing that the fragmentary sequences, where not flawed, are merely supportive, the set can be pruned to six. However two of these (ABF20228 ABF20227) are actually maternal Bos indicus/taurus sequences. The remaining four are practically identical to Bos javanicus/frontalis but differ from each other at 6 sporadic sites V39I A62V Y95H T108P L105P T190M N206I. This species will not prove useful to bison/yak comparative genomics but one sequence ADB80894 is retained below.
  • The banteng Bos javanicus has an excellent set of complete sequences among its 35 entries for cytochrome b. After noting sporadic variation and checking for hybrids, a set of three sequences ABS18295 ABW82495 ABW82494 suffices to represent population diversity. Banteng do appear quite diverse, with several substantial variants supported by sequences from multiple individuals. Some clearly deleterious mutations are also evident, such as R80W in ADC53249. Sequences such as ABW82495 are peculiar in having 8 substitutions, suggesting a hybrid, yet with what is unclear: possibly a remote ancestor of Bos taurus or some extinct lineage not otherwise represented today. This sequence is supported by AAV51238 BAA11625 BAA07017 and so cannot be sequencing error; their disparate GenBank entries do not provide locational information.
  • For the zebu, Bos indicus, 20 full length sequences are available (in addition to hundreds of fragments not considered further). These however are all identical with the exception of a sporadic variation T67I in ABS18290. Thus ABO07435 can serve to represent this species. It differs from the most abundant Bos taurus allele (208 entries) at only two distal positions I356V and V372I.
  • Bison, yak and cattle have buffaloes as outgroup. Here Syncerus caffer (cape buffalo) has 10 CYTB sequences, only 2 of which are informative, AF036275 BAA11624. The latter differs at H3N T56S I295V.
  • An extraordinary amount of data exists for water buffalo (Bubalus bubalis) -- some 165 CYTB sequences (after dropping defective entries ABO20788, ABO26586, BAJ05824 and discarding boundary variation of fragmentary sequences) of which 44 are essentially full length. However very little polymorphism occurs. In the first half of the molecule, 8 sites exhibit variation but only in unique individuals, making it impossible to distinguish sequencing error from authentic one-off events (which themselves could be non-heritable heteroplasmy). This is remarkably low (about 0.03%) in an alignment with 165 x 190 aa = 31,350 residues.
  • The second half of Bubalus cytochrome b exhibits higher variation. Three individuals carry A191G, 28 have T246A, five are I365V and seven I372V, in addition to eight scattered sporadic variations. All the I372V individuals -- chinese water buffaloes -- are also T246A. The remaining 21 T246A animals apparently originated in China, Japan and Thailand but details remain unpublished. Non-sporadic variation in water buffalo is satisfactorily represented by GenBank accessions ACF17726 ABR08397.
  • Syncerus is surprisingly diverged from Bubalus (12 positions): L102M T122A N159S I195V S246T I290V I293L L320F D331N M357T T371M. Only two of these positions are polymorphic in cape buffalo H3N and I295V; water buffalo are all 3N and 295V making those ancestral, with no indication of lineage sorting. This species is satisfactorily represented by AF036275 BAA11624.
  • Can there be too much data? GenBank carries 172 CYTB sequences for Tragelaphus scriptus and its 30 subspecies (sylvaticus, uellensis, signatus, scriptus, simplex, sassae, roualeyni, punctatus, powelli, pictus, phaleratus, ornatus, meruensis, meridionalis, meneliki, massaicus, locorinae, knutsoni, johannae, heterochrous, haywoodi, fasciatus, dodingae, dianae, delameri, decula, dama, cottoni, bor, barkeri). However only two of these are full length, AF036277 AAD13501 (and differ at 7 sites) with the rest older and running from residue 138 to 232. Despite dropping poor quality sequences, considerable variation remains, both of sporadic and sub-clade type. To track this without sequences proliferating too much, a third quasi-sequence consisting of AF036277 substituted in silico with all major non-sporadic alleles -- which cannot represent sequencing error -- was made below, called CYTB_traScr3.
  • Seven other species of Tragelaphus also have full length sequences available -- T. eurycerus, strepsiceros, imberbis, oryx, angasii, spekii, and derbianus. These sequences are moderately diverged from each other. They are fairly old in terms of sequencing technology used - 1999. Nonetheless, AAD51427 AAD51431 AAD13498 AAD13491 AAD42706 CAA10935 AAD13496 have been added to the sequence base below to represent this diversity. Tragelaphus is a large and important outgroup for bison/yak/cattle.
  • Five of seven posted sequences for Boselaphus tragocamelus (nilgai) are poor quality fragments, illustrating a pitfall for blast searches. However the two full length sequences are in complete agreement. Here CAA10934 will be taken as reference sequence.
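The substitution labels used in the notes above (for example L376V or K375N) can be derived mechanically from two aligned sequences. The following is a minimal Python sketch, assuming the two sequences are already aligned, gap-free and of equal length; the function name `list_substitutions` and the 10-residue toy fragments are illustrative only and are not GenBank entries.

```python
def list_substitutions(reference, variant):
    """Return labels such as 'N3S' for each position where two aligned sequences differ.

    Positions are 1-based, matching the notation in the notes above.
    Both sequences are assumed to be pre-aligned and of equal length
    (gaps are not handled in this sketch).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Toy fragments (hypothetical, for illustration only).
print(list_substitutions("MTNIRKSHPL", "MTSIRKSHPM"))
```

Labels are always written relative to whichever sequence is passed first, so the choice of reference determines the direction of each label.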

The goals here are to reduce the clutter from redundant sequences allowing an informative final alignment without discarding significant allele data or losing track of species multiplicities. This information can be retained within the alignment by a carefully designed fasta header. (Some web tools cut off the header at 10 characters but others allow any length.)
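One way to reduce redundancy while keeping multiplicities is to group identical sequences and write the count and accessions into the fasta header. The sketch below is an assumption about how this could be done, not the procedure actually used for this page; the header layout (`n=` followed by accessions), the function name `collapse_identical` and the toy accessions are all hypothetical.

```python
def collapse_identical(records, prefix):
    """Collapse identical sequences, keeping multiplicity and accessions in the header.

    records: iterable of (accession, sequence) pairs.
    Returns a list of (header, sequence) pairs, most frequent sequence first;
    ties keep the order in which the sequences were first seen.
    """
    groups = {}
    for accession, sequence in records:
        groups.setdefault(sequence, []).append(accession)
    ranked = sorted(groups.items(), key=lambda item: -len(item[1]))
    collapsed = []
    for index, (sequence, accessions) in enumerate(ranked, start=1):
        header = f">{prefix}{index} n={len(accessions)} {' '.join(accessions)}"
        collapsed.append((header, sequence))
    return collapsed


# Toy input with made-up accessions.
toy = [("ACC1", "MTNI"), ("ACC2", "MTSI"), ("ACC3", "MTNI")]
for header, sequence in collapse_identical(toy, "CYTB_toy"):
    print(header)
    print(sequence)
```

Because some web tools truncate headers, the short identifier is placed first so that it survives even when the multiplicity and accession list are cut off.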

Interpreting bison CYTB variation

BisonsCytb.jpg


Bison mitochondrial genomes are well-represented at GenBank because of a Dec 2010 release by the JN Derr group of 31 complete genomes (along with various cow-bison hybrids and cow breeds) from 6 herds including two woods bison (sometimes denoted Bison athabascae) from a wood bison herd in Elk Island, Canada that was not historically admixed with plains bison. Their mitochondrial genomes did not however form a separate clade expected of a distinct taxon.

Cattle-bison hybrids represent crossing a bison male with domestic cow (or rather a continuous line of female descent from such a cross) and so have strictly cow mitochondrial dna, not relevant here because wild yak and aurochs provide more appropriate outgroups than a domesticated animal. Note however that the haplotypes of all bison hybrids studied cluster with cow haplotype cHap32, which may shed light on the historic cow lineage involved in late nineteenth century cattalo experiments. (The Derr group also posted a complete mitochondrial genome HQ223450 from European bison on 15 Nov 10 that -- like all to date -- is a taurine hybrid.)

Bison CYTB protein accessions: wood bison
 ADF48936 ADF48949 ADF48962 ADF48975 ADF48988 ADF49001 ADF49014 ADF49027 ADF49040 ADF49053 ADF49066
 ADF49079 ADF49092 ADF49105 ADF49118 ADF49131 ADF49144 ADF49157 ADF49170 ADF49183 ADF49196 ADF49209
 ADF49222 ADF49235 ADF49248 ADF49261 ADF49274 ADF49287 ADF49300 ADF49313 ADF49326

Earlier bison protein accessions:
 ABV70945 AAD51424          (ABV70945 complete genome: YP_002791041 derived from it; AAD51424 complete gene only)
 AAW28804 AAW28803 AAL85955 (fragmentary)
 ADM87433                   (uninformative fragment)
 AAN28295                   (taurine hybrid poor quality)

Non-redundant protein set (with multiplicities): pick one from each row
 18 98A:       ADF49092 ADF49170 ADF49118 ADF49248 ADF49131 ADF49300 ADF48936 ADF48949 ADF48962 ADF49001 ADF49027 ADF49040 ADF49157 ADF49183 ADF49196 ADF49261 ADF49066 AAW28803 (frag)
  1 98A V132D: AAD51424
  1 98A Q322R: AAL85955 (frag)
 13 V98:       ADF49105 ADF49209 ADF49014 ADF48975 ADF49144 ADF49222 ADF49287 ADF49235 ADF49274 ADF49313 ADF48988 ADF49053 AAW28804 (frag) 
  1 V98   N3S: ADF49079
  1 V98  I42T: ABV70945
  1 V98 V123M: ADF49326

Thus for comparative genomics purposes, all available authentic bison cytochrome b data on 11 Dec 10 can be represented by just three sequences (one a constructed composite of all polymorphisms). This facilitates comparison of amino acid variation with yak and other species. The fasta headers are designed to display informatively after alignment. Apart from V98A, the other 5 variations are sporadic (observed in only one animal to date). They are analyzed in great detail below to determine which are deleterious mutations.
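The counts in the non-redundant set can be tallied to check how many animals carry each residue at position 98 and which substitutions are sporadic. The sketch below transcribes the table above into tuples, treating entries marked (frag) as fragments; that split into full-length and fragmentary counts is a reading of the table, and the names `ROWS`, `tally_site98` and `sporadic_substitutions` are illustrative.

```python
# (residue at 98, additional substitution or None, full-length count, fragment count)
# Transcribed from the non-redundant protein set above; "(frag)" entries are
# counted as fragments (an assumption about how the table should be read).
ROWS = [
    ("A", None, 17, 1),
    ("A", "V132D", 1, 0),
    ("A", "Q322R", 0, 1),
    ("V", None, 12, 1),
    ("V", "N3S", 1, 0),
    ("V", "I42T", 1, 0),
    ("V", "V123M", 1, 0),
]


def tally_site98(rows, include_fragments=False):
    """Count sequences carrying A or V at position 98."""
    counts = {"A": 0, "V": 0}
    for residue, _extra, full, frag in rows:
        counts[residue] += full + (frag if include_fragments else 0)
    return counts


def sporadic_substitutions(rows):
    """Return substitutions seen in exactly one sequence (fragments included)."""
    carriers = {}
    for _residue, extra, full, frag in rows:
        if extra is not None:
            carriers[extra] = carriers.get(extra, 0) + full + frag
    return sorted(name for name, n in carriers.items() if n == 1)


print(tally_site98(ROWS))            # full-length sequences only
print(sporadic_substitutions(ROWS))
```

With fragments excluded, the tally gives 18 sequences with 98A and 15 with V98, 33 in total.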

>CYTB_bisBis_V98 wild type
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis_98A major variant
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW
 
>CYTB_bisBis_all N3S I42T V98A V123M V132D Q322R all-allele composite 
MTSLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLTLQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTMMATAFMGYDLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSRCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW
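A composite sequence of this kind can be built by applying each named substitution to the wild type sequence, checking along the way that the expected original residue really sits at the stated position. The following is a minimal sketch under that assumption; `apply_substitutions` is an illustrative name, and only the first 50 residues of CYTB_bisBis_V98 are used to keep the example short.

```python
import re


def apply_substitutions(sequence, substitutions):
    """Apply labels like 'N3S' (1-based positions) to a protein sequence.

    Raises ValueError if a label is malformed or if the residue found at the
    position does not match the original residue named in the label.
    """
    residues = list(sequence)
    for label in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            raise ValueError(f"unrecognized substitution label: {label}")
        original, replacement = match.group(1), match.group(3)
        position = int(match.group(2))
        found = residues[position - 1]
        if found != original:
            raise ValueError(f"{label}: found {found} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


# First 50 residues of CYTB_bisBis_V98 as given above.
V98_PREFIX = "MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLF"
print(apply_substitutions(V98_PREFIX, ["N3S", "I42T"]))
```

The result matches the first 50 residues of the all-allele composite above; the residue check guards against off-by-one numbering errors when labels come from different sources.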

The alignment below shows bison CYTB aligned against its nearest living relatives within Bovinae. Data from nearly a thousand individual animals are compressed without significant loss of information into the 28 lines of the alignment. The order of species corresponds to the topology of the phylogenetic tree, facilitating interpretation of individual sites in bison (or its sister species yak). If a residue is invariant in the preceding 4-5 levels of outgroup but changes to another amino acid in bison, that change needs detailed evaluation. The main possibilities are:

  • near-neutral wander within the acceptable reduced alphabet for that site (blue) with modestly increased sampling likely to reveal the outgroup value within bison and lineage sorting likely as the site is persistently polymorphic
  • deleterious mutations (red) that nonetheless persist in bison due to population bottleneck expansions or drift to unnaturally high frequencies under non-adaptive management. This includes private polymorphisms affecting one known animal, semi-population level changes such as V98A and changes fixed since divergence from yak.
  • synapomorphic change in cytochrome b (green) at the bison/yak divergence node, possibly adaptive (improving fitness relative to environment) but more likely just sites affected by lineage sorting of a reduced alphabet present in the ancestral population.
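The screening rule described above (outgroups invariant, bison different) can be applied mechanically once the dot notation is expanded. The sketch below assumes each alignment row has the same length as the reference row and uses "." for identity with the reference; gap characters are not handled, and the function names and toy rows are illustrative rather than taken from the alignment.

```python
def expand_dots(reference, row):
    """Replace '.' in a dot-alignment row with the residue from the reference row."""
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def derived_sites(ingroup, outgroups):
    """Find columns where all outgroups agree with each other but the ingroup differs.

    Returns (position, outgroup residue, ingroup residue) tuples, 1-based.
    """
    sites = []
    for pos, residue in enumerate(ingroup, start=1):
        column = {seq[pos - 1] for seq in outgroups}
        if len(column) == 1 and residue not in column:
            sites.append((pos, column.pop(), residue))
    return sites


# Toy example: the first row plays the role of the bison reference.
bison = "MTNLRKSHPL"
outgroup_rows = ["...I......", "...I.....M", "...I......"]
outgroups = [expand_dots(bison, row) for row in outgroup_rows]
print(derived_sites(bison, outgroups))
```

Sites flagged this way are only candidates; deciding among the three categories above still requires the site-by-site evaluation.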

These substitutions are then discussed individually below using an advanced nsSNP evaluation protocol that considers the physical-chemical nature of the amino acid change (Grantham value and later refinements such as PolyPhen2), site-specific phylogenetic tree-aware comparative genomics (along the lines of TreeSAAP), and clade pattern analysis (ie random dispersal versus sub-clade persistence of synapomorphic or phyloSNP type) of homoplasic occurrences of the change elsewhere in mammals.

While nsSNP interpretation can never be perfect, here the analysis will be extraordinarily reliable for two reasons: the truly massive data set that exists for this particular protein (12,603 sequences in 1,637 mammals utilized below) and the relatively slow evolution of CYTB (still 83% identity between bison and platypus/echidna proteins) that allows the data set to retain applicability.
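Percent identity figures like the one quoted above are computed over aligned columns. A minimal sketch, assuming two pre-aligned sequences of equal length with "-" for gaps and skipping any column gapped in either sequence (other conventions for gaps exist and give slightly different values); the function name and toy fragments are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns with identical residues.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Toy fragments (hypothetical): 8 of 10 columns match.
print(percent_identity("MTNLRKSHPL", "MTNIRKSHPM"))
```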

                          10        20        30        40        50        60        70        80 b562   90  b566 100       110       120       130       140       150       160       170       180 b562  190
                           |         |         |  <------- TM1 ----|-->      |         |     <---|--*- TM2 |------*> |         | <-------|- TM3 ---|---->  <-|----- TM4|-------> |         |       <-|-* TM5 --|
CYTB_bisBis_V98   MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIM
CYTB_bisBis_all   ..S......................................T.......................................................A........................M........D..........................................................
CYTB_bosPriW      ...F..................................I...........................T...........................................................................................................................
CYTB_bosPriM      ...I..................T...............I...........................T...........................................................................................................................
CYTB_bosSau       ...I....................P.............V...........................T.....................................................I....................................................................A
CYTB_bosfroI      ...I..................................I...........................T...........................................................................................................................
CYTB_bosFroW      ...I..................................V...........................T..........................................................................................................................T
CYTB_bosGau1      ...I..................................V...........................T..........................................................................................................................T
CYTB_bosJav1      ...I....................P.............V...........................T.....................................................I....................................................................T
CYTB_bosJav2      ...I....................P.............V.................P.........T...........................................................................................................................
CYTB_bosJav3      ...I............T.......P.............V...........................T.....................................................I....................................................................T
CYTB_bosInd       ...I..................................I...........................T...........................................................................................................................
CYTB_bisBon       ...I..................................V...........................T...........................................................................................................................
CYTB_bosTau1      ...I..................................I...........................T...........................................................................................................................
CYTB_bosTau2      ...I..................................I...........................T...........................................................................................................................
CYTB_synCafW      ..HI.........L........................I.................................................................................F....................................................................A
CYTB_synCafP      ...I.........L........................I................S................................................................F....................................................................A
CYTB_bubBubW      ...I.........L........................I..............................................................M..................FA....................................S..............................A
CYTB_bubBubP      ...I.........L........................I..............................................................M..................FA....................................S..............................A
CYTB_traScr1      ...I..................................I....................M......T.......H..........................M..................F.....................................S..............................A
CYTB_traEur       .I.I..................................I...........................T..................................M..................F.......T.............................S..............................T
CYTB_traStr       ...I..................................I...........................T............................V.....M..................F....................................................................A
CYTB_traImb       .I.I..................T.P.............I..V.................M......T.....................................................F....................................................................A
CYTB_traOry       ...I..................T...............I..T........................TD.................................M..................F.....................................S..............................A
CYTB_traAng       ...I..................................V....................M......T...............................................V.....FM...................................................................T
CYTB_traSpi       ...I..................................I...........................T..................................M..................F.....................................S.........................F....A
CYTB_traDer       ...I..................................I...........................T..................................M..................F.....................................S..............................A
CYTB_bseTra       ...I..................................I....................M...A..T.....................................................F....................................................................A
                    b566 200       210       220       230       240       250       260       270       280       290       300       310       320       330       340       350       360       370
                  - TM5*-> |         |         |         <---- TM6 |--------->         |         |         |       <-|--- TM7 -|-------> |         |  <------|TM8 -----|-->     <|--------|-TM9----->|
CYTB_bisBis_V98   AIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAILRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW
CYTB_bisBis_all   ...................................................................................................................................R.........................................................
CYTB_bosPriW      ........................V..............................A.....................................................................M..........A.....................T...L..VL......................
CYTB_bosPriM      ........................V..............................A.....................................................................M..........A.....................T...L..VL..............V.......
CYTB_bosSau       ....................V...V.............T................A.......................................................I.............M...............................TT...L..................V.......
CYTB_bosfroI      ........................V..............................A.....................................................................M..........A.....................T...L...L..............V.......
CYTB_bosFroW      ........................A................T.............A.......................................................I.............M................................T...L..................V.......
CYTB_bosGau1      ........................A................T.............A.......................................................I.............M................................T...L..................V.......
CYTB_bosJav1      ....................V...A..............................A.......................................................I.............M................................T...L..................V.......
CYTB_bosJav2      ........................V..............................A.....................................................................M..........I.M...................T...L..................V.......
CYTB_bosJav3      ....................V...A..............................A.......................................................I.............M................................T...L...T..............V.......
CYTB_bosInd       ........................V..............................A.....................................................................M..........A.....................T...L...L..............V.......
CYTB_bisBon       ........................T......................T.......A.......................................................I.............M..........A.....................T...L..........................
CYTB_bosTau1      ........................V..............................A.....................................................................M..........A.....................T...L..VL......................
CYTB_bosTau2      ........................V..............................A.....................................................................M..........A.....................T...L...L..............V.......
CYTB_synCafW      .L..I...................T..............................S................................................IL.....IIM...........M..........I.........................L................S....N....
CYTB_synCafP      .L..I...................T..............................S................................................VL.....IIM...........M..........I.........................L................S....N....
CYTB_bubBubW      .L......................T...............................................................................VL.....I.M...........M...F......I...N.....................L...T............SM...N....
CYTB_bubBubP      .L......................T..............................A................................................VL.....I.M...........M...F......I...N.....................L...T............SMV..N....
CYTB_traScr1      .L...................P.........................I.......A................................................V......I.M...........M..........I.A.......................L.......I......ATSM...SF...
CYTB_traEur       .L.....................N.......................T.......A................................................VL.....I.M....M......M..........I.A.......................L.......I......VTSM...NF...
CYTB_traStr       .L...........................................V..........................................................VL.....IFL...........M..........I....................M....L..............VTSM...NF...
CYTB_traImb       .L.L....................T..............................A................................................ILT..MPI.M....A......M..........I.........................L..............M..S...N....
CYTB_traOry       .L......................T........H.............T.......A................................................VL.....I.M...........M..........V.A.......................L..............V.SM...NF.--
CYTB_traAng       .LV...............................................V.....................................................VL.....I.M....M......M..........L.........................L...I..........VIS....N....
CYTB_traSpi       .L.......................................V.....T.......A................I...............................VL.....I.M....V......M..........I.A.......................L.......I......ATSM...NF...
CYTB_traDer       .L.I...................................................A................S...................L...........VL...V.M.M...........M.......F..I.A............L..........L..............V.SM...N....
CYTB_bseTra       .L..I...................A.........................M....A................S............................M..VL.....I.M...........M.......M..I...N.....................L................SM...N....

Site by site analysis of bison sporadic and sub-population variation in CYTB:

   N3S    I42T    V98A    V123M    V132D    Q322R
4345 N  2403 I  4522 V   4409 V   4981 V   4993 Q
 132 I   719 A   430 I    483 T      9 I      2 R
  70 H   645 G    34 M     73 I      5 L      2 P
  14 Y   640 M    11 A     18 L      3 C      2 K
   9 K   359 V     1 L      7 A      2 D      1 D
   5 S   167 T     1 N      5 M    
   2 T    52 L              2 G
          10 S
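Per-site residue tallies like those in the table above can be produced by counting one column of an alignment. The sketch below assumes a list of equal-length aligned sequences; `residue_counts` and the toy sequences are illustrative and do not reproduce the mammalian data set.

```python
from collections import Counter


def residue_counts(sequences, position):
    """Count residues at a 1-based alignment position, most common first."""
    return Counter(seq[position - 1] for seq in sequences).most_common()


# Toy alignment of 3-residue fragments (hypothetical).
toy_alignment = ["MTN", "MTN", "MTS", "MTN", "MTS", "MTH"]
for residue, count in residue_counts(toy_alignment, 3):
    print(f"{count:5d} {residue}")
```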

N3S: MT plains bison GU946987. Deleterious despite physical-chemical similarities of asparagine and serine. Based on 12,603 CYTB sequences from 1,637 species of mammals, this substitution has never gained traction in any clade despite being a simple 1 bp transition (codon AAC to AGC) that must have arisen a great many times in one species or another.

I42T: IT plains bison EU177871. Considerable amino acid flexibility exists at this site for cytochrome b. While threonine might be sub-optimal, it is not plausible that 167 species all have mitochondrial disease because of it. Nor could a substantial fraction of these other occurrences reflect sequencing error. Since the species are phylogenetically quite dispersed, the codon change here ATC to ACC arose and gained predominance many times. The Italian sequencers have not disclosed any details about the source of this bison dna.
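The two codon changes quoted so far (AAC to AGC for N3S, ATC to ACC for I42T) can be classified as transitions or transversions by checking whether the changed base stays within the purines or within the pyrimidines. A minimal sketch, assuming DNA codons that differ at exactly one base; the function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_codon_change(codon_from, codon_to):
    """Return (codon position, 'transition' or 'transversion') for a 1 bp change.

    A transition keeps the base within purines (A, G) or within
    pyrimidines (C, T); anything else is a transversion.
    """
    if len(codon_from) != 3 or len(codon_to) != 3:
        raise ValueError("codons must be three bases long")
    diffs = [
        (i, a, b)
        for i, (a, b) in enumerate(zip(codon_from, codon_to), start=1)
        if a != b
    ]
    if len(diffs) != 1:
        raise ValueError("expected codons differing at exactly one base")
    position, a, b = diffs[0]
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return position, "transition" if same_class else "transversion"


print(classify_codon_change("AAC", "AGC"))  # N3S
print(classify_codon_change("ATC", "ACC"))  # I42T
```

Both changes fall at the second codon position and both are transitions.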

V123M: EI woods bison GU947006. Threonine appears to be a fully functional alternative to valine but methionine is not. The 5 accessions having methionine do not comprise a coherent taxonomic clade but rather occur sporadically. As in this woods bison, the mutation in the other species may simply reflect heteroplasmy in the tissue sample used as dna source rather than a germline condition that would give rise to full-blown mitochondrial disease.

V132D: FR plains bison AF036273. The bison here resided at a Paris zoo. The peculiar change reported, AT --> TA, might be viewed as a 2 bp inversion rather than a double point mutation. However it is far likelier, given that the sequencing was done with 1999 technology, that an error occurred in the process of making the GenBank submission. The substitution is so radical at such a conserved site that lethal mitochondrial disease would likely have resulted were it not a minor heteroplasmic haplotype.

Q322R: found only in a fragmentary plains bison sequence, AAL85955. This is again a radical change at an invariant site, so it reflects either sequence error, minor heteroplasmy, or a cause of mitochondrial disease. As with other sporadic mutations, there are no management implications unless broader sampling uncovers more individuals with this haplotype.

V98A: 18 plains bison. This site, located at the very end of transmembrane helix 2 just past the axial iron histidine ligand, is very important to bison conservation genomics. Of the 33 bison evaluated at this position (ancestrally valine), 18 animals are V98A; neither of the two wood bison carries it, nor do the outgroups yak, aurochs or other Bovinae. Two of the sporadic variations occur among the 15 bison comprising the V98 clade, namely N3S in a MT private herd bison and V123M (in outlier wood bison wHap14). V123M may not remain sporadic as more Elk Island animals are sequenced; if more common, it becomes of management concern as it too is deleterious.

BisonHaploDerr.jpg

The single change V98A corresponds perfectly to the two major clades, with 98A shared by all individuals in the upper half of the tree ending in bHap2. In the overall mammalian context, A98 is a non-adaptive derived condition (synapomorphy) of the upper clade of bison. Note V98L is a domestic yak mutation described below. Although the vast majority of mammals are V98, isoleucine is also common, with methionine fairly rare, among the 12,603 CYTB sequences in 1,637 mammals considered. The statistics: 4522 V, 430 I, 34 M, 11 A, 1 L, 1 N.

The other occurrences of alanine are scattered and shallow in the phylogenetic sense, ie alanine has never become established in another mammalian subclade despite tens of billions of years of (geologic) branch time accessible to study. These other species with V98A are Castor fiber (4 subspecies of beaver), Anomalurus (rodent), Eptesicus hottentotus (bat), Herpestes naso (mongoose), Genetta johnstoni (carnivore), Hyaena hyaena (hyena), and Macroscelides proboscideus (elephant shrew).

No internal compensation by a co-evolving residue elsewhere in CYTB can occur since V98A is the sole residue change. Conceivably compensation could arise from a change in one of the ten nuclear-encoded proteins targeted to mitochondrial Complex III (of which the oligomeric partners cytochrome c1 and the Rieske iron-sulfur protein are the likeliest candidates). The concept of balanced polymorphism (along the lines of E6V of sickle cell hemoglobin) also seems inapplicable.

Looking now at 500 non-mammalian cytochrome b sequences -- ie at species predating the bison divergence from birds at 310 myr -- not a single alanine occurs. Valine no longer dominates, occurring in 186 species (37%); instead the closely related branched chain aliphatic isoleucine is the most common residue at 314 occurrences. Cytochrome b has been very particular about position 98 for a very long time.

The next level of consideration, beyond the private and sub-population variation considered above, is the set of sites that are the same in all bison but differ from a conserved ancestral value at yak divergence. These can be seen in the dot alignment above as residue columns identical for the progressive outgroup (ie yak, wild cattle, water buffaloes, elands, nilgai) but carrying another amino acid in bison.

These 9 additional sites need to be carefully evaluated because they might be deleterious alleles that have spread to all bison (rather than just a sub-population in the manner of V98A). Alternatively, they might simply reflect neutral drift within the acceptable reduced alphabet for the respective sites, an innocent haplotype that became predominant in the stem lineage. Another less common possibility is adaptive change, part of what makes a bison a bison rather than a yak. Note heteroplasmy is inapplicable here because the change in so many bison is clearly being inherited.

Evaluating these as before (phylogeny-aware frequencies), most of the changes are fully consistent with near-neutral drift within the reduced alphabet. Here the notation is slightly different: in I4L etc, I is the outgroup consensus, 4 the numbering within CYTB, and L is the bison variant. The only causes for concern here are I39M, M316I and L353M because in these the observed frequency of the bison amino acid is quite low. L353M may be significantly sub-adaptive given the invariance of leucine and rarity of methionine, and similarly for I39M and M316I. A246T is a bit peculiar in that both bison and outgroups have less common residues. V215M does not have that decisive an outgroup value and can be dismissed as neutral.

     I4L     I39M     T67A    V215M    A246T    M316I    T349I    L353M    V372I
  3757 I   2399 I   4513 T   1682 S   3096 S   3857 I   3701 I   3913 L   2805 I
   319 L   1640 V    456 A   1522 A    765 F    650 T    761 T    413 T    616 M
   287 M    688 L     14 S    741 M    611 T    230 M    199 V    395 V    526 V
   244 T    140 M      7 M    528 T    455 A    127 A    165 L    146 M    415 L
    16 F    101 A      6 V    354 P     26 M     95 S     85 M     69 I    363 F
    10 V     17 T      2 I     91 V     16 V     24 L     75 A     44 A    202 A
     3 Y     13 F              34 L     12 L     15 V      3 S     13 F     40 T
     3 P      2 M              32 C      7 Y                        3 F      5 S
     2 S                        6 I      5 N                                 2 H
     2 L                        5 Y      2 I
     2 C                                 2 C
                                         1 Y
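
The raw frequency of the bison residue at a site can be read off a column of the table above. The following is only a minimal sketch in Python: the counts are copied from the I39M and L353M columns, the function name `residue_fraction` is hypothetical, and the plain fraction it returns ignores the phylogeny-aware weighting used in the text. A letter listed twice in a column is simply summed.

```python
from collections import Counter

# Counts copied from the I39M and L353M columns of the table above.
# A letter that appears twice in a column is summed.
SITE_COUNTS = {
    "I39M": [(2399, "I"), (1640, "V"), (688, "L"), (140, "M"),
             (101, "A"), (17, "T"), (13, "F"), (2, "M")],
    "L353M": [(3913, "L"), (413, "T"), (395, "V"), (146, "M"),
              (69, "I"), (44, "A"), (13, "F"), (3, "F")],
}


def residue_fraction(rows, residue):
    """Return the fraction of sequences carrying `residue` at one site."""
    totals = Counter()
    for count, aa in rows:
        totals[aa] += count
    return totals[residue] / sum(totals.values())


if __name__ == "__main__":
    for site, rows in SITE_COUNTS.items():
        bison_aa = site[-1]  # last letter of the label is the bison residue
        print(site, bison_aa, f"{residue_fraction(rows, bison_aa):.2%}")
```

A low fraction by itself only flags a site for closer inspection; the phylogenetic distribution of the residue still has to be examined as described in the text.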


Sequences are color clustered according to the haplotype tree. bHap1 is not shown. Note the woods bison cannot be resolved from the plains bison even though the Elk Island woods bison are a relic herd that did not mix with 7,000 plains bison imported from the Flathead Reservation in Montana up to Canada's Wood Buffalo National Park in the 1920's. Clearly these animals are a mixture of the second major clade of bison with an earlier diverged lineage represented by wHap14 surviving (at least in mitochondrial dna) from the founder herd. This could represent allopatric separation during a glaciation epoch with subsequent reunification. However the prevalence of wHap14 needs to be established along with uniqueness of its nuclear dna.

NucAcc  	ProAcc  	PubMed  	ST	Locale	TYP	MUT 	BP Change 	Isolate	Haplo	Source Herd
GU946976	ADF48936	20870040	MT	plains	A98	V.98A	GTA to GCA	B790 	bHap2	Montana private herd
GU946977	ADF48949	20870040	MT	plains	A98	V.98A	GTA to GCA	B853 	bHap2	Montana private herd
GU946978	ADF48962	20870040	MT	plains	A98	V.98A	GTA to GCA	B854 	bHap2	Montana private herd
GU946981	ADF49001	20870040	MT	plains	A98	V.98A	GTA to GCA	B880 	bHap2	Montana private herd
GU946983	ADF49027	20870040	MT	plains	A98	V.98A	GTA to GCA	B925 	bHap2	Montana private herd
GU946984	ADF49040	20870040	MT	plains	A98	V.98A	GTA to GCA	B929 	bHap2	Montana private herd
GU946986	ADF49066	20870040	MT	plains	A98	V.98A	GTA to GCA	B959 	bHap2	Montana private herd
GU946993	ADF49157	20870040	MT	plains	A98	V.98A	GTA to GCA	B1029 	bHap2	Montana private herd
GU946995	ADF49183	20870040	MT	plains	A98	V.98A	GTA to GCA	B1050 	bHap2	Montana private herd
GU946996	ADF49196	20870040	MT	plains	A98	V.98A	GTA to GCA	B1051 	bHap2	Montana private herd
GU947001	ADF49261	20870040	NB	plains	A98	V.98A	GTA to GCA	BNBR1 	bHap2	National Bison Refuge
GU947004	ADF49300	20870040	YP	plains	A98	V.98A	GTA to GCA	BYNP1586 	bHap17	Yellowstone Natl Park
GU946990	ADF49118	20870040	MT	plains	A98	V.98A	GTA to GCA	B985 	bHap10	Montana private herd
GU946991	ADF49131	20870040	MT	plains	A98	V.98A	GTA to GCA	B1005 	bHap10	Montana private herd
GU947000	ADF49248	20870040	NB	plains	A98	V.98A	GTA to GCA	BFN5 	bHap10	Fort Niobrara
GU946994	ADF49170	20870040	MT	plains	A98	V.98A	GTA to GCA	B1031 	bHap11	Montana private herd
GU946988	ADF49092	20870040	MT	plains	A98	V.98A	GTA to GCA	B973 	bHap8	Montana private herd
AF036273	AAD51424	10603253	FR	plains	A98	V132D	 AT to TA 	.....	.....	Vincennes Zoo 1999
GU946979	ADF48975	20870040	MT	plains	V98	.....	..........	B855 	bHap3	Montana private herd
GU946992	ADF49144	20870040	MT	plains	V98	.....	..........	B1018 	bHap3	Montana private herd
GU946998	ADF49222	20870040	MT	plains	V98	.....	..........	B1191 	bHap12	Montana private herd
GU946980	ADF48988	20870040	MT	plains	V98	.....	..........	B877 	bHap4	Montana private herd
GU946985	ADF49053	20870040	MT	plains	V98	.....	..........	B935 	bHap6	Montana private herd
GU946989	ADF49105	20870040	MT	plains	V98	.....	..........	B979 	bHap9	Montana private herd
GU946997	ADF49209	20870040	MT	plains	V98	.....	..........	B1091 	bHap9	Montana private herd
GU946982	ADF49014	20870040	MT	plains	V98	.....	..........	B897 	bHap5	Montana private herd
GU947006	ADF49326	20870040	EI	woodsB	V98	V123M	ATA to GTA	wEI14	wHap14	Elk Island
EU177871	ABV70945	18302915	IT	plains	V98	I.42T	ATC to ACC	.....	.....	unknown Italy
GU946987	ADF49079	20870040	MT	plains	V98	N..3S	AAC to AGC	B961 	bHap7	Montana private herd
GU946999	ADF49235	20870040	MT	plains	V98	.....	..........	B1428 	bHap13	Montana private herd
GU947002	ADF49274	20870040	TX	plains	V98	.....	..........	BTSBH1001 	bHap13	Texas State Bison Herd
GU947003	ADF49287	20870040	TX	plains	V98	.....	..........	BTSBH1005 	bHap16	Texas State Bison Herd
GU947005	ADF49313	20870040	EI	woodsB	V98	.....	..........	wEI1	wHap15	Elk Island
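
The clade counts quoted in the text can be checked by tallying the TYP column of the table above. The sketch below assumes the rows are tab-separated exactly as shown; only five rows are embedded for illustration, and the helper name `tally` is hypothetical.

```python
from collections import Counter

# Five rows copied from the table above (tab-separated in the original).
SAMPLE = """\
GU946976\tADF48936\t20870040\tMT\tplains\tA98\tV.98A\tGTA to GCA\tB790\tbHap2\tMontana private herd
GU947004\tADF49300\t20870040\tYP\tplains\tA98\tV.98A\tGTA to GCA\tBYNP1586\tbHap17\tYellowstone Natl Park
GU946979\tADF48975\t20870040\tMT\tplains\tV98\t.....\t..........\tB855\tbHap3\tMontana private herd
GU947006\tADF49326\t20870040\tEI\twoodsB\tV98\tV123M\tATA to GTA\twEI14\twHap14\tElk Island
GU947005\tADF49313\t20870040\tEI\twoodsB\tV98\t.....\t..........\twEI1\twHap15\tElk Island
"""

TYP_COLUMN = 5  # zero-based index of the TYP column (A98 or V98)


def tally(text, column):
    """Count the values found in one zero-based column of tab-separated rows."""
    counts = Counter()
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts


if __name__ == "__main__":
    print(tally(SAMPLE, TYP_COLUMN))
```

Pasting the full table in place of `SAMPLE` should reproduce the split of 33 animals described in the text, provided the tabs survive copying.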

Variation in all 13 bison and yak mitochondrial proteins

BisonSumVar.gif
Bisyakcomp.gif
YakSumVar.gif


The variation observed in the entire mitochondrial proteome can be readily interpreted along the lines of CYTB above. First note COX1, COX2, COX3, ND2, ND3, ND4L and ND6 are completely conserved in all bison complete genome data. ND1 is also conserved with the exception of a sporadic near-neutral substitution S269L in a woods bison (ND1: EI_GU947006_wHap14). This degree of conservation makes it unlikely that the founder population harbored a high frequency deleterious allele of hyper-mutating nuclear genes such as POLG.

Next note that 9 of 10 overall sporadic substitutions (F138L in ND5: A98_MT_GU946988_bHap8 being the exception) are concentrated in the CYTB-determined V98 clade even though it represents fewer animals (the multiplicity column shows 17 bison in the A98 clade and 15 in V98). This is consistent with the A98 haplotype being much less diverse (of more recent origin) and indeed this is borne out by whole genome blastn comparisons.
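
How lopsided a 9-to-1 split is can be gauged with a simple binomial tail. This is a rough sketch, not part of the original analysis: it assumes each sporadic substitution independently lands in a V98 animal with probability 15/32 (the share of V98 animals in the multiplicity column), which ignores shared ancestry among the sampled animals. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p_v98 = 15 / 32  # assumed chance that a substitution falls in the V98 clade
    prob = binomial_tail(10, 9, p_v98)
    print(f"P(at least 9 of 10 in V98 by chance) = {prob:.4f}")
```

A small tail probability under this null model would be in line with the interpretation that the A98 clade is the less diverse one.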

Recall that sporadic substitutions -- even when clearly deleterious like N88I in ND4:V98_MT_GU946980_bHap4 -- may not be fully heritable but rather simply reflect heteroplasmic amplification of an uncommon germline haplotype in the tissue used for dna sequencing. Alternatively they could simply represent somatic mutation and not be represented at all in germline dna. Consequently even deleterious sporadic mutations may not have major phenotypic effects. However if the same mutation shows up again as more bison are sampled, the interpretation shifts towards heritability (since the same mutation would not arise independently in a still-small sample) and, for N88I, significant effect on fitness.

Semi-systemic (clade-level) variations affecting multiple animals are definitely maternally heritable to an extent determined by germline haplotype prevalence. Bison have 8 such substitutions in 5 of their 13 mitochondrial proteins, namely CYTB:V98A, ND5:Y159H, ND4:A314T and ND4:L442M, ATP6:I60N, ATP6:T182M and ATP6:A177T, and ATP8:E38K.

All of these fall along the lines already established by CYTB:V98A with the exception of ATP6:A177T, ND4:L442M and ATP8:E38K which are restricted to (and define) subclades of the major CYTB:V98 subgroup. All three of these substitutions classify as somewhat sub-normal but not outright deleterious. That is, the same substitutions are observed in too many other species to be outright mutations (consistent with fairly benign amino acid attribute change), yet are not so common as to be on an equal fitness footing with the major components of the reduced alphabet. Bison carrying ATP6:A177T can be predicted to be least affected in view of threonine being the second most frequent residue at this position. Since we don't fully understand the significance of these changes, they represent genetic variation that should be protected by conservation genomic management as adaptive or adaptive in combination with other alleles in other mitochondrial or nuclear genes, now or under later environmental circumstances.

ATP6:A177T   ND4:L442M   ATP8:E38K
     998 A       221 L       215 E
      88 T        46 I        34 K
       4 S        19 M        17 S
       2 V         8 T        15 M
       1 P         2 V         5 T
       1 A         1 F         5 G
                               3 V
                               2 A

The remaining five major sites are distributed precisely along the clade lines established by V98A of CYTB. These are likely fully heteroplasmically penetrant in both clades and inherited in all descendants. ATP6:I60N can immediately be seen to be a second deleterious change in the A98 clade as asparagine never occurs here in thousands of other species and its polar nature is a substantial change from branched chain aliphatic isoleucine.

ND4:A314T is borderline deleterious -- while a very rare substitution within mammals, it has become established in all Camelidae with available sequence and so is unlikely to be harmful there. However bison and camel ND4 differ at many other sites -- 73 of 459 -- so the status of ND4:T314 in bison is likely sub-neutral but phenotypically mild, particularly as alanine and threonine are not too dissimilar. The peculiar appearance of T314 in an Italian zoo bison of the V98 I60 clade, if not sequence error, suggests lineage sorting and possible heteroplasmic persistence in some A314 bison.

 ATP6:I60N   ATP6:T182M    ND5:Y159H    ND4:A314T   ATP6:A177T    ND4:L442M    ATP8:E38K
     531 M        553 S        225 Y        281 A        998 A        221 L        215 E
     392 I        286 M         73 H          8 T         88 T         46 I         34 K
     106 T         98 T                       5 I          4 S         19 M         17 S
      37 V         92 L                       3 V          2 V          8 T         15 M
       6 A         57 I                                    1 P          2 V          5 T
       5 N         10 A                                    1 A          1 F          5 G
       4 L          4 V                                                              3 V
       2 P          2 F                                                              2 A
       1 S          1 M
       1 N          1 C

The species sharing the errant variation with bison (note T314 forms a clade within Camelidae):

ATP6:I60N                               ND4:A314T
 Panthera tigris          carnivore      Camelus ferus           artiodactyl  
 Erinaceus europaeus      insectivore    Camelus bactrianus      artiodactyl
 Cebus capucinus          primate        Lama glama              artiodactyl
 Cebus albifrons          primate        Lama guanicoe           artiodactyl
 Callicebus donacophilus  primate        Lama pacos              artiodactyl
 Callicebus donacophilus  primate        Vicugna vicugna         artiodactyl
                                         Pontoporia blainvillei  cetacean
                                         Physeter catodon        cetacean

ND5:Y159H           ND4:L442M          ATP8:E38K
11 carnivores        8 carnivores       20 cetaceans
11 artiodactyls      3 bats              4 ruminants
 2 cetaceans         3 carnivores
 1 ruminant          1 rodent

The third class of site variation is defined by bison changes relative to yak and other close-in Bovidae. Changes here affect all bison. They may either be adaptive, part of what makes a bison a bison (and not a yak or cow), or less likely sub-normal haplotypes that rose to high frequency because of severe historical bottlenecking of the bison population. (to be continued shortly)

Methods here are important to understand because a vast amount of empirical data is being compressed to a small but important bit of management information -- the healthy haplotypes. The screenshot below illustrates the simple desktop method used for extracting variation at a given site from 5000 Blastp matches to a given protein. In the example, A365T has been previously identified as a site of variation within the alignment of all available bison NADH dehydrogenase subunit 5 (ND5) proteins. Using a bison sequence with A365 as query, output formatting of blastp output is set to "query-anchored with dots for identity".

PhyloStripper.gif

Pasting the relevant section of higher quality sequences (which varies from protein to protein depending on indels and sequence divergence) into the spreadsheet causes its text-extracting formulas (provided below in the methods section) to separate the match into individual columns for accession number and each amino acid. Columns of interest are then processed to obtain frequencies of each of the 20 amino acids at the site under study (here only A and T occur).
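
The column extraction performed by the spreadsheet can also be sketched in a few lines of Python. This is an illustration only, not the spreadsheet formulas themselves: it assumes each pasted line has the form `accession start SEQUENCE end` with the query row first, the toy alignment is invented, and the names `expand_dots` and `column_counts` are hypothetical. Real blastp output may be laid out differently, so the parsing would need to be adapted.

```python
from collections import Counter

# Invented toy block in "query-anchored with dots for identity" style.
TOY = """\
query  361  LLTIALSTL  369
hitA   361  .........  369
hitB   361  ....T....  369
hitC   361  ....T..A.  369
"""


def expand_dots(query, hit):
    """Replace each '.' in a hit row with the query residue at that column."""
    return "".join(q if h == "." else h for q, h in zip(query, hit))


def column_counts(block, column):
    """Tally residues at a zero-based alignment column.

    The first non-empty line is taken to be the query row, and it is
    counted along with the hits.
    """
    rows = [line.split() for line in block.splitlines() if line.strip()]
    query = rows[0][2]
    counts = Counter()
    for _accession, _start, seq, _end in rows:
        counts[expand_dots(query, seq)[column]] += 1
    return counts


if __name__ == "__main__":
    # Site 365 in a block starting at residue 361 is column 365 - 361 = 4.
    print(column_counts(TOY, 365 - 361))
```

These counts are per sequence, not per species, so they still carry the over-counting of heavily sequenced species that a taxonomy-level pass is meant to remove.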

Blast output order is by similarity, so it corresponds approximately to phylogenetic distance from bison. However the set of protein accessions corresponding to a particular variant (here A or T) can be more precisely processed back at NCBI Entrez for phylogenetic position (indeed tree) using the NCBI Taxonomy extractor. This eliminates over-counting of frequencies attributable to high multiplicities of sequenced individuals of some species. A computer algorithm here is ill-advised since NCBI can and does change formatting practices without notice.

The frequency tables and phylogenetic distribution pattern then determine the interpretation of the amino acid variation. For A365T, a mild physical-chemical change (according to Dayhoff, Blosum, Kyte-Doolittle or Grantham value), both A and T occur in widely dispersed clades of mammals, as do proline, valine and isoleucine, so it follows that the substitution is near-neutral for bison. It may not have arisen by recent mutation but instead may reflect lineage sorting during bison cladogenesis or speciation, simply rising to prominence in the bison studied (V98_MT_GU946987_bHap7) via heteroplasmic amplification in the leucocytes sampled. Consequently it is part of normal natural variation at this site and its elimination should not be a conservation genomics management objective.

Synapomorphies in bison, yak and cattle mitochondrial proteomes

What makes a bison a bison? Synapomorphies (derived characters) are those amino acids invariant within bison but differing from the ancestral value as determined by yak, cow, water buffalo and other pecoran ruminant outgroups. To collect these across the entire mitochondrial proteome (13 proteins, 3790 amino acids in bison), 80 complete mitochondrial genomes were aligned at the protein level, with 102 sites of interest then extracted (below) and grouped by type.
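
The definition above can be expressed as a simple column scan over a protein alignment. The sketch below is a simplification under stated assumptions: it requires the outgroup to be invariant at the column (the text instead infers the ancestral value from progressive outgroups), the toy sequences are invented, and the function name `synapomorphic_columns` is hypothetical.

```python
def synapomorphic_columns(ingroup, outgroup):
    """Return (column, ancestral, derived) for each candidate synapomorphy.

    A column qualifies when every ingroup sequence shares one residue,
    every outgroup sequence shares one residue, and the two differ.
    All sequences are assumed to be aligned and of equal length.
    """
    hits = []
    for col, residues in enumerate(zip(*ingroup)):
        in_set = set(residues)
        out_set = {seq[col] for seq in outgroup}
        if len(in_set) == 1 and len(out_set) == 1 and in_set != out_set:
            hits.append((col, next(iter(out_set)), next(iter(in_set))))
    return hits


# Invented toy alignment: column 2 varies within the ingroup, so it is
# a sub-clade feature and is not reported.
BISON = ["MTNLA", "MTNLA", "MTSLA"]
OUTGROUP = ["MINLV", "MINLV"]

if __name__ == "__main__":
    for col, ancestral, derived in synapomorphic_columns(BISON, OUTGROUP):
        print(f"{ancestral}{col + 1}{derived}")
```

Columns that vary within the ingroup are skipped, which is why broad sampling of each species matters: a sub-clade feature looks like a synapomorphy only when too few individuals are included.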

It is important here to include all available bison, yak, cattle and water buffalo data so that bona fide synapomorphies -- rather than sub-clade features or one-off mutations -- are properly defined. (Here 15 cattle genomes and 11 cattle-bison hybrids were chosen from the many available.)

Yak and cattle synapomorphies provide important context to bison so they too are displayed along with synapomorphies applicable to the yak-bison ancestral divergence node. Note cattle have more synapomorphic sites than bison and yak put together, possibly a byproduct of breeding for certain features during their long history of domestication. Sub-clade features of bison and yak, considered above, are also shown along with a couple of bison + yak + cattle synapomorphies relative to water buffalo (which can be determined with less precision because only 4 genomes are available).

These changes may represent adaptive variations that swept across the entire population because of their selective value. Alternatively, they might represent maladaptive alleles that nonetheless rose to high frequency because historic bottlenecks accidentally favored a small subpopulation of animals carrying a mutation (much as seen above for V98A and N10I in bison sub-clades). Another option is simply neutral drift that fixed a particular amino acid from the normal reduced alphabet at that position. Only the maladaptive alleles have relevance to conservation genomics management.

One striking feature of the table is the apparent over-representation by amino acids from the first two columns of the genetic code, especially methionine. Many variations are known for the mitochondrial genetic code, so this raises concerns that bison, yak or cattle might use a slightly different translation table (making the synapomorphies into artifacts), perhaps only at certain sites along certain proteins. This scenario would be consistent with the many oddities in bison mitochondrial tRNAs reported by the Derr group. Recall here protein sequences have not been experimentally determined but rather inferred from dna (the exceptions being bovine cytochrome b and cytochrome oxidases used in xray crystallography -- which agree with standard table translation).

The first five columns of the table below are sorted by over-representation of synapomorphies. For example, ATP8 has five of these in a very short sequence of 66 residues. While ND5 has more synapomorphies (25), it is a much longer protein at 606 residues and so has a lower density of synapomorphies than ATP8 (1.5 vs 2.8). The next three columns show the composition of the bison mitochondrial proteome, sorted according to decreasing occurrence. Note the very low abundances of charged amino acids attributable to the many transmembrane domains which utilize apolar amino acids. Overall, the abundances of amino acids do not resemble those of nuclear-encoded cytoplasmic proteins. The last three columns show abundances taken from the large synapomorphy table below. The top line of that was provided by the Yellowstone bison sequence and dots indicate identity relative to it.
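
The density comparison between ATP8 and ND5 can be reproduced as the ratio of a gene's share of synapomorphies to its share of proteome length. The sketch below assumes that this ratio is what the Over column represents; the gene lengths and synapomorphy counts are copied from the #AAs and Syn columns of the table in this section, and the function name `over_representation` is hypothetical. Because the values are recomputed from those two columns, they need not match every tabulated figure to the last digit.

```python
# (gene, length in residues, synapomorphy count) from the #AAs and Syn columns.
GENES = [
    ("ATP8", 66, 5), ("CYTB", 379, 14), ("ATP6", 226, 11), ("ND3", 115, 5),
    ("ND5", 606, 25), ("ND2", 347, 10), ("ND6", 175, 5), ("ND4", 459, 12),
    ("ND4L", 98, 2), ("COX3", 260, 4), ("ND1", 318, 5), ("COX2", 227, 3),
    ("COX1", 514, 1),
]


def over_representation(genes):
    """Map each gene to (% of synapomorphies) / (% of proteome length)."""
    total_len = sum(length for _, length, _ in genes)
    total_syn = sum(syn for _, _, syn in genes)
    ratios = {}
    for gene, length, syn in genes:
        pct_syn = 100 * syn / total_syn
        pct_len = 100 * length / total_len
        ratios[gene] = pct_syn / pct_len
    return ratios


if __name__ == "__main__":
    ratios = over_representation(GENES)
    for gene in sorted(ratios, key=ratios.get, reverse=True):
        print(f"{gene:5s} {ratios[gene]:.1f}")
```

A ratio above 1 means the protein carries more synapomorphies than its length alone would predict.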

								Bison			Synapomorphy Table
Gene	#AAs	Syn	%syn	%AA	Over		AA	Freq	%		
	3790	102	100	100				3790	100		.	3683
ATP8	66	5	4.9	1.7	2.8		L	593	15.7		T	589
CYTB	379	14	13.7	6.9	2		I	328	8.7		I	487
ATP6	226	11	10.8	6	1.8		T	308	8.1		V	478
ND3	115	5	4.9	3	1.6		S	274	7.2		M	414
ND5	606	25	24.5	16	1.5		M	265	7		L	292
ND2	347	10	9.8	9.2	1.1		A	250	6.6		A	277
ND6	175	5	4.9	4.6	1.1		F	244	6.4		Y	260
ND4	459	12	11.8	12.1	1		G	219	5.8		S	172
ND4L	98	2	2	2.6	0.8		P	191	5		P	89
COX3	260	4	3.9	6	0.7		V	185	4.9		F	81
ND1	318	5	4.9	8.4	0.6		N	164	4.3		H	73
COX2	227	3	2.9	13.6	0.2		Y	132	3.5		N	35
COX1	514	1	1	10	0.1		W	104	2.7		
							H	98	2.6		
							K	99	2.6		
							E	95	2.5		
							Q	87	2.3		
							D	69	1.8		
							R	63	1.7		
							C	22	0.6	
        (The table will paste properly into common desktop spreadsheets if all spaces are replaced with tabs:)
.gene.. A A C N N N N A N C N C A A C C N N C N N N A N N N N N N A N N N N N C N A N N N C N C C N N N N N N N N N A N N A N N N A N C N N A N N C N N C C N C N N N N C N A C A C C N N A A C C N N N N N N N N N C N 
.gene.. T T Y D D D D T D O D Y T T Y Y D D Y D D D T D D D D D D T D D D D D Y D T D D D Y D O O D D D D D D D D D T D D T D D D T D Y D D T D D Y D D Y O D O D D D D O D T O T Y Y D D T T O Y D D D D D D D D D Y D 
.gene.. P P T 4 5 2 2 P 4 X 5 T P P T T 5 5 T 6 2 2 P 4 5 5 3 4 3 P 1 2 2 5 3 T 1 P 5 1 5 T 5 X X 5 5 6 5 5 5 4 4 5 P 4 3 P 2 6 5 P 2 T 4 5 P 5 2 T 2 4 T X 2 X 1 5 6 3 X 4 P X P T T 4 1 P P X T 4 5 4 6 4 5 5 5 5 T 2 
.gene.. 6 6 B . . . . 8 . 2 . B 6 6 B B . . B . . . 8 . . . . . . 6 . . . . . B . 6 . . . B . 2 1 . . . . . . . . . 8 L . 8 . . . 6 . B L . 6 . . B . . B 3 . 3 . . . . 2 . 6 3 6 B B . . 6 8 3 B . . . . . . . . . B . .
..pos.. 6 1 9 3 1 9 3 3 4 9 4 9 1 2 4 3 8 5 2 1 6 2 6 3 6 3 1 1 2 1 6 7 2 2 2 1 1 1 5 9 2 6 1 9 4 1 3 1 4 4 4 1 3 7 4 5 1 3 2 9 3 3 3 2 1 1 6 2 3 1 6 5 7 3 1 4 7 2 1 8 1 1 7 4 1 6 9 4 1 1 2 6 1 2 4 1 1 1 9 2 1 5 1 1 .
..pos.. 0 9 8 1 5 2 2 4 2 4 4 9 9 0 . 9 7 1 1 0 3 8 3 9 2 8 6 8 4 2 7 . 0 1 8 9 7 8 6 2 7 2 2 9 5 7 4 6 3 9 7 6 8 . 1 . 9 5 4 3 9 9 0 4 3 7 3 9 2 0 . 4 3 2 5 0 6 2 4 1 8 0 . 1 9 7 5 4 6 3 8 1 0 3 4 9 1 1 0 4 0 1 9 6 .
..pos.. . 2 . 4 9 . 0 . 1 . 9 . 5 1 . . . 9 5 1 . . . 8 . 3 . 5 . 3 . . 9 . . 0 1 6 . . 2 . 9 . 3 2 6 1 8 9 7 9 2 . . . . . 1 . . . 1 6 . . . . 5 3 . . . . 8 . . . . . 7 0 . . 4 . . 8 6 . . . 2 3 1 2 4 6 . . 9 3 2 2 .
bisBisY N M A T H V T H H F F M V V L M V T M S P V L V I M L V V A A I I M A M M T H Y Y I M A V V M F S T V M M F S H G Y I I T N A T A V P S T M T M T A I T L T I T T L A A I A I S I M I I I I I I I F V L H F I L ATP6_GU947004_bHap17_A98_YP_Bison
bisBisA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU947001_bHap2_A98_BR_Bison
bisBisA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU947000_bHap10_A98_FN_Bison
bisBisA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946994_bHap11_A98_MT_Bison
bisBisV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946988_bHap8_A98_MT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946979_bHap3_V98_MT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946998_bHap12_V98_MT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946980_bHap4_V98_MT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . S . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946985_bHap6_V98_MT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946989_bHap9_V98_MT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946982_bHap5_V98_MT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU947006_wHap14_V98_EI_Bison
bisBisV I T V . Y . . . . . . . . . . . . . . . . . . . . . . . . T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_EU177871_bHapX_V98_IT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU946987_bHap7_V98_MT_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU947002_bHap13_V98_TX_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU947003_bHap16_V98_TX_Bison
bisBisV I T V A Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GU947005_wHap15_V98_EI_Bison
bosGruA I T V A Y M M Y Y S L L I I I V M A A F S I S . . . M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464262_hapA_yak
bosGruA I T V A Y M M Y Y S L L I I I V M A A F S I S . . . M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464261_hapA_yak
bosGruA I T V A Y M M Y Y S L L I I I V M A A F S I S . . . M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464257_hapA_yak
bosGruA I T V A Y M M Y Y S L L I I I V M A A F S I S . . . M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464251_hapA_yak
bosGruA I T V A Y M M Y Y S L L I I I V M A A F S I S . . . M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464250_hapA_yak
bosGruA I T V A Y M M Y Y S L L I I I V M A A F S I S . . . M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464249_hapA_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M . . M M I . T T T T T T . . Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464260_hapA_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464266_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464265_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . T . ATP6_GQ464264_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464263_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464259_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464258_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464256_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464255_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464254_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464253_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464252_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464248_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATP6_GQ464247_hapB_yak
bosGruB I T V A Y M M Y Y S L L I I I V M A A F . . . M V V M M I T T T T T T T T A Y H H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . T . . . . . . ATP6_GQ464246_hapB_yak
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947021_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A . T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947020_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947019_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_HM045018_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . F M L T I . I L P I I T V L P Y N H V V A S T A T M S P M L I V A T V M M M M . S S . T T T T T . V . V . V V V V L T . . Y . . ATP6_AF492351_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . F M L T I . I L P I I T V L P Y N H V V A S T A T M S P M L I V A T V M M M M . S S . T T T T T . V . V . V V V V L T . . Y . . ATP6_EU177870_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A . T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L T F . Y . . ATP6_EU177868_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_EU177866_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_EU177864_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_EU177862_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . V . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_EU177860_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_EU177858_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_EU177856_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_EU177854_BosTau
bosTauD I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V . V V V V L I F Y Y . . ATP6_EU177852_BosTau
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947018_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947017_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947016_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947015_BosHyb
bosTauH I T V A Y M . Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947014_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947013_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947012_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947011_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947010_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947009_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947008_BosHyb
bosTauH I T V A Y M M Y Y S L L I I I I M A V F . . . . . . . . . . . . . . . . . . . . . M L T I I I L P I I T V L P Y N H V V A S T A T M S P M L I L A T V M M M M A S S T T T T T T T V V V V V V V V L I F Y Y . . ATP6_GU947007_BosHyb
bubBub. I T V A . . . . . S L L I I I I M A T L S I S A . . . M I . . . . . . A T M . . F M L T I I I L P I I T V L P Y N H V V A S T A . L T N L T F V I I M . . . . . . . . . . . . . . . . . . . . . . . . . . . L M ATP6_NC_006295_Bubalus
bubBub. I T V A . . . . . S L L I I I I M A T L S I S A . . . M I . . . . . . A T M . . F M L T I I I L P I I T V L P Y N H V V A S T A . L T N L T F V I I M . . . . . . . . . . . . . . . . . . . . . . . . . . . L M ATP6_AY702618_Bubalus
bubBub. I T V A . . . Y . . L L I I I I M A T L S I S A . . . M I . . . . . T A T M . . L M L T . I I L P I I T V L P Y N H V V A S T A T M T N L T F V I I . . . . . . . . . . . . . . . . . . . . . . . . . . . . L M ATP6_AY488491_Bubalus
bubBub. I T V A . T . . . S L L I I I I M A T L S I S A . . . M I . . . . . T A T M . . L M L T I I I L P I I T V L P Y N H V V A S T . T M T N L T F V I I M . . . . . . . . . . . . . . . . . . . . . . . . . . . L M ATP6_AF547270_Bubalus
antCer. I T V A Y T . Y Y S . L I I T I M . A L S I S L V . . M I . I L . . T A V . . . L M L T I I I L P I . T V . P Y S H V V . . . S T I . T L . I I V I . . . . V . . . . T . T . . . . . V V . . V . . . S . . L M ATP6_NC_012098_Antilope
ammLer. I T V A Y T M Y . S L L I I I I . M A F S I S L V . . M I . I . . T L T V . . . F M L T I I I L P . I T V L A Y N . . V . . T . . A T Y L . I I I I . M . I M . . . T T . . . . T . . V . . . V . L . . . . L M ATP6_NC_009510_Ammotragus
budTax. T T V A Y M . Y . S . L I I I T T M T F S I S L . . . M I . I . . T L V V . . . L M L T I I I L P I I T V L T Y N . . . . . T . M . T . L . I T L I . . . . M . . . T . T T . . . V . V . . . T . L I . . . L M ATP6_NC_013069_Budorcas
hydIne. I T V A Y . . Y . S . L I . M I M I A V S . . L . . . L I . I L . I T A V . . . F M L T I I I L P I I I V . . Y S F V . . S . S T I T T L L I L I I . I . . L D . . . I . T . . . . V . . . V V . . . S . . L M ATP6_NC_011821_Hydropotes
elaCeo. I T V A Y T . . . S . L I . I I M I S M . . . L . . . L I . I . . T T A V . . . F M L T I I I L P I I I V . L Y . F V . . . . A T I T T L L I . I I . L . . L D . . . I . T . . . . . . V V . V . . . S . . L M ATP6_NC_008749_Elaphodus
panHod. I T V A Y T . Y . S L L I I I I M I A F S I S L V . . M I . I . . . L A . . . . F M L T I I I L P I I T V . L Y . . V . . . . S . M I T L . I T I I . M . . . . . . T I . T . . . . . . . . . T . . . F . . L M ATP6_NC_007441_Pantholops
capCri. I T V A Y M . Y . S L L I I I I M . T F S I S L V . . M I T I . . T L T V . . . F M L T I I I L P I . T . . T Y N . . . . . . . . T V . L . I T I T . M . . L . . . T I . T . . . . . V . . . T . . . . . . L M ATP6_NC_012096_Capricornis
mosBer. I T V A Y . . F . S . L I I I I M . T V S I S . . I . M I . I L . . T A V . . . F M L T I I I L P I . T V . L Y N H A . . . T . T M T T L . I V . I . . . . . . . . . T . T . . T . . . . . . V V L . F . . L M ATP6_NC_012694_Moschus
cerUni. I T V A Y M M . . S . L I I I I M I A M S . . . V . . L I . I . . T T A V . . . F M L T I I I L P I I I V . L Y . F . V . . . A M I A T L L I V I I . M . . L N . . . I . T . . . . . . . . . V . . . S . . L M ATP6_NC_008414_Cervus
ranTar. I T V A Y T . F . S . L I . I I M I S V S I . L . . . L I . I . . T T A V . . . F M L T I I I . L I I I V . . Y . F V M . . . A M I A T L L I V I I . . . . L N . . . I . T T . . V V . . . T V V . . S . . L M ATP6_NC_007703_Rangifer
girCam. I T M A . H . Y . S T L . I I I M S . L . I S . V . . . I . T . . . . A T A . H L T L T I I I . P I I . . L P . N H . . A . T . . I T T L . . V I T . . . . . N . . . I A T . . . T . V . V . . . L I S . . L M ATP6_NC_012100_Giraffa

Steppe bison to the rescue?

SteppeBison.jpg

It is not any great technical feat today to sequence an entire nuclear genome or provide population-scale mitochondrial genomes of late Pleistocene fossils. That would be particularly useful in the case of the steppe bison, Bison priscus, because of its implications for conservation genomics management of contemporary bison herds. Even though a particular frozen carcass may lie off to the side (not literally be a grandparent) of living bison, it still provides a more immediate outgroup than wild yak or aurochs, thus allowing more reliable reconstruction of the last common ancestor of steppe and plains bison. The European bison (wisent) cannot serve this purpose because all examined to date are hybrids carrying cattle mitochondria.

However an individual steppe bison mitochondrial genome might not be representative of the population at the time. Indeed mitochondrial mutations were as common then as now. An ideal situation would arise from a mass die-off (eg flood event), allowing a population level survey of the mitochondrial genomes prevailing in that herd at that time. If these could be supplemented by isolated sequences dating to the same era representing different herds, an accurate picture of prevailing mitochondrial haplotypes could be obtained.

Note an individual chosen for nuclear genome sequencing generates (as byproduct) colossal coverage of mitochondrial dna which is present in great copy number excess. Consequently the haplotype heteroplasmy (differing mitochondrial genomes within the same cell) could be determined. This still does not get at heritable heteroplasmy unless oocytes are dissected out from frozen carcasses and used as source of dna.

Some steppe bison dna has indeed been sequenced already. Unfortunately these studies focused on the hyper-variable D loop which has little relevance to either fitness or conservation proteomics. As of 16 Dec 10, none of the 298 steppe bison GenBank submissions relate to mitochondrial proteins. These range from 313 to 761 bp in length, average 599 bp and so capture together only 5% of the mitochondrial genome. A single short 23S rRNA read from another extinct species Bison antiquus (ancient bison) is available; its genetic relationship to steppe bison and Bison latifrons is muddled.

Number  Type                   PubMed   Date         Authors                    Title
  5     A5630 D-loop           -------- 14-JUL-2010  Chen K Llamas B Cooper A   ---
  1     voucher NWT 984.80     in press 11-JUN-2009  Zazula GD MacKay G         Late Pleistocene steppe bison partial carcass from Tsiigehtchic pdf
 10     BP100 control region   msthesis 04-DEC-2006  Douglas KC Baker LE        Comparing Genetic Diversity of Late Pleistocene Bison with Modern Bison 
  7     IB73 control region    15567864 14-MAR-2008  Shapiro B Cooper A         Rise and fall of the Beringian steppe bison
274     BS163 control region   15567864 14-MAR-2008  Shapiro B Cooper A         Rise and fall of the Beringian steppe bison
 

The alignment of these sequences below, compressed beyond the point of readability to keep file size manageable, nonetheless shows the overall relation of these sequences and the locations of differences relative to the top sequence, chosen arbitrarily as BS249 control region AY748559. The major haplotype groups are readily apparent as blocks of very similar sequences. Near the bottom, 46 plains bison sequences appear, distinguished by a block of gaps and a few systematic differences.

Three aurochsen genomes have survived into contemporary cattle. Could something comparable have happened with steppe bison and plains bison? This would require separation and contemporaneous existence of two separate lineages, followed by introgression into the lineage that descended to today's bison while the other went extinct. The first condition might have been repeatedly satisfied by glacial barriers and Beringian passages and for the second, steppe bison are assuredly extinct. If this amounts to two steppe bison lineages, one of which we decided to call bison at some point in time, then the question amounts to whether the second lineage is still represented.

This amounts to observing a seriously outlying sequence of bison control region that clusters in the joint phylogenetic tree with steppe bison. That would be very surprising given the tremendous bottleneck experienced in the nineteenth century, and would more plausibly be encountered if more diverse plains bison museum specimens were added. Indeed, the tree below -- which could be refined by pruning sequences to the same length and dropping poor quality entries -- does not show any bison sequence stepping 'out of line'. The indels alone strongly cluster them to the exclusion of steppe bison. The latter has distinct haplotype groups that may (but need not) reflect introgression of another lineage after a long separation, or simply a long separation of two populations, or data from different ages of steppe bison. These issues were addressed in the 2004 paper that collected the bulk of steppe bison data.

SteppD2.jpg

For the nuclear genome, only a single 49 aa fragment of osteocalcin (gene BGLAP), dating from a 55.6 kyr permafrost steppe bison, is available. The protein was sequenced directly by mass spectrometry so the dna sequence remains unknown. This gene has not been sequenced in either bison or yak. It is only moderately conserved within laurasiatheres. Given that BGLAP has numerous but variably implemented post-translational modifications mediated by vitamin K, it is not a good choice for MS comparative genomics. The sequence does appear accurate however -- cow, not steppe bison, has the odd tryptophan variation.

Steppe bison YLDHGLGAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV Bison priscus
NP_776674    ....W............................................ Bos taurus
NP_001035098 ...P..............R.............................. Ovis aries
NP_001157476 ..................R............................   Sus scrofa
XP_002927002 ...S.......................N...........D......... Ailuropoda melanoleuca
XP_547536    ...S.....V.................N..............Q...... Canis familiaris

Bison bone collagen isotopic values track climate fluctuations and vegetation change in Late Pleistocene and Early Holocene Northern Eurasia and North America
Richards M, Shapiro B, Ditchfield P, Cooper A
Geological Journal (in press 2011)

In summary, steppe bison mitochondrial protein sequences -- though not available as yet -- would be very helpful in establishing wildtype in bison. In particular they could help time the origin of V98A and address the significance of the other sites where contemporary bison differ from the conserved value seen in yak and other Bovidae cytochrome b, namely I4L I39M T67A V215M A246T M316I T349I L353M V372I. The predictions here for steppe bison are I39M, M316 and L353. The rest are lineage-sorted alleles that require population-level sampling.

Interpreting yak CYTB variation

CytoYak.jpg

Yaks are the closest living sister species to bison. Although 15,000 wild yaks still persist, they have been subject to very similar pressures to those experienced by bison: bottlenecks, population fragmentation, introgression from long domesticated yaks and hybridization with cattle. Adaptations specific to mitochondria may exist as yak live at altitudes exceeding 4000 meters with average annual temperatures in rearing areas of –8°C, with animals surviving winter temperatures of –40°C.

Because yaks provide the immediate outgroup for bison genetics (and vice versa), their parallel mitochondrial proteomics are investigated in depth here. This further enables reconstruction of mitochondrial proteins of their last common ancestor (after consideration of lineage sorting) and correct placement of Pleistocene genomic sequences.

Data availability for yaks was greatly improved by a Dec 2010 paper by Zhaofeng Wang et al. that investigated yak phylogeographical structure and demographic history on the Qinghai-Tibetan Plateau. Complete mitochondrial genomes were determined for 48 domesticated and 21 wild yaks. The three lineages shown in the article's supplemental material were estimated to have diverged at 420 kyr and 580 kyr, in accordance with extended but temporary allopatric migration barriers created by two large plateau glaciations.

The wild yaks are found in all three branches of the tree (solid circles in figure). Their entries at GenBank are distinguished by a W (for wild) prefix, eg isolate W77 GQ464266. There is potential for confusion here because NCBI taxonomy uses Bos grunniens mutus for wild yak, yet the subspecies concept is contradicted by the mixed distribution of wild and domestic yaks in the mitochondrial tree. Related taxa such as Bos mutus (Przewalski, 1883), Bos mutus grunniens, and Poephagus mutus also conflict with the facts. Yak and bison -- diverging at 2.5 million years -- need to reside in the same genus.

YakPhylo.jpg

The primary focus here is protein polymorphisms in wild yak because domesticated animals may exhibit inbreeding issues and other evolutionary artifacts due to their estrangement from darwinian selection. Consequently it is important to track which GenBank entries reference wild yaks.

Bos grunniens mutus has three GenBank entries relevant to cytochrome b: proteins AAX53006 and AY955226 both containing unique V195A, I348F mutations in an otherwise wildtype background and CAA76015, an older fragmentary wildtype sequence not considered further here. The first two animals add samples to the large, remote Xinjiang province but remain unpublished (Liu,Q Wu,M Li,Y) despite the 27 Mar 2005 submission date at GenBank. (A number of D-loop sequences submitted for this taxon on 19 Jan 2009 by Ma,ZJ also remain unpublished.)

The Myanmar/Bhutan mithun sequence BAJ05329 attributed to Bos grunniens at GenBank has 12 differences to wild yak but is 100% identical to 94 Bos indicus entries, ie it is a hybrid and its mitochondrial genome is irrelevant here. Such GenBank errors are all but impossible to correct.

The 21 new genome accessions of wild yak are GQ464266, GQ464265, GQ464264, GQ464263, GQ464262, GQ464261, GQ464260, GQ464259, GQ464258, GQ464257, GQ464256, GQ464255, GQ464254, GQ464253, GQ464252, GQ464251, GQ464250, GQ464249, GQ464248, GQ464247, GQ464246. These were not labelled on the published tree.

In terms of protein accessions (which will be shown at NCBI blastp output), these are ACU81659, ACU81646, ACU81633, ACU81620, ACU81607, ACU81594, ACU81581, ACU81568, ACU81555, ACU81542, ACU81529, ACU81516, ACU81503, ACU81490, ACU81477, ACU81464, ACU81451, ACU81438, ACU81425, ACU81412, ACU81399 to which AAX53006 and AY955226 can be added.
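The genome and protein accession lists above are given in the same descending order, and the protein numbers step by 13 from one genome to the next. A minimal Python sketch of that pairing follows. The step of 13 is read off the two lists rather than taken from any GenBank rule, and the function name pair_accessions is invented here for illustration. The pairing agrees with the five wild yak rows of the polymorphism table on this page (eg GQ464259 with ACU81568).

```python
# Pair wild yak genome accessions with their CYTB protein accessions.
# Assumption: the pairing follows the arithmetic pattern visible in the two
# lists (protein numbers step by 13 per genome); it is not a GenBank rule.

GENOME_FIRST, GENOME_LAST = 464246, 464266   # GQ464246 .. GQ464266
PROTEIN_FIRST = 81399                        # ACU81399 pairs with GQ464246
STEP = 13                                    # spacing seen in the protein list


def pair_accessions():
    """Return {genome accession: CYTB protein accession} for the 21 wild yaks."""
    pairs = {}
    for n in range(GENOME_FIRST, GENOME_LAST + 1):
        protein = PROTEIN_FIRST + STEP * (n - GENOME_FIRST)
        pairs[f"GQ{n}"] = f"ACU{protein}"
    return pairs


if __name__ == "__main__":
    for genome, protein in pair_accessions().items():
        print(genome, protein)
```

Such a lookup makes it easier to tell which blastp hits come from wild yaks, since the protein accessions themselves carry no W prefix.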

Of these, 16 fall in the main reference sequence group (wildtype) but 5 wild plateau yaks exhibit polymorphisms that cannot be attributed to domestication. As noted, two additional wild yaks from extreme NW China have additional double mutations but no associated PubMed publication nor tissue source indicated. As either change alone would inactivate an essential enzyme, these represent either heteroplasmic oddities or sequence error (to be pursued as other proteins are considered). The remaining sequences were derived from muscle and skin dna.

There is no overlap between the wild yak polymorphism sites and the five sites of domestic yak. Alleles occurring in full length sequences are analyzed further below.

The summary table of yak CYTB amino acid polymorphisms below arises from alignment of 5000 full-length mammalian cytochrome b orthologs. Magenta indicates a deleterious change at an invariant position, red a deleterious mutation at a naturally polymorphic site, green a possibly acceptable change but of restricted distribution and fitness, and blue a near-neutral substitution. Gray is reserved for probable sequencing error. It can be seen that the smallish yak population sampled (72 animals) already contains 5 deleterious alleles in CYTB which represents only 10% of the amino acids of the mitochondrial proteome.

In summary, out of 70 individual yaks, 10 are carrying deleterious mutations at five sites. That seems like an extraordinary number for a central enzyme in energy metabolism for which it is difficult to envision compensation by another gene. Restricting to the 21 wild yaks, 3 have deleterious polymorphisms and 1 has a marginal change. Overall 1 in 7 animals is affected just in this one gene. However CYTB is but one of 13 proteins encoded by the mitochondrial genome -- what sort of genetic burden are yaks carrying overall?

1 ACU81568 A017T       wild yak   isolate W50   GQ464259
2 ACU81399 I192T       wild yak   isolate W02   GQ464246
  ACU81633 I192T       wild yak   isolate W75   GQ464264
3 ACU81555 D214N       wild yak   isolate W40   GQ464258
4 AAX53006 V195A I348F mutus      isolate Xinjiang01 unpublished Liu,Q Wu,M Li,Y 
  AAX53007 V195A I348F mutus      isolate Xinjiang02 unpublished Liu,Q Wu,M Li,Y
5 ACU81529 V329M       wild yak   isolate W1313 GQ464256

6 ABI15999 V039I A067T domestic yak              fragment   PUBMED:17257194 Poephagus
7 ABI16000 V039I A067T domestic yak              fragment   PUBMED:17257194 Poephagus
  ACU82153 A084T       domestic yak isolate HY5
8 ACU82101 V098L       domestic yak isolate HY1
9 AAU89116 I118T       domestic yak             =SP:Q5Y4Q0  PUBMED:16942892
  ACU81711 I118T       domestic yak isolate HZ3 
  ACU81737 I118T       domestic yak isolate MQ1
  AAS93096 I118T       domestic yak              fragment   PUBMED:17257194
  AAS93099 I118T       domestic yak              fragment   PUBMED:17257194

Although the mitochondrial genome encodes proteins using the usual 20 amino acids, only a subset of chemically similar residues ever appears at a given position in a given protein -- its reduced alphabet. This subset describes the evolutionarily acceptable substitutions that do not significantly disrupt protein functionality. The reduced alphabet can be determined with greater precision the more species, and the more individual sequences per species, are available. For mitochondrial proteins, that sensitivity is 1 in 10,000 (0.01% occurrence frequency) for a given amino acid, much better than even the much-studied human nuclear genome.
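As a rough illustration of how a reduced alphabet can be read off an alignment, the Python sketch below counts residues in one column and keeps those above a frequency cutoff. The function names and the 0.1% cutoff are assumptions made for this example only. The position 98 counts are those tabulated on this page, and the three short sequences are toy input.

```python
from collections import Counter


def column_counts(sequences, position):
    """Count residues at a 1-based alignment position, ignoring gaps and X."""
    column = (seq[position - 1] for seq in sequences if len(seq) >= position)
    return Counter(res for res in column if res not in "-X")


def reduced_alphabet(counts, min_fraction=0.001):
    """Residues whose frequency at a site reaches min_fraction (assumed cutoff)."""
    total = sum(counts.values())
    return [res for res, n in counts.most_common() if n / total >= min_fraction]


# Position 98 counts as tabulated on this page for mammalian CYTB.
POSITION_98 = Counter({"V": 4522, "I": 430, "M": 34, "A": 11, "L": 1, "N": 1})

if __name__ == "__main__":
    print(reduced_alphabet(POSITION_98))               # residues at or above 0.1%
    print(column_counts(["MTNI", "MTNV", "MT-I"], 4))  # toy alignment column
```

A frequency cutoff alone cannot separate a rare adaptive residue from a deleterious one; as the following paragraph notes, the phylogenetic distribution of each residue still has to be examined.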

Interpretive certainty is never attained without experimentation (yeast is a surprisingly informative model system) but improves up to a point with more sequence data. Here it is important to check whether less common substitutions have persisted over evolutionary time in a phylogenetically coherent manner (ie a sub-clade) or are novel adaptations perhaps in conjunction with a co-evolving residue at another site (or another protein, perhaps nuclear-encoded). After these considerations, the remaining rare changes are mostly deleterious (or sequencing error) but rarely adaptive. Polymorphism significance can be pursued at the xray structural level for only 3 of the 13 mitochondrial proteins (CYTB, COX2, COX1) and even this is complicated in the case of CYTB by its oligomeric association with 3 nuclear encoded proteins.

Aligning CYTB from the 72 complete yak mitochondrial genomes available on 1 Dec 10 shows variation at just 9 sites along the protein (ie 9 nsSNPs). These are quickly found when the web alignment tool retains input sequence order, displays residues identical to the top sequence as dots, gaps fragmentary data correctly, and allows a wide display permitting effective cross-species comparisons.
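The dot display described above is simple to reproduce. The Python sketch below is one possible implementation, not the web tool itself; the function name dot_view is invented for the example. The test data are the steppe bison osteocalcin fragment shown earlier on this page and a cow sequence rebuilt from its dot row (tryptophan at position 5).

```python
def dot_view(sequences):
    """Show residues identical to the top sequence as dots, keeping input order.

    Gap characters ('-') in lower sequences are kept as gaps so that
    fragmentary data is not mistaken for identity.
    """
    top = sequences[0]
    rows = [top]
    for seq in sequences[1:]:
        row = []
        for i, res in enumerate(seq):
            if res == "-":
                row.append("-")          # missing data stays visible
            elif i < len(top) and res == top[i]:
                row.append(".")          # identical to the top sequence
            else:
                row.append(res)          # a difference worth inspecting
        rows.append("".join(row))
    return rows


STEPPE = "YLDHGLGAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV"
COW = STEPPE[:4] + "W" + STEPPE[5:]      # cow differs only at position 5

if __name__ == "__main__":
    for row in dot_view([STEPPE, COW]):
        print(row)
```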

Yak and bison -- despite being sister species -- share variation only at one site, position 98. Here yak is exclusively valine with the exception of a single deleterious occurrence (see below) of leucine, whereas bison have a mix of valine and alanine (which otherwise is very rare at this position in mammals), ie the ancestral residue was valine. Thus no lineage sorting occurred at any amino acid position in CYTB at the time these two species diverged at 2.5 myr. Lineage sorting however may be important in the overall evolution of the Bovini: 53 ancient polymorphisms (at the dna level) are said to have persisted since Bos and Bison diverged from Bubalus 5–8 million years ago.

The changes can also be displayed in context by coloring the appropriate residues in a reference sequence relative to a composite sequence consolidating all the polymorphisms from distinct animals (no one animal has more than two of the 9; V195A + I348F occurs in two animals). The composite sequence is quite useful in comparing polymorphism sites across species as explained in the annotation tricks section.

>CYTB_bosGruR Bos grunniens cytochrome b ref seq taken as gi|147744503 
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITAIAMVHLLFLHETGSNNPTGISSDADKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQLASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bosGruP Bos grunniens composite polymorphisms: A017T A084T V098L I118T I192T V195A D214N V329M I348F
MTNIRKSHPLMKIVNNTFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHTNGASMFFICLYMHLGRGLYYGSYTFLETWNIGVTLLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITATAMAHLLFLHETGSNNPTGISSNADKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLMADLLTLTWIGGQPVEHPYFIIGQLASIMYFLLILVLMPTAGTIENKLLKW
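A composite of this kind can be built, and checked, mechanically. The Python sketch below applies substitution labels to a reference and, in the other direction, recovers the labels by comparing two sequences. To keep the example short it uses only the first 140 residues of the two sequences above; the function names are invented for illustration. Comparing the two first lines yields A017T, A084T, V098L and I118T.

```python
import re

# First 140 residues of the yak CYTB reference and composite sequences above.
REFERENCE = (
    "MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVAHIC"
    "RDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF"
)
COMPOSITE = (
    "MTNIRKSHPLMKIVNNTFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVAHIC"
    "RDVNYGWIIRYMHTNGASMFFICLYMHLGRGLYYGSYTFLETWNIGVTLLLTVMATAFMGYVLPWGQMSF"
)


def apply_substitutions(reference, substitutions):
    """Build a composite by applying labels such as 'A017T' (1-based positions)."""
    residues = list(reference)
    for label in substitutions:
        old, pos, new = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label).groups()
        index = int(pos) - 1
        if residues[index] != old:
            raise ValueError(f"{label}: reference has {residues[index]} at {pos}")
        residues[index] = new
    return "".join(residues)


def list_substitutions(reference, variant):
    """Recover labels such as 'A017T' by comparing two equal-length sequences."""
    return [
        f"{r}{i:03d}{v}"
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


if __name__ == "__main__":
    print(list_substitutions(REFERENCE, COMPOSITE))
```

The check inside apply_substitutions guards against off-by-one numbering and mistyped labels, since a label whose first letter does not match the reference raises an error.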

   wild        dom       dom       dom      wild       dom      wild      wild       dom       
  A017T      A084T     V098L     I118T     I192T     V195A     D214N     V329M     I348F  
  4018S      4994A     4522V     4309I     4353L     4528V     4429D     4610V     4232I
   927A*        3T      430I      667S      505M      427I      512N      188T      651V
    46T         1P       34M       14I       94I*      25T       43E      133A       63T
     3L         1V       11A       1T        31T        4G        8S       44I       45M
     3M                   1L                  3F        4M        2Y       22M        4N
     1F                   1N                  2V        1A        1H        2G        2F
     1P                                       1A                            1E        1A                       
                                              1S        
A017Tphylo.jpg

A017T: At position 17, the mammalian reduced alphabet consists primarily of serine but with yak alanine also well represented at 18%. Threonine occurs in 46 sequences so cannot be sequence error or serious mutation. Bulk seems to be the main criterion at this site rather than polarity -- threonine though polar is a bulkier residue than serine or alanine. To determine whether it has arisen multiple times or just in one clade, the phylogenetic distribution of the 46 occurrences needs consideration.

It can be seen from the graphic at left that A017T has arisen multiple times with no common denominator (such as high elevation lifestyle) but -- with the exception of monotremes -- never in a deep stem ancestor. That is, A017T occurs here and there but only in recently speciated clades. This suggests that while not lethal, over time it gets replaced by more adaptive serine or alanine.

A017T
 4018  S
  927  A yak do not have the most common amino acid at position 17
   46  T
    3  L
    3  M
    1  F
    1  P


A084T: At position 84, alanine is strictly invariant. Thus threonine is an unmistakable deleterious mutation in domestic yak.

A084T  
 4994  A
    3  T
    1  P
    1  V

V098L: At position 98, the reduced alphabet consists of valine 90% of the time regardless of mammalian clade with the similar (branched chain aliphatic) isoleucine having substantial dispersed representation at nearly 9%. The 430 species in which it occurs are scattered incoherently within mammal clades, meaning that it has arisen independently many times. V098I may be slightly suboptimal as there is an evident bias (at some level) against equal occurrence. It likely co-exists with valine in most non-bottlenecked populations of mammals, observed if enough individuals of a given species are sequenced.

However leucine, the seemingly similar third aliphatic residue, occurs only once despite being but a single base change away from the dominant residue. Were leucine a near-neutral substitution, its incidence would be vastly higher. Thus the change V098L reported for yak represents either a deleterious mutation or an unprecedented adaptation (eg to high altitude) or sequencing error in GenBank entry ACU82101. The same can be said for the more overtly radical change V098N in lemur AAS00156. The 34 methionines occur sporadically in the phylogenetic tree suggesting they are sub-adaptive and blink out over time. Indeed, canine spongiform leukoencephalomyelopathy is attributed to V98M. Dog CYTB is 89% identical to that of yak and numbering corresponds.

V098L
 4522  V
  430  I
   34  M
   11  A bison
    1  L yak
    1  N lemur
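How many base changes separate valine from each alternative at position 98 can be checked directly from the genetic code. The Python sketch below enumerates one-base codon changes under the vertebrate mitochondrial code (NCBI translation table 2, where ATA encodes methionine) and labels each as a transition or transversion. The function name is invented for the example, and the sketch says nothing about which valine codon yak actually uses at this site.

```python
from itertools import product

# Codons for the residues seen at position 98, vertebrate mitochondrial code
# (NCBI translation table 2, where ATA encodes methionine).
CODONS = {
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "I": ["ATT", "ATC"],
    "M": ["ATA", "ATG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "N": ["AAT", "AAC"],
}
PURINES = {"A", "G"}


def single_base_routes(source, target):
    """Classify every one-base codon change from source to target residue."""
    kinds = set()
    for a, b in product(CODONS[source], CODONS[target]):
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            x, y = diffs[0]
            # purine<->purine or pyrimidine<->pyrimidine is a transition
            same_class = (x in PURINES) == (y in PURINES)
            kinds.add("transition" if same_class else "transversion")
    return sorted(kinds)


if __name__ == "__main__":
    for target in "IMALN":
        print("V ->", target, single_base_routes("V", target))
```

An empty list means no single-base route exists, as for valine to asparagine, which needs two changes.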

I118T: At position 118, the reduced alphabet consists predominantly of I, L and V with some A and M, a very common occurrence proteome-wide. T, S and F are all deleterious mutations; the threonines all occur in domestic yak.

I118T
 2597  I 
 1843  L
  404  V
   87  A
   61  M
    6  T (all yak)
    1  S
    1  F

I192T: At position 192, the dominant mammalian residue is leucine rather than the yak ancestral value isoleucine, which is also less common than methionine. Isoleucine is thus a mild polymorphism in its own right, but the associated taxonomy shows it narrowly restricted to 83 sequences in Bos and Bison and, separately, 5 in Kobus (waterbucks). That is too persistent to be dysfunctional and indeed makes it a candidate adaptation. The change to polar threonine, however, is seen in 31 nominal species but, after removal of redundancy, only in two species of pocket mice. Thus the wild yak change is deleterious.

I192T  
 4353  L
  505  M
   94  I
   31  T
    3  F
    2  V
    1  A
    1  S

V195A: This allele occurs together with I348F in two wild yaks from a remote region in NW China. Despite sequence submission, no article has appeared in the three subsequent years. It can be seen from the reduced alphabet frequencies that this is a severe mutation (as is I348F), so taken together the two are likely just sequencing error. No further analysis will be done here until such time as the polymorphisms are confirmed.

V195A  
 4528  V
  427  I
   25  T
    4  G
    4  M
    1  A

D214N: This polymorphism of wild yak is seen quite widely, in some 10% of mammals. The 223 taxa with D214N are mostly confined to laurasiatheres and glires but are not a hallmark of these clades. Nor do the species with asparagine have any common lifestyle denominator. Asparagine is an acceptable variation for aspartate at this site if perhaps not optimal.

D214N  
 4429  D
  512  N
   43  E
    8  S
    4  X
    2  Y
    1  H

V329M: This allele occurs in wild yak. Methionine is not a radical substitution in terms of physical/chemical properties and similar additional amino acids appear at low levels, even though valine occurs in a huge majority of species. Methionine occurs in 17 other phylogenetically scattered species, including Bos javanicus, Ovis, Budorcas, Naemorhedus, Mus, Rattus, bats and sloth. Thus it is likely suboptimal but not significantly deleterious.

V329M  
 4610  V
  188  T
  133  A
   44  I
   22  M
    2  G
    1  E

I348F: This allele occurs together with V195A in two wild yaks from a remote region in NW China. Despite sequence submission, no article has appeared in the three subsequent years. It can be seen from the reduced alphabet frequencies that this is a severe mutation but more likely sequencing error, as is V195A.

I348F  
 4232  I
  651  V
   63  T
   45  M
    4  N
    2  F
    1  A

Human CYTB polymorphism and disease

Polymorphisms and pathogenic disease mutations for human CYTB have been very helpfully compiled by mtDB and MitoMap, with other mammals at OMIA. Fortunately, numbering systems carry over without change to bison and yak since no indels occur in this gene within mammals.

A poor tradition in mitochondrial research allows amino acid changes to be described just by a single nucleotide coordinate relative to the Cambridge Reference Sequence, NC_012920. That requires the user to have a numbered translation via the mitochondrial genetic code showing in-frame amino acids; however the change from the protein perspective (eg V98T) is often conveniently displayed at Uniprot. Coordinates for all mitochondrial features are tabulated here; CYTB extends from position 14747-15887.
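
Converting a Cambridge Reference Sequence nucleotide coordinate into a CYTB residue number is simple arithmetic once the gene start is known. The sketch below assumes only the CYTB span quoted above (14747-15887) and an uninterrupted reading frame beginning at the first nucleotide; the function name is hypothetical. The three coordinates tried are ones quoted in abstracts further down this page.

```python
CYTB_START = 14747   # first nucleotide of CYTB (from the text)
CYTB_END = 15887     # last nucleotide of CYTB (from the text)

def cytb_residue(nt_position):
    """Map a nucleotide coordinate to (residue number, position within codon).

    Both returned values are 1-based. Raises ValueError outside CYTB.
    """
    if not CYTB_START <= nt_position <= CYTB_END:
        raise ValueError(f"{nt_position} lies outside CYTB")
    offset = nt_position - CYTB_START
    return offset // 3 + 1, offset % 3 + 1

for coordinate in (15257, 15812, 15498):
    residue, codon_pos = cytb_residue(coordinate)
    print(coordinate, "-> residue", residue, "codon position", codon_pos)
```

The results agree with the protein-level names used on this page for those coordinates (D171N, V356M and the glycine 251 substitution).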

Given over 7000 complete human mitochondrial genomes and a high mutation rate, some human polymorphic sites will inevitably overlap with yak and bison alleles. Thus any information about associated human disease at the 16 known disease sites might be transferable. However many rare and obviously dysfunctional alleles were collected for population haplotype mapping and no disease information was collected.

Annotation transfer is vastly complicated by heteroplasmy, experimentalist inability to establish the heritability of the allele, and differences in tissues used to obtain dna for sequencing. It also neglects the possibly compensatory effect of changes elsewhere in this or another gene (eg S152P and G291D are suppressed by compensatory hinge region substitutions in nuclear-encoded rieske protein), a substantial issue in a protein like cytochrome b where 10% of the residues between bovids and human are non-identical and so many proteins participate in Complex III. At such sites (eg H214Y human, D214N yak), transfer of phenotypic information is dubious.

Brown in the human allele table below indicates human polymorphisms corresponding to an allele of concern in yak or bison. In two significant cases -- both in domestic yak -- the initial and final residues in human are identical to those of yak, namely A084T* and I118T*. Both are predicted to be deleterious in both human and yak. Unfortunately no clinical information was collected on the human side and the health status of the yaks is unknown (eg level of exercise intolerance).

However even A084T (a strongly invariant site in all mammals) was evidently not early-lethal for its adult human carrier (dna samples are collected from adult volunteers whose health status is not recorded). Here the vast and still unsettled complexities of mitochondrial genomics may come into play:

  • a single mitochondrion may carry up to 10 replicated copies of its genome which need not be identical
  • cells can carry thousands of mitochondria inherited erratically during embryogenesis and later stem cells
  • dna samples, not being collected from germline cells, may represent non-heritable somatic mutations in restricted descendent cells of the tissue sampled
  • disease onset is often in late adulthood due to the nature of mitochondrial replication and dispersal to daughter cells and so may not be applicable to shorter-lived species

"Symptoms of severe heteroplasmic mitochondrial disorders frequently do not appear until adulthood because many cell divisions and much time is required for a cell to receive enough mitochondria containing the mutant alleles to cause symptoms. An example of this phenomenon is Leber optic atrophy (LHON). Affected individuals may not experience vision difficulties until they have reached adulthood. Another example is MERRF syndrome (Myoclonic Epilepsy with Ragged Red Fibers). Heteroplasmy here explains the variation in severity of the disease among siblings. The incidence of heteroplasmy in human mtDNA is unknown, as the number of individuals who have been subjected to mtDNA testing for reasons other than the diagnosis of mitochondrial disorders is small."

The oft-observed disease Leber Hereditary Optic Neuropathy (LHON) is genetically heterogeneous, arising from mutations in other mitochondrial genes (R340H in ND4, A52T in ND1 and M64V in ND6, subunits of complex I of the oxidative phosphorylation chain in mitochondria) as well as from CYTB variants A29T and secondarily D171N and V356M.

tRNA disruptions in bison were analyzed by Douglas et al. Here it is known the human disease MERRF disrupts mitochondrial tRNA-Lys in 80% of cases and so biosynthesis of mitochondrial proteins essential for oxidative phosphorylation. It too is genetically heterogeneous as tRNAs for leucine, histidine, serine and phenylalanine can be affected in other individuals.

human yak

A084T A084T* seen twice in Japanese population
I098V V098L
I118V I118T
I118T I118T* seen once in Japan and once in India
H214Y D214N
A329T V329M

T2A	S56A	I117V	D171N	I211T	G251S	M316T	A354T
T2I	S56L	I118V	D171G	T212A	E251D	Y325H	V356M
M4V	T61A	I118T	S172N	T212I	Y256H	A329T	V356A
M4T	T70A	L121F	P173S	H214Y	T257I	A330T	T360A
R5G	Y75C	A122T	T174A	T219A	L258P	A330V	T360M
I7T	I78V	T123A	F181L	T219I	A259T	I334V	T368A
N8S	I78T	A125T	I184V	I226V	N260D	T336A	T368I
N15S	L82F	E136D	L185S	A229T	V284I	I338V	I369V
H16R	A84T	F140L	I189V	L230F	V291A	P342S	I369T
F18L	G86S	L149M	A190T	L233V	S297P	V343M	I372V
I19M	C93Y	I153T	A190V	F235L	I300T	V343A	M376V
A29T	I98V	Y155H	A191T	L236I	I304T	S344G	A380T
A39T	G101S	I156V	A191D	S238P	I306V	S344N	A380V
A39V	Y109H	I156T	A193T	S238F	I306T	Y345F	----
I42V	E111K	T158A	T194A	T241A	M309V	T348I	----
I42T	T112A	D159N	T194V	T241M	M309T	I349V	----
F50L	W113R	I164V	F199L	T243A	S310P	I349T	----
F50L	I115T	G167S	I211V	F245L	M316V	V353M	----

Of known disease mutations, only V98M corresponds to a bison allele. Disease alleles in blue have been thoroughly studied in yeast.
A29T  LHON Leber hereditary optic neuropathy
G34S  mitochondrial myopathy; sporadic	
S35P  exercise intolerance
V98M  dog leukoencephalomyelopathy
V98L  human polymorphism with unknown consequences	
S151P exercise intolerance	
G166E hypertrophic cardiomyopathy
D171N secondary LHON
G231D 16026996 mouse	
G251D CMIH
G251S obesity
N255H cardiomyopathy
Y278C multisystem disorder
G290D exercise intolerance
S297P neonatal polyvisceral failure 
G339E mitochondrial myopathy	
V356M secondary LHON
  • Adaptive rates of evolution in all 13 genes from an alignment of 214 mammalian mitochondrial genomes

Cytochrome b mutations in Leber hereditary optic neuropathy CYTB:D171N CYTB:V356M ND5:A458T New mutations were discovered in the apocytochrome b gene in Leber hereditary optic neuropathy probands who did not harbor either of the two known Complex I mutations (positions 3,460 and 11,778). A mutation at position 15,257 was found in eight independent probands which changed a highly conserved D to N, was not found in controls, and appears to be pathogenetically significant. The 15,257 mutation occurred in association with a known synergistic mutation at position 13,708 in 7/8 probands (ie ND5 A458T) and in association with a new apocytochrome b mutation at position 15,812 (ie V356M) in 4/8 probands. Mutations in Complex III genes may be involved in Leber hereditary optic neuropathy and multiple, simultaneous mutations occur frequently.

Mazunin IO (2010) Mitochondrial genome and human mitochondrial diseases. Molecular Biology 44(5) Today there are described more than 400 point mutations and more than hundred of structural rearrangements of mitochondrial DNA associated with characteristic neuromuscular and other mitochondrial syndromes, from lethal in the neonatal period of life to the disease with late onset. The defects of oxidative phosphorylation are the main reasons of mitochondrial disease development. Phenotypic diversity and phenomenon of heteroplasmy are the hallmark of mitochondrial human diseases. It is necessary to assess the amount of mutant mtDNA accurately, since the level of heteroplasmy largely determines the phenotypic manifestation. In spite of tremendous progress in mitochondrial biology since the cause-and-effect relations between mtDNA mutation and the human diseases was established over 20 years ago, there is still no cure for mitochondrial diseases.

Pathogenic mitochondrial DNA mutations in protein-coding genes

Lee-Jun C. Wong PhD Muscle Nerve, 2007

More than 200 disease-related mitochondrial DNA (mtDNA) point mutations have been reported in the Mitomap (http://www.mitomap.org) database. These mutations can be divided into two groups: mutations affecting mitochondrial protein synthesis, including mutations in tRNA and rRNA genes; and mutations in protein-encoding genes (mRNAs). This review focuses on mutations in mitochondrial genes that encode proteins. These mutations are involved in a broad spectrum of human diseases, including a variety of multisystem disorders as well as more tissue-specific diseases such as isolated myopathy and Leber hereditary optic neuropathy (LHON). Because the mitochondrial genome contains a large number of apparently neutral polymorphisms that have little pathogenic significance, along with secondary homoplasmic mutations that do not have primary disease-causing effect, the pathogenic role of all newly discovered mutations must be rigorously established. A scoring system has been applied to evaluate the pathogenicity of the mutations in mtDNA protein-encoding genes and to review the predominant clinical features and the molecular characteristics of mutations in each mtDNA-encoded respiratory chain complex.

S297P homoplasmic in all tissues tested, undetectable in mother PMID: 19563916

Eur J Hum Genet. 2004 Mar;12(3):220-4.

The deleterious G15498A mutation in mitochondrial DNA-encoded cytochrome b may remain clinically silent in homoplasmic carriers.

We report on a patient with severe growth retardation and IgF1 deficiency, in which a mitochondrial abnormality was suspected. An isolated mitochondrial respiratory chain complex III deficiency was found in blood lymphocytes and skin fibroblasts. Sequence analysis of the cytochrome b, which is the only mitochondrial DNA-encoded subunit of complex III, revealed a homoplasmic G15498A mutation, resulting in the substitution of a highly conserved amino acid (glycine 251 into an aspartic acid). The mutation was found to be homoplasmic in all tissues examined from the mother and her brother (lymphocytes, fibroblasts, hair roots and buccal cells). Complex III deficiency was also demonstrated in these cells. Nevertheless, the mother and the brother were asymptomatic. This mutation had been considered as a cardiomyopathy-generating mutation in a previously reported case, and its pathogenicity has been demonstrated recently in yeast. However, it seems not to fulfil the classical criteria for pathogenicity of a mitochondrial DNA mutation, especially the heteroplasmic status, and to be clinically silent, albeit present, in nonaffected relatives. We suggest that other factors are contributing to the clinical variability expression of the G15498A mtDNA mutation.

Mitochondrial DNA mutations cause disease in >1 in 5000 of the population and approximately 1 in 200 of the population are asymptomatic carriers of a pathogenic mtDNA mutation. Many patients with these pathogenic mtDNA mutations present with a progressive, disabling neurological syndrome that leads to major disability and premature death. There is currently no effective treatment for mitochondrial disorders, placing great emphasis on preventing the transmission of these diseases. An empiric approach can be used to guide genetic counseling for common mtDNA mutations, but many families transmit rare or unique molecular defects. There is therefore a pressing need to develop techniques to prevent transmission based on a solid understanding of the biological mechanisms. Several recent studies have cast new light on the genetics and cell biology of mtDNA inheritance, but these studies have also raised new controversies.

Nuclear proteins that raise mitochondrial mutation rates

The genetic stability of mtDNA in every mammal (indeed every eukaryote) depends critically on the accuracy of dna replication. The consequences of any mutation in this machinery would be greatly amplified (like the broomsticks in the Sorcerer's Apprentice) by subsequent somatic errors created in replicating mitochondrial genomes. It is essential to consider these genes given the apparent elevated rate of mitochondrial polymorphism reported for bison and yak.

The nuclear encoded, mitochondrially functioning dna polymerase POLG on chr 15, the catalytic subunit (dna polymerase itself, 3’-5’ exonuclease for proofreading, 5’deoxyribosephosphate lyase for base excision repair), deserves special mention in regards to the extraordinary observed rates of yak and bison coding polymorphisms. Some 90 distinct human disease alleles are known along the 1239 residue protein, causing progressive external ophthalmoplegia, sensory and ataxic neuropathy, Alpers syndrome, and male infertility (see PEOA1, SANDO, AHS, MNGIE at OMIM). POLG also contains a polyglutamine tract near its N-terminus of length 13 in human that may be subject to polymorphic replication slippage.

POLG is accompanied by an accessory dimer of POLG2, which is now receiving considerable attention: two mitochondrial disease alleles have been found, G416A and G451E (causing adPEO). The helicase PEO1 (twinkle), implicated in adult-onset progressive external ophthalmoplegia (PEO), and the topoisomerase TOP1MT are other nuclear encoded proteins critical to mitochondrial dna replication. The latter binds a specific site in the D loop control region. These too have been implicated in rare mitochondrial diseases.

These enzymes, especially POLG, need extensive sequencing in bison and yak (indeed every once-bottlenecked endangered species). That might be done economically on a population scale with whole-exome chips rather than sequencing whole genomes. The POLG gene itself is difficult to study in isolation, being comprised of 23 exons spread out over 18490 bp.

No sequencing of yak or bison POLG has been done yet but that of cow, sheep and pig etc are readily retrieved from their respective genome projects. The Bos taurus POLG protein is 90% identical to human; it has not been specifically studied.

Numts: excluding mitochondrial pseudogenes

Mitochondrial research has been plagued by numt pseudogene alleles mistakenly obtained from the nuclear genome by primer cross-over. Here rna transcribed from mitochondrial genes somehow exits the mitochondria, enters the cell nucleus, gets reverse-transcribed into dna, and then gets heritably integrated into the nuclear junk genome (where it generally is not transcribed and rapidly accrues the mutation pattern of a pseudogene), sometimes becoming fixed across the entire population and even diagnostic of it.

This seemingly implausible sequence of events is surprisingly common. Counts for any species with assembled genome can quickly be conducted by Blat at the UCSC genome browser, though very old events would be missed. Querying cow genome for CYTB nuclear pseudogenes shows 19 nuclear genome matches to cytochrome b, ranging from quite strong to barely significant.

The best match occurs on cow chr28:34924178-34924995. Not quite full length 3', it contains 7 internal stop codons, 52 additional missense mutations (that characteristically do not follow site conservation patterns), and various indels and frameshifts. Not particularly recent, its date of formation could be bracketed by examining sheep and pig genomes for an orthologous numt at syntenic location (+psCYTB +SFTPD (or CGN1, bovine conglutinin)).

It's not clear pig contains the orthologous pseudogene at chr14:34281258-34281963 because this feature is not immediately syntenic to porcine SFTPD at chr14:85511174-85522800. If so, the most recent CYTB pseudogene in cow predates the divergence of cow and pig. It then will be found in both yak and bison genomes unless lost through large-scale deletion. Note pig has a much more recent CYTB at chr2:104178282-104179415.

The sheep genome is not currently in a satisfactory state of assembly, though it is far more likely than pig to contain a demonstrably syntenic CYTB pseudogene. No sheep pseudogenes are posted at GenBank nr, nor are any locatable by tblastn against the wgs or htgs databases. Although 31 CYTB pseudogenes from 11 pecoran species are available, these species all lack genome projects. However upon blastn of the cow chr28 feature, Kobus kob (AF052940) and Capra hircus (GU120393) have very strong matches.

Recent numt pseudogenes can capture ancestral values that prevailed in the mitochondria at the time of formation. Unlike bone fossils, this dna has steadily accrued changes up to the present, but the benefit is that nuclear pseudogenes evolve up to 12 times slower than the mitochondrial parental gene. Even so, a pseudogene might represent an atypical heteroplasmic allele existing at that time, be affected by lineage sorting, or reflect a parallel nuclear mutation and so not really settle the issue of ancestral value. A joint tree (1, 2) containing both mitochondrial CYTBs and nuclear pseudogenes (as outgroups) considered in chamois (Rupicapra) has many complexities because genes evolve so differently in the two compartments (for example the pseudogene might have arisen from a heteroplasmic variant that existed at the time).

The yak study specifically considered whether numts could explain divergent, low-frequency mtDNA haplotypes, but ruled out all but the very most recent on the basis of the separate confirmatory D-loop haplotype phylogeny and great similarity to other haplotypes without unusual sequence features.

Recommendations for bison conservation genomics

The deleterious mutation V98A in the mitochondrial gene CYTB has reached high frequencies in North American bison herds, very likely causing mitochondrial disease. Affected bison can be predicted to experience significant impairment of oxidative phosphorylation and exhibit significant exercise intolerance relative to healthy bison, based on known phenotypes in dogs and human with similar substitutions. Thus the alanine genotype may well be maladaptive with respect to predators, cold winters, and competition with other bison for forage and breeding.

The mutation has nothing to do with whether the herd history includes hybridization with domestic cattle as it occurred in maternally inherited bison mitochondrial dna. It has spread heritably to many animals so has nothing to do today with heteroplasmy.

How is it possible that a bad polymorphism can spread so widely? It has to do with a severely bottlenecked nineteenth century population and a greatly diminished role for natural selection ever since (ie disruptive management practises such as random culling and trophy hunting).

The V98A bison should be considered for priority culling in public and private herds where haplotypes are known in consideration of the overall genetic picture. The mitochondrial mutational situation will eventually be supplemented by that of the nuclear genome. If human variation approximates that of bison, we can expect a bison genome to have 275 coding genes with a dysfunctional allele and another 75 with troubling amino acid substitutions. While this is perhaps a natural level of genetic burden, certain deleterious changes may have attained unnatural frequencies relative to ancestral bison populations and thus be targets for reduction:

1000 Genomes Project Oct 2010: "On average, each individual nuclear genome carries 275 loss-of-function variants in annotated genes and 75 variants previously implicated in inherited disease [both classes typically heterozygous]. We estimated that an individual typically differs from the reference human genome sequence at 10,488 non-synonymous sites [out of 9,000,000 proteomewide, 0.12%]. Each individual has 200 in-frame indels, 90 premature stop codons, 45 splice-site-disrupting variants and 235 deletions that shift reading frame." [small edits made]

Bison also carry various deleterious private alleles. Here it is not possible, without sequencing many more mitochondrial genomes, to tell whether these simply reflect heteroplasmy in tissues sampled (typically leucocytes, skin or muscle) and are not currently heritable via oocytes. While these mutations adversely affect individual animals, the haplotype frequency appears low. Again it may be feasible to preferentially cull these animals in well-characterized private herds so that the haplotypes do not attain higher frequencies.

In free-ranging bison herds such as Yellowstone where it is impractical to track individual animal genotypes, natural selection must replace random culls. The discussion of whether the Yellowstone herd size of 3900 suffices to maintain genetic diversity over time overlooks a key point -- we don't want to maintain the genetic diversity as it stands. Certain haplotypes are bad and way too common. Even genetically pure bison have been adversely affected by past human actions.

Random culls will never winnow out V98A or other deleterious phenotypes, merely hold haplotype frequencies constant while interfering with beneficial losses that would have come from natural selection. In effect, no management may be the best management -- just as Beaver taught Salmon how to jump, wolves and winter taught bison how to keep aerobically fit.

Recommendations for yak conservation genomics

When yak and bison mitochondrial genomes are sequenced and a polymorphism is reported to GenBank, what does that mean? Presumably it reflects an overwhelmingly dominant value of whatever heteroplasmy existed in the tissue sample used to sequence the dna.

The key bison study used white blood cells as dna source, rather than the muscle/skin used for the yak data. One might imagine this fraction of whole blood is quite heterogeneous in terms of stem cell origin -- five different, diverse leukocyte types exist -- but in fact these all derive from a single hematopoietic stem cell type of bone marrow. Consequently, no other cell types were sampled. Thus we do not know whether the observed polymorphisms are heritable, apart from those observed in multiple animals.

For yak, the observed polymorphisms are again not necessarily heritable even for female individuals (male mitochondria are not passed on). However in the case of yak polymorphisms I118T (domestic) and I192T (wild), multiple individuals (5, 2 respectively) sampled carried the same rare change, strongly implying (unless these are mutational hotspots) that these are entrenched in the germline and so inherited. Oocyte heteroplasmy however is also heritable so wildtype may still persist. The other polymorphisms may be mere somatic mutations that attained abundance in the sampled tissue but are still complemented by residual wildtype. This would have to be pursued in additional tissues or more definitively by sequencing offspring, perhaps not feasible in wild yak.

In summary, even deleterious polymorphisms may have limited effects, depending on stem cell origin and compensation by the wildtype component of heteroplasmy. On the other hand, should a bad allele exert a negative dominant effect even as the minority allele in the mitochondria in which it resides (eg tainting oligomeric proteins), it would still have a deleterious phenotype even though it never comes to 100% frequency in any particular cell type. Somatic mutations in bison and yak may have limited impacts if onset of disease is delayed to late adulthood as in human. For conservation genomics, we are primarily concerned with heritable mitochondrial mutations, though enhanced levels of somatic mutations (due say to a faulty POLG dna polymerase) are also a concern.

In domestic yak, animals bearing I118T should not be encouraged to reproduce. To be on the safe side, higher frequencies of the other deleterious alleles are also undesirable, even though not quite proven to be heritable.

In wild yak, I192T is the primary cause of concern. It should be avoided if captive breeding comes into play. A017T, D214N, and V329M are not deleterious mutations but rather natural and possibly adaptive parts of yak diversity whose continuation should be encouraged.

These preliminary recommendations are based solely on CYTB. Since only rare recombination occurs in mitochondria (that could bring good alleles on different genes together) and no paternal contribution can dilute out undesirable heteroplasmy, it is unclear how these recommendations can be implemented, much less reconciled with those emerging from independent considerations of the other 12 mitochondrial genes.

Methods: bioinformatic tips and tricks

New sequencing technologies have greatly affected the amount of mammalian mitochondrial genomic data available at GenBank. Five years ago, it was acceptable to publish population-level D loop sequences accompanied by a few fragmentary coding reads; today, a publication might offer 60-70 entire mitochondrial genomes. This favors evolutionary study of mitochondrial proteins over comparative genomics of nuclear genome products because the latter is still restricted to around 50 species (Dec 2010) almost all incompletely sequenced.

Many long-standing issues such as introgression, historic bottlenecks, population mixing, accrual of deleterious coding variants, hard polytomies, and lineage sorting during speciation can now be approached and resolved, especially with the increasing sequencing of end-Pleistocene frozen dna. This may allow more enlightened management of endangered species such as bison where populations reached rock bottom -- recovering numbers is not enough if genomic integrity is still at risk.

However, the flood of data raises significant issues in extraction of significant information: it is not instructive to align the tens of thousands of sequences available for each of 13 mitochondrial proteins -- that gives an intractable array of 3789 amino acids by 12500 sequences, enough to fill 20 x 100 = 2000 screens on the largest possible computer monitor. That data must be distilled down somehow to take-away information.
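
The screen count can be reproduced with a few lines of arithmetic. The alignment dimensions come from the text; the per-screen capacity of 190 residue columns by 125 sequence rows is an assumption chosen here because it gives the stated 20 x 100 tiling, and is not a figure from the source.

```python
import math

RESIDUES = 3789        # alignment width, from the text
SEQUENCES = 12500      # alignment depth, from the text
COLS_PER_SCREEN = 190  # assumed residues visible across one screen
ROWS_PER_SCREEN = 125  # assumed sequences visible down one screen

def screens_needed(residues, sequences, cols, rows):
    """Number of screens across, down, and in total to tile the alignment."""
    across = math.ceil(residues / cols)
    down = math.ceil(sequences / rows)
    return across, down, across * down

across, down, total = screens_needed(RESIDUES, SEQUENCES,
                                     COLS_PER_SCREEN, ROWS_PER_SCREEN)
print(f"{RESIDUES * SEQUENCES:,} cells -> {across} x {down} = {total} screens")
```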

This section explains a practical desktop protocol for extracting the 'reduced phylogenetic alphabet' at each residue of the mitochondrial proteome. The method depends heavily on current capabilities of Blastp at NCBI and so may not be completely stable to changes made there over time.

First note that tBlastn cannot be used against the nr or wgs nucleotide databases at NCBI (or with Blat at UCSC) since the significantly different genetic code of mammalian mitochondria is no longer supported as a parameter option. Other oddities involve missing terminal nucleotides that are added before translation. However mitochondrial dna is usually translated sensibly at GenBank protein entries.

The vertebrate mitochondrial code:

TTT F Phe      TCT S Ser      TAT Y Tyr      TGT C Cys  
TTC F Phe      TCC S Ser      TAC Y Tyr      TGC C Cys  
TTA L Leu      TCA S Ser      TAA * Ter      TGA W Trp  
TTG L Leu      TCG S Ser      TAG * Ter      TGG W Trp  

CTT L Leu      CCT P Pro      CAT H His      CGT R Arg  
CTC L Leu      CCC P Pro      CAC H His      CGC R Arg  
CTA L Leu      CCA P Pro      CAA Q Gln      CGA R Arg  
CTG L Leu      CCG P Pro      CAG Q Gln      CGG R Arg  

ATT I Ile i    ACT T Thr      AAT N Asn      AGT S Ser  
ATC I Ile i    ACC T Thr      AAC N Asn      AGC S Ser  
ATA M Met i    ACA T Thr      AAA K Lys      AGA * Ter  Bos can use ATA as initiation codon
ATG M Met i    ACG T Thr      AAG K Lys      AGG * Ter  

GTT V Val      GCT A Ala      GAT D Asp      GGT G Gly  
GTC V Val      GCC A Ala      GAC D Asp      GGC G Gly  
GTA V Val      GCA A Ala      GAA E Glu      GGA G Gly  
GTG V Val i    GCG A Ala      GAG E Glu      GGG G Gly  

    AAs  = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
  Start  = --------------------------------MMMM---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
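
Since the mitochondrial code is not always available as a tool option, translation can be done locally. The following is a minimal Python sketch that builds a lookup table from the five compact lines above and translates an in-frame nucleotide string; the function name and the test sequences are illustrative, and treating any listed start codon in first position as methionine is a simplifying assumption.

```python
AAS   = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
START = "--------------------------------MMMM---------------M------------"
BASE1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

# codon -> amino acid, and the set of codons flagged as possible starts
CODE = {b1 + b2 + b3: aa for b1, b2, b3, aa in zip(BASE1, BASE2, BASE3, AAS)}
STARTS = {b1 + b2 + b3
          for b1, b2, b3, flag in zip(BASE1, BASE2, BASE3, START)
          if flag == "M"}

def translate_mito(dna):
    """Translate an in-frame sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        aa = CODE[codon]
        if i == 0 and codon in STARTS:
            aa = "M"              # alternative initiation codons read as Met
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(sorted(STARTS))
print(translate_mito("ATGTGAAGA"))   # TGA reads as Trp, AGA as stop
```

A trailing incomplete codon (such as a stop completed only after transcription) is simply ignored by this sketch.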

Blastp output at NCBI now has a very useful feature: clustering of identical individual sequences into single alignments, display of multiplicities, with all the accessions visible with an extra click. The only exception involves double-counting of UniProt entries which, since that institution conducts no sequencing, always arise from another entry. These entries are poorly formatted and inconsistent in many ways with GenBank protocols but very well done at UniProt itself. This does not affect overall analysis because individual alleles are scrutinized at the end of the procedure.

However the multiplicities are not retained in the output format needed, query-based with dots for identities. This results in a single representative accession for multiplexed matches. However upon pasting the accessions containing a given allele into GenBank taxonomy, species redundancy is discarded, yielding a count of the number of distinct species with that allele as well as a measure of overall multiplicity.
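
The same bookkeeping can be done offline once accessions have been paired with species names. The sketch below is only an illustration: the accessions and species labels are invented placeholders, the function name is hypothetical, and redundancy is detected by repeated accession alone (it does not recognize a UniProt entry and its parent GenBank entry as the same sequence).

```python
from collections import Counter

def species_summary(hits):
    """Collapse (accession, species) pairs into per-species multiplicities."""
    seen_accessions = set()
    per_species = Counter()
    for accession, species in hits:
        if accession in seen_accessions:   # same entry listed twice
            continue
        seen_accessions.add(accession)
        per_species[species] += 1
    return per_species

# Placeholder accessions and species labels, invented for illustration only
example_hits = [
    ("ACC0001", "species A"),
    ("ACC0002", "species A"),
    ("ACC0002", "species A"),   # duplicate row
    ("ACC0003", "species B"),
]

summary = species_summary(example_hits)
print(len(summary), "distinct species;",
      sum(summary.values()), "distinct entries")
```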

After collecting high resolution amino acid frequencies at a given site, it is necessary to determine the phylogenetic distribution of each variant (in practice just those of moderate occurrence). That is now very convenient to do provided the associated accessions have been saved:

Simply paste the blastp match list of protein accessions having the chosen amino acid variant into the Entrez text query box. Never mind if it only returns 20 out of your 157 input sequences -- it hasn't forgotten. It doesn't matter if the list has redundant entries (typically SwissProt and the protein giving rise to the SwissProt entry). After retrieval, set the "Find Related Data" to "Taxonomy" and wait for the options to load, then click "Find Items".

Miraculously, this returns a page that can be set to display a text phylogenetic tree of your input sequences, the full set entered with all redundancy removed. That text tree has labelled higher taxonomic nodes and individual species deeper down. Final edits can be made quickly that capture the phylogenetic spread of the variant allele for interpretive purposes.

The two most common outcomes:

  • all the species carrying the variant comprise a monophyletic clade. If the origin of the clade is fairly ancient, then the variation is a derived informative adaptive change relative to ancestral (synapomorphy). If the site is invariant in all members of the co-clade (meaning the ancestral state has persisted to all other extant species), then the site is a phyloSNP (definition and examples: 1 2 3 4).
  • species carrying the variation are scattered incoherently across the mammalian phylogenetic tree. This means that the variation has arisen multiple times (all fairly recently) but has not persisted when it arose earlier, i.e. it is not a preferred allele for this protein at this site and gets replaced.
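The distinction between the two outcomes can be sketched in Python. This is a rough illustration, not the procedure used on this page: it assumes each surveyed species comes with a root-first list of taxonomic ranks (as might be read off the text tree), uses taxonomy as a stand-in for phylogeny, and the names `classify_variant`, `lineage_of` and the carrier sets are hypothetical.

```python
def common_prefix(lineages):
    """Longest run of ranks, root first, shared by all the given lineages."""
    prefix = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return prefix

def classify_variant(lineage_of, carriers):
    """Return 'monophyletic' or 'not monophyletic' for a non-empty carrier set.

    lineage_of: {species: [rank, ..., species]} for every species surveyed
    carriers:   set of species that carry the variant
    """
    clade = common_prefix([lineage_of[s] for s in carriers])
    depth = len(clade)
    # All surveyed species inside the smallest taxon that holds every carrier.
    inside = {s for s, lin in lineage_of.items() if lin[:depth] == clade}
    return "monophyletic" if inside == set(carriers) else "not monophyletic"

# Toy lineages; the carrier sets below are invented for illustration.
lineage_of = {
    "Bison bison": ["Bovinae", "Bison", "Bison bison"],
    "Bison bonasus": ["Bovinae", "Bison", "Bison bonasus"],
    "Bos taurus": ["Bovinae", "Bos", "Bos taurus"],
    "Bubalus bubalis": ["Bovinae", "Bubalus", "Bubalus bubalis"],
}
print(classify_variant(lineage_of, {"Bison bison", "Bison bonasus"}))
print(classify_variant(lineage_of, {"Bison bison", "Bubalus bubalis"}))
```

A result of "not monophyletic" covers the scattered case; deciding whether a monophyletic variant is an ancient synapomorphy or a phyloSNP still requires the judgement described in the bullets.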

Below is the magic spreadsheet formula that correctly strips NCBI blastp output into individual columns: use exactly as tabbed in Excel, iWork, Apple Numbers etc., with blast output occupying the first column beginning at row 8:

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	36	37	38	39	40	41	42	43	44	45	46	47	48	49	50	51	52	53	54	55	56	57	58	59	60
	61	62	63	64	65	66	67	68	69	70	71	72	73	74	75	76	77	78	79	80	81	82	83	84	85	86	87	88	89	90	91	92	93	94	95	96	97	98	99	100	101	102	103	104	105	106	107	108	109	110	111	112	113	114	115	116	117	118	119	120
	121	122	123	124	125	126	127	128	129	130	131	132	133	134	135	136	137	138	139	140	141	142	143	144	145	146	147	148	149	150	151	152	153	154	155	156	157	158	159	160	161	162	163	164	165	166	167	168	169	170	171	172	173	174	175	176	177	178	179	180
	181	182	183	184	185	186	187	188	189	190	191	192	193	194	195	196	197	198	199	200	201	202	203	204	205	206	207	208	209	210	211	212	213	214	215	216	217	218	219	220	221	222	223	224	225	226	227	228	229	230	231	232	233	234	235	236	237	238	239	240
	241	242	243	244	245	246	247	248	249	250	251	252	253	254	255	256	257	258	259	260	261	262	263	264	265	266	267	268	269	270	271	272	273	274	275	276	277	278	279	280	281	282	283	284	285	286	287	288	289	290	291	292	293	294	295	296	297	298	299	300
	301	302	303	304	305	306	307	308	309	310	311	312	313	314	315	316	317	318	319	320	321	322	323	324	325	326	327	328	329	330	331	332	333	334	335	336	337	338	339	340	341	342	343	344	345	346	347	348	349	350	351	352	353	354	355	356	357	358	359	360
	361	362	363	364	365	366	367	368	369	370	371	372	373	374	375	376	377	378	379	380	381	382	383	384	385	386	387	388	389	390	391	392	393	394	395	396	397	398	399	400	401	402	403	404	405	406	407	408	409	410	411	412	413	414	415	416	417	418	419	420
=mid(a8,1,14)	=mid(a8,20,1)	=mid(a8,21,1)	=mid(a8,22,1)	=mid(a8,23,1)	=mid(a8,24,1)	=mid(a8,25,1)	=mid(a8,26,1)	=mid(a8,27,1)	=mid(a8,28,1)	=mid(a8,29,1)	=mid(a8,30,1)	=mid(a8,31,1)	=mid(a8,32,1)	=mid(a8,33,1)	=mid(a8,34,1)	=mid(a8,35,1)	=mid(a8,36,1)	=mid(a8,37,1)	=mid(a8,38,1)	=mid(a8,39,1)	=mid(a8,40,1)	=mid(a8,41,1)	=mid(a8,42,1)	=mid(a8,43,1)	=mid(a8,44,1)	=mid(a8,45,1)	=mid(a8,46,1)	=mid(a8,47,1)	=mid(a8,48,1)	=mid(a8,49,1)	=mid(a8,50,1)	=mid(a8,51,1)	=mid(a8,52,1)	=mid(a8,53,1)	=mid(a8,54,1)	=mid(a8,55,1)	=mid(a8,56,1)	=mid(a8,57,1)	=mid(a8,58,1)	=mid(a8,59,1)	=mid(a8,60,1)	=mid(a8,61,1)	=mid(a8,62,1)	=mid(a8,63,1)	=mid(a8,64,1)	=mid(a8,65,1)	=mid(a8,66,1)	=mid(a8,67,1)	=mid(a8,68,1)	=mid(a8,69,1)	=mid(a8,70,1)	=mid(a8,71,1)	=mid(a8,72,1)	=mid(a8,73,1)	=mid(a8,74,1)	=mid(a8,75,1)	=mid(a8,76,1)	=mid(a8,77,1)	=mid(a8,78,1)	=mid(a8,79,1)	=mid(a8,80,1)
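For readers working outside a spreadsheet, the same split can be sketched in Python. This assumes the fixed-width layout implied by the formula row above (a 14-character label, then one residue per character from position 20 through 80); the function name `split_blast_line` and the sample line are hypothetical, and actual blastp output may need different offsets.

```python
def split_blast_line(line, label_width=14, first=20, last=80):
    """Mimic the MID() row: a label cell followed by one cell per character.

    Positions are 1-based, as in the spreadsheet formula. Like MID(), a
    position past the end of the line gives an empty string.
    """
    label = line[:label_width]
    cells = [line[pos - 1:pos] for pos in range(first, last + 1)]
    return [label] + cells

# Made-up line: label padded to 14 characters, a 5-character coordinate
# field, then residues starting at position 20 (dot = identity to query).
line = "ADF49105".ljust(14) + "1".ljust(5) + "MTN.RK"
row = split_blast_line(line)
print(row[0].strip(), row[1:7])
```

Each returned list corresponds to one spreadsheet row, so a whole alignment can be written out as tab-separated text and opened alongside the numbered header rows.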

Curated reference sequences

The CYTB sequences retrieved from these genomic entries show haplotype notation. The 15 previously existing bison sequences at GenBank (some just fragments) are also provided. Older fragmentary sequences are demonstrably error-prone and will be used here only as support -- never as sole source -- of a polymorphism. Redundancy introduced via non-standard SwissProt (UniProt) entries also has to be manually removed -- the Swiss did no sequencing on their own, simply deriving protein sequences from existing GenBank entries. This leaves 5 older complete sequences for Bison bison and 4 fragments, 2 attributed to Bison bonasus and 1 fossil DNA sequence from Bos primigenius to serve as outgroup (rather than an inbred domestic cow).
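Allele labels of the form N3S (reference residue, position, variant residue) used in some headers in this section can be generated mechanically. The Python sketch below assumes the two sequences are already aligned, of equal length and free of indels; the function name `substitutions` is hypothetical, and the short strings are just the first ten residues of two bison CYTB records listed here.

```python
def substitutions(reference, sequence):
    """List differences as <ref residue><1-based position><new residue>."""
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sequence), start=1)
        if ref != alt
    ]

reference = "MTNLRKSHPL"  # start of most bison CYTB records
variant = "MTSLRKSHPL"    # start of the bHap7 record
print(substitutions(reference, variant))  # ['N3S']
```

Fragmentary sequences would first have to be placed at the correct offset against the full-length reference before a comparison like this is meaningful.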

Here it is necessary to pick a terminology. This must accommodate NCBI taxonomy -- regardless of its correctness -- because otherwise blastp searches cannot be restricted by taxon. Note that although bison are definitely sistered with yak to the exclusion of all other extant species, that creates problems because yak has been put in the genus Bos. Many relic wild cattle have no English-language common name, only one in a local language. The terminology table must show synonyms to allow PubMed and Google searches -- especially important in a fast-moving field for locating preprints and conference proceedings. The table below does not attempt to implicitly resolve any scientific issue; it simply states preferred terminology at this site along with synonyms in common use.

>CYTB_bisBis.ADF49092 bHap8 plains bison b973 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49170 bHap11 plains bison b1031 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49118 bHap10 plains bison b985 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49248 bHap10 plains bison bFN5 Niobrara A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49131 bHap10 plains bison b1005 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49300 bHap17 plains bison bYNP1586 Yellowstone  A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF48936 bHap2 plains bison b790 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF48949 bHap2 plains bison b853 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF48962 bHap2 plains bison b854 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49001 bHap2 plains bison b880 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49027 bHap2 plains bison b925 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49040 bHap2 plains bison b929 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49157 bHap2 plains bison b1029 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49183 bHap2 plains bison b1050 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49196 bHap2 plains bison b1051 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49261 bHap2 plains bison bNBR1 National Bison Range A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49066 bHap2 plains bison b959 Montana A98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW
 
>CYTB_bisBis.ADF49105 bHap9 plains bison b979 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49209 bHap9 plains bison b1091 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49014 bHap5 plains bison b897 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49079 bHap7 plains bison b961 Montana N3S V98
MTSLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF48975 bHap3 plains bison b855 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49144 bHap3 plains bison b1018 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49222 bHap12 plains bison b1191 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49287 bHap16 plains bison bTSBH1005 Texas State Bison Herd V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49235 bHap13 plains bison b1428 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49274 bHap13 plains bison bTSBH1001 Texas State Bison Herd V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisAth.ADF49313 wHap15 woods bison wEI1 Elk Island V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF48988 bHap4 plains bison b877 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis.ADF49053 bHap6 plains bison b935 Montana V98
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisAth.ADF49326 wHap14 woods bison wEI14 Elk Island V98 V123M
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTMMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis_V98_I42T ABV70945 V98 I42T Bison bison 
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLTLQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis_98A_V132D AAD51424 Bison bison 
MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLILXILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYDLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bisBis_V98 AAW28804 Bison bison
NFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFW

>CYTB_bisBis_98A AAW28803 Bison bison
NFGSLLGMCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFW

>CYTB_bisBis_98A_Q322R AAL85955 Bison bison
ILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHAGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFI
LPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAILRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSRCLFWTLVADLLTL


>CYTB_bosPriW Bos primigenius gi|190360872|gb|ACE76876 
MTNFRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASVLYFLLILVLMPTAGTIENKLLKW
 
>CYTB_bosPriM Bos primigenius gi|291463835|gb|ADE05539 alleles F004I A023T I372V
MTNIRKSHPLMKIVNNAFIDLPTPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASVLYFLLILVLMPTAGTVENKLLKW

>CYTB_bosSau Bos sauveli AAV51239 
MTNIRKSHPLMKIVNNAFIDLPAPPNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLITVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIAAIAMVHLLFLHETGSNNPTGVSSDVDKIPFHPYYTIKDTLGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILILIPLLHTSKQRSMMFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYTTIGQLASIMYFLLILVLMPTAGTVENKLLKW

>CYTB_bosfroI Bos frontalis ABO07421 (maternal Bos indicus)
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASILYFLLILVLMPTAGTVENKLLKW 

>CYTB_bosFroW Bos frontalis ABO07423 I39V V215A A232T A302I A327T L357M non-hybrid
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITAIAMVHLLFLHETGSNNPTGISSDADKIPFHPYYTIKDILGTLLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILILIPLLHTSKQRSMMFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYITIGQLASIMYFLLILVLMPTAGTVENKLLKW

>CYTB_bosGau1 Bos gaurus ADB80894 V39I A62V Y95H T108P L105P T190M N206I ADB80893 ADB80892 EU878387
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITAIAMVHLLFLHETGSNNPTGISSDADKIPFHPYYTIKDILGTLLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILILIPLLHTSKQRSMMFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYITIGQLASIMYFLLILVLMPTAGTVENKLLKW

>CYTB_bosJav Bos javanicus ABS18295 S29A R80W E110K I121F K375N
MTNIRKSHPLMKIVNNAFIDLPAPPNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLITVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITAIAMVHLLFLHETGSNNPTGVSSDADKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILILIPLLHTSKQRSMMFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYITIGQLASIMYFLLILVLMPTAGTVENKLLKW

>CYTB_bosJav Bos javanicus ABW82495
MTNIRKSHPLMKIVNNAFIDLPAPPNISSWWNFGSLLGVCLILQILTGLFLAMHYTPDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMMFRPLSQCLFWILMADLLTLTWIGGQPVEHPYITIGQLASIMYFLLILVLMPTAGTVENKLLKW

>CYTB_bosJav Bos javanicus ABW82494
MTNIRKSHPLMKIVNNTFIDLPAPPNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLITVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITAIAMVHLLFLHETGSNNPTGVSSDADKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILILIPLLHTSKQRSMMFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYITIGQLASITYFLLILVLMPTAGTVENKLLKW

>CYTB_bosInd Bos indicus ABO07435 T67I sporadic, differs from Bos taurus at I356V and V372I
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASILYFLLILVLMPTAGTVENKLLKW

>CYTB_bisBon Bison bonasus 295065508 YP_003587278 
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDTDKIPFHPYYTIKDILGALLLILTLMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILILIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bosTau1 Bos taurus AAM12814 208 instances
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICL
YMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFI
IMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAILR
SIPNKLGGVLALAFSILILALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASVLYFLLILVLMPTAGTIENKLLKW

>CYTB_bosTau2 Bos taurus AAW78524 72 instances V356I I372V
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICL
YMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFI
IMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAILR
SIPNKLGGVLALAFSILILALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASILYFLLILVLMPTAGTVENKLLKW


>CYTB_synCafW Syncerus caffer 5777912 AAD51426 AF036275 
MTHIRKSHPLMKILNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVAHICrDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMIHLLFLHETGSNNPTGISSDTDKIPFHPYYTIKDILGALLLILALMLLVLFSPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALILSILILIIMPLLHTSKQRSMMFRPLSQCLFWILVADLLTLTWIGGQPVEHPYIIIGQLASIMYFLLILVLMPTASTIENNLLKW

>CYTB_synCafP Syncerus caffer 1813355 BAA11624 H3N T56S I295V
MTNIRKSHPLMKILNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYSSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMIHLLFLHETGSNNPTGISSDTDKIPFHPYYTIKDILGALLLILALMLLVLFSPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILIIMPLLHTSKQRSMMFRPLSQCLFWILVADLLTLTWIGGQPVEHPYIIIGQLASIMYFLLILVLMPTASTIENNLLKW

>CYTB_bubBubW Bubalus bubalis ACF17726 
MTNIRKSHPLMKILNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFAVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTSLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFLHETGSNNPTGISSDTDKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILILMPLLHTSKQRSMMFRPFSQCLFWILVANLLTLTWIGGQPVEHPYIIIGQLASITYFLLILVLMPTASMIENNLLKW

>CYTB_bubBubP Bubalus bubalis ABR08397 
MTNIRKSHPLMKILNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFAVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTSLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFLHETGSNNPTGISSDTDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILILMPLLHTSKQRSMMFRPFSQCLFWILVANLLTLTWIGGQPVEHPYIIIGQLASITYFLLILVLMPTASMVENNLLKW

>CYTB_traScr1 Tragelaphus scriptus AF036277
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTMTAFSSVTHICRDVNHGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTSLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFLHETGSNNPTGIPSDMDKIPFHPYYTIKDILGALLLILILMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVFSILILILMPLLHTSKQRSMMFRPLSQCLFWILAADLLTLTWIGGQPVEHPYIIIGQLASIMYFLIILVLMPATSMIENSFLKW

>CYTB_traScr2 Tragelaphus scriptus AAD13501 
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTWDTMTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTSLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFLHETGSNNPTGIPSDMDKIPFHPYYTIKDILGALLLILILMLLVLFAPDLLGDPDNYAPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILILMPLLHTSKQRSMMFRPLSQCLFWILAADLLTLTWIGGQPVEHPYIIIGQLASIMYFLIILVLMPAVSMIENNLLKW

>CYTB_traScr3 Tragelaphus scriptus non-sporadic alleles S159N A190T M205T A232V L234M I238T V243T F296L I302V lower case
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTMTAFSSVTHICRDVNHGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTnLVEWIWGGFSVDKATLTRFFAFHFILPFIItALAMVHLLFLHETGSNNPTGIPSDtDKIPFHPYYTIKDILGvLLLILtLMLLtLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVlSILILvLMPLLHTSKQRSMMFRPLSQCLFWILAADLLTLTWIGGQPVEHPYIIIGQLASIMYFLIILVLMPATSMIENSFLKW

>CYTB_traEur Tragelaphus eurycerus (bongo) AAD51427
MINIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFTVMATAFTGYVLPWGQMSF
WGATVITNLLSAIPYIGTSLVEWIWGGFSVDKATLTRFFAFHFILPFIITALAMVHLLFLHETGSNNPTGISSNMDKIPFHPYYTIKDILGALLLILTLMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILILMPLLHMSKQRSMMFRPLSQCLFWILAADLLTLTWIGGQPVEHPYIIIGQLASIMYFLIILVLMPVTSMIENNFLKW

>CYTB_traStr Tragelaphus strepsiceros (greater kudu) AAD51431 
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYVHVGRGMYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLVLALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILIFLPLLHTSKQRSMMFRPLSQCLFWILVADLLTLTWIGGQPVEHPYMIIGQLASIMYfLLILVLMPVTSMIENNFLKW

>CYTB_traImb Tragelaphus imberbis  (lesser kudu) AAD13498 
MINIRKSHPLMKIVNNAFIDLPTPPNISSWWNFGSLLGICLVLQILTGLFLAMHYTSDTMTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALALVHLLFLHETGSNNPTGISSDTDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALILTILMPILMPLLHASKQRSMMFRPLSQCLFWILVADLLTLTWIGGQPVEHPYIIIGQLASIMYFLLILVLMPMAGSIENNLLKW

>CYTB_traOry Tragelaphus oryx (eland) AAD13491 
MTNIRKSHPLMKIVNNAFIDLPTPSNISSWWNFGSLLGICLTLQILTGLFLAMHYTSDTTTAFSSVTDICRDVNYGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTSLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFLHETGSNNPTGISSDTDKIPFHPYHTIKDILGALLLILTLMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILILMPLLHTSKQRSMMFRPLSQCLFWVLAADLLTLTWIGGQPVEHPYIIIGQLASIMYFLLILVLMPVASMIENNFL

>CYTB_traAng Tragelaphus angasii (nyala) AAD42706 
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTMTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNVGVILLFMVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITALVMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMVLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILILMPLLHMSKQRSMMFRPLSQCLFWLLVADLLTLTWIGGQPVEHPYIIIGQLASIIYFLLILVLMPVISTIENNLLKW

>CYTB_traSpi Tragelaphus spekii (sitatunga) CAA10935 
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTSLVEWIWGGFSVDKATLTRFFAFHFIFPFIIAALAMVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGVLLLILTLMLLVLFAPDLLGDPDNYTPANPLITPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVLSILILILMPLLHVSKQRSMMFRPLSQCLFWILAADLLTLTWIGGQPVEHPYIIIGQLASIMYFLIILVLMPATSMIENNFLKW

>CYTB_traDer Taurotragus derbianus (giant eland) AAD13496 
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGMYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTSLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAIVHLLFLHETGSNNPTGISSDMDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRLIPNKLGGVLALVLSILVLMLMPLLHTSKQRSMMFRPLSQCFFWILAADLLTLTWIGGQLVEHPYIIIGQLASIMYFLLILVLMPVASMIENNLLKW


>CYTB_bseTra Boselaphus tragocamelus CAA10934
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTMTAFASVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMIHLLFLHETGSNNPTGISSDADKIPFHPYYTIKDILGALLLILALMMLVLFAPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVMALVLSILILILMPLLHTSKQRSMMFRPLSQCMFWILVANLLTLTWIGGQPVEHPYIIIGQLASIMYFLLILVLMPTASMIENNLLKW