Opsin evolution: Cytoplasmic face

From genomewiki
Revision as of 14:57, 26 January 2009 by Tomemerald (talk | contribs) (New page: == Comparative genomics of the cytoplasmic face of GPCR proteins == The cytoplasmic 'face' of opsin (or any GPCR) is presumably responsible for all interactions with downstream signal re...)

Comparative genomics of the cytoplasmic face of GPCR proteins

The cytoplasmic 'face' of opsin (or any GPCR) is presumably responsible for all interactions with downstream signal-relaying partners, because the latter are cytoplasmic proteins with no access to extracellular loops or transmembrane segments. Note that ligand photoisomerization and release from the Schiff base deep within the transmembrane region must drive a significant conformational change in the cytoplasmic face that differentiates the inactive from the active state.

The cytoplasmic face comprises three loops and the carboxy terminus. For bioinformatic purposes, it is convenient to 'reorganize' each linear protein sequence into its intracellular, membrane and outer regions for separate consideration. This is done below for the cytoplasmic face of 500 curated opsins from each of the 18 vertebrate opsin orthology classes, using multiple representatives for each phylogenetic node and intense bracketing at eras of change (e.g. between DRY and GRY opsins of the RGR class). A range of non-opsin GPCRs is included to define properties common to all members of this large gene family (not specific to opsins).
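
The 'reorganization' described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the tables on this page: the function name `split_by_region` and the segment coordinates are hypothetical. The 28-residue fragment is the human RHO1 row of the loop C2 table on this page (19 loop residues followed by 9 transmembrane residues).

```python
from collections import defaultdict

def split_by_region(seq, segments):
    """Concatenate the residues of each labeled region, in sequence order.

    segments is a list of (label, start, end) with 0-based, end-exclusive
    coordinates; the coordinates used below are illustrative only.
    """
    regions = defaultdict(str)
    for label, start, end in sorted(segments, key=lambda s: s[1]):
        regions[label] += seq[start:end]
    return dict(regions)

# Loop C2 plus the start of the following transmembrane helix (human RHO1 row).
fragment = "ERYVVVCKPMSNFRFGENH" + "AIMGVAFTW"
segments = [("cytoplasmic", 0, 19), ("membrane", 19, 28)]

parts = split_by_region(fragment, segments)
print(parts["cytoplasmic"])
print(parts["membrane"])
```

For a full-length protein the segment list would hold all three cytoplasmic loops and the carboxy terminus under the same label, so that the cytoplasmic face comes out as one concatenated string.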

The two critical goals in GPCR research are to determine the natural ligands, notably for orphan receptors (which largely concerns the extracellular and transmembrane regions), and to determine the specific Galpha signaling partner among the 16 such paralogs in the vertebrate genome. For the 18 orthology classes of vertebrate opsins, the ligand is already known (11-cis retinal or related) but the signaling partner is generally not. As an example, does RGR opsin signal, to what purpose, and what is the meaning of the abrupt shift in the DRY motif to GRY in boreoeutheres?

Cytoplasmic loop C2 at 18 opsin genetic loci
           
           DRY loop motif       transmemb L  7 9 signaling
ENCEPH_hom ERYIRVVHARVINFSW     AWRAITYIW 16 V A G?
RGR_homSap GRYHHYCTRSQLAWNS     AVSLVLFVW 16 C R G?
RGR2_gasAc DRYHQYCTRQKLFWST     TLTMSAIIW 16 C R G?
RHO1_homSa ERYVVVCKPMSNFRFGENH  AIMGVAFTW 19 C P GNAT1
RHO2_galGa ERYIVVCKPMGNFRFSATH  AMMGIAFTW 19 C P GNAT2
SWS2_ornAn ERFLVICKPLGNLSFRGTH  AIFGCAATW 19 C P GNAT2
PIN_galGal ERYVVVCRPLGDFQFQRRH  AVSGCAFTW 19 C P G?
SWS1_homSa ERYIVICKPFGNFRFSSKH  ALTVVLATW 19 C P GNAT2
LWS_homSap ERWMVVCKPFGNVRFDAKL  AIVGIAFSW 19 C P GNAT2
VAOP_galGa ERYIVICRPVGNMRLRGKH  AAQGIAFVW 19 C P Gt
PARIE_utaS ERYNVVCQPLGTLQMSTKR  GYQLLGFIW 19 C P Gd+Go
PPIN_xenTr DRVFVVCKPMGTLTFTPKQ  ALAGIAASW 19 C P Gt
PER_homSap DRYLTICLPDVGRRMTTNT  YIGLILGAW 19 C P Go
NEUR1_homS DRYLKICYLSYGVWLKRKH  AYICLAAIW 19 C L G?
NEUR2_galG VCCLKICFPAYGNRFRRKH  GQILIACAW 19 C P G?
TMT_monDom ERYRTL-TLCPGQGADYQK  ALLAVAGSW 19 - L G?
MEL1_homSa DRYLVITRPLATFGVASKRR AAFVLLGVW 20 T P Gq
MEL2_anoCa DRYCVITKPLQSIKRTSKKR TCIIIVFVW 20 T P Gq
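
The L, 7 and 9 columns of the table above appear to be the loop length and the residues at positions 7 and 9 of the loop, counted from the first residue of the DRY motif. Assuming that reading, they can be recomputed from the motif string alone; the helper name `loop_columns` is hypothetical.

```python
def loop_columns(motif):
    """Return (length, residue 7, residue 9) for a loop C2 string.

    Alignment gaps ('-') are kept in place, so they count toward the
    length and can be reported as a residue, as in the TMT row.
    """
    return len(motif), motif[6], motif[8]

# Four rows copied from the table above.
rows = {
    "RGR_homSap": "GRYHHYCTRSQLAWNS",
    "RHO1_homSa": "ERYVVVCKPMSNFRFGENH",
    "TMT_monDom": "ERYRTL-TLCPGQGADYQK",
    "MEL1_homSa": "DRYLVITRPLATFGVASKRR",
}

for name, motif in rows.items():
    length, res7, res9 = loop_columns(motif)
    print(f"{name:<10} {length:2d} {res7} {res9}")
```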

While it might seem straightforward to thread any opsin onto its best fit among the five newly available crystallographic structures, that does not work for distantly related paralogs beyond the universal 7-transmembrane feature: loop regions can differ greatly in length and have diverged so far in amino acid sequence that they lack discernible alignability (even though they are all ultimately homologous). Although these structures entail various compromises (to enable stable crystallization), they are hugely important to annotation transfer of sequence/function relationships via comparative genomics. Yet most of the 18 vertebrate opsin orthology classes have only remote models to date:

Gene           PDB            Protein                     PubMed      Best human opsin   Next Best         Signaling

RHO1_bosTau    1JFP 3C9M 2J4Y bovine rod rhodopsin        17825322  RHO1_homSap 93%   SWS1_homSap   45%  Gt GNAT1 raises cGMP
MEL1_todPac    2Z73 2ZIY      squid melanopsin            18480818  MEL1_homSap 43%   PER1_homSap   30%  Gq GNAQ? inositol trisphosphate
ADORA2A_homSap 3EML           adenosine receptor 2A       18832607  MEL1_homSap 27%   ENCEPH_homSap 27%  Gs GNAS raises cAMP
ADRB1_melGal   2VT4           beta 1 adrenergic receptor  18594507  MEL1_homSap 29%   ENCEPH_homSap 25%  Gs GNAS raises cAMP
ADRB2_homSap   2R4R           beta 2 adrenergic receptor  17962520  MEL1_homSap 28%   PER1_homSap   29%  Gs GNAS raises cAMP
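
The percentages in the table above are whole-protein identities, and how they were computed is not stated here. As a minimal sketch of the calculation itself, assuming two sequences that are already aligned to equal length, percent identity is simply the fraction of matching columns. The function name `percent_identity` is hypothetical, and the example uses two short loop C2 strings from this page rather than full proteins.

```python
def percent_identity(a, b):
    """Percent of aligned columns with identical residues.

    Assumes a and b are already aligned to equal length; columns where
    either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

rho1 = "ERYVVVCKPMSNFRFGENH"  # RHO1_homSa loop C2
rho2 = "ERYIVVCKPMGNFRFSATH"  # RHO2_galGa loop C2
print(round(percent_identity(rho1, rho2), 1))
```

Values from such a short window are not comparable to the whole-protein figures in the table; a real comparison would start from a proper pairwise alignment.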

It has not proven feasible to predict loop conformations ab initio or from peptide libraries; it is folly to consider an individual loop structure in isolation (rather than the cytoplasmic face in its entirety) or to fail to specify the activation state being computed. Any predicted structure, and any special role proposed for an individual residue, must be consistent with the comparative genomics of close and even distant orthologs, because binding relationships to Galpha and other proteins do not change rapidly in evolutionary time (as seen from heterologous substitution experiments). Even when a cytoplasmic loop seems to lack a definable structure, individual residues can be conserved over vast branch length times. That conservation must ultimately be explained.

OpsinCyto3D.jpg

Two new high-resolution structures of squid melanopsin establish that the cytoplasmic face is not structurally homologous as a whole across paralogous opsin classes. We knew this already from comparative genomics alone, but not specifically why. The X-ray structure exhibits unprecedented rigid extensions of transmembrane helices 5 and 6, on the order of 25 angstroms out into the cytoplasm, greatly constraining the intermediate residues of cytoplasmic loop C3. The proximal carboxy terminus also contributes importantly to the overall structure here.

The squid melanopsin structure, used at SwissModel, could readily predict the structure of the cytoplasmic face of all opsins of the melanopsin class, of which 48 vertebrate, 9 lophotrochozoan, 43 arthropod, and 1 cnidarian sequences are available here. The Gq signaling partner is expected to be used throughout these melanopsins, yet which features of the cytoplasmic face the Galpha protein specifically recognizes remains obscure. It cannot simply be the helical extensions per se, because the Gq protein is still structurally homologous to its 15 paralogs (in vertebrates) of different signaling types.

The second cytoplasmic loop

In squid melanopsin, the first six residues of cytoplasmic loop C2 also form an extensional helix, beginning with the DRY motif and surprisingly terminating three residues before the deeply conserved proline (normally a helix breaker, as in adrenergic receptors). This proline alone cannot define the two states through its cis and trans configurations, because glycine or leucine can also characterize whole opsin orthology classes at this position. The last three residues of loop C2, the basic HRR, also preface a transmembrane helix, as RAR does in the turkey receptor.

Cytoplasmic loop C2 has a conserved length of 16-20 residues in all opsins, with much tighter constraint within individual opsin classes (e.g. all vertebrate imaging opsins have length 19). The structure of the C2 loop of over 100 melanopsins can readily be modelled on its closest match among the determined structures. Because adrenergic loop C2 is a structural outgroup yet has a very similar fold, it follows that all opsin C2 loops have a very similar structure.

The adenosine and adrenergic receptor structures -- however useful they might be for annotation transfer to the other 350 non-odorant human GPCRs -- ultimately will not prove helpful for modeling the second cytoplasmic loop of opsins (squid melanopsin does that better already). Note that C2 in these three structures is consistently stabilized by a mid-loop hydrogen bond to the DRY residues. This constraint is not observed in squid melanopsins; indeed it is not feasible there, because no hydrogen bond-capable residue occurs at that position (in the comparative genomics sense of a conserved residue).

OpsinCyto2Five.jpg


The second cytoplasmic loop in melanopsin

MelSecStr.jpg


   Cytoplasmic loop C2 from 101 Melanopsins

species    helix bridge area  hel transmemb Le 7 9
MEL1_homSa DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_panTr DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_gorGo DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_ponAb DRYLV ITRPLATIGVAS KRR AAFVLLGVW 20 T P
MEL1_rheMa DRYLV ITRPLATIGVAS KRR AAFVLLGVW 20 T P
MEL1_calJa DRYLV ITRPLATIGVAS TKR AAFVLLGVW 20 T P
MEL1_micMu DRYLV ITRPLASVGTAS KRR AGLVLLGVW 20 T P
MEL1_otoGa DRYLV ITRPLTTVGVAS KRR AALVLLGVW 20 T P
MEL1_musMu DRYLV ITRPLATIGRGS KRR TALVLLGVW 20 T P
MEL1_ratNo DRYLV ITRPLATIGMRS KRR TALVLLGVW 20 T P
MEL1_nanEh DRYLV ITRPLATIGVAS KRR TALVLLGVW 20 T P
MEL1_phoSu DRYLV ITRPLATIGMGS KRR TALVLLGIW 20 T P
MEL1_dipOr DRYLV ITRPLATIGVTS KRR TAFVLLGVW 20 T P
MEL1_cavPo DRYLV ITRPLATIGVAS KRQ AALVLLGVW 20 T P
MEL1_speTr DRYLV ITRPLATIGMAS KKR AAFFLLGVW 20 T P
MEL1_oryCu DRYLV ITRPLAAVGMVS KKR AGLVLLGVW 20 T P
MEL1_ochPr DRYLV ITRPLAAVGMVS KRR TGLVLLGVW 20 T P
MEL1_bosTa DRYLV ITRPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_turTr DRYLV ITRPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_susSc DRYLV ITHPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_equCa DRYLV ITRPLATVGVVS KRW AALVLLGIW 20 T P
MEL1_felCa DRYLV ITHPLATIGVVS KRR AALVLLGVW 20 T P
MEL1_canFa DRYLV ITHPLAAVGVVS KRR AALVLLGVW 20 T P
MEL1_myoLu DRYLV ITRPLA-IGVVS KRR AALVLLGVW 19 T P
MEL1_pteVa DRYLV ITRPLAAIGVVS KRR AALVLLGVW 20 T P
MEL1_eriEu DRYLV ITRPLATIGVVS KRR VALVLLGVW 20 T P
MEL1_loxAf DRYLV ITRPLATIGVVS KRR AALVLLGIW 20 T P
MEL1_proCa DRYLV ITRPLATIGVVS KRR TALVLLGTW 20 T P
MEL1_echTe DRYLV ITRPLATIGVVS KRR AALVLLVIW 20 T P
MEL1_smiCr DRYFV ITRPLASIGMIS KKK TGLILLGVW 20 T P
MEL1_monDo DRYFV ITRPLASIGVIS KKK TGFILLGVW 20 T P
MEL1_ornAn DRYFV ITRPLASIGVIS KKR ALLILTGVW 20 T P
MEL1_anoCa DRYFV ITRPLASIGAMS TKK ALLILSGVW 20 T P
MEL1_taeGu DRYFV ITKPLASVGVTS KKK ALIILVGVW 20 T P
MEL1_galGa DRYFV ITKPLASVRVMS KKK ALIILVGVW 20 T P
MEL1_xenTr DRYFV ITRPLTSIGVMS KKR AVLILSGVW 20 T P
MEL1_danRe DRYFV ITRPLASIGVLS QKR ALLILLVAW 20 T P
MEL1_danRe DRYFV ITRPLASIGVMS RKR ALLILSAAW 20 T P
MEL1_takRu DRYFV ITRPLTSIGVLS RKR AFVILMTVW 20 T P
MEL1_gasAc DRYFV ITRPLTSIGMMS RRR ALLILMGAW 20 T P
MEL1_oryLa DRYFV ITRPLTSIGVLS RKR ALLILSAAW 20 T P
MEL1_calMi DRYFV ITRPLASIGVLS HRR AGLIILSLW 20 T P
MEL1_petMa DRYLV LTRPLASIGAMS KRR AMYITAAVW 20 T P
MEL2_galGa DRYLV ITKPLRSIQWTS KKR TIQIIAAVW 20 T P
MEL2_anoCa DRYCV ITKPLQSIKRTS KKR TCIIIVFVW 20 T P
MEL2_xenLa NRYIV ITKPLQSIQWSS KKR TSQIIVLVW 20 T P
MEL2_danRe DRYLV ITKPLQTIQWNS KRR TGLAILCIW 20 T P
MEL2_tetNi DRYVV ITKPLQTIRRSS KRR TALAILMVW 20 T P
MEL2_gasAc DRYLV ITKPLQAIHWGS KRR TTLAILLVW 20 T P
MEL1_plaDu DRFYV ITNPLGAAQTMT KKR AFIILTIIW 20 T P
MEL1_capCa DRYMV IAKPFYAMKHVS HKR SLIQIILAW 20 A P
MEL1_helRo DRYLV VGQPLAMLNQSH FRR SFYHVLIIW 20 G P
MEL1_todPa DRYNV IGRPMAASKKMS HRR AFIMIIFVW 20 G P
MEL1_schMe DRYFV IAQPFQTMKSLT IKR AIIMLVFVW 20 A P
MEL2_schMa DRYLV IATPFESVFQTT PRR TLLLMLFLW 20 A P
MEL1_lotGi DRYLV ITSPFTAMRNMT HKR AFLMIVGVW 20 T P
MEL1_sepOf DRYNV IGRPMAASKKMS HRR AFLMIIFVW 20 G P
MEL1_entDo DRYNV IGRPMAASKKMS HRR AFLMIIFVW 20 G P
UVV_camAb  DRYST IARPLDGKLS   RGQ VLLLIMLIW 18 A P
UVV_catBo  DRYST IARPLDGKLS   RGQ VILLIALIW 18 A P
UVV_apiMe  DRYST IARPLDGKLS   RGQ VILFIVLIW 18 A P
BLU_apiMe  DRYRT ISCPIDGRLN   SKQ AAVIIAFTW 18 S P
BLU_droMe  DRYKT ISNPIDGRLS   YGQ IVLLILFTW 18 S P
BLU_manSe  DRYKT ISSPLDGRIN   TVQ AGLLIAFTW 18 S P
UVV1_droMe DRYNV ITKPMNRNMT   FTK AVIMNIIIW 18 T P
UVV1_pedHu DRCET ITNPL-QKSG   KKK AFLLAAFTW 18 T P
UVV_manSe  DRHST ITRPLDGRLS   EGK VLLMVAFVW 18 T P
UVV_papXu  DRHST ITRPLDGRLS   RGK VLLMMVCVW 18 T P
UVV2_droMe DRFNV ITRPMEGKMT   HGK AIAMIIFIY 18 T P
UVV2_pedHu DRYQV IVHPLER-KT   KAA VYFQILLIW 18 V P
LWS_nemVe  DRYIV IVHPMKKIMT   RKK AALMIVGVW 18 V P
LWS_pedHu  DRYNV IVKGLSAKPMT  IKM ALLNILFVW 19 V G
LWS_vanCa  DRYNV IVKGIAAKPLT  ING AMLRVLGIW 19 V G
LWS_papXu  DRYNV IVKGIAAKPMT  ING ALLRILGIW 19 V G
LWS_helSa  DRYNV IVKGIAAKPMT  ING ALLRVFGIW 19 V G
LWS_pieRa  DRYNV IVKGIAAKPMT  INS ALLRILGVW 19 V G
LWS_manSe  DRYNV IVKGIAAKPMT  SNG ALLRILGIW 19 V G
MWS2_droMe DRYNV IVKGINGTPMT  IKT SIMKILFIW 19 V G
LWS_rhoPr  DRYNV IVKGISAKPMT  NKT AMLRILLVW 19 V G
LWS_meoOe  DRYNV IVKGISGTPLS  QKN TTLQVLFVW 19 V G
LWS_catBo  DRYNV IVKGLSAKPMT  ING ALLRILGIW 19 V G
LWS_schGr  DRYNV IVKGLSAKPMT  NKT AMLRILFIW 19 V G
LWS_triCa  DRYNV IVKGLSAQPLT  KKG AMLRILIIW 19 V G
LWS2_apiMe DRYNV IVKGLSGKPLS  ING ALIRIIAIW 19 V G
LWS_bomTe  DRYNV IVKGLSGKPLT  ING ALLRILGIW 19 V G
MWS_calEr  DRYNV IVKGMAGQPMT  IKL AIMKIALIW 19 V G
MWS1_droMe DRYQV IVKGMAGRPMT  IPL ALGKIAYIW 19 V G
LWS_droMe  DRYCV IVKGMARKPLT  ATA AVLRLMVVW 19 V G
LWS_arcGr  DRYNV IVKGVAAEPLT  SKG ASIRILFVW 19 V G
LWS_eupSu  DRYNV IVKGVAATPLT  NKG AFARNIFSW 19 V G
LWS_camLu  DRYNV IVKGVAGEPLS  TKK ASLWILTVW 19 V G
LWS_proMi  DRYNV IVKGVAGEPLS  TKK ASLWILIVW 19 V G
LWS_holCo  DRYNV IVKGVSAEPLT  SGG AMMRIAGTW 19 V G
LWS_homGa  DRYNV IVKGVSATPLT  TNG AMLRNLFSW 19 V G
LWS_neoAm  DRYNV IVKGVSGEPLT  NSG AMTRIAGTW 19 V G
LWS_neoOe  DRYNV IVKGVSGKPLS  QKN ATLQVLFVW 19 V G
LWS_mysDi  ERYNV IVKGVSSKPLS  VKG AITRIVLTW 19 V G
LWS1_apiMe DRYNV IVKGMSGTPLT  IKR AMLQILGIW 19 V G
LWS_limPo  DRYNV IVRGMAAAPLT  HKK ATLLLLFVW 19 V G
LWS_limPo  DRYNV IVRGMAAAPLT  HKK ATLLLLFVW 19 V G
LWS_ixoSc  DRYNV IVRGVAAAPLT  HKR AALMIFFVW 19 V G
ADRB2_homS DRYFA ITSPFKYQSLLT KNK ARVIILMVW 20 T P
ADRA2A_hom DRYWS ITQAIEYNLKRT PRR IKAIIITVW 20 T A
ADRA2C_hom DRYWS VTQAVEYNLKRT PRR VKATIVAVW 20 T A
HTR1A_homS DRYWA ITDPIDYVNKRT PRR AAALISLTW 20 T P
CHRM1_homS DRYFS VTRPLSYRAKRT PRR AALMIGLAW 20 T P
DRD2_homSa DRYTA VAMPMLYNTRYS KRR VTVMISIVW 21 A P
TAAR9_homS DRYIA VTDPLTYPTKFT VSV SGICIVLSW 20 T P
ADRA2B_hom DRYWA VSRALEYNSKRT PRR IKCIILTVW 20 S A
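
Rows in this layout are whitespace-delimited, so they can be parsed and cross-checked mechanically. The sketch below, with hypothetical helper names `parse_row` and `class_of`, rejoins the helix, bridge and terminal segments, verifies that the stated length and position 7/9 columns agree with the sequence, and collects the position 9 residues seen in each class. Grouping classes by the name prefix with trailing digits stripped is an assumption made only for this example.

```python
from collections import defaultdict

# Six rows copied from the table above.
ROWS = """\
MEL1_homSa DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_todPa DRYNV IGRPMAASKKMS HRR AFIMIIFVW 20 G P
MEL2_galGa DRYLV ITKPLRSIQWTS KKR TIQIIAAVW 20 T P
UVV_apiMe  DRYST IARPLDGKLS   RGQ VILFIVLIW 18 A P
LWS_pedHu  DRYNV IVKGLSAKPMT  IKM ALLNILFVW 19 V G
LWS_droMe  DRYCV IVKGMARKPLT  ATA AVLRLMVVW 19 V G
"""

def parse_row(line):
    """Split one table row into named fields and rejoin the loop segments."""
    name, helix, bridge, tail, tm, length, res7, res9 = line.split()
    return {"name": name, "loop": helix + bridge + tail, "tm": tm,
            "length": int(length), "res7": res7, "res9": res9}

def class_of(name):
    """Class label: text before the underscore, trailing digits removed."""
    return name.split("_")[0].rstrip("0123456789")

rows = [parse_row(line) for line in ROWS.splitlines()]

# Do the stated columns match what the sequence itself implies?
consistent = all(
    (len(r["loop"]), r["loop"][6], r["loop"][8]) == (r["length"], r["res7"], r["res9"])
    for r in rows
)

# Which residues occupy position 9 within each class?
by_class = defaultdict(set)
for r in rows:
    by_class[class_of(r["name"])].add(r["res9"])

print(consistent)
for cls in sorted(by_class):
    print(cls, sorted(by_class[cls]))
```

Rows containing alignment gaps, or rows where a displayed sequence and its stated length disagree, would show up as inconsistencies under this check and would need individual inspection.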

Reference collection of structurally determined GPCR

>RHO1_bosTau cow rod rhodopsin
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAI
ERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWL
PYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA*

>MEL1_todPac Todarodes pacificus (squid) Gq X70498 480 11106382 Mollusca 'squid rhodopsin' 3D: May 2008 Cys 337 palmitoylated
MGRDLRDNETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKKWIFGFAACKVYGFIGGIFGFMSIMTMAMISI
DRYNVIGRPMAASKKMSHRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFDYISRDSTTRSNILCMFILGFFGPILIIFFCYFNIVMSVSNHEKEMAAMAKRLNAKELRKAQAGANAEMRLAKI
SIVIVSQFLLSWSPYAVVALLAQFGPLEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEDDKDAETEIPAGESSDAAPSADAAQMKEMMAMMQKMQQQQAAYPPQGY
APPPQGYPPQGYPPQGYPPQGYPPQGYPPPPQGAPPQGAPPAAPPQGVDNQAYQA*

>ADRB1_melGal turkey beta 1 adrenergic receptor with stabilising mutations and bound cyanopindolol
MGAELLSQQWEAGMSLLMALVVLLIVAGNVLVIAAIGSTQRLQTLTNLFITSLACADLVVGLLVVPFGATLVVRGTWLWGSFLCELWTSLDVLCVTASIETLCVIAI
DRYLAITSPFRYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDEDPQALKCYQDPGCCDFVTNRAYAIASSIISFYIPLLIMIFVALRVYREA
KEQIRKIDRASKRKRVMLMREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDLVPDWLFVAFNWLGYANSAMNPIIYCRSPDFRKAFKRLLAFPRKADRRLHHHHHH*

>ADRB2_homSap beta 2 adrenergic receptor 365 aa  
MGQPGNGSAFLLAPNRSHAPDHDVTQQRDEVWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAV
DRYFAITSPFKYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRATHQEAINCYANETCCDFFTNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQLQKIDKSEGRFHVQNLSQVEQDGRTGHGL
RRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNLIRKEVYILLNWIGYVNSGFNPLIYCRSPDFRIAFQELLCLRRSSLKAYGNGYSSNGNTGEQSG*

>ADORA2A_homSap adenosine receptor 2A
MPIMGSSVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTGFCAACHGCLFIACFVLVLTQSSIFSLLAIAI
DRYIAIRIPLRYNGLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGQPKEGKNHSQGCGEGQVACLFEDVVPMNYMVYFNFFACVLVPLLLMLGVYLRI
FLAARRQLKQMESQPLPGERARSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPDCSHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFRQTFR
KIIRSHVLRQQEPFKAAGTSARVLAAHGSDGEQVSLRLNGHPPGVWANGSAPHPERRPNGYALGLVSGGSAQESQGNTGLPDVELLSHELKGVCPEPPGLDDPLAQDGAGVS*
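
The records above are line-broken just before the DRY motif, which makes loop C2 easy to find by eye. A minimal sketch of doing the same programmatically follows, using only the Python standard library. The record names ending in `_fragment`, the helper names, and the regular expression are all assumptions for illustration: the pattern covers only some of the motif variants seen on this page (DRY, ERY, GRY, ERF, ERW), and taking the first match could pick up an unrelated site in a full-length sequence. The two excerpts are cut from the cow rhodopsin and human beta 2 adrenergic records above.

```python
import re

FASTA = """\
>RHO1_fragment excerpt of cow rod rhodopsin around loop C2
GGEIALWSLVVLAI
ERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGW
>ADRB2_fragment excerpt of human beta 2 adrenergic receptor around loop C2
DVLCVTASIETLCVIAV
DRYFAITSPFKYQSLLTKNKARVIILMVWIVSGLTSFLP
"""

def read_fasta(text):
    """Return {record name: sequence}, joining wrapped lines and dropping '*'."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.rstrip("*")
    return records

# Hypothetical pattern; it does not cover every variant in the tables.
MOTIF = re.compile(r"[DEG]R[YFW]")

def loop_c2(seq, length):
    """Return `length` residues starting at the first motif match, or None."""
    match = MOTIF.search(seq)
    if match is None:
        return None
    return seq[match.start():match.start() + length]

recs = read_fasta(FASTA)
print(loop_c2(recs["RHO1_fragment"], 19))
print(loop_c2(recs["ADRB2_fragment"], 20))
```

The loop length is passed in explicitly because it differs by class (19 for vertebrate imaging opsins, 20 for the adrenergic receptor in the tables above).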