MARCOS TADEU GERALDO

In silico characterization of the interaction mechanisms between nuclear localization sequences and Importin-α
(original title: Caracterização in silico dos mecanismos de interação entre sequências de localização nuclear e Importina-α)

Thesis presented to the Instituto de Biociências, Campus de Botucatu, UNESP, in fulfillment of the requirements for the degree of Doctor in the Programa de Pós-Graduação em Ciências Biológicas (Genética).

Universidade Estadual Paulista – UNESP
Instituto de Biociências de Botucatu
Programa de Pós-Graduação em Ciências Biológicas (Genética)

Advisor: Ney Lemke
Co-advisor: Agnes Alessandra Sekijima Takeda

Botucatu-SP
2016

CATALOGING RECORD PREPARED BY THE TECHNICAL SECTION FOR ACQUISITION AND INFORMATION PROCESSING, TECHNICAL DIVISION OF LIBRARY AND DOCUMENTATION, BOTUCATU CAMPUS, UNESP. RESPONSIBLE LIBRARIAN: ROSEMEIRE APARECIDA VICENTE, CRB 8/5651

Geraldo, Marcos Tadeu. In silico characterization of the interaction mechanisms between nuclear localization sequences and Importin-α / Marcos Tadeu Geraldo. Botucatu, 2016. Thesis (doctorate), Universidade Estadual Paulista "Júlio de Mesquita Filho", Instituto de Biociências de Botucatu. Advisor: Ney Lemke. Co-advisor: Agnes Alessandra Sekijima Takeda. Capes: 20205007. 1. Molecular dynamics. 2. Active transport, cell nucleus. 3. Karyopherins. 4. X-ray crystallography. Keywords: Molecular dynamics; Nuclear import; Importin-α; Normal modes; Nuclear localization sequence.

This work is dedicated to my dear parents, José Eduardo and Maria Helena, and to my beloved fiancée, Natália Totti, for everything they have done and still do for me, always with great love.

Acknowledgments

To the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) for the doctoral and research-abroad fellowships (grants 2012/19447-2 and 2014/21976-9). To the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for the doctoral fellowship (grant 142110/2012-4) at the beginning of the program.
To my advisor, Ney Lemke, for taking part directly in the discussions and analyses of my results and always contributing actively. He is also an exemplary advisor because he offers us something very important for our scientific careers: autonomy and freedom to make decisions. I intend to carry these qualities throughout my professional life.

To my co-advisor, Agnes Takeda, for closely following the development of my work, and for her teaching and advice, which were certainly fundamental to the production of this thesis.

To Professor Antônio Sérgio (UFABC) for the collaboration and for teaching me the normal mode analysis technique that I used in my work.

To Professor David Perahia (ENS – Cachan, France) for hosting me and supervising my 5-month internship in his laboratory, and for helping me directly to prepare the scripts for analyzing the normal mode data.

To everyone at the Laboratório de Bioinformática e Biofísica Computacional for their daily companionship. To the Programa de Pós-Graduação em Ciências Biológicas (Genética) of the Instituto de Biociências de Botucatu. To the Departamento de Física e Biofísica for institutional support and facilities. To everyone at the Seção de Pós-Graduação for all their help with my documentation and for their willingness to answer all my questions.

To my family, especially my parents, José Eduardo and Maria Helena, for all their affection, support and dedication at every step of my life. I am eternally grateful to them! I also thank my siblings, Cássia and Eduardo, for their affection, laughter and support throughout my life. To my parents-in-law, Wanda and Silvio, for welcoming me so warmly, for their constant affection and for always rooting for me.

To my fiancée, Natália Totti, for her love, affection and attention at every moment of my life since we met. I have no words to express how blessed I am to be by your side!
Your companionship and love were certainly essential during these 4 years of my doctorate. I know that you will be with me in the next stages of both my scientific and personal life. Thank you so much, my love! I love you immensely! And, of course, thanks to our little feline daughter, Greta, who always teaches us to meditate with that peaceful, tranquil little face of hers.

Finally, I wish to end these acknowledgments with a reminder. Despite the problems and obstacles that may arise in any activity, we always need enough humility and attention to notice all the good things that also happen to us. For this reason, I feel deep gratitude for everything I experienced during my doctorate, and I am very satisfied with what was produced!

When you complain, you make yourself a victim. Leave the situation, change the situation or accept it. All else is madness.
Eckhart Tolle

Resumo (Portuguese-language abstract)

Nuclear import systems are responsible for the exchange between the cytoplasm and the cell nucleus. They allow proteins with nuclear functions to cross the membrane that separates these two regions. The most studied import pathway is the classical nuclear import pathway, which is mediated by Importin-α (Impα). Impα is a solenoid protein composed of tandem repeats of the Armadillo (ARM) motif. These repeats form a long, twisted structure with small grooves along the protein axis. The classical nuclear localization sequences (cNLSs) present in cargo proteins are composed of positively charged residues. They establish salt bridges, hydrogen bonds and hydrophobic contacts with these grooves of Impα. Recognition can occur at one or two sites of Impα, which characterizes the cNLS as monopartite or bipartite, respectively. Most structural information on the cNLS-Impα complex comes from crystallographic data, and little is known about the conformational dynamics of this system.
One way to address the dynamics of a system is to use biomolecular simulation techniques, such as molecular dynamics and normal mode analysis. Using these techniques, the present study aimed to understand the interaction mechanisms and conformational dynamics involved in the recognition of cNLSs by Impα. In particular, this work focused on the cNLSs of the proteins Nucleoplasmin and Ku70 complexed with Impα.

The Nucleoplasmin study identified two main motions of Impα that may be associated with its cNLS-recognition function: bending and twisting. The bending motions may be involved in the entry of the cNLS and, depending on the size of the cargo, in accommodating the protein that contains it. The twisting motions may be involved in recognizing the cNLS and accommodating it in the Impα binding sites. In addition, residues of the linker region, located between the clusters of basic residues of the bipartite cNLS, may also help fit the cNLS into Impα.

Finally, the Ku70 study concluded that its NLS peptide would not bind to Impα as a bipartite cNLS. This conclusion was based on the analysis of contacts and correlations at the peptide-protein interface and on the geometric profiles of Impα. In conclusion, the data obtained here may help in understanding the affinities of the cNLSs already described, as well as in analyzing other potential cNLSs.

Keywords: nuclear import. importin-α. nuclear localization sequence. molecular dynamics. normal modes.

Abstract

Nuclear import systems are responsible for the exchange between the cytoplasm and the nucleus of a cell, allowing nuclear proteins to migrate through the membrane that separates these two regions. The most studied import pathway is the classical nuclear import pathway mediated by Importin-α (Impα).
Impα is a solenoid protein consisting of tandem repeats of the Armadillo (ARM) motif. These repeats form an extended, twisted structure with small grooves along the protein axis. The classical nuclear localization sequences (cNLSs) of cargo proteins are composed of positively charged residues and establish salt bridges, hydrogen bonds and hydrophobic contacts with the grooves of Impα. Such recognition can occur at one or two sites of Impα, characterizing the cNLS as monopartite or bipartite, respectively. Most structural information on the cNLS-Impα complex comes from crystallographic data, and little is known about the conformational dynamics of this system.

One way to address the dynamics of a system is to use biomolecular simulation techniques such as molecular dynamics and normal mode analysis. Using these techniques, this study aimed to understand the interaction mechanisms and conformational dynamics involved in the recognition of cNLSs by Impα. In particular, this work focused on the cNLSs of the Nucleoplasmin and Ku70 proteins complexed with Impα.

The Nucleoplasmin study identified two main motions of Impα that may be associated with cNLS recognition: bending and twisting. The bending motions may be involved in the entry of the cNLS and, depending on its size, in the accommodation of the cargo protein. The twisting motions may be involved in cNLS recognition and in its accommodation into the binding sites of Impα. Furthermore, the residues of the linker region, situated between the clusters of basic residues of the bipartite cNLS, may also assist in fitting the cNLS into Impα.

Finally, the Ku70 study found that its peptide would not bind to Impα as a bipartite cNLS. This finding was based on contact and correlation analyses of the peptide-protein interface and on the geometric profiling of Impα.
In conclusion, the data obtained here may help in understanding the affinities of the cNLSs already described, as well as in analyzing other potential cNLSs.

Keywords: nuclear import. importin-α. nuclear localization sequence. molecular dynamics. normal modes.

Contents

1 INTRODUCTION
1.1 Nucleocytoplasmic transport
1.2 Importin-α-mediated nuclear import
1.3 Computer simulation of biomolecules
1.3.1 Molecular dynamics
1.3.2 Normal mode analysis
2 OBJECTIVE
3 MANUSCRIPT I: NUCLEOPLASMIN
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.3.1 Model of study: Classical bipartite NLS
3.3.2 MD simulations
3.3.3 NM analysis
3.3.4 Data Analysis
3.4 Results
3.4.1 Selection of Impα-NplNLS model
3.4.2 Standard MD combined with NM-displacement method
3.4.3 Collective motions of Impα
3.4.4 Main contacts in Impα-NplNLS interface
3.5 Discussion
3.5.1 Bending and twisting motions may be directly related to Impα function
3.5.2 The role of the linker residues in cNLS recognition
3.5.3 NM analysis with classic MD simulations
3.6 Conclusions
3.7 Acknowledgments
3.8 Figures
3.9 Supporting Information
4 MANUSCRIPT II: KU70
4.1 Abstract
4.2 Introduction
4.3 Methods
4.3.1 In silico modeling: Ku70NLS complexed with Impα
4.3.2 Search of similar structures to Impα
4.3.3 MDeNM method procedure
4.3.4 Data analysis
4.4 Results
4.4.1 Geometric analysis of Impα and selection of NMs for MDeNM
4.4.2 NMs activation
4.4.3 Contacts in major and minor sites
4.5 Discussion
4.5.1 Ku70NLS may not bind as a classical bipartite NLS
4.5.2 The conformational changes from MDeNM were limited to Impα
4.6 Acknowledgments
4.7 Figures
4.8 Supporting Information
5 FINAL REMARKS
References

1 Introduction

1.1 Nucleocytoplasmic transport

One of the main evolutionary steps at the cellular level was the acquisition of internal compartments delimited by lipid membranes. The internalization of membranes and the formation of organelles gave cells a major evolutionary advantage. Different functions could be carried out simultaneously in distinct compartments, allowing cells to increase their robustness and complexity (FLOCH; PALANCADE; DOYE, 2013).

The cell nucleus is the organelle that distinguishes eukaryotic from prokaryotic cells. Its boundary, the nuclear membrane, allows eukaryotic cells to regulate the distinct phases of genome expression (mRNA transcription, maturation and translation) in space and time. This is possible because the membrane is an important barrier for proteins and RNA, both of which must be transported in a regulated manner (FLOCH; PALANCADE; DOYE, 2013). Besides the transfer of large amounts of constitutive nuclear proteins, such as histones, changes in gene expression generally require the controlled entry of signaling molecules into the nucleus (STEWART, 2007).

However, the nuclear contents are not completely isolated from the cytoplasm, thanks to a system called the nuclear pore complex (NPC). Macromolecules larger than approximately 40 kilodaltons (kDa) are actively transported across the nuclear envelope through NPCs. This transport uses soluble transport factors, or carrier molecules, that shuttle between the cytoplasm and the nucleus (GÖRLICH; KUTAY, 1999; FAHRENKROG; AEBI, 2003; PEMBERTON; PASCHAL, 2005).
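To relate the ~40 kDa figure to chain length, a common rule of thumb is an average residue mass of about 110 Da. The Python sketch below is a minimal illustration under that assumption; the 110 Da average is not a value from this thesis. It treats 40 kDa as an approximate reference point rather than a sharp cutoff, and the function names are hypothetical.

```python
# Rough size check for a polypeptide chain (illustrative sketch, not a thesis method).
# Assumption: ~110 Da per residue is a rule-of-thumb average; actual masses depend
# on amino acid composition, so use a proper mass calculator for real sequences.
AVG_RESIDUE_MASS_DA = 110.0
WATER_MASS_DA = 18.015         # one water accounts for the free N- and C-termini
SIZE_REFERENCE_DA = 40_000.0   # approximate ~40 kDa figure cited in the text


def approx_mass_da(n_residues: int) -> float:
    """Approximate mass (Da) of a polypeptide with n_residues residues."""
    if n_residues < 1:
        raise ValueError("a polypeptide needs at least one residue")
    return n_residues * AVG_RESIDUE_MASS_DA + WATER_MASS_DA


def above_size_reference(n_residues: int) -> bool:
    """True if the estimated mass exceeds the ~40 kDa reference value."""
    return approx_mass_da(n_residues) > SIZE_REFERENCE_DA


if __name__ == "__main__":
    for n in (100, 363, 364, 530):
        print(n, "residues:", round(approx_mass_da(n) / 1000, 1), "kDa,",
              "above reference" if above_size_reference(n) else "below reference")
```

Under this approximation, the ~40 kDa reference falls at about 364 residues. The estimate ignores sequence composition and molecular shape, so it is only a first-pass check.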
NPCs are large macromolecular complexes that form a channel through the nuclear membrane. They allow bidirectional exchange of molecules between the cytoplasm and the nucleus, with a limiting channel diameter of 25 to 30 nanometers (nm) (FELDHERR; AKIN; COHEN, 2001). NPCs are built from multiple copies of approximately 30 different proteins called nucleoporins (ROUT; WENTE, 1994; ROUT et al., 2000; CRONSHAW et al., 2002).

Protein translocation through NPCs requires the additional support of transport proteins or factors. Many of these carriers belong to the β-karyopherin (β-Kap) superfamily (also called Importin-β, Impβ). All members of the β-Kap family are built from tandem HEAT repeats (ANDRADE; BORK, 1995). Each repeat contains 40-45 amino acids that form two antiparallel α-helices connected by a loop. This repetitive structure places the β-Kaps among the solenoid proteins (KOBE; KAJAVA, 2000), a class that is prominent among proteins involved in nucleocytoplasmic transport.

An additional key component of the nuclear transport pathways is the Ran GTPase. Ran cycles between GDP- and GTP-bound states (GÖRLICH et al., 1996). The bound nucleotide state is determined by regulatory Ran proteins, including:
- Ran-GEF (guanine nucleotide exchange factor), which induces Ran-GDP to exchange GDP for GTP (BISCHOFF; PONSTINGL, 1991);
- Ran-GAP (GTPase-activating protein), which induces Ran to hydrolyze GTP (BISCHOFF et al., 1994).

Ran-GEF is located in the nucleus and Ran-GAP in the cytoplasm. The resulting Ran-GDP/GTP gradient establishes the direction of the nucleocytoplasmic transport pathways.
As a result, import receptors that bind a cargo molecule in the cytoplasm can release it in the nucleus upon binding Ran-GTP. Export receptors bind their cargo in the nucleus together with Ran-GTP and can later release it in the cytoplasm after GTP hydrolysis (GÖRLICH et al., 1996; MOROIANU; BLOBEL; RADU, 1996; REXACH; BLOBEL, 1995).

The nuclear localization of a protein is generally associated with the presence of nuclear localization sequences (NLSs). The first nuclear targeting signal discovered, and the best characterized to date, is the classical nuclear localization sequence (cNLS). It is recognized by the protein Importin-α (Impα), also called Karyopherin-α (GÖRLICH et al., 1995). Impα is also a solenoid protein, but it is built from 10 repeats, or motifs, called Armadillo (ARM) repeats (KOBE, 1999; CONTI; KURIYAN, 2000) (Figure 1). Each ARM motif is formed by three α-helices (H1, H2 and H3). A rotation about the motif axis gives the molecule an elongated, twisted structure (CONTI et al., 1998). The concave face of Impα is the cNLS binding site, and it shows high residue conservation precisely at the cNLS contact interface (MARFORI et al., 2012) (Figure 1). Impα also has a small autoinhibitory structure in its N-terminal region that occupies the cNLS-binding region. This structure is called the Impβ-binding (IBB) domain because it binds to Impβ (WEIS; RYDER; LAMOND, 1996), and this binding frees the cNLS-binding site.

Besides import signals, some proteins also have nuclear export signals (NESs) (KUTAY; GÜTTINGER, 2005; COOK et al., 2007), so they can be transported both into and out of the nucleus. Members of the β-Kap family transport NLS-containing proteins in one of two ways. They either bind the protein directly or bind it through an adaptor molecule such as Impα or Snurportin-1 (STEWART, 2007).
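The directionality argument of this section can be reduced to a toy rule. The Python sketch below is a deliberately simplified, hypothetical model and not part of this thesis. Each compartment gets the dominant Ran nucleotide state implied by where Ran-GEF (nucleus) and Ran-GAP (cytoplasm) are located. Each receptor type then holds its cargo only under the Ran state described in the text. Kinetics, Impβ/Impα adaptor steps and NPC passage are all ignored, and the helper names are illustrative.

```python
# Toy model of how the RanGTP/RanGDP gradient sets transport direction.
# Simplifying assumption: Ran is GTP-bound in the nucleus (where Ran-GEF is)
# and GDP-bound in the cytoplasm (where Ran-GAP is); kinetics are ignored.
RAN_STATE = {"nucleus": "GTP", "cytoplasm": "GDP"}


def importin_holds_cargo(compartment: str) -> bool:
    """Import receptors hold cargo where RanGTP is absent; RanGTP binding releases it."""
    return RAN_STATE[compartment] != "GTP"


def exportin_holds_cargo(compartment: str) -> bool:
    """Export receptors hold cargo together with RanGTP; GTP hydrolysis releases it."""
    return RAN_STATE[compartment] == "GTP"


def cargo_route(holds_cargo) -> tuple[str, str]:
    """Return (loading compartment, unloading compartment) for a receptor rule."""
    loading = next(c for c in RAN_STATE if holds_cargo(c))
    unloading = next(c for c in RAN_STATE if not holds_cargo(c))
    return loading, unloading


if __name__ == "__main__":
    print("import receptor:", cargo_route(importin_holds_cargo))
    print("export receptor:", cargo_route(exportin_holds_cargo))
```

The only input is the compartment-to-Ran-state map. Swapping the locations of Ran-GEF and Ran-GAP in `RAN_STATE` would reverse both routes, which is the sense in which the gradient sets the direction.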
1.2 Importin-α-mediated nuclear import

The classical nuclear import pathway (Figure 2) is the best characterized of these transport cycles, and all of its components have been identified (GÖRLICH; KUTAY, 1999; CHOOK; BLOBEL, 2001; FAHRENKROG; AEBI, 2003; PEMBERTON; PASCHAL, 2005). Nuclear transport requires a considerable number of components and interactions. Even so, it is less complex than the systems that operate in other central cellular processes, such as signaling, cell motility or vesicle transport (STEWART, 2007). Structural information is also available for most of its components and many of their complexes. Together, these features make the nuclear protein import cycle an attractive system for developing concepts that can serve as paradigms for how more complex biological systems function.

There are several nuclear protein import pathways that use different carriers. They nevertheless share many common features and rely on a series of proteins acting in concert. The transported proteins are recognized in the cytoplasm, translocated through the NPCs and released in the nucleus (GÖRLICH; KUTAY, 1999; CHOOK; BLOBEL, 2001; FAHRENKROG; AEBI, 2003; PEMBERTON; PASCHAL, 2005). In each pathway, the proteins to be transported are recognized through a signal called a nuclear localization sequence (NLS).

Figure 1 – Structure of Impα [adapted from Christie et al. (2015)]. (A) Structure of rice Impα (PDB ID 4BQK) with the H3 α-helices in green. (B) Structure of mouse Impα complexed with the Nucleoplasmin cNLS (PDB ID 3UL1) (cNLS shown as sticks). Impα is colored according to sequence conservation among the known Impα structures from different species (Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Oryza sativa, Arabidopsis thaliana and Neurospora crassa).
NLSs have been subdivided into several classes (KOSUGI et al., 2009), and some can even be recognized by components of different pathways. The classical nuclear import pathway uses Impα and Impβ as carriers. It imports a wide variety of proteins and has been studied in the most detail. Proteins with a classical NLS are imported by Impβ, which binds them through the adaptor protein Impα (GÖRLICH; KUTAY, 1999; CHOOK; BLOBEL, 2001; FAHRENKROG; AEBI, 2003; PEMBERTON; PASCHAL, 2005). Both importins are elongated molecules formed by a series of tandem repeats.

In simplified terms, the nuclear import cycle can be divided into four steps (Figure 2):
i) assembly, in the cytoplasm, of the complex between the importins and the cargo protein;
ii) translocation through the NPCs;
iii) disassembly of the complex in the nucleus;
iv) recycling of the import complex.

Figure 2 – Classical nuclear import pathway [adapted from Stewart (2007)]. The import complex is formed in the cytoplasm between cargo proteins carrying nuclear localization signals (NLSs), Impα and Impβ. After passage through the nuclear pore complex (NPC), binding of RanGTP to Impβ dissociates it from Impα. The NLS-containing protein is then released from Impα. Impα is transported back to the cytoplasm by its nuclear export factor (i.e., CAS) complexed with RanGTP. In the cytoplasm, RanGAP stimulates GTP hydrolysis, releasing the importins for another import cycle.

Structural biology studies have played an important role in deciphering the molecular events required for nuclear import. X-ray crystallography has been used extensively to elucidate the molecular details of cNLS binding to Impα. The first nuclear targeting motif was identified in the large T antigen (TAg) of simian virus 40 (SV40TAg).
This signal consists of a short stretch of basic amino acids in SV40TAg (126-PKKKRKV-132) that is essential for nuclear import: residue substitutions in this motif inhibited transport of the protein (KALDERON et al., 1984a). Later, another nuclear localization signal was identified in the Xenopus laevis protein Nucleoplasmin. This signal comprises two clusters of basic residues separated by an intervening region called the linker, which is 10 to 12 residues long (DINGWALL et al., 1988). In Nucleoplasmin (155-KRPAATKKAGQAKKKK-170), the basic clusters are KR and KKKK and the linker is PAATKKAGQA. As with SV40TAg, residue substitutions in either cluster affected nuclear transport of the protein, suggesting that both motifs were required for nuclear targeting (ROBBINS et al., 1991).

Many cNLS-containing proteins have since been identified based on their similarity to these two sequences. Consequently, cNLSs are defined by the presence of one or two clusters of basic amino acids, mainly arginine and lysine, that are necessary and sufficient for nuclear import by the Impα:Impβ complex (FONTES; TEH; KOBE, 2000). The clusters bind to two sites on Impα, often called the major site (ARM repeats 2–4) and the minor site (ARM repeats 6–8). cNLSs that bind to both sites of Impα are called bipartite (e.g. Nucleoplasmin), and those that bind to only one site are called monopartite (e.g. SV40TAg) (FONTES; TEH; KOBE, 2000). Structural analyses by X-ray crystallography have shown that monopartite cNLSs bind preferentially to the major binding site (CONTI et al., 1998; FONTES; TEH; KOBE, 2000; YANG et al., 2010).
These data were corroborated by a mutagenesis study. Mutations of major-site residues inhibited nuclear import of a monopartite cNLS, whereas mutations at the minor site had practically no effect on the import process (LEUNG et al., 2003). By contrast, as mentioned above for Nucleoplasmin, mutations of residues at either binding site substantially affected the import of bipartite cNLSs (LEUNG et al., 2003; ROBBINS et al., 1991).

For each binding site, residues at specific positions of the cNLS bind to Impα during complex formation. In the part of the cNLS that binds to the major site of Impα, the side chains of the residues at positions P2–P5 make contacts in four main pockets of the protein. The most critical contact is at P2. This position is occupied by a lysine whose side chain forms a salt bridge with the side chain of an Impα aspartic acid (HODEL; CORBETT; HODEL, 2001; COLLEDGE et al., 1986; FONTES; TEH; KOBE, 2000; FONTES et al., 2003). At the other positions (P3–P5), residues with basic side chains are also preferred, but this may not be a critical requirement. For example, one study showed that basic side chains are not strictly required at these positions. The condition is that the overall affinity of the cNLS is high enough to establish and maintain minimal contacts with Impα and keep the cNLS functional (HODEL; CORBETT; HODEL, 2001).

This affinity can be maintained through contacts in regions directly flanking the major site or, for bipartite cNLSs, through binding to the minor site (HODEL; CORBETT; HODEL, 2001). This was shown in a study in which the major-site-binding residues of the SV40TAg NLS were mutated, which impaired the nuclear distribution of the protein (MAKKERH; DINGWALL; LASKEY, 1996).
In that study, residues corresponding to those that bind the minor site were added to the N-terminal region, and this change was sufficient to direct nuclear accumulation of the protein. The key positions for minor-site binding are also occupied by basic residues, usually lysine (P1') and arginine (P2'). The arginine side chain forms a salt bridge with a glutamic acid residue of Impα (FONTES; TEH; KOBE, 2000; MARFORI et al., 2012). A comparison of binding free energies between positions of both sites shows that P1' and P2' are similar to positions P3 and P5 of the major site (HODEL; CORBETT; HODEL, 2001). In bipartite cNLSs, positions P2' and P2 are separated by the linker, an intervening region of at least 10 residues, which allows the cNLS to bind to both sites of Impα simultaneously (LAI et al., 2000).

Structural data have shown that the linker region of some bipartite cNLSs interacts with the Impα residues ImpαR315, ImpαY277 and ImpαR238 (FONTES; TEH; KOBE, 2000; FONTES et al., 2003; CONTI; KURIYAN, 2000; CHEN et al., 2005; CUTRESS et al., 2008; GIESECKE; STEWART, 2010; YANG et al., 2010; BARROS et al., 2012). However, these interactions are not conserved and appear to be specific to some cNLSs (MARFORI et al., 2012). The structural studies made it possible to determine the main contacts involved in cNLS binding to Impα (a general scheme of the contacts between Impα and a cNLS is shown in Figure 3). On this basis, consensus sequences were proposed for both types of cNLSs.
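The consensus motifs defined in the following paragraph (MARFORI et al., 2011; MARFORI et al., 2012) are regular patterns, so they can be scanned with regular expressions. The Python sketch below is a minimal, hypothetical illustration and not a tool used in this thesis. It assumes that "X" is any residue, that "(K/R)" and "(KR)" both mean "K or R", and that "X(10-12)" is a linker of 10 to 12 arbitrary residues. It is tested on the SV40TAg and Nucleoplasmin signals quoted above.

```python
import re

# cNLS consensus motifs as written in the text (MARFORI et al., 2011; 2012).
# Reading assumed here: "X" = any residue, "(K/R)" and "(KR)" = K or R,
# "X(10-12)" = a linker of 10 to 12 arbitrary residues.
MONOPARTITE = r"K[KR].[KR]"
BIPARTITE = (
    r"KR.{10,12}KRRK",
    r"KR.{10,12}K[KR][KR]",
    r"KR.{10,12}K[KR].[KR]",
)


def find_motif(seq: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every start position that matches.

    Wrapping the pattern in a lookahead reports overlapping hits; for the
    variable-length linker, the longest linker that still fits is kept per start.
    """
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def scan_cnls(seq: str) -> dict[str, list[tuple[int, str]]]:
    """Scan a protein sequence (one-letter code) for mono- and bipartite motifs."""
    bipartite_hits = sorted({hit for p in BIPARTITE for hit in find_motif(seq, p)})
    return {"monopartite": find_motif(seq, MONOPARTITE), "bipartite": bipartite_hits}


if __name__ == "__main__":
    print(scan_cnls("PKKKRKV"))           # SV40TAg residues 126-132
    print(scan_cnls("KRPAATKKAGQAKKKK"))  # Nucleoplasmin residues 155-170
```

A scan like this only flags candidate motifs, and short patterns such as K(K/R)X(K/R) also occur by chance in proteins that are not nuclear. As discussed above, whether a sequence actually works as a cNLS depends on its overall affinity for Impα.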
Monopartite cNLSs are defined as K(K/R)X(K/R). Bipartite cNLSs correspond to KRX(10-12)KRRK, KRX(10-12)K(KR)(KR) and KRX(10-12)K(K/R)X(K/R) (MARFORI et al., 2011; MARFORI et al., 2012). Here X denotes any residue. The lysine at the start of each major-site motif occupies the critical P2 position, and the N-terminal KR of the bipartite motifs binds the minor site.

Figure 3 – Schematic representation of a monopartite NLS bound to Impα at the major and minor binding sites [adapted from Christie et al. (2015)]. Conserved asparagine and tryptophan residues of Impα are shown in green and yellow, respectively. The main chain and side chains of the cNLS are shown as black and blue lines, respectively. Dashed lines indicate the common salt-bridge interactions in the P2 and P2' binding pockets.

Despite all the progress in elucidating the nuclear import mechanisms of the classical pathway, important questions remain. The functionally important motions that Impα may adopt during cNLS recognition have not yet been explored. Nor has there been a systematic analysis of the interactions in the linker region to assess how often these contacts are occupied during binding to Impα. Most structural studies of Impα have taken a static view of the system. An attractive approach to the dynamic aspects of a system is to use computational techniques that simulate biomolecules. These techniques can follow the evolution of the interacting parts of the molecule and access global dynamic behaviors of the system. Obtaining the same information experimentally would be costly, difficult or even impossible.

1.3 Computer simulation of biomolecules

1.3.1 Molecular dynamics

Over the last half century, advances in structural biology have provided atomic-resolution structures of many of the molecules essential to life, including proteins and nucleic acids (DROR et al., 2012). Static structural models determined by techniques such as X-ray crystallography are extremely useful. However, the molecules they represent are in reality highly dynamic, and their motions are often fundamental to their cellular function (DROR et al., 2012).

Proteins are involved in biological processes such as metabolism, signal transmission, energy storage, defense against invaders and muscle tissue formation. Their ability to carry out these functions depends on three factors: the conformational changes the protein can undergo, the dynamics of these conformations and the resulting interactions (HALPERIN et al., 2002). A complete understanding of protein function therefore requires an understanding of both the dynamic behavior and the static structural features of the protein (GIPSON et al., 2012).

Computer simulations of biomolecular systems have grown rapidly over recent decades. Their scope now ranges from very small proteins in vacuum to large protein complexes in the presence of solvent (KARPLUS; MCCAMMON, 2002; OROZCO, 2014). Molecular dynamics simulation is a computational tool widely used to simulate the properties of liquids and solids at the atomistic level, in fields such as chemistry and thermodynamics, among many others (YANG; WANG; CHEN, 2007). All-atom molecular dynamics simulations based on classical mechanics have made it possible to study very large protein complexes, such as the ribosome (SOTHISELVAM et al., 2014) and viral capsids (SUN et al., 2013; ZHAO et al., 2013). Hybrid QM/MM (Quantum Mechanics/Molecular Mechanics) methods have made it possible to study enzymatic activity (SENN; THIEL, 2007) and polarizable molecules in biological membranes (BERNARDI; PASCUTTI, 2012).
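In classical all-atom MD, forces derived from a potential energy function are used to integrate Newton's equations of motion step by step. The Python sketch below shows one standard integrator, velocity Verlet, applied to a single one-dimensional harmonic "bond". It works in arbitrary reduced units, an assumption chosen for clarity and not taken from this thesis. Production MD codes integrate 3N coordinates with a full force field, thermostats and femtosecond time steps.

```python
import math


def velocity_verlet(x0, v0, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = F(x) with velocity Verlet; return [(x, v), ...]."""
    x, v = x0, v0
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Toy system (reduced units, illustrative values): harmonic spring V(x) = k x^2 / 2.
K_SPRING, MASS = 1.0, 1.0


def harmonic_force(x):
    return -K_SPRING * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


if __name__ == "__main__":
    traj = velocity_verlet(1.0, 0.0, harmonic_force, MASS, dt=0.01, n_steps=1000)
    energies = [total_energy(x, v) for x, v in traj]
    drift = max(abs(e - energies[0]) for e in energies) / energies[0]
    print(f"max relative energy deviation: {drift:.2e}")
    # With k = m = 1 and x(0) = 1, v(0) = 0, the exact solution is x(t) = cos(t).
    print(f"x(t=10): numeric {traj[-1][0]:.5f}, exact {math.cos(10.0):.5f}")
```

The time step is the main design choice. It must be small compared with the fastest vibration in the system, which is one reason all-atom MD is computationally expensive, as discussed next.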
Computational methods for biomolecular simulation offer a clear advantage for understanding protein flexibility, because they can characterize the dynamics of the system. Molecular dynamics simulations model the physical interactions between atoms by solving Newton's, Lagrange's or Langevin's equations of motion (ADCOCK; MCCAMMON, 2006). Solving the dynamics of these systems is nevertheless complicated (SCHLICK et al., 1999; HENZLER-WILDMAN; KERN, 2007).

Molecular dynamics simulations are still limited in two respects: the accuracy of the force fields and the high computational cost. These limitations can lead to inadequate sampling of conformational states, which restricts our ability to analyze and reveal the functional properties of the systems studied. Biological molecules are known to have rugged energy landscapes, with many local minima that are frequently separated by high energy barriers (ONUCHIC; LUTHEY-SCHULTEN; WOLYNES, 1997). The system can therefore become trapped in a non-functional state that most conventional simulation methods cannot easily escape. Recent studies have shown that, in long simulations, proteins can indeed become "trapped" in non-relevant conformations and fail to return to the original, functionally relevant conformation (BERGONZO et al., 2013; MARSILI et al., 2010).

Several methods have been developed to overcome these limitations:
- replica-exchange molecular dynamics (REMD) (SUGITA; OKAMOTO, 1999; GARCÍA; SANBONMATSU, 2001);
- metadynamics (LAIO; PARRINELLO, 2002; BARDUCCI; BUSSI; PARRINELLO, 2008);
- generalized simulated annealing (TSALLIS; STARIOLO, 1996).
Another alternative for expanding the conformational exploration of proteins is the analytical method of normal modes of vibration.
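The sketch below makes "solving Newton's equations of motion" concrete. It integrates one particle in a harmonic potential with the leapfrog scheme, the integrator later used in the GROMACS simulations of Section 3.3.2. This is a toy model under stated assumptions: a single degree of freedom, unit mass and unit force constant, and no force field. The function name `leapfrog_harmonic` is hypothetical.

```python
def leapfrog_harmonic(x0, v0, k, m, dt, n_steps):
    """Integrate m * x'' = -k * x with the leapfrog scheme.

    Velocities are stored at half steps and positions at full steps:
        v(t + dt/2) = v(t - dt/2) + a(t) * dt
        x(t + dt)   = x(t) + v(t + dt/2) * dt
    """
    x = x0
    a0 = -k * x0 / m
    # Shift the initial velocity back by half a step, v(-dt/2).
    v_half = v0 - 0.5 * dt * a0
    xs = [x]
    for _ in range(n_steps):
        a = -k * x / m              # force from V(x) = k * x**2 / 2, divided by m
        v_half = v_half + a * dt    # kick: advance the half-step velocity
        x = x + v_half * dt         # drift: advance the position
        xs.append(x)
    return xs


# One period of a unit oscillator (2*pi ~ 628 steps of 0.01).
xs = leapfrog_harmonic(1.0, 0.0, k=1.0, m=1.0, dt=0.01, n_steps=628)
print(xs[-1])
```

After about one period the particle returns close to its starting position. The amplitude stays bounded because leapfrog is time-reversible and symplectic, which is why it tolerates long simulations.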
Every method has advantages and disadvantages for the effective simulation of biological systems. The appropriate choice depends on the particularities of each biological system of interest and on the type of information to be obtained from it.

1.3.2 Normal mode analysis

In recent decades, the number of studies based on principal component analysis of biomolecular structures has increased greatly (JOLLIFFE, 2002). These studies have proved useful for revealing the collective modes, in particular the low-frequency modes, that underlie the equilibrium dynamics of proteins (KITAO; GO, 1999). Three approaches belong to this category of methods based on principal component analysis:
- normal mode analysis of equilibrium structures (BAHAR; RADER, 2005; CUI; BAHAR, 2010);
- essential dynamics analysis of covariance matrices obtained from molecular dynamics simulations (AMADEI; LINSSEN; BERENDSEN, 1994);
- singular value decomposition of molecular dynamics or Monte Carlo trajectories (BAHAR et al., 1997; GARCÍA; HARMAN, 1996; ROMO et al., 1995).

Normal mode analysis is a powerful tool for predicting the possible motions of a given macromolecule. Normal modes of vibration are simple harmonic oscillations about a local energy minimum, characteristic of a system with structure $\vec{R}$ and energy function $V(\vec{R})$ (CUI; BAHAR, 2010). For a purely harmonic $V(\vec{R})$, any motion can be expressed exactly as a combination of normal modes (CUI; BAHAR, 2010). For an anharmonic $V(\vec{R})$, the potential near the energy minimum can still be approximated by a harmonic potential, so any small-amplitude motion can still be described well by a sum of normal modes (CUI; BAHAR, 2010).
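Here is a small worked example of the harmonic picture. Take a hypothetical linear chain of three equal masses joined by two identical springs. Its normal modes are the eigenvectors of the mass-weighted Hessian, and the square roots of the eigenvalues are the angular frequencies. The sketch assumes unit masses and a unit spring constant and uses NumPy's `eigh`. Real calculations, such as VIBRAN in CHARMM, diagonalize the full 3N-dimensional force-field Hessian instead.

```python
import numpy as np


def normal_modes(hessian, masses):
    """Return angular frequencies (ascending) and mode vectors (columns)."""
    inv_sqrt_m = 1.0 / np.sqrt(np.asarray(masses, dtype=float))
    # Mass-weighted Hessian: H_ij / sqrt(m_i * m_j)
    h_mw = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)
    eigvals, eigvecs = np.linalg.eigh(h_mw)
    # Clip tiny negative round-off values before taking the square root.
    omegas = np.sqrt(np.clip(eigvals, 0.0, None))
    return omegas, eigvecs


k = 1.0  # spring constant (assumed)
hessian = k * np.array([[1.0, -1.0, 0.0],
                        [-1.0, 2.0, -1.0],
                        [0.0, -1.0, 1.0]])
omegas, modes = normal_modes(hessian, [1.0, 1.0, 1.0])
print(omegas)  # approximately [0, 1, sqrt(3)]
```

The zero-frequency mode is the rigid translation of the chain. In three dimensions a molecule has six such rigid-body modes (three translations and three rotations), so the first non-trivial mode is usually numbered 7.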
An important application of normal modes is the identification of potential conformational changes, for example in enzymes upon ligand binding (TAMA et al., 2000; TAMA; SANEJOUAND, 2001; DELARUE; SANEJOUAND, 2002). The method has also been used recently to study:
- the opening of membrane channels (VALADIE et al., 2003);
- structural motions of the ribosome (TAMA et al., 2003);
- the maturation of virus capsids (KIM; JERNIGAN; CHIRIKJIAN, 2003);
- domain motions in large proteins in general (HINSEN, 1998; HINSEN; THOMAS; FIELD, 1999).

Normal mode analysis is most often used to predict which conformational change a protein undergoes to perform its function. It does so by analyzing the lowest-frequency modes, which have the largest amplitudes. Typically, 50% of the motions observed in a protein can be described accurately by only one or two low-frequency modes (SUHRE; SANEJOUAND, 2004).

Normal mode calculations for molecules of biological interest were introduced several years after the first molecular dynamics simulation (CUI; BAHAR, 2010). Normal modes have several important attributes that make them attractive as a complement to molecular dynamics simulations. Their main drawback is precisely the approximation of the total potential by a harmonic function around a minimum-energy structure (CUI; BAHAR, 2010).
Despite this harmonic approximation, the technique has important advantages that explain its use in studies of macromolecular dynamics (CUI; BAHAR, 2010):
(a) unlike molecular dynamics simulations, the results are essentially exact and free of statistical error;
(b) only the diagonalization of a matrix is required;
(c) quantization, which is particularly important for specific heat and entropy calculations, can be introduced in a simple way;
(d) the information on conformational changes is often easier to visualize than in molecular dynamics simulations, mainly because larger-amplitude motions can be observed.

Several examples show functionally important transition motions that frequently follow the trajectories of one or more normal modes (MA, 2005). An important conclusion of these studies is that protein structures have evolved so that the intrinsic structural flexibility observed in the normal modes facilitates functionally important conformational changes. Combining normal mode calculations with short molecular dynamics simulations has enabled broader conformational exploration together with refinement of the generated structures. Recently, a molecular dynamics protocol with excited normal modes was developed (COSTA et al., 2015) that can explore the free-energy profiles of large conformational changes.

2 Objective

The objective of this study was to understand the interaction mechanisms and the conformational dynamics involved in the recognition of classical nuclear localization sequences by Importin-α.

3 Manuscript I: Nucleoplasmin

This chapter presents the structural study of Impα in complex with the NLS of nucleoplasmin.
Our work had two aims: to identify the main motions of Impα possibly associated with NLS recognition, and to evaluate the dynamic behavior of the interactions at the complex interface. To this end, we combined molecular dynamics and normal mode techniques. Our data show that the bending and twisting motions of Impα may be associated with NLS recognition. We also identified the critical contacts during this recognition. They suggest that residues in the so-called linker region may increase the affinity for the NLS and may therefore be directly involved in the nuclear import process. The work was published in PLOS ONE (doi: 10.1371/journal.pone.0157162).

Note: As established by the Council of the Graduate Program in Biological Sciences (Genetics) in Normative Instruction No. 01/2012, the results presented in this chapter are written in manuscript form.

http://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0157162

Bending-Twisting Motions and Main Interactions in Nucleoplasmin Nuclear Import

Marcos Tadeu Geraldo1,*, Agnes Alessandra Sekijima Takeda1,2, Antônio Sérgio Kimus Braz3, Ney Lemke1

1 Laboratório de Bioinformática e Biofísica Computacional, Departamento de Física e Biofísica, Instituto de Biociências de Botucatu, UNESP – Universidade Estadual Paulista, Botucatu, SP, 18618-970, Brazil
2 Instituto de Biotecnologia (IBTEC), UNESP – Universidade Estadual Paulista, Botucatu, SP, 18607-440, Brazil
3 Laboratório de Biologia Computacional e Bioinformática, Centro de Ciências Naturais e Humanas, UFABC – Universidade Federal do ABC, Santo André, SP, 09210-170, Brazil
* mtgeraldo@gmail.com

3.1 Abstract

Alpha solenoid proteins play a key role in regulating the classical nuclear import pathway by recognizing a target protein and transporting it into the nucleus.
Importin-α (Impα) is the solenoid responsible for cargo protein recognition, and it has been studied extensively by X-ray crystallography to understand its binding specificity. We wanted to describe the main motions of Impα and extend what is known about the critical interactions during carrier-cargo recognition. To do so, we surveyed different conformational states with molecular dynamics (MD) and normal mode (NM) analyses. Our model system was a crystallographic structure of Impα complexed with the classical nuclear localization sequence (cNLS) of nucleoplasmin (Npl), which was subjected to multiple 100 ns MD simulations. Representative conformations were selected to calculate the 87 lowest-frequency NMs of vibration, and a displacement approach was applied along each NM. The main motions of Impα were described with geometric criteria, using the radius of curvature and inter-repeat angles as reference metrics. We also determined the salt bridges, hydrogen bonds and hydrophobic interactions at the Impα-NplNLS interface. Our results show bending and twisting motions participating in the recognition of nuclear proteins, allowing the accommodation and adjustment of a classical bipartite NLS. The contacts essential for nuclear import were also described and mostly agreed with previous studies. They suggest that residues in the cNLS linker region establish important contacts with Impα that adjust the cNLS backbone. MD simulations combined with NM analysis can be applied to the Impα-NLS system to help understand the interactions between Impα and cNLSs and to analyze non-classical NLSs.

3.2 Introduction

Solenoid proteins are composed of structural motifs arranged in tandem, which creates a superhelical architecture.
This modular architecture establishes folding and binding contacts that differ from those of globular proteins. It allows greater flexibility and, through the formation of diverse interfaces, the arrangement of cooperative protein-protein interactions (KOBE; KAJAVA, 2000; KAPPEL et al., 2010; DOYLE et al., 2015; SETTANNI et al., 2013).

A remarkable feature of nuclear protein import regulation is the contribution of solenoid proteins to recognizing a target protein and transporting it into the nucleus (CHRISTIE et al., 2015). One of the most studied pathways is the classical nuclear import pathway. It requires the interaction between the solenoid proteins Importin-α (Impα) and Importin-β (Impβ), followed by the binding of the cargo protein to Impα (CONTI et al., 1998; RADU; BLOBEL; MOORE, 1995; CINGOLANI et al., 1999; GÖRLICH et al., 1995; GÖRLICH et al., 1996). This trimeric complex is translocated through the nuclear pore complex (NPC). The cargo protein is then delivered into the cell nucleus through Ran-GTP-dependent steps of protein-protein interactions (MOROIANU; BLOBEL; RADU, 1996; LEE et al., 2005). The dissociation of the cargo protein from Impα is catalyzed by nucleoporins such as NUP50 (MATSUURA; STEWART, 2005). Impα then binds to its export factor CAS complexed with RanGTP (MATSUURA; STEWART, 2004). A final step is the recycling of both Impα and Impβ back to the cytoplasm.

Impα is composed of ten tandem armadillo (ARM) repeats arranged in an elongated, curved and twisted shape (CONTI et al., 1998; RIGGLEMAN; WIESCHAUS; SCHEDL, 1989; GÖRLICH et al., 1996). Each ARM motif consists of three α-helices, and the tandem repeats pack into a superhelical architecture (CONTI et al., 1998; RIGGLEMAN; WIESCHAUS; SCHEDL, 1989; KOBE, 1999).
The curved shape defines convex and concave surfaces. The inner concave surface in particular harbors conserved residues that mediate cargo protein binding (CONTI et al., 1998; MARFORI et al., 2012).

Cargo protein transport depends on the recognition by Impα of a specific signal sequence called the nuclear localization sequence (NLS). For classical NLSs (cNLSs), binding to Impα is mediated primarily by clusters of positively charged residues: one cluster in monopartite cNLSs and two in bipartite cNLSs (DINGWALL et al., 1988; KALDERON et al., 1984b; ROBBINS et al., 1991; FONTES et al., 2003; KOSUGI et al., 2009). The two clusters of a bipartite cNLS are separated by 10-12 variable residues, called the linker region.

The inner concave surface of Impα can receive either monopartite or bipartite cNLSs, and its specific binding sites are known as the major and minor sites. Structural and biophysical studies have associated important positions of Impα with NLS binding (FONTES et al., 2003; HODEL; CORBETT; HODEL, 2001):
- positions P2-P5 in the major site (ARMs 2-4), which bind both monopartite and bipartite cNLSs;
- positions P1'-P2' in the minor site (ARMs 6-8), which bind preferentially to bipartite cNLSs.
The region between the major and minor sites (ARMs 4-6) also provides interactions and is considered fundamental for cNLS recognition (FONTES et al., 2003; FONTES; TEH; KOBE, 2000; CONTI; KURIYAN, 2000; CHEN et al., 2005; CUTRESS et al., 2008; GIESECKE; STEWART, 2010; YANG et al., 2010; BARROS et al., 2012).
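The bipartite arrangement just described, two basic clusters separated by a 10-12 residue linker, can be written as regular expressions. The sketch below encodes the consensus patterns of Marfori et al. (2011; 2012) cited in the Introduction and scans them against the Npl segment 151-172 quoted in this chapter. It is a sequence-pattern illustration only, since matching a consensus does not imply binding. The pattern dictionary and the function `scan_cnls` are hypothetical.

```python
import re

# Consensus motifs (X = any residue, written as "." in regular expressions).
CNLS_PATTERNS = {
    "monopartite K(K/R)X(K/R)": r"K[KR].[KR]",
    "bipartite KRX10-12KRRK": r"KR.{10,12}KRRK",
    "bipartite KRX10-12K(K/R)(K/R)": r"KR.{10,12}K[KR][KR]",
    "bipartite KRX10-12K(K/R)X(K/R)": r"KR.{10,12}K[KR].[KR]",
}


def scan_cnls(sequence, first_residue=1):
    """Return {pattern name: [residue numbers where a match starts]}."""
    hits = {}
    for name, pattern in CNLS_PATTERNS.items():
        # A zero-width lookahead lets overlapping matches be reported.
        hits[name] = [m.start() + first_residue
                      for m in re.finditer(f"(?={pattern})", sequence)]
    return hits


npl = "GSAVKRPAATKKAGQAKKKKLD"  # Npl residues 151-172
for name, starts in scan_cnls(npl, first_residue=151).items():
    print(name, starts)
```

The bipartite patterns start at K155, the residue that the chapter places in P1' of the minor site. The C-terminal cluster starting at K167 also satisfies the monopartite consensus.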
Few studies have reported on the flexibility and structural integrity of Impβ (KAPPEL et al., 2010; FORWOOD et al., 2010). Computational simulation approaches, however, remain underexplored for understanding the large-scale structural motions of Impα and the interaction basis of nuclear import mediated by Impα binding. Two main questions therefore arose:
(i) Are there motions related to NLS recognition?
(ii) What is the dynamic behavior of the interactions at the complex interface?
Motivated by these questions, we combined molecular dynamics (MD) simulations and normal mode (NM) analysis. As our cNLS model, we used the crystallographic structure of the nucleoplasmin NLS (NplNLS) complexed with Impα. We chose it because experimental studies involving Npl (MARFORI et al., 2012; FONTES; TEH; KOBE, 2000; MAKKERH; DINGWALL; LASKEY, 1996) are available to support and contrast with our simulation results. Npl was the first protein described as a molecular chaperone, and it is involved in chromatin reprogramming (DINGWALL; LASKEY, 1990). It contains a bipartite cNLS, Npl: 151GSAVKRPAATKKAGQAKKKKLD172, whose basic clusters K155-R156 and K167-K170 contact the Impα minor and major binding sites, respectively. We found that major bending- and twisting-like movements of Impα may influence NLS binding. We also confirmed that carrier-cargo recognition depends on contacts in the major and minor sites and on contacts flanking these sites, including the linker region.

3.3 Materials and Methods

3.3.1 Model of study: Classical bipartite NLS

All simulation analyses were conducted with the crystallographic structure of NplNLS complexed with mouse Impα isoform 2 (Impα-NplNLS; PDB ID: 3UL1) (MARFORI et al., 2012). The missing atoms of residue G152 of NplNLS (151GSAVKRPAATKKAGQAKKKKLD172) and of residue S342 of Impα (ImpαΔIBB, residues 72-497) were modeled with MODELLER v9.11 (ESWAR et al., 2006). The N-terminal Impβ-binding (IBB, residues 1 to 71) domain of Impα was kept truncated because it competes for the NLS-binding region. The best model was chosen on two criteria: correct stereochemistry, assessed with the MolProbity server (CHEN et al., 2009), and the interactions at the complex interface, assessed with the PISA server (KRISSINEL; HENRICK, 2007). An additional Impα model was built with the peptide removed (Apo Impα) to compare the motions and flexibility of Impα in the presence and absence of the cNLS.

3.3.2 MD simulations

The topology and parameter files were generated with GROMACS v4.5.3 (HESS, 2008) using the CHARMM36 force field (HUANG; MACKERELL, 2013). Standard protonation states were kept because the PROPKA web server (http://propka.org) indicated no protonation changes. A cubic box with 56,568 explicit TIP3P water molecules (JORGENSEN et al., 1983; MACKERELL et al., 1998) was generated, leaving at least 10 ångströms (Å) between the protein system and each edge of the box. Counter ions replaced water molecules to neutralize the system charge. The system then underwent a gradual energy minimization in four stages:
(i) 500 steps of steepest descent minimization with the protein and peptide restrained, so that the solvent molecules could relax;
(ii) 50,000 steps of steepest descent minimization with the main chains of the protein and peptide restrained;
(iii) 50,000 steps of unrestrained steepest descent minimization;
(iv) 50,000 steps of unrestrained minimization by the conjugate gradient (CG) method.
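The following is a minimal sketch of steepest-descent minimization. It assumes an adaptive step-size rule like the one described in the GROMACS reference manual: grow the step by 1.2 after an accepted move and shrink it by 0.2 after a rejected one. The two-dimensional quadratic potential and the function names are hypothetical stand-ins for a force-field energy. The restraints and the final conjugate-gradient stage of the protocol are not reproduced.

```python
def steepest_descent(energy, gradient, x, step=0.01, f_tol=1e-6, max_steps=10_000):
    """Adaptive-step steepest descent (assumed GROMACS-like step rule).

    Each trial moves the coordinates by `step` along the force, normalized
    by the largest force component. Lower energy: accept and grow the step.
    Higher energy: reject and shrink the step.
    """
    e = energy(x)
    for _ in range(max_steps):
        force = [-g for g in gradient(x)]
        f_max = max(abs(f) for f in force)
        if f_max < f_tol:          # converged: largest force below tolerance
            break
        trial = [xi + step * fi / f_max for xi, fi in zip(x, force)]
        e_trial = energy(trial)
        if e_trial < e:
            x, e = trial, e_trial
            step *= 1.2
        else:
            step *= 0.2
    return x, e


def energy(p):
    """Toy anisotropic quadratic with its minimum at (1, -2)."""
    x, y = p
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2


def gradient(p):
    x, y = p
    return [2.0 * (x - 1.0), 20.0 * (y + 2.0)]


x_min, e_min = steepest_descent(energy, gradient, [5.0, 3.0])
print(x_min, e_min)
```

Steepest descent is robust far from the minimum but slow in narrow valleys. This is why the protocol switches to conjugate gradients in its last stage.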
Equilibration and unrestrained MD simulations were performed with periodic boundary conditions. The leapfrog integrator was used to integrate Newton's equations of motion. The LINear Constraint Solver (LINCS) method (HESS et al., 1997; HESS, 2008) constrained the bonds involving hydrogen atoms, allowing an integration time step of 2 femtoseconds (fs). The cutoff distance for short-range electrostatic and van der Waals interactions was 10 Å. The Particle Mesh Ewald (PME) method (ESSMANN et al., 1995) was used for long-range electrostatics.

The system was equilibrated in two stages, both with position restraints on the heavy atoms of the protein:
- Stage 1 used NVT conditions (constant number of particles, volume and temperature). The system was heated to the target temperature of 300 kelvin (K) and simulated at this temperature for 100 picoseconds (ps) with a velocity-rescaling (V-rescale) thermostat (BUSSI; DONADIO; PARRINELLO, 2007).
- Stage 2 consisted of 100 ps of equilibration under NPT conditions (constant number of particles, pressure and temperature). The Parrinello-Rahman barostat (PARRINELLO; RAHMAN, 1981) kept the pressure close to 1 bar.
After equilibration, the restraints were removed and three MD simulations of 100 nanoseconds (ns) each were run, with structures saved every 10 ps.

To assess the convergence of the conformational sampling, we applied the clustering technique proposed by Lyman and Zuckerman (LYMAN; ZUCKERMAN, 2006). The following steps generated a collection of reference structures {Si} from the simulations: (1) A cutoff root-mean-square deviation (RMSD) d was defined. (2) All simulations were merged into a single trajectory file.
(3) One structure was drawn at random from the trajectory file and designated as the reference structure. (4) All structures were compared with this reference structure, and those with an RMSD below d were removed from the trajectory file. (5) Steps 3 and 4 were repeated until all structures had been removed, which generated a collection of reference structures {Si}. (6) Each frame of the trajectory file was assigned to the nearest reference structure in {Si}, and the frequency of structures in each cluster was calculated. Convergence is indicated when each reference structure is represented with a similar frequency in every simulation. The metric used for d was the global RMSD of carbon-α (Cα) atoms. The value of d (2.5 Å) was chosen to yield a manageable number of reference structures for the subsequent NM analysis.

3.3.3 NM analysis

Each reference structure obtained from the MD simulations was minimized in GROMACS with explicit solvent, using the same protocol described above. All NM analyses were carried out in CHARMM v36b1 (BROOKS et al., 2009). The topology and parameter files required by CHARMM were generated with the CHARMM-GUI server (www.charmm-gui.org). An additional energy minimization was then performed in three stages:
- CG minimization with harmonic constraints progressively decreased from 250 to 5 kcal mol⁻¹ Å⁻², with 100 minimization steps at each decrease;
- 10,000 unconstrained CG steps;
- 300,000 unconstrained steps of the adopted basis Newton-Raphson (ABNR) algorithm.

The final minimized structure was used to calculate the 87 lowest-frequency NMs of each reference structure with the VIBRAN module of CHARMM.
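Steps (1)-(6) of the Lyman and Zuckerman procedure described above can be sketched generically. The sketch makes some stated assumptions. Toy one-dimensional "structures" stand in for coordinate sets, and the absolute difference stands in for the Cα RMSD. The function names and the data are hypothetical and not taken from the study.

```python
import random
from collections import Counter


def pick_references(frames, distance, cutoff, seed=0):
    """Steps (3)-(5): draw a random reference, discard every frame closer
    than `cutoff` to it, and repeat until no frames remain."""
    rng = random.Random(seed)
    remaining = list(range(len(frames)))
    references = []
    while remaining:
        ref = rng.choice(remaining)
        references.append(ref)
        # The reference itself is removed too (distance 0 < cutoff).
        remaining = [i for i in remaining
                     if distance(frames[i], frames[ref]) >= cutoff]
    return references


def cluster_populations(frames, sim_ids, references, distance):
    """Step (6): assign each frame to its nearest reference and return,
    for each simulation, the fraction of its frames in each cluster."""
    labels = [min(references, key=lambda r: distance(f, frames[r]))
              for f in frames]
    populations = {}
    for sim in sorted(set(sim_ids)):
        own = [lab for lab, s in zip(labels, sim_ids) if s == sim]
        counts = Counter(own)
        populations[sim] = {r: counts[r] / len(own) for r in references}
    return populations


def abs_distance(a, b):
    """Stand-in for the Cα RMSD between two structures."""
    return abs(a - b)


frames = [0.0, 0.1, 0.2, 5.0, 5.1,      # simulation A
          0.05, 0.15, 5.05, 5.2, 4.9]   # simulation B
sim_ids = ["A"] * 5 + ["B"] * 5
refs = pick_references(frames, abs_distance, cutoff=1.0)
pops = cluster_populations(frames, sim_ids, refs, abs_distance)
print(refs, pops)
```

Comparing the per-simulation fractions for each reference is the convergence check described in the text. Similar fractions across simulations suggest that they sample the same states.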
An NM-displacement method, using the VMOD facility of CHARMM, generated structures along each NM through short MD simulations at low temperature (30 K) followed by energy minimization. The displacements were measured as mass-weighted root mean square (MRMS) values. The maximum displacement was set to 3 Å in each direction of the NM with a 0.1 Å projection step, giving 61 structures per NM. For each MRMS step, a harmonic restraint was applied to the Cα atoms with a force constant increasing from 1,000 to 10,000 kcal mol⁻¹ Å⁻². A 1 ps MD simulation was run at each force-constant value, for a total of 10 ps of simulation. With the restraints still in place, 1,000 steps of CG energy minimization produced the final structure. For each projection step, the total restraint energy computed with the miscellaneous mean field potential (MMFP) facility of CHARMM was used as a criterion for discarding unfavorable conformations.

3.3.4 Data Analysis

PCA over MD simulations

Principal component analysis (PCA) of the MD simulations was performed with the quasi-harmonic routine (QUASI) of the VIBRAN module of CHARMM. It yielded the covariance matrix of Cα atomic displacements from the trajectories, from which the most relevant structural variations were identified. The calculation followed Floquet et al. (2015), and the resulting principal components (PCs) were compared with the motions observed in the NMs.

Collectivity

The degree of collectivity measures how many atoms take part in the motion of a given NM. It was calculated according to Brüschweiler (1995) and Tama and Sanejouand (2001) with an in-house CHARMM script. The degree of collectivity ranges from 0 to 1, and values close to 1 indicate maximum collectivity.
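The degree of collectivity follows Brüschweiler (1995): κ = exp(−Σᵢ uᵢ² ln uᵢ²)/N, where uᵢ² is the squared displacement of atom i normalized so that the uᵢ² sum to 1. A minimal Python version is shown below. It assumes equal atomic weights, for example Cα-only mode vectors. Any mass weighting applied in the authors' in-house CHARMM script is not reproduced, and the function name is hypothetical.

```python
import math


def collectivity(displacements):
    """Degree of collectivity of one mode (Brüschweiler, 1995).

    displacements: list of (dx, dy, dz) per atom. Equal atomic weights are
    assumed. Returns a value in (0, 1]: 1/N if a single atom moves, 1 if all
    atoms move equally.
    """
    sq = [dx * dx + dy * dy + dz * dz for dx, dy, dz in displacements]
    total = sum(sq)
    u2 = [s / total for s in sq]                    # normalized so sum(u2) == 1
    entropy = -sum(p * math.log(p) for p in u2 if p > 0.0)
    return math.exp(entropy) / len(displacements)


uniform = collectivity([(1.0, 0.0, 0.0)] * 4)                        # all atoms move
localized = collectivity([(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3)  # one atom moves
print(uniform, localized)
```

The two limiting cases show how the measure behaves: a mode that moves every atom equally scores 1, and a mode confined to a single atom scores 1/N.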
Geometric analysis of Impα

The large-scale motions of Impα were described with geometric measurements. For bending motions, three vertices were selected manually: two at the distal tips of the protein and one at its middle (Impα residues R117, A313 and K486). The plane formed by these three vertices corresponds to the plane of the observed bending motion (Fig 1). The radius of curvature (R) was then calculated as

$$R = \frac{d^{2}}{2\sqrt{d^{2} - m^{2}}} \qquad (3.1)$$

where d is the distance from each distal vertex to the middle vertex, and m is half the distance between the two distal vertices.

For twisting motions, the angles between α-helices 3 (H3) of neighboring ARM repeats were calculated with an available PyMOL script (http://www.pymolwiki.org/index.php/AngleBetweenHelices). For each pair of helices, vectors were defined along the Cα atoms, and the angles between these vectors were determined.

The same geometric analysis was applied to X-ray structures of Impα complexed with different types of bipartite NLSs. These structures were retrieved with the Basic Local Alignment Search Tool (BLAST) available in ProDy (BAKAN; MEIRELES; BAHAR, 2011), using the Impα-NplNLS model as the query. Only structures with 100% sequence identity to Impα were compared with the simulation data.

Maps of cross-correlations

The MD and NM results were first compared through cross-correlation maps, which evaluate the coupled movements of NplNLS and Impα residues. For MD, the trajectories of all simulations were concatenated into a single pseudo-trajectory before computing the correlations. All structures from the NM displacements were likewise combined into one pseudo-trajectory.
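Eq. (3.1) is the circumradius of an isosceles triangle with two sides of length d and base 2m. The sketch below computes it from three vertex coordinates and adds a simple angle between two helix direction vectors. It rests on two assumptions. First, d is taken as the mean of the two tip-to-middle distances, since these need not be exactly equal in real structures. Second, the helix direction is approximated by the vector from the first to the last Cα, which is cruder than the PyMOL script used in the study. All function names are hypothetical.

```python
import math


def radius_of_curvature(tip_a, middle, tip_b):
    """Eq. (3.1): R = d**2 / (2 * sqrt(d**2 - m**2)).

    d: mean distance from the two distal vertices to the middle vertex.
    m: half the distance between the distal vertices.
    Collinear vertices (d == m) would mean an infinite radius (no bending).
    """
    d = 0.5 * (math.dist(tip_a, middle) + math.dist(tip_b, middle))
    m = 0.5 * math.dist(tip_a, tip_b)
    return d * d / (2.0 * math.sqrt(d * d - m * m))


def helix_axis(ca_coords):
    """Crude helix direction: vector from the first to the last Cα."""
    return [b - a for a, b in zip(ca_coords[0], ca_coords[-1])]


def angle_between(u, v):
    """Angle in degrees between two direction vectors."""
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


# Three points on a circle of radius 10 Å (tips at +/-60°, middle at 0°).
tip_a = (10 * math.cos(math.radians(60)), 10 * math.sin(math.radians(60)), 0.0)
tip_b = (tip_a[0], -tip_a[1], 0.0)
R = radius_of_curvature(tip_a, (10.0, 0.0, 0.0), tip_b)
print(R)  # recovers the 10 Å radius
```

In this convention, a smaller R means a more closed (more strongly bent) solenoid.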
The cross-correlation calculations were performed in Wordom (SEEBER et al., 2007).

Interactions evaluation

We evaluated the occurrence of specific interactions at the Impα-NplNLS interface. Salt bridges and hydrogen bonds were identified with VMD (HUMPHREY; DALKE; SCHULTEN, 1996), using two criteria:
- a donor-acceptor distance ≤ 3.5 Å, for both salt bridges and hydrogen bonds;
- for hydrogen bonds only, a deviation of the donor-hydrogen-acceptor angle ≤ 60 degrees.
Hydrophobic contacts were first identified with LIGPLOT, which generated a list of possible interactions from the crystallographic and in silico Impα-NplNLS models. For each residue pair in this list, we then calculated the distance between the closest carbon atoms of the hydrophobic side chains over the MD and NM ensemble pseudo-trajectories. The percentage occurrence of each hydrophobic contact was computed with a distance criterion of ≤ 4 Å.

Complementary analysis

The backbone RMSD and the Cα root-mean-square fluctuations (RMSF) of the MD and NM-displacement ensembles were calculated in Wordom (SEEBER et al., 2007). The NplNLS peptides were structurally aligned in the Crystallographic Object-Oriented Toolkit (Coot) (EMSLEY et al., 2010). All plots were produced with R (IHAKA; GENTLEMAN, 1996). Structural analysis, visualization and figure generation were performed in PyMOL (SCHRÖDINGER, 2010).

3.4 Results

3.4.1 Selection of Impα-NplNLS model

A stereochemical analysis allowed the selection of the best complex model, taking into account the geometry and the maintenance of the main interactions at the peptide-protein interface.
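The cross-correlation maps were computed in Wordom. The NumPy sketch below uses the standard normalized dynamic cross-correlation, C_ij = ⟨Δrᵢ·Δrⱼ⟩ / sqrt(⟨Δrᵢ²⟩⟨Δrⱼ²⟩). It assumes a pseudo-trajectory that is already superposed on a common reference, and Wordom's implementation details may differ. The function name and the toy trajectory are hypothetical.

```python
import numpy as np


def cross_correlation_map(traj):
    """Normalized cross-correlation between atomic fluctuations.

    traj: array of shape (n_frames, n_atoms, 3), already superposed.
    Returns an (n_atoms, n_atoms) matrix with values in [-1, 1]. Atoms with
    zero fluctuation would give a division by zero and must be excluded.
    """
    traj = np.asarray(traj, dtype=float)
    disp = traj - traj.mean(axis=0)                          # dr_i(t) = r_i(t) - <r_i>
    cov = np.einsum("tik,tjk->ij", disp, disp) / len(traj)   # <dr_i . dr_j>
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)


# Toy pseudo-trajectory: atoms 0 and 1 move together, atom 2 moves opposite.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
traj = np.zeros((t.size, 3, 3))
traj[:, 0, 0] = np.sin(t)
traj[:, 1, 0] = 2.0 + np.sin(t)
traj[:, 2, 0] = 4.0 - np.sin(t)
c = cross_correlation_map(traj)
print(np.round(c, 3))
```

Positive blocks in such a map, like the areas reported for the NplNLS residues in the binding sites, mark residues that move in phase. Negative values mark anti-correlated motion.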
In the Ramachandran plot of the selected model, 96.7% of Impα residues were in favored regions, 3.3% were in allowed regions and none were outliers. All NplNLS residues were in favored regions.

An overall evaluation of the Impα-NplNLS model showed NplNLS occupying both the major and minor binding sites of Impα, with the main residues inside the binding pockets (Fig 2). The model also reproduced the previously reported interactions at the interface of this complex (MARFORI et al., 2012; FONTES; TEH; KOBE, 2000). The only major exception was the missing contact between ImpαD325 and NplK155 in the minor site (S1 Fig).

3.4.2 Standard MD combined with NM-displacement method

The trajectories of the three Impα-NplNLS MD simulations were clustered into three reference structures (67,730 ps, 207,080 ps and 274,970 ps). Each showed a similar frame frequency across the simulations, indicating that the MD simulations had likely converged (S2 Fig). Reference structure 207,080 ps was the most representative, with approximately 80% of the frames clustered with it. It was also the most similar to the X-ray NplNLS structure, judged by the backbone RMSD values (S1 Table) from the structural alignment of the NLSs (S3 Fig). In this alignment, the important positions of the major and minor sites were occupied by the expected residues with similar side-chain conformations:
- NplK167, NplK168 and NplK170 in positions P2, P3 and P5, respectively;
- NplK155 and NplR156 in P1' and P2', respectively.
Reference structures 67,730 ps and 274,970 ps showed greater side-chain variance in the major site than in the minor site.

Next, the NM-displacement method was applied to the three MD reference structures.
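The backbone RMSD values above are computed after optimal superposition of the structures. A minimal NumPy sketch of this step with the Kabsch algorithm follows, including the standard correction that avoids improper rotations. It is an illustrative implementation, not the Wordom or Coot code used in the study. The function name and the toy coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                    # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                             # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # -1 would signal a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # optimal proper rotation
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]])
rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90° about z
p = q @ rz.T + np.array([5.0, -3.0, 2.0])   # rotated and translated copy
print(kabsch_rmsd(p, q))  # ~0: a rigid-body copy superposes exactly
```

Removing rigid-body motion first ensures that the RMSD reflects internal conformational differences only.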
The first NMs of Impα-NplNLS (modes 7-20) showed flexible and favorable motions, with lower restraint energy along the whole displacement range than the remaining modes (blue areas in S4 Fig). Apo Impα showed similar patterns, but its favorable motions extended up to NM29 (S5 Fig). This indicates that Apo Impα can favorably adopt more conformations than Impα bound to NplNLS.

The backbone RMSD distributions showed that combining standard MD with the NM-displacement method increased conformational exploration. The combination reached values over 5 Å, particularly for reference structures 207,080 ps and 274,970 ps, whereas MD alone reached only approximately 2.5 Å (S6 Fig). The RMSF values also showed greater Cα fluctuations with the NM-displacement technique (S7 Fig and S8 Fig). The RMSF values of Impα-NplNLS and Apo Impα differed only slightly. The differences were mostly limited to the major-site region (residues 100 to 200), where Apo Impα was more flexible (S9 Fig).

3.4.3 Collective motions of Impα

The MD and NM motions were described by combining a qualitative vector-based analysis with a quantitative geometric analysis. The first NMs showed large, collective motions of the ARM repeats of Impα, mainly along modes 7-17 (S10 Fig). A qualitative analysis of the vectors of modes 7 and 9 clearly showed a bending and a twisting motion, respectively (Fig 3, S1 Movie and S3 Movie).

The bending motions in NM7/PC1 consisted of the opening and closing of Impα along its concave surface, where the NLS binding pockets are located.
The quantitative analysis using geometric measurements determined the radius of curvature in the opened and closed configurations of Impα (Fig 4A, S12 Fig and S1 Movie) and showed different amplitudes for the bending motion, with Apo Impα showing higher amplitudes than the Impα-NplNLS complex.

The twist pattern of the Impα structures along NM9/PC3 (Fig 3, S12 Fig and S3 Movie) oscillated between maximum and minimum values of torsion over the entire protein (Fig 4B). Overall, the angles for the Impα-NplNLS complex and Apo Impα were similar. A more detailed analysis of the inter-repeat angles (Fig 5) detected oscillations between the ARMs in the Impα-NplNLS complex, primarily between ARMs 5-6. Movements with smaller amplitudes were observed for most pairs of ARMs in Apo Impα, except for the oscillation observed between ARMs 6-7.

Bending and twisting movements were also evaluated for the crystal structures of Impα in the presence of different types of bipartite cNLSs. A comparison of geometries indicated small differences among them, on the order of tenths of ångströms (S5 Table). The qualitative vector analysis of Impα-NplNLS NM8/PC2 indicated a combination of two motions: a “lateral” bending tendency (based on a 90º rotation of Impα about the X-axis relative to the bending orientation in NM7/PC1) in ARMs 1-6, mixed with twisting in ARMs 7-10 (Fig 3, S12 Fig and S2 Movie). The amplitude of the inter-repeat angle variation was slightly different between NM8 and PC2; however, in both cases, we observed motions that nearly reached both sites, including their intermediate region (Fig 5). The complexity of these mixed motions increased in the subsequent modes (S11 Fig and S13 Fig). The same analysis for Apo Impα NM8 showed reduced movement amplitude for most ARMs.
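Fig 1A defines the radius of curvature from a chord of length 2m and a distance d. Assuming that 2m is the distance between the two distal (N- and C-terminal) atoms and that d is the sagitta, the perpendicular distance from the central atom to the midpoint of that chord, the standard circle relation gives R = (m² + d²)/(2d). The sketch below, with hypothetical function names, computes this and, as a cross-check that does not depend on the sagitta assumption, the radius of the circle through the three atoms, R = abc/(4·area):

```python
import numpy as np


def radius_from_sagitta(m, d):
    """Radius of a circular arc from its half-chord m and sagitta d."""
    return (m ** 2 + d ** 2) / (2.0 * d)


def circumradius(a, b, c):
    """Radius of the circle through three 3D points (e.g. N-term, central, C-term atoms)."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    ab = np.linalg.norm(b - a)
    bc = np.linalg.norm(c - b)
    ca = np.linalg.norm(a - c)
    area = 0.5 * np.linalg.norm(np.cross(b - a, c - a))
    if area < 1e-12:
        return float("inf")  # collinear atoms: a straight line has no finite radius
    return float(ab * bc * ca / (4.0 * area))
```

When the central atom lies on the perpendicular bisector of the chord, both functions return the same value; a larger R corresponds to a flatter, more open solenoid.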
3.4.4 Main contacts in Impα-NplNLS interface

In general, the cross-correlation results from MD and NM analyses were similar (Fig 6). Well-defined areas of positive correlation could be identified, highlighting the range of NplNLS residues in contact with the Impα major and minor sites and the linker region. However, for standard MD, the correlations were more scattered in the linker.

Subsequent analysis showed that most correlations could be related to specific interactions at the Impα-NplNLS interface. Compared with our simulation data, the starting structure exhibited more hydrogen bonds and hydrophobic contacts (S1 Fig). Throughout the simulations, some of those initial contacts were lost, and MD and NM showed mostly common results (Fig 7). In particular, we observed salt bridges (ImpαE396, ImpαD280, ImpαD192 and ImpαD270) established along the NplNLS, specifically in P2’, P2 and the linker region. Hydrophobic contacts mediated by tryptophans (ImpαW399, ImpαW357, ImpαW273, ImpαW231, ImpαW184 and ImpαW142) occurred mostly in the major and minor sites, specifically in P2’, P3 and P5. As expected, a large number of hydrogen bonds were established along the NplNLS, involving ImpαS360, ImpαN361, ImpαG323, ImpαV321, ImpαR315, ImpαY277, ImpαR238, ImpαA148, ImpαG150 and ImpαN188. Most of the aforementioned interactions occurred with side chains of the charged residues of the NplNLS. Moreover, we highlight residues ImpαD192, ImpαE396, ImpαW184, ImpαW231, ImpαW357, ImpαW399 and ImpαY277, which were observed in more than 90% of the analyzed trajectory frames from the MD and NM ensembles (S2 Table, S3 Table and S4 Table).
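Occupancies such as those in S2-S4 Tables are typically the percentage of frames in which an interaction criterion is met. A minimal sketch for salt bridges, assuming a trajectory stored as a NumPy array of shape (frames, atoms, 3), atom-index lists for the two charged groups, and a 4.0 Å distance cutoff; the cutoff and the function names are illustrative assumptions, since the criterion used in the thesis is not given in this section:

```python
import numpy as np


def min_distance_per_frame(traj, group_a, group_b):
    """Shortest distance between two atom groups in every frame.

    traj: array of shape (n_frames, n_atoms, 3); group_a/group_b: atom index lists.
    """
    a = traj[:, group_a, :][:, :, None, :]  # (frames, len_a, 1, 3)
    b = traj[:, group_b, :][:, None, :, :]  # (frames, 1, len_b, 3)
    dist = np.sqrt(((a - b) ** 2).sum(axis=-1))  # (frames, len_a, len_b)
    return dist.min(axis=(1, 2))


def occupancy(traj, group_a, group_b, cutoff=4.0):
    """Percentage of frames in which the two groups are closer than `cutoff`."""
    d = min_distance_per_frame(np.asarray(traj, dtype=float), group_a, group_b)
    return 100.0 * float(np.mean(d < cutoff))


def persistent(occupancies, threshold=90.0):
    """Interaction labels whose occupancy exceeds `threshold` percent."""
    return sorted(k for k, v in occupancies.items() if v > threshold)
```

Collecting occupancy values in a dictionary keyed by residue pair and passing it to persistent reproduces the kind of filter used above to single out contacts present in more than 90% of the frames.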
3.5 Discussion

3.5.1 Bending and twisting motions may be directly related to Impα function

Both bending and twisting motions are good candidates for adapting Impα to the cNLS, because their motion patterns promote conformational changes in the NLS binding pockets. The combination of these motion patterns appeared to recur in other modes, such as NM8. The vectors from other high-collectivity modes, such as NM10-13 and NM17-18, also exhibited these two main motions, but in smaller portions of the protein, and they were also associated with undefined types of motion. This finding was expected, because higher-frequency modes are normally associated with localized vibrations (KESKIN et al., 2002). Studies involving Impβ, a solenoid protein similar to Impα (KOBE; KAJAVA, 2000; FORWOOD et al., 2010; KAJAVA, 2001), suggested that bending and twisting motions generate the flexibility of Impβ, allowing it to bind to different types of proteins (FORWOOD et al., 2010; LEE et al., 2000). In our computational analysis of Impα, the described movements could be equally important for adapting to cargo proteins and NLSs of different sizes, enhancing the contacts over the NLS binding site. Although the geometric analysis of the Impα crystallographic structures bound to different types of bipartite NLSs showed no significant changes, we must consider that the presence of only a peptide (approximately 20 amino acids long) under similar crystallization conditions may not be sufficient to induce large conformational changes in this system. Therefore, Impα motions are likely critical for cNLS accommodation when the entire protein containing the cNLS is considered.

The MD/NM approaches complementarily showed the flexibility of Impα through the analysis of NM7-9 and PC1-3.
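The collectivity values plotted in S10 Fig quantify how many atoms take part in each mode. A sketch of the commonly used collectivity index, assumed here as the definition (this section does not state the exact formula or whether mass weighting was applied), taking a mode as per-atom Cartesian displacement vectors; the function name is hypothetical:

```python
import numpy as np


def collectivity(mode):
    """Collectivity index of a mode given as per-atom displacement vectors (N, 3).

    kappa = (1/N) * exp(-sum_i u_i * ln(u_i)), where u_i = |r_i|^2 normalised so
    that sum_i u_i = 1. kappa = 1 when all atoms move equally; 1/N when one atom moves.
    """
    u = (np.asarray(mode, dtype=float) ** 2).sum(axis=1)
    u = u / u.sum()
    nonzero = u[u > 0]  # u*ln(u) -> 0 as u -> 0, so zero terms are skipped
    return float(np.exp(-(nonzero * np.log(nonzero)).sum()) / u.size)
```

Under this definition, the wide motions of the ARM repeats in the first NMs would score close to 1, whereas the localized vibrations of higher-frequency modes would score much lower.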
The bending observed in NM7/PC1 may be directly related to the accessibility of the binding sites and the release of NLSs. This movement was observed in both Impα-NplNLS and Apo Impα, with small differences in amplitude, resembling “open” and “close” movements. Bending motions have been reported in globular proteins to identify domain movements between opened and closed states (BROOKS; KARPLUS, 1985; ICHIYE; KARPLUS, 1991). The bending analysis for Impα was possible because its small curvature (KOBE; KAJAVA, 2000) allowed the definition of a plane comprising two distal atoms (at the N- and C-termini) and a central atom. The solenoid Impβ required a more complex analysis, in which angles between vectors projected onto a reference plane in each motif were determined to evaluate curvature changes (FORWOOD et al., 2010). Specifically, the concerted bending motion of Impα NM7 could operate as an opening-closing gateway, allowing NLS entrance and adjusting the amplitude of this motion to the cargo protein size. The absence of a ligand may imply a wider curvature that favors access to the inner concave surface of Impα, allowing IBB domain or cargo protein binding. Pumroy et al. (2015) compared the flexibility of three human Impα isoforms (α-1, α-3 and α-7) in their bound and unbound states using MD simulations. Based on protein end-to-end distance measurements, the authors observed increased flexibility of the Apo Impα isoforms, in accordance with our bending characterization, which showed a larger radius of curvature for Apo Impα.

According to Kobe and Kajava (2000), the “twist” accounts for the rotations of neighboring repeats relative to each other along the backbone direction, and ARM repeat proteins have large twist movements, allowing the accommodation of extended and flexible peptides.
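Following this definition, the inter-repeat angles of Fig 5 compare the H3 helix vectors of neighboring ARM repeats (Fig 1B). A minimal sketch, assuming each H3 vector is the principal axis of that helix's Cα coordinates, oriented from its N- to its C-terminal end (the exact vector construction is not detailed in this section), with hypothetical function names:

```python
import numpy as np


def helix_axis(ca_coords):
    """Unit vector along a helix, estimated as the principal axis of its Cα atoms."""
    ca = np.asarray(ca_coords, dtype=float)
    _, _, vt = np.linalg.svd(ca - ca.mean(axis=0))
    axis = vt[0]  # direction of largest spread
    # SVD fixes the axis only up to sign; orient it from the first to the last Cα.
    if np.dot(axis, ca[-1] - ca[0]) < 0:
        axis = -axis
    return axis


def inter_repeat_angles(h3_segments):
    """Angles (degrees) between H3 axes of consecutive ARM repeats."""
    axes = [helix_axis(seg) for seg in h3_segments]
    angles = []
    for a, b in zip(axes, axes[1:]):
        cos_ab = np.clip(np.dot(a, b), -1.0, 1.0)
        angles.append(float(np.degrees(np.arccos(cos_ab))))
    return angles
```

Evaluating these angles for every structure generated along a mode (or along a PC) and taking the range per ARM pair gives the kind of amplitude comparison discussed for ARMs 5-6 and 6-7.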
The twist observed in NM9 and PC3 of Impα-NplNLS was similar in both, confirming the observations of Hayward, Kitao and Berendsen (1997), who compared NMs and PCs for lysozyme. Although Impα-NplNLS and Apo Impα had similar average values (Fig 4B), differences were found only in the individual analyses of neighboring ARM repeats. The angle variation between ARM6 and ARM7 in NM9 of Apo Impα (Fig 5) is remarkable, and a similar result was previously observed for Apo Impα-3 (PUMROY et al., 2015). The lack of a ligand (IBB domain or NLS) interacting with the concave portion of Apo Impα may lead to a higher twist in the middle of Apo Impα, owing to the absence of interactions that stabilize it and provide binding specificity. The increased angle amplitude in this region contrasted with lower values in the remaining protein regions compared with Impα-NplNLS. Therefore, Impα-NplNLS may require conformational changes to adjust the NLS to the binding pockets of the Impα inner surface, which explains the greater variation in the ARM angles compared with Apo Impα, which would not require such adjustments. The same explanation applies to the amplitude variations found in NM8 for the bound and unbound states of Impα. Experiments and MD simulations with Neurospora crassa Impα emphasize the instability of an N-terminally truncated Impα in the absence of an NplNLS peptide, indicating that occupancy of the NLS binding sites is a requirement for Impα crystallization (TAKEDA et al., 2013). In addition, Falces et al. (2010) showed, using circular dichroism assays with Impα-1∆IBB from Xenopus laevis, that the protein is stabilized upon association with Npl, reinforcing the aforementioned observations.
The establishment of polar contacts settles the NLS backbone, whereas hydrophobic and electrostatic interactions with positively charged NLS residues confer specificity (FONTES et al., 2003; KOBE, 1999). The greater structural flexibility of Apo Impα was observed only in the geometrical analysis of the NM7 bending. Although the NM8/NM9 twisting motions were not associated with greater overall flexibility of Impα, the absence of the NLS allowed greater motion amplitude in other NMs, such as modes 21-27, as shown by comparing the restraint energies of Impα-NplNLS (S4 Fig) and Apo Impα (S5 Fig); this indicates that the higher flexibility of Apo Impα appears to be distributed over these modes. In addition, considering the Cα fluctuations of Impα in its bound and unbound states (S9 Fig), we observed higher flexibility in the major site of Apo Impα. This result was also observed for Apo Impα-3 (PUMROY et al., 2015), and the authors predicted weaker binding for NLSs that rely primarily on the major site. However, this prediction does not seem to apply to our case, because Impα-2 has been crystallized with different types of NLSs, including those with a preference for the major site, e.g., the simian virus 40 (SV40) NLS (FONTES; TEH; KOBE, 2000). Moreover, the binding stability of Impα-2 with some of those NLSs was also observed in ligand binding assays (MARFORI et al., 2012; KIRBY et al., 2015; BARROS et al., 2016). In summary, NLS binding stabilizes Impα, apparently restricting the motion range represented by some higher-frequency NMs; however, it may also allow localized adjustments between the ARM repeats to improve the overall affinity for the bound NLS.

3.5.2 The role of the linker residues in cNLS recognition

The cross-correlation calculations performed for both the MD simulations (Fig 6A) and the displacement along NMs (Fig 6B) showed positive correlations for residues of both the major and minor sites of Impα. Further analysis showed that most of the positive correlations could be explained by corresponding salt bridges, hydrogen bonds and hydrophobic interactions. Based on the occupancy values of some interactions in those regions, positions P2 and P5 of the major site and P1’ and P2’ of the minor site have high levels of contact with the NplNLS. These data confirm the maintenance of the main contacts between Impα and NplNLS, showing that our simulation results agree with the experimental data (MARFORI et al., 2012; FONTES; TEH; KOBE, 2000).

A classical bipartite NLS interacts simultaneously with the major and minor binding sites of Impα and requires a linker region of at least 10 residues between the P2’ and P2 positions to act as a cNLS (CHRISTIE et al., 2015; BARROS et al., 2012). The linker region of the NplNLS appears to play a key role in cNLS recognition. The cross-correlation maps clearly indicated the involvement of this region in contacts with Impα. A detailed analysis of the interactions showed salt bridges, hydrogen bonds and hydrophobic contacts along the NplNLS. The involvement of ImpαR238, ImpαR315, ImpαW273 and ImpαY277 had already been observed in Impα-NplNLS crystal structures (MARFORI et al., 2012; FONTES; TEH; KOBE, 2000).
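Both the Cα RMSF comparison (S9 Fig) and the cross-correlation maps (Fig 6) are built from atomic fluctuations about the mean structure. A sketch of the two quantities, assuming a trajectory already superposed on a common reference and stored as a NumPy array of shape (frames, atoms, 3), and assuming the standard dynamic cross-correlation definition C_ij = <Δr_i·Δr_j> / sqrt(<Δr_i²><Δr_j²>); the function names are hypothetical:

```python
import numpy as np


def fluctuations(traj):
    """Displacements of every atom from its mean position, shape (frames, atoms, 3)."""
    x = np.asarray(traj, dtype=float)
    return x - x.mean(axis=0)


def rmsf(traj):
    """Per-atom root-mean-square fluctuation (same length unit as the input)."""
    dx = fluctuations(traj)
    return np.sqrt((dx ** 2).sum(axis=2).mean(axis=0))


def cross_correlation(traj):
    """Normalised cross-correlation matrix C_ij in [-1, 1] between atoms i and j.

    Atoms that never move would cause a division by zero and must be excluded.
    """
    dx = fluctuations(traj)
    # <dr_i . dr_j> averaged over frames
    cov = np.einsum("fik,fjk->ij", dx, dx) / dx.shape[0]
    norm = np.sqrt(np.diag(cov))  # equals the RMSF of each atom
    return cov / np.outer(norm, norm)
```

Taking a sub-block such as C[np.ix_(imp_idx, nls_idx)], with hypothetical index arrays for the Impα and NplNLS Cα atoms, gives the Impα-versus-NplNLS layout of the heatmaps in Fig 6.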
However, the computational approach indicated other interactions previously reported in single structures: ImpαK353 and ImpαN350 were reported with the NLS of FEN-1 (BARROS et al., 2012), and ImpαD280 with the SV40 NLS bound to the Impα of a filamentous fungus (BERNARDES et al., 2015). Moreover, new possible contacts were detected; e.g., residues ImpαN446, ImpαR101 and ImpαK102 interacted with the N- and C-terminal regions of the NLS.

Our simulation analyses suggest that the linker contacts are important to settle the cNLS and help accommodate the cNLS side chains into the grooves of the major and minor binding sites. The linker region, in addition to the N- and C-termini of an NLS, may also provide compensatory interactions that help establish an activation pattern for the other NLS region (KOSUGI et al., 2009). This indicates that linker contacts may occur in different NLSs and can be maintained after docking of the bipartite NLS in both the major and minor binding sites, depending on the residues composing the linker, such as proline and acidic amino acids (CHRISTIE et al., 2015; KOSUGI et al., 2009). In summary, the recurrent presence of residues outside the major and minor sites strongly reinforces their importance for proper NLS binding, which corroborates studies involving other NLSs (FONTES et al., 2003; CONTI; KURIYAN, 2000; CHEN et al., 2005; CUTRESS et al., 2008; GIESECKE; STEWART, 2010; YANG et al., 2010) and further encourages us to understand their roles and effects during the nuclear import process in greater detail.
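The spacer requirement cited in this section (at least 10 residues between P2’ and P2) can be checked directly from residue positions. A sketch, assuming the residue numbering used in this chapter, together with a deliberately simplified, hypothetical bipartite pattern (two basic residues, a 10-12 residue spacer, then three basic residues) that only illustrates the spacing and is not the classification scheme of the cited studies:

```python
import re


def linker_length(p2_prime, p2):
    """Number of residues strictly between the P2' and P2 positions."""
    return p2 - p2_prime - 1


# Hypothetical simplified bipartite motif: [KR][KR], a 10-12 residue spacer
# (lazy, so the shortest valid spacer is preferred), then three basic residues.
BIPARTITE = re.compile(r"[KR]{2}(.{10,12}?)[KR]{3}")


def find_bipartite(sequence):
    """Return (start index, spacer) of the first simplified bipartite match, or None."""
    match = BIPARTITE.search(sequence)
    if match is None:
        return None
    return match.start(), match.group(1)
```

With the positions given for the NplNLS in this chapter, NplR156 at P2’ and NplK167 at P2, linker_length(156, 167) returns 10, exactly the stated minimum.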
3.5.3 NM analysis with classic MD simulations

MD simulations and NM analysis have been used for macromolecules as analyses complementary to experimental data, to describe their main motions and relate them to specific functions (ALEXANDROV et al., 2005; BROOKS; KARPLUS, 1985; ICHIYE; KARPLUS, 1991; SKJAERVEN; MARTINEZ; REUTER, 2011). The approach used for the Impα-cNLS complex combines the robustness of NMs, used to evaluate wider protein movements, with reference structures from MD, and it indicates the benefits of this combination in protein-peptide analysis.

The data sampled from the MD simulations, despite the time constraints and convergence difficulties associated with this technique (GROSSFIELD; ZUCKERMAN, 2009; GENHEDEN; RYDE, 2012), were balanced among the selected reference structures. Given the simulation time, the studied system is likely stable. Although we did not perform classical MD of Apo Impα, Takeda et al. (2013) reported the instability of an N-terminally truncated Impα in the absence of an NLS peptide. One way of computationally evaluating the N-terminally truncated Impα would be to run long MD simulations to observe the effects of its high flexibility. However, with a simple NM calculation, we could generally observe the higher flexibility of Impα in the absence of the NLS, based on the restraint energy profile, the RMSF values and the geometrical analysis. The MD data for the Impα-NplNLS complex not only supported the maintenance of the protein-peptide complex but also revealed some of the major movements and interactions occurring at the complex interface. NMs provided an analytical means of accessing the dynamics of the system, allowing the recognition of new interactions and circumventing the convergence and conformational restrictions of standard MD.
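The correspondence between the lowest NMs and the first PCs from MD, discussed throughout this chapter, is commonly quantified by the overlap, the absolute cosine between two mode vectors. A sketch of a Cartesian PCA of an MD trajectory plus that overlap, assuming a trajectory already superposed on a reference and NM vectors expressed in the same Cartesian atom order (mass-weighted NM vectors would first need converting); the function names are hypothetical:

```python
import numpy as np


def principal_components(traj, n_components=3):
    """Cartesian PCA of a superposed trajectory of shape (frames, atoms, 3).

    Returns (variances, vectors): the largest covariance eigenvalues in decreasing
    order and the matching unit eigenvectors as rows of length 3 * atoms.
    """
    x = np.asarray(traj, dtype=float).reshape(len(traj), -1)
    dx = x - x.mean(axis=0)
    cov = dx.T @ dx / x.shape[0]
    values, vectors = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(values)[::-1][:n_components]
    return values[order], vectors[:, order].T


def overlap(u, v):
    """Absolute cosine between two mode vectors (1 = same direction, 0 = orthogonal)."""
    u = np.ravel(np.asarray(u, dtype=float))
    v = np.ravel(np.asarray(v, dtype=float))
    return float(abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

An overlap close to 1 between, for example, PC1 and NM7 would support the statement that both describe the same bending motion; the absolute value is used because eigenvectors are defined only up to sign.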
The main motions described here for the NMs were also obtained in the PC analysis, in agreement with comparisons between the lowest NMs and the first PCs obtained from MD simulations of lysozyme (HAYWARD; KITAO; BERENDSEN, 1997) and of a subunit of the GroEL chaperone (SKJAERVEN; MARTINEZ; REUTER, 2011). The highlighted movements occur frequently during the protein’s lifetime and are likely to be functionally important. Moreover, the low computational cost of NM analysis is another attractive feature for its application to biological systems.

3.6 Conclusions

Computational approaches of MD and NM analysis were combined to evaluate the main motions of Impα and its interaction with a cNLS peptide. The bending motion may be involved in NLS entrance and in the accommodation of the cargo protein depending on its size, whereas the twisting motions may be involved in NLS recognition and accommodation into the Impα binding sites. The combination of these movements could allow local adjustments between the ARM repeats, which could improve the overall affinity for the cNLS. The absence of an NLS was also evaluated and may imply a wider curvature of Apo Impα, allowing IBB domain or cargo protein binding. Moreover, a higher twist in the middle of Apo Impα was detected, possibly due to the lack of interactions that stabilize it and provide binding specificity, which could explain the challenges in crystallizing N-terminally truncated Apo Impα.

The evaluation of salt bridges, hydrogen bonds and hydrophobic interactions corroborates the fundamental interactions between Impα and NplNLS and gives additional support for interactions outside the classical binding pockets that are important during this process.
The linker contacts in the cNLS assist the adjustment of the peptide backbone, which helps the interactions between cNLS side chains and residues of the major and minor binding grooves. In conclusion, MD simulations combined with NM analysis supported the maintenance of the Impα-NplNLS complex, exploring the conformational space and accessing the dynamics of the system at a lower computational cost. This approach may help to understand the affinities between Impα and cNLS peptides as well as non-classical NLSs.

3.7 Acknowledgments

This study was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [grant numbers 2012/19447-2; 2014/21976-9] and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [grant number 142110/2012-4]. We are grateful to Prof. Dr. Cesar Martins for providing computational access for running part of the simulations.

3.8 Figures

Fig 1 – Scheme of geometrical methods adopted for the description of the Impα motions. (A) Display of the selected vertices and the determined distances (2m and d) for the calculation of the radius of curvature. (B) The vectors generated for H3 of each ARM repeat.

Fig 2 – The starting structure of Impα-NplNLS for MD simulations. (A) Impα as a cartoon diagram colored by ARM repeat in a rainbow spectrum from the N-terminus (blue) to the C-terminus (red), and the NplNLS as a cyan cartoon diagram positioned in an antiparallel orientation relative to Impα. (B) Surface representation of Impα with the NplNLS as a cyan stick diagram, indicating both the major (blue) and minor (orange) binding sites. (C) Zoom of the major site indicating positions P2-P5 and (D) zoom of the minor site indicating P1’ and P2’. In both sites, the positively charged side chains are positioned in the main pockets of the Impα binding core.

Fig 3 – Main motions observed from NM analysis of Impα-NplNLS. Impα (cartoon model) is shown in a rainbow spectrum from the N-terminus (blue) to the C-terminus (red), and NplNLS (cyan cartoon model) is positioned in an antiparallel orientation relative to Impα. The vector arrows for NM7-9 are shown with the corresponding description of the motion. NM7 is shown in a front view, whereas NM8 and NM9 are shown in a top view (90º rotation about the X-axis).

Fig 4 – Geometric analysis of the bending and twisting motions of Impα. (A) The bending motion was quantitatively characterized by the radius of curvature along NM7 (solid line) and PC1 (dashed line) for Impα-NplNLS and along NM7 for Apo Impα (dotted line), whereas (B) the twisting motion was quantitatively characterized by the average values of the angles between helices along NM9 (solid line) and PC3 (dashed line) for Impα-NplNLS and along NM9 for Apo Impα (dotted line).

Fig 5 – Angles between helices of Impα. The angles between neighboring H3 pairs from the motions described as lateral bending/twisting (along PC2 and NM8) and twisting (along PC3 and NM9) for Impα-NplNLS and Apo Impα. The ARM groups considered for each angle calculation are depicted in different colors.

Fig 6 – Heatmap of cross-correlations between Impα and NplNLS. (A) Trajectories from standard MD simulations (300 ns ensemble) and (B) NM-displacement (ensemble from reference structures 67,730 ps, 207,080 ps and 274,970 ps) were used to calculate the correlations. A color bar indicates the degree of correlation from anti-correlated (negative values) to correlated (positive values) residues. The X and Y axes show, respectively, the position of each ARM repeat in the Impα sequence and the NplNLS residues in contact with the protein binding sites.

Fig 7 – Schematic representation of the interactions observed at the Impα-NplNLS interface. (A) The major (P2-P5) and minor (P1’-P2’) sites and the linker region are indicated in the NplNLS sequence. (B) The standard MD (300 ns ensemble) and (C) NM-displacement (ensemble from reference structures 67,730 ps, 207,080 ps and 274,970 ps) interaction schemes, with salt bridges (red) and hydrogen bonds (green) shown as dashed lines and hydrophobic contacts shown as arcs with radiating spokes. The important tryptophan residues that mediate the hydrophobic contacts are depicted as black sticks. The main chain of the NplNLS is represented as a gray horizontal line with its amino acid sequence, and side chains are shown as perpendicular lines. Only interactions with an occupancy of ≥50% of the analyzed trajectories are indicated.

3.9 Supporting Information

Figures

S1 Fig: Interactions from the starting structure. Scheme of interactions of the starting structure for MD simulations. The representation is similar to that described for Fig 7.

S2 Fig: Convergence analysis. Bar plot indicating the frequency of structures clustered with each reference (67,730 ps, 207,080 ps and 274,970 ps) from the MD simulations.

S3 Fig: Structural alignment of NplNLS. Structural representation of the NplNLSs from reference structures 67,730 ps (orange), 207,080 ps (green) and 274,970 ps (blue) aligned with the NplNLS from X-ray (red). (A) Stick diagram of the main chain, with major and minor sites indicated. (B-C) Stick diagram including side chains and the residues from each site.

S4 Fig: Heatmap of the total restraint energy values from Impα-NplNLS.
The values are from the structures generated by NM-displacement of the most representative reference structure (207,080 ps) according to the convergence analysis. The X-axis is the displacement range, represented as mass-weighted root mean square (MRMS) values, and the Y-axis is the NM number. Lower energy values (blue tones) indicate favorable conformations.

S5 Fig: Heatmap of the total restraint energy values from Apo Impα. The values are from the structures generated by NM-displacement. The X-axis is the displacement range, represented as MRMS values, and the Y-axis is the NM number. Lower energy values (blue tones) indicate favorable conformations.

S6 Fig: Conformational exploration. Box plot of the RMSD distribution from the trajectories of standard MD (MD-1, MD-2 and MD-3) and NM-displacement (67,730 ps, 207,080 ps and 274,970 ps).

S7 Fig: Residue fluctuations of Impα-NplNLS. Residue fluctuations based on the Cα RMSF of Impα from the ensemble trajectories of standard MD (red) and NM-displacement (cyan).

S8 Fig: Residue fluctuations of NplNLS. Residue fluctuations based on the Cα RMSF of NplNLS from the ensemble trajectories of standard MD (red) and NM-displacement (cyan).

S9 Fig: Residue fluctuations of Apo Impα∆IBB. Residue fluctuations based on the Cα RMSF of Impα from the trajectories of the NM-displacement of Impα-NplNLS (from reference structure 207,080 ps; cyan) and Apo Impα∆IBB (black).

S10 Fig: Collectivity of Impα-NplNLS. The collectivity values are plotted for each NM, and a smoothed line (blue) is fitted to represent the data trend. The shaded area is the confidence interval around the smoothed line, calculated with the ggplot2 package (http://ggplot2.org/) in R.

S11 Fig: Main motions observed in NMs 10-13, NM17 and NM18. The vector arrows indicating the motions are shown.
Impα is displayed as a cartoon diagram, with each ARM colored from blue to red, from the N- to the C-terminus. The NplNLS (cyan) is shown as a cartoon and is positioned in an antiparallel orientation relative to Impα.

S12 Fig: Main motions observed in PCs 1-3 from standard MD. The vector arrows indicating the motions are shown. Impα is displayed in a Cα representation, with each ARM colored from blue to red, from the N- to the C-terminus. The NplNLS (cyan) is shown in a Cα representation and is positioned in an antiparallel orientation relative to Impα.

S13 Fig: Angles between helices of Impα-NplNLS for NMs 10-13, NM17 and NM18. The ARM groups considered for each angle calculation are depicted in different colors, as in Fig 5.

Movies

S1 Movie: Movie of NM7. Impα is shown as a cartoon diagram colored according to each ARM repeat, from the N-terminal (blue) to the C-terminal (red) end of the protein. The NplNLS is represented as a cyan cartoon diagram positioned in an antiparallel orientation relative to Impα. To view this movie file, access the following link: http://journals.plos.org/plosone/article/asset?unique&id=info:doi/10.1371/journal.pone.0157162.s001

S2 Movie: Movie of NM8. Impα is shown as a cartoon diagram colored according to each ARM repeat, from the N-terminal (blue) to the C-terminal (red) end of the protein. The NplNLS is represented as a cyan cartoon diagram positioned in an antiparallel orientation relative to Impα. To view this movie file, access the following link: http://journals.plos.org/plosone/article/asset?unique&id=info:doi/10.1371/journal.pone.0157162.s002

S3 Movie: Movie of NM9. Impα is shown as a cartoon diagram colored according to each ARM repeat, from the N-terminal (blue) to the C-terminal (red) end of the protein. The NplNLS is represented as a cyan cartoon diagram positioned in an antiparallel orientation relative to Impα. To view this movie file, access the following link: http://journals.plos.org/plosone/article/asset?unique&id=info:doi/10.1371/journal.pone.0157162.s003

Tables

S1 Table: Backbone RMSD values from structural alignment. The reference structures are aligned with the Impα-NplNLS crystallographic structure (PDB ID 3UL1).

Backbone RMSD (Å)   67,730 ps   207,080 ps   274,970 ps
Impα                2.12439     1.15923      2.482
NplNLS              1.5432      0.759316     1.70461

S2 Table: Salt bridge occupancies. The occupancies of salt bridges between NplNLS and Impα in standard MD and NM-displacement. Interactions above 50% occupancy are highlighted in gray.
Salt Bridges – Standard MD
NLS   Impα  Occupancies (%)
R156  E396  99.97
D172  R101  64.98
D172  K102  44.85
E153  R117  0.47
K155  D325  2.16
K161  D280  51.68
K162  E354  7.24
K167  D192  96.82
K168  D270  4.00
K168  E266  1.50
K169  E107  4.75
K170  E180  20.04

Salt Bridges – NM displacement
NLS   Impα  Occupancies (%)
R156  D433  0.06
R156  E396  99.94
D160  R117  0.38
D160  H203  0.06
D172  R101  33.29
D172  R106  0.04
D172  R117  0.02
D172  R227  0.84
D172  H177  0.02
D172  K102  26.99
D172  K108  0.36
E153  R117  1.63
E153  K108  0.06
K155  D325  93.03
K161  D280  99.77
K161  E354  0.02
K162  E354  43.61
K167  D192  98.81
K167  E107  0.04
K168  D270  74.51
K168  E266  100.00
K169  D113  0.08
K169  D192  0.04
K169  E107  33.08
K169  E266  0.88
K169  E306  0.04
K170  E180  80.12
K170  E266  0.33

S3 Table: Hydrogen bond occupancies. The occupancies of hydrogen bonds between NplNLS (blue) and Impα (green) in standard MD and NM-displacement. Interactions above 50% occupancy are highlighted in gray.

Hydrogen Bonds – Standard MD
Donor      Acceptor   Occupancies (%)
R156-Side  E396-Side  99.94
W357-Side  R156-Main  99.61
G164-Main  Y277-Side  98.62
R315-Side  T160-Main  98.45
R156-Main  N361-Side  98.19
K167-Side  D192-Side  96.58
K168-Main  N188-Side  96.12
K155-Side  V321-Main  93.05
W184-Side  K168-Main  92.35
W231-Side  Q165-Main  92.13
N188-Side  K168-Main  85.79
K167-Side  G150-Main  84.58
N361-Side  R156-Main  84.33
R238-Side  G164-Main  83.91
K167-Side  A148-Main  77.50
K155-Side  G323-Main  73.12
R101-Side  D172-Side  65.59
V154-Main  N403-Side  64.45
K167-Main  N188-Side  64.25
N146-Side  K170-Main  64.21
K170-Main  N146-Side  62.96
K155-Side  N361-Main  56.44
K167-Side  T155-Side  52.82
N403-Side  V154-Main  52.75
K161-Side  D280-Side  51.89
R156-Side  S360-Side  50.64
W142-Side  K170-Main  47.02
K170-Side  Q181-Side  46.69
K102-Side  D172-Side  45.34
K155-Main  N361-Side  44.13
K167-Side  N188-Main  43.02
A153-Main  N403-Side  38.06
N235-Side  A166-Main  37.63
W231-Side  A166-Main  32.86
R315-Side  T160-Side  30.54
K170-Side  W142-Side  29.99
G191-Main  A166-Main  26.68
S149-Side  K168-Main  25.98
A163-Side  Y277-Side  25.12
K155-Side  T322-Main  24.25
K168-Side  N228-Side  23.82
K155-Side  T328-Side  23.68
W184-Side  K169-Main  22.01
K167-Side  S149-Main  21.88

Hydrogen Bonds – NM displacement
Donor      Acceptor   Occupancies (%)
K168-Side  E266-Side  99.98
R156-Side  E396-Side  99.92
K161-Side  D280-Side  99.81
K167-Side  D192-Side  98.81
K167-Side  G150-Main  98.37
R238-Side  Q165-Main  97.47
K167-Side  A148-Main  96.53
G164-Main  Y277-Side  96.46
W357-Side  R156-Main  94.67
R238-Side  G164-Main  93.39
K155-Side  D325-Side  92.99
R156-Side  S360-Side  92.18
K155-Side  V321-Main  89.64
R315-Side  T160-Main  88.91
K155-Side  T328-Side  86.59
W184-Side  K168-Main  85.40
N361-Side  R156-Main  84.44
W231-Side  Q165-Main  83.74
K170-Side  E180-Side  80.15
K162-Side  N350-Side  75.31
K168-Side  D270-Side  74.27
R156-Main  N361-Side  72.24
K155-Side  G323-Main  72.07
R101-Side  D172-Side  66.44
V154-Main  N403-Side  64.96
N188-Side  K168-Main  63.24
K353-Side  T160-Side  58.60
K102-Side  D172-Side  54.10
G151-Main  N446-Side  52.38
N403-Side  V154-Main  51.05
W231-Side  A166-Main  46.61
K162-Side  E354-Side  44.25
N403-Side  R156-Side  42.22
K170-Side  Q181-Side  42.20
W142-Side  K170-Main  41.57
A158-Main  W357-Side  41.53
R156-Side  N361-Side  39.87
A153-Main  N403-Side  39.35
Q165-Side  D270-Side  37.95
G151-Main  S406-Main  37.89
K168-Main  N188-Side  37.41
K167-Side  N188-Side  37.22
G151-Main  D442-Side  36.80
N146-Side  K170-Main  36.49

S4 Table: Hydrophobic contact occupancies. The occupancies of hydrophobic contacts between NplNLS and Impα in standard MD and NM-displacement. Interactions above 50% occupancy are highlighted in gray.
Hydrophobic Contacts – Standard MD Hydrophobic Contacts – NM displacement NLS Impα Occupancies (%) NLS Impα Occupancies (%) S152 S406 2.41 S152 S406 0.00 A153 A364 50.61 A153 A364 41.89 A153 N403 33.91 A153 N403 47.00 K155 T322 7.59 K155 T322 10.84 K155 A364 81.50 K155 A364 94.64 R156 W357 1.26 R156 W357 4.02 R156 W399 99.18 R156 W399 98.83 P157 W357 0.00 P157 W357 0.04 A158 R315 83.98 A158 R315 61.75 A158 W357 79.05 A158 W357 95.65 K162 W273 67.96 K162 W273 83.45 K162 T311 0.00 K162 T311 9.50 K162 R315 54.59 K162 R315 14.73 K162 E354 1.43 K162 E354 29.15 A163 W273 0.00 A163 W273 0.31 A163 Y277