UNIVERSIDADE ESTADUAL PAULISTA - UNESP CÂMPUS DE JABOTICABAL DETERMINAÇÃO DO PERFIL TRANSCRIPTÔMICO ASSOCIADOS À EFICIÊNCIA ALIMENTAR DE BOVINOS NELORE (BOS TAURUS INDICUS) Marta Serna García 2022 DETERMINAÇÃO DO PERFIL TRANSCRICIONAL ASSOCIADO À EFICIÊNCIA ALIMENTAR DE BOVINOS NELORE (BOS TAURUS INDICUS) Discente: Marta Serna García Orientador: Profa. Dra. Lucia Galvão de Albuquerque Co-orientadores: Profa. Dra. Larissa Fernanda Simielli Fonseca, Prof. Dr. Mateus José Rodrigues Paranhos da Costa and Prof. Dr Julian Carretero Asuncion Tese apresentada à Faculdade de Ciências Agrárias e Veterinárias – UNESP, Câmpus de Jaboticabal, como parte das exigências para a obtenção do título de Doutora em Genética e Melhoramento Animal. 2022 DADOS CURRICULARES DA AUTORA Marta Serna García nasceu em 15 de fevereiro de 1991, em Valência (Espanha), filha de Ana Victoria García Tarín e Manuel Serna Bañuelos. Em setembro de 2016 finalizou a graduação em Engenharia Agroalimentar e Rural da Universitat Politécnica de València, UPV, Espanha. Durante a graduação de 2015-2016, recebeu uma bolsa da prefeitura de Valencia e da Universitat Politécnica de València, Espanha, tendo sido concedidas 45 bolsas, na modalidade. Foi orientada pelo Dr. Pascual Medina Besso na plataforma de Análise Multigênica (Affymetrix) da Unidade Central de Investigação da Faculdade de Medicina da Universidad de València, Espanha, continuando nesta plataforma até o ano de 2017. Em maio de 2016, iniciou o curso mestrado no Programa de Pós-Graduação em Produção Animal da Universitat Politècnica de València, Espanha. Neste período recebeu uma bolsa Européia de mestrado concedida no âmbito do “Erasmus Mundus Programme” (Projeto EuroInkaNet) e obteve o título de Mestra em julho de 2018, sob orientação da Profa. Dra. María Antonia Santacreu Jerez e da Profa. Dra. Lucia Galvão de Albuquerque. Em agosto de 2018 o iniciou o curso de doutorado no Programa de Pós- Graduação em Genética e Melhoramento Animal da Universidade Estadual "Júlio de Mesquita Filho" campus de Jaboticabal (FCAV- UNESP), sob orientação do Prof. Dr. Mateus José Rodrigues Paranhos da Costa. Em 2021 passou a ser orientada pela Profa. Dra. Lucia Galvão de Albuquerque. Foi bolsista da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), tendo como linha de pesquisa genética molecular e seleção genômica. Nesse período, colaborou com a Universidad de Valencia, Espanha (UV) sob orientação Prof. Dr. Jose Viña e realizou 219 horas de docência universitária durante este período. Desde setembro de 2021 é professora, em tempo integral, na Universidad Europea de València, Espanha. "Se debe aprender a navegar en océanos de incertidumbres a través de archipiélagos de certeza." (Edgar Morin, 1921) "Aos meus pais (Ana e Manuel), pelo apoio e amor. À minha irmã Eva, por compartilhar a ciência e ser a minha inspiração e referência" "A todos os cientistas e à ciência por sua luta, especialmente nestes tempos e ao ser humano mais criativo graças à incerteza" Dedico e Ofereço. AGRADECIMENTOS Agradeço à toda minha família e aos amigos por me apoiarem, em especial ao Jethro pela ajuda, paciência e muito amor. Agradeço ao Joaquim Panadero por ter sido meu guia, por fazer ilusionar e amar ainda mais a ciência, por fazer o difícil fácil, pela inumerável paciência e delicadeza, por ser uma pessoa maravilhosa e por descrever o brilhantismo e profissionalismo com humildade. Agradeço à Profa. Dra. Lucia Galvão de Albuquerque pela orientação, apoio e pela confiança. Obrigada por me receber, ser um exemplo de profissionalismo e dedicação. Agradeço ao Profa. Dr. Mateus Paranhos da Costa pela coorientação, pela confiança e atenção. Agradeço à Profa. Dra. Larissa Fonseca pela coorientação, pela ajuda e apoio profissionalmente e mostrar qualidade humana. Agradeço ao Profa. Dra. Julian Carretero Asunción pela coorientação, pela ajuda no processo. Agradeço ao Prof. Dr. Jesus Ferro, Dra Rosa Peiró pelas contribuições bem como à Dra. Maria Antonia Santacreu Jerez por me ensinar à amar o melhoramento animal. Agradeço ao Dr. José Viña pela oportunidade ao me receber em seu grupo de pesquisa na Espanha, pelas contribuições e ensino. Agradeço à banca examinadora (qualificação e defesa) pelas contribuições e apontamentos que foram de fundamental importância. A todos os amigos do Departamento de Zootecnia, especialmente aos colegas de sala e aos meus queridos amigos: Ana, Andres, Bruna, Danielle, Diogo, Flavio, Gabriela, Malane, Natalia, Patricia, Thaise. Obrigada não só pelo auxílio nas pesquisas, como também pela amizade, pelos momentos e apoio moral. A todos os professores, de Brasil e Espanha. Também aos técnicos administrativos e demais funcionários da Universidade. À Universidad Europea de Valencia, en especial Dra. Nicla Flacco pelo apoio e ajuda. À FCAV-UNESP e ao Programa de Pós-graduação em Genética e Melhoramento Animal pelo apoio logístico. O presente trabalho foi realizado com apoio da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Código de Financiamento 001 e da Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP nº dos processos #2009/16118-5, #2017/10630-2 e #2018/20026-8) I Summary CHAPTER 1 - General considerations.................................................................. 3 1. Background.................................................................................................... 3 2. Objectives...................................................................................................... 4 3. Literature Review........................................................................................... 5 4. List of abbreviations...................................................................................... 12 5. Reference...................................................................................................... 12 CHAPTER 2 - Transcriptome profiling of liver in Nelore cattle phenotypically divergent for RFI in two genetics groups. Abstract............................................................................................................ 22 Background...................................................................................................... 23 Methods............................................................................................................ 24 Results............................................................................................................. 28 Discussion........................................................................................................ 43 Conclusions...................................................................................................... 48 List of abbreviations......................................................................................... 49 Declarations..................................................................................................... 49 References………………………………………………………………………….. 50 Additional Files……………………………………………………………………… 58 CHAPTER 3 - Gene co-expression network analysis and prediction of hub genes in liver of Nelore Cattle phenotypically divergent for the RFI in two genetics groups. Abstract............................................................................................................ 67 Background...................................................................................................... 68 Methods............................................................................................................ 69 Results............................................................................................................. 74 Discussion........................................................................................................ 92 Conclusions...................................................................................................... 99 List of abbreviations......................................................................................... 99 Declarations..................................................................................................... 100 References………………………………………………………………………….. 100 Additional Files…………….………………………………………………………… 111 II CHAPTER 4 Alternatively spliced genes in hepatic tissue associated with RFI in two Nelore cattle genetics group. Abstract............................................................................................................ 115 Background...................................................................................................... 116 Methods............................................................................................................ 116 Results............................................................................................................. 120 Discussion........................................................................................................ 135 Conclusions...................................................................................................... 138 References………………………………………………………………………….. 138 Additional Files …………………………………………….……………………….. 145 Declarations ….…………………………………………….……………………….. 166 CAPÍTULO 5 - Considerações finais……………………………….........……..…167 1 DETERMINAÇÃO DO PERFIL TRANSCRICIONAL ASSOCIADO À EFICIÊNCIA ALIMENTAR DE BOVINOS NELORE (BOS TAURUS INDICUS) RESUMO – A população mundial deverá atingir 9,8 milhões pessoas em 2050. Essa tendência ameaça a sustentabilidade dos sistemas e preocupa quanto a capacidade de suprir necessidades alimentares do planeta. Devido à crescente demanda dos produtos pecuários, é necessário estar preparados para os impactos na produção, mas fazê-lo de maneira inclusiva e sustentável exigirá grandes transformações. O consumo alimentar residual (CAR) permite identificar e selecionar animais mais eficientes sem prejuízo do crescimento e reprodução, ou da qualidade da carne. Além disso, o aumento da eficiência alimentar diminui o impacto ambiental, uma vez que animais mais eficientes tendem a consumir menos e produzir mais quando comparados aos menos eficientes. A característica é de difícil mensuração e a busqueda pelo entendimento genético e biológico se tornou uma alternativa na investigação de animais superiores, e dentre as técnicas moleculares, o RNA-Seq, tem sido uma das mais utilizadas, pois permite o sequenciamento e a quantificação dos transcritos com uma grande resolução, detecção de eventos de splicing alternativo e ncRNAs (non-coding RNA), que podem ter diversas funções na arquitetura regulatória. Visando entender melhor os processos genéticos envolvidos na expressão do Consumo Alimentar Residual (CAR), foram analisados dois grupos genéticos (G1 e G2) de bovinos Nelore. No G1, animais (n=60) do mesmo grupo de contemporâneos foram submetidos a uma avaliação do CAR e dentre eles, 24 animais extremos para a característica foram abatidos. Amostras de tecido hepático foram coletadas e sequenciadas por meio da plataforma HiSeq 2500 System (Illumina). Para G2, foram utilizados dados públicos, provenientes do sequenciamento de 20 amostras de tecido hepático de animais divergentes para CAR, também do mesmo grupo de contemporâneos, obtidos por meio HiSeq 2000 System (Illumina). Os objetivos do estudo foram identificar os genes diferencialmente expressos, descrever eventos de splicing alternativo, traçar um perfil dos transcritos de ncRNA comparando duas populações de Nelore (Bos taurus indicus), classificados de forma divergente para CAR, em tecido hepático usando RNA-Seq. Foram encontrados 1.811 genes de expressão diferencial (DEGs) e 2.054 DEGs (p-valor 0,05) para G1 e G2, respectivamente. Quanto aos DEG comumente identicados nas duas populações, foram encontrados 88, dos quais 33 estavam envolvidos na resposta imune e no bloqueio do estresse oxidativo. Foram encontrados sete (B2M, ADSS, SNX2, TUBA4A, ARHGAP18, MECR, ABCF3) possíveis biomarcadores, considerando AUC > 0,70. O gene B2M foi mais expresso no grupo mais eficiente [CAR negativo] comparado com o grupo ineficiente [CAR positivo] atua regulando o turnover proteico, metabolismo lipídico e inibe a morte celular. Em ambos os grupos genéticos, o grupo CAR negativo apresentou genes com capacidade antioxidante, maior capacidade de proliferação celular e proteção contra a morte. Os genes BoLA-DRA, BoLA-DRB3 e CD74 foram preditos como hub, e estão envolvidos na melhoria da resposta imune. Os fatores de transcrição LRRFIP1 e NR4A2 foram detectados e estão envolvidos na promoção da angiogênese e diminuição da apoptose. A análise de redes de interação física mediante dados públicos (BioGRID) detectou 7 possíveis nodes; DDB2, ACTB, PAK1, VIM, PRC1, RPS6KA1 e PKM regulando positivamente proliferação celular, lipólise, organização do citoesqueleto e sobrevivência celular no grupo CAR negativo. No capítulo 4, com os mesmos animais do capítulo 2 (fenotipicamente divergentes para RFI), foram identificados eventos de splicing alternativos diferencialmente expressos (DAS) a partir da abordagem exon-centric. Os DAS encontrados foram transcritos para 67 e 75 genes (p-value ≤ 0,05) respectivamente para G1 e para G2. Identificamos 9 comuns genes DAS, no geral, desempenham um papel no metabolismo lipídico e do sistema imunológico, destacando SAT1 como gene hub. Os RNAs não codificantes significativos encontrados foram; MIR25 and SNORD16 para G1 RNase_MRP e SCARNA10 e para G2 em RLFI. Destacamos o MIR25 sendo capaz de atuar bloqueando a citotoxicidade e o estresse oxidativo e o RNase_MRP como bloqueador do dano mitocondrial. Os resultados relatados neste estudo indicam genes candidatos e mecanismos moleculares reguladores do CAR em dois grupos geneticos em bovinos Nelore, o que irá melhorar a compreensão da eficiência alimentar, auxiliar em futuras abordagens dos métodos quantitativos de melhoramento para aumentar da produtividade e a diminuição do impacto ambiental da pecuária através da identificação de animais com alta eficiência alimentar com rapidez e menor custo. Palavras-Chave: bovino, CAR, genes, Hub gene, ncRNA, nodes, splicing alternativo 2 DETERMINATION OF THE TRANSCRIPTOME PROFILE ASSOCIATED WITH THE FEED EFFICIENCY OF NELORE CATTLE (BOS TAURUS INDICUS) Abstract - The world population is expected to reach 9.8 million by 2050. This trend threatens the sustainability of the systems in place and concerns the ability to meet the requirements of our planet. Due to the growing demand for livestock products, it is necessary to be prepared for the impact on production but doing so in an inclusive and sustainable way will require major transformations. Residual Feed Intake (RFI) indicates how to identify and select more efficient animals without detriment to growth and reproduction or compromising meat quality. In addition, increased feed efficiency reduces environmental impact, as more efficient animals tend to consume less and produce more compared to less efficient ones. The search for genetic and biological understanding of this type of characteristics has become an alternative in the investigation of more efficient animals and, among molecular techniques, RNA-seq is one of the most used. It allows sequencing and quantifying transcripts with high resolution, detection of alternative splicing events, and ncRNAs (non-coding RNAs), which may have multiple roles in regulatory architecture. To gain a better understanding of the genetic processes involved in RFI expression, two genetic groups (G1 and G2) of Nelore cattle were analyzed. In G1, animals (n = 60) from the same contemporaneous group underwent RFI evaluation. Twenty-four extreme animals were selected for this population, low RFI (N = 12) and high RFI (N = 12); they were subsequently slaughtered. Liver tissue was collected and sequenced using the HiSeq 2500 System (Illumina). Public data was used for G2, which includes RNA-seq data from 20 samples of liver tissue from animals divergent for RFI. These animals belong to the same contemporary group and the data were obtained by the HiSeq 2000 system (Illumina). The study aimed to identify differentially expressed genes, alternative splicing events and the ncRNA transcript profile by comparing two populations of Nelore cattle (Bos taurus indicus) classified divergently for RFI using RNA-seq data. We found 1,811 differentially expressed genes (DEGs) and 2,054 DEGs (p-value 0.05) in low RFI (LRFI) group compared with high RFI (HRFI) group for G1 and G2, respectively. Analyses found 88 common genes between the genetic groups, and 33 of these were involved in immune response and blocking of oxidative stress. Seven possible markers (B2M, ADSS, SNX2, TUBA4A, ARHGAP18, MECR, ABCF3) were found, considering AUC > 0.70. The B2M gene was overexpressed in the efficient group (LRFI) when compared with the inefficient group (HRFI); this gene acts by regulating protein turnover, lipid metabolism and inhibition of cell death. For both genetic groups, the LRFI group includes genes that could have an antioxidant capacity, increased cell proliferation capacity and protection against death. The BoLA-DRA, BoLA-DRB3 and CD74 have been predicted to be hub genes and are involved in enhancing the immune response. Transcription factors LRRFIP1 and NR4A2 are involved in the promotion of angiogenesis and decreasing apoptosis. Physical interaction network analysis using public data (BioGRID) detected seven possible nodes: DDB2, ACTB, PAK1, VIM, PRC1, RPS6KA1 and PKM. In chapter 4, with the same animals selected in chapter 2 (phenotypically divergent for RFI), differentially expressed alternative splicing events (DAS) were identified from the exon-centric approach. The DAS found were transcribed for 67 and 75 genes, respectively for G1 and for G2. We identified 9 common DAS genes, in general, play a role in lipid metabolism and the immune system, highlighting SAT1 as the hub gene. The significant non-coding RNAs found were MIR25 and SNORD16 for G1 RNase_MRP and SCARNA10 and for G2 in RLFI group. We highlight MIR25 being able to act by blocking cytotoxicity and oxidative stress and RNase_MRP as a blocker of mitochondrial damage.These genes were positively regulated for cell proliferation, lipolysis, cytoskeletal organization and cell survival in the efficient group. The results reported in this study indicate candidate genes and regulatory molecular mechanisms for RFI in the two genetic groups of Nelore cattle, which will improve the understanding of feed efficiency, help in future approaches of quantitative breeding methods to increase productivity and reduce the environmental impact of livestock through rapid identification of animals with high feed efficiency and low cost. Keywords: bovine, genes, Hub gene, ncRNA, nodes, RFI, splicing alternative 3 CHAPTER 1 – General Considerations 1. Introduction The world's population is constantly growing and will be over 11 billion by the end of the century. Moreover, between 720-811 million people faced hunger in 2020 (FAO, 2019). Improving food production systems and promoting the sustainable use of natural resources are critical to achieve Sustainable Development Goal 2 (SDG2): a world without hunger, food insecurity and malnutrition and SDG 13, "Climate Action" (UN, 2020). Humanity faces the challenge of minimizing global climate change, mainly caused by greenhouse gas (GHG) emissions, which are currently the highest in history (PORTER et al., 2014). There is a global trend of a rise in demand for livestock products (WRIGHT et al., 2012; GERBER et al., 2013; ROJAS-DOWNING et al., 2017) which are fundamental to global food security (ROSEGRANT et al., 2009). In the last 50 years, GHG emissions have almost doubled and will continue increasing (EDENHOFER et al., 2014). In fact, 57% of global GHG emissions are from the production of animal-based food, 25% being bovine meat production (FAO, 2019). South and Southeast Asian and South America were the largest emitters of production-based GHG emissions (XU et al., 2021). Cattle emit most of the enteric methane (77%) of global emissions (POORE; NEMECEK, 2018). Sustainability of food systems covering the needs of the global population will be an enormous challenge that we have to face (HOLDEN et al., 2018). For all the above, we need to maintain a balance between productivity, food security and environmental preservation (WRIGHT et al., 2012). One of the alternatives is the search for more efficient animals. Individual feed efficiency of cattle that induces differences in muscle growth and development (MEHRBAN et al., 2021) is controlled by multiple genes and/or isoforms (polygenic). A common proposal to increase this efficiency is selection through residual food intake (RFI) (BRANCO et al., 2009; GRION et al., 2014; CEACERO et al., 2016). Selection for RFI does not affect animal growth, reproduction or the meat quality but leads to animals with lower consumption and lower maintenance requirements (LANNA; DE ALMEIDA, 2003; KOCH et al., 1963; BASARAB et al., 2003; CARBERRY et al., 2012). As a result, meat production costs and environmental impact are reduced, since there is a decrease in pasture area for livestock production and in enteric methane emission (HEGARTY et al., 2007). Despite the importance of RFI for animal production in zebu cattle, studies with 4 Nelore, especially using RNA-seq (ribonucleic acid sequencing), are limited because this trait is difficult and costly to measure and is influenced by different biological processes. The RNA-seq tool is powerful, robust and adaptable for measuring genome-wide transcription (CORCHETE et al., 2020). However, there is no consensus on the most appropriate methodology to guarantee the validity of the results in terms of robustness, accuracy and reproducibility, so this field is in continuous development (HRDLICKOVA; TOLOUE; TIAN, 2017; GÓMEZ- GALLEGO et al., 2018; CORCHETE et al., 2020; GERACI; SAHA; BIANCHINI, 2020). There are several studies using RNA-seq for RFI in Nelore cattle, such as in muscle tissue (FIDELIS et al., 2017; MENEZES et al., 2020), blood (NASCIMENTO et al., 2015) and liver tissue (TIZIOTO et al., 2015; AL-HUSSEINI et al., 2016; CARVALHO et al., 2019; MUKIIBI et al., 2019, 2020). To our knowledge, this is the first study carried out to integrate RNA-seq data from different Nelore populations and it may allow us to provide a broader and more precise vision of the action mechanisms involved in the RFI trait. The RNA-seq technique also provides the possibility of studying alternative splicing events, non- coding RNAs and co-expression network analysis that can facilitate the detection of important biological pathways and the identification of central genes (hub or key players) (LANGFELDER; MISCHEL; HORVATH, 2013). Results from this work could help future approaches in quantitative methods of animal husbandry and management and allow the use of possible markers for selecting more efficient animals. 2. Objectives 2.1. General objective To study liver tissue transcriptome in two genetic groups of Nelore cattle (Bos taurus indicus) classified by their RFI records, using the RNA-seq technique. In addition, to identify biological and cellular processes underlying the RFI trait assuming it may help to highlight possible biomarkers to be used in future studies and further explorations into the genetic architecture of this trait. 2.2. Specific objective • To identify differentially expressed genes in liver tissue of Nelore cattle common to two genetic groups associated with the RFI trait and to understand the biology of the liver better. • To conduct gene co-expression network analysis, prediction of hub genes and analysis of the nodes of physical interactions with a public database 5 (BioGRID) of Nelore cattle RFI trait in two genetic groups. • To identify spliced genes differentially expressed in liver tissue of the Nelore breed (Bos taurus indicus) for the RFI trait in two genetic groups using data obtained by RNA-seq. • To study non-coding RNA expressed in liver tissue of animals from both Nelore genetic groups divergent for the RFI trait using data obtained by RNA-seq. 3. Background 3.1. Residual Feed Intake (RFI) The residual feed intake (RFI) methodology aims to assess feed efficiency. It has been used as a selection criterion for cattle, focusing on increasing the efficiency of individual feeding (BRANCO et al., 2009; GRION et al., 2014; CEACERO et al., 2016) without detriment to reproduction or deterioration of meat quality. RFI is the difference between observed consumption and estimated consumption, through adjustments for average metabolic weight (BMR) and weight gain rate (AWG, kg/day) (KOCH et al., 1963). This is a difficult trait to measure compared to other determinations (MARTIN et al., 2021). It is calculated individually over an average period of 2.5 to 3 months in which animals remain in individual barns or automated collective bays, by keeping a daily record of the amount of feed ingested and rejected, and the average daily weight gain of each animal. This trait allows identification and selection of more efficient animals (negative RFI) when compared with less efficient ones (positive RFI), since they ingest less feed than estimated for the same weight gain, thereby making it possible to select animals with lower consumption and lower maintenance requirements (KOCH et al., 1963; BASARAB et al., 2003; CARBERRY et al., 2012). Animals with low RFI (more efficient) emit about 28% less methane into the atmosphere (NKRUMAH et al., 2006). The most efficient animals produced less manure (N, P and K) than the most inefficient animals (BASARAB et al., 2003; HEGARTY et al., 2007; HAYES; LEWIN; GODDARD, 2013). There is an individual variation in nutrient utilization efficiency among animals of the same breed, sex and age ingesting the same type of feed, which indicates genetic variations in nutrient utilization efficiency (RICHARDSON; HERD, 2004). In addition, RFI is a trait with moderate heritability (from 0,19 to 0,57) (DEL CLARO; MERCADANTE; VASCONCELOS SILVA, 2012; BERRY; CROWLEY, 2013; 6 SANTANA et al., 2014). For this reason, the traits related to feed efficiency are increasingly studied and gradually implemented in genetic improvement programs. 3.2. RNA coding, alternative splicing and RNA non-coding Gene expression involves transcription, translation and the turnover of mRNAs and proteins (BUCCITELLI; SELBACH, 2020). The central dogma of molecular biology (CRICK, 1970) indicates that most eukaryotic genes (DNA fragments of the genome) are copied in the form of messenger RNA precursors (pre-mRNAs). Pre-mRNAs contain coding sequences (exons) separated by non-coding sequences (introns), which are removed by means of the process of splicing exons (Splicing) to give rise to exons connected to each other forming mature messenger RNAs (mRNAs). Finally, the mRNA is translated into protein through ribosomes). It is known that cells contain substantial amounts of intronic RNA that originates from unprocessed (primary) gene transcripts called non-coding RNAs. It seems that gene expression is mainly regulated by transcription factors (proteins that can bind specific DNA sequences to control transcription) (LAMBERT et al., 2018) at the transcriptional level and miRNAs (small regulatory non-coding RNAs) at the post-transcriptional level (MITSIS et al., 2020). Splicing is regulated by a complex machinery called spliceosome that consists of five small nuclear ribonucleoproteins (snRNP) (U1, U2, U4 / U6, U5) (STALEY; GUTHRIE, 1998; WILL; LÜHRMANN, 2011) and more than 200 auxiliary proteins that recognize and assemble exons through consensus sequences that mark the exon-intron junctions, recognized by the spliceosome (MATLIN; MOORE, 2007; WILL; LÜHRMANN, 2011). Splicing was discovered in 1970, and soon afterward it was observed that there are alternative splicing patterns within a single pre-mRNA molecule that can produce different functional mRNAs (BERGET; MOORE; SHARP, 1977), which means that different proteins can be obtained from one gene, through this process known as alternative Splicing (KALSOTRA; COOPER, 2011). However, our knowledge of RNA splicing is quite recent (ROY; GILBERT, 2006). In humans, up to 95% of genes are alternately spliced (JIANG; CHEN, 2021), creating diversity and allowing a gene to generate multiple protein isoforms with different functions. It can express itself in a specific way in a tissue, at a specific stage of development, therefore coding important domains (CASTLE et al., 2008; TAPIAL et al., 2017). Alternative splicing genes regulate the localization of proteins, their enzymatic properties and their interaction with ligands. In most cases, changes caused by individual splicing isoforms are small. Due to its widespread usage and molecular 7 versatility, alternative splicing emerges as a central element in gene regulation that interferes with almost every biological function analyzed (KELEMEN et al., 2013), such as cell development and differentiation (JIANG; CHEN, 2021). In addition, it is known that about 15%-30% of mutations associated with hereditary diseases have been related to failures in the regulation of alternative splicing (PARK et al., 2018; ELLINGFORD et al., 2019). Alternative splicing explains that cattle have fewer genes (approximately 23,000) compared to the number of proteins (AEBERSOLD et al., 2018; LAU et al., 2019). The different isoforms of a gene show dramatic differences in their biological properties and functions (NILSEN; GRAVELEY, 2010; KALSOTRA; COOPER, 2011). However, our knowledge about function of isoforms and identification of the most functionally important protein isoforms/variants in cattle is still limited (RODRIGUEZ et al., 2012; LI et al., 2017). There is still a profound lack of knowledge about the functions of biologically important isoforms, especially in the Nelore breed. Regulation of alternative splicing Constitutive splicing is the process in which mRNA is spliced identically, producing the same isoforms, while alternative splicing generates different isoforms by using different sets of exons (NAKKA et al., 2018). Figure 1 shows the five main subtypes of alternative splicing. Figure 1. Constitutive splice and the five main types of alternative splicing (image adapted from JIANG; CHEN, 2021). 8 Following the representation in Figure 1: 1) Constitutive splicing; 2) Exon omission (cassette exons) is the most common in mammalian cells, resulting in the complete omission of one or more exons (SUGNET et al., 2004); 3) Mutually exclusive exons are uncommon, in which two or more splicing events are no longer independent. They run or deactivate in a coordinated manner (SAMMETH, 2009); 4) Alternative 5′ junction sites (alternative donors): the use of an alternative 5′ donor site, which changes the 3′ boundary of the upstream exon (JIANG; CHEN, 2021); 5) Alternative 3' junction sites (alternative acceptors): compared to alternative 5' junction sites, the use of an alternative 3' junction site causes the change of the 5' boundary of the exon downstream (JIANG; CHEN, 2021); 6) Intron retention, one or more introns remain unspliced in the mRNA (VANICHKINA et al., 2018). It is the rarest in mammals (SAMMETH; FOISSAC; GUIGÓ, 2008; NAKKA et al., 2018) and is often misinterpreted as a splicing error (JIANG; CHEN, 2021). We can use different approaches to define exon junctions from RNA-seq data: isoform-based or exon-based (CHEN, 2013; JIANG; CHEN, 2021) and whether the genome is used as a reference or based on de novo assemblies (MARTIN; WANG, 2011). Isoform-based analysis approaches quantify full-length transcription isoforms and then estimate the relationship between the isoforms they included and the isoforms excluding an alternative exon of interest by calculating the splicing level. Exon-based approaches quantify individual exon inclusion ratios. It is more appropriate if the target is not a whole isoform. The computer programs that can analyze alternative splices usually consist of three main steps: detection, statistical comparison and prediction of effects (HALPERIN et al., 2021). Currently, determination of isoform structures and quantification is still hampered by technological limitations (JIANG; CHEN, 2021) and the discovery of new transcripts using RNA-seq short reads is still a challenge (CONESA et al., 2016). 9 Non-coding RNA (ncRNAs) Only 2% of genes encode proteins in the human genome (DJEBALI et al., 2012; LI et al., 2014; HON et al., 2017) the rest of the RNAs that do not translate into amino acids are functional ncRNAs acting as important mediators of gene regulation in biological processes (RÖNNAU et al., 2014). Some ncRNAs can encode small peptides (length less than 100 aa) regulating biological processes, although it is uncommon (HUBÉ; FRANCASTEL, 2018) (Figure 3.A). In addition, the complexity of an organism is highly associated with the abundance of ncRNA genes (RUBIN et al., 2000; MATTICK, 2001; KAPUSTA; FESCHOTTE, 2014). . Figure 3. A. Structure of long coding and non-coding RNAs. ① Coding RNAs refer to the mRNA encoding the protein. ② Some coding mRNAs can function without translating into proteins by forming a secondary RNA structure derived primarily from untranslated regions (UTR). ③ Non-coding RNAs act as cell regulators without coding proteins. ④ Some lncRNAs (long non-coding RNAs) can bind to ribosomes and encode peptides to modulate cellular activities. B. Structure of coding and non-coding RNAs. Figure adapted from (RÖNNAU et al., 2014; LI; LIU, 2019; BHAT et al., 2020). All these ncRNA classes (Figure 3.B) can perform numerous cellular functions, including structural, catalytic and regulatory functions. They are studied intensively because they direct various signaling pathways (RÖNNAU et al., 2014). For example, lncRNAs modulate enzymatic activities (WANG; CHANG, 2011), cell differentiation, growth (HOBUSS; BÄR; THUM, 2019), adipose function (SUN; LIN, 2019) and play an important role in immunology (CHEN; SATPATHY; CHANG, 2017). In addition, ncRNAs, including microRNAs and small interfering RNAs, can act as regulators in alternative splicing processes (LUCO; MISTELI, 2011). In Figure 3B, the ncRNAs can be subdivided according to their size and function. A total of 2,578 microRNAs were identified in humans (BOLTON et al., 2014) and the number continues to rise. They regulate more than 50% of all 10 protein-coding genes in mammalian cells (SCHWARZENBACH et al., 2014). Genes can be attacked by multiple miRNAs and each miRNA is able to target hundreds of mRNAs directly or indirectly, which are involved in many physiological processes (CROCE, 2009). miRNA studies for RFI trait are still rare in Nelore cattle (DE OLIVEIRA et al., 2018; CARVALHO et al., 2019). Another type of ncRNA is the snoRNA (small nucleolar RNAs). They are known to guide the modifications and processing of nucleotides (CECH; STEITZ, 2014; SLOAN et al., 2016). There are 400 species of snoRNA in the human genome (MARTENS-UZUNOVA; OLVEDY; JENSTER, 2013) and some miRNAs originate from snoRNAs (RÖNNAU et al., 2014). SnoRNAs are post-transcriptional (SUN; HAO; PRASANTH, 2018) or transcriptional regulators interfering with the maturation and modification of other ncRNAs (CECH; STEITZ, 2014; BAZIN et al., 2017). Some studies have identified lncRNA in various tissues of cattle (HUANG; LONG; KHATIB, 2012; QU; ADELSON, 2012; WEIKARD; HADLICH; KUEHN, 2013; BILLEREY et al., 2014; GAO et al., 2019; WANG; KOGANTI; YAO, 2020). The detection changes from 304 to 23,735 lncRNA depending on the tissue (KOSINSKA-SELBI; MIELCZAREK; SZYDA, 2020) and also the approach. The number of functional lncRNAs has not yet been accurately determined, however they have been increasingly described as having important cellular functions (STATELLO et al., 2020). They are also mediators of interactions with proteins (and/or other molecules), although the nature and dynamics of such interactions have yet to be elucidated (KIRK et al., 2018). Due to the fundamental role of ncRNAs in the regulation of gene expression, and consequently its possible impact on phenotypes of interest for animal production, it is an important topic of study. In cattle, the number of studies are slowly increasing, but the identification of ncRNAs is limited and even more so in the Nelore breed. 3.3. RNA-seq workflow. The study of RNA-seq data generally involves trimming, alignment, counting and normalization of sequencing data and differential expression analysis (CORCHETE et al., 2020). Trimming is used to increase the rate of mapped reads by removing the adaptor sequences and removing poor-quality nucleotides with a wisely chosen reading length, to avoid unpredictable changes in gene expression (WILLIAMS et al., 2016) and transcriptome assembly (MACMANES, 2014). The beneficial effects of this process were evaluated for assembly based on genomic 11 references (DEL FABBRO et al., 2013; CHEN et al., 2014) and a de novo assembly approach (MBANDI et al., 2014). Recently, methods have been reported without alignment with reference genome, pre-index reference transcripts, divide sequence readings into k-mers, and then quick matches as "pseudo" alignments have been performed (JIANG; CHEN, 2021). Once the reads have been mapped, they must be assigned to a gene or transcript (called a count or quantification). This is followed by a normalization procedure to eliminate potential sequencing bias. There are several methods and studies comparing methodologies with various algorithms used for identifying differently expressed genes with different results (MOULOS; HATZIS, 2015; SEYEDNASROLLAH; LAIHO; ELO, 2015; CORCHETE et al., 2020; GERACI; SAHA; BIANCHINI, 2020). The biggest challenge in RNA-seq analysis is combining the different steps sequentially into a complete workflow and users must choose from many possible approaches, parameters and methodological options. There is no consensus on the most appropriate methodology that should be used for the analysis of RNA-seq (CORCHETE et al., 2020). 3.4. Transcriptome analyses in cattle through RNA-seq The RNA-seq is a tool of no more than a decade, currently considered the most powerful, robust and adaptable technique for analyzing genes that are expressed in each tissue or cell type depending on the conditions in which they are found (STARK; GRZELAK; HADFIELD, 2019; CORCHETE et al., 2020; GERACI; SAHA; BIANCHINI, 2020). RNA-seq is a more accurate and informative method compared to other techniques for transcriptome analysis, such as microarrays (XUAN et al., 2013; FINOTELLO; DI CAMILLO, 2015; HAN et al., 2015; SERNA et al., 2016). It makes it possible to study novel transcripts with a higher resolution, a better detection range and lower technical variability (PERKINS et al., 2014; SU et al., 2014; Y et al., 2020). It has been well described that RNA-seq also offers a high degree of agreement with validations using qRT-PCR (quantitative polymerase chain reaction; real time PCR) (SU et al., 2014). Several studies have used RNA-seq to investigate gene expression in Nelore cattle which has made it possible to identify DEGs in muscle tissue associated with tenderness (FONSECA et al., 2017), fatty acid composition (OLIVIERI et al., 2021), loin area and subcutaneous fat (SILVA-VIGNATO et al., 2017), intramuscular fat (CESAR et al., 2015) and residual feed intake (RFI) (TIZIOTO et al., 2016) as well 12 as in the liver (TIZIOTO et al., 2015) and in multiple tissues (CHEN et al., 2021; DU et al., 2021). Another application of RNA-seq is the detection of alternative splicing events. In cattle, these events have been associated with mastitis (WANG et al., 2016) embryonic development (HUANG; KHATIB, 2010) carcass traits (HE; LIU, 2013) and meat quality (SILVA et al., 2020; MALANE et al., 2020; FANG et al., 2021; SUN et al., 2021). This technology is also capable of detecting ncRNA associated with RFI in Nelore cattle (ALEXANDRE et al., 2020). For Nelore cattle, knowledge about the molecular mechanisms regulated by alternative splicing events and ncRNA, especially in feed efficiency traits, is still limited. 4. List of abbreviations Food and Agriculture Organization of the United Nations (FAO), Sustainable Development Goal 2 (SDG2), Sustainable Development Goals (SDGs), United Nations (UN), greenhouse gas (GHG), Residual Food Intake (RFI), RNA sequencing (RNA-seq), Intergovernmental Panel on Climate Change (IPCC), Sustainable Development Goal 13 (SDG 13), World Health Organization (WHO), Kilograms (Kg), Ribonucleic acid (RNA), Nitrogen (N), Phosphorus (P), Potassium (K), Average metabolic weight (BMR),Rate of weight gain (AWG, kg/day), Expected progeny reference (DEP), Quantitative polymerase chain reaction real time (qRT-PCR), Differential expression gene (DEGs), Gene Set Enrichment Analysis (GSEA), Messenger RNA (mRNA), ARN noncoding (NCRNAs), Long ARN noncoding (lncRNA), MicroRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), Ribosomal RNAs (rRNAs), Small Cajal-specific body RNAs (scaRNAs), Small-interfering RNAs (siRNAs), Small nuclear RNAs (snRNAs). Small nucleolar RNAs (snoRNAs), Transfer RNAs (tRNAs). 5. References ABIEC. Perfil da pecuária no Brasil. Beef REPORT, p. 49, 2019. Disponível em:. AEBERSOLD, R. et al. How many human proteoforms are there? Nature Chemical Biology, v. 14, n. 3, p. 206–214, fev. 2018. AL-HUSSEINI, W. et al. Characterization and Profiling of Liver microRNAs by RNA-sequencing in Cattle Divergently Selected for Residual Feed Intake. Asian- Australasian journal of animal sciences, v. 29, n. 10, p. 1371–1382, out. 2016. ALEXANDRE, P. A. et al. Exploring the Regulatory Potential of Long Non-Coding RNA in Feed Efficiency of Indicine Cattle. Genes, v. 11, n. 9, p. 1–22, set. 2020. BASARAB, J. A. et al. Residual feed intake and body composition in young growing cattle. Canadian Journal of Animal Science, v. 83, n. 2, p. 189–204, jun. 2003. BHAT, A.A. et al. Role of non-coding RNA networks in leukemia progression, 13 metastasis and drug resistance. Mol Cancer, v. 19, n. 57. Mar. 2020. BAZIN, J. et al. Global analysis of ribosome-associated noncoding RNAs unveils new modes of translational regulation. Proceedings of the National Academy of Sciences of the United States of America, v. 114, n. 46, p. E10018–E10027, nov. 2017. BERGET, S. M.; MOORE, C.; SHARP, P. A. Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proceedings of the National Academy of Sciences, v. 74, n. 8, p. 3171–3175, ago. 1977. BERRY, D. P.; CROWLEY, J. J. CELL BIOLOGY SYMPOSIUM: Genetics of feed efficiency in dairy and beef cattle. Journal of Animal Science, v. 91, n. 4, p. 1594– 1613, abr. 2013. BILLEREY, C. et al. Identification of large intergenic non-coding RNAs in bovine muscle using next-generation transcriptomic sequencing. BMC Genomics, v. 15, n. 1, p. 499, jun. 2014. BOLTON, E. M. et al. Noncoding RNAs in prostate cancer: the long and the short of it. Clinical cancer research : an official journal of the American Association for Cancer Research, v. 20, n. 1, p. 35–43, jan. 2014. BRANCO, R. et al. Consumo alimentar residual de machos nelore selecionados para peso pós-desmame. 46° Reunião Anual da Sociedade Brasileira de Zootecnia, p. 3, jul. 2009. BUCCITELLI, C., SELBACH, M. mRNAs, proteins and the emerging principles of gene expression control. Nature Reviews Genetics, v. 21, n. 10, p. 630-644, oct. 2020. CARBERRY, C. A. et al. Effect of phenotypic residual feed intake and dietary forage content on the rumen microbial community of beef cattle. Applied and Environmental Microbiology, v. 78, n. 14, p. 4949–4958, jul. 2012. CARVALHO, E. B. et al. Differentially expressed mRNAs, proteins and miRNAs associated to energy metabolism in skeletal muscle of beef cattle identified for low and high residual feed intake. BMC genomics, v. 20, n. 1, jun. 2019. CASTLE, J. C. et al. Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nature Genetics, v. 40, n. 12, p. 1416–1425, nov. 2008. CEACERO, T. M. et al. Phenotypic and Genetic Correlations of Feed Efficiency Traits with Growth and Carcass Traits in Nelore Cattle Selected for Postweaning Weight. PLOS ONE, v. 11, n. 8, p. 66, ago. 2016. CECH, T. R.; STEITZ, J. A. The noncoding RNA revolution-trashing old rules to forge new ones. Cell, v. 157, n. 1, p. 77–94, mar. 2014. CESAR, A. S. M. et al. Putative regulatory factors associated with intramuscular fat content. PloS one, v. 10, n. 6, jun. 2015. CHEN, C. et al. Software for pre-processing Illumina next-generation sequencing short read sequences. Source Code for Biology and Medicine, v. 9, n. 1, p. 8– 8, maio 2014. CHEN, L. Statistical and Computational Methods for High-Throughput Sequencing Data Analysis of Alternative Splicing. Statistics in biosciences, v. 5, n. 1, p. 138– 155, maio 2013. CHEN, W. et al. Identification of Predictor Genes for Feed Efficiency in Beef Cattle 14 by Applying Machine Learning Methods to Multi-Tissue Transcriptome Data. Frontiers in genetics, v. 12, fev. 2021. CHEN, Y. G.; SATPATHY, A. T.; CHANG, H. Y. Gene regulation in the immune system by long noncoding RNAs. Nature immunology, v. 18, n. 9, p. 962–972, ago. 2017. CONESA, A. et al. A survey of best practices for RNA-seq data analysis.Genome Biology, v 17, n. 13, jan. 2016. CORCHETE, L. A. et al. Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis. Scientific Reports , v. 10, n. 1, p. 1–15, nov. 2020. CRICK, F. Central Dogma of Molecular Biology. Nature , v. 227, n. 5258, p. 561– 563, nov.1970. CROCE, C. M. Causes and consequences of microRNA dysregulation in cancer. Nature reviews. Genetics, v. 10, n. 10, p. 704–714, out. 2009. DE OLIVEIRA, P. S. N. et al. An integrative transcriptome analysis indicates regulatory mRNA-miRNA networks for residual feed intake in Nelore cattle. Scientific reports, v. 8, n. 1, dez. 2018. DEL CLARO, A. C.; MERCADANTE, M. E. Z.; VASCONCELOS SILVA, J. A. Meta- análise de parâmetros genéticos relacionados ao consumo alimentar residual e a suas características componentes em bovinos. Pesquisa Agropecuaria Brasileira, v. 47, n. 2, p. 302–310, fev. 2012. DEL FABBRO, C. et al. An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis. PLoS ONE, v. 8, n. 12, dez. 2013. DJEBALI, S. et al. Landscape of transcription in human cells. Nature, v. 489, n. 7414, p. 101–108, set. 2012. DOBIN, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England), v. 29, n. 1, p. 15–21, jan. 2013. DU, L. et al. Transcriptome profiling analysis of muscle tissue reveals potential candidate genes affecting water holding capacity in Chinese Simmental beef cattle. Scientific Reports , v. 11, n. 1, p. 1–14, jun. 2021. EDENHOFER, O. et al. Climate Change 2014 Mitigation of Climate Change Working Group III Contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change Edited by. 2014. Disponível em: . ELLINGFORD, J. M. et al. Functional and in-silico interrogation of rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders. bioRxiv, sep. 2019. FANG, S. et al. NONCODEV5: a comprehensive annotation database for long non- coding RNAs. Nucleic acids research, v. 46, n. D1, p. D308–D314, jan. 2018. FANG, X. et al. Comparative Genome-Wide Alternative Splicing Analysis of Longissimus Dorsi Muscles Between Japanese Black (Wagyu) and Chinese Red Steppes Cattle. Frontiers in veterinary science, v. 8, abr. 2021. FAO. How to Feed the World in 2050. Disponível em: . FERRELL, C. L.; JENKINS, T. G. Cow Type and the Nutritional Environment: 15 Nutritional Aspects. Journal of Animal Science, v. 61, n. 3, p. 725–741, set. 1985. FIDELIS, H. A. et al. Residual feed intake, carcass traits and meat quality in Nelore cattle. Meat science, v. 128, p. 34–39, jun. 2017. FINOTELLO, F.; DI CAMILLO, B. Measuring differential gene expression with RNA-seq: challenges and strategies for data analysis. Briefings in Functional Genomics, v. 14, n. 2, p. 130–142, mar. 2015. FONSECA, L. F. S. et al. Differences in global gene expression in muscle tissue of Nelore cattle with divergent meat tenderness. BMC Genomics, v. 18, n. 1, p. 945, dez. 2017. FONTOURA, A. B. P. et al. Associations between feed efficiency, sexual maturity and fertility-related measures in young beef bulls. animal, v. 10, n. 1, p. 96–105, jul. 2016. GAO, D. et al. A survey of statistical software for analysing RNA-seq data. Human Genomics, v. 5, n. 1, p. 56–60, out. 2010. GAO, Y. et al. Analysis of Long Non-Coding RNA and mRNA Expression Profiling in Immature and Mature Bovine (Bos taurus) Testes. Frontiers in Genetics, v. 10, n. JUL, p. 646, jul. 2019. GERACI, F.; SAHA, I.; BIANCHINI, M. Editorial: RNA-Seq Analysis: Methods, Applications and Challenges. Frontiers in Genetics, v. 11, p. 220, mar. 2020. GÓMEZ-GALLEGO, C. et al. Human Breast Milk NMR Metabolomic Profile across Specific Geographical Locations and Its Association with the Milk Microbiota. Nutrients, v. 10, n. 10, p. 1355, 21 set. 2018. GRION, A. L. et al. Selection for feed efficiency traits and correlated genetic responses in feed intake and weight gain of Nelore cattle. Journal of Animal Science, v. 92, n. 3, p. 955–965, jul. 2014. HALPERIN, R. F. et al. Improved methods for RNAseq-based alternative splicing analysis. Scientific Reports, v. 11, n. 1, p. 1–15, maio 2021. HAN, Y. et al. Advanced Applications of RNA Sequencing and Challenges. Bioinformatics and Biology Insights, v. 9, n. Suppl 1, p. 29–46, nov. 2015. HAYES, B. J.; LEWIN, H. A.; GODDARD, M. E. The future of livestock breeding: Genomic selection for efficiency, reduced emissions intensity, and adaptation.Trends in Genetics,v. 29, n 4, p 206-214, abr. 2013. HE, H.; LIU, X. Characterization of Transcriptional Complexity during Longissimus Muscle Development in Bovines Using High-Throughput Sequencing. PLoS ONE, v. 8, n. 6, jun. 2013. HEGARTY, R. S. et al. Cattle selected for lower residual feed intake have reduced daily methane production. Journal of Animal Science, v. 85, n. 6, p. 1479–1486, jun. 2007. HOBUSS, L.; BÄR, C.; THUM, T. Long non-coding RNAs: At the heart of cardiac dysfunction? Frontiers in Physiology, v. 29, n. 10, p. 30, jan, 2019. HOLDEN, N. M. et al. Review of the sustainability of food systems and transition using the Internet of Food. npj Science of Food 2018 2:1, v. 2, n. 1, p. 1–7, 9 out. 2018. HON, C. C. et al. An atlas of human long non-coding RNAs with accurate 5’ ends. Nature, v. 543, n. 7644, p. 199–204, 9 mar. 2017. 16 HRDLICKOVA, R.; TOLOUE, M.; TIAN, B. RNA-Seq methods for transcriptome analysis. WIREs RNA.v. 8,n. 1, jan. 2017. HUANG, W.; KHATIB, H. Comparison of transcriptomic landscapes of bovine embryos using RNA-Seq. BMC Genomics, v. 11, n. 1, p. 1–10, dez. 2010. HUANG, W.; LONG, N.; KHATIB, H. Genome-wide identification and initial characterization of bovine long non-coding RNAs from EST data. Animal Genetics, v. 43, n. 6, p. 674–682, dez. 2012. HUBÉ, F.; FRANCASTEL, C. Coding and non-coding RNAs, the frontier has never been so blurred. Frontiers in Genetics, v. 9, n. APR, abr. 2018. JIANG, W.; CHEN, L. Alternative splicing: Human disease and quantitative analysis from high-throughput sequencing. Computational and Structural Biotechnology Journal, v. 19, p. 183, jan. 2021. KALSOTRA, A.; COOPER, T. A. Functional consequences of developmentally regulated alternative splicing. Nature Reviews. Genetics, v. 12, n. 10, p. 715, out. 2011. KAPUSTA, A.; FESCHOTTE, C. Volatile evolution of long noncoding RNA repertoires: mechanisms and biological implications. Trends in Genetics : TIG, v. 30, n. 10, p. 439–452, set. 2014. KELEMEN, O. Function of alternative splicing. Gene, v. 514, n.1, p. 1-30, fev. 2013. KIM, D. et al. TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, v. 14, n. 4, abr. 2013. KIRK, J. M. et al. Functional classification of long non-coding RNAs by k-mer content. Nature genetics, v. 50, n. 10, p. 1474–1482, out. 2018. KOCH, R. M. et al. Efficiency of Feed Use in Beef Cattle. Journal of Animal Science, v. 22, n. 2, p. 486–494, maio 1963. KOSINSKA-SELBI, B.; MIELCZAREK, M.; SZYDA, J. Review: Long non-coding RNA in livestock. Animal, v. 14, n. 10, p. 2003–2013, jan. 2020. LAMBERT S.A. et al. The Human Transcription Factors. Cell Reports. v. 4, n. 75, p. 598-599, ago. 2018. LANGFELDER, P.; MISCHEL, P. S.; HORVATH, S. When Is Hub Gene Selection Better than Standard Meta-Analysis? PLoS ONE, v. 8, n. 4, abr. 2013. LANNA, D. P.; DE ALMEIDA, R. Residual feed intake: um novo critério de seleção?. V Simpósio da Sociedade Brasileira de Melhoramento Animal .jul, 2003. LAU, E. et al. Splice-Junction-Based Mapping of Alternative Isoforms in the Human Proteome. Cell Reports, v. 29, n. 11, p. 3751- 3765, dez. 2019. LI, G. et al. Long noncoding RNA plays a key role in metastasis and prognosis of hepatocellular carcinoma. BioMed Research International, v. 2014, jun. 2014. LI, H. et al. Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks. Methods in Molecular Biology, v. 1558, p. 415–436, jul. 2017a. LI, J.; LIU, C. Coding or noncoding, the converging concepts of RNAs. Frontiers in Genetics, v. 10, n. MAY, p. 496, apr. 2019. 17 LUCO, R. F.; MISTELI, T. More than a splicing code: integrating the role of RNA, chromatin and non-coding RNA in alternative splicing regulation. Current opinion in genetics & development, v. 21, n. 4, p. 366, ago. 2011. MACMANES, M. D. On the optimal trimming of high-throughput mRNA sequence data. Frontiers in genetics, v. 5, n. JAN, jan. 2014. MALANE M.M., et al. Structural variants affecting mRNAs isoforms splice sites associated with marbling in Nellore cattle. Journal of Animal Science, v. 98, n. 4, p. 24–25, Nov. 2020. MARTENS-UZUNOVA, E. S.; OLVEDY, M.; JENSTER, G. Beyond microRNA - Novel RNAs derived from small non-coding RNA and their implication in cancer. Cancer Letters, v. 340, n. 2, p. 201–211, nov. 2013. MARTIN, J. A.; WANG, Z. Next-generation transcriptome assembly. Nature Reviews Genetics 2011 12:10, v. 12, n. 10, p. 671–682, set. 2011. MARTIN, P. Climate change, complexity, agriculture and challenged governance. In: Research Handbook on Climate Change and Agricultural Law.p. 74–102. fev. 2013. MARTIN, P. et al. Invited review: Disentangling residual feed intake—Insights and approaches to make it more fit for purpose in the modern context. Journal of Dairy Science, v. 104, n. 6, p. 6329–6342, jun. 2021. MATLIN, A. J.; CLARK, F.; SMITH, C. W. J. Understanding alternative splicing: towards a cellular code. Nature Reviews Molecular Cell Biology 2005 6:5, v. 6, n. 5, p. 386–398, maio 2005. MATLIN, A. J.; MOORE, M. J. Spliceosome assembly and composition. Advances in experimental medicine and biology, v. 623, p. 14–35, jun. 2007. MATTICK, J. S. Non-coding RNAs: the architects of eukaryotic complexity. EMBO reports, v. 2, n. 11, p. 986–991, jul. 2001. MBANDI, S. K. et al. A glance at quality score: Implication for de novo transcriptome reconstruction of Illumina reads. Frontiers in Genetics, v. 5, n. FEB, p. 17, feb. 2014. MEHRBAN, H. et al. Genetic parameters and correlations of related feed efficiency, growth, and carcass traits in Hanwoo beef cattle. Animal Bioscience, v. 34, n. 5, p. 824, maio. 2021. MENEZES, A. C. B. et al. Feeding behavior, water intake, and energy and protein requirements of young Nelore bulls with different residual feed intakes. Journal of animal science, v. 98, n. 9, set. 2020. MITSIS et al. Transcription factors and evolution: An integral part of gene expression (Review). World Academy of Sciences Journal, v. 2, p. 3-8, jan. 2020. MOULOS, P.; HATZIS, P. Systematic integration of RNA-Seq statistical algorithms for accurate detection of differential gene expression patterns. Nucleic acids research, v. 43, n. 4, fev. 2015. MUKIIBI, R. et al. Liver transcriptome profiling of beef steers with divergent growth rate, feed intake, or metabolic body weight phenotypes1. Journal of animal science, v. 97, n. 11, p. 4386–4404, nov. 2019. MUKIIBI, R. et al. Bovine hepatic miRNAome profiling and differential miRNA expression analyses between beef steers with divergent feed efficiency 18 phenotypes. Scientific reports, v. 10, n. 1, dez. 2020. NAKKA, K. et al. Diversification of the muscle proteome through alternative splicing. Skeletal Muscle, v. 8, n. 1, p. 8, mar. 2018. NAKKA, K. et al. Diversification of the muscle proteome through alternative splicing. Skeletal Muscle, v. 8, n. 1, p. 8, mar. 2018. NASCIMENTO, C. F. et al. Residual feed intake and blood variables in young Nelore cattle. Journal of animal science, v. 93, n. 3, p. 1318–1326, mar. 2015 NILSEN, T. W.; GRAVELEY, B. R. Expansion of the eukaryotic proteome by alternative splicing. Nature, v. 463, n. 7280, p. 457–463, jan. 2010. NKRUMAH, J. D. et al. Relationships of feedlot feed efficiency, performance, and feeding behavior with metabolic rate, methane production, and energy partitioning in beef cattle. J Anim Sci, v. 84, n. 1, p. 145–153, jan. 2006. OLIVIERI, B. F. et al. Differentially expressed genes identified through RNA-seq with extreme values of principal components for beef fatty acid in Nelore cattle. Journal of animal breeding and genetics, v. 138, n. 1, p. 80–90, jan. 2021. PARK, E. et al. The Expanding Landscape of Alternative Splicing Variation in Human Populations. The American Journal of Human Genetics, v. 102, n. 1, p. 11–26, jan. 2018. PERKINS, J. R. et al. A comparison of RNA-seq and exon arrays for whole genome transcription profiling of the L5 spinal nerve transection model of neuropathic pain in the rat. Molecular Pain, v. 10, n. 1, p. 7, jan. 2014. POORE, J.; NEMECEK, T. Reducing food’s environmental impacts through producers and consumers. Science, v. 360, n. 6392, p. 987–992, jul. 2018a. PORTER, J. R. et al. Food security and food production systems. Climate Change 2014 Impacts, Adaptation and Vulnerability: Part A: Global and Sectoral Aspects, p. 485–534, jul. 2014. QU, Z.; ADELSON, D. L. Bovine ncRNAs are abundant, primarily intergenic, conserved and associated with regulatory genes. PLoS ONE, v. 7, n. 8, ago. 2012. RICHARDSON, E. C.; HERD, R. M. Biological basis for variation in residual feed intake in beef cattle. 2. Synthesis of results following divergent selection. Australian Journal of Experimental agriculture., v. 44, n. 4–5, p. 431–440, jan. 2004. RODRIGUEZ, J. M. et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Research, v. 41, n. Database issue, p. D110-7, nov. 2012. ROJAS-DOWNING, M. M. et al. Climate change and livestock: Impacts, adaptation, and mitigation. Climate Risk Management, v. 16, p. 145–163, apr. 2017. RÖNNAU, C. G. H. et al. Noncoding RNAs as Novel Biomarkers in Prostate Cancer. BioMed Research International, v. 2014, jun. 2014. ROSEGRANT, M. W. et al. Looking into the future for agriculture and AKST. Agriculture at a Crossroads, global report. Washington, DC, USA: Island Press. pp.307-376. jun. 2009. ROY, B.; M. HAUPT, L.; R. GRIFFITHS, L. Review: Alternative Splicing (AS) of Genes As An Approach for Generating Protein Complexity. Current Genomics, v. 14, n. 3, p. 182–194, maio 2013. 19 ROY, S. W.; GILBERT, W. The evolution of spliceosomal introns: patterns, puzzles and progress. Nature reviews. Genetics, v. 7, n. 3, p. 211–221, mar. 2006. RUBIN, G. M. et al. Comparative genomics of the eukaryotes. Science (New York, N.Y.), v. 287, n. 5461, p. 2204–2215, mar. 2000. SACKS, D. et al. Multisociety Consensus Quality Improvement Revised Consensus Statement for Endovascular Therapy of Acute Ischemic Stroke. International Journal of Stroke, v. 13, n. 6, p. 612–632, ago. 2018. SAMMETH, M. Complete alternative splicing events are bubbles in splicing graphs. Journal of Computational Biology, v. 16, n. 8, p. 1117–1140, ago. 2009. SAMMETH, M.; FOISSAC, S.; GUIGÓ, R. A General Definition and Nomenclature for Al-ternative Splicing Events. PLoS Computational Biology, v. 4, n. 8, nov. 2008. SANTANA, M. H. A. et al. Genome-wide association study for feedlot average daily gain in Nelore cattle ( Bos indicus ). Journal of Animal Breeding and Genetics, v. 131, n. 3, p. 210–216, jun. 2014. SCHWARZENBACH, H. et al. Clinical relevance of circulating cell-free microRNAs in cancer. Nature Reviews Clinical Oncology, v. 11, n. 3, p. 145–156, mar. 2014. SILVA, D.B.S., et al. Spliced genes in muscle from Nelore Cattle and their association with carcass and meat quality. Scientific Reports, v. 10, n. 14701, sep. 2020. SERNA, M. et al. Ovarian Transcriptomic Analysis Reveals Differential Expression Genes Associated with Cell Death Process after Selection for Ovulation Rate in Rabbits. Animals, v. 10, n. 10, p.1924. oct 2020. SEYEDNASROLLAH, F.; LAIHO, A.; ELO, L. L. Comparison of software packages for detecting differential expression in RNA-seq studies. Briefings in Bioinformatics, v. 16, n. 1, p. 59–70, jan. 2015. SILVA-VIGNATO, B. et al. Comparative muscle transcriptome associated with carcass traits of Nelore cattle. BMC Genomics, v. 18, n. 1, p. 506, dez. 2017. SILVA DE OLIVEIRA, J. et al. Fisiologia, manejo e alimentação de bezerros de corte. arq. Ciênc. Vet. Zool. Unipar. v. 10, n. 1, 2007. SLOAN, K. E. et al. Tuning the ribosome: The influence of rRNA modification on eukaryotic ribosome biogenesis and function. RNA Biology, v. 14, n. 9, p. 1138– 1152, dez. 2016. STALEY, J. P.; GUTHRIE, C. Mechanical Devices of the Spliceosome: Motors, Clocks, Springs, and Things. Cell, v. 92, n. 3, p. 315–326, fev. 1998. STARK, R.; GRZELAK, M.; HADFIELD, J. RNA sequencing: the teenage years. Nature Reviews Genetics 2019 20:11, v. 20, n. 11, p. 631–656, jul. 2019. STATELLO, L. et al. Gene regulation by long non-coding RNAs and its biological functions. Nature Reviews Molecular Cell Biology 2020 22:2, v. 22, n. 2, p. 96– 118, dez. 2020. SU, Z. et al. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nature Biotechnology 2014 32:9, v. 32, n. 9, p. 903–914, ago. 2014. SUGNET, C. W. et al. Transcriptome and genome conservation of alternative splicing events in humans and mice. Pacific Symposium on Biocomputing. 20 Pacific Symposium on Biocomputing, p. 66–77, jul. 2004. SUN, H. Z. et al. Gene co-expression and alternative splicing analysis of key metabolic tissues to unravel the regulatory signatures of fatty acid composition in cattle. RNA biology, v. 18, n. 6, p. 854–862, nov. 2021. SUN, L.; LIN, J. D. Function and Mechanism of Long Noncoding RNAs in Adipocyte Biology. Diabetes, v. 68, n. 5, p. 887–896, feb. 2019. SUN, Q.; HAO, Q.; PRASANTH, K. V. Nuclear Long Noncoding RNAs: Key Regulators of Gene Expression. Trends in Genetics : TIG, v. 34, n. 2, p. 142– 157, fev. 2018. TAPIAL, J. et al. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome research, v. 27, n. 10, p. 1759–1768, out. 2017. TIZIOTO, P. C. et al. Global liver gene expression differences in Nelore steers with divergent residual feed intake phenotypes. BMC Genomics, v. 16, n. 1, p. 1–14, jul. 2015. TIZIOTO, P. C. et al. Gene expression differences in Longissimus muscle of Nelore steers genetically divergent for residual feed intake. Scientific Reports, v. 6, n 9. Dec. 2016. TRAPNELL, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, v. 28, n. 5, p. 511–515, maio. 2010a. VANICHKINA, D. P. et al. Title: Challenges in defining the role of intron retention in normal biology and disease. Semin Cell Dev Biol. n. 75, p. 0-49. jul. 2017. WANG, J.; KOGANTI, P. P.; YAO, J. Systematic identification of long intergenic non-coding RNAs expressed in bovine oocytes. Reproductive Biology and Endocrinology, v. 18, n. 1, p. 1–9, fev. 2020. WANG, K. C.; CHANG, H. Y. Molecular mechanisms of long noncoding RNAs. Molecular cell, v. 43, n. 6, p. 904–914, set. 2011. WANG, X. G. et al. Deciphering Transcriptome and Complex Alternative Splicing Transcripts in Mammary Gland Tissues from Cows Naturally Infected with Staphylococcus aureus Mastitis. PLoS ONE, v. 11, n. 7, p. e0159719, jan. 2016. WEIKARD, R.; HADLICH, F.; KUEHN, C. Identification of novel transcripts and noncoding RNAs in bovine skin by deep next generation sequencing. BMC Genomics, v. 14, n. 1, nov. 2013. WILL, C. L.; LÜHRMANN, R. Spliceosome structure and function. Cold Spring Harbor perspectives in biology, v. 3, n. 7, p. 1–2, jul. 2011. WILLIAMS, C. R. et al. Trimming of sequence reads alters RNA-Seq gene expression estimates. BMC bioinformatics, v. 17, n. 1, fev. 2016. UN. World 2019 Revision of World Population Prospects. United Nations. Disponível em: . UN. Goal 2: Zero Hunger - United Nations Sustainable Development. Disponível em: . WRIGHT, I. A. et al. Integrating crops and livestock in subtropical agricultural systems. Journal of the Science of Food and Agriculture, mar. 2012. . XU, X. et al. Global greenhouse gas emissions from animal-based foods are twice those of plant-based foods. Nature Food 2021 2:9, v. 2, n. 9, p. 724–732, set. 2021. 21 XUAN, J. et al. Next-generation sequencing in the clinic: promises and challenges. Cancer letters, v. 340, n. 2, p. 284–295, nov. 2013. Y, W. et al. Analysis of the physical meat quality in partridge (Alectoris chukar) and its relationship with intramuscular fat. Poultry science, v. 99, n. 2, p. 1225–1231, fev. 2020. YANG, C. et al. The impact of RNA-seq aligners on gene expression estimation. Conference on Bioinformatics, Computational Biology and Biomedicine, v. 2015, p. 462–471, set. 2015. 22 CHAPTER 2 - Transcriptome profiling of liver in Nelore Cattle phenotypically divergent for RFI in two genetic groups ABSTRACT – The identification and selection of genetically superior animals for Residual Feed Intake (RFI) resulting in improving food efficiency could enhance productivity and simultaneously minimize environmental impact. The aim of this study was to use RNA-seq data to identify differentially expressed genes (DEGs), specific biomarkers and enriched biological processes associated with RFI of liver in Nelore cattle in two genetic groups. The animals used in this study belong to two genetic groups. In the first group (G1), the animals (n=24) were from a stabilized Nelore herd submitted to RFI evaluation: 12 low RFI (LRFI) and 12 high RFI (HRFI). The samples from their liver tissues were frozen at -80 °C for subsequent extraction of total RNA. The libraries (TruSeq RNA Library) were constructed according to the manufacturer's protocol and sequenced using HiSeq 2500 System (Illumina). For the second group (G2), 20 samples of liver tissue of Nelore cattle divergent for RFI were selected. The RNA-seq was performed using a HiSeq 2000 System (Illumina®). The raw data of G2 were chosen from the ENA repository (EMBL-EBI). The same pipeline for RNA-seq for both genetic groups was performed: the low quality was trimmed using Trimmomatic software; HitSat2 software was used to map paired end reads; the scrip own was used to assemble the aligned reads for each sample and to estimate the number of counts per million (CPM). EdgeR software normalized reads and analysis differences were used. GSEA software and Pathway Studio (Elsevier) were used for differential and enrichment analysis, respectively. 1,811 DEGs were found for G1 and 2,054 for G2 (p-value 0.05). Common biological pathways, such as energy metabolism, protein turnover, redox homeostasis and immune response, were observed. We detected 88 common genes in both genetic groups, of which 33 were involved in immune response and blocking oxidative stress. In addition, seven (B2M, ADSS, SNX2, TUBA4A, ARHGAP18, MECR, ABCF3) possible gene biomarkers were found (in the 88 common ones), considering AUC > 0.70. The B2M gene was overexpressed in LRFI. This gene regulates the lipid metabolism protein turnover and inhibits cell death. We also found non-coding RNAs in both groups. MIR25 was up-regulated and SNORD16 was down-regulated in RLFI for G1. For G2, up- regulated RNase_MRP and SCARNA10 were found. We highlight MIR25 being able to act by blocking cytotoxicity and oxidative stress and RMRP as a blocker of mitochondrial damage. Keywords: Bovine, biomarkers, feed efficiency, liver tissue, RNA-Seq. 23 1. Background The world population is concerned about the environment and fulfilling Sustainable Development Goals (SDGs), such as SDG 13 (Climate Action) (UN, 2021); moreover, growing pressure from the media is putting the focus, in recent times, on livestock. Thirty-seven percent of greenhouse gas (GHG) emissions are partly bought about by global food supply even though the world population, particularly the high-income countries, has been recommended to reduce global meat consumption (MARTIN, 2017; POORE; NEMECEK, 2018), despite the recommendation of the World Health Organization (WHO) to consume 500 grams per week (SACKS et al., 2018). Brazil has the advantage of mostly having an extensive production system, which is more sustainable. It produces and exports most of the meat consumed by European countries, but it could benefit from increasing its market, as long as it seeks sustainable production (Meat|OECD- FAO, 2020). Furthermore, Brazil is the second highest meat consumer per capita (42.12 kg / year) (ABIEC, 2019), 80% of which is beef consumption. In the context of climate change and a more restrictive environmental legislation, bovine meat production is under close observation. The European market is particularly demanding wit1h regard to increasing sustainability. Food intake on farms is approximately 60% of the total cost of a livestock farm, therefore selection of more efficient animals would reduce production costs and increase the profitability of the agricultural system (SILVA DE OLIVEIRA et al., 2007; DEL CLARO; MERCADANTE; VASCONCELOS SILVA, 2012). The identification and selection of genetically superior animals for Residual Feed Intake (RFI) (more efficient animals ingest less feed than estimated for the same weight gain compared with less efficient animals) would reduce feed costs, thereby increasing profits and minimizing environmental impact (YANG et al., 2022; BASARAB et al., 2003; FORESTRY, 2006). There are various studies on the RFI trait in bovines from different breeds, genders and management systems (FONSECA et al 2015; TIZIOTO et al., 2015; CARDOSO et al., 2018; CHEN et al., 2021; DU et al., 2021; MARTIN et al., 2021), but little experimental information has been published that is thorough enough to unravel the biological regulation trait. The expression of the potential for feed efficiency is complex and depends on the interaction of numerous biochemical pathways through a multitude of tissues. The liver constitutes the main location of energy and substance metabolism (YANG et al., 2022). It can maintain glucose homeostasis in the blood and participates in protein biosynthesis as well as 24 immune and detoxification functions (TREFTS; GANNON; WASSERMAN, 2017). According to FERRELL and JENKINS (1985) in beef cattle, 70% of the total energy is used for maintenance, which illustrates the need for understanding biological mechanisms to optimize the regulation of maintenance. The non-coding RNAs have been shown to regulate gene expression (AGOSTINO et al., 2022). Non-coding RNAs (ncRNAs) are generated from the larger part of the genome that does not encode proteins but produces transcripts that regulate gene expression and protein function (WINKLE et al., 2021). ncRNAs also have the potential to be superior to established protein-based biomarkers (RÖNNAU et al., 2014), however the use of these is still limited by the wide range of concentrations and difficult detection when there is low abundance (VELONAS et al., 2013). In bovines, ncRNAs were reported as regulator of adipose tissue deposition (LEI et al., 2021) and fertility traits (GAO et al., 2019; WANG; KOGANTI; YAO, 2020). In Nelore cattle, ncRNAs were associated with intramuscular fat trait (DE OLIVEIRA et al., 2018). For RFI trait, there is studies to indicine Cattle (ALEXANDRE et al., 2020) Angus cattle (AL-HUSSEINI et al., 2016) and another Bos taurus breeding (NOLTE et al., 2020). Furthermore, studies to detect noncoding for Nelore cattle are still limited. Studying all transcriptomes in liver tissue associated with the RFI trait allows us to observe the influence and complexity of gene network interactions. Furthermore, it allows us to understand the molecular mechanisms involved in more efficient animals. The aim of this study was to use RNA-seq data to identify DEGs, specific biomarkers, non-coding RNAs detection and enriched biological processes associated with RFI of liver in Nelore cattle in two genetic groups. 2. Methods 2.2. Sampling method In this study, RNA-Seq data from two genetics groups (G1 and G2) of Nelore herds were evaluated for RFI. The animals constitutes two genetic lineages from two Brazilian researches institutes and they underwent a selection process as described below: 2.2.1. Genetic group 1 (G1) The animals belonged to a stabilized Nelore herd, with 160 matrices and 8 bulls, from the Institute of Animal Science (Brazil). Since 1978, animals have been 25 selected within the herd for yearling weight (about 8% of males and 60% of females) based on individual yearling weight performance. Animals of the entire contemporary group (n = 60) had been submitted to weight gain test for a period of up 4 months ( May 4 and October 19, 2010). The animals were weighed without prior fasting three times per week on consecutive days estimated the RFI model proposed by KOCH et al., (1963). After the weight gain test, 24 animals were selected based on RFI values in order to compose a representative sample of more (low RFI, LRFI) and less efficient (high RFI, HRFI) animals. These animals were slaughtered and samples from liver tissue were collected immediately and stored in an RNA holder (BioAgency, São Paulo, Brazil) at -80 C until RNA extraction. Liver tissue samples (50-80 mg) were collected immediately after slaughter and stored in a 15 mL Falcon tube containing 5 mL of RNA holder (BioAgency, São Paulo, SP, Brazil). The samples were frozen at -80 °C for subsequent total RNA extraction. For more details, see FONSECA et al., (2015). Total RNA was extracted using the RNeasy Lipid Tissue Mini Kit (Qiagen, CA, USA) according to the manufacturer’s protocol. The purity of the extracted RNA was determined by evaluating absorbance using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Santa Clara, CA, USA, 2007). The quality of the RNA samples was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA, 2009) and the RNA concentration and genomic DNA contamination were measured using a Qubit® 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA, 2010). The mRNA libraries for sequencing were prepared using the TruSeq RNA Sample Preparation Kit® (Illumina, San Diego, CA, USA) according to the manufacturer's protocol. Libraries were pooled to enable multiplexed sequencing and, on average, were generated 25 million reads per sample. RNA sequencing (RNA- Seq) was performed using a HiSeq 2500 System (Illumina®) that generated 100 bp paired-end reads. 2.2.2. Genetic group 2 (G2) These steers comprised half-sib families produced by the artificial insemination of commercial and purebred Nelore dams, derived from 18 sires representing the main breeding lineages commercialized in Brazil. The 83 calves used in this expression study were allocated to feedlots in Embrapa Southeast Research. Measures of daily feed intake were collected for at least 70 days and body weight 26 was measured every 14 days. Liver samples were available for only 83 of the animals, which were ranked for RFI to select 20 animals genetically divergent for RFI. The raw data of G2 were chosen from ENA repository (EMBL-EBI) under accession PRJEB7696. Total RNA from liver tissue samples of 20 Nelore animals divergent for RFI were selected. The RNA-Seq was performed using the HiSeq 2000 System (Illumina®). Paired-end reads (2 × 100 bp) were produced. All the information about the methodology used can be found in TIZIOTO et al. (2015). 2.3. Transcriptome analyses Analyses were performed separately for each genetic group, using the same pipeline. All samples from G1 and G2 were compared using two groups; Low Residual feed Intake, LRFI (efficient) and High Residual Feed Intake , HRFI (inefficient) groups. The quality of RNA-Seq reads (quality indices, GC content, N content, length distributions, duplication, overrepresented sequences and K-mer content) was verified using Fastqc software (v.0.11.4) (WINGETT; ANDREWS, 2018). The low quality reads (adapter sequence and reads containing poly-N) from raw data using Trimmomatic v.0.36 (BOLGER; LOHSE; USADEL, 2014) with following parameters: java -jar /usr/local/bin/trimmomatic-0.36.jar PE -phred33 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50 All downstream analysis was based on the trimmed data with high quality reads. HISAT2 v.2.0.5 (KIM; LANGMEAD; SALZBERG, 2015) was used to align the paired-end trimmed reads to the bovine reference genome (ARS-UCD1.2; Bos taurus) deposited in National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/). This reads counting were carried out using the own script and Ensembl (Index of /pub/release-105/gtf/bos_taurus/ (ensembl.org)) gene annotations (scrip in Additional File 1). EdgeR packages available from the Bioconductor project was used to estimation of normalized CPM reads (counts per million) and identify differentially expressed genes (DEGs) between RFI groups (OSHLACK; ROBINSON; YOUNG, 2010; LAW et al. 2016). We considered a gene to be differentially expressed when the p-value ≤0.05. https://www.ncbi.nlm.nih.gov/ http://ftp.ensembl.org/pub/release-105/gtf/bos_taurus/ 27 2.3.1. Non-coding RNAs analysis An alignment of the paired-end trimmed reads were performed with the Bos taurus bovine reference genome ncRNAs file downloaded from the Ensembl database:https://www.ensembl.org/Bos_taurus/Info/Index through HISAT2 v.2.0.5 (KIM; LANGMEAD; SALZBERG, 2015). This reads counting were carried out using the own script and Ensembl (Index of /pub/release-105/gtf/bos_taurus/ (ensembl.org)) gene annotations. Only reads showing a MAPQ equal to 0 were filtered out in order to remove noise related to multiple mapping sites. Counts for each non-coding RNA detected was obtained by using an in-house script, keeping only those non-coding RNAs with a mean count value higher or equal to 25 mean reads per sample. The program was used to estimation of normalized CPMs reads (counts per million) and identify differentially expressed RNA non-coding between RFI groups was used the EdgeR packages available from the Bioconductor project (OSHLACK; ROBINSON; YOUNG, 2010; SMYTH et al., 2016). We considered a gene to be differentially expressed when the p-value ≤0.05. 2.3.2. Gene set enrichment analysis To investigate biological processes and differentially expressed molecular functions (enrichment analysis) we used functional annotation gene set enrichment analysis, GSEA software (SUBRAMANIAN et al., 2005) using MSigDB database v7.5. Pathway Studio software v.10 (Elsevier Inc., Rockville, MD, USA) was used to identify the main biological processes and pathways with database ResNet v11. 2.3.3. Prediction of footprint gene profile To obtain RFI predictors or biomarkers receiver operating characteristic curves (ROC) were generated. For analysis used genefilters package (Bioconductor) with the used function "rowpAUCs". The function has applied on the normalized values (counts) in two genetic goups (independent). Statistical correction (FDR) is not applied. Rowwise calculation of Receiver Operating Characteristic (ROC) curves and the corresponding partial area under the curve (pAUC) for a given data matrix or ExpressionSet. Cutpoints (theta are calculated before the first, in between and after the last data value. By default, both classification rules x > theta and x < theta are tested and the (partial) area under the curve of the better one of the two i returned. This is only valid for symmetric cases, where the classification is independent of the magnitude of x (e.g., both over- and under-expression of 28 different genes in the same class). An AUC (Area under the ROC Curve) of 0.5 represents a test with no discriminating ability (ie, no better than chance), while an AUC of 1.0 represents a test with perfect discrimination (HOO; CANDLISH; TEARE, 2017). AUC of the proposed test was estimated to be >0.70. Results Alignment statistics The alignment parameters pipeline were the same to genetic group 1 and genetic group 2. The mean number of paired end reads per sample after filtering was approximately 19.4 million reads for G1 and for G2 were 20 million reads. Of the trimmed reads were around 86% for G1 and around 88% G2 mapped with bovine reference genome ARS-UCD1.2 Bos Taurus (Additional Table 1). The box plot containing CPMs for each group, the plot of principal component analysis (PCA), the correlation distance and Euclidian distance showed the formation of different groups (highest and lowest RFI), indicating differences in the expression of genes between the LRFI and HRFI groups (see Additional file: Figure S2). The distribution of quartiles, on box plot, was consistent between groups, indicating high quality of the data. In addition, the medians were similar in the two groups, indicating that the level of sequencing coverage permitted the identification of low-expressed genes. Differential expression analysis of RNA seq data Nelore animals from two genetic groups (G1 and G2), each divided into two groups, divergent for RFI, were compared. Results from the analysis of differential expression between LRFI and HRFI (p≤0.05) for G1 showed 1,811 DEGs (Table 2, Figures 1 and 2), 1,007 being underexpressed (55.6%) and 804 overexpressed (44.4%). Genetic group 2 showed 2,054 DEGs of which 1,140 were underexpressed (55.5%) and 914 were overexpressed (44.5%). 29 Figure 1. Distribution of the differentially expressed genes across the chromosomes. in the G1 and G2. The 88 common DEGs obtained for both genetics groups with the same biological orientation (Fold-change) (Figure 3 and Table 2). Figure 2. DEGs genes in G1 and G2 and common genes with the same biological orientation. Table 2. Eighty-eight common DEGs in G1 and G2 for LRFI (p≤0.05). Genetic group 1 Genetic group 2 Gene Name ID Gene Symbol Fold change (log2)* p- value padj Fold change (log2)* p- value padj Ankyrin Repeat y SOCS Box Protein 8 ENSBTAG00000000289 ASB8 -0.1495 0.0253 0.571 -0.1683 0.0056 0.308 ATP Binding Cassette Subfamily F Member 3 ENSBTAG00000006920 ABCF3 -0.1319 0.0412 0.571 -0.1681 0.0056 0.308 30 Vascular cell adhesion molecule 1-like LOC534578 0.8064 0.0011 0.571 0.4097 0.0253 0.445 Integral Membrane Protein 2B ENSBTAG00000003109 ITM2B 0.1688 0.0025 0.571 0.1242 0.0308 0.462 Contactin 4 ENSBTAG00000033225 CNTN4 0.9359 0.0066 0.571 0.6749 0.0472 0.51 Pleckstrin Homology Domain Containing B2 ENSBTAG00000000941 PLEKHB2 0.3896 0.0078 0.571 0.1984 0.0323 0.468 Zinc finger protein 665 ENSBTAG00000000943 ZNF286A -0.2862 0.0085 0.571 -0.1821 0.0373 0.483 Transmembrane And Coiled-Coil Domain Family 1 ENSBTAG00000006614 TMCC1 -0.2 0.0093 0.571 -0.2596 0.0009 0.22 Adenylosuccinate Synthase 2 ENSBTAG00000185100 ADSS 0.276 0.0097 0.571 0.1704 0.0267 0.451 Methionine Adenosyltransferase 2B ENSBTAG00000012350 MAT2B 0.2139 0.0097 0.571 0.1131 0.0435 0.5 Major histocompatibility complex, class II, DR alpha Cyclin B3 BoLA-DRB3 0.4489 0.0098 0.571 0.3182 0.0262 0.45 Syntaxin 7 ENSBTAG00000017139 STX7 0.2613 0.0106 0.571 0.1475 0.0479 0.51 TNF Alpha Induced Protein 8 ENSBTAG00000014551 TNFAIP8 0.3635 0.0107 0.571 0.2624 0.009 0.35 F-Box Protein 15 ENSBTAG00000009686 FBXO15 -0.4387 0.0108 0.571 -0.4727 0.0367 0.48 LRR Binding FLII Interacting Protein 1 ENSBTAG00000005354 LRRFIP1 0.327 0.0108 0.571 0.2132 0.0062 0.312 Toll Like Receptor 5 ENSBTAG00000000477 TLR5 0.5309 0.0112 0.571 0.8541 0.0159 0.394 Joining Chain Of Multimeric IgA And IgM ENSBTAG00000018531 JCHAIN 0.5275 0.0124 0.571 0.3372 0.0146 0.387 Uncharacterized LOC515089 -0.2801 0.0124 0.571 -0.3338 0.0054 0.306 Cyclin B3 ENSBTAG00000046286 CCNB3 -0.3393 0.0131 0.571 -0.437 0.0166 0.396 Annexin A1 ENSBTAG00000015978 ANXA1 0.3743 0.0136 0.571 0.2777 0.0171 0.401 Inositol Polyphosphate-5- Phosphatase K ENSBTAG00000011479 INPP5K -0.1223 0.0138 0.571 -0.1309 0.0405 0.494 Adhesion G Protein- Coupled Receptor A3 ENSBTAG00000004653 ADGRA3 -0.1803 0.0139 0.571 -0.1609 0.0074 0.333 Zinc Finger CCHC- Type Containing 7 ENSBTAG00000025859 ZCCHC7 -0.2154 0.0143 0.571 -0.2452 0.0201 0.425 Lymphocyte Cytosolic Protein 1 ENSBTAG00000007079 LCP1 0.5266 0.0145 0.571 0.4347 0.0078 0.338 Major Histocompatibility Complex, Class II, DR Alpha ENSBTAG00000010645 BoLA-DRA 0.4832 0.0157 0.571 0.2185 0.0499 0.512 CD74 Molecule ENSBTAG00000015228 CD74 0.5054 0.0158 0.571 0.2425 0.0213 0.432 Transmembrane Protein 50A ENSBTAG00000001296 TMEM50A 0.1392 0.016 0.571 0.1724 0.0416 0.497 Mannose Receptor C Type 2 ENSBTAG00000015739 MRC2 0.5687 0.0169 0.571 0.3 0.0424 0.5 Uncharacterized LOC788175 0.5953 0.0178 0.571 0.3008 0.027 0.451 Adrenoceptor Alpha 1A ENSBTAG00000031632 ADRA1A -0.2719 0.0193 0.571 -0.2767 0.0002 0.158 Stathmin 1 ENSBTAG00000013761 STMN1 0.4345 0.0194 0.571 0.2428 0.0382 0.487 Helicase, Lymphoid Specific ENSBTAG00000005979 HELLS 0.4887 0.0205 0.571 0.4734 0.0255 0.446 IFI30 Lysosomal Thiol Reductase ENSBTAG00000018349 IFI30 0.4426 0.0206 0.571 0.2357 0.0311 0.463 Mitochondrial Trans- 2-Enoyl-CoA Reductase Helicase, lymphoid-specific ENSBTAG00000017253 MECR -0.2213 0.0217 0.571 -0.2052 0.0095 0.353 DENN Domain Containing 10 FAM45A 0.2852 0.0223 0.571 0.1354 0.027 0.451 Sorting Nexin 2 ENSBTAG00000025358 SNX2 0.214 0.0231 0.571 0.1328 0.0299 0.46 Sorting nexin 6 ENSBTAG00000012873 SNX6 0.1672 0.0234 0.571 0.1112 0.0456 0.508 Rho Guanine Nucleotide Exchange Factor 5 ENSBTAG00000001815 ARHGEF5 -0.2641 0.024 0.571 -0.2617 0.04 0.493 Ribosomal Protein S6 Kinase A1 ENSBTAG00000014447 RPS6KA1 0.252 0.025 0.571 0.252 0.025 0.571 Protein Regulator Of Cytokinesis 1 ENSBTAG00000018643 PRC1 0.4199 0.0252 0.571 0.1908 0.023 0.434 31 Lumican ENSBTAG00000001745 LUM 0.4358 0.0263 0.571 0.3917 0.039 0.488 Damage Specific DNA Binding Protein 2 ENSBTAG00000020999 DDB2 0.1129 0.0264 0.571 0.2711 0.0081 0.342 FYN Binding Protein 1 ENSBTAG00000001492 FYB1 0.3626 0.0266 0.571 0.192 0.024 0.437 Granulin Precursor ENSBTAG00000018823 GRN 0.1808 0.0271 0.571 0.1586 0.0036 0.282 Pyruvate Kinase M1/2 ENSBTAG00000001601 PKM 0.3742 0.0274 0.571 0.1707 0.0364 0.479 Tetratricopeptide Repeat Domain 19 ENSBTAG00000013270 TTC19 -0.1468 0.0278 0.571 -0.1329 0.0498 0.512 ARV1 Homolog, Fatty Acid Homeostasis Modulator ENSBTAG00000021822 ARV1 -0.1859 0.0288 0.571 -0.175 0.0308 0.462 BTB Domain Containing 6 ENSBTAG00000012751 BTBD6 -0.169 0.0291 0.571 -0.0768 0.0388 0.488 Vimentin ENSBTAG00000018463 VIM 0.3079 0.0291 0.571 0.3593 0.0085 0.342 Uncharacterized LOC783680 0.2802 0.0297 0.571 0.1726 0.0247 0.441 NPC Intracellular Cholesterol Transporter 2 NPC2 0.2977 0.0297 0.571 0.3 0.0424 0.5 Kelch Like Family Member 36 ENSBTAG00000008385 KLHL36 -0.1304 0.0301 0.571 -0.1356 0.0355 0.474 Lymphocyte Cytosolic Protein 2 ENSBTAG00000009381 LCP2 0.3456 0.0302 0.571 0.2466 0.0003 0.161 Rho GTPase Activating Protein 18 ENSBTAG00000015381 ARHGAP18 0.3437 0.0304 0.571 0.2553 0.0292 0.458 Calpain 2 ENSBTAG00000012778 CAPN2 0.2296 0.031 0.571 0.1569 0.0408 0.495 Thymus, Brain And Testes Associated ENSBTAG00000021180 TBATA -0.4652 0.0319 0.571 -0.5368 0.0426 0.5 Coactosin Like F- Actin Binding Protein 1 ENSBTAG00000016315 COTL1 0.2718 0.0321 0.571 0.1888 0.0162 0.395 LETM1 Domain Containing 1 ENSBTAG00000010924 LETMD1 -0.1713 0.0335 0.571 -0.1957 0.0442 0.504 Toll Like Receptor 2 ENSBTAG00000008008 TLR2 0.3937 0.0337 0.571 0.3998 0.0008 0.204 Cytochrome B-245 Beta Chain ENSBTAG00000019953 CYBB 0.5508 0.0338 0.571 0.4176 0.0248 0.443 Amyloid Beta Precursor Protein Binding Family B Member 1 Interacting Protein ENSBTAG00000012526 APBB1IP 0.3322 0.034 0.571 0.2228 0.0061 0.31 Uncharacterized LOC112446800 -0.5062 0.0343 0.571 -0.5978 0.0305 0.461 P21 (RAC1) Activated Kinase 1 ENSBTAG00000010191 PAK1 0.318 0.0344 0.571 0.2334 0.0356 0.474 Thymosin Beta 4 X- Linked ENSBTAG00000038488 TMSB4X 0.3005 0.035 0.571 0.3199 0.0032 0.277 Moesin ENSBTAG00000003418 MSN 0.3817 0.0356 0.571 0.1346 0.0447 0.506 Nuclear Receptor Subfamily 4 Group A Member 2 ENSBTAG00000003650 NR4A2 0.4061 0.0358 0.571 0.5458 0.0214 0.433 Integrin Subunit Beta 2 ENSBTAG00000017060 ITGB2 0.4358 0.0359 0.571 0.2014 0.0401 0.493 Myosin IF ENSBTAG00000007661 MYO1F 0.3206 0.0366 0.571 0.1629 0.0369 0.482 GDNF Inducible Zinc Finger Protein 1 ENSBTAG00000008908 GZF1 -0.1394 0.0369 0.571 -0.2676 0.0003 0.161 LDL Receptor Related Protein 5 ENSBTAG00000005903 LRP5 -0.1151 0.038 0.571 -0.0944 0.0411 0.495 Serine Palmitoyltransferase Small Subunit B ENSBTAG00000049223 SPTSSB 0.4723 0.04 0.571 0.5892 0.0184 0.413 CD63 Molecule ENSBTAG00000011931 CD63 0.0771 0.0402 0.571 0.0907 0.0178 0.408 Myosin Binding Protein H ENSBTAG00000011465 MYBPH 0.9146 0.0402 0.571 0.6306 0.0169 0.399 Marginal Zone B And B1 Cell Specific Protein ENSBTAG00000038337 MZB1 0.4825 0.0407 0.571 0.57 0.0191 0.418 Lymphocyte Antigen 9 ENSBTAG00000002947 LY9 0.3472 0.0409 0.571 0.3123 0.0053 0.306 SET Domain And Mariner Transposase Fusion Gene ENSBTAG00000018935 SETMAR -0.2426 0.0409 0.571 -0.2787 0.017 0.399 32 Cathepsin S ENSBTAG00000017135 CTSS 0.4946 0.041 0.571 0.3211 0.0209 0.428 Ecotropic Viral Integration Site 2B ENSBTAG00000009353 EVI2B 0.2939 0.0427 0.571 0.1647 0.0454 0.508 Solute Carrier Family 16 Member 9 ENSBTAG00000019792 SLC16A9 0.5142 0.0431 0.571 0.5586 0.0058 0.308 Parvin Gamma ENSBTAG00000008057 PARVG 0.3136 0.0454 0.571 0.2878 0.0205 0.427 TATA-Box Binding Protein Associated Factor 5 Like ENSBTAG00000135801 TAF5L -0.1487 0.0454 0.571 -0.1746 0.0458 0.508 Beta-2- microglobulin ENSBTAG00000012330 B2M 0.2578 0.046 0.571 0.2057 0.0115 0.365 Tubulin alpha 4a ENSBTAG00000030974 TUBA4A -0.2453 0.0461 0.571 -0.2353 0.0178 0.408 FGR Proto- Oncogene, Src Family Tyrosine Kinase ENSBTAG00000011784 FGR 0.3737 0.0462 0.571 0.202 0.0284 0.456 Actin beta ENSBTAG00000026199 ACTB 0.2713 0.0466 0.571 0.1693 0.0346 0.472 Rho GTPase Activating Protein 30 ENSBTAG00000017875 ARHGAP30 0.299 0.0469 0.571 0.1761 0.0333 0.469 Adaptor Related Protein Complex 4 Subunit Beta 1 ENSBTAG00000003614 AP4B1 -0.2135 0.0481 0.571 -0.2169 0.0301 0.46 Death Associated Protein Kinase 1 ENSBTAG00000000738 DAPK1 0.2284 0.0482 0.571 0.2626 0.0192 0.418 *The Fold-change estimates (relative expression) refer to the LRFI group. Biomarkers To find the most important biomarkers that describe our trait, we performed ROC curves using AUC >0.70. We found seven genes: B2M, ADSS, SNX2, TUBA4A, ARHGAP18, MECR, ABCF3 that were on our list of the 88 DEGs (Table 2). The changes in expression and significance can be observed in Table 2. These biomarkers were associated with important biological processes, such as cell proliferation, insulin release, apoptosis, regulated cell death, lipid metabolism, fatty acid synthesis, lipid homeostasis, protein targeting, protein metabolic process, protein folding, glucose import, carbohydrate metabolism, reactive oxygen species generation, immune response, cell cycle, cell renewal and cell death (Figure 3). 33 Figure 3. Biological processes associated with biomarkers related to the RFI trait. The genes highlighted in red are up-regulated in LRFI, the blue circles present the down-regulated genes in the LRFI group compared with the HRFI; the green arrows indicate a positive regulation effect, while the red arrows denote a negative regulation effect. 34 Gene set enrichment analysis We followed two strategies to address the biological analysis, one using all transcripts with significant fold change for each genetic group and the other filtering the DEGs common in the two genetic groups (Table 2). Biological analysis of the whole transcriptome data The gene enrichment analysis showed 1,275 biological processes in G1 (1,163 with up-regulation and 112 with down-regulation), 124 molecular functions (95 down-regulated and 29 down-regulated), and 200 cellular components (148 down-regulated and 52 down-regulated) when comparing LRFI and HRFI. For G2, 375 biological processes were detected (214 with up-regulation and 161 with down-regulation), 65 molecular functions (38 with up-regulation and 27 with down-regulation) and 90 cellular components (38 with down-regulation and 52 with down-regulation) when comparing LRFI and HRFI. Out of all the biological processes we highlight, the ones related to energy metabolism, protein turnover, redox homeostasis and immune system all common to G1 (see Additional Table 2) and G2 (see Additional Table 3). These biological processes were described previously in the analysis of biomarkers (see Figure 4). The following describes each general biological process found and studied for each genetic group: Energy Metabolism Significant biological processes were identified for G1 related to negative energy balance in more efficient cattle like a negative regulation of energy derivation by oxidation of organic compounds (GO:0015980) through body use to produce energy and the adenosine triphosphate (ATP) metabolic process (GO:0046034) (Additional Table 2). Furthermore, in the liver synthesis of lipids, lipoproteins or phospholipids which are then used in the rest of the tissues of the organism. However, in the LRFI group an inhibited lipoprotein biosynthetic process was found (GO:0042158) as was the lipoprotein metabolic process (GO:0042157), oligosaccharide-lipid intermediate biosynthetic process (GO:0006490), and there may be less lipid accumulation in the more efficient group (LRFI). On the other hand, in the LRFI group we saw regulation of phospholipase activity (GO:0010517), regulation of lipid kinase activity (GO:0043550) and positive regulation of lipid kinase activity (GO:0090218) (Additional Table 2). This tendency was also observed for G2 where we found, in LRFI animals, 35 inactive regulation of generation of precursor metabolites and energy (GO:0043467) even though we obtained a fatty acid derivative biosynthetic process activated (GO:1901570), lipopolysaccharide-mediated signaling pathway (GO:0031663) (Additional Table 3). The liver regulates glucose in blood, converting glucose to glycogen or triacylglycerides or converting glycogen to glucose to elevate its level. When there is a lack of glucose and glycogen, the liver can convert amino acids and lipids to glucose. In the more efficient group (LRFI), an inactive glycogen biosynthetic process (GO:0005978), a glucan biosynthetic process (GO:0009250) and regulation of the glycolytic process (GO:0006110), the cellular response to glucose starvation (GO:0042149) can be observed (Additional Table 3). Protein Turnover The renewal of proteins (protein turnover) is related to both the synthesis of new proteins and the degradation of already existing proteins that provide essential components to synthesize other proteins. The LRFI group of the G1 displa