A bioinformatics analysis of a nitrilase-like novel yeast ORFan by Erica Koyama '23

Erica Koyama 
Dr. Susan Walsh 
Yeast ORFan Project 
26 November 2020

Saccharomyces cerevisiae (S. cerevisiae), commonly known as baker’s yeast or brewer’s yeast, is a eukaryotic unicellular fungus and an important model system for biological research. In 1996, with the completion of the Yeast Genome Project, it became the first eukaryote to have its genome completely sequenced (Dujon 1996). The completed sequence represents the S. cerevisiae nuclear genome and was made accessible to the public that same year (Mewes 2019). The yeast genome is distinctive among eukaryotic genomes in that 72% of it consists of open reading frames (ORFs), leaving a comparatively small fraction as noncoding DNA (Dujon 1996). This compactness results from the scarcity of introns and from short intergenic regions (Dujon 1996).

S. cerevisiae is a significant model organism for several reasons. First, it can be grown and stored easily and inexpensively in the laboratory. Under optimal conditions, cells bud every 90 minutes on agar plates without incubators, and freeze-dried cells can be stored for years at room temperature (Duina et al. 2014). Second, genes in the form of plasmids or linear nucleotides can be easily moved into and out of cells, allowing scientists to study the phenotypic effects of a gene mutation or manipulation. Auxotrophic marker genes can be encoded on these plasmids, which has yielded standardized plasmids that can be used for biochemical pathway mapping. Through the combination of homologous recombination and plasmid transformation, targeted loci can be disrupted, making genetic manipulation a simple process in S. cerevisiae (Duina et al. 2014). Finally, the high degree of conservation of amino acid sequences and protein function between S. cerevisiae and other eukaryotes makes it possible to transfer function annotations from S. cerevisiae to other organisms (Botstein and Fink 2011). This is advantageous because protein function can only be established through experimentation, and databases such as the Gene Ontology Consortium (GO), whose annotations derive mostly from yeast, make it easier to obtain protein function in yeast than in other organisms (Botstein and Fink 2011).

While S. cerevisiae is a well-studied organism, an estimated 1700 of its ORFs are orphan open reading frames (ORFans), that is, ORFs with no significant sequence homologs (Lin et al. 2013). Because ORFan function cannot be determined solely through sequence homology, their functions remain a mystery, and it has been suggested that most ORFans are pseudogenes (Siew and Fischer 2004). One ORFan in S. cerevisiae is YIL165C. A mutation in this gene causes mitophagy defects, and the protein it encodes is 119 amino acids long with a molecular mass of 12.9 kDa (Kanki et al. 2009).

Specific Aims and Experimental Design
YIL165C and the neighboring YIL164C, also known as NIT1, may encode one protein rather than two separate proteins. NCBI’s Conserved Domain Database (CDD), Pfam, and the Protein Data Bank (PDB) indicate that YIL165C is homologous to members of the carbon-nitrogen hydrolase superfamily, specifically the nitrilase family. Nitrilases are thiol enzymes that directly catabolize nitriles into ammonia and the corresponding carboxylic acid (Gong et al. 2012). They are multimeric α-β-β-α sandwich proteins with a conserved E-K-C catalytic triad, and mutations in this region dramatically decrease enzyme activity in the cell (Pace and Brenner 2001). In yeast, two other nitrilases have been characterized, NIT2 and NIT3 (Figure 1; Lin et al. 2013; Peracchi et al. 2017). An alignment between YIL165C and the S. cerevisiae nitrilase (GAX71837.1) indicates that YIL165C spans the last 119 amino acids (aa) of the 302 aa nitrilase (Figure 2A). In addition, according to the NCBI protein database, the E-K-C triad is located at positions 44, 135, and 169 of the S. cerevisiae nitrilase, a region upstream of YIL165C. These data suggest that YIL165C accounts for only part of the nitrilase protein. Because the Saccharomyces Genome Database shows that NIT1 occupies the region upstream of YIL165C, it is highly likely that NIT1 and YIL165C are parts of one larger protein. This hypothesis is further supported by an alignment among NIT1, YIL165C, and the nitrilase of the closely related lager yeast Saccharomyces pastorianus (S. pastorianus) (Figure 2B). In this alignment, NIT1 is highly similar to the first 199 aa of the 322 aa S. pastorianus nitrilase, and YIL165C is very similar to the last 119 aa, with a stop codon, R, T, and V between the two proteins (Figure 2B; Figure 2C). In S. pastorianus, the region between NIT1 and YIL165C is W, R, T, and V, which suggests that the stop codon between NIT1 and YIL165C in S. cerevisiae may actually encode an amino acid if the two ORFs are parts of one larger protein. In addition, positions 44, 135, and 169 of NIT1 are E, K, and C, respectively, which corresponds to the triad location specified by the NCBI protein database and suggests that NIT1 plays an essential role in the nitrilase protein (Figure 2B). Alternatively, an alignment with the S. cerevisiae nitrilase (GAX71837.1) published in 2017 suggests that there may be an intron between the two genes (Figure 2D). According to NCBI’s nucleotide database, the last 18 aa of NIT1 are encoded by a region annotated as non-coding, which raises the possibility that this region is not translated.
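The length bookkeeping behind this hypothesis can be checked with a few lines of Python. This is only a minimal sketch: the numbers are the ones quoted above, while the function names (`fused_length`, `region_of`) and the idea of treating the four intervening residues as a "linker" are illustrative assumptions, not part of any database annotation.

```python
# Values quoted in the text (lengths in amino acids, positions 1-based).
NIT1_ALIGNED_LEN = 199       # NIT1 aligns with the first 199 aa of the S. pastorianus nitrilase
LINKER = "WRTV"              # residues between the two regions in S. pastorianus
YIL165C_LEN = 119            # YIL165C aligns with the last 119 aa
S_PASTORIANUS_LEN = 322      # full-length S. pastorianus nitrilase
TRIAD_POSITIONS = {"E": 44, "K": 135, "C": 169}


def fused_length(n_term_len: int, linker: str, c_term_len: int) -> int:
    """Length of a hypothetical single protein: N-terminal part + linker + C-terminal part."""
    return n_term_len + len(linker) + c_term_len


def region_of(position: int, n_term_len: int, linker_len: int) -> str:
    """Report which part of the hypothetical fused protein a 1-based position falls in."""
    if position <= n_term_len:
        return "NIT1"
    if position <= n_term_len + linker_len:
        return "linker"
    return "YIL165C"


if __name__ == "__main__":
    total = fused_length(NIT1_ALIGNED_LEN, LINKER, YIL165C_LEN)
    print(total == S_PASTORIANUS_LEN)  # the three pieces account for the whole protein
    for residue, pos in TRIAD_POSITIONS.items():
        print(residue, pos, region_of(pos, NIT1_ALIGNED_LEN, len(LINKER)))
```

Under these assumptions, 199 + 4 + 119 equals the 322 aa of the S. pastorianus nitrilase, and all three catalytic residues fall within the NIT1 portion, which is consistent with YIL165C alone lacking the catalytic triad.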

Polymerase chain reaction (PCR) will be used to determine whether the stop codon between NIT1 and YIL165C is accurate. Primers will be designed against the 5’ end of NIT1 and the 3’ end of YIL165C so that PCR amplifies the region containing the putative stop codon. Because there may be an intron in the sequence (Figure 2D), we will use both genomic DNA (gDNA) and mRNA, reverse-transcribed into cDNA, as templates. The amplified DNA will be sent for sequencing, and the resulting sequences will be translated to resolve the disputed codon and to decide whether YIL164C and YIL165C are two separate genes or a single gene. In addition, because the gDNA contains all portions of the sequence and the mRNA only the exons, comparing the two sequences will tell us whether the confusion arose from the presence of a small intron. If the two sequences have the same length, no intron is present; if the mRNA-derived sequence is shorter than the gDNA sequence, an intron has been spliced out. Once we have determined the accurate sequence of the full-length protein, we can use this protein in further experiments. In addition to using PCR to test whether the two proteins are one larger protein, we will use rapid amplification of cDNA ends (RACE). Because the protein’s 5’ end may be encoded by a region further upstream in the genome than the start of NIT1, RACE will be used to determine the extreme 5’ end of the transcript (Dallmeier and Neyts 2013).
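The translation and length comparison described above could be sketched as follows. This is a minimal illustration under stated assumptions: the DNA strings are invented toy sequences (not the real NIT1/YIL165C junction), the choice of TAG as the stop codon is hypothetical, and `translate` and `intron_spliced` are illustrative helper names.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, stop_at_first_stop: bool = True) -> str:
    """Translate a DNA sequence in frame 1; optionally read through stop codons."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*" and stop_at_first_stop:
            break
        protein.append(aa)
    return "".join(protein)


def intron_spliced(gdna_len: int, cdna_len: int) -> bool:
    """True if the cDNA amplicon is shorter than the gDNA amplicon (an intron was removed)."""
    return cdna_len < gdna_len


if __name__ == "__main__":
    # Toy junctions only: a stop codon (hypothetically TAG) versus a W codon (TGG).
    with_stop = "ATGGAATAGCGTACTGTT"
    with_trp = "ATGGAATGGCGTACTGTT"
    print(translate(with_stop))                            # ME
    print(translate(with_stop, stop_at_first_stop=False))  # ME*RTV
    print(translate(with_trp))                             # MEWRTV
```

In the toy example a single-base difference turns a stop codon into tryptophan and extends the reading frame through R, T, and V, which is the kind of result the sequencing experiment is designed to detect or rule out.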

YIL165C is predicted to localize to the cytoplasm, nucleus, or mitochondria of S. cerevisiae. PSORT predicts that the protein is cytoplasmic, with 94.1% reliability, and that it is a peripheral protein. Although the result is statistically insignificant, LoQate indicates that YIL165C is localized in the cytosol and nucleus (Figure 3A). However, data from NucPred suggest that the protein is not nuclear, so nuclear localization is unlikely (Figure 3B). Additionally, PSORT predicts NIT1 to be cytoplasmic, which increases the likelihood that YIL165C is localized there if the two proteins are one protein (Figure 3C). However, because each of these ORFs was tagged and analyzed as an independent entity, these data are suspect: they do not represent the full-length protein, which may contain additional localization sequences. Additionally, based on homology to other nitrilases, the NIT1/YIL165C protein may be localized in the mitochondria or cytosol. Although again statistically insignificant, LoQate indicates that NIT3 is localized in the mitochondria and cytosol and NIT2 in the cytosol (Figure 3D; Figure 3E). In addition, a High Throughput Direct Assay (HDA) was used to infer that NIT3 localizes to the mitochondria (Renvoisé et al. 2014; Reinders et al. 2006).

To determine where the protein is localized in the cell, the full-length protein will be tagged with the fluorescent protein mCherry and its location analyzed by fluorescence microscopy. Ideally, we will tag the protein at both the C- and N-termini in case a tag at one end interferes with localization. We expect to observe fluorescence wherever the protein is localized. For example, if the protein is localized in the mitochondria, the mitochondria should fluoresce, and a corresponding pattern should be observed for every site where the protein localizes. An alternative approach to determining protein localization is subcellular fractionation followed by western blotting (Daum et al. 1982). Organelles will be separated through fractionation and centrifugation, and a western blot of the subfractions will reveal where the protein is localized. Each subfraction will be loaded into its own lane of an SDS-PAGE gel, and a band in a lane will indicate that the protein localizes to that subfraction. For example, a band in the mitochondrial subfraction lane would indicate that the protein is localized in the mitochondria.

Strong evidence indicates that the combined NIT1/YIL165C protein is a homolog of nitrilase (Figure 2). Nitrilases are highly valued by the chemical and pharmaceutical industries because the nitrilase reaction produces no unwanted inorganic waste and catabolizes nitriles without requiring strong acids, strong bases, or high temperatures (Dennett and Blamey 2016). In addition, SUPERFAMILY identifies the protein as structurally similar to carbamylase. Of the nitrilase superfamily’s 13 branches, 2 are carbamylases, which are distinguished by an additional glutamate at position 142 that is hydrogen bonded to the K in the E-K-C catalytic triad (Weber et al. 2013). It is proposed that the H-bonded glutamate enhances the nucleophilicity of the C, enabling it to attack the amide’s carbonyl carbon (Weber et al. 2013). Carbamylases are hydrolases that decarbamylate D-amino acids and carry out a special type of amidase reaction (Figure 4; Pace and Brenner 2001). Thus, NIT1/YIL165C may catabolize nitriles and produce D-amino acids, the corresponding carboxylic acid, and ammonia.

To determine the function of the protein, we will use a colorimetric assay. Because the enzyme converts a nitrile into a carboxylic acid, enzyme activity should cause a drop in pH. Thus, the enzyme assay will be performed using a pH indicator dye (bromothymol blue), and a color change from blue to yellow will confirm carboxylic acid formation and nitrilase activity (Sahu et al. 2019). Conversely, a lack of color change will signify the absence of nitrilase activity (Sahu et al. 2019). Instead of purified proteins, whole yeast or bacterial cells will be used for the assay (Sahu et al. 2019). Rhodococcus rhodochrous (MTCC-291) will be the positive control and Escherichia coli (MTCC-729) the negative control (Sahu et al. 2019). The WT protein and the E142D mutant will be the experimental groups (Weber et al. 2013). The WT is predicted to change color from blue to yellow because it is likely to be a nitrilase, while the mutant is predicted to show no change because E142D creates an unstable active site (Table 1).
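The readout logic of this assay can be summarized in a short Python sketch. It assumes the commonly cited transition range of bromothymol blue (yellow below about pH 6.0, blue above about pH 7.6, green in between); the pH values in the usage example are hypothetical, and the function names are illustrative only.

```python
def indicator_color(ph: float) -> str:
    """Approximate bromothymol blue color at a given pH (assumed transition range 6.0 to 7.6)."""
    if ph < 6.0:
        return "yellow"
    if ph > 7.6:
        return "blue"
    return "green"


def shows_nitrilase_activity(start_ph: float, end_ph: float) -> bool:
    """Score a well as positive only if the dye goes from blue to yellow."""
    return indicator_color(start_ph) == "blue" and indicator_color(end_ph) == "yellow"


# Predicted outcomes taken from the text: True means a blue-to-yellow change is expected.
PREDICTED = {
    "Rhodococcus rhodochrous (positive control)": True,
    "Escherichia coli (negative control)": False,
    "WT NIT1/YIL165C": True,
    "E142D mutant": False,
}

if __name__ == "__main__":
    # Hypothetical end-point pH readings, for illustration only.
    readings = {
        "Rhodococcus rhodochrous (positive control)": (7.8, 5.5),
        "Escherichia coli (negative control)": (7.8, 7.8),
        "WT NIT1/YIL165C": (7.8, 5.8),
        "E142D mutant": (7.8, 7.7),
    }
    for sample, (start, end) in readings.items():
        observed = shows_nitrilase_activity(start, end)
        print(sample, observed, observed == PREDICTED[sample])
```

An intermediate green reading is deliberately scored as negative here; whether a partial color change should count as weak activity is a design choice the actual protocol would need to settle.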

The goal of this research is to understand the function and localization of the YIL165C ORFan in S. cerevisiae. Based on my analysis, I propose that YIL165C and the upstream NIT1 may actually form a single protein. Because the Gene Ontology Consortium (GO), a database that infers protein function and structure through homology, derives its annotations mostly from yeast, understanding ORFans in yeast may help elucidate the function of ORFans in other species. This work may lead to the discovery of a new protein and a better understanding of ORFan function in the cell.


References

Botstein D, Fink GR. 2011. Yeast: An Experimental Organism for 21st Century Biology. Genetics. 189(3):695–704. doi:10.1534/genetics.111.130765.

Dallmeier K, Neyts J. 2013. Simple and inexpensive three-step rapid amplification of cDNA 5′ ends using 5′ phosphorylated primers. Analytical Biochemistry. 434(1):1–3. doi:10.1016/j.ab.2012.10.031.

Daum G, Böhni PC, Schatz G. 1982. Import of proteins into mitochondria. Cytochrome b2 and cytochrome c peroxidase are located in the intermembrane space of yeast mitochondria. J Biol Chem. 257(21):13028–13033.

Dennett GV, Blamey JM. 2016. A New Thermophilic Nitrilase from an Antarctic Hyperthermophilic Microorganism. Front Bioeng Biotechnol. 4. doi:10.3389/fbioe.2016.00005. [accessed 2020 Nov 4]. https://www.frontiersin.org/articles/10.3389/fbioe.2016.00005/full.

Duina AA, Miller ME, Keeney JB. 2014. Budding Yeast for Budding Geneticists: A Primer on the Saccharomyces cerevisiae Model System. Genetics. 197(1):33–48. doi:10.1534/genetics.114.163188.

Dujon B. 1996. The yeast genome project: what did we learn? Trends in Genetics. 12(7):263–270. doi:10.1016/0168-9525(96)10027-5.

Gong J-S, Lu Z-M, Li H, Shi J-S, Zhou Z-M, Xu Z-H. 2012. Nitrilases in nitrile biocatalysis: recent progress and forthcoming research. Microbial Cell Factories. 11(1):142. doi:10.1186/1475-2859-11-142.

Kanki T, Wang K, Baba M, Bartholomew CR, Lynch-Day MA, Du Z, Geng J, Mao K, Yang Z, Yen W-L, et al. 2009. A Genomic Screen for Yeast Mutants Defective in Selective Mitochondria Autophagy. MBoC. 20(22):4730–4738. doi:10.1091/mbc.e09-03-0225.

Lin D, Yin X, Wang X, Zhou P, Guo F-B. 2013. Re-Annotation of Protein-Coding Genes in the Genome of Saccharomyces cerevisiae Based on Support Vector Machines. PLoS One. 8(7). doi:10.1371/journal.pone.0064477. [accessed 2020 Nov 15]. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3707884/

Mewes H-W. 2019. The bioinformatics of the yeast genome—A historical perspective. Yeast. 36(4):161–165. doi:10.1002/yea.3378.

Pace HC, Brenner C. 2001. The nitrilase superfamily: classification, structure and function. Genome Biology. 2(1):reviews0001.1. doi:10.1186/gb-2001-2-1-reviews0001.

Peracchi A, Veiga-da-Cunha M, Kuhara T, Ellens KW, Paczia N, Stroobant V, Seliga AK, Marlaire S, Jaisson S, Bommer GT, et al. 2017. Nit1 is a metabolite repair enzyme that hydrolyzes deaminated glutathione. PNAS. 114(16):E3233–E3242. doi:10.1073/pnas.1613736114.

Reinders J, Zahedi R, Pfanner N, Meisinger C, Sickmann A. 2006. Toward the Complete Yeast Mitochondrial Proteome: Multidimensional Separation Techniques for Mitochondrial Proteomics. Journal of Proteome Research. doi:10.1021/pr050477f. [accessed 2020 Nov 26]. https://pubs.acs.org/doi/pdf/10.1021/pr050477f.

Renvoisé M, Bonhomme L, Davanture M, Valot B, Zivy M, Lemaire C. 2014. Quantitative variations of the mitochondrial proteome and phosphoproteome during fermentative and respiratory growth in Saccharomyces cerevisiae. Journal of Proteomics. 106:140–150. doi:10.1016/j.jprot.2014.04.022.

Sahu R, Meghavarnam AK, Janakiraman S. 2019. A simple, efficient and rapid screening technique for differentiating nitrile hydratase and nitrilase producing bacteria. Biotechnology Reports. 24:e00396. doi:10.1016/j.btre.2019.e00396.

Siew N, Fischer D. 2004. Structural Biology Sheds Light on the Puzzle of Genomic ORFans. Journal of Molecular Biology. 342(2):369–373. doi:10.1016/j.jmb.2004.06.073.

Weber BW, Kimani SW, Varsani A, Cowan DA, Hunter R, Venter GA, Gumbart JC, Sewell BT. 2013. The Mechanism of the Amidases. J Biol Chem. 288(40):28514–28523. doi:10.1074/jbc.M113.503284.

Figures and Legends