REVIEW
1 Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA; 2 Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA; 3 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| Abstract |
|---|
[Keywords: Network topology; regulatory circuits; scale-free networks]
Until recently, dissection of biological networks has occurred through the efforts of individual laboratories working on one or a few components, limiting a thorough understanding of individual biological processes in the context of the entire cellular network. Detailed analysis of specific components and their interacting partners or substrates can be used to assemble high-confidence pathways. For example, analysis of the NF-κB and TGF-β signaling pathways has revealed many components whose functions are reasonably well known for each of these pathways (Mishra et al. 2005; Karin 2006). Nonetheless, in spite of the intensive study of such pathways, new components of these pathways continue to be discovered (Covert et al. 2005; Ma et al. 2006), indicating that our analysis of even the most well-studied pathways is likely to be incomplete.
The advent of high-throughput techniques has allowed the large-scale identification of components (genes, RNAs, and proteins), their expression patterns, and their biochemical and genetic interactions. Although useful for generating large amounts of biological information, the data from such studies are often incomplete and contain errors. Nonetheless, they can provide valuable information about the functions of individual components and unexpected relationships between components and cellular processes. For example, Arg5,6, a well-characterized metabolic enzyme, was found to have DNA-binding activity through a proteome microarray screen and was later confirmed to regulate gene expression in vivo (Hall et al. 2004). Thus far, a variety of large-scale data sets have been identified and used to assemble different networks. Below we briefly describe the different types of biological networks and the general features and principles that result from the analysis of such networks.
| Types of biological networks |
|---|
Transcription factor-binding networks
Transcription factor-binding networks have been assembled in two ways: (1) The analysis of individual components has been used to develop intricate maps in sea urchins and other model organisms (Davidson et al. 2002); and (2) the large-scale identification of transcription factor-binding sites using chromatin immunoprecipitation followed by probing of genomic microarrays (ChIP-chip) or DNA sequencing (ChIP-PET or STAGE) has been used to assemble networks in yeast and other organisms (Horak and Snyder 2002; Kim et al. 2005; Wei et al. 2006).
Thus far, a large number of ChIP mapping experiments have been performed in yeast and mammalian cells. The data from ChIP experiments are often of variable quality, particularly in mammalian cells. Most of the initial ChIP-chip experiments used genomic arrays composed of PCR products that allowed only crude mapping of binding sites and often gave lower-quality results. More recent experiments use oligonucleotide arrays that allow higher-resolution mapping of the binding regions (Cawley et al. 2004; Borneman et al. 2006). The calling of targets is not trivial, as there is a considerable range of signals and probability values associated with each target, often leading to arbitrary assignment of thresholds to the data. Nonetheless, interesting networks have been assembled using these data sets.
For yeast, >250 ChIP-chip experiments have been performed using cells incubated in a variety of experimental conditions or treated with different stimuli, and >10,000 interactions have been reported (Horak et al. 2002; Lee et al. 2002; Harbison et al. 2004; Borneman et al. 2006). These have been assembled into a variety of global networks and subnetworks. For mammalian cells, a large number of experiments have also been performed, often by analyzing selected regions of the genome (Martone et al. 2003; Cawley et al. 2004) or promoter regions (Li et al. 2003). For example, the global identification of targets of three factors involved in embryonic stem cell maintenance has suggested pathways important for stem cell self-renewal (Boyer et al. 2005). Similarly, the analysis of targets of three major transcription factors has revealed a transcriptional map of skeletal myogenesis (Blais et al. 2005).
By combining binding data with expression data, the putative effect of binding on transcriptional output (i.e., activation or repression) can often be inferred. For inducible factors, studies with human NF-κB and STAT1 indicate that only a subset (30%-40%) of differentially expressed genes appear to be direct targets of the factor of interest; presumably many differentially expressed genes are regulated by factors other than the one of interest. Likewise, only a small fraction of binding sites appear to be directly modulating nearby gene expression, as many binding sites do not reside near genes whose expression is altered. For example, the majority of NF-κB- and STAT1-binding sites reside near genes whose expression is not altered by the conditions that activate the factor (Martone et al. 2003; Cawley et al. 2004; Hartman et al. 2005). In addition, experiments with yeast have shown that deletion of a transcription factor typically affects only a subset of targets (Gasch et al. 2000). These observations indicate that many binding sites lack biological function or, more likely, are functionally redundant with other regulatory sites or affect gene expression under other conditions. In mammalian systems, they might also operate on genes that reside at distant locations (Carroll et al. 2005).
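The comparison of binding and expression data described above amounts to intersecting two gene sets. The following is a minimal sketch, assuming invented gene names, invented fold-change values, and a hypothetical helper `classify_targets`; it is not the procedure used in any of the cited studies.

```python
# Hypothetical toy data: gene names and values are invented for illustration.
# Genes that lie near a ChIP-detected binding site of the factor.
bound_genes = {"geneA", "geneB", "geneC", "geneD", "geneE"}
# log2 fold change for genes called differentially expressed after induction.
expression_change = {
    "geneA": 2.1, "geneB": -1.8, "geneF": 1.5, "geneG": -2.4, "geneH": 1.2,
}


def classify_targets(bound, changes):
    """Split genes into putative direct targets, indirect responders,
    and bound genes whose expression does not change."""
    direct = {
        gene: ("activated" if fold > 0 else "repressed")
        for gene, fold in changes.items()
        if gene in bound
    }
    indirect = sorted(gene for gene in changes if gene not in bound)
    unchanged_bound = sorted(gene for gene in bound if gene not in changes)
    return direct, indirect, unchanged_bound


direct, indirect, unchanged_bound = classify_targets(bound_genes, expression_change)
# Fraction of differentially expressed genes that are putative direct targets.
fraction_direct = len(direct) / len(expression_change)
print(direct, indirect, unchanged_bound, fraction_direct)
```

In this toy case two of five changed genes are bound, and three bound genes show no change, mirroring the two observations in the text: not every responsive gene is a direct target, and not every binding site alters a nearby gene.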
Protein-protein interaction networks
Protein-protein interaction maps represent the largest and most diverse data sets available to date. The first maps were generated using two-hybrid studies in which interactions of protein partners are assessed in yeast using a transcriptional readout (Uetz et al. 2000; Ito et al. 2001). Large-scale two-hybrid studies have been used to study interactions in other organisms such as Drosophila, Caenorhabditis elegans, and humans (Giot et al. 2003; Li et al. 2004; Rual et al. 2005). More recently, high-throughput studies using affinity purification followed by identification of associated proteins using mass spectrometry have resulted in large data sets of protein interactions. Two recent studies have described the purification of most proteins present in a eukaryotic cell, and both identified ~500 protein complexes in yeast (Gavin et al. 2006; Krogan et al. 2006). Considering the coverage of the experiments, these studies suggest there are ~800 protein complexes in yeast. Extrapolation to the human proteome based on gene number predicts an estimate of 3000 human protein complexes.
Interaction studies each have technical concerns associated with them (Goll and Uetz 2006). Two-hybrid studies may reveal interactions that do not normally occur in vivo. Affinity purification, on the other hand, may yield protein contaminants and may not detect interactions in which binding partners are present substoichiometrically in a complex. Comparison between these data sets reveals only partial overlap even for the most comprehensive studies. This is likely due to the incomplete coverage of each study and the diverse computational methods or stringencies applied to interpret the raw data sets. Nonetheless, these interaction maps, when integrated together, have revealed global topological and dynamic features of interactome networks that relate to known biological properties (see below).
Protein phosphorylation networks
Studies of yeast and humans have suggested that 30% of cellular proteins are phosphorylated in vivo (Cohen 2000; Ficarro et al. 2002; Manning et al. 2002a); this figure is most likely a large underestimate of the number of phosphorylated residues since comprehensive mapping studies have not been performed. Consistent with the importance of phosphorylation as a regulatory mechanism, eukaryotes devote ~2% of their protein-coding genes to protein kinases, ranging from 122 for yeast to 518 for humans (Zhu et al. 2000; Manning et al. 2002b).
Until recently, protein phosphorylation has generally been mapped on a limited scale. However, newly developed approaches in mass spectrometry have allowed the identification of a large number of phosphorylated residues, including those regulated during cell stimuli and developmental responses (Ficarro et al. 2002; Gruhler et al. 2005; Ptacek and Snyder 2006). These approaches usually involve enrichment of phospho-proteins using matrices that bind phospho-modified proteins. For example, one study of the developing forebrain and midbrain tissues of embryonic mice used strong cation exchange columns followed by tandem mass spectrometry to identify >500 serine, threonine, or tyrosine phospho-sites (Ballif et al. 2004). Other studies have used immunoprecipitation to enrich for tyrosine phospho-proteins followed by mass spectrometry; these have led to the discovery of novel phospho-tyrosine protein modifications in human T cells (Brill et al. 2004; Tao et al. 2005).
In addition to the identification of phosphorylated residues, two new approaches have helped to identify substrates of protein kinases. The use of modified kinases that accept only radiolabeled ATP analogs has revealed many substrates for several yeast kinases, including the cyclin-dependent kinases Pho85 and Cdc28 (Dephoure et al. 2005; Loog and Morgan 2005). A second approach used a proteome microarray containing 4400 yeast proteins to detect in vitro substrates for the majority of yeast protein kinases. This study identified ~4200 phosphorylations affecting >1300 substrates (Ptacek et al. 2005). These different studies have identified a large number of phosphorylation events, many of which were validated in vivo. Many of the phosphorylations involved substrates that operate in a known pathway of the kinase; however, several validated substrates function in different cellular processes from those known for the kinase, thereby revealing new functions for the protein kinases.
Metabolic interaction networks
The wealth of biochemical data generated in the past century, when combined with genome sequences, allows the construction of metabolic networks. The metabolic network usually focuses on the mass flow in basic chemical pathways that generate essential components such as amino acids, sugars, and lipids, and the energy required by the biochemical reactions. As such, these networks typically present both protein and metabolite information. Literature curation and genome annotation have elucidated many complex biochemical pathways (Kanehisa and Goto 2000; Overbeek et al. 2000) from which various metabolic networks have been reconstructed in a wide variety of organisms such as Escherichia coli (Reed et al. 2003), Saccharomyces cerevisiae (Duarte et al. 2004), and human mitochondria (Vo et al. 2004).
Interactions in metabolic networks are closely related to gene functions, and therefore have great potential for immediate applications in the interpretation of gene roles. Considerable attention has been focused on network dynamics using constraint-based analyses such as flux balance analysis (FBA), which assumes a steady state for all metabolites and that organisms optimize metabolite fluxes to maximize biomass production (Segre et al. 2002; Famili et al. 2003; Forster et al. 2003). This approach has led to many successful predictions. For example, an in silico flux model was used to predict the phenotypes of yeast strains containing gene deletion mutations grown under various media conditions and achieved a remarkable 83% accuracy (Duarte et al. 2004). In addition, a flux model on a yeast metabolic network was able to explain enzyme dispensability; that is, how loss-of-function mutations of many yeast enzymes result in viable strains (Papp et al. 2004). This model suggested that the majority of nonessential enzymes are vital for cell growth under certain previously untested conditions, whereas only a small subset are compensated by isoenzymes or parallel pathways. Other successful constraint-based analyses in metabolic networks have also been performed. These include (1) re-engineering micro-organisms with gene deletions for the purpose of manipulating their chemical products (Burgard et al. 2003) and (2) evaluating steady-flux distributions in human mitochondria using constraints related to normal, disease, and dietetic treatment conditions (Thiele et al. 2005). Additional examples of constraint-based analysis can be found in a detailed review (Price et al. 2004). Although many metabolic network studies were developed in micro-organisms and S. cerevisiae, these studies may also shed light on other organisms, since the fundamental network structures may be conserved in evolution. Topological analysis of metabolic networks in 43 organisms covering all three life domains revealed highly similar topological properties, although great diversity exists among individual pathways and components (Jeong et al. 2000).
Genetic and small molecule interaction networks
Combining mutations in two different genes can either synergistically reduce or enhance the growth or fitness of an organism, relative to organisms containing individual mutations. One of the most common interactions analyzed is "synthetic lethality," in which mutations that do not individually cause loss of viability are lethal when combined (Bender and Pringle 1991; Costigan et al. 1992). For many, if not most, species, the majority of genes are not lethal when mutated individually; this is likely either because of genetic redundancy or because the affected genes normally enhance the fitness of the organism rather than being essential for its viability. When mutations are combined in the same strain to produce a phenotype stronger than that caused by an individual mutation, the mutated genes are often thought to reside in parallel redundant pathways, although other interpretations are possible. Regardless of the reason, the ability to combine mutations to produce strong phenotypes provides the opportunity to carry out synthetic lethal analysis on a large scale, which provides a wealth of useful information.
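One common way to quantify whether a double mutant grows worse or better than expected is a multiplicative null model, in which the expected double-mutant fitness is the product of the two single-mutant fitnesses. The sketch below assumes that model; the score `epsilon`, the tolerance `tol`, and the function names are illustrative choices, not definitions taken from the studies cited here.

```python
def interaction_score(w_a, w_b, w_ab):
    """Deviation of the observed double-mutant fitness (w_ab) from the
    multiplicative expectation (w_a * w_b). Fitness is relative to wild type."""
    return w_ab - w_a * w_b


def classify_interaction(w_a, w_b, w_ab, tol=0.1):
    """Label a gene pair; `tol` is an arbitrary noise tolerance (assumption)."""
    epsilon = interaction_score(w_a, w_b, w_ab)
    if w_ab == 0 and w_a > 0 and w_b > 0:
        return "synthetic lethal"      # each single mutant is viable, the double is not
    if epsilon < -tol:
        return "aggravating"           # double mutant grows worse than expected
    if epsilon > tol:
        return "alleviating"           # double mutant grows better than expected
    return "no interaction"


print(classify_interaction(0.9, 0.8, 0.0))   # viable singles, dead double
print(classify_interaction(0.9, 0.8, 0.72))  # matches the multiplicative expectation
print(classify_interaction(0.5, 0.6, 0.6))   # better than expected
```

The tolerance keeps small measurement noise from being called an interaction; in practice it would be set from replicate variability.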
Large-scale synthetic lethal screens have been performed in S. cerevisiae, in which deletion mutations in only 1100 protein-coding genes (of ~6000 total) prevent growth in standard rich medium (Winzeler et al. 1999; Giaever et al. 2002). Genetic interaction screens using either plate (SGA) or microarray readouts (dSLAM) with yeast strains containing mutations in nonessential genes have been used to systematically uncover synthetic lethal interactions (Tong et al. 2001, 2004; Pan et al. 2004). One recent study that combined genetic interactions from high-throughput methods and a literature curation of 53,117 publications in PubMed produced an S. cerevisiae genetic network containing 3258 genes and 13,963 interactions; this network revealed a significant overlap with protein-protein interactions (Reguly et al. 2006). For essential genes, strains containing conditional mutations such as those that confer a temperature-sensitive growth defect or with the gene under the control of a tetracycline titratable promoter can be analyzed under conditions that reduce, but do not eliminate, the activity of the gene product (Davierwala et al. 2005). Analysis of these interactions has also revealed functional relationships between genes and a high correlation with other properties, such as mutant phenotypes and cellular localization, thus helping to assign biological roles for unknown genes and infer novel functions to annotated genes.
In addition to synthetic lethal screens, other types of genetic interactions can be measured. These include combining mutations that disrupt inhibitory interactions and thus enhance growth. In fact, interactions that when combined either enhance or reduce growth have been investigated to generate detailed genetic interaction maps, E-MAPs (for epistatic miniarray profiles), for genes involved in the yeast early secretory pathway (Schuldiner et al. 2005). Another type of genetic interaction screen is the synthetic dosage lethal screen, in which overexpressed genes are introduced into a mutant strain background; synthetic dosage lethality can provide additional, and often nonoverlapping, interaction data to those found by combining inactivating mutations (Measday et al. 2005). For example, overexpression of genes that inhibit growth in a mutant strain background has been used to screen for genes that would negatively regulate protein kinase substrates (Sopko et al. 2006). Finally, a conceptually similar approach to synthetic lethality is to screen for mutant strains that are hypersensitive to inhibitory small molecules. Thus far, screens have been performed between inhibitory chemical compounds and deletion mutants of all yeast nonessential genes or strains heterozygous for mutations in essential genes (Giaever et al. 2004; Parsons et al. 2004). Such chemical genetic interactions, when integrated with genetic interactions, often suggest pathways targeted by the drugs as well as potential direct drug targets. Thus, this approach offers a powerful tool for deciphering the mechanisms of action of drugs as well as defining suitable biological pathways that can be targeted for inhibition.
Other biological networks
The global behavior of gene interactions can also be investigated by networks connecting genes and/or proteins sharing certain properties. A coexpression network, in which genes are connected if their transcripts are coregulated, was assembled in S. cerevisiae and contains 4077 genes connected by 65,430 interactions (Stuart et al. 2003; van Noort et al. 2004). Proteins that share other properties, such as biological processes (Tari et al. 2005) and mutant phenotypes (Gunsalus et al. 2005; Ohya et al. 2005), can also be linked with each other and assembled into networks. The coexpression and homolog networks differ from the other networks described above in that the interactions are based on similarities not related to gene function. Nonetheless, they can still be investigated with similar approaches and often exhibit comparable network topology. Moreover, these networks also share the "guilt by association" property with the five biological networks: Highly connected proteins are likely to be functionally related. Therefore, studies on these networks may also discover novel protein roles and help to decipher the complex cellular networks, especially when integrated with other biological networks.
| Global topology |
|---|
Network topology plays a vital role in understanding network architecture and performance. Several of the most important and commonly used topological features include degree, clustering coefficient, shortest path length, and betweenness (Fig. 2). Detailed descriptions of each of these statistics are as follows: (1) Degree: The number of links connected to one vertex is defined as its degree. In directed networks, the number of arcs that end at the node is termed its "in-degree," and the number of arcs that start from the node is termed its "out-degree." A node with high degree is better connected in the network and therefore may play a more important role in maintaining the network structure. (2) Distance: The shortest path length between two vertices is defined as their distance. In an interaction network, the maximum distance between any two nodes is termed the graph diameter. The average distance and diameter of a network measure the approximate distance between vertices in a network. A network with a small diameter is often termed a "small world" network (Milgram 1967), in which any two nodes can be connected with relatively short paths. Many real world networks such as metabolic networks have a small world architecture (Watts and Strogatz 1998), which may serve to minimize transition times between metabolic states (Wagner and Fell 2001). (3) Clustering coefficient: The clustering coefficient of one vertex can be calculated as the number of links between the vertices within its neighborhood divided by the number of links that are possible between them. A high clustering coefficient for a network is another indicator of a small world. (4) Betweenness: Betweenness is the fraction of the shortest paths between all pairs of vertices that pass through one vertex or link. Betweenness estimates the traffic load through one node or link, assuming that information flows over a network primarily following the shortest available paths.
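The four statistics defined above can be computed directly on a small graph. The following is a minimal sketch in plain Python on an invented five-node undirected network; the node names, the adjacency-dictionary representation, and the function names are illustrative assumptions. Betweenness is reported here as an unnormalized sum over node pairs (using Brandes' accumulation scheme) rather than as a fraction.

```python
from collections import deque
from itertools import combinations

# Invented toy network: a triangle A-B-C with a tail C-D-E.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}


def bfs_distances(adj, source):
    """Shortest path length from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest shortest-path distance (assumes a connected graph)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


def clustering(adj, v):
    """Links among the neighbors of v divided by the links possible among them."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def betweenness(adj):
    """Node betweenness summed over unordered node pairs (Brandes' method)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate from the farthest nodes back
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice


degree = {v: len(nbrs) for v, nbrs in adj.items()}
print(degree, diameter(adj), clustering(adj, "C"), betweenness(adj))
```

In this toy network node C has the highest degree and betweenness but a low clustering coefficient, because its neighbor D is not linked to A or B.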
Many biological networks are scale-free; that is, their degree distribution follows a power law, $P(k) \sim k^{-\gamma}$, in which k is the degree, P(k) is the probability that a randomly selected node has a degree k, and γ is the degree exponent. This results in a "fat-tailed" distribution in which there are vertices with high degrees termed "hubs." The advantage of this type of organization is that the system is more robust; random loss of individual nonhub vertices is less disruptive in a scale-free network than in a random network.
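The relationship between P(k) and k can be examined by tabulating the empirical degree distribution and fitting a straight line on log-log axes. The sketch below does this for an invented degree list constructed to follow an inverse-square law; the data, the function names, and the use of an ordinary least-squares fit are assumptions made for illustration, and a least-squares fit on log-log axes is only a crude estimator for real, noisy data.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes that have degree k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_power_law_exponent(pk):
    """Estimate the exponent of P(k) ~ k^(-exponent) as minus the
    least-squares slope of log P(k) against log k (nodes with k = 0 are skipped)."""
    points = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    return -slope


# Invented degree list: node counts fall off as k^-2 (64, 16, 4, 1).
toy_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
pk = degree_distribution(toy_degrees)
exponent = fit_power_law_exponent(pk)
print(pk, exponent)
```

The single degree-8 node in the toy list plays the role of a hub: rare, but far better connected than a typical node.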
Further analysis of the transcription factor network has also revealed an additional novel aspect of regulatory network hierarchy. When the binding targets of E. coli and S. cerevisiae transcription factors are analyzed with respect to binding to other transcription factors, a pyramid-shaped hierarchical organization can be assembled, with a few key regulators at the top to which few other factors bind and most transcription factors on the bottom as the functional units for specific pathways (Yu and Gerstein 2006). Similar to the middle managers in social networks such as governmental hierarchies, transcription factors in the middle layers often regulate more targets and have higher betweenness, indicating that they may function as bottlenecks in the hierarchy. With more interaction data gathered in the future, such hierarchical structures can also be investigated in other directed networks such as metabolic networks and phosphorylation networks.
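One simple way to place regulators into layers is a breadth-first search that starts from the factors that regulate no other factor and works upward. The sketch below assumes that scheme, with invented factor names and a hypothetical `hierarchy_levels` function; it is an illustration of the idea and is not claimed to reproduce the published procedure.

```python
from collections import deque


def hierarchy_levels(regulates):
    """Assign each transcription factor (TF) a level.

    `regulates` maps a TF to the set of TFs it regulates. TFs that regulate
    no other TF form level 1 (the bottom); every other TF is placed one level
    above the lowest TF it regulates. Autoregulation is ignored.
    """
    tfs = set(regulates) | {t for ts in regulates.values() for t in ts}
    targets = {tf: {t for t in regulates.get(tf, ()) if t != tf} for tf in tfs}
    regulators = {tf: set() for tf in tfs}
    for tf, ts in targets.items():
        for t in ts:
            regulators[t].add(tf)

    level = {tf: 1 for tf in tfs if not targets[tf]}
    queue = deque(level)              # multi-source BFS from the bottom layer
    while queue:
        t = queue.popleft()
        for r in regulators[t]:
            if r not in level:
                level[r] = level[t] + 1
                queue.append(r)
    return level


# Invented toy regulatory network among transcription factors only.
toy_regulation = {
    "Top": {"Mid1", "Mid2"},
    "Mid1": {"Bot1", "Bot2"},
    "Mid2": {"Bot2", "Bot3"},
}
levels = hierarchy_levels(toy_regulation)
print(levels)
```

TFs that sit only in closed regulatory loops with no path down to the bottom layer receive no level under this scheme and would need separate handling.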
| Similarities between the transcription and phosphorylation networks |
|---|
There are ~250 transcription factors and 122 protein kinases in yeast (Zhu et al. 2000) and ~1300 transcription factors and 518 protein kinases in humans. As shown in Figure 4, we have performed a detailed comparison of the network topologies of the yeast transcription factor-binding network and phosphorylation network under rich-nutrient conditions. These networks contain a remarkable number of similarities. First, the two networks share similar degree distributions: exponential in-degree distributions (Fig. 4A) and power law out-degree distributions (Fig. 4B). Second, many topological parameters are comparable between the two networks; however, the phosphorylation network is denser than the transcription factor-binding network and contains more nodes with large in- and out-degrees. Finally, the current phosphorylation network is smaller than the transcription factor-binding network. Both networks are built on incomplete data sets and may contain errors. The yeast phosphorylation data, in particular, are primarily collected from one large-scale study covering only two-thirds of all the yeast kinases. The transcription factor-binding network has more experimental sources and therefore a larger coverage. Since diameter is positively correlated with the network size, and limited sampling of a network often lowers the average clustering coefficient (Friedel and Zimmer 2006
| Network modules |
|---|
Many methods have been developed to identify possible network modules. A traditional method, hierarchical clustering, assigns a weight value to the distance between any two nodes in a network, and then gathers nodes with similar weight vectors together into strongly connected cores (Rives and Galitski 2003). Instead of detecting the cores of modules as in hierarchical clustering, the Girvan-Newman algorithm focuses on defining the boundaries of modules by searching for edges with high betweenness, which are therefore more likely to link different modules (Girvan and Newman 2002). Other algorithms have been introduced recently and may demonstrate improvement in module identification (Guimera and Nunes Amaral 2005; Adamcsek et al. 2006; Newman 2006). One concern, however, is that network modules are often dependent on the methods and parameters used in the initial data partitioning, and in general it is difficult to tell which method is better (Barabasi and Oltvai 2004). Furthermore, inaccurate and incomplete data of the interaction networks may also lead to biased module predictions. Nonetheless, network modules are still ubiquitous structures in most biological networks and may help one to better understand the interplay between network structure and function.
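The boundary-finding idea behind the Girvan-Newman algorithm can be sketched as repeatedly removing the edge with the highest betweenness until the network falls apart. The code below is a minimal illustration on an invented network of two triangles joined by one bridge; the function names are hypothetical, ties between edges are broken arbitrarily, and only a single split is performed rather than the full hierarchical decomposition.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness summed over unordered node pairs (Brandes' method)."""
    score = {}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                edge = frozenset((u, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[u] += share
    return {e: b / 2 for e, b in score.items()}  # each pair was counted twice


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in part:
                    part.add(w)
                    queue.append(w)
        seen |= part
        parts.append(part)
    return parts


def split_once(adj):
    """Remove highest-betweenness edges until the component count increases."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:                  # no edges left to remove
            break
        u, w = max(scores, key=scores.get)
        adj[u].discard(w)
        adj[w].discard(u)
    return components(adj)


# Invented toy network: triangles A-B-C and D-E-F joined by the bridge C-D.
toy = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
modules = split_once(toy)
print(sorted(sorted(m) for m in modules))
```

All nine shortest paths between the two triangles cross the bridge C-D, so it is removed first and the two triangles emerge as modules.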
| Network motifs |
|---|
We applied a tool, mfinder (Milo et al. 2002), to identify enriched three-element and four-element motifs in an updated yeast transcription factor-binding network and the yeast phosphorylation network. Both data sets were generated in yeast cells grown in rich media conditions. Among all possible three-element motifs, the FFL was found to be strongly overrepresented in transcriptional networks (Fig. 5). The coherent FFL, in which both transcription factors have the same regulatory effect (induction or repression) on the target, may suggest a functional design for gene transcription regulation. Studies have shown that coherent FFLs can control downstream processes in a fashion that is resistant to transient noise, since targets in an FFL can only be effectively regulated through persistent signals (Shen-Orr et al. 2002). An FFL motif can be easily extended to a four-element motif, "bi-FFL," in which the two regulators collectively control two targets. Bi-FFL motifs are also significantly enriched in yeast transcription factor-binding networks.
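Counting feed-forward loops in a directed network is straightforward: look for a regulator X that controls both a second regulator Y and a target Z, where Y also controls Z. The sketch below is an illustration with invented node names and a hypothetical function; it is not the mfinder implementation, it counts every occurrence of the three edges regardless of any additional edges among the same nodes, and it omits the comparison against randomized networks that is needed to call a motif enriched.

```python
def find_feed_forward_loops(edges):
    """Return all (X, Y, Z) triples with edges X->Y, Y->Z, and X->Z."""
    targets = {}
    for src, dst in edges:
        if src != dst:                      # ignore autoregulation
            targets.setdefault(src, set()).add(dst)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                if z != x and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Invented toy network: X regulates Y, Z, and W; Y also regulates Z.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("X", "W")]
print(find_feed_forward_loops(toy_edges))
```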
Two four-element motifs were enriched in both the yeast transcriptional network and the phosphorylation network (Fig. 5). A simple version of the DOR motif, the "bi-fan motif," in which two regulators bind common targets, may suggest a way to use a limited number of regulators to precisely control a large number of targets under several different conditions. Moreover, the cooperation of transcription factors to regulate targets can also compensate for the degeneracy and low affinity of single transcription factor-binding sites (Pilpel et al. 2001). The other enriched four-element motif, the "bi-parallel motif," comprises a regulator controlling two other regulators that further regulate one target gene. Bi-parallel motifs are found in both transcriptional and phosphorylation networks and indicate redundancy. In addition to the two four-element motifs shared by both networks, the single input motif (SIM) was found to be overrepresented only in the yeast phosphorylation network. This likely reflects the lack of phosphorylation data currently available.
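The two four-element patterns described above can be counted with simple set operations. In the sketch below, a bi-fan is taken to be two regulators that share two targets, and a bi-parallel is one regulator that reaches one target through two distinct intermediates; the node names and function names are invented, extra edges among the four nodes are not excluded, and no enrichment statistics are computed.

```python
from itertools import combinations
from math import comb


def build_targets(edges):
    """Map each regulator to the set of nodes it regulates (self-loops dropped)."""
    targets = {}
    for src, dst in edges:
        if src != dst:
            targets.setdefault(src, set()).add(dst)
    return targets


def count_bi_fans(edges):
    """Count regulator pairs and target pairs with all four regulator->target edges."""
    targets = build_targets(edges)
    total = 0
    for r1, r2 in combinations(sorted(targets), 2):
        shared = (targets[r1] & targets[r2]) - {r1, r2}
        total += comb(len(shared), 2)       # choose any two shared targets
    return total


def count_bi_parallels(edges):
    """Count patterns X->A, X->B, A->T, B->T with four distinct nodes."""
    targets = build_targets(edges)
    total = 0
    for x, mids in targets.items():
        paths_to = {}                       # target -> number of intermediates
        for m in mids:
            for t in targets.get(m, ()):
                if t != x:
                    paths_to[t] = paths_to.get(t, 0) + 1
        total += sum(comb(n, 2) for n in paths_to.values())
    return total


# Invented toy network: kinases K1 and K2 share substrates S1 and S2 (a bi-fan);
# X acts on T through both A and B (a bi-parallel).
toy_edges = [
    ("K1", "S1"), ("K1", "S2"), ("K2", "S1"), ("K2", "S2"),
    ("X", "A"), ("X", "B"), ("A", "T"), ("B", "T"),
]
print(count_bi_fans(toy_edges), count_bi_parallels(toy_edges))
```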
| Network integration |
|---|
Recent bioinformatics software platforms enable users to query and integrate very different types of interaction data to learn new information (Breitkreutz et al. 2003; Shannon et al. 2003; Stark et al. 2006). Instead of searching for overlapping interactions, integration of very different types of interaction data can also be performed to reveal composite motifs that contain multiple types of interactions and elements as basic units. An integration of transcription factor binding, protein-protein interactions, and phosphorylation data from yeast has revealed a mega-network of >60,000 interactions (Fig. 6A). Investigations in this mega-network revealed seven three-element kinase-centered composite motifs (Fig. 6B), of which five (motifs 1-5) were shown to be overrepresented (Ptacek et al. 2005). These composite motifs involve at least one kinase-substrate interaction pair (referred to as "kinates") and one other type of interaction (protein-protein interaction or transcription factor binding). Thus, network integration combines various data sources and can therefore assist in uncovering proteins that are important in multiple types of interactions and provide a more comprehensive view of their cellular functions. Moreover, this network can be combined with other networks such as biochemical and gene interaction data to reveal a more comprehensive view of regulation in yeast.
| Network dynamics |
|---|
In protein-protein interaction networks, proteins may vary their partners according to time and location. By integrating gene expression data with a high-quality yeast protein-protein interaction data set, Han et al. (2004) studied the network dynamics in protein-protein interaction networks and revealed two types of hubs: "party hubs" and "date hubs." Party hubs interact with all their partners simultaneously (that is, at the same time and spatial location) and are more likely to function within the same cellular processes. Date hubs, on the other hand, vary their connections to other proteins at different times and locations and therefore link various biological processes. When considering the modular designs of networks, in silico deletions of these hubs implied that party hubs are more likely to be the module organizers and date hubs to be the module connectors.
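One way to separate the two hub types from expression data is to ask how strongly a hub is coexpressed with its partners on average: a hub that is expressed together with all of its partners behaves like a party hub, whereas a hub whose partners are expressed at different times behaves like a date hub. The sketch below assumes an average Pearson correlation with an arbitrary cutoff of 0.5; the expression profiles, protein names, cutoff, and function names are all invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes nonzero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify_hub(hub, partners, expression, cutoff=0.5):
    """Label a hub by its mean coexpression with its interaction partners.
    The cutoff is an arbitrary illustrative value."""
    mean_r = sum(pearson(expression[hub], expression[p]) for p in partners) / len(partners)
    return ("party hub" if mean_r >= cutoff else "date hub"), mean_r


# Invented expression profiles over four conditions.
expression = {
    "hubP": [1, 2, 3, 4], "p1": [2, 4, 6, 8], "p2": [1, 2, 3, 5],
    "hubD": [1, 2, 3, 4], "d1": [4, 3, 2, 1], "d2": [2, 4, 6, 8], "d3": [1, 3, 1, 3],
}
party_label, party_r = classify_hub("hubP", ["p1", "p2"], expression)
date_label, date_r = classify_hub("hubD", ["d1", "d2", "d3"], expression)
print(party_label, round(party_r, 3), date_label, round(date_r, 3))
```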
The dynamics of the transcriptional network in yeast have been examined on a genomic scale by integrating gene expression data for five cellular conditions with known transcriptional regulatory relationships (Luscombe et al. 2004). A trace-back algorithm was applied to uncover subnetworks that are active under specific conditions. Luscombe et al. (2004) found that these subnetworks exhibit vastly different topologies on both a local and a global level and uncovered two separate groups of cellular states. In so-called exogenous states (e.g., stress response), the network has a shorter diameter and large hubs that should allow cells to respond quickly to external conditions. In endogenous states (e.g., cell cycle), loops and highly intricate connections are more prevalent, indicating a multistage internal program. Different sets of transcription factors become key regulatory hubs at different times, portraying a network that shifts its weight between different foci to bring about distinct cellular states.
| Network evolution |
|---|
In general, core components of a network tend to be conserved, whereas components at the periphery or false interactions are not. In transcription factor-binding networks, this concept has been applied to identify functional regulatory elements that are conserved in several yeast species (Cliften et al. 2003; Kellis et al. 2003). Studies also have shown that interactions in one organism can be mapped to another organism if both partners are highly conserved (Yu et al. 2004). Conserved protein-protein interaction pairs are termed interologs (Walhout et al. 2000), whereas conserved transcriptional binding interaction pairs are termed regulogs (Yu et al. 2004). New interactions in novel organisms can then be discovered through mapping interologs or regulogs.
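Mapping interologs reduces to a lookup: an interaction is transferred to a second organism only when both partners have a counterpart there. The sketch below assumes a precomputed ortholog table and uses invented protein names and a hypothetical `map_interologs` function; how orthologs are called, and how conserved they must be, is left outside the example.

```python
def map_interologs(interactions, orthologs):
    """Predict interactions in a target organism.

    `interactions` is an iterable of (protein_a, protein_b) pairs in the
    source organism; `orthologs` maps a source protein to a list of
    target-organism orthologs. Pairs lacking an ortholog for either
    partner are skipped.
    """
    predicted = set()
    for a, b in interactions:
        for a_target in orthologs.get(a, ()):
            for b_target in orthologs.get(b, ()):
                predicted.add(frozenset((a_target, b_target)))
    return predicted


# Invented example: yC has no ortholog, so the yB-yC interaction is not transferred.
yeast_interactions = [("yA", "yB"), ("yB", "yC")]
ortholog_table = {"yA": ["hA"], "yB": ["hB1", "hB2"]}
predicted = map_interologs(yeast_interactions, ortholog_table)
print(sorted(sorted(pair) for pair in predicted))
```

The same lookup applies to regulogs if each pair is read as (transcription factor, target gene) and the direction of the pair is kept instead of using an unordered set.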
Although conservation of network components and connections is extremely valuable for mapping conserved interactions and common features among organisms, it is likely that many regulatory interactions are not conserved. Mapping of Ste12- and Tec1-binding sites in the closely related yeasts S. cerevisiae, Saccharomyces mikatae, and Saccharomyces bayanus reveals extensive divergence in binding sites among these species (A. Borneman and M. Snyder, unpubl.). These changes likely lead to species diversity and the ability of organisms to occupy distinct ecological niches.
| Networks and human disease |
|---|
Identifying the functional roles of uncharacterized disease genes can also shed light on pathogenic mechanisms. Proteins connected tightly in biological networks often work in similar processes. Hence, functional annotations of interacting partners may indicate potential roles of unannotated disease-related genes and help us to better understand the pathological mechanisms of the disease. Lim et al. (2006) constructed an interactome map focusing on proteins responsible for human inherited ataxias and Purkinje cell degeneration with a yeast two-hybrid screen. The majority of known ataxia-causing proteins were connected with short paths, suggesting that other components in the network might contain candidates responsible for other related inherited ataxias with unknown causative genes. Furthermore, the hubs of this network had crucial roles for disease development in animal models, implying a relationship between the disease and the biological processes in which they are involved: RNA binding or splicing. Such systematic studies can readily be applied to other diseases and organisms and will help to identify crucial components for the disease pathology.
| Challenges and future directions |
|---|