类风湿下载_在线阅读_12

首页 > 类风湿

is_177991

暂无简介

类风湿 GENOMICS PROTEOMICS & BIOINFORMATICS www.sciencedirect.com/science/journal/16720229 Article Mining Functional Gene Modules Linked with Rheumatoid Arthritis Using a SNP-SNP Network Lin Hua, Hui Lin, Dongguo Li, Lin Li*, and Zhich...

GENOMICS PROTEOMICS & BIOINFORMATICS www.sciencedirect.com/science/journal/16720229 Article Mining Functional Gene Modules Linked with Rheumatoid Arthritis Using a SNP-SNP Network Lin Hua, Hui Lin, Dongguo Li, Lin Li*, and Zhicheng Liu Biomedical Engineering Institute, Capital Medical University, Beijing 100069, China. Genomics Proteomics Bioinformatics 2012 Feb; 10(1): 23-34 DOI: 10.1016/S1672-0229(11)60030-2 Received: Mar 11, 2011; Accepted: Aug 31, 2011 Abstract The identification of functional gene modules that are derived from integration of information from different types of networks is a powerful strategy for interpreting the etiology of complex diseases such as rheumatoid arthritis (RA). Genetic variants are known to increase the risk of developing RA. Here, a novel method, the construction of a genetic network, was used to mine functional gene modules linked with RA. A polymorphism interaction analy- sis (PIA) algorithm was used to obtain cooperating single nucleotide polymorphisms (SNPs) that contribute to RA disease. The acquired SNP pairs were used to construct a SNP-SNP network. Sub-networks defined by hub SNPs were then extracted and turned into gene modules by mapping SNPs to genes using dbSNP database. We per- formed Gene Ontology (GO) analysis on each gene module, and some GO terms enriched in the gene modules can be used to investigate clustered gene function for better understanding RA pathogenesis. This method was applied to the Genetic Analysis Workshop 15 (GAW 15) RA dataset. The results show that genes involved in func- tional gene modules, such as CD160 (rs744877) and RUNX1 (rs2051179), are especially relevant to RA, which is supported by previous reports. Furthermore, the 43 SNPs involved in the identified gene modules were found to be the best classifiers when used as variables for sample classification. Key words: polymorphism interaction analysis, hub SNP, sub-networks, GO enrichment analysis Introduction It is well-recognized that complex diseases are caused by multiple gene-gene interactions, in which each gene may have a small effect on disease development, rather than by single gene defects (1). As high-density single nucleotide polymorphism (SNP) arrays and subsequent genome-wide association studies (GWAS) were developed, the study of complex diseases has *Corresponding author. E-mail: lil@ccmu.edu.cn © 2012 Beijing Institute of Genomics. All rights reserved. become of widespread interest for researchers. Tradi- tional methods of genetic analysis are often weak when applied to some complex diseases, which are most likely to be both genetically multifactorial and phenotypically heterogeneous. It is therefore sug- gested that the study of complex diseases should not be restricted to single gene identification, but should focus on gene interaction studies. Recently, there have been several studies exploring gene-gene interactions in different ways (2-5). Furthermore, more and more evidence shows that investigating gene-gene interactions may lead to the development of a functional network and functional Hua et al. / Mining Gene Modules Using SNP-SNP Network Genomics Proteomics Bioinformatics 2012 Feb; 10(1): 23-34 24 modules (6). Iossifov et al (7) predicted pathways or networks of interacting genes that contribute to com- mon heritable disorders by combining standard ge- netic linkage formalism with whole-genome molecu- lar interaction data. Similarly, Wang et al (8) demon- strated pathway-based approaches, which jointly con- sidered multiple contributing factors in the same pathway. In addition, Franke et al (9) developed a functional human gene network that integrated infor- mation on genes and the functional relationships be- tween genes based on multiple databases. They used the network to identify important candidate genes from numerous loci on the basis of their functional interactions and reduced the cost of pinpointing true disease genes in the analyses of disorders. In summary, molecular networks can be obtained from many levels including co-expression (10), co-regulation (11) or protein-protein interactions (6). Depending on the different networks, a variety of methods have been suggested for the mining of useful functional infor- mation, such as clustering genes that show high cor- relation coefficients between gene expression profiles (12, 13), identifying functional modules based on the structure of transcriptional regulation (14), or pre- dicting functional modules encoded in a microbial genome (15). With the rapid development of GWAS, the construction methods of molecular networks pro- vided us with a potential strategy for obtaining a net- work at the genetic level by using predicted interac- tions between SNPs. Networks like this may show special features due to the genetic component and may aid in the explanation of complex diseases. Ac- cordingly, by introducing disease information into such a network and further analyzing functional gene modules, we can learn more about the functional cha- racteristics of disease etiology. As we know, rheumatoid arthritis (RA) is a chronic disease that leads to inflammation of the joints and surrounding tissues. Recent studies have indicated that genetic factors play important roles in the in- creased risk of developing RA. In the present study, we present a novel method for mining functional gene modules linked with RA. First, we carried out the Haseman-Elston (H-E) test (16) and Random Forest (RF) algorithm (17) to screen out disease-related SNPs from a whole-genome dataset. Secondly, can- didate SNPs shared by the H-E test and RF algorithm were used to construct the SNP-SNP network with polymorphism interaction analysis (PIA) algorithm (18), which was developed as a new method to iden- tify the synergistic contribution of SNPs to diseases. Then, sub-networks were extracted by analyzing the structure of the SNP-SNP network. Further, using the dbSNP database, all of sub-networks were mapped onto gene modules. We used a permutation-based procedure to evaluate the significance of associated SNP pairs. For the five gene modules we discovered, Gene Ontology (GO) analysis indicated that genes within a common module were likely to be enriched on some RA-related GO terms. Furthermore, the 43 SNPs involved in the identified gene modules, were found to be the best classifiers when used as variables for sample classification. Finally, we compared the results of our method to existing tools including GRAIL (19) and GSEA-SNP (20) to evaluate the similarity and novelty of our results. Results Construction of RA-specific SNP-SNP net- work In this study, we defined a total score for each SNP- SNP pair as described in the Materials and Methods section. We found when we kept the top 1,000 SNP pairs obtained using each of seven scores involved in PIA, only a small number of overlapping SNP pairs were found. However, it is interesting to note that us- ing the aforementioned total score, we acquired the maximum number of overlapping SNP pairs with all of seven measures involved in PIA. Therefore, total score was a more reasonable measure for evaluating cooperating SNP pairs contributing to disease. As a result, we used total score to evaluate the interaction strength of each SNP-SNP pair (Table S1). According to our permutation test as described in the Materials and Methods section, the empirical dis- tribution of total scores was formed from 1,000,000 scores, and a threshold value of total score ( 1.3394S  ) was considered as a cut-off value at a significance level (P=0.05) to screen out SNP pairs (Figure 1). Among the top 1,000 SNP pairs acquired with the original dataset, we found that the total Hua et al. / Mining Gene Modules Using SNP-SNP Network Genomics Proteomics Bioinformatics 2012 Feb; 10(1): 23-34 25 scores of the top 100 SNP pairs were all greater than the threshold value. We used these significant SNP pairs to construct a SNP-SNP network specific to RA. This network contains 110 SNPs and 100 edges, where an edge indicated a SNP pair. The total scores and their corresponding P-values for the top 100 SNP pairs are shown in Table S2. The SNP-SNP network shown in Figure S1 was generated with MAVisto software (21). Figure 1 The empirical distribution of total scores. By per- muting sample labels 1,000 times, the PIA algorithm is per- formed repeatedly for 1,000 new datasets. The empirical dis- tribution of total scores is formed from all above results. The threshold value of 1.3394 corresponds to a significance level of 0.045. Identification of functional gene modules According to our rule for extracting hub SNP, five hub SNPs with a degree greater than 5 were extracted: rs1424903 (degree=18, P=1.5×10-13), rs744877 (de- gree=9, P=2.3×105), rs164466 (degree=5, P=0.010), rs1004531 (degree=5, P=0.010) and rs759382 (de- gree=5, P=0.010). Then, five sub-networks defined by hub SNPs were extracted. To mine functional gene modules and high risk genes linked with RA, we mapped the SNPs onto genes using a dbSNP database. We calculated the distances of all SNPs to the splice variants of their nearest genes along chromosomes. The highest frequency occurs in the range from 0 to 4,000 base pairs. This result closely agrees with pre- vious reports, in which SNPs that are >500 kb away from any gene are not considered because most en- hancers and repressors are <500 kb away from genes, and most linkage disequilibrium blocks are <500 kb (8). This mapping method allowed us to identify genes implicated by SNPs of sub-networks to obtain gene-gene interaction modules. As a result, 59 genes associated with RA were identified from 110 SNPs involved in the SNP-SNP network. A total of 12 genes were associated with 19 SNPs in the rs1424903-related gene module. The rs744877 (CD160)-related gene module contained seven genes associated with 10 SNPs. five genes corresponding to six SNPs were included in the rs1004531 (TNFAIP8) related gene module. The rs164466-related and rs759382 (SLC9A4)-related gene modules each con- tained two genes corresponding to six SNPs. We identified 59 genes involved in the SNP-SNP network as the background set, and genes involved in 5 gene modules as the test sets. Using a significance level of 0.05, most enrichment results were found in the rs1424903-related and rs744877-related gene modules. At a significance level of 0.1, two enrich- ment GO terms occurred in the rs1004531-related gene module: GO: 0005515 (P=0.08591) and GO: 0005886 (P=0.0702). However, no distinct enrich- ment phenomena were seen in other gene modules owing to their low number of genes. Those gene modules with significant GO terms were considered functional gene modules relevant to RA. Two gene modules particularly enriched in GO terms (the rs1424903-related and rs744877-related gene modules) are shown in Table 1. In addition, the enrichment re- sults for five gene modules are shown in the heat map generated by Cytoscape software (22) in Figure S2. We found that SNPs [rs2051179 (RUNX1), rs164466, rs1424903, rs744877 (CD160) and rs759382 (SLC9A4)] involved in functional gene modules were previously identified as susceptibility loci in a study using ensemble decision trees (4) and another study using the BGTA algorithm (14). As shown by “Molecular Function” in Table 1, the rs1424903-related gene module was relevant to pro- tein binding (GO: 0005515). Kristensen has previ- ously reported that there seems to be a qualitative rather than a quantitative change in 3H-imipramine binding in patients with RA (23). As shown by the dimension “Biological Process", a significant GO term was regulation of transcription (GO: 0006355). Hua et al. / Mining Gene Modules Using SNP-SNP Network Genomics Proteomics Bioinformatics 2012 Feb; 10(1): 23-34 26 Table 1 Enriched GO terms with P<0.1 in the rs1424903-related and rs744877-related gene modules Gene module Category GO term P n# m# Description rs1424903-related MF GO:0005515 0.0550 16 5 Protein-binding GO:0003700 0.0523 5 2 Transcriptional activator activity GO:0008270 0.0461 8 3 Zinc ion-binding GO:0005524 0.0729 9 3 ATP-binding BP GO:0006355 0.0461 8 3 Regulation of transcription CC GO:0005634 0.0729 9 3 Nucleus GO:0005622 0.0461 8 3 Intracellular GO:0005737 0.0360 11 4 Cytoplasm GO:0016021 0.0729 9 3 Integral to membrane GO:0005886 0.0919 6 2 Plasma membrane rs744877-related MF GO:0005524 0.0401 9 2 ATP-binding BP GO:0007165 0.0182 7 2 Signal transduction CC GO:0016021 0.0401 9 2 Integral to membrane GO:0005886 0.0109 6 2 Plasma membrane Note: n#, Number of genes contained in a category counted using 59 background genes. m#, Number of genes contained in a category counted using 12 genes and 6 genes for the rs1424903-related and rs744877-related gene modules, respectively. MF stands for Molecular Function; BP and CC stand for Biological Process and Cellular Component, respectively. Enriched GO terms with P<0.05 are in bold. Redlich et al have previously found that overexpres- sion of Ets-1 in RA synovial tissue may be due to tu- mor necrosis factor-alpha (TNF-) and interleukin 1 (IL-1). Therefore, they suggested that Ets-1 may be an important transcription factor in the cytokine-mediated inflammatory pathway and destructive cascade char- acteristic of RA (24). Aud and Peng also investigated whether transcription factors have important roles in the pathogenesis of inflammatory arthritis, and they have proposed several targets for anti-inflammatory therapies to modulate transcription factor activity (25). Based on the dimension “Cellular Component”, there is also evidence to support the significance of cyto- plasm (GO: 0005737). Anti-neutrophil cytoplasm an- tibodies (ANCA) occur occasionally in RA, but their incidence and clinical significance are unknown. Sa- vige et al demonstrated that ANCA may be associated with systemic vasculitis, and there is an incomplete correlation between indirect immunofluorescence patterns and antibody specificity in enzyme-linked immunosorbent assay (ELISA) systems (26). For the rs744877-related gene module, four sig- nificant GO terms were found. As shown by “Mo- lecular Function”, GO: 0005524 (ATP binding) was related to the inflammatory response (27). Schimitz et al (28) demonstrated that the ATP-binding cassette (ABC) transporter, ABCA1, was induced during dif- ferentiation of human monocytes into macrophages, and there was a dual regulatory function for ABCA1 in macrophage lipid metabolism and inflammation. As shown by 'Biological Process', the significance of GO: 0007165 (signal transduction) is also supported by previous studies. Extracellular signals are transduced intracellularly via multiple pathways, resulting in al- terations in the transcription and translation of spe- cific proteins. Some of these signaling pathways re- sult in the production of proteins, including cytokines and matrix metalloproteinases, which are implicated in the pathogenesis of RA (29). Further analysis revealed more valuable informa- tion in the functional gene modules. Among the 12 genes included in the rs1424903-related gene module, zinc-finger protein 238 (ZNF238) is attached to zinc-finger proteins that can regulate the human im- munodeficiency virus type 1 (HIV-1) long terminal Hua et al. / Mining Gene Modules Using SNP-SNP Network Genomics Proteomics Bioinformatics 2012 Feb; 10(1): 23-34 27 repeat (LTR) (30). In the rs744877-related gene mod- ule, hub gene CD160 is a potential RA association gene. The CD160 receptor represents a unique trig- gering surface molecule that is expressed by cytotoxic NK cells, participates in the inflammatory response and determines the type of subsequent specific immu- nity (31). In addition, it is interesting to observe two links among five hub SNPs. One pair was rs164466- rs1004531 (TNFAIP8), and the other was rs1424903 rs759382 (SLC9A4). This may suggest that functional gene modules cooperate to affect RA and highlight the need for further study. Comparison with GRAIL We also sought to compare our method with another SNP analytical tool, GRAIL. In the GRAIL program, we took all query regions involved in the whole net- work as the input to GRAIL. Interestingly, common gene cliques were found between gene groups ac- quired with GRAIL and the functional gene modules identified with our methods (Table 2). For example, MYH9, CTSB, ELOVL6 and PHACTR1 in the rs1424903-related gene module were also included in the gene group obtained with GRAIL (Ptext=0.0026). GRAIL and our study are two methods based on dif- ferent paths for mining gene groups associated with disease. GRAIL can extract similar genes from all query regions that the user is attempting to evaluate. Compared to those gene groups acquired by GRAIL, functional gene modules linked with RA identified using our method are more conservative and represent higher risk because these modules are prioritized layer by layer, and include gene-gene interactions associ- ated with disease. Comparison with Gene Set Enrichment Anal- ysis-SNP (GSEA-SNP) on five sub-networks defined by hub SNPs GSEA-SNP programs were performed for five sub-networks. An enrichment score (ES) was com- puted for each sub-network. Using a permutation test, we obtained the threshold value of ES at a signifi- cance level of 0.05 for each sub-network (Figure S3). Those sub-networks with P<0.05 were extracted as enrichment sub-networks associated with disease. The results showed that three sub-networks were signifi- cant: the rs1424903-related (P=0.0020), rs164466- related (P=0.0020) and rs759382-related sub-netw- orks (P<0.0001) (Table 3). It is worth noting that the rs1424903-related gene module was also the functional gene module with the most significant GO terms, Table 2 The comparison between gene modules identified by our method and similar genes acquired with GRAIL Gene modules identified by our method Similar genes acquired with GRAIL Gene module (sub-network) Genes included in gene module Pmodulea Similar genes Ptext b rs1424903-related ZNF238, NDEL1, GMDS, RUNX1, MYH9, ELOVL6, CTSB, PHACTR1, BHMT2, SLC9A4, LOC339977, ANKH 1.5E-13 PRKCB1, PLEK, IQGAP2, MYH9, ELOVL6, CTSB, PHACTR1 0.0026 rs744877 (CD160)-related CD160, C21orf34, LHFP, GRPEL2, CNTN4, CSNK2A2, WDR62 2.3E-5 CTSB, CASP6, PRKCB1,GRPEL2, CNTN4, CSNK2A2 0.0131 rs1004531 (TNFAIP8)-related TNFAIP8, PLAU, RIMS1, C10orf55, ELF1 0.010 BTBD9, CASP6, CSNK2A2, ELF1, TNFATP8, RIMS1, PLAU 0.0249 rs164466-related TNFAIP8, MTMR9 0.010 MYH9, NLRP7, CSNK2A2, ELOVL6, PRKCB1 0.0038 rs759382 (SLC9A4)-related SLC9A4, C21orf34 0.010 Note: aPmodule indicates the probability of a hub SNP with >t connections (t is the degree of the hub) in a random network. bPtext is the text-based simi- larity metric based on GRAIL. Similar genes with Ptext<0.01 by GRAIL analysis are shown. Genes underlined represent common genes shared by two gene sets, which are gene modules identified by our method and genes acquired with GRAIL. Hua et al. / Mining Gene Modules Using SNP-SNP Network Genomics Proteomics Bioinformatics 2012 Feb; 10(1): 23-34 28 Table 3 GSEA-SNP results for five sub-networks defined by hub SNPs Number of SNPs/genes Sub-network SNPs Genes ES Number of significant SNPs by x2-test (P<0.05) P FDR ES threshold values rs1424903-related 19 12 0.6672 9 0.0020 <0.001 0.5807 rs744877-related 10 7 0.5652 3 0.2398 0.002 0.6566 rs164466-related 6 2 0.8032 4 0.0020 0.002 0.7455 rs1004531-related 6 5 0.7493 3 0.0551 0

本文档为【类风湿】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

类风湿

热门搜索

历史搜索