Chapter 11

Computational Platform for Integration and Analysis of MicroRNA Annotation

Mariana Yuri Sasazaki; Joaquim Cezar Felipe    Department of Computing and Mathematics, Faculty of Philosophy, Sciences, and Languages of Ribeirão Preto, University of São Paulo at Ribeirão Preto, Ribeirão Preto, SP, Brazil

Abstract

MicroRNA (miRNA) is a small, noncoding RNA that plays a critical role in important biological processes. Many data sources about miRNAs are currently available, allowing us to study their functions and to infer their functional similarity indirectly. An miRNA can be annotated with information such as whether it acts as a tumor suppressor, the organism to which it belongs, and its associations with genes, diseases, and pathological events. However, the existing sources are neither standardized nor integrated, which complicates research. This chapter proposes the creation and instantiation of an integrated miRNA database platform containing information from heterogeneous databases and ontologies, together with a functional similarity method that uses these data. Furthermore, to provide access to our database, we designed and implemented an environment with the infrastructure researchers need to include data from new findings and to seek information through an intuitive web framework.

Keywords

Data integration

microRNA (miRNA) biological databases

miRNA annotation

miRNA functional similarity

1 Introduction

MicroRNAs (miRNAs) are small, noncoding RNAs, about 25 nt long, that are found in several organisms. They regulate the expression of target genes by binding to complementary regions of messenger transcripts, either repressing their translation or regulating their degradation. MiRNAs are highly conserved across species and have crucial functions in the regulation of important processes, such as development, proliferation, differentiation, apoptosis, and metabolism. Aberrant expression of miRNAs has been associated with the pathogenesis of several diseases, such as cancer (Ambros, 2004; Ferracin and Negrini, 2012; Esquela-Kerscher and Slack, 2006).

Gene functional similarity measures, especially those based on the Gene Ontology (GO) (Gene Ontology Consortium, 2015), are commonly used to explore meaningful molecular targets and to infer new gene functions. However, few such studies are available for miRNA genes because miRNA functional annotations are limited. On the other hand, it is known that two genes (including miRNA genes) with similar functions are often associated with similar diseases, and the relationships among diseases can also be represented as a directed acyclic graph (DAG), such as Medical Subject Headings (MeSH; U.S. National Library of Medicine, 2015).

A well-known database of miRNA annotation used by biomedical researchers is miRBase (miRBase, 2015). It contains published information about miRNAs, such as sequences and annotations, as well as the species and the family to which they belong. In addition, this database allows new miRNAs and their names to be registered under a standardized nomenclature. The information in miRBase is frequently updated, and its current version contains more than 24,000 miRNA precursors and more than 30,000 mature miRNAs from 206 different species.

Other relevant miRNA databases that can be cited are as follows:

 MicroRNA.org: A comprehensive resource of microRNA target predictions and expression profiles. Target predictions are based on a development of the miRanda algorithm that incorporates current biological knowledge on target rules and on the use of an up-to-date compendium of mammalian miRNAs. MiRNA expression profiles are derived from a comprehensive sequencing project of a large set of mammalian tissues and cell lines of normal and disease origin (Betel et al., 2008).

 MiRNA - Target Gene Prediction at EMBL: Contains a file with all miRNA-target gene predictions for Drosophila miRNAs. The file is structured as records of predicted miRNA/gene pairs with the following fields: miRNA, CG-ID, gene name, Flybase-ID, validated or predicted untranslated region (UTR), score of best site, total score of all sites, and number of sites (Brennecke et al., 2005).

 MicroCosm Targets (formerly miRBase Targets): A web resource developed by the Enright Lab at the EMBL-EBI containing computationally predicted targets for miRNAs across many species. The miRNA sequences are obtained from the miRBase Sequence database and most genomic sequences from Ensembl. It uses the miRanda algorithm to identify potential binding sites for a given miRNA in genomic sequences (Griffiths-Jones et al., 2008).

 MiRNAMap: Contains experimentally verified miRNAs and experimentally verified miRNA target genes in human, mouse, rat, and other metazoan genomes. In addition to known miRNA targets, previously developed computational tools were applied to identify miRNA targets in the 3’-UTR of genes (Hsu et al., 2008).

 miR2Disease: A manually curated database aimed at providing a comprehensive resource of miRNA deregulation in several human diseases. Each entry in miR2Disease contains detailed information on an miRNA-disease relationship, including miRNA ID, disease name, a brief description of the miRNA-disease relationship, miRNA expression pattern in the disease state, detection method for miRNA expression, experimentally verified miRNA target genes, and bibliographic references (Jiang et al., 2009).

 TargetScan Dataset: A data set composed of all predicted biological targets of miRNA by searching for the presence of conserved 8-mer and 7-mer sites that match the seed region of each miRNA. As an option, nonconserved sites are also predicted. Sites with mismatches in the seed region that are compensated for by conserved 3’ pairing are also identified (Lewis et al., 2003).

The number of records stored in these databases is increasing considerably. An example can be seen in Figure 11.1, which shows the number of associations between miRNAs and diseases included in the Human MicroRNA Disease Database (HMDD).

Figure 11.1 Increasing numbers of miRNA-disease associations in HMDD.

Concerning standardization, Huang et al. (2010) created the first ontology for the miRNA domain, the Ontology for MicroRNA Target (OMIT). This formal representation of knowledge facilitates knowledge acquisition and data sharing across existing data sources and standardizes the heterogeneous information that comes from different miRNA databases. OMIT includes several concepts that cover the different kinds of information contained in general miRNA annotation.

Furthermore, following the structure of OMIT, an miRNA can be annotated with its target genes, related diseases, its action as an oncogenic or tumor suppressor miRNA, the organism to which it belongs, the experiment used to validate it, and its associations with proteins and pathological events. Thus, the functional similarity between miRNAs can be inferred from the similarity of their target genes or associated diseases, supplemented with other information contained in their annotations. If all these annotations were integrated, we could infer the functional similarity of miRNAs from a combination of all this information simultaneously and so obtain a more effective and accurate similarity value, which we characterize as a “composed” functional similarity.

In this chapter, we describe the creation and instantiation of the miRNA Integration and Analysis (MIRIA) platform. MIRIA consists of an integrative database for global miRNA information based on ontologies, together with a web tool for maintaining and querying this database and a method to calculate the functional similarity between miRNAs. The integrative database encompasses the information from a set of existing miRNA databases and was designed to cover the OMIT concepts. It also includes the deployment of related ontologies. The web tool allows the researchers to update and access miRNA annotation based on published articles and to use our composed functional similarity method.

2 Material

After surveying several databases in the context of miRNAs, we chose those that together form a robust integrative information set. The following list describes the miRNA databases that we used to create our own database based on the OMIT structure, as well as some ontologies that we also integrated into MIRIA:

 Ontology for MicroRNA Target (OMIT): The first ontology on the miRNA domain created to facilitate knowledge acquisition from existing sources. The miRNA concept and its relations with other concepts are represented in Figure 11.2. With this structure, we can retain a set of information related to a specific miRNA: organisms where it occurs, diseases regulated by it, its target genes, proteins regulated by it, related processes, and its nature as oncogenic or as a tumor suppressor.

Figure 11.2 MiRNA concepts and their relations in OMIT.

 GO: The best-known and most widely used ontology in the genome domain, which provides a controlled vocabulary of terms used to describe gene product characteristics and annotation data. The aim of this project is to standardize the representation of gene and gene product attributes across species and databases. With this ontology, we can represent and compare the target genes of given miRNAs.

 MeSH: The National Library of Medicine’s controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that allows searching at various levels of specificity. It is also used as a disease classification system. Using this “ontology,” we can represent and compare the diseases regulated by given miRNAs.

 miRBase: A database of published miRNA sequences and annotation, including the species to which miRNAs belong. This database contains the basic information related to miRNAs.

 HMDD: A database with human microRNA-disease association data, which is manually collected from publications. Using this database, we can obtain the set of diseases related to a given miRNA (Lu et al., 2008).

 TarBase: The largest available manually curated target database, indexing more than 65,000 miRNA-gene interactions experimentally tested. From this database, we can obtain the set of target genes of a given miRNA (Sethupathy et al., 2006).

 Genetic Association Database (GAD): A public repository of non-Mendelian, common complex disorders (diseases) associated with human genes, which aims to standardize and archive genetic association study data. We used this database to check for redundancies between diseases and target genes when calculating the similarity (Zhang et al., 2010).

 Human MicroRNA oncogenic and tumor suppressors: Contains human miRNAs that have been identified as oncogenic or tumor suppressors by manually reviewing publications (Wang et al., 2010).

All of these information sources were integrated into our MIRIA database so that we could collect a useful set of information about miRNAs. In addition, the direct information about miRNAs, combined with the related gene and disease information organized in their specific ontologies, allows the computation of the functional similarity measure that we propose.

3 MIRIA Database

The MIRIA database was developed using the Java language in the NetBeans IDE and the PostgreSQL database management system. As we used the OMIT structure to develop our relational database, each concept from the ontology was represented as a table, and the associations between the concepts were represented by relationships. We chose the data sources cited in section 2 to populate our database because all of them use the miRNA nomenclature defined by miRBase; TarBase and GAD use the Human Genome Organization (HUGO) gene nomenclature; and HMDD and GAD use the disease nomenclature defined in MeSH. All of the selected databases were obtained in their most recent published version, and some of them had their data preprocessed. For example, GAD has repeated instances of some diseases caused by insignificant differences, such as singular versus plural forms or the presence or omission of the letter s after an apostrophe, and other databases have similar inconsistencies.

Information about miRNAs, such as name, type, and sequence, was taken from miRBase. Initially, we considered only human miRNAs, representing the inOrganism relationship. Information about miRNAs acting as oncogenic or tumor suppressors, used to populate the isOncogenic and isTumorSuppressor relationships, was extracted from the database of Human MicroRNA Oncogenic and Tumor Suppressors. From HMDD, we obtained associations between miRNAs and human diseases, represented by the regulateDisease relationship, and we took from GAD direct associations between genes and human diseases to check for redundancies. Finally, from TarBase, it was possible to collect information about most relationships involving miRNAs in our database (e.g., hasTarget, regulateProtein, regulatePathologicalEvent, and experimentContains).
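The kind of preprocessing described above can be illustrated with a minimal Python sketch. The chapter does not publish its cleaning rules, so the function names and the specific rules here (lowercasing, dropping a possessive s, a naive plural strip) are assumptions chosen only to show the idea.

```python
import re


def normalize_disease_name(name: str) -> str:
    """Return a canonical key for a disease name (illustrative rules only)."""
    key = name.strip().lower()
    # Treat typographic and plain apostrophes alike, then drop a possessive "'s".
    key = key.replace("\u2019", "'")
    key = re.sub(r"'s\b", "", key)
    # Collapse runs of whitespace into a single space.
    key = re.sub(r"\s+", " ", key)
    # Naive singularization of the last word: "neoplasms" -> "neoplasm".
    # Endings such as "-us" or "-is" (mellitus, psoriasis) are left alone.
    if key.endswith("s") and not key.endswith(("ss", "us", "is")):
        key = key[:-1]
    return key


def merge_duplicates(names):
    """Group raw disease names that share the same canonical key."""
    groups = {}
    for raw in names:
        groups.setdefault(normalize_disease_name(raw), []).append(raw)
    return groups
```

A rule set this simple would over-merge or under-merge some real names, so in practice the resulting groups would still need manual review before records are unified.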

Despite the fact that OMIT ontology does not have in its structure the concept of the family to which a given miRNA belongs, and given the importance of this information, we implemented in MIRIA a table to store information about the families, as well as the relationship between records of miRNAs and their families. This information was derived from miRBase.

Furthermore, the GO and MeSH ontologies were downloaded from their respective websites and integrated into MIRIA. Moreover, as the miRBase website has a tree of all the species in which miRNAs have been discovered, we implemented an ontology of organisms and integrated it into our database.

To make it easy to trace the source of each relationship record in our database (e.g., records of associations between miRNAs and diseases, miRNAs and target genes, or direct associations between genes and diseases) and to guarantee the reliability of MIRIA data, we created a table containing information about the respective publications, such as PubMed ID, title, authors, year of publication, and a field for notes.
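The mapping from ontology concepts to tables, with a publication table for provenance, can be sketched as follows. This is an illustration only: MIRIA uses PostgreSQL, whereas the sketch uses Python's built-in sqlite3 module so that it runs without a server, and every table name, column name, and inserted value is a hypothetical placeholder rather than the actual MIRIA schema or data.

```python
import sqlite3

# Hypothetical schema: one table per concept, one table per relationship.
SCHEMA = """
CREATE TABLE mirna (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    sequence TEXT
);
CREATE TABLE disease (
    id        INTEGER PRIMARY KEY,
    mesh_name TEXT NOT NULL UNIQUE
);
CREATE TABLE publication (
    pubmed_id INTEGER PRIMARY KEY,
    title     TEXT,
    authors   TEXT,
    year      INTEGER,
    notes     TEXT
);
-- The regulateDisease relationship; every record must cite a publication.
CREATE TABLE regulate_disease (
    mirna_id   INTEGER NOT NULL REFERENCES mirna(id),
    disease_id INTEGER NOT NULL REFERENCES disease(id),
    pubmed_id  INTEGER NOT NULL REFERENCES publication(pubmed_id),
    PRIMARY KEY (mirna_id, disease_id, pubmed_id)
);
"""


def create_database():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def diseases_for(conn, mirna_name):
    """Return (disease name, PubMed ID) pairs recorded for one miRNA."""
    return conn.execute(
        """SELECT d.mesh_name, r.pubmed_id
           FROM regulate_disease AS r
           JOIN mirna   AS m ON m.id = r.mirna_id
           JOIN disease AS d ON d.id = r.disease_id
           WHERE m.name = ?
           ORDER BY d.mesh_name""",
        (mirna_name,),
    ).fetchall()


# Placeholder rows, not real annotations.
conn = create_database()
conn.execute("INSERT INTO mirna (id, name) VALUES (1, 'mirna-placeholder')")
conn.execute("INSERT INTO disease (id, mesh_name) VALUES (1, 'Disease placeholder')")
conn.execute("INSERT INTO publication (pubmed_id, title) VALUES (1, 'Placeholder title')")
conn.execute("INSERT INTO regulate_disease VALUES (1, 1, 1)")
print(diseases_for(conn, "mirna-placeholder"))
```

Making the publication reference a non-null foreign key means that an association cannot be stored without a known source, which is one way to enforce the traceability described above at the database level.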

4 MiRNA CFSim

After the creation and instantiation of the MIRIA database, we decided to implement a method to evaluate the degree of similarity between two miRNAs. Our proposal was to make this calculation indirectly, using the similarity between the target genes and between the diseases associated with each miRNA. To this end, we studied various ontology-based semantic similarity methods from the literature for calculating the similarity between two genes using GO or between two diseases using MeSH. We selected the Wang method (Wang et al., 2007) because it is a hybrid method, based on both edges and nodes. In other words, the Wang method considers not only the position of the terms being compared within an ontology, but also their semantic relations with their ancestors.

The proposed method, composed functional similarity (CFSim), assumes that a set of information exists about the two miRNAs being compared and takes all of this information into account in the calculation of similarity. This may involve different similarity values obtained for different categories of information. For example, if both miRNAs have annotations in the category of target genes, we calculate the similarity between the target genes based on GO; in addition, if they have annotations in the disease category, we calculate the similarity between the diseases. Thus, the functional similarity is characterized here as “composed.” To calculate the CFSim between two miRNAs, we applied the Wang method to the categories of information that have related ontologies, calculated the similarity for these categories, and then combined their values based on weights and redundancies.

Because miRNAs belonging to the same family have very similar sequences and identical seed regions, they regulate a common set of target genes and therefore have high functional similarity; we used miRNA families to validate our method. The implementation and evaluation of CFSim were carried out in the C# language in Microsoft Visual Studio, accessing the PostgreSQL database.
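As an illustration of the term-level measure published by Wang et al. (2007), the following Python sketch computes the semantic contribution (S-value) of each ancestor of a term and then the similarity between two terms. The edge weights 0.8 (is_a) and 0.6 (part_of) are the values reported in that paper for GO; the toy DAG and its term names are invented, and this is not the MIRIA implementation, which was written in C#.

```python
IS_A, PART_OF = 0.8, 0.6  # edge weights reported by Wang et al. (2007) for GO


def s_values(term, parents):
    """S-values of `term` and of every ancestor of `term`.

    `parents` maps a term to a list of (parent, edge_weight) pairs.
    The term itself contributes 1.0; an ancestor receives the best
    (maximum) product of edge weights along any path from the term.
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, weight in parents.get(child, []):
            contribution = weight * s[child]
            if contribution > s.get(parent, 0.0):
                s[parent] = contribution
                stack.append(parent)  # re-propagate the improved value
    return s


def wang_similarity(a, b, parents):
    """Shared S-values of terms a and b over their total semantic values."""
    sa, sb = s_values(a, parents), s_values(b, parents)
    common = sa.keys() & sb.keys()
    shared = sum(sa[t] + sb[t] for t in common)
    return shared / (sum(sa.values()) + sum(sb.values()))


# Invented toy DAG: X and Y are children of root, Z is a child of X.
toy = {
    "X": [("root", IS_A)],
    "Y": [("root", IS_A)],
    "Z": [("X", IS_A)],
}
print(round(wang_similarity("Z", "X", toy), 3))  # 0.764
print(round(wang_similarity("X", "Y", toy), 3))  # 0.444
```

In the toy graph, Z and X score higher than X and Y because Z shares both X and root with X, while X and Y share only root. The same functions apply to a disease hierarchy such as MeSH once edge weights are chosen for it.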
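The idea of combining per-category similarities can be sketched in Python as below. The chapter does not give the CFSim formula, its weights, or its handling of redundancies, so everything here is an assumption: set-level similarity uses the best-match average that Wang et al. (2007) use for genes, categories are combined by a weighted mean over those in which both miRNAs are annotated, and redundancy handling is omitted entirely. The function names and category labels are hypothetical.

```python
def best_match_average(set_a, set_b, term_sim):
    """Set-level similarity: each term is matched to its best partner.

    `term_sim(a, b)` is any term-level similarity in [0, 1].
    Returns None when either set is empty (category not annotated).
    """
    if not set_a or not set_b:
        return None
    total = sum(max(term_sim(a, b) for b in set_b) for a in set_a)
    total += sum(max(term_sim(a, b) for a in set_a) for b in set_b)
    return total / (len(set_a) + len(set_b))


def composed_similarity(category_scores, weights):
    """Weighted mean of the category scores that are available.

    `category_scores` maps a category name to a score or None;
    `weights` maps the same names to non-negative weights (assumed values).
    """
    available = {c: s for c, s in category_scores.items() if s is not None}
    if not available:
        return None
    total_weight = sum(weights[c] for c in available)
    return sum(weights[c] * s for c, s in available.items()) / total_weight


# Usage with a trivial exact-match term similarity and equal weights.
exact = lambda a, b: 1.0 if a == b else 0.0
genes = best_match_average({"g1", "g2"}, {"g2", "g3"}, exact)
score = composed_similarity(
    {"target_genes": genes, "diseases": None},
    {"target_genes": 1.0, "diseases": 1.0},
)
print(genes, score)
```

Skipping categories with no annotations, rather than scoring them as zero, avoids penalizing miRNAs that are simply less studied; whether CFSim makes the same choice is not stated in the chapter.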
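One plausible way to use families for validation is to check that same-family pairs score higher on average than different-family pairs. The chapter does not describe its validation protocol, so the Python sketch below, including the function name, the toy family labels, and the toy similarity function, is an assumption meant only to make the idea concrete.

```python
from itertools import combinations
from statistics import mean


def family_separation(families, sim):
    """Mean similarity within families versus between families.

    `families` maps an miRNA name to its family label;
    `sim(a, b)` is any miRNA-level similarity function.
    Requires at least one same-family pair and one cross-family pair.
    """
    within, between = [], []
    for a, b in combinations(sorted(families), 2):
        bucket = within if families[a] == families[b] else between
        bucket.append(sim(a, b))
    return mean(within), mean(between)


# Toy data: two invented families and a similarity keyed on the name prefix.
toy_families = {"fam1-a": "fam1", "fam1-b": "fam1", "fam2-a": "fam2", "fam2-b": "fam2"}
toy_sim = lambda a, b: 0.9 if a.split("-")[0] == b.split("-")[0] else 0.2
within_mean, between_mean = family_separation(toy_families, toy_sim)
print(within_mean, between_mean)
```

A similarity method that captures function well would be expected to yield a within-family mean clearly above the between-family mean.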

5 Web Framework

To give researchers access to our integrated database, we developed a web framework with the following functional requirements:

 Search/insert/update/delete annotations about an miRNA, such as name, sequence, organism, family, functions, and its action as oncogenic or tumor suppressor

 Search/insert/update/delete annotations about associations between an miRNA and target genes

 Search/insert/update/delete annotations about miRNA-disease associations

On the other hand, the MIRIA environment also has the following nonfunctional requirements:

 User-friendly interface: The web framework interface should present its features in an intuitive way to the user.

 Response time: The environment must respond quickly while executing its functions.

 Usability: The environment should be as simple as possible so that users can perform their tasks without any training and with the greatest possible satisfaction.

 Portability: The environment must be accessible from any machine, wherever the user is, as long as the machine is connected to a network; no installation should be required.

Thus, the implementation of this web framework began with its integration with MIRIA in order to permit access to and update of the miRNA annotations. To search for annotations, we implemented methods that look for miRNA information and their relations with target genes and diseases in MIRIA. On the other hand, for the insert, update, and delete functionalities of miRNA annotations, we developed methods that execute these functions in our integrated database. Finally, to calculate functional similarity, the CFSim method described in section 4 was implemented.

This web framework was developed using C# in Visual Studio; access to the annotations contained in the MIRIA database, as well as insertions, updates, and deletions of information, is defined in PostgreSQL.

6 Results

To achieve the objectives of the MIRIA platform in an efficient and user-friendly way, we organized the functions of the web framework interface into two sections: Home and Insert/Delete.

The Home section can be accessed through the first tab in the interface, which is the home page of the framework. On this tab, the user can search for information about an miRNA of interest (using the Search button) and calculate the functional similarity between two miRNAs through CFSim (using the Calculate button), as shown in Figure 11.3.

Figure 11.3 The Home tab of the web framework.

When a search for a specific miRNA is conducted, the framework returns all data about that miRNA contained in the unified MIRIA database, organized in frames. The first frame shows basic information about the miRNA, such as name, family, sequence, organism, action (oncogenic or tumor suppressor), and functions. The second frame shows information about the associations of the miRNA with its target genes, such as the name and symbol of the target gene, the Ensembl code, the type of experiment used in its validation, related pathological events, and the PubMed ID of the publication from which this information was taken. Finally, the third frame presents the associations between the miRNA and various human diseases, with information such as the name of each disease, the PubMed ID of the publication from which the data were taken, and some notes. An example of a search result (showing only part of the data due to space limitations) is presented in Figure 11.4.

Figure 11.4 An example of the results of a search for information about an miRNA of interest.

When a user calculates the similarity between two miRNAs, the system returns a numeric value: the composed functional similarity. More detailed descriptions of both miRNAs are also shown, such as names, families, related target genes, associated diseases, and whether they act as oncogenic or tumor suppressors. Figure 11.5 presents an example of functional similarity calculated using CFSim for miRNAs belonging to distinct families.

Figure 11.5 An example of CFSim applied on the miRNAs hsa-let-7d and hsa-mir-221.

The second tab, Insert/Delete, shown in Figure 11.6, allows the user to update the records in the MIRIA database: clicking the Save button inserts new information or modifies existing information, and clicking the Delete button removes existing information. To clear all fields before a new search, insertion, or update, the user clicks the Clear button.

Figure 11.6 The Insert/Delete tab of the web framework.

When inserting a new miRNA, the user must first search for the miRNA of interest to ensure that it does not already exist in the database. If it is not present in MIRIA, the user can add annotations about the new miRNA by clicking the blue plus sign buttons. When inserting each new association of a disease with an miRNA, the framework allows the user to type and add a new disease (Figure 11.7) or to choose an existing disease from the database (Figure 11.8). The same is true for associations between the miRNA of interest and target genes.

Figure 11.7 The user can insert a new disease in MIRIA to associate it with an miRNA.
Figure 11.8 The user can choose an already-existing disease in MIRIA to associate it with an miRNA.

On the other hand, if there is already information about this miRNA in the MIRIA database, the user can change or remove it by clicking the red X buttons in Figure 11.6. Whenever a user chooses to remove some information, a dialog box asks the user to confirm the deletion.

It is important to emphasize that the PubMed ID of the publication from which the information was taken is required, in order to ensure data integrity in the MIRIA database. Along with the PubMed ID, the user can add the name of the lead author and the year of publication (in citation form) and some notes. The user can also view more detailed information about a publication by clicking the magnifier icon (drawn as an eye) in Figure 11.6.

Finally, there is a third tab, About, which contains a description of the framework and the CFSim method, as well as contact information for the project leaders and developers. The MIRIA web framework can be accessed at http://143.107.137.43/?pg=cfsim.

7 Conclusions

Currently, there are several different databases containing information related to miRNAs. However, the lack of standardization and integration among many of them makes it difficult for researchers to study, analyze, and compare miRNAs. Therefore, after studying different existing databases and environments described in the literature, we created a platform called MIRIA that includes an integrative database, a method for miRNA comparison, and a web framework for data maintenance and analysis. The MIRIA database is composed of miRNA annotations taken from heterogeneous existing sources, respecting the miRNA nomenclature of miRBase, the gene nomenclature defined by HUGO, and the disease nomenclature defined by MeSH. In addition, the MIRIA database is organized according to the structure of the OMIT ontology.

Furthermore, the web framework enables the user to search, insert, update, and delete information in the MIRIA database, and the CFSim method can be used to calculate the composed functional similarity between two miRNAs. MIRIA integrates information about miRNAs originating from different databases and presents it in a standardized and well-organized way. The web framework presents a user-friendly interface that allows experts in this field to search, acquire, and update miRNA data from the existing literature.

In conclusion, the use of MIRIA has the potential to help researchers in biomedicine to better understand the important roles that miRNAs play in biological processes, their functions, and their associations with several human diseases, especially considering the functional similarity between them.

References

Ambros V. The functions of animal microRNAs. Nature. 2004;431:350–355.

Betel D, et al. The microRNA.org resource: targets and expression. Nucleic Acids Res. 2008;36:149–153.

Brennecke J, et al. Principles of microRNA–target recognition. PLoS Biol. 2005;3(3):e85.

Esquela-Kerscher A, Slack FJ. Oncomirs - microRNAs with a role in cancer. Nat. Rev. Cancer. 2006;6:259–269.

Ferracin M, Negrini M. MicroRNAs and Their Role in Cancer. eLS. John Wiley & Sons, Ltd; 2012.

Gene Ontology Consortium. Gene Ontology. 2015. [Online] Available from: http://www.geneontology.org/ (accessed 02.03.15.).

Griffiths-Jones S, et al. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36:154–158.

Hsu SD, et al. miRNAMap 2.0: genomic maps of microRNAs in metazoan genomes. Nucleic Acids Res. 2008;36(Database issue):165–169.

Huang J, et al. Ontology for MiRNA target prediction in human cancer. In: First ACM International Conference on Bioinformatics and Computational Biology. Niagara Falls, NY; New York, NY, USA: ACM; 2010:472–474.

Jiang Q, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37:98–104.

Lewis BP, et al. Prediction of mammalian microRNA targets. Cell. 2003;115:787–798.

Lu M, et al. An analysis of human microRNA and disease associations. PLoS One. 2008;3(10):e3420. doi:10.1371/journal.pone.0003420.

miRBase. miRBase: the microRNA database. 2015. [Online] Available from: http://www.mirbase.org/ (accessed 02.03.15.).

Sethupathy P, Corda B, Hatzigeorgiou AG. TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA. 2006;12:192–197.

U.S. National Library of Medicine. Medical Subject Headings. 2015. [Online] Available from: http://www.nlm.nih.gov/mesh (accessed 02.03.15.).

Wang JZ, et al. A new method to measure the semantic similarity of GO terms. Bioinformatics (Oxford, England). 2007;23:1274–1281.

Wang D, et al. Human MicroRNA oncogenes and tumor suppressors show significantly different biological patterns: from functions to targets. PLoS One. 2010;5(9):e13067. doi:10.1371/journal.pone.0013067.

Zhang, et al. Systematic analysis, comparison, and integration of disease based human genetic association data and mouse genetic phenotypic information. BMC Med. Genom. 2010;3:1.
