Semantic-enabled Hybrid Genetic Disease Diagnostics in Next-Generation Sequenced Data


  • Emilia Zawadzka-Gosk, Polish-Japanese Academy of Information Technology
  • Krzysztof Wołk, Polish-Japanese Academy of Information Technology



Next-generation sequencing (NGS) is a genome sequencing technology used in genetics for disease diagnosis. NGS yields a list of all mutations in a genome, so identifying the one that causes a disease is not trivial. A number of variant prioritization applications have been developed, but their output is a suggestion rather than a diagnosis; moreover, they suffer from issues such as identifying a nonpathogenic variant as causal or failing to identify the causal gene. These issues inspired us to create a variant prioritization strategy that combines the result sets of Exomiser and OmimExplorer and improves them by semantic analysis of abstracts and articles freely available from the PubMed and PubMed Central databases. To cover a wider scope of scientific articles, the Google Scholar repository will be used. The described approach makes it possible to present the latest and most accurate information about potentially pathogenic variants.





How to Cite

Zawadzka-Gosk, E., & Wołk, K. (2018). Semantic-enabled Hybrid Genetic Disease Diagnostics in Next-Generation Sequenced Data. Computer Science, 19(2), 179.