Machine Learning and Deep Learning in Genetics and Genomics

In this chapter, we introduce machine learning (ML) methods and deep learning (DL) algorithms commonly adopted in genomics data analysis. We begin with a general introduction to genomics data and present a multi-omics study investigating early childhood oral health. We then review statistical methods and ML/DL methods and their applications in genomics data analysis, covering the following areas: (1) association between genetic markers, mostly single nucleotide polymorphisms (SNPs), and complex diseases or traits in genome-wide association studies (GWAS); (2) copy number variation (CNV) and single nucleotide variant (SNV) calling in whole genome sequencing (WGS) or whole exome sequencing (WES) data of tumor samples; (3) association between DNA methylation status and phenotypes, commonly referred to as epigenome-wide association studies (EWAS); (4) analysis of genome-wide high-throughput chromosome conformation capture (Hi-C) data; (5) inference related to transcription factor (TF) binding sites; and (6) single-cell RNA-seq data analysis. To complete the review, we present the results of a systematic review of the machine learning landscape in oral diseases. We conclude with a discussion of potential future applications of ML/DL in genetics and genomics in oral health.

References

  1. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys. 1943;5(4):115–33. ArticleGoogle Scholar
  2. Park WJ, Park J-B. History and application of artificial neural networks in dentistry. Eur J Dent. 2018;12(04):594–601. ArticlePubMedPubMed CentralGoogle Scholar
  3. Lin E, Lane H-Y. Machine learning and systems genomics approaches for multi-omics data. Biomarker Res. 2017;5(1):2. ArticleGoogle Scholar
  4. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Informn Proc Syst. 2012;25:1097–105. Google Scholar
  5. Hung M, Voss MW, Rosales MN, Li W, Su W, Xu J, et al. Application of machine learning for diagnostic prediction of root caries. Gerodontology. 2019;36(4):395–404. ArticlePubMedPubMed CentralGoogle Scholar
  6. Liu Z, Liu J, Zhou Z, Zhang Q, Wu H, Zhai G, et al. Differential diagnosis of ameloblastoma and odontogenic keratocyst by machine learning of panoramic radiographs. Int J Comput Assist Radiol Surg. 2021;16(3):415–22 ArticlePubMedPubMed CentralGoogle Scholar
  7. Abdalla-Aslan R, Yeshua T, Kabla D, Leichter I, Nadler C. An artificial intelligence system using machine-learning for automatic detection and classification of dental restorations in panoramic radiography. Oral Surg Oral Med Oral Pathol Oral Radiol. 2020;130(5):593–602. ArticlePubMedGoogle Scholar
  8. Xie X, Wang L, Wang A. Artificial neural network modeling for deciding if extractions are necessary prior to orthodontic treatment. Angle Orthod. 2010;80(2):262–6. ArticlePubMedPubMed CentralGoogle Scholar
  9. Montenegro RD, Oliveira AL, Cabral GG, Katz CR, Rosenblatt A. A comparative study of machine learning techniques for caries prediction. In: 2008 20th IEEE International Conference on tools with artificial intelligence. Piscataway, NJ: IEEE; 2008. p. 477–81. ChapterGoogle Scholar
  10. Patil S, Habib Awan K, Arakeri G, Jayampath Seneviratne C, Muddur N, Malik S, et al. Machine learning and its potential applications to the genomic study of head and neck cancer—a systematic review. J Oral Pathol Med. 2019;48(9):773–9. ArticlePubMedGoogle Scholar
  11. Kebschull M, Papapanou PN. Exploring genome-wide expression profiles using machine learning techniques. Methods Oral Biol. 2017;1537:347–64. Springer ArticleGoogle Scholar
  12. Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype–phenotype interactions. Nat Rev Genet. 2015;16(2):85–97. ArticlePubMedGoogle Scholar
  13. Misra BB, Langefeld C, Olivier M, Cox LA. Integrated omics: tools, advances and future approaches. J Mol Endocrinol. 2019;62(1):R21–45. ArticleGoogle Scholar
  14. Fröhlich H, Patjoshi S, Yeghiazaryan K, Kehrer C, Kuhn W, Golubnitschaja O. Premenopausal breast cancer: potential clinical utility of a multi-omics based machine learning approach for patient stratification. EPMA J. 2018;9(2):175–86. ArticlePubMedPubMed CentralGoogle Scholar
  15. Divaris K. Fundamentals of precision medicine. Compend Contin Educ Dent. 2017;38(8 Suppl):30–2. PubMedPubMed CentralGoogle Scholar
  16. Selwitz RH, Ismail AI, Pitts NB. Dental caries. Lancet. 2007;369(9555):51–9. https://doi.org/10.1016/S0140-6736(07)60031-2. ArticlePubMedGoogle Scholar
  17. Divaris K. Predicting dental caries outcomes in children: a “risky” concept. J Dent Res. 2016;95(3):248–54. https://doi.org/10.1177/0022034515620779. ArticlePubMedGoogle Scholar
  18. Burne RA, Zeng L, Ahn SJ, Palmer SR, Liu Y, Lefebure T, et al. Progress dissecting the oral microbiome in caries and health. Adv Dent Res. 2012;24(2):77–80. https://doi.org/10.1177/0022034512449462. ArticlePubMedPubMed CentralGoogle Scholar
  19. Marsh PD. Microbial ecology of dental plaque and its significance in health and disease. Adv Dent Res. 1994;8(2):263–71. https://doi.org/10.1177/08959374940080022001. ArticlePubMedGoogle Scholar
  20. Nyvad B, Crielaard W, Mira A, Takahashi N, Beighton D. Dental caries from a molecular microbiological perspective. Caries Res. 2013;47(2):89–102. https://doi.org/10.1159/000345367. ArticlePubMedGoogle Scholar
  21. Falsetta ML, Klein MI, Colonne PM, Scott-Anne K, Gregoire S, Pai CH, et al. Symbiotic relationship between Streptococcus mutants and Candida albicans synergizes virulence of plaque biofilms in vivo. Infect Immun. 2014;82(5):1968–81. https://doi.org/10.1128/IAI.00087-14. ArticlePubMedPubMed CentralGoogle Scholar
  22. Delisle AL, Guo M, Chalmers NI, Barcak GJ, Rousseau GM, Moineau S. Biology and genome sequence of Streptococcus mutans phage M102AD. Appl Environ Microbiol. 2012;78(7):2264–71. https://doi.org/10.1128/AEM.07726-11. ArticlePubMedPubMed CentralGoogle Scholar
  23. Divaris K, Joshi A. The building blocks of precision oral health in early childhood: the ZOE 2.0 study. J Public Health Dent. 2018;80(Suppl 1):S31–6. https://doi.org/10.1111/jphd.12303. ArticlePubMedGoogle Scholar
  24. Ginnis J, Ferreira Zandona AG, Slade GD, Cantrell J, Antonio ME, Pahel BT, et al. Measurement of early childhood Oral health for research purposes: dental caries experience and developmental defects of the enamel in the primary dentition. Methods Mol Biol. 1922;2019:511–23. https://doi.org/10.1007/978-1-4939-9012-2_39. ArticleGoogle Scholar
  25. Divaris K, Shungin D, Rodriguez-Cortes A, Basta PV, Roach J, Cho H, et al. The Supragingival biofilm in early childhood caries: clinical and laboratory protocols and bioinformatics pipelines supporting metagenomics, Metatranscriptomics, and metabolomics studies of the Oral microbiome. Methods Mol Biol. 1922;2019:525–48. https://doi.org/10.1007/978-1-4939-9012-2_40. ArticleGoogle Scholar
  26. Haworth S, Esberg A, Lif Holgerson P, Kuja-Halkola R, Timpson NJ, Magnusson PKE, et al. Heritability of caries scores, trajectories, and disease subtypes. J Dent Res. 2020;99(3):264–70. https://doi.org/10.1177/0022034519897910. ArticlePubMedGoogle Scholar
  27. Shaffer JR, Feingold E, Wang X, Tcuenco KT, Weeks DE, DeSensi RS, et al. Heritable patterns of tooth decay in the permanent dentition: principal components and factor analyses. BMC Oral Health. 2012;12:7. https://doi.org/10.1186/1472-6831-12-7. ArticlePubMedPubMed CentralGoogle Scholar
  28. GlobalSurg C. Writing g, patient r, statistical a, protocol d, project s, et al. global variation in anastomosis and end colostomy formation following left-sided colorectal resection. BJS Open. 2019;3(3):403–14. https://doi.org/10.1002/bjs5.50138. ArticleGoogle Scholar
  29. Divaris K. Searching deep and wide: advances in the molecular understanding of dental caries and periodontal disease. Adv Dent Res. 2019;30(2):40–4. https://doi.org/10.1177/0022034519877387. ArticlePubMedPubMed CentralGoogle Scholar
  30. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15(3):R46. https://doi.org/10.1186/gb-2014-15-3-r46. ArticlePubMedPubMed CentralGoogle Scholar
  31. Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res. 2007;17(3):377–86. https://doi.org/10.1101/gr.5969107. ArticlePubMedPubMed CentralGoogle Scholar
  32. Brady A, Salzberg S. PhymmBL expanded: confidence scores, custom databases, parallelization and more. Nat Methods. 2011;8(5):367. https://doi.org/10.1038/nmeth0511-367. ArticlePubMedPubMed CentralGoogle Scholar
  33. Craig J. Complex diseases: research and applications. Nature Education. 2008;1(1):184. Google Scholar
  34. The Human Genome Project. https://www.genome.gov/human-genome-project. 2018; Accessed 2020.
  35. The International HapMap Consortium. The international HapMap project. Nature. 2003;426(6968):789–96. ArticleGoogle Scholar
  36. The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–320. ArticlePubMed CentralGoogle Scholar
  37. The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–61. ArticlePubMed CentralGoogle Scholar
  38. The International HapMap Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467(7311):52–8. https://doi.org/10.1038/nature09298. ArticleGoogle Scholar
  39. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467(7319):1061–73. http://www.nature.com/nature/journal/v467/n7319/abs/nature09534.html#supplementary-informationArticlePubMed CentralGoogle Scholar
  40. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65. https://doi.org/10.1038/nature11632. ArticlePubMedGoogle Scholar
  41. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. https://doi.org/10.1038/nature15393. ArticleGoogle Scholar
  42. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS catalog). Nucleic Acids Res. 2017;45(D1):D896–d901. https://doi.org/10.1093/nar/gkw1133. ArticlePubMedGoogle Scholar
  43. Zhang Y, Liu JS. Bayesian inference of epistatic interactions in case-control studies. Nat Genet. 2007;39(9):1167–73. ArticlePubMedGoogle Scholar
  44. Han B, Chen X-W, Talebizadeh Z. FEPI-MB: identifying SNPs-disease association using a Markov Blanket-based approach. BMC Bioinform. 2011;12(Suppl 12):S3. ArticleGoogle Scholar
  45. Uppu S, Krishna A, Gopalan RP. A review on methods for detecting SNP interactions in high-dimensional genomic data. IEEE/ACM Trans Comput Biol Bioinform. 2016;15(2):599–612. ArticlePubMedGoogle Scholar
  46. Jiang R, Tang W, Wu X, Fu W. A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinform. 2009;10(1):S65. ArticleGoogle Scholar
  47. De Lobel L, Geurts P, Baele G, Castro-Giner F, Kogevinas M, Van Steen K. A screening methodology based on random forests to improve the detection of gene–gene interactions. Eur J Hum Genet. 2010;18(10):1127–32. ArticlePubMedPubMed CentralGoogle Scholar
  48. Yoshida M, Koike A. SNPInterForest: a new method for detecting epistatic interactions. BMC Bioinform. 2011;12(1):469. ArticleGoogle Scholar
  49. Schwarz DF, König IR, Ziegler A. On safari to random jungle: a fast implementation of random forests for high-dimensional data. Bioinformatics. 2010;26(14):1752–8. ArticlePubMedPubMed CentralGoogle Scholar
  50. Wu Q, Ye Y, Liu Y, Ng MK. SNP selection and classification of genome-wide SNP data using stratified sampling random forests. IEEE Trans Nanobioscience. 2012;11(3):216–27. ArticlePubMedGoogle Scholar
  51. Lin HY, Ann Chen Y, Tsai YY, Qu X, Tseng TS, Park JY. TRM: a powerful two-stage machine learning approach for identifying SNP-SNP interactions. Ann Hum Genet. 2012;76(1):53–62. ArticlePubMedGoogle Scholar
  52. Pan Q, Hu T, Malley JD, Andrew AS, Karagas MR, Moore JH. Supervising random forest using attribute interaction networks. European conference on evolutionary computation, machine learning and data mining in bioinformatics. Berlin: Springer; 2013. p. 104–16. Google Scholar
  53. Chen SH, Sun J, Dimitrov L, Turner AR, Adams TS, Meyers DA, et al. A support vector machine approach for detecting gene-gene interaction. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society. 2008;32(2):152–67. ArticleGoogle Scholar
  54. Özgür A, Vu T, Erkan G, Radev DR. Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics. 2008;24(13):i277–i85. ArticlePubMedPubMed CentralGoogle Scholar
  55. Shen Y, Liu Z, Ott J. Support vector machines with L 1 penalty for detecting gene-gene interactions. Int J Data Min Bioinform. 2012;6(5):463–70. ArticlePubMedGoogle Scholar
  56. Fang YH, Chiu YF. SVM-based generalized multifactor dimensionality reduction approaches for detecting gene-gene interactions in family studies. Genet Epidemiol. 2012;36(2):88–98. ArticlePubMedGoogle Scholar
  57. Marvel S, Motsinger-Reif A. Grammatical evolution support vector machines for predicting human genetic disease association. Proceedings of the 14th annual conference companion on Genetic and evolutionary computation 2012. p. 595–8. Google Scholar
  58. Zhang H, Wang H, Dai Z, Chen M-S, Yuan Z. Improving accuracy for cancer classification with a new algorithm for genes selection. BMC Bioinform. 2012;13(1):298. ArticleGoogle Scholar
  59. Lin Y, Jeon Y. Random forests and adaptive nearest neighbors. J Am Stat Assoc. 2006;101(474):578–90. https://doi.org/10.1198/016214505000001230. ArticleGoogle Scholar
  60. Koo CL, Liew MJ, Mohamad MS, Salleh M, Hakim A. A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology. Biomed Res Int. 2013;2013:432375. ArticlePubMedPubMed CentralGoogle Scholar
  61. Roller E, Ivakhno S, Lee S, Royce T, Tanner S. Canvas: versatile and scalable detection of copy number variants. Bioinformatics. 2016;32(15):2375–7. ArticlePubMedGoogle Scholar
  62. Ivakhno S, Roller E, Colombo C, Tedder P, Cox AJ. Canvas SPW: calling de novo copy number variants in pedigrees. Bioinformatics. 2018;34(3):516–8. ArticlePubMedGoogle Scholar
  63. Wang Z, Hormozdiari F, Yang W-Y, Halperin E, Eskin E. CNVeM: copy number variation detection using uncertainty of read mapping. J Comput Biol. 2013;20(3):224–36. ArticlePubMedPubMed CentralGoogle Scholar
  64. Nguyen HT, Merriman TR, Black MA. The CNVrd2 package: measurement of copy number at complex loci using high-throughput sequencing data. Front Genet. 2014;5:248. ArticlePubMedPubMed CentralGoogle Scholar
  65. Miller CA, Hampton O, Coarfa C, Milosavljevic A. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS One. 2011;6(1):e16327. ArticlePubMedPubMed CentralGoogle Scholar
  66. Aure MR, Vitelli V, Jernström S, Kumar S, Krohn M, Due EU, et al. Integrative clustering reveals a novel split in the luminal a subtype of breast cancer with impact on outcome. Breast Cancer Res. 2017;19(1):44. https://doi.org/10.1186/s13058-017-0812-y. ArticlePubMedPubMed CentralGoogle Scholar
  67. Karim MR, Rahman A, Jares JB, Decker S, Beyan O. A snapshot neural ensemble method for cancer-type prediction based on copy number variations. Neural Comput & Applic. 2019:1–19. Google Scholar
  68. AlShibli A, Mathkour H. A shallow convolutional learning network for classification of cancers based on copy number variations. Sensors. 2019;19(19):4207. ArticlePubMed CentralGoogle Scholar
  69. Fortin J-P, Triche TJ Jr, Hansen KD. Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi. Bioinformatics. 2017;33(4):558–60. ArticlePubMedGoogle Scholar
  70. Robertson KD. DNA methylation and human disease. Nat Rev Genet. 2005;6(8):597–610. ArticlePubMedGoogle Scholar
  71. Jiang Y, Oldridge DA, Diskin SJ, Zhang NR. CODEX: a normalization and copy number variation detection method for whole exome sequencing. Nucleic Acids Res. 2015;43(6):e39-e. ArticleGoogle Scholar
  72. Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007;17(11):1665–74. ArticlePubMedPubMed CentralGoogle Scholar
  73. Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, et al. QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 2007;35(6):2013–25. ArticlePubMedPubMed CentralGoogle Scholar
  74. Zhang Z, Cheng H, Hong X, Di Narzo AF, Franzen O, Peng S, et al. EnsembleCNV: an ensemble machine learning algorithm to identify and genotype copy number variation using SNP array data. Nucleic Acids Res. 2019;47(7):e39-e. ArticleGoogle Scholar
  75. Pounraja VK, Jayakar G, Jensen M, Kelkar N, Girirajan S. A machine-learning approach for accurate detection of copy number variants from exome sequencing. Genome Res. 2019;29(7):1134–43. ArticlePubMedPubMed CentralGoogle Scholar
  76. Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36(10):983–7. ArticlePubMedGoogle Scholar
  77. Hill T, Unckless RL. A deep learning approach for detecting copy number variation in next-generation sequencing data. G3: Genes, Genomes, Genetics. 2019;9(11):3575–82. ArticleGoogle Scholar
  78. Zhang Y, Jin L, Wang B, Hu D, Wang L, Li P, et al. DL-CNV: a deep learning method for identifying copy number variations based on next generation target sequencing. Math Biosci Eng: MBE. 2019;17(1):202–15. ArticlePubMedGoogle Scholar
  79. Jiang Y, Qiu Y, Minn AJ, Zhang NR. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proc Natl Acad Sci. 2016;113(37):E5528–E37. ArticlePubMedPubMed CentralGoogle Scholar
  80. Liu J, Halloran JT, Bilmes JA, Daza RM, Lee C, Mahen EM, et al. Comprehensive statistical inference of the clonal structure of cancer from multiple biopsies. Sci Rep. 2017;7(1):1–13. Google Scholar
  81. Holder LB, Haque MM, Skinner MK. Machine learning for epigenetics and future medical applications. Epigenetics. 2017;12(7):505–14. ArticlePubMedPubMed CentralGoogle Scholar
  82. Ni P, Huang N, Zhang Z, Wang D-P, Liang F, Miao Y, et al. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning. Bioinformatics. 2019;35(22):4586–95. ArticlePubMedGoogle Scholar
  83. Angermueller C, Lee HJ, Reik W, Stegle O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 2017;18(1):67. ArticlePubMedPubMed CentralGoogle Scholar
  84. Zhang W, Spector TD, Deloukas P, Bell JT, Engelhardt BE. Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements. Genome Biol. 2015;16(1):14. ArticlePubMedPubMed CentralGoogle Scholar
  85. Zhang G, Huang KC, Xu Z, Tzeng JY, Conneely KN, Guan W, et al. Across-platform imputation of DNA methylation levels incorporating nonlocal information using penalized functional regression. Genet Epidemiol. 2016;40(4):333–40. https://doi.org/10.1002/gepi.21969. ArticlePubMedPubMed CentralGoogle Scholar
  86. Zeng H, Gifford DK. Predicting the impact of non-coding variants on DNA methylation. Nucleic Acids Res. 2017;45(11):e99-e. ArticleGoogle Scholar
  87. Capper D, Jones DT, Sill M, Hovestadt V, Schrimpf D, Sturm D, et al. DNA methylation-based classification of central nervous system tumours. Nature. 2018;555(7697):469–74. ArticlePubMedPubMed CentralGoogle Scholar
  88. Cai Z, Xu D, Zhang Q, Zhang J, Ngai S-M, Shao J. Classification of lung cancer using ensemble-based feature selection and machine learning methods. Mol BioSyst. 2015;11(3):791–800. ArticlePubMedGoogle Scholar
  89. Wei SH, Balch C, Paik HH, Kim Y-S, Baldwin RL, Liyanarachchi S, et al. Prognostic DNA methylation biomarkers in ovarian cancer. Clin Cancer Res. 2006;12(9):2788–94. ArticlePubMedGoogle Scholar
  90. Aran D, Sabato S, Hellman A. DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol. 2013;14(3):R21. ArticlePubMedPubMed CentralGoogle Scholar
  91. Forcato M, Nicoletti C, Pal K, Livi CM, Ferrari F, Bicciato S. Comparison of computational methods for Hi-C data analysis. Nat Methods. 2017;14(7):679–85. https://doi.org/10.1038/nmeth.4325. ArticlePubMedPubMed CentralGoogle Scholar
  92. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80. ArticlePubMedPubMed CentralGoogle Scholar
  93. Bonev B, Mendelson Cohen N, Szabo Q, Fritsch L, Papadopoulos GL, Lubling Y, et al. Multiscale 3D genome rewiring during mouse neural development. Cell. 2017;171(3):557–72.e24. https://doi.org/10.1016/j.cell.2017.09.043. ArticlePubMedPubMed CentralGoogle Scholar
  94. Jin F, Li Y, Dixon JR, Selvaraj S, Ye Z, Lee AY, et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature. 2013;503(7475):290–4. https://doi.org/10.1038/nature12644. ArticlePubMedPubMed CentralGoogle Scholar
  95. Zhang Y, An L, Xu J, Zhang B, Zheng WJ, Hu M, et al. Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun. 2018;9(1):750. https://doi.org/10.1038/s41467-018-03113-2. ArticlePubMedPubMed CentralGoogle Scholar
  96. Liu T, Wang Z. HiCNN: a very deep convolutional neural network to better enhance the resolution of Hi-C data. Bioinformatics. 2019;35(21):4222–8. https://doi.org/10.1093/bioinformatics/btz251. ArticlePubMedPubMed CentralGoogle Scholar
  97. Liu Q, Lv H, Jiang R. hicGAN infers super resolution Hi-C data with generative adversarial networks. Bioinformatics. 2019;35(14):i99–i107. https://doi.org/10.1093/bioinformatics/btz317. ArticlePubMedPubMed CentralGoogle Scholar
  98. Lajoie BR, Dekker J, Kaplan N. The Hitchhiker’s guide to Hi-C analysis: practical guidelines. Methods. 2015;72:65–75. https://doi.org/10.1016/j.ymeth.2014.10.031. ArticlePubMedGoogle Scholar
  99. Yaffe E, Tanay A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat Genet. 2011;43(11):1059–65. https://doi.org/10.1038/ng.947. ArticlePubMedGoogle Scholar
  100. Hu M, Deng K, Selvaraj S, Qin Z, Ren B, Liu JS. HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics. 2012;28(23):3131–3. https://doi.org/10.1093/bioinformatics/bts570. ArticlePubMedPubMed CentralGoogle Scholar
  101. Imakaev M, Fudenberg G, McCord RP, Naumova N, Goloborodko A, Lajoie BR, et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat Methods. 2012;9(10):999–1003. https://doi.org/10.1038/nmeth.2148. ArticlePubMedPubMed CentralGoogle Scholar
  102. Li Y, Hu M, Shen Y. Gene regulation in the 3D genome. Hum Mol Genet. 2018;27(R2):R228–r33. https://doi.org/10.1093/hmg/ddy164. ArticlePubMedPubMed CentralGoogle Scholar
  103. Yu M, Ren B. The three-dimensional Organization of Mammalian Genomes. Annu Rev Cell Dev Biol. 2017;33:265–89. https://doi.org/10.1146/annurev-cellbio-100616-060531. ArticlePubMedPubMed CentralGoogle Scholar
  104. Crowley C, Yang Y, Qiu Y, Hu B, Won H, Ren B, et al. FIREcaller: an R package for detecting frequently interacting regions from Hi-C data. bioRxiv. 2019; 619288. https://doi.org/10.1101/619288.
  105. Schmitt AD, Hu M, Jung I, Xu Z, Qiu Y, Tan CL, et al. A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep. 2016;17(8):2042–59. https://doi.org/10.1016/j.celrep.2016.10.061. ArticlePubMedPubMed CentralGoogle Scholar
  106. Rao Suhas SP, Huntley Miriam H, Durand Neva C, Stamenova Elena K, Bochkov Ivan D, Robinson James T, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80. https://doi.org/10.1016/j.cell.2014.11.021. ArticlePubMedPubMed CentralGoogle Scholar
  107. Kaul A, Bhattacharyya S, Ay F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Nat Protoc. 2020;15(3):991–1012. https://doi.org/10.1038/s41596-019-0273-0. ArticlePubMedPubMed CentralGoogle Scholar
  108. Ay F, Bailey TL, Noble WS. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 2014; https://doi.org/10.1101/gr.160374.113.
  109. Juric I, Yu M, Abnousi A, Raviram R, Fang R, Zhao Y, et al. MAPS: model-based analysis of long-range chromatin interactions from PLAC-seq and HiChIP experiments. PLoS Comput Biol. 2019;15(4):e1006982. https://doi.org/10.1371/journal.pcbi.1006982. ArticlePubMedPubMed CentralGoogle Scholar
  110. Xu Z, Zhang G, Jin F, Chen M, Furey TS, Sullivan PF, et al. A hidden Markov random field-based Bayesian method for the detection of long-range chromosomal interactions in Hi-C data. Bioinformatics. 2016;32(5):650–6. https://doi.org/10.1093/bioinformatics/btv650. ArticlePubMedGoogle Scholar
  111. Xu Z, Zhang G, Wu C, Li Y, Hu M. FastHiC: a fast and accurate algorithm to detect long-range chromosomal interactions from Hi-C data. Bioinformatics. 2016;32(17):2692–5. https://doi.org/10.1093/bioinformatics/btw240. ArticlePubMedPubMed CentralGoogle Scholar
  112. Ay F, Bailey TL, Noble WS. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 2014;24(6):999–1011. https://doi.org/10.1101/gr.160374.113. ArticlePubMedPubMed CentralGoogle Scholar
  113. Lawrence CE, Reilly AA. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins. 1990;7(1):41–51. https://doi.org/10.1002/prot.340070105. ArticlePubMedGoogle Scholar
  114. Bailey TL, Williams N, Misleh C, Li WW. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006;34(Web Server issue):W369–73. https://doi.org/10.1093/nar/gkl198. ArticlePubMedPubMed CentralGoogle Scholar
  115. Moses AM, Chiang DY, Eisen MB. Phylogenetic motif detection by expectation-maximization on evolutionary mixtures. Pac Symp Biocomput. 2004:324–35. https://doi.org/10.1142/9789812704856_0031.
  116. Prakash A, Blanchette M, Sinha S, Tompa M. Motif discovery in heterogeneous sequence data. Pac Symp Biocomput. 2004:348–59. https://doi.org/10.1142/9789812704856_0033.
  117. Sinha S, Blanchette M, Tompa M. PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinform. 2004;5:170. https://doi.org/10.1186/1471-2105-5-170. ArticleGoogle Scholar
  118. Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015;33(8):831–8. https://doi.org/10.1038/nbt.3300. ArticlePubMedGoogle Scholar
  119. Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27(12):1696–7. ArticlePubMedPubMed CentralGoogle Scholar
  120. Foat BC, Morozov AV, Bussemaker HJ. Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE. Bioinformatics. 2006;22(14):e141–e9. ArticlePubMedGoogle Scholar
  121. Quang D, Xie X. FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods. 2019;166:40–7. https://doi.org/10.1016/j.ymeth.2019.03.020. ArticlePubMedPubMed CentralGoogle Scholar
  122. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015;12(10):931–4. https://doi.org/10.1038/nmeth.3547. ArticlePubMedPubMed CentralGoogle Scholar
  123. Ritchie GR, Dunham I, Zeggini E, Flicek P. Functional annotation of noncoding sequence variants. Nat Methods. 2014;11(3):294–6. https://doi.org/10.1038/nmeth.2832. ArticlePubMedPubMed CentralGoogle Scholar
  124. Wang M, Tai C, Weinan E, Wei L. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. Nucleic Acids Res. 2018;46(11):e69. https://doi.org/10.1093/nar/gky215. ArticlePubMedPubMed CentralGoogle Scholar
  125. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM III, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–1902.e21. ArticlePubMedPubMed CentralGoogle Scholar
  126. Adey AC. Integration of single-cell genomics datasets. Cell. 2019;177(7):1677–9. ArticlePubMedGoogle Scholar
  127. Welch JD, Kozareva V, Ferreira A, Vanderburg C, Martin C, Macosko EZ. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell. 2019;177:1873–1887.e17. ArticlePubMedPubMed CentralGoogle Scholar
  128. Li G, Yang Y, Van Buren E, Li Y. Dropout imputation and batch effect correction for single-cell RNA sequencing data. J Bio-X Res. 2019;2(4):169–77. Google Scholar
  129. Bengio Y. Learning deep architectures for AI. Foundations and trends® in. Mach Learn. 2009;2(1):1–127. Google Scholar
  130. Zhang X, Zhao J, LeCun Y. Character-level convolutional networks for text classification. Adv Neural Inform Proc Syst. 2015:649–57. Google Scholar
  131. Deng Y, Bao F, Dai Q, Wu LF, Altschuler SJ. Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning. Nat Methods. 2019;16(4):311–4.
  132. Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15(12):1053–8.
  133. Van Dijk D, Sharma R, Nainys J, Yim K, Kathail P, Carr AJ, et al. Recovering gene interactions from single-cell data using data diffusion. Cell. 2018;174(3):716–729.e27.
  134. Eraslan G, Simon LM, Mircea M, Mueller NS, Theis FJ. Single-cell RNA-seq denoising using a deep count autoencoder. Nat Commun. 2019;10(1):1–14.
  135. Way GP, Greene CS. Bayesian deep learning for single-cell analysis. Nat Methods. 2018;15(12):1009–10.
  136. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. Adv Neural Inform Process Syst. 2014;3:2672–80.
  137. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15.
  138. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495–502.
  139. Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32(4):381.
  140. Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014;11(7):740.
  141. Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015;16(1):278.
  142. Zheng GX, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8(1):1–12.
  143. Lun AT, McCarthy DJ, Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research. 2016;5:2122.
  144. Chen W-P, Chang S-H, Tang C-Y, Liou M-L, Tsai S-JJ, Lin Y-L. Composition analysis and feature selection of the oral microbiota associated with periodontal disease. Biomed Res Int. 2018.
  145. Nakano Y, Suzuki N, Kuwata F. Predicting oral malodour based on the microbiota in saliva samples using a deep learning approach. BMC Oral Health. 2018;18(1):128.
  146. Hsieh C-H, Chen W-M, Hsieh Y-S, Fan Y-C, Yang PE, Kang S-T, et al. A novel multi-gene detection platform for the analysis of miRNA expression. Sci Rep. 2018;8(1):1–9.
  147. Saxena D, Caufield PW, Li Y, Brown S, Song J, Norman R. Genetic classification of severe early childhood caries by use of subtracted DNA fragments from Streptococcus mutans. J Clin Microbiol. 2008;46(9):2868–73.
  148. Carnielli CM, Macedo CCS, De Rossi T, Granato DC, Rivera C, Domingues RR, et al. Combining discovery and targeted proteomics reveals a prognostic signature in oral cancer. Nat Commun. 2018;9(1):1–17.
  149. Torres PJ, Thompson J, McLean JS, Kelley ST, Edlund A. Discovery of a novel periodontal disease-associated bacterium. Microb Ecol. 2019;77(1):267–76.
  150. Vapnik V. The nature of statistical learning theory. Berlin: Springer Science & Business Media; 2000.
  151. Kramer MA. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 1991;37(2):233–43.
  152. Oh M, Zhang L. DeepMicro: deep representation learning for disease prediction based on microbiome data. Sci Rep. 2020;10(1):1–9.
  153. Reiman D, Metwally A, Dai Y, Sun J. PopPhy-CNN: a phylogenetic tree embedded architecture for convolutional neural networks to predict host phenotype from metagenomic data. IEEE J Biomed Health Inform. 2020;24(10):2993–3001.

Acknowledgments

This work was supported by grants from the National Institutes of Health (NIH), National Institute of Dental and Craniofacial Research, R03-DE028983 to DW and HC, U01-DE025046 to KD and HC, NIH R01 GM105785, R01 HL129132, and R01 HL146500 to YL, and NLM T15-LM012500 to MP.

Author information

Authors and Affiliations

  1. Division of Oral and Craniofacial Health Science, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
     Di Wu
  2. Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
     Di Wu & Hunyong Cho
  3. Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
     Deepti S. Karhade & Kimon Divaris
  4. Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
     Malvika Pillai
  5. Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
     Min-Zhi Jiang
  6. Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
     Le Huang & Yun Li
  7. Department of Statistics and Operations Research, University of North Carolina-Chapel Hill, Chapel Hill, USA
     Gang Li
  8. Research Computing, University of North Carolina-Chapel Hill, Chapel Hill, USA
     Jeff Roach
  9. Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, USA
     Kimon Divaris