Statistical approaches applicable in managing OMICS data: Urinary proteomics as exemplary case.

  • Additional Information
    • Source:
      Publisher: Wiley Country of Publication: United States NLM ID: 8219702 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1098-2787 (Electronic) Linking ISSN: 02777037 NLM ISO Abbreviation: Mass Spectrom Rev Subsets: MEDLINE
    • Publication Information:
      Original Publication: New York : Wiley, [c1982-
    • Subject Terms:
    • Abstract:
      Using urinary proteomics profiling (UPP) as an exemplary omics technology, this review describes a workflow for the analysis of omics data in large study populations. The proposed workflow includes: (i) planning omics studies and considering sample size; (ii) preparing the data for analysis; (iii) preprocessing the UPP data; (iv) the basic statistical steps required for data curation; (v) the selection of covariables; (vi) relating continuously distributed or categorical outcomes to a series of single markers (e.g., sequenced urinary peptide fragments identifying their parental proteins); (vii) demonstrating the added diagnostic or prognostic value of the UPP markers over and above classical risk factors; and (viii) pathway analysis to identify targets for personalized intervention in disease prevention or treatment. Two additional short sections address multiomics studies and machine learning, respectively. In conclusion, the analysis of adverse health outcomes in relation to omics biomarkers rests on the same statistical principles as the analysis of any other data collected in large population or patient cohorts. However, the large number of biomarkers that must be considered simultaneously requires planning ahead: how the study database will be structured, curated, and imported into statistical software packages, and how the analysis results will be triaged for clinical relevance and presented.
      (© 2023 The Authors. Mass Spectrometry Reviews published by John Wiley & Sons Ltd.)
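Step (vi) of the workflow relates an outcome to many single markers, one test per peptide, so some correction for multiple testing is needed. The abstract does not say which correction the review applies. The sketch below assumes the Benjamini-Hochberg false discovery rate procedure, which is one common choice. It takes per-marker p-values that have already been computed, for example from one regression model per peptide, and returns adjusted p-values. The peptide names in the usage example are hypothetical placeholders.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Assumption: one raw p-value per marker (e.g., per urinary peptide),
    each obtained from a separate marker-outcome test.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Marker indices sorted by ascending raw p-value (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are non-decreasing in rank (step-up procedure).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pvalues[i] * m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


def significant_markers(names: Sequence[str], pvalues: Sequence[float],
                        q: float = 0.05) -> List[str]:
    """Names of markers whose adjusted p-value is at or below the FDR level q."""
    adjusted = benjamini_hochberg(pvalues)
    return [name for name, adj in zip(names, adjusted) if adj <= q]


if __name__ == "__main__":
    # Hypothetical peptide fragments and raw p-values, for illustration only.
    names = ["COL1A1_frag1", "COL1A1_frag2", "UMOD_frag1", "MUC1_frag1", "FGA_frag1"]
    raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
    print(benjamini_hochberg(raw_p))
    print(significant_markers(names, raw_p, q=0.05))
```

In this example, the two collagen fragments pass at q = 0.05. The markers with raw p-values of 0.039 and 0.041 do not, because both adjust to about 0.051. A family-wise alternative such as Holm's step-down procedure would be stricter.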
    • Contributed Indexing:
      Keywords: multidimensional classifiers; proteomics; statistical methods; urinary proteomics
    • Accession Number:
      0 (Biomarkers)
    • Publication Date:
      Date Created: 20230505 Date Completed: 20241008 Latest Revision: 20241008
    • Publication Date:
      20241008
    • Accession Number:
      10.1002/mas.21849
    • Accession Number:
      37143314