|
Vol 59(2025) N 5 p. 836-848; DOI 10.1134/S0026893325700311

N.V. Lebedev1*, D.A. Filimonov2, V.V. Poroikov2, A.A. Lagunin1,2

Prediction of New Nε-Acetylation Sites in the Human Proteome Based on Molecular Multilevel Neighborhoods of Atom Descriptors

1Pirogov Russian National Research Medical University, Moscow, 117997 Russia
2Orekhovich Institute of Biomedical Chemistry, Moscow, 119121 Russia

*lebedev_nv@rsmu.ru

Received - 2024-06-04; Revised - 2025-03-06; Accepted - 2025-05-07

The Nε-acetylation of lysine residues is one of the most common post-translational protein modifications. The reaction between the ε-amino group of the Lys side chain and an activated acetyl group forms an amide bond, which changes the charge of the protein in the region of the modification. The growing interest in such sites is due to the influence of Nε-acetylation of Lys residues on the regulation of cellular activity, the disruption of which can lead to pathological conditions. Furthermore, the prediction of Nε-acetylation sites of Lys residues serves as a tool for planning experimental design in modern proteomics, since an available prediction simplifies the choice of proteolysis strategy, the interpretation of controversial mass spectra, and the selection of proteotypic peptides.

Here, we propose a new approach for predicting the Nε-acetylation sites of Lys residues in human proteins using machine learning techniques. A distinctive feature of this approach is the use of structural formulas of peptides containing a potential Nε-acetylation site and their description in the form of Multilevel Neighborhoods of Atoms (MNA) descriptors. Such descriptors are generated recursively for each atom of the molecule. A level zero descriptor represents the atom itself, the first level descriptor includes the atom and all atoms one bond away from it, and so on.
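The recursive construction described above can be illustrated with a minimal Python sketch. It is not the MultiPASS implementation: the function name `mna_descriptors`, the string notation `A(D1D2...)`, the lexicographic ordering of neighbor descriptors, and the toy heavy-atom graph of an acetamide-like fragment are all assumptions made for illustration, and refinements such as hydrogen atoms or ring/chain atom marks are omitted.

```python
from typing import List, Tuple


def mna_descriptors(atoms: List[str],
                    bonds: List[Tuple[int, int]],
                    level: int) -> List[str]:
    """Return one MNA-style descriptor string per atom at the given level.

    Level 0 is the atom label itself. Each further level wraps the atom
    label around the previous-level descriptors of its bonded neighbors.
    """
    # Build an adjacency list from the (undirected) bond list.
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    current = list(atoms)  # level 0: the atoms themselves
    for _ in range(level):
        # Sorting the neighbor descriptors (an assumption here) makes the
        # result independent of the order in which bonds were listed.
        current = [
            atoms[i] + "(" + "".join(sorted(current[j] for j in neighbors[i])) + ")"
            for i in range(len(atoms))
        ]
    return current


# Hypothetical toy graph: heavy atoms of an acetamide-like fragment, C-C(=O)-N.
toy_atoms = ["C", "C", "O", "N"]
toy_bonds = [(0, 1), (1, 2), (1, 3)]

for lvl in range(3):
    print(lvl, mna_descriptors(toy_atoms, toy_bonds, lvl))
```

Each additional level extends the described neighborhood by one bond, so a level 9 descriptor of an atom in a peptide encodes its environment up to nine bonds away.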
Classification models for predicting Nε-acetylation sites of Lys residues were built using the previously developed MultiPASS program, based on the analysis of more than 23,000 sites from the PhosphoSitePlus database. The best model was obtained with a peptide length of 35 amino acid residues and level 9 MNA descriptors. In fivefold cross-validation, the sensitivity, specificity, and ROC-AUC of the developed model were 0.71, 0.74, and 0.82, respectively. The model identified 1136 previously unknown potential sites in 418 proteins of the human reference proteome at a classification threshold defined as the difference in the probabilities of site assignment to the positive (Pa) and negative (Pi) classes, (Pa - Pi) > 0.7. The obtained data can serve as a basis for further proteomic studies aimed at identifying and functionally annotating Nε-acetylation sites of Lys in human proteins.

Keywords: human proteome, post-translational modifications, lysine Nε-acetylation, molecular descriptors, machine learning, computational model |
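The peptide length and the (Pa - Pi) cutoff reported in the abstract can be made concrete with a short Python sketch. Several details are assumptions not stated in the abstract: that the 35-residue peptide is centered on the candidate Lys, that windows near the termini are padded with a placeholder character, and the helper names `lysine_windows` and `select_sites`. The sequence and the probability values in the example are invented purely for illustration.

```python
from typing import Iterator, List, Tuple


def lysine_windows(sequence: str,
                   length: int = 35,
                   pad: str = "-") -> Iterator[Tuple[int, str]]:
    """Yield (1-based position, window) for every Lys (K) in the sequence.

    The window is centered on the Lys (assumes an odd length); positions
    beyond the protein termini are filled with the pad character.
    """
    half = length // 2
    padded = pad * half + sequence + pad * half
    for pos, residue in enumerate(sequence):
        if residue == "K":
            # Index pos in the sequence sits at pos + half in the padded
            # string, so this slice places the Lys in the middle.
            yield pos + 1, padded[pos:pos + length]


def select_sites(predictions: List[Tuple[int, float, float]],
                 threshold: float = 0.7) -> List[Tuple[int, float, float]]:
    """Keep sites whose (Pa - Pi) difference exceeds the threshold."""
    return [(site, pa, pi) for site, pa, pi in predictions if pa - pi > threshold]


# Hypothetical short sequence with a 5-residue window for readability.
windows = list(lysine_windows("MKTAK", length=5))
print(windows)

# Hypothetical (site, Pa, Pi) triples; only the first passes the cutoff.
selected = select_sites([(2, 0.9, 0.1), (5, 0.8, 0.2)])
print(selected)
```

A strict cutoff such as 0.7 favors confident positive predictions at the cost of missing weaker ones, which fits the stated goal of proposing previously unknown candidate sites for experimental follow-up.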