Journal of Molecular Biology

Year	Impact factor
2024	1.200
2023	1.500
2022	1.200
2021	1.540
2020	1.374
2019	1.023
2018	0.932
2017	0.977
2016	0.799
2015	0.662
2014	0.740
2013	0.739
2012	0.637
2011	0.658
2010	0.654
2009	0.570
2008	0.849
2007	0.805
2006	0.330
2005	0.435
2004	0.623
2003	0.567
2002	0.641
2001	0.490
2000	0.477
1999	0.762
1998	0.785
1997	0.507
1996	0.518
1995	0.502

Vol. 59 (2025), No. 5, pp. 836-848; DOI: 10.1134/S0026893325700311

N.V. Lebedev¹*, D.A. Filimonov², V.V. Poroikov², A.A. Lagunin¹,²

Prediction of New Nε-Acetylation Sites in the Human Proteome Based on Molecular Multilevel Neighborhoods of Atom Descriptors
¹Pirogov Russian National Research Medical University, Moscow, 117997 Russia
²Orekhovich Institute of Biomedical Chemistry, Moscow, 119121 Russia

*lebedev_nv@rsmu.ru
Received - 2024-06-04; Revised - 2025-03-06; Accepted - 2025-05-07

The Nε-acetylation of lysine residues is one of the most common post-translational protein modifications. The reaction between the ε-amino group of the Lys side chain and an activated acetyl group forms an amide bond, which changes the charge of the protein in the region of the modification. The growing interest in such sites is due to the influence of Nε-acetylation of Lys residues on the regulation of cellular activity, the disruption of which can lead to pathological conditions. Furthermore, prediction of Nε-acetylation sites of Lys residues serves as a tool for planning experimental design in modern proteomics, since having a prediction simplifies the choice of proteolysis strategy, the interpretation of controversial mass spectra, and the selection of proteotypic peptides. Here, we propose a new approach for predicting Nε-acetylation sites of Lys residues in human proteins using machine learning techniques. A distinctive feature of this approach is the use of structural formulas of peptides containing a potential Nε-acetylation site and their description in the form of Multilevel Neighborhoods of Atoms (MNA) descriptors. Such descriptors are generated recursively for each atom of the molecule: a level zero descriptor represents the atom itself, the first level descriptor includes the atom and all atoms one bond away from it, and so on. Classification models for predicting Nε-acetylation sites of Lys residues were built using the previously developed MultiPASS program based on the analysis of more than 23,000 sites from the PhosphoSitePlus database. The best model was obtained with a peptide length of 35 amino acid residues and level 9 MNA descriptors. In fivefold cross-validation, the sensitivity, specificity, and ROC-AUC of the developed model were 0.71, 0.74, and 0.82, respectively.
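The recursive construction of MNA descriptors can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the MultiPASS implementation: the molecule is given as a plain list of atom labels plus a list of bonds, the function name `mna_descriptors` and the string notation are hypothetical, and any atom typing or additional markers used by the actual program are ignored.

```python
def mna_descriptors(atoms, bonds, level):
    """Return one descriptor string per atom at the requested level.

    atoms: list of atom labels, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs describing bonded atoms
    level: non-negative integer; 0 returns the atom labels themselves

    Simplified illustration only (assumed notation, no atom typing).
    """
    # Build an adjacency list from the bond pairs.
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Level 0: each atom is described by its own label.
    current = list(atoms)
    for _ in range(level):
        # Level n: the atom label followed by the sorted level n-1
        # descriptors of the atoms one bond away from it.
        current = [
            atoms[i] + "(" + "".join(sorted(current[j] for j in neighbors[i])) + ")"
            for i in range(len(atoms))
        ]
    return current


# Toy example: a three-atom chain C-C-O (hydrogens omitted).
chain_atoms = ["C", "C", "O"]
chain_bonds = [(0, 1), (1, 2)]
print(mna_descriptors(chain_atoms, chain_bonds, 1))
```

Sorting the neighbor descriptors makes each string independent of the order in which atoms and bonds are listed, so the same local environment always yields the same descriptor.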
The model identified 1136 previously unknown potential sites in 418 proteins of the human reference proteome at a classification threshold defined as the difference between the probabilities of assigning a site to the positive (Pa) and negative (Pi) classes, (Pa - Pi) > 0.7. The obtained data can serve as a basis for further proteomic studies aimed at identifying and functionally annotating Nε-acetylation sites of Lys in human proteins.
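The selection rule (Pa - Pi) > 0.7 can be written as a short filter. The sketch below assumes that predictions are available as (site identifier, Pa, Pi) tuples; the function name `select_sites`, the tuple layout, and the site identifiers are hypothetical and do not reflect the MultiPASS output format.

```python
def select_sites(predictions, threshold=0.7):
    """Keep the sites whose Pa - Pi difference exceeds the threshold.

    predictions: iterable of (site_id, pa, pi) tuples, where pa and pi are
    the estimated probabilities of the positive and negative classes.
    The tuple layout is an assumption made for this illustration.
    """
    return [site_id for site_id, pa, pi in predictions if pa - pi > threshold]


# Made-up example values, not data from the study.
example_predictions = [
    ("EXAMPLE1_K10", 0.95, 0.02),  # difference 0.93: kept
    ("EXAMPLE1_K42", 0.60, 0.30),  # difference 0.30: rejected
    ("EXAMPLE2_K7", 0.90, 0.05),   # difference 0.85: kept
]
print(select_sites(example_predictions))
```

A stricter threshold trades recall for confidence: fewer candidate sites pass, but each retained site has a larger margin between the two class probabilities.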

Keywords: human proteome, post-translational modifications, lysine Nε-acetylation, molecular descriptors, machine learning, computational model