TY - CONF
T1 - Scop3P: the bridge between human phosphosites, protein structure and proteomics data
AU - Ramasamy, Pathmanaban
PY - 2020/8/7
Y1 - 2020/8/7
N2 - Introduction: Phosphorylation is one of the key post-translational modifications (PTMs) of proteins and is studied extensively because of its importance in many essential cellular processes. As a result, the amount of publicly available data about phosphorylation has increased dramatically over time. However, available resources on phosphorylation typically contain only sequence and phosphosite information, generally omitting structural information and phosphorylation status. Yet such structural information is particularly relevant to a crucial task: differentiating between functional and non-functional phosphosites. We therefore developed Scop3P: a database of human phosphosites derived from public proteomics data and annotated with detailed, residue-level structural information based on state-of-the-art prediction tools. Moreover, Scop3P links phosphosites to 3D protein structures when these are available in the PDB. Methods: Information about human phosphosites was obtained from UniProtKB/Swiss-Prot and by re-processing public phosphoproteomics data from PRIDE. Phosphosites were then mapped onto their protein structure, when available in the PDB, using the SIFTS mapping. For each human protein, even when no structure is available, backbone dynamics, disorder propensity and early folding propensity were predicted using DynaMine, DisOmine and EFoldMine, respectively. Secondary structure propensities were obtained from DSSP and solvent accessibility from PDBePISA. Human amino acid variations were retrieved from the Humsavar dataset from Swiss-Prot and were scored using PROVEAN. Evolutionary conservation information for the phosphosites with structures was retrieved from ConSurf-DB. Protein structures are visualized with NGL Viewer. Scop3P uses a relational database as its data repository. Preliminary Data: The information in Scop3P can be accessed through search or browse options. Users can search by Swiss-Prot accession or entry name, protein name, PDB ID or ProteomeXchange ID. Phosphorylation information can also be browsed by selecting a particular type of PTM or a PRIDE project. For each human protein, Scop3P contains two levels of information: sequence and structural information. All experimental information on phosphosites and the predicted secondary structure information are annotated onto the amino acid sequences. In total, our database contains 9775 phosphoproteins (the human proteome contains 20,408 proteins) that together contain 58758 unique phospho instances (P-sites) from Swiss-Prot (40034) and PRIDE (18724). Of these unique P-sites, 5198 were mapped onto 14662 different PDB structures. Residue conservation scores are provided for the phosphosites with structures. Moreover, 76746 human amino acid variations were also annotated onto the protein sequences. Of these variations, 372 are located on phosphosites, and 164 of those were predicted to be deleterious. Each phosphosite is also annotated with information such as the peptide sequence, the number of different PRIDE projects in which it is seen, project details and the phosphorylation status for the given project (number of times observed phosphorylated and unphosphorylated). Every protein position in the database is annotated with residue-level structural parameters such as solvent accessibility, secondary structure propensity, backbone dynamics, disorder propensity and early folding propensity. Protein structures can be visualized and colored by mapped phosphosites, by solvent accessibility at phosphosites or by amino acid variants. Novel Aspect: Scop3P links phosphorylation status information, sequence-level and structure-level information, and data retrieved from re-processing proteomics experiments.
AB - Introduction: Phosphorylation is one of the key post-translational modifications (PTMs) of proteins and is studied extensively because of its importance in many essential cellular processes. As a result, the amount of publicly available data about phosphorylation has increased dramatically over time. However, available resources on phosphorylation typically contain only sequence and phosphosite information, generally omitting structural information and phosphorylation status. Yet such structural information is particularly relevant to a crucial task: differentiating between functional and non-functional phosphosites. We therefore developed Scop3P: a database of human phosphosites derived from public proteomics data and annotated with detailed, residue-level structural information based on state-of-the-art prediction tools. Moreover, Scop3P links phosphosites to 3D protein structures when these are available in the PDB. Methods: Information about human phosphosites was obtained from UniProtKB/Swiss-Prot and by re-processing public phosphoproteomics data from PRIDE. Phosphosites were then mapped onto their protein structure, when available in the PDB, using the SIFTS mapping. For each human protein, even when no structure is available, backbone dynamics, disorder propensity and early folding propensity were predicted using DynaMine, DisOmine and EFoldMine, respectively. Secondary structure propensities were obtained from DSSP and solvent accessibility from PDBePISA. Human amino acid variations were retrieved from the Humsavar dataset from Swiss-Prot and were scored using PROVEAN. Evolutionary conservation information for the phosphosites with structures was retrieved from ConSurf-DB. Protein structures are visualized with NGL Viewer. Scop3P uses a relational database as its data repository. Preliminary Data: The information in Scop3P can be accessed through search or browse options. Users can search by Swiss-Prot accession or entry name, protein name, PDB ID or ProteomeXchange ID. Phosphorylation information can also be browsed by selecting a particular type of PTM or a PRIDE project. For each human protein, Scop3P contains two levels of information: sequence and structural information. All experimental information on phosphosites and the predicted secondary structure information are annotated onto the amino acid sequences. In total, our database contains 9775 phosphoproteins (the human proteome contains 20,408 proteins) that together contain 58758 unique phospho instances (P-sites) from Swiss-Prot (40034) and PRIDE (18724). Of these unique P-sites, 5198 were mapped onto 14662 different PDB structures. Residue conservation scores are provided for the phosphosites with structures. Moreover, 76746 human amino acid variations were also annotated onto the protein sequences. Of these variations, 372 are located on phosphosites, and 164 of those were predicted to be deleterious. Each phosphosite is also annotated with information such as the peptide sequence, the number of different PRIDE projects in which it is seen, project details and the phosphorylation status for the given project (number of times observed phosphorylated and unphosphorylated). Every protein position in the database is annotated with residue-level structural parameters such as solvent accessibility, secondary structure propensity, backbone dynamics, disorder propensity and early folding propensity. Protein structures can be visualized and colored by mapped phosphosites, by solvent accessibility at phosphosites or by amino acid variants. Novel Aspect: Scop3P links phosphorylation status information, sequence-level and structure-level information, and data retrieved from re-processing proteomics experiments.
M3 - Poster
T2 - ASMS Conference on Mass Spectrometry and Allied Topics
Y2 - 2 June 2019 through 6 June 2019
ER -