Description

Datasets generated to assess protein ambiguity from different sources of structural data.

Abstract

These datasets contain annotations of protein order, disorder and ambiguity drawn from different sources of information. The first combines DisProt and CoDNaS: disorder and ambiguity labels were extracted from DisProt annotations of disordered and folding-upon-binding regions, and order labels were taken from CoDNaS. The second was manually assembled from the MFIB (Mutual Folding Induced by Binding) database. The third is a manually curated set of proteins with fold-switching regions, which are also labelled as ambiguous.
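Merging region-level annotations from several sources into a single per-residue labelling can be sketched as below. This is a minimal illustration, not the procedure used to build these datasets. It assumes 1-based inclusive region coordinates, the label names "order", "disorder" and "ambiguous", and a hypothetical precedence rule for overlapping regions (ambiguous over disorder over order). None of these details is specified by this record.

```python
from typing import Iterable, Optional

# HYPOTHETICAL precedence used when regions overlap: the higher value wins.
# This ordering is an illustrative assumption, not taken from the dataset.
PRECEDENCE = {"order": 0, "disorder": 1, "ambiguous": 2}


def per_residue_labels(
    length: int, regions: Iterable[tuple[int, int, str]]
) -> list[Optional[str]]:
    """Expand (start, end, label) regions into one label per residue.

    Regions are assumed to use 1-based, inclusive coordinates. Residues not
    covered by any region are left as None (unannotated).
    """
    labels: list[Optional[str]] = [None] * length
    for start, end, label in regions:
        if label not in PRECEDENCE:
            raise ValueError(f"unknown label: {label!r}")
        if not 1 <= start <= end <= length:
            raise ValueError(f"region {start}-{end} outside 1..{length}")
        for i in range(start - 1, end):
            current = labels[i]
            # Overwrite only when the new label has higher precedence.
            if current is None or PRECEDENCE[label] > PRECEDENCE[current]:
                labels[i] = label
    return labels


# Toy example: an ordered segment from one source, plus disordered and
# folding-upon-binding (ambiguous) regions from another.
regions = [
    (1, 10, "order"),
    (8, 12, "disorder"),
    (11, 12, "ambiguous"),
]
labels = per_residue_labels(14, regions)
print(labels)
```

Residues 8 to 10 end up labelled "disorder" because the assumed precedence lets disorder override order where the two sources overlap; a different conflict rule would change that result.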

Size

100 kb
Date made available: 3 Aug 2022
Publisher: Frontiers in Molecular Biosciences
Date of data production: 1 Jan 2022 - 1 Jun 2022

Keywords

  • Protein structures
  • protein ambiguity
  • protein disorder
  • AlphaFold
  • Machine learning

Format

  • FASTA
  • CSV
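Reading the two file formats with the Python standard library might look like the sketch below. The FASTA parser follows the usual convention of ">" header lines followed by sequence lines. The CSV column names ("id", "start", "end", "label") are HYPOTHETICAL, because this record does not describe the schema. Check the header row of the actual files before relying on them.

```python
import csv
import io


def read_fasta(handle) -> dict[str, str]:
    """Parse FASTA records into {identifier: sequence}.

    The identifier is the first whitespace-separated token of each header
    line; sequence lines under a header are concatenated.
    """
    records: dict[str, list[str]] = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without identifier")
            current = fields[0]
            records[current] = []
        elif current is None:
            raise ValueError("sequence data before first FASTA header")
        else:
            records[current].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def read_regions(handle) -> list[dict]:
    """Read region annotations from CSV.

    HYPOTHETICAL schema: columns 'id', 'start', 'end', 'label'.
    """
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "id": row["id"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "label": row["label"],
        })
    return rows


# Self-contained demo using in-memory text instead of the real files.
fasta_text = ">P1 example protein\nMKTAYIAKQR\nQISFVKSHFS\n"
csv_text = "id,start,end,label\nP1,1,10,order\nP1,11,20,ambiguous\n"

sequences = read_fasta(io.StringIO(fasta_text))
regions = read_regions(io.StringIO(csv_text))
print(sequences["P1"], len(sequences["P1"]))
print(regions)
```

For the real files, pass handles opened with `open(path, newline="")`. Using `newline=""` is what the `csv` module documentation recommends for CSV input.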
