Abstract
A “minimally complex problem set” for ab initio protein structure prediction has been proposed. It consists of non-redundant, crystallographically determined, high-resolution protein structures without disulphide bonds, modified residues, unusual connectivities or heteromolecules. More importantly, it is a collection of protein structures that have a high probability of being the same in solution as in the crystal form. To our knowledge, this is the first attempt at a dataset of this kind. Given the lattice constraints in crystals and the possible flexibility in solution of crystallographically determined protein structures, we consider this dataset the safest starting point for an ab initio protein structure prediction study.