Abstract
Protein domain interactions with short linear peptides, such as Src homology 2 (SH2) domain interactions with phosphotyrosine-containing peptide motifs (pTyr), are ubiquitous and important to many biochemical processes of the cell. The desire to map and quantify these interactions has resulted in the development of high-throughput (HTP) quantitative measurement techniques, such as microarray or fluorescence polarization assays. For example, in the last 15 years, experiments have progressed from measuring single interactions to covering 500,000 of the 5.5 million possible SH2-pTyr interactions in the human proteome. However, high variability in affinity measurements and disagreements about positive interactions between published datasets led us to re-evaluate the analysis methods and raw data of published SH2-pTyr HTP experiments. We identified several opportunities for improving both the identification of positive and negative interactions and the accuracy of affinity measurements. We implemented model fitting techniques that are more statistically appropriate for the non-linear SH2-pTyr interaction data. We developed a novel method to account for protein concentration errors due to impurities and degradation, and to address protein inactivity and aggregation. Our revised analysis increases the accuracy of reported affinities, reduces the false negative rate, and increases the amount of useful data by adding reliable true negative results. We demonstrate improved classification of binding versus non-binding interactions when using machine learning techniques, suggesting improved coherence in the reanalyzed datasets. We present revised SH2-pTyr affinity results and propose a new analysis pipeline for future HTP measurements of domain-peptide interactions.
Publisher
Cold Spring Harbor Laboratory