University of Oklahoma
School of Chemical Engineering and Materials Science
Recombinant Protein Solubility Prediction
Type or paste your protein sequence below and click the "Submit" button to calculate the solubility probability of your protein. The statistical model predicts protein solubility assuming the protein is overexpressed in Escherichia coli. Numbers, spaces, and other characters in your sequence do not affect the calculation. For more information on the solubility model used here, see the references below.
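As an illustration of why stray characters are harmless, the input can be filtered before any calculation. The sketch below is an assumption about how such filtering might work, not the code behind this page: the function name `clean_sequence` is hypothetical, and it assumes that only the 20 standard one-letter amino acid codes are kept.

```python
# The 20 standard amino acids in one-letter code (assumed to be the only
# characters that count toward the calculation).
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Uppercase the input and keep only standard amino acid letters.

    Hypothetical helper: digits, whitespace, punctuation, and non-standard
    letters are dropped.
    """
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AMINO_ACIDS)


if __name__ == "__main__":
    # Pasted text with position numbers, spaces, and a line break.
    print(clean_sequence("1 mkvl aage\n11 DDK-r*"))
```

With this approach, a sequence pasted from a database record with position numbers and line breaks gives the same result as the bare sequence.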
Current Model (2009)
This model was created using logistic regression on 32 candidate parameters. The protein database used to build the model was also expanded to 212 proteins. The model's predictions were 94% accurate when compared with laboratory results.
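For readers unfamiliar with logistic regression, the general form of such a model can be sketched as follows. This is a generic illustration only: the function name, the feature names, and any coefficient values passed to it are hypothetical, and the fitted coefficients of the 2009 model are given in the reference below, not here.

```python
import math
from typing import Mapping


def solubility_probability(features: Mapping[str, float],
                           coefficients: Mapping[str, float],
                           intercept: float) -> float:
    """Generic logistic model: p = 1 / (1 + exp(-z)), z = b0 + sum(b_i * x_i).

    `coefficients` and `intercept` are placeholders supplied by the caller;
    they are not the published values.
    """
    z = intercept + sum(coefficients[name] * features[name]
                        for name in coefficients)
    # Evaluate the logistic function in a numerically stable way so that
    # large negative z does not overflow math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

The output is always between 0 and 1, which is why it can be read as a probability of soluble expression.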
Parameters used for this model include:
- Molecular weight
- Amino acid fractions
- Aliphatic index
- Alpha-helix propensity
- Beta-sheet propensity
- Average pI
- Approximate charge average
- Hydrophilicity index
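Two of the sequence-derived parameters above can be sketched directly from the sequence. The example assumes the sequence has already been reduced to uppercase one-letter codes, and it assumes the aliphatic index follows the widely used definition of Ikai (1980); whether the 2009 model uses exactly this definition is not stated on this page. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_fractions(sequence: str) -> Dict[str, float]:
    """Fraction of each standard amino acid in the sequence."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence is empty")
    counts = Counter(sequence)
    return {aa: counts[aa] / n for aa in AMINO_ACIDS}


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index per Ikai (1980), assumed here:

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue.
    """
    f = amino_acid_fractions(sequence)
    return 100.0 * (f["A"] + 2.9 * f["V"] + 3.9 * (f["I"] + f["L"]))
```

Molecular weight and average pI are not computed in this sketch, since the page asks for those values to be entered separately.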
Please enter the average pI value and molecular weight in the boxes below. These values can be calculated using the pI/Mw tool developed by the Swiss Institute of Bioinformatics.
A. Diaz, E. Tomba, R. Lennarson, R. Richard, M. Bagajewicz, and R.G. Harrison. 2009. Prediction of Protein Solubility in Escherichia coli Using Logistic Regression. Biotechnol. Bioeng. 105(2):374-383. PDF file
Previous Model (1991)
This model was created using discriminant analysis of 6 possible parameters. The parameters used include (in order of decreasing correlation):
- Charge average
- Turn-forming residue fraction
- Cysteine fraction
- Proline fraction
- Total number of residues
- R.G. Harrison. 2000. Expression of soluble heterologous proteins via fusion with NusA protein. inNovations. 11:4-7. PDF file
- Davis, G.D., Elisee, C., Newham, D.M. and R.G. Harrison. 1999. New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol. Bioeng. 65(4):382-8. PubMed Abstract
- Wilkinson, D.L. and R.G. Harrison. 1991. Predicting the solubility of recombinant proteins in Escherichia coli. Bio/Technology. 9: 443-448. PubMed Abstract
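The parameters listed for the 1991 model can all be computed from the sequence alone. The sketch below assumes the definitions commonly attributed to Wilkinson and Harrison (1991): charge average as (Arg + Lys - Asp - Glu) divided by the number of residues, and turn-forming residues as Asn, Gly, Pro, and Ser. These definitions are not spelled out on this page, so treat them as assumptions; the function name is hypothetical, and the discriminant function and its coefficients are not reproduced here.

```python
from typing import Dict


def previous_model_features(sequence: str) -> Dict[str, float]:
    """Sequence features named for the 1991 model (assumed definitions).

    Expects an uppercase sequence of one-letter amino acid codes.
    """
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence is empty")

    def count(residues: str) -> int:
        # Number of positions holding any of the given residues.
        return sum(sequence.count(r) for r in residues)

    return {
        # Net of basic (R, K) minus acidic (D, E) residues, per residue.
        "charge_average": (count("RK") - count("DE")) / n,
        # Asn, Gly, Pro, Ser are assumed to be the turn-forming residues.
        "turn_forming_fraction": count("NGPS") / n,
        "cysteine_fraction": count("C") / n,
        "proline_fraction": count("P") / n,
        "total_residues": float(n),
    }
```

For example, `previous_model_features("RKDENGPSCA")` gives a charge average of 0.0 and a turn-forming residue fraction of 0.4.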
Please send questions or comments to email@example.com.