University of Oklahoma

School of Chemical, Biological, and Materials Engineering

Recombinant Protein Solubility Prediction

Type (or paste) your protein sequence below and click the "Submit" button to calculate the solubility probability of your protein. The statistical model predicts protein solubility assuming the protein is overexpressed in Escherichia coli. Numbers, spaces, and other non-sequence characters in your input are ignored and do not affect the calculation. For more information on the solubility model used here, see the references below.
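As an illustration of why stray characters do not matter, here is a minimal sketch of how pasted input could be cleaned before scoring. The function name `clean_sequence` and the rule of keeping only the 20 standard one-letter amino acid codes are assumptions made for this example; the page does not describe the filtering the tool actually applies.

```python
# Assumption: only the 20 standard one-letter amino acid codes count as sequence.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Upper-case the input and drop every character that is not a standard residue code."""
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AMINO_ACIDS)


if __name__ == "__main__":
    # Typical pasted input: position numbers, spaces, line breaks, and symbols.
    pasted = "  1 mkv laag 10\n 11 IVG-AX* 20"
    print(clean_sequence(pasted))  # prints MKVLAAGIVGA
```

With a filter of this kind, position numbers and whitespace copied from a sequence database entry disappear before any residue is counted.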

Current Model (2009)

This model was created using logistic regression on 32 candidate parameters. In addition, the protein database used to build the model was expanded to 212 proteins. The model's predictions were 94% accurate when compared with laboratory results. Parameters used for this model include:

Please enter the average pI value and molecular weight in the boxes below. These values can be calculated using the pI/Mw tool developed by the Swiss Institute of Bioinformatics.

Average pI

Molecular Weight (No Commas)

Protein Sequence


A. Diaz, E. Tomba, R. Lennarson, R. Richard, M. Bagajewicz, and R.G. Harrison. 2009. Prediction of Protein Solubility in Escherichia coli Using Logistic Regression. Biotechnol. Bioeng. 105(2):374-383.
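For readers unfamiliar with logistic regression, the sketch below shows the general form such a model takes: a weighted sum of sequence-derived features is passed through the logistic function to give a probability between 0 and 1. The function name `solubility_probability`, the two example features, and all coefficient values are placeholders chosen for illustration; they are not the fitted parameters of the 2009 model, which are given in the paper cited above.

```python
import math
from typing import Sequence


def solubility_probability(
    intercept: float,
    coefficients: Sequence[float],
    features: Sequence[float],
) -> float:
    """Generic logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    if len(coefficients) != len(features):
        raise ValueError("coefficients and features must have the same length")
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Evaluate the logistic function in a form that cannot overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    # Placeholder values only (hypothetical features and weights).
    example_coefficients = [0.5, -0.25]
    example_features = [2.0, 4.0]
    p = solubility_probability(0.0, example_coefficients, example_features)
    print(f"Predicted probability: {p:.2f}")  # weighted sum is 0, so p = 0.50
```

In a fitted model, each coefficient would correspond to one retained parameter, such as the average pI or molecular weight entered above, and its sign would indicate whether that parameter raises or lowers the predicted probability.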

Previous Model (1991)

This model was created using discriminant analysis of 6 candidate parameters. The parameters used include (in order of decreasing correlation):


