Help & Resources


Overall workflow

This flowchart forms the basis of the Chembench workflow. It is taken from Tropsha, A. (2010). Best Practices for QSAR Model Development, Validation, and Exploitation. Molecular Informatics, 29(6-7), 476-488. Chembench implements the best practices detailed in that review.

Chembench breaks that process down into three major steps: Dataset Creation, Modeling, and Prediction. Each step may be used independently. For example, a medicinal chemist or toxicologist could use the predictors we have made available in the Prediction section to help determine the activity of a specific compound.

Dataset Creation

Dataset workflow

The dataset workflow handles preprocessing of the compounds, including structure standardization using the JChem standardizer. An external set is specified, and descriptors and visualizations are generated. When dataset creation is finished, the dataset, the selected external set, and the visualizations can be viewed and downloaded. See the Dataset help section for more details.
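
As a rough illustration of what specifying an external set means, the sketch below holds out a random fraction of compounds. It is not Chembench code: the function name split_external, the 20% fraction, and the compound identifiers are hypothetical, and random selection is only one assumed way of choosing the external set.

```python
import random


def split_external(compound_ids, external_fraction=0.2, seed=0):
    """Randomly hold out a fraction of compounds as an external set.

    Returns (modeling_set, external_set). Both the function and its
    parameters are illustrative assumptions, not part of Chembench.
    """
    if not 0 < external_fraction < 1:
        raise ValueError("external_fraction must be between 0 and 1")
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    n_external = max(1, round(len(shuffled) * external_fraction))
    external = sorted(shuffled[:n_external])
    modeling = sorted(shuffled[n_external:])
    return modeling, external


# Hypothetical compound identifiers.
compounds = [f"cmpd_{i}" for i in range(10)]
modeling_set, external_set = split_external(compounds)
print(len(modeling_set), "modeling compounds;", len(external_set), "external compounds")
```

The external compounds are never used to build models, so predictions on them give an estimate of accuracy that the modeling process has not influenced.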


Modeling workflow

The Modeling workflow generates a predictor composed of an ensemble of models. After descriptors have been selected, the dataset's modeling set is split into several training and test sets, and one or more models are created for each train-test split. Once the predictor has been created, it is used to predict the activity of the dataset's external set, so that the predictor's accuracy can be evaluated. In addition, a second predictor is created for validation purposes by y-randomization modeling, in which models are built on the same compounds with their activity values shuffled. These techniques are thoroughly described in the publication cited above. See the Modeling help section for more details on modeling in Chembench.
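
The steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Chembench implementation: it uses a single made-up descriptor, a one-variable least-squares line as the model type, synthetic activities, and hypothetical function names (fit_line, build_predictor, predict, r_squared). It builds one model per random train-test split, averages the models into a consensus prediction, scores that prediction on an external set, and repeats the procedure with shuffled activities as a y-randomization check.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b for a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def build_predictor(xs, ys, n_splits=5, test_fraction=0.2, seed=0):
    """Fit one model per random train-test split and return the ensemble.

    In a fuller workflow the held-out test compounds would be used to
    accept or reject each model; that step is omitted in this sketch.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(1, round(len(idx) * test_fraction))
    models = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train = idx[n_test:]           # the first n_test indices form the test set
        models.append(fit_line([xs[i] for i in train], [ys[i] for i in train]))
    return models


def predict(models, x):
    """Consensus prediction: the mean of the individual model predictions."""
    return mean(a * x + b for a, b in models)


def r_squared(models, xs, ys):
    """Coefficient of determination of the consensus prediction."""
    my = mean(ys)
    ss_res = sum((y - predict(models, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (hypothetical): activity is roughly 2*x + 1 plus noise.
rng = random.Random(1)
xs = list(range(25))
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

# Hold out every fifth compound as the external set.
ext = [i for i in xs if i % 5 == 2]
mod = [i for i in xs if i % 5 != 2]
mod_x, mod_y = [xs[i] for i in mod], [ys[i] for i in mod]
ext_x, ext_y = [xs[i] for i in ext], [ys[i] for i in ext]

# Predictor built on the real activities, scored on the external set.
real_models = build_predictor(mod_x, mod_y)
real_r2 = r_squared(real_models, ext_x, ext_y)

# y-randomization: same descriptors, shuffled activities.
shuffled_y = mod_y[:]
rng.shuffle(shuffled_y)
random_models = build_predictor(mod_x, shuffled_y)
random_r2 = r_squared(random_models, ext_x, ext_y)

print(f"external R^2, real activities:     {real_r2:.3f}")
print(f"external R^2, shuffled activities: {random_r2:.3f}")
```

If the predictor built on shuffled activities scored about as well as the real one, the apparent accuracy of the real predictor could be a chance correlation rather than a genuine structure-activity relationship.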


Prediction workflow

To make a prediction, a user selects one or more predictors and a dataset. See the Prediction help section for more details.