Using Machine Learning to Generate Hypotheses 1225

[Flowchart. Steps in order:]
1. Data set: identify the outcome and predictor variables, and the continuous and categorical variables.
2. Split the data into two parts: 90% training data and 10% unseen data. Do not touch the unseen data until the model is finalized.
3. Use the training data to build the model:
   a. Impute missing values in the training data.
   b. Conduct exploratory analyses to build a model with reasonable accuracy.
   c. Make informed guesses about the viable range of each model parameter.
   d. Conduct a hyperparameter search to identify a model with high accuracy.
4. Impute missing values in the unseen data, using the imputation parameters from the training data.
5. Assess the accuracy of the machine-learning model's predictions in the unseen data.
6. Decision: does the model have reasonably high accuracy in the unseen data? (No / Yes.)
7. Yes: output; conduct additional analyses to explain the model using the training data.

Fig. 1. Flowchart showing the machine-learning procedure used in Study 1.
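The hold-out logic of Fig. 1 can be illustrated with a short sketch. It is not the procedure's actual implementation: the source does not name the model, the imputation method, or the search strategy, so the sketch assumes mean imputation, a k-nearest-neighbors classifier, and a small cross-validated grid over k, all run on synthetic data. Every function name here (`split_data`, `fit_imputer`, `search_k`, `run_pipeline`, and so on) is hypothetical.

```python
import random
import statistics

def make_toy_data(n=300, seed=1):
    """Synthetic rows of (features, label) with ~10% missing values (assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 * y, 1.0), rng.gauss(-2.0 * y, 1.0)]
        if rng.random() < 0.1:
            x[rng.randrange(2)] = None  # mark one feature as missing
        rows.append((x, y))
    return rows

def split_data(rows, unseen_fraction=0.10, seed=0):
    """Shuffle and split into (training, unseen); unseen is set aside."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_unseen = max(1, round(len(shuffled) * unseen_fraction))
    return shuffled[n_unseen:], shuffled[:n_unseen]

def fit_imputer(rows):
    """Learn imputation parameters (column means) from the given rows only."""
    n_features = len(rows[0][0])
    return [statistics.fmean(x[j] for x, _ in rows if x[j] is not None)
            for j in range(n_features)]

def impute(rows, means):
    """Fill missing values with previously learned column means."""
    return [([m if v is None else v for v, m in zip(x, means)], y)
            for x, y in rows]

def knn_predict(train, x, k):
    """Majority vote among the k nearest training rows (stand-in model)."""
    nearest = sorted(train,
                     key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    labels = [y for _, y in nearest]
    return max(sorted(set(labels)), key=labels.count)

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def search_k(train, candidate_ks, n_folds=5):
    """Hyperparameter search by cross-validation, using training data only."""
    best_k, best_score = None, -1.0
    for k in candidate_ks:
        scores = []
        for f in range(n_folds):
            val = train[f::n_folds]
            fit = [r for i, r in enumerate(train) if i % n_folds != f]
            scores.append(accuracy(fit, val, k))
        mean_score = statistics.fmean(scores)
        if mean_score > best_score:
            best_k, best_score = k, mean_score
    return best_k, best_score

def run_pipeline():
    rows = make_toy_data()
    train, unseen = split_data(rows)      # 90% training, 10% unseen
    means = fit_imputer(train)            # parameters come from training data
    train = impute(train, means)
    unseen = impute(unseen, means)        # reuse the training parameters
    best_k, cv_acc = search_k(train, candidate_ks=[1, 3, 5, 7, 9])
    unseen_acc = accuracy(train, unseen, best_k)  # single final check
    return {"n_train": len(train), "n_unseen": len(unseen), "best_k": best_k,
            "cv_accuracy": cv_acc, "unseen_accuracy": unseen_acc}

if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch the unseen rows enter only two calls: `impute`, with parameters learned from the training rows, and the final `accuracy` call. All model selection happens inside `search_k` on the training rows, which mirrors the instruction in Fig. 1 not to touch the unseen data until the model is finalized.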