How To Use The Train Test Split In Python

Import Necessary Libraries

Begin by importing train_test_split from sklearn.model_selection. You'll also need libraries like NumPy or pandas for dataset manipulation.

Prepare Your Dataset

Organize your data into features (X) and labels (y). For supervised learning, X is your input data, and y is the target variable.
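As a minimal sketch of this step, the example below builds a small pandas DataFrame and separates it into X and y. The column names (height_cm, weight_kg, label) and the values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy dataset: two feature columns and one target column.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190, 175],
    "weight_kg": [50, 60, 65, 80, 90, 70],
    "label": [0, 0, 1, 1, 1, 0],
})

X = df.drop(columns=["label"])  # features: every column except the target
y = df["label"]                 # target: the column the model should predict

print(X.shape)  # (6, 2)
print(y.shape)  # (6,)
```

X and y must have the same number of rows, since each row of X is paired with the label at the same position in y.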

Split the Data

Call train_test_split(X, y, test_size=0.2) to divide your dataset. The function returns four objects in this order: X_train, X_test, y_train, y_test. The test_size parameter sets the proportion held out for testing; 0.2 means 20% test and 80% train.
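The call might look like the sketch below, which uses a hypothetical 10-sample NumPy array. The random_state argument is not part of the call described above; it is added here as an assumption so that the same split is produced on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, plus binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# random_state is an illustrative addition that makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(y_train.shape, y_test.shape)  # (8,) (2,)
```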

Stratified Sampling

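For classification tasks, pass stratify=y to preserve the class distribution of y in both the training and testing sets. Stratifying does not balance the classes; it keeps their original proportions. This matters most when classes are imbalanced, since a purely random split can leave a rare class underrepresented in one of the subsets.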
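To illustrate the effect, the sketch below stratifies a hypothetical dataset with an 80/20 class imbalance and counts the classes in each subset. The data and the random_state value are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Count how many samples of each class landed in each subset.
train_counts = np.bincount(y_train).tolist()
test_counts = np.bincount(y_test).tolist()

print(train_counts)  # [64, 16]
print(test_counts)   # [16, 4]
```

Both subsets keep the original 80/20 ratio between the two classes.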

Shuffling the Data

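By default, train_test_split shuffles the data before splitting. Set shuffle=False to preserve the original order, as with time series data, where the test set should come after the training set in time. Note that stratify cannot be combined with shuffle=False.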
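A small sketch of an order-preserving split, assuming a hypothetical series of 10 observations stored oldest first:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered series: 10 observations, oldest first.
X = np.arange(10).reshape(10, 1)
y = np.arange(10)

# With shuffle=False the first 80% of rows become the training set
# and the last 20% become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(y_train.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
print(y_test.tolist())   # [8, 9]
```

The test set here consists of the most recent observations, so the model is never evaluated on data that precedes what it was trained on.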

Training and Testing

After splitting, fit the model on the training set only and evaluate it on the test set. Because the model never sees the test set during training, the test score estimates how well it performs on unseen data.
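One way to put the pieces together is sketched below. The synthetic dataset from make_classification, the choice of LogisticRegression, and the accuracy metric are all assumptions made for illustration; any estimator and metric suited to your task would follow the same pattern.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic classification data: 200 samples, 5 features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit on the training set only.
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.2f}")
```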