The goal of the project is to predict the manner in which the test subjects performed a bicep curl, one of five possible ways: A, B, C, D, or E. This is the classe variable in the training set. Thus, this project features a multinomial machine learning task where prediction accuracy will be the most important outcome. The project is completed as a submission for the Johns Hopkins University course, Practical Machine Learning, hosted by Coursera.
library(data.table)
library(pander)
library(lubridate)
library(xlsx)
library(xgboost)
library(statsr)
library(caret)
library(plyr)
library(dplyr)
The training and test data were provided on the Coursera course assignment page.
train = fread("pml-training.csv", na.strings = "", showProgress = FALSE, drop = "V1") %>% tbl_df
test = fread("pml-testing.csv", na.strings = "", showProgress = FALSE, drop = "V1") %>% tbl_df
The data is based on the Weight Lifting Exercises study from the following source:
Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human '13). Stuttgart, Germany: ACM SIGCHI, 2013.
Read more: http://groupware.les.inf.puc-rio.br/har#weight_lifting_exercises
During Stage 1, the training set's original 159 features will be inspected to identify what information they represent, modified to produce the best formats for the machine learning task, and, if necessary, eliminated from consideration. In addition, an XGBoost gradient boosting model will be fit for the sole purpose of identifying important interactions between the features, using additional tools provided by the Xgbfi program. These interaction terms will then be added to the training data as additional features for fitting the model.
The first step is to convert the features to the appropriate types for the information they contain. The following code identifies features that need to be converted to factors and dates, as well as the features that will not be changed and the features that will be removed. These features are saved in separate data frames to be put back together in a later step.
## Convert features to factor type
to_factor = train %>% select(user_name, new_window) %>% tbl_df %>% mutate_if(is.character,as.factor)
## Convert features to date type
to_date = train %>% select(cvtd_timestamp) %>% mutate(cvtd_timestamp = dmy_hm(cvtd_timestamp))
## No change needed
no_chg = train %>% select(num_window:total_accel_belt, gyros_belt_x:total_accel_arm, gyros_arm_x:magnet_arm_z, roll_dumbbell:yaw_dumbbell, total_accel_dumbbell, gyros_dumbbell_x:yaw_forearm, total_accel_forearm, gyros_forearm_x:classe)
## Remove features that are > 95% NA
to_rm = train %>% select(raw_timestamp_part_1, raw_timestamp_part_2, kurtosis_roll_belt:var_yaw_belt, var_accel_arm:var_yaw_arm, kurtosis_roll_arm:amplitude_yaw_arm, kurtosis_roll_dumbbell:amplitude_yaw_dumbbell, var_accel_dumbbell:var_yaw_dumbbell, kurtosis_roll_forearm:amplitude_yaw_forearm, var_accel_forearm:var_yaw_forearm)
## Extract the dependent feature
train_y = select(train, classe)
train_y$classe = factor(train_y$classe)
The features that do not need any changes represent numeric values that are already numeric types, except for the classe variable, which is put into its own data frame called train_y. Many of the original features are removed from the data since they are over 95% NA values and, thus, are not informative.
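Before settling on which columns to drop, the NA proportion of each column can be checked directly; a minimal sketch (this counts only entries read in as NA, so placeholder strings such as "#DIV/0!" in the raw file would need separate handling):
# Proportion of NA values in each column of the raw training data
na_prop = colMeans(is.na(train))
# Columns exceeding the 95% threshold used above
names(na_prop[na_prop > 0.95])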
The factor features require additional processing and are converted to separate numeric indicator columns using one-hot encoding. If the factor has only two levels, as is the case with new_window, the factor levels are simply converted to either a 0 or a 1. This step ensures that the data is compatible with most learning algorithms, as some algorithms cannot handle the factor type.
two_level = select(to_factor, new_window)
two_level$new_window = as.numeric(two_level$new_window) - 1
dummies <- dummyVars(classe ~ ., data = bind_cols(train_y, select(to_factor, -new_window)))
to_factor = predict(dummies, newdata = bind_cols(train_y, to_factor)) %>% tbl_df
to_factor = bind_cols(two_level, to_factor)
The unique time information in cvtd_timestamp is split into new features for the day, month, year, hour, and minute. This will allow any learning algorithm to pick up on the relationship between each unit of time and classe.
to_date = to_date %>% mutate(day = day(cvtd_timestamp), month = month(cvtd_timestamp), year = year(cvtd_timestamp), hour = hour(cvtd_timestamp), minute = minute(cvtd_timestamp)) %>% select(-cvtd_timestamp)
Finally, the remaining numeric variables are scaled to ensure they are compatible with all learning algorithms. Some algorithms use Euclidean distances or weight coefficients that are sensitive to the scale of each feature, so scaling puts all numeric variables on the same scale and eliminates any bias unequal scaling could produce.
no_chg = no_chg %>% select(-classe)
# Scale a column and save its centering and scaling values so the
# identical transformation can be applied to the test set later
scale_save = function(x) {
  scaled = scale(x)
  center = attributes(scaled)$`scaled:center`
  scale = attributes(scaled)$`scaled:scale`
  return(list(center = center, scale = scale))
}
scaled_list = apply(no_chg, 2, scale_save)
no_chg = no_chg %>% scale %>% tbl_df %>% apply(2, as.numeric) %>% tbl_df
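As a quick sanity check (a minimal sketch), each scaled column should now have a mean near 0 and a standard deviation near 1:
# Means should be approximately 0 and standard deviations approximately 1
summary(colMeans(no_chg))
summary(apply(no_chg, 2, sd))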
The independent variables processed in the preceding steps are then bound together into a single data frame.
train_x = bind_cols(to_factor, to_date, no_chg)
Often, interactions between different features carry important information for extracting patterns from the data. Not all learning algorithms account for these interactions, so any important interactions in this dataset will be added to the training set, making them available to every learning algorithm.
In this case, XGBoost, a gradient boosting algorithm that does model interactions, will be used to identify important interactions between the current features.
# Prep data for xgboost
train_x_num = as.matrix(train_x)
train_x_label = as.numeric(train_y$classe) # labels take values 1-5, so class 0 goes unused; hence num_class = 6 below
train_x_matrix = xgb.DMatrix(data = train_x_num, label = train_x_label)
# Fit the model
bst <- xgboost(data = train_x_matrix,
               nround = 100, # default 100
               eta = 0.1, # default 0.3
               max.depth = 6, # default = 6
               gamma = 0, # default 0, if train error >>> test error, bring gamma into action
               min_child_weight = 1, # default = 1
               subsample = 1, # default = 1
               colsample_bytree = 1, # default 1
               objective = "multi:softmax",
               eval_metric = "merror",
               num_class = 6)
All the default settings for XGBoost are used, except eta is set to 0.1, to produce the model bst. The eta parameter affects the learning rate of the model; a smaller value makes the model more robust to overfitting.
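If one wanted to verify this choice of eta rather than take it on faith, xgboost's built-in cross-validation could be compared across candidate values; a minimal sketch, not run for this report:
# 5-fold CV multiclass error for the chosen eta; rerun with other
# eta values to compare learning rates
cv <- xgb.cv(data = train_x_matrix, nrounds = 100, nfold = 5,
             eta = 0.1, objective = "multi:softmax",
             eval_metric = "merror", num_class = 6, verbose = FALSE)
tail(cv$evaluation_log)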
The “Gain” measure is one of the ways to rank features by the amount of information they contribute to the model.
# plot the most important features
xgb.plot.importance(xgb.importance(colnames(train_x_num, do.NULL = TRUE, prefix = "col"), model = bst), top_n = 28)
This plot shows the top 28 features by information gain. num_window and roll_belt are the two features with the highest gain.
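The same information is available in tabular form, which can be easier to read off exact gain values; a minimal sketch:
# Inspect the importance table directly; Gain, Cover, and Frequency
# are the three measures xgb.importance reports
importance = xgb.importance(colnames(train_x_num), model = bst)
head(importance, 10)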
To extract the information from the model concerning interactions, the Xgbfi program is used. It is run in the terminal on macOS and requires an 'xgb.dump' and an 'fmap.txt' file; the code below builds the feature map by hand and exports the model with the xgb.dump function.
# Dump the model to file for Xgbfi
featureList <- names(train_x)
featureVector <- c()
for (i in 1:length(featureList)) {
  featureVector[i] <- paste(i - 1, featureList[i], "q", sep = "\t")
}
write.table(featureVector, fmap_path, row.names = FALSE, quote = FALSE, col.names = FALSE)
xgb.dump(model = bst, fname = dump_path, fmap = fmap_path, with_stats = TRUE)
## [1] TRUE
The procedure to find model interactions using Xgbfi is similar to the one found here: http://projects.rajivshah.com/blog/2016/08/01/xgbfi/
Xgbfi was cloned from its GitHub repository. The 'xgb.dump' and 'fmap.txt' files produced in the step above are placed in the same directory as the XgbFeatureInteractions.exe program. The program is then run and outputs an Excel file with interactions at multiple depths. The following screenshot depicts the terminal command with the Xgbfi settings and the resulting Excel file:
knitr::include_graphics('xgbfi.png')
The information from Xgbfi is imported back into RStudio using read.xlsx from the xlsx package.
depth0 = read.xlsx(xlsx_path, sheetIndex = 1) %>% tbl_df %>% mutate(interact_order = 1)
depth1 = read.xlsx(xlsx_path, sheetIndex = 2) %>% tbl_df %>% mutate(interact_order = 2)
depth2 = read.xlsx(xlsx_path, sheetIndex = 3) %>% tbl_df %>% mutate(interact_order = 3)
depth3 = read.xlsx(xlsx_path, sheetIndex = 4) %>% tbl_df %>% mutate(interact_order = 4)
interact = bind_rows(depth0, depth1, depth2, depth3)
interact$interact_order = factor(interact$interact_order)
pandoc.table(interact %>% group_by(interact_order) %>% summarize(Q1_gain = quantile(Gain, 0.25), med_gain = median(Gain), mean_gain = mean(Gain), Q3_gain = quantile(Gain, 0.75), max_gain = max(Gain)))
interact_order | Q1_gain | med_gain | mean_gain | Q3_gain | max_gain |
---|---|---|---|---|---|
1 | 195.8153 | 568.6647 | 4099.190 | 2819.074 | 59438.22 |
2 | 1297.6438 | 2363.7473 | 5088.008 | 6541.869 | 48432.97 |
3 | 2593.9023 | 4943.0702 | 6671.207 | 8375.872 | 31642.17 |
4 | 3525.6711 | 5246.2755 | 7105.213 | 7709.871 | 42424.81 |
Summary statistics for each interaction order are computed to review the distribution of information gain for the 2-way, 3-way, and 4-way feature interactions as well as for the original features.
Then, only the interactions whose gain falls in the top 25% for the level preceding them are added. This filtering step means only the most important feature interactions are added, as opposed to all of the possible interactions. For instance, only those 2-way interactions with a gain greater than 2819, the third-quartile cutoff for the non-interaction terms, are included. Judging from the 2-way quartiles above, this cutoff lies between the median and the third quartile, so somewhere between the top 25% and top 50% of 2-way interactions are retained.
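Rather than hard-coding the cutoffs, they could be read programmatically from the gain distributions; a minimal sketch that recovers the 2819, 6541, and 8375 values used below:
# Third quartile of gain at each interaction order; each level's
# cutoff is the Q3 of the level preceding it
interact %>% group_by(interact_order) %>% summarize(Q3_gain = quantile(Gain, 0.75))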
# Find only interactions with a gain above a certain threshold
two_way = interact %>% filter(interact_order == 2 & Gain > 2819) %>% select(Interaction) %>% as.matrix
three_way = interact %>% filter(interact_order == 3 & Gain > 6541) %>% select(Interaction) %>% as.matrix
four_way = interact %>% filter(interact_order == 4 & Gain > 8375) %>% select(Interaction) %>% as.matrix
# Function that extracts feature names and replaces the "|" symbol with a ":"
prep_interact_char_vec = function(ord_matrix) {
  ord_matrix = as.character(ord_matrix)
  for (i in 1:length(ord_matrix)) {
    ord_matrix[i] = gsub("|", ":", ord_matrix[i], fixed = TRUE)
  }
  return(ord_matrix)
}
# Run the function for the three levels of interaction
two_way = prep_interact_char_vec(two_way)
three_way = prep_interact_char_vec(three_way)
four_way = prep_interact_char_vec(four_way)
With the names of the important feature interactions extracted, the interactions can now be added to the training set by calculating their values using the model.matrix function.
# Function that takes a character vector of feature names and creates a matrix with each feature interaction's actual values
prep_interacts_tbl = function(ord_char_vec, data) {
  # Build a formula string of the form "~ a:b + c:d + ..."
  mod_mat_formula = ""
  for (i in 1:length(ord_char_vec)) {
    if (i == 1) {
      mod_mat_formula = paste0(mod_mat_formula, ord_char_vec[i])
    }
    else {
      mod_mat_formula = paste0(mod_mat_formula, " + ", ord_char_vec[i])
    }
  }
  mod_mat_formula = paste0("~ ", mod_mat_formula)
  add_matrix = model.matrix(as.formula(mod_mat_formula), data)
  # Drop the intercept and any main-effect columns, keeping only the
  # interaction columns (those whose names contain a ":")
  rm_index = c()
  for (i in 1:length(colnames(add_matrix))) {
    if (!grepl(":", colnames(add_matrix)[i], fixed = TRUE)) {
      rm_index = c(rm_index, i)
    }
  }
  add_matrix = add_matrix[, -rm_index]
  colnames(add_matrix) = gsub(":", "_", colnames(add_matrix))
  return(tbl_df(add_matrix))
}
# Run the function for the three levels of interaction
two_way_tbl = prep_interacts_tbl(two_way, train_x)
three_way_tbl = prep_interacts_tbl(three_way, train_x)
four_way_tbl = prep_interacts_tbl(four_way, train_x)
glimpse(four_way_tbl)
## Observations: 19,622
## Variables: 21
## $ day_minute_num_window <dbl> ...
## $ num_window_roll_belt_user_name.carlitos <dbl> ...
## $ minute_num_window_accel_dumbbell_y <dbl> ...
## $ num_window_magnet_dumbbell_y_magnet_dumbbell_z <dbl> ...
## $ magnet_forearm_z_pitch_forearm_roll_forearm <dbl> ...
## $ magnet_dumbbell_y_roll_forearm_magnet_dumbbell_z <dbl> ...
## $ day_minute_num_window_roll_belt <dbl> ...
## $ day_num_window_roll_belt_gyros_belt_z <dbl> ...
## $ minute_num_window_accel_dumbbell_y_magnet_dumbbell_y <dbl> ...
## $ minute_magnet_forearm_z_pitch_forearm_roll_forearm <dbl> ...
## $ num_window_accel_dumbbell_y_magnet_dumbbell_y_yaw_belt <dbl> ...
## $ magnet_dumbbell_y_pitch_forearm_roll_forearm_magnet_dumbbell_z <dbl> ...
## $ magnet_dumbbell_y_pitch_forearm_magnet_dumbbell_z_yaw_belt <dbl> ...
## $ num_window_magnet_dumbbell_y_roll_forearm_magnet_dumbbell_z <dbl> ...
## $ minute_magnet_dumbbell_y_pitch_forearm_roll_forearm <dbl> ...
## $ pitch_forearm_magnet_dumbbell_z_yaw_belt_yaw_arm <dbl> ...
## $ minute_accel_dumbbell_y_magnet_dumbbell_y_yaw_belt <dbl> ...
## $ minute_magnet_dumbbell_y_roll_forearm_magnet_dumbbell_z <dbl> ...
## $ num_window_accel_dumbbell_y_yaw_belt_pitch_belt <dbl> ...
## $ minute_num_window_roll_forearm_magnet_dumbbell_z <dbl> ...
## $ roll_belt_magnet_dumbbell_y_pitch_belt_accel_forearm_x <dbl> ...
The glimpse above shows an example of the interaction columns.
The interaction features are then added to the training set, and any duplicate columns this produces are removed.
train_x = bind_cols(train_x, two_way_tbl, three_way_tbl, four_way_tbl)
train_x = train_x[, !duplicated(colnames(train_x))]
As a final step before modeling, the training set is checked for any features with near zero variance. Such variables do not provide significant information and can often cause errors in learning algorithms. nearZeroVar from the caret package is used in this step.
# Identify near zero variance predictors
nzv = nearZeroVar(train_x, names = TRUE, saveMetrics = TRUE)
indexes = rownames(nzv[nzv$nzv,])
pandoc.table(data.frame(indexes = indexes, nzv[nzv$nzv,]) %>% tbl_df %>% arrange(percentUnique))
indexes | freqRatio | percentUnique | zeroVar | nzv |
---|---|---|---|---|
year | 0.00000 | 0.00509632 | TRUE | TRUE |
new_window | 47.33005 | 0.01019264 | FALSE | TRUE |
num_window_user_name.carlitos | 471.71429 | 0.66761798 | FALSE | TRUE |
roll_belt_user_name.carlitos | 393.09524 | 3.29222301 | FALSE | TRUE |
While there are four features with near zero variance, the year feature is problematic if retained for modeling. All of its entries are for the same year, so it will be removed. The other features are kept; new_window, for instance, is an important feature according to the XGBoost analysis above, even though only 1% of its entries are unique.
train_x = select(train_x, -year)
Now, the training data is appropriately preprocessed and ready for modeling.
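Before fitting, a quick dimension check confirms the final shape of the training matrix; the values shown are those implied by the model summary printed later (19,622 rows and 157 columns):
# Rows and columns of the fully preprocessed training set
dim(train_x)
## [1] 19622   157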
To model this data, a random forest will be used. Random forests are ensemble classifiers that consist of many decision trees and are among the most accurate learning algorithms to use out of the box. They have many advantages, including the ability to handle thousands of input variables without variable deletion, resistance to overfitting, and the ability to handle data without preprocessing. The weight lifting data has been preprocessed significantly, so the primary reasons a random forest is used here are the algorithm's high accuracy, resistance to overfitting, and ability to handle high dimensionality (train_x has 157 variables after preprocessing).
The caret and ranger packages are combined in this step to fit a random forest model to the training data. The caret package will consider different numbers of randomly selected predictors (mtry) as it fits the model. It makes this selection by estimating the prediction error with a 5x5 repeated K-fold cross-validation for each candidate value of the mtry parameter.
fit_dat = bind_cols(train_x, train_y)
set.seed(123)
model_mtry <- train(
  classe ~ .,
  data = fit_dat,
  tuneLength = 9,
  tuneGrid = data.frame(mtry = c(2:10)), # tuneGrid overrides tuneLength when both are supplied
  method = "ranger",
  trControl = trainControl(method = "repeatedcv", number = 5, repeats = 5, verboseIter = TRUE)
)
Now, the caret model can be printed to find out which mtry value was selected.
model_mtry
## Random Forest
##
## 19622 samples
## 157 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times)
## Summary of sample sizes: 15697, 15698, 15697, 15697, 15699, 15698, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9985017 0.9981048
## 3 0.9989501 0.9986721
## 4 0.9991846 0.9989686
## 5 0.9992661 0.9990717
## 6 0.9993171 0.9991362
## 7 0.9993782 0.9992136
## 8 0.9993782 0.9992136
## 9 0.9994088 0.9992522
## 10 0.9993782 0.9992136
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 9.
It appears an mtry of 9 yields the greatest prediction accuracy.
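caret also provides a plot method for the tuning results, which makes the accuracy-versus-mtry relationship easy to inspect visually; a minimal sketch:
# Plot cross-validated accuracy against each mtry value tried
plot(model_mtry)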
The model is refit with only mtry = 9, but, this time, with a 10x10 K-fold cross-validation to produce a more accurate error estimate.
fit_dat = bind_cols(train_x, train_y)
set.seed(123)
model <- train(
  classe ~ .,
  data = fit_dat,
  tuneLength = 1,
  tuneGrid = data.frame(mtry = c(9)),
  method = "ranger",
  trControl = trainControl(method = "repeatedcv", number = 10, repeats = 10, verboseIter = TRUE)
)
The performance characteristics of the model are checked using a confusion matrix.
mod_pred = data.frame(pred = predict(model, train_x), obs = train_y$classe)
# multiClassSummary(mod_pred, lev = c("A","B","C","D","E"))
confusionMatrix(mod_pred$pred, reference = mod_pred$obs, mode="everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 5580 0 0 0 0
## B 0 3797 0 0 0
## C 0 0 3422 0 0
## D 0 0 0 3216 0
## E 0 0 0 0 3607
##
## Overall Statistics
##
## Accuracy : 1
## 95% CI : (0.9998, 1)
## No Information Rate : 0.2844
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 1.0000 1.0000 1.0000 1.0000 1.0000
## Specificity 1.0000 1.0000 1.0000 1.0000 1.0000
## Pos Pred Value 1.0000 1.0000 1.0000 1.0000 1.0000
## Neg Pred Value 1.0000 1.0000 1.0000 1.0000 1.0000
## Precision 1.0000 1.0000 1.0000 1.0000 1.0000
## Recall 1.0000 1.0000 1.0000 1.0000 1.0000
## F1 1.0000 1.0000 1.0000 1.0000 1.0000
## Prevalence 0.2844 0.1935 0.1744 0.1639 0.1838
## Detection Rate 0.2844 0.1935 0.1744 0.1639 0.1838
## Detection Prevalence 0.2844 0.1935 0.1744 0.1639 0.1838
## Balanced Accuracy 1.0000 1.0000 1.0000 1.0000 1.0000
The model has a 100% accuracy rate on the in-sample data and correctly identifies all of the classes. Because these numbers come from predicting the training data itself, they are an optimistic estimate of out-of-sample performance, and one should always be skeptical of a perfect model, so the cross-validation results are examined next.
Going into the details of the cross-validation run by the caret package, the range of the accuracies across the 100 folds (from the 10x10 K-fold cross-validation) is calculated. In addition, a bootstrap inference for the median is run to estimate the out-of-sample accuracy (error). The median is chosen as the measure of center since the accuracies are heavily left-skewed, piling up near 100%.
kfold_acc = data.frame(acc=model$resample$Accuracy)
range(kfold_acc$acc)
## [1] 0.9984694 1.0000000
inference(acc, data = kfold_acc, statistic = c("median"), type = "ci", method = "simulation", boot_method = "se")
## Single numerical variable
## n = 100, y_med = 0.9995, Q1 = 0.9995, Q3 = 1
## 95% CI: (0.9991 , 0.9999)
The minimum accuracy across the cross-validation folds was 0.9985 and the maximum was 1.0000. The median estimate for the actual accuracy is 0.9995 ± 0.0004. Thus, even when splitting the training data into ten random parts, ten different times, and fitting the model one hundred different ways, the accuracy was extremely high. Using the training data size as an example, one would expect the model to misclassify 19622 - 0.9995 * 19622 = 9.8, or about ten, of the rows in a data set of the same size.
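Applying the same arithmetic at the bounds of the bootstrap confidence interval gives a range for the expected misclassifications; a quick sketch using the numbers above:
# Expected misclassifications in 19,622 rows at the CI bounds and median
acc = c(lower = 0.9991, median = 0.9995, upper = 0.9999)
round((1 - acc) * 19622, 1)
##  lower median  upper
##   17.7    9.8    2.0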
The final step is to use the model to predict the test data set of 20 unclassified measurements.
First, all the changes that were applied to the training set must be applied to the test set. One major difference in the procedure is that the centering and scaling information from the training set must be used to scale the test set, or else the scaled test set will not have the correct values.
# Convert features to proper types
to_factor = test %>% select(user_name, new_window) %>% tbl_df %>% mutate_if(is.character,as.factor)
to_date = test %>% select(cvtd_timestamp) %>% mutate(cvtd_timestamp = dmy_hm(cvtd_timestamp))
no_chg = test %>% select(num_window:total_accel_belt, gyros_belt_x:total_accel_arm, gyros_arm_x:magnet_arm_z, roll_dumbbell:yaw_dumbbell, total_accel_dumbbell, gyros_dumbbell_x:yaw_forearm, total_accel_forearm, gyros_forearm_x:problem_id)
to_rm = test %>% select(raw_timestamp_part_1, raw_timestamp_part_2, kurtosis_roll_belt:var_yaw_belt, var_accel_arm:var_yaw_arm, kurtosis_roll_arm:amplitude_yaw_arm, kurtosis_roll_dumbbell:amplitude_yaw_dumbbell, var_accel_dumbbell:var_yaw_dumbbell, kurtosis_roll_forearm:amplitude_yaw_forearm, var_accel_forearm:var_yaw_forearm)
# One hot
two_level = select(to_factor, new_window)
two_level$new_window = as.numeric(two_level$new_window) - 1
dummies <- dummyVars(problem_id ~ ., data = bind_cols(select(test, problem_id), select(to_factor, -new_window)))
to_factor = predict(dummies, newdata = bind_cols(select(test, problem_id), to_factor)) %>% tbl_df
to_factor = bind_cols(two_level, to_factor)
# Extracting unique time info from `cvtd_timestamp`
to_date = to_date %>% mutate(day = day(cvtd_timestamp), month = month(cvtd_timestamp), year = year(cvtd_timestamp), hour = hour(cvtd_timestamp), minute = minute(cvtd_timestamp)) %>% select(-cvtd_timestamp)
# Scale continuous variables using the training centers and scales
no_chg = no_chg %>% select(-problem_id)
for (i in 1:ncol(no_chg)) {
  center = scaled_list[[i]]$center
  scale = scaled_list[[i]]$scale
  no_chg[,i] = scale(no_chg[,i], center = center, scale = scale)
}
no_chg = no_chg %>% tbl_df %>% apply(2, as.numeric) %>% tbl_df
# Create dataset of independent variables
test_x = bind_cols(to_factor, to_date, no_chg)
# Prep interact terms
two_way_tbl = prep_interacts_tbl(two_way, test_x)
three_way_tbl = prep_interacts_tbl(three_way, test_x)
four_way_tbl = prep_interacts_tbl(four_way, test_x)
# Stitch final data
test_x = bind_cols(test_x, two_way_tbl, three_way_tbl, four_way_tbl)
test_x = test_x[, !duplicated(colnames(test_x))]
# Remove zero variance
test_x = select(test_x, -year)
The predictions for the twenty new measurements are found using the predict function.
test_pred = data.frame(case=1:20, pred=predict(model, test_x))
pandoc.table(head(test_pred))
case | pred |
---|---|
1 | B |
2 | A |
3 | B |
4 | A |
5 | A |
6 | E |
Only the first six predictions are shown here to protect the integrity of the project for others, since this information is public. When the predictions were submitted for the test data on Coursera, the model did achieve a perfect prediction accuracy of 20 out of 20.
Most of this project, as with many machine learning projects, was spent on feature engineering and management. Since a random forest model was used, many of the feature manipulations, such as scaling the data and finding interactions, were not strictly necessary, as random forests handle these issues on their own. However, practicing feature engineering never hurts, since it grants greater flexibility in the choice of algorithm for fitting models. Moreover, if the patterns in the data were not as clear or the classification task were different, the random forest might not have been as successful and other models would have to be used. Either way, the extra preprocessing steps help to prepare and extract information from the basic features supplied at the beginning of the project. While the prediction accuracy for the final model is extremely high, it remains to be seen whether it would be as accurate on all instances of bicep curl measurements using the same sensors. However, the model performs perfectly for the prediction goal of this project.