The statistical capabilities of the R programming language include a large selection of built-in methods and third-party libraries offering machine learning algorithms for classification, clustering and predictive analytics. The hands-on “Machine Learning with R” course explores practical applications of the most frequently used machine learning approaches, such as Multiple Linear, Polynomial (Non-Linear) and Logistic Regression, k-Means and Hierarchical Clustering, k-Nearest Neighbours, Naive Bayes and Decision Trees, through the R statistical environment. It also provides a good introduction to more advanced techniques, e.g. Random Forests and simple implementations of Artificial Neural Networks. The course is suitable for data scientists, researchers, data analysts, developers and engineers who currently use the R language (preferably at intermediate level) and would like to expand their skills to include a machine learning and predictive analytics toolkit.
During the “Machine Learning with R” training course, your delegates will be introduced to a variety of machine learning algorithms for classification and clustering, and will apply them in practical scenarios to real-world data using the R language. They will also learn to evaluate predictive models using classification metrics such as sensitivity, specificity, F-score and Kappa, and to optimise the accuracy and efficiency of these models using cross-validation, grid search and performance boosting.
Please note this training course doesn’t include Deep Learning approaches – our “Deep Learning with R” course is specifically designed to cover these methods in detail.
Basic course information
Minimum recommended duration: 4-6 full days or 8-12 half-days (can be spread across multiple weeks)
Programming languages used: R
Minimum number of attendees: 5
Course level: For pre-intermediate/intermediate users of R, excellent as a “refresher” for more experienced, senior analysts.
Pre-requisites: Some practical experience in data analytics using R is recommended for delegates attending this course. A good knowledge of statistics and an interest in machine learning techniques will be beneficial. It is advisable that the course is preceded by our “Applied Data Science with R” course.
IT recommendations: To benefit from the contents of the course, it is recommended that attendees have the most recent versions of R and RStudio installed on their personal/company laptops (any operating system). As R is a free environment, you can download it directly from the www.r-project.org website, and RStudio is available at https://www.rstudio.com/products/rstudio/#Desktop. Please contact us should you have any questions related to the installation process or should you wish to use a different setup for your course.
Programme outline
The programme for each in-house training course is discussed and agreed individually with the client. The proposed contents of the course may include (but are not limited to) the following concepts and topics:
Predict continuous target variables with different regression analysis techniques including multiple linear regressions, stepwise regressions, Lasso/Ridge regularised regressions, non-linear (polynomial) regressions and methods of their evaluation and optimisation:
Implementation of OLS and robust approaches for linear models, understanding the exact (closed-form) OLS solution and the iterative gradient descent algorithm,
Understanding density functions and the OLS normality assumption: screening for outliers, testing for normality (QQ-plots, histograms, Shapiro-Wilk and Kolmogorov-Smirnov tests), continuous data normalisation techniques, testing for multi-collinearity (creating correlation matrices, heatmaps etc.),
Methods of stepwise regression (forward, backward, both directions) and their optimisation based on specific evaluation metrics: AIC, BIC, RMSE, MSE, R squared,
Fitting polynomial regressions across all predictors with different degrees and flexibility, using splines for particular predictors only,
Regularisation approaches for polynomials (Lasso, Ridge, Elastic Net), searching for optimal lambda hyperparameter, overfitting vs underfitting.
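As an illustration of the stepwise and polynomial regression topics above, here is a minimal sketch in base R. The built-in mtcars dataset, the chosen predictors and the helper name rmse are assumptions made for this example only and are not part of the course materials.

```r
# Illustrative data: the built-in mtcars set, with mpg as the continuous target.
data(mtcars)

# Full multiple linear regression fitted by OLS.
full_fit <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)

# Stepwise selection in both directions; step() compares models by AIC by default.
step_fit <- step(full_fit, direction = "both", trace = 0)

# Setting the penalty k to log(n) makes step() use BIC instead of AIC.
bic_fit <- step(full_fit, direction = "both", k = log(nrow(mtcars)), trace = 0)

# Second-degree polynomial for one predictor only (wt), linear term for hp.
poly_fit <- lm(mpg ~ poly(wt, 2) + hp, data = mtcars)

# Hypothetical helper: root mean squared error on the training data.
rmse <- function(fit) sqrt(mean(residuals(fit)^2))

print(c(AIC_full = AIC(full_fit), AIC_step = AIC(step_fit), AIC_bic = AIC(bic_fit)))
print(c(RMSE_step = rmse(step_fit), RMSE_poly = rmse(poly_fit)))
print(summary(poly_fit)$r.squared)
```

The metrics here are computed on the training data, so they only show the mechanics; an honest comparison of models would use held-out data or cross-validation.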
Apply k-means and hierarchical clustering algorithms for feature selection, dimensionality reduction and customer segmentation purposes:
Methods of selecting the number of clusters k, scree plots, standardisation and normalisation techniques for numeric and categorical data, calculating between- and within-cluster sums of squares, calculating centroid locations, using the clustering labels to derive profiling attributes across clusters/segments, optimising clustering solutions, plotting clusters using principal component analysis,
Implementing the hierarchical clustering algorithm using different distance calculations (e.g. Euclidean, Manhattan, Minkowski, correlation-based etc.) and various linkage solutions (single, complete, average); visualising clusters and understanding dendrograms, extracting segments based on the dendrogram cuts, using the extracted clusters for estimating cluster profiles.
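The clustering steps listed above can be sketched in base R as follows. The iris measurements, the choice of three clusters and the seed are assumptions for illustration; a real segmentation exercise would use the client's own data and a justified value of k.

```r
# Illustrative data: the four numeric iris measurements, standardised.
data(iris)
x <- scale(iris[, 1:4])

set.seed(42)

# Total within-cluster sum of squares for k = 1..6 (the input to a scree/elbow plot).
wss <- sapply(1:6, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)
print(round(wss, 1))

# k-means solution with an assumed k = 3.
km <- kmeans(x, centers = 3, nstart = 20)
print(km$centers)                                   # centroid locations
print(c(between = km$betweenss, within = km$tot.withinss))

# Cluster profiles in the original units, derived from the cluster labels.
profiles <- aggregate(iris[, 1:4], by = list(cluster = km$cluster), FUN = mean)
print(profiles)

# Hierarchical clustering: Euclidean distance with complete linkage.
hc <- hclust(dist(x, method = "euclidean"), method = "complete")

# Cut the dendrogram into three segments.
segments <- cutree(hc, k = 3)
print(table(kmeans = km$cluster, hclust = segments))

# Principal component scores, which can be used to plot the clusters in 2D.
pca_scores <- prcomp(x)$x[, 1:2]
```

Calling plot(hc) draws the dendrogram, and plot(pca_scores, col = km$cluster) shows the k-means clusters on the first two principal components.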
Implement selected classification algorithms e.g. logistic regression and Naïve Bayes for binary and multinomial classification tasks:
Understanding probabilities and log-odds, varying the flexibility of polynomials in the logistic regression, setting the cut-off threshold of probabilities for classification purposes, feature engineering in classification tasks.
Carry out evidence-based model selection depending on obtained classification metrics e.g. confusion matrix, sensitivity, specificity, F score, Kappa statistic, logarithmic loss, R-squared, mean absolute error, root mean squared error, Gini score, area under ROC curve etc.,
Automate optimisation of classification models through cross-validation and grid-search methods: creating user-defined grid search functions with varying evaluation metrics and cross-validation complexity, selecting “the best” models.
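To make the evaluation and tuning topics above concrete, here is a minimal sketch of a user-defined grid search over probability cut-offs, scored with 5-fold cross-validation on a logistic regression. The two-class subset of iris, the predictors, the grid of cut-offs and the helper name class_metrics are all assumptions for this example.

```r
# Illustrative binary task: versicolor (0) vs virginica (1) from iris.
data(iris)
df <- droplevels(subset(iris, Species != "setosa"))
df$y <- as.integer(df$Species == "virginica")

# Hypothetical helper: metrics derived from the 2x2 confusion matrix.
class_metrics <- function(actual, prob, cutoff = 0.5) {
  pred <- as.integer(prob >= cutoff)
  tp <- sum(pred == 1 & actual == 1)
  tn <- sum(pred == 0 & actual == 0)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  n <- tp + tn + fp + fn
  acc <- (tp + tn) / n
  # Agreement expected by chance, used by Cohen's Kappa.
  pe <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
  c(accuracy = acc,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    f1 = 2 * tp / (2 * tp + fp + fn),
    kappa = (acc - pe) / (1 - pe))
}

# 5-fold cross-validation: collect out-of-fold predicted probabilities.
set.seed(1)
k <- 5
fold <- sample(rep(1:k, length.out = nrow(df)))
oof <- numeric(nrow(df))
for (i in 1:k) {
  train <- df[fold != i, ]
  held_out <- df[fold == i, ]
  fit <- glm(y ~ Sepal.Length + Sepal.Width, data = train, family = binomial)
  oof[fold == i] <- predict(fit, newdata = held_out, type = "response")
}

# Grid search over cut-off thresholds, keeping every metric for comparison.
cutoffs <- c(0.3, 0.4, 0.5, 0.6, 0.7)
grid <- t(sapply(cutoffs, function(ct) class_metrics(df$y, oof, ct)))
rownames(grid) <- cutoffs
print(round(grid, 3))

# Select "the best" cut-off by F1; any other column could be used instead.
best_cutoff <- cutoffs[which.max(grid[, "f1"])]
print(best_cutoff)
```

Swapping the selection column (for example to kappa or sensitivity) changes which model counts as best, which is why the metric should be chosen to match the business problem.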
Apply more advanced classification and predictive analytics algorithms, e.g. decision trees and their ensembles (random forests and adaptive boosting), in more complex machine learning applications:
Implementing recursive partitioning algorithm with Gini and entropy methods for decision trees – understanding decision rules, tree pruning, applying weights/penalties to classes, evaluation of the tree’s decisions, creating decision tree plots, searching for optimal tree size using a grid search with user-defined weights, sizes of leaf nodes, minimum number of examples in the node, types of splits etc.,
Creating a forest of decision trees (random forests) with varying subsets of examples and features used for splits: introduction to ensembles, improving the model with the adaptive boosting algorithm,
Building a simple and a multi-layered neural network for classification and regression tasks with the backpropagation algorithm and different settings of various hyperparameters, using different activation functions and varying complexity of network topologies.
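The decision tree topics above (Gini versus entropy splits, minimum node sizes, pruning) can be sketched with the rpart package. This assumes rpart is available, as it normally is in a standard R installation where it ships as a recommended package; the iris data, the train/hold-out split and the grid values are illustrative assumptions only.

```r
library(rpart)  # assumed to be installed (a recommended package in standard R builds)

# Illustrative data with a simple random train/hold-out split.
data(iris)
set.seed(7)
idx <- sample(nrow(iris), 100)
train <- iris[idx, ]
held_out <- iris[-idx, ]

# Small grid: splitting criterion x minimum number of examples needed to split a node.
grid <- expand.grid(split = c("gini", "information"),
                    minsplit = c(5, 20),
                    stringsAsFactors = FALSE)

# Fit one classification tree per grid row and record its hold-out accuracy.
grid$accuracy <- sapply(seq_len(nrow(grid)), function(i) {
  fit <- rpart(Species ~ ., data = train, method = "class",
               parms = list(split = grid$split[i]),
               control = rpart.control(minsplit = grid$minsplit[i], cp = 0.01))
  mean(predict(fit, newdata = held_out, type = "class") == held_out$Species)
})
print(grid)

# Pruning: grow a large tree, then cut it back at the complexity parameter
# with the lowest cross-validated error in the cp table.
full_tree <- rpart(Species ~ ., data = train, method = "class",
                   control = rpart.control(cp = 0, minsplit = 5))
best_cp <- full_tree$cptable[which.min(full_tree$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(full_tree, cp = best_cp)
print(pruned_tree)  # prints the decision rules of the pruned tree
```

Random forests, adaptive boosting and neural networks need additional packages and are not shown here; the same grid-search pattern applies to their hyperparameters.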
Customise the course
We can adapt our in-house training courses to address your specific needs and requirements e.g.:
The course can be designed to include your own data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems,
The course period can be spread across multiple weeks/months depending on your needs and availability – this will allow your delegates to revise and practise the learnt skills before the next session and provide them with additional time to internalise all presented material,
The course can include a custom project spread across several weeks/months with a follow-up session at the end of the period.
As all our in-house training courses are quoted individually, the final cost quotation will be based on several factors: the number of attendees, days of training (plus additional support/project guidance if needed), location of the training, complexity of IT setup and the extent of course customisation.
Arrange this course at your organisation
If you are interested in this in-house training course, please press the Ask For Quote button in the top part of the page to request a quote based on your specific needs and the desired outcomes of the training.
In your enquiry please include the following information:
contact details of the person who should receive the quote,
number of delegates you would like to train,
approximate number of days (or half-days) you would like to arrange the course for (including additional support/project guidance if needed),
location of the training venue,
any details on course customisation or specific topics you would like the course to address; most importantly, please indicate the desired outcomes of the course if they differ from those presented above,
any other questions you may have.