During the “Applied Data Science with R” in-house training course, attendees will learn how to apply the R programming language to essential data management, wrangling and processing activities. The course is suitable for data and insights analysts/scientists, data engineers and data product developers who are responsible for data pre-processing, analytics and reporting of findings.
This course will introduce your attendees to all the basic concepts of data processing and analysis in the R environment. More specifically, delegates will learn to:
- understand different data types and the common data structures available in R,
- prepare, transform and manage datasets and their variables,
- import and export data from various file formats (Excel spreadsheets, csv, tab, txt etc.),
- create simple graphical representations of the data (bar plots, histograms, box plots etc.),
- obtain summaries, data aggregations, cross-tabulations, frequency and pivot tables,
- run basic statistical tests (e.g. correlations and t-tests) and interpret their results.
The course will also provide an introduction to modelling with multiple linear regression and will introduce attendees to data visualisation techniques for data reporting and research communication.
The course will cover modern approaches to applied data science using the R language and its rich ecosystem of external packages, including the tidyverse family (e.g. dplyr, ggplot2, tidyr, readr, tibble) and other essential R packages (e.g. data.table, lubridate, Hmisc, readxl, haven).
Basic course information
Minimum recommended duration: 4-5 full days or 8-10 half-days (can be spread across multiple weeks)
Programming languages used: R
Minimum number of attendees: 5
Course level: Beginner/novice; also suitable as a “refresher” for more advanced analysts.
Pre-requisites: No prior knowledge of R is required from delegates attending this course; however, a keen interest in data analysis is assumed. It is recommended that attendees have practical experience in data processing or quantitative research, gained through either professional work or university education/research. A good knowledge of statistics would be beneficial.
IT recommendations: To benefit fully from the course, attendees should have the most recent versions of R and RStudio installed on their personal/company laptops (any operating system). R is free and can be downloaded directly from the www.r-project.org website; RStudio is available at https://www.rstudio.com/products/rstudio/#Desktop. Please contact us if you have any questions about the installation process or would like to use a different setup for your course.
Programme outline
The programme for each in-house training course is discussed and agreed individually with the client. The proposed contents of the course may include (but are not limited to) the following concepts and topics:
R environment: what is R?; Introduction to IDEs e.g. RStudio; Starting the R environment; Basic settings and functions,
Mathematical functions and control flow operators; R-related help and support; Installing and loading third-party packages,
R data structures: creating scalars, vectors, matrices, arrays, lists and other data objects in R; Creating and manipulating simple data frames,
Data import and export: reading/writing data from/to various file formats (Excel spreadsheets, standard file formats e.g. csv, tab, txt etc.),
Essential data processing: adding/deleting observations; sampling; flagging/identifying specific cases based on conditional search; sorting cases; adding/editing value and variable labels; dealing with missing data; reshaping data from long/narrow into wide formats; working with dates and timestamps,
Exploratory data analysis: inspecting the structure of data objects; cross-tabulations, data summaries, aggregations, frequency testing and descriptive statistics (measures of central tendency and dispersion); vertical/horizontal merging of data frames and other R objects,
Introduction to data visualisation: creating informative data visualisations using base R and third-party packages; essential exploratory plots e.g. histograms, density plots, scatterplots, box plots, bar plots, line graphs etc.; Using graphical parameters to add/edit text, titles, lines, fonts, colours, axes, backgrounds and other plot elements; Introduction to the Grammar of Graphics with the ggplot2 syntax,
Tests of differences and correlations; Testing normality assumptions: QQ plots, density plots and test-specific normality tests; One-sample, paired-samples and independent-samples t-tests; Correlations and simple regressions; Test-specific visualisation functions/packages; Effect size and power estimation,
Data modelling: ANOVA and multiple linear regressions – understanding multivariate inferential tests and statistical outputs; Using regressions for predictions on test data,
Creating a simple data product with R: from data cleaning, data management, data wrangling and exploratory data analysis through to analysis, data visualisation, model optimisation and debugging.
Customise the course
We can adapt our in-house training courses to address your specific needs and requirements e.g.:
The course can be designed to include your own data. If this is not possible (e.g. due to data security issues), we can customise the course to contain exercises that address similar problems,
The course period can be spread across multiple weeks/months depending on your needs and availability – this will allow your delegates to revise and practise the skills they have learnt before the next session and give them additional time to internalise all the material presented,
The course can include a custom project spread across several weeks/months with a follow-up session at the end of the period.
As all our in-house training courses are quoted individually, the final cost quotation will be based on several factors: the number of attendees, days of training (plus additional support/project guidance if needed), location of the training, complexity of IT setup and the extent of course customisation.
Arrange this course at your organisation
If you are interested in this in-house training course, please press the Ask For Quote button at the top of the page to enquire about the course and request a quote based on your specific needs and the desired outcomes of the training.
In your enquiry please include the following information:
contact details of the person who should receive the quote,
number of delegates you would like to train,
approximate number of days (or half-days) you would like to arrange the course for (including additional support/project guidance if needed),
location of the training venue,
any details on course customisation or specific topics you would like the course to address – most importantly, please indicate the desired outcomes of the course if they differ from those presented above,
any other questions you may have.