We encourage you to discuss your specific in-house training needs with us directly. We are able to deliver our data analysis, Big Data processing, machine learning or data visualisation training services to businesses and organisations of any size. We can provide our training courses to clients based anywhere in Europe, Asia or North and South America.

We offer flexible training solutions depending on your requirements, and we are happy to discuss your specific areas of interest and skill gaps in order to provide bespoke data science training. Please take a look at some of the in-house training courses we currently run, listed below:

 

Applied Data Science – in R or Python – 3-5 days


Brief Description

During the “Applied Data Science” in-house training course, the attendees will learn how to use R or Python to carry out all essential data management, cleaning and processing activities. The course is suitable for data and insights analysts/scientists, data engineers and data product developers who are responsible for data pre-processing, analytics and the reporting of findings.

 

The course takes the attendees from importing data in specific file formats, e.g. Excel spreadsheets (xls, xlsx) and standard data formats (csv, tab, txt), through modern data manipulation techniques in R or Python, data summaries and Exploratory Data Analysis (e.g. data aggregations, cross-tabulations, frequency tables, pivot tables etc.) and the calculation of descriptive statistics, to hypothesis testing and statistical inference using simple statistical methods, e.g. correlations, t-tests and multiple regressions.

 

The course also includes an introduction to basic data visualisation techniques which can be used as simple methods of Exploratory Data Analysis or for presentation of results.


Programme Outline

The course contents may include (but are not limited to) the following concepts and topics:

    • Introduction to R or Python language and its integrated development environments; Basic settings and functions,
    • Mathematical functions and control flow operators; Help and support; Using third-party, external packages in R or Python,
    • Understanding R/Python data structures and their limitations,
    • Data input and export; Importing data from standard (e.g. CSV, tab-delimited, text etc.) and proprietary file formats e.g. Excel spreadsheets, SPSS/Stata files etc.,
    • Basic data manipulation techniques: adding/deleting rows/columns; Sampling; Flagging/identifying specific cases based on conditional search; Sorting cases; Adding/editing value and variable labels; Dealing with missing data; Reshaping data from long/narrow into wide formats,
    • Exploratory Data Analysis: inspecting the structure of data sets; Cross-tabulations, ‘pivot tables’, data aggregations and descriptive statistics (measures of central tendency and dispersion); Vertical/horizontal merging of data objects,
    • Introduction to data visualisations in R/Python: simple EDA plots e.g. histograms, density plots, scatterplots, box plots, bar plots, line graphs etc.,
    • Tests of differences and correlations; Testing for normality assumptions: QQ, density plots and test-specific normality measurements; One-sample, matched-sample and independent t tests; Correlations and simple regressions; Test-specific visualisation methods; Effect size and power estimation,
    • Introduction to data modelling: ANOVA and multiple regressions.
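
As an illustration of the kind of workflow outlined above (data import, handling of missing data, aggregation, descriptive statistics and a simple correlation), here is a minimal Python sketch. It assumes a small, made-up CSV sample and uses only the Python standard library; the data, column names and helper functions are hypothetical, and course exercises may rely on third-party packages instead.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample data standing in for an imported CSV file.
RAW_CSV = """region,units,price
North,10,2.5
North,14,3.0
South,8,2.0
South,12,2.8
South,,3.1
"""


def load_rows(text):
    """Parse CSV text and drop rows with a missing 'units' value."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["units"] == "":
            continue  # simple listwise deletion of incomplete cases
        rows.append({
            "region": row["region"],
            "units": int(row["units"]),
            "price": float(row["price"]),
        })
    return rows


def mean_by_group(rows, key, value):
    """Aggregate: mean of `value` for each level of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {level: statistics.mean(vals) for level, vals in groups.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from first principles."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rows = load_rows(RAW_CSV)
units = [r["units"] for r in rows]
prices = [r["price"] for r in rows]

print("cases kept:", len(rows))
print("mean units by region:", mean_by_group(rows, "region", "units"))
print("median units:", statistics.median(units))
print("sample SD of units:", round(statistics.stdev(units), 3))
print("r(units, price):", round(pearson_r(units, prices), 3))
```

Dropping incomplete rows is only one of several ways of dealing with missing data; it is used here purely to keep the sketch short.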


Other Details

    • Minimum recommended duration: 3-5 full days (can be spread across multiple weeks)
    • Programming languages used: R or Python
    • Maximum number of attendees: 12
    • Course level: For beginners/novices, also good as a “refresher” for more advanced analysts.
    • Pre-requisites: It is recommended that the attendees have practical experience in data processing or quantitative research – gathered from either professional work or university education/research. A good knowledge of statistics would be beneficial.

We can adapt our in-house training courses to address your specific needs and requirements e.g.:

    • The course can be designed to include client’s data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems.
    • The course can include a project spread across several weeks with a follow-up session at the end of the period.
    • The final cost quotation will be based on the number of attendees, days of training (plus additional support/project guidance if needed), location of the training and the extent of course customisation.
    • If you are interested in this in-house training course please Contact Us to discuss your specific needs and desirable outcomes of the training.


Machine Learning – with R or Python – 3-5 days


Brief Description

The “Machine Learning” course focuses on practical applications of common predictive analytics and knowledge extraction techniques in R or Python. These will include clustering, classification and regression algorithms such as k-Means, k-Nearest Neighbours, Naive Bayes, Multiple Linear Regression, and Logistic and Poisson Regression. The training course will also provide a good introduction to more advanced machine learning methods such as Decision Trees, Support Vector Machines, Random Forests and Artificial Neural Networks.

This training course is designed for users with some experience in R or Python languages and clients who would like to extend their data science activities to include data modelling and predictive analytics.


Programme Outline

The course contents may include (but are not limited to) the following concepts and topics:

    • Extensive coverage of frequently used and robust clustering, classification and regression methods available in R or Python e.g. k-Means, k-Nearest Neighbours, Naive Bayes, Linear and Logistic Regressions etc.,
    • Introduction to more advanced machine learning algorithms such as Decision Trees, Support Vector Machines, Random Forests and Artificial Neural Networks,
    • Implementation of R/Python functions for Machine Learning in practical scenarios – business/finance or socio-economic datasets will be used whenever possible,
    • Estimating model performance metrics and understanding evaluation and model improvement techniques,
    • Introduction to applications of Machine Learning methods to Big Data – implementation of selected Machine Learning algorithms using Cloud Computing, Spark with R/Python and H2O platforms.
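
To give a flavour of the classification and model evaluation topics listed above, the following minimal Python sketch implements a k-Nearest Neighbours classifier by hand and estimates its accuracy on a held-out set. The toy data, labels and function names are hypothetical, and the sketch uses only the standard library; it is an illustration of the technique, not the course material itself.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], point),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: two well-separated groups of points.
train_X = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
train_y = ["low", "low", "low", "high", "high", "high"]

# Held-out cases used only for evaluation.
test_X = [(1.1, 1.0), (5.0, 5.0), (0.9, 1.2), (4.8, 5.1)]
test_y = ["low", "high", "low", "high"]

predictions = [knn_predict(train_X, train_y, p, k=3) for p in test_X]
print("predictions:", predictions)
print("accuracy:", accuracy(test_y, predictions))
```

Keeping the evaluation cases separate from the training cases is the simplest form of the model evaluation step; on real data, the value of k would normally be tuned rather than fixed at 3.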


Other Details

    • Minimum recommended duration: 3-5 full days (can be spread across multiple weeks)
    • Programming languages used: R or Python
    • Maximum number of attendees: 12
    • Course level: For pre-intermediate/intermediate users of R or Python, excellent as a “refresher” for experienced, senior analysts.
    • Pre-requisites: Some practical experience in data analytics using R or Python is recommended. A good knowledge of statistics and interest in machine learning techniques will be beneficial.

We can adapt our in-house training courses to address your specific needs and requirements e.g.:

    • The course can be designed to include client’s data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems.
    • The course can include a project spread across several weeks with a follow-up session at the end of the period.
    • The final cost quotation will be based on the number of attendees, days of training (plus additional support/project guidance if needed), location of the training and the extent of course customisation.
    • If you are interested in this in-house training course please Contact Us to discuss your specific needs and desirable outcomes of the training.


Introduction to Hadoop – with elements of Java language – 3-5 days


Brief Description

During the “Introduction to Hadoop” training course, the attendees will become familiar with the major characteristics and functionalities of the Apache Hadoop platform and its ecosystem of tools for Big Data processing and analysis. The course provides first-hand practical experience of the Hadoop Distributed File System (HDFS) and the MapReduce framework. The attendees will learn to design and run simple MapReduce programs to process data and calculate a set of statistics. The course can also serve as a gentle introduction to the basics of the Java programming language and essential Unix-like Hadoop File System shell commands.

This training course is designed for clients who consider migrating their Big Data workflows to the Hadoop ecosystem or wish to upskill their analytics team with essential Big Data processing knowledge.


Programme Outline

The course contents may include (but are not limited to) the following concepts and topics:

    • Understanding of the features, major characteristics, architecture and operations of Hadoop and its ecosystem, including Yet Another Resource Negotiator (YARN), Hadoop Distributed File System, MapReduce programming framework and other Hadoop-related tools e.g. HBase, Hive, Cassandra, Mahout and Pig,
    • Monitoring and diagnostics of the performance of Apache Hadoop clusters and their resources using Apache Ambari,
    • Management of large datasets in Hadoop Distributed File System (HDFS) using Hadoop File System shell commands,
    • How to design and execute simple MapReduce parallel programs (written in Java) for computing various statistics and to control their performance in real-time,
    • Practical applications of learnt skills to deploy and provision Hadoop-based Big Data applications.
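
The MapReduce pattern mentioned above can be sketched in a few lines. The course itself uses Java on a Hadoop cluster; the example below is only a single-process Python illustration of the map, shuffle and reduce steps, with made-up input records and hypothetical function names, and it does not call any Hadoop API.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: turn one input line into (key, value) pairs."""
    station, reading = record.split(",")
    yield station, float(reading)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: compute summary statistics for one key."""
    return key, {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


# Hypothetical input: "station,reading" lines, as they might appear in a file.
records = ["A,10.0", "B,4.0", "A,14.0", "B,6.0", "A,12.0"]

mapped = [pair for record in records for pair in map_phase(record)]
grouped = shuffle(mapped)
results = dict(reduce_phase(key, values) for key, values in grouped.items())

for station, stats in sorted(results.items()):
    print(station, stats)
```

On a real cluster, the mappers and reducers run in parallel on different nodes and the framework performs the shuffle; here all three steps run sequentially to show the data flow only.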


Other Details

    • Minimum recommended duration: 3-5 full days (can be spread across multiple weeks)
    • Programming languages used: Java (also HDFS shell commands and basics of SQL for Hive querying)
    • Maximum number of attendees: 12
    • Course level: Intermediate
    • Pre-requisites: Good IT skills and practical experience in manipulating large datasets are recommended. Some knowledge of Java and Unix commands will be beneficial; however, these will be explained during the training.

We can adapt our in-house training courses to address your specific needs and requirements e.g.:

    • The course can be designed to include client’s data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems.
    • The course can include a project spread across several weeks with a follow-up session at the end of the period.
    • The final cost quotation will be based on the number of attendees, days of training (plus additional support/project guidance if needed), location of the training and the extent of course customisation.
    • If you are interested in this in-house training course please Contact Us to discuss your specific needs and desirable outcomes of the training.


Introduction to Spark – with elements of Scala language – 3-5 days


Brief Description

The “Introduction to Spark” training course focuses on applications of the Apache Spark engine to fast Big Data processing. During the course, the attendees will be introduced to the Scala language for large-scale data management and to the MLlib and GraphX Spark libraries for advanced data analytics: Machine Learning and Graph Analytics. Other essential Spark-related projects and third-party Spark packages will also be presented.

The course is an excellent choice on its own, but it can also be run as an add-on to the “Introduction to Hadoop” training course or as one of the pre-requisite courses before the “Big Data Methods in R” course.


Programme Outline

During the course the attendees will learn the following:

    • To understand characteristics and features of the Apache Spark processing engine and supported data structures,
    • To design and deploy Spark applications for Big Data analytics,
    • To apply Scala language for data management, processing and analysis within the Apache Spark engine,
    • To combine SQL querying, streaming and advanced analytics using built-in Spark libraries: Spark SQL, Spark Streaming, MLlib and GraphX,
    • To extend the built-in functionalities of Spark and its native libraries by making the most of third-party packages and related projects for Spark,
    • To develop the understanding of Spark’s connectivity with other Big Data analytics tools within the Hadoop ecosystem or externally.


Other Details

    • Minimum recommended duration: 3-5 full days (can be spread across multiple weeks)
    • Programming languages used: Scala (also HDFS shell commands and basics of Java)
    • Maximum number of attendees: 12
    • Course level: Intermediate
    • Pre-requisites: Good IT skills and practical experience in manipulating large datasets are recommended. Some knowledge of Unix commands will be beneficial; however, these will be explained during the training.

We can adapt our in-house training courses to address your specific needs and requirements e.g.:

    • The course can be designed to include client’s data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems.
    • The course can include a project spread across several weeks with a follow-up session at the end of the period.
    • The final cost quotation will be based on the number of attendees, days of training (plus additional support/project guidance if needed), location of the training and the extent of course customisation.
    • If you are interested in this in-house training course please Contact Us to discuss your specific needs and desirable outcomes of the training.


Big Data Methods in R – 3-5 days


Brief Description

The “Big Data Methods in R” course is a good choice for organisations which want to leverage their existing R skills and extend them to include R’s connectivity with Big Data storage solutions (e.g. SQL/NoSQL databases) and processing engines (Hadoop, Spark, H2O etc.).

During this training course, the attendees will be provided with essential know-how on applications of the R language to manage, manipulate and analyse out-of-memory data and datasets stored in distributed file systems or large databases. The course can also serve as a good introduction to Cloud Computing (Amazon Web Services and Microsoft Azure) and tools that support Big Data analytics.


Programme Outline

Throughout the course the attendees will learn the following concepts (however, the course contents may be streamlined or extended depending on your needs):

    • Use third-party R packages that support parallel computing to increase the speed and processing capabilities of R,
    • Work on large data sets in the Cloud (Microsoft Azure and Amazon EC2) through R deployed on the server,
    • Implement MapReduce framework through Hadoop straight from R console,
    • Manage Hadoop Distributed File System and HBase database through R,
    • Connect to and extract, aggregate and manage the data in major relational SQL-based database management systems (RDBMSs) using a variety of R packages,
    • Apply NoSQL queries to access, transform and manipulate large data sets in MongoDB using R packages,
    • Improve the data flow and speed of processing of large data sets through R’s connectivity with Spark,
    • Implement selected Big Data tools in the Big Data Product Cycle with R.


Other Details

    • Minimum recommended duration: 3-5 full days (can be spread across multiple weeks)
    • Programming languages used: R (plus basics of SQL)
    • Maximum number of attendees: 12
    • Course level: Beginners-Intermediate
    • Pre-requisites: Pre-intermediate skills in data management, processing and analytics in R language. Understanding of basic concepts of Big Data analytics.

We can adapt our in-house training courses to address your specific needs and requirements e.g.:

    • The course can be designed to include client’s data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems.
    • The course can include a project spread across several weeks with a follow-up session at the end of the period.
    • The final cost quotation will be based on the number of attendees, days of training (plus additional support/project guidance if needed), location of the training and the extent of course customisation.
    • If you are interested in this in-house training course please Contact Us to discuss your specific needs and desirable outcomes of the training.


Querying Relational Databases with SQL – 3-4 days


Brief Description

The course covers all essential aspects of working with relational databases using Structured Query Language and provides the attendees with much needed practice in data extraction, database modifications and data aggregations with SQL.

During the course the attendees will firstly become familiar with simple methods used for database querying and data summaries. They will then proceed to more advanced topics in SQL e.g. joins, data cleaning (e.g. data extraction from strings, dates and timestamps), data aggregations, writing nested queries and subqueries.


Programme Outline

Throughout the course the attendees will learn and practise the following topics:

    • Understanding of characteristics of relational databases and differences between available Relational Database Management Systems (RDBMSs),
    • Creating tables within relational databases and importing data,
    • Using basic SQL statements to query created tables and extract data of interest,
    • Calculating data summaries, aggregations and cross-tabulations using SQL queries,
    • Implementing AND/OR, IN, LIKE, HAVING and CASE statements for advanced querying,
    • Splitting tables and applying joins,
    • Modifying, altering and updating databases in SQL,
    • Applying advanced data manipulations in relational databases: sorting cases, recoding values, writing nested queries and subqueries, working with strings, dates and timestamps, column indexing etc.
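
The querying topics above (creating tables, importing data, joins, aggregations, HAVING and CASE) can be tried out without installing a database server. The following minimal sketch runs standard SQL against an in-memory SQLite database through Python's built-in sqlite3 module; the tables, columns and values are hypothetical, and SQL dialects differ slightly between RDBMSs.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create two related tables (hypothetical schema).
cur.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
cur.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)

# Import some made-up rows.
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "PT"), (2, "Ben", "UK"), (3, "Chloe", "UK")],
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50.0), (2, 1, 150.0), (3, 2, 20.0), (4, 3, 300.0)],
)

# Join, aggregate per country, filter groups with HAVING, recode with CASE.
query = """
SELECT c.country,
       COUNT(o.id)   AS n_orders,
       SUM(o.amount) AS total,
       CASE WHEN SUM(o.amount) >= 250 THEN 'high' ELSE 'low' END AS band
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.country
HAVING COUNT(o.id) >= 2
ORDER BY c.country
"""
summary = cur.execute(query).fetchall()

for row in summary:
    print(row)

conn.close()
```

Note that WHERE filters individual rows before grouping, whereas HAVING filters the aggregated groups, which is why the order-count condition sits in the HAVING clause.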


Other Details

    • Minimum recommended duration: 3-4 full days (can be spread across multiple weeks)
    • Programming languages used: SQL
    • Maximum number of attendees: 20
    • Course level: For absolute beginners
    • Pre-requisites: Interest in relational databases. Understanding of basic data management processes and analysis.

We can adapt our in-house training courses to address your specific needs and requirements e.g.:

    • The course can be designed to include client’s data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems.
    • The course can include a project spread across several weeks with a follow-up session at the end of the period.
    • The final cost quotation will be based on the number of attendees, days of training (plus additional support/project guidance if needed), location of the training and the extent of course customisation.
    • If you are interested in this in-house training course please Contact Us to discuss your specific needs and desirable outcomes of the training.


Data Visualisation Techniques in R – 4-5 days


Brief Description

This course is designed to address the most essential topics in graphical data visualisation in R. The attendees will be presented with numerous examples of how to create and edit different types of graphs, plots, diagrams and charts using both built-in standard R functions and more advanced syntax from external packages such as ggplot2. Apart from static data visualisation techniques, the course will serve as a good introduction to advanced methods such as interactive graphs in rCharts, ggvis and plotly, as well as reactive plotting using the Shiny framework. All presented techniques will be practised during tutorial sessions.


Programme Outline

Throughout the course the attendees will learn the following concepts (however, the course contents may be streamlined or extended depending on your needs):

    • Correctly use colours and apply palettes for different types of graphs and plots,
    • Set default graphical parameters and edit them for specific visualisations,
    • Plot various types of static graphs in core R and third-party packages e.g. ggplot2, lattice etc.; Bar plots, line graphs, histograms, density plots, polygons, pie charts, scatterplots, box plots, violin plots, correlograms, spider plot etc.,
    • Edit, add and adjust specific graphical and textual elements of created visualisations such as axis and value labels, titles/subtitles, annotations, ticks, margins, backgrounds, plotting symbols, line types, colours, borders, secondary axes, legends and many more,
    • Superimpose elements from one type of graph on another e.g. a density curve on histogram etc.,
    • Create statistical test-specific data visualisations and use the available graphical options e.g. plotting two descriptive statistics on one graph,
    • Create templates and simple dashboards with multiple visualisations and textual annotations on one page in core R and ggplot2,
    • Add interactivity to visualisations in R: introduction to advanced plotting in plotly, Shiny, rCharts, ggvis and JavaScript-based Highcharts etc.,
    • Export data visualisations from R to various formats for publishing.


Other Details

    • Minimum recommended duration: 4-5 full days (can be spread across multiple weeks)
    • Programming languages used: R (additionally some elements of JavaScript, HTML and CSS will be introduced as well)
    • Maximum number of attendees: 12
    • Course level: Beginners-Intermediate
    • Pre-requisites: Pre-intermediate skills in data management, processing and analytics in R language. Understanding of basic concepts of graphical data visualisations.

We can adapt our in-house training courses to address your specific needs and requirements e.g.:

    • The course can be designed to include client’s data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems.
    • The course can include a project spread across several weeks with a follow-up session at the end of the period.
    • The final cost quotation will be based on the number of attendees, days of training (plus additional support/project guidance if needed), location of the training and the extent of course customisation.
    • If you are interested in this in-house training course please Contact Us to discuss your specific needs and desirable outcomes of the training.


Statistical Disclosure Control with R – 3-4 days


Brief Description

The “Statistical Disclosure Control with R” training course has been designed for organisations, governmental departments, research institutes and private companies that process, manage and analyse socio-economic microdata and want to safeguard the identity of individuals using modern statistical approaches. The course provides in-depth knowledge of the theory, specific statistical methods and practical applications of Statistical Disclosure Control, a growing area of research in data processing and statistics focused on minimising the disclosure risk of socio-economic datasets.


Programme Outline

During the course the attendees will learn to:

    • Understand and appreciate the motivation for Statistical Disclosure Control methods from data science, data protection and legal perspectives,
    • Apply a variety of modern data science techniques to process microdata in R language and its third-party packages for data manipulations and transformations,
    • Differentiate between data types and classes of variables from data science and SDC perspectives; implement SDC workflows in varying disclosure scenarios depending on selection of key categorical and continuous variables,
    • Generate contingency tables and estimate sample and population frequencies,
    • Perform calculations of individual, cluster and global disclosure risks,
    • Carry out essential Statistical Disclosure Control methods in R language such as recoding, local suppression, micro-aggregation and post-randomisation,
    • Calculate the effect of applied SDC modifications on individual and global risk of perturbed datasets, their information loss and data utility,
    • Report and communicate the results of SDC interventions,
    • Apply the above SDC methods to special cases e.g. datasets coming from different file formats (including proprietary tools such as Stata and SPSS), multiple datasets which can or cannot be linked together, Big Data etc.
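
One of the simplest ideas behind the frequency-based risk measures listed above is k-anonymity: a record is considered at risk when fewer than k records share its combination of key variables. The course teaches these methods in R; the sketch below is only a language-neutral illustration written in Python with the standard library, and the microdata, key variables and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical microdata: age band and region act as key (identifying) variables.
records = [
    {"age_band": "30-39", "region": "North", "income": 31000},
    {"age_band": "30-39", "region": "North", "income": 35000},
    {"age_band": "30-39", "region": "North", "income": 29000},
    {"age_band": "40-49", "region": "South", "income": 42000},
    {"age_band": "40-49", "region": "South", "income": 39000},
    {"age_band": "70-79", "region": "South", "income": 18000},
]
KEYS = ("age_band", "region")


def key_of(row, keys):
    """Combination of key-variable values for one record."""
    return tuple(row[k] for k in keys)


def key_frequencies(rows, keys):
    """Sample frequency of every observed key combination."""
    return Counter(key_of(row, keys) for row in rows)


def at_risk(rows, keys, k):
    """Records whose key combination occurs fewer than k times."""
    freq = key_frequencies(rows, keys)
    return [row for row in rows if freq[key_of(row, keys)] < k]


def suppress(rows, keys, k, column):
    """Local suppression: blank `column` for records violating k-anonymity.

    Returns new records and leaves the input untouched.
    """
    freq = key_frequencies(rows, keys)
    protected = []
    for row in rows:
        new_row = dict(row)
        if freq[key_of(row, keys)] < k:
            new_row[column] = None  # suppressed value
        protected.append(new_row)
    return protected


risky = at_risk(records, KEYS, k=2)
protected = suppress(records, KEYS, k=2, column="age_band")
n_suppressed = sum(row["age_band"] is None for row in protected)

print("records at risk:", len(risky), "of", len(records))
print("values suppressed:", n_suppressed)
```

The number of suppressed values gives a crude indication of information loss; dedicated SDC software weighs such losses against the reduction in disclosure risk in a more principled way.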


Other Details

    • Minimum recommended duration: 3-4 full days (can be spread across multiple weeks)
    • Programming languages used: R
    • Maximum number of attendees: 12
    • Course level: Beginners-Intermediate
    • Pre-requisites: Pre-intermediate skills in data management, processing and analytics in R language. Understanding of basic concepts of statistics e.g. exploratory data analysis, linear models, basic probability theory.

We can adapt our in-house training courses to address your specific needs and requirements e.g.:

    • The course can be designed to include client’s data. If it is not possible e.g. due to data security issues, we can customise the course to contain exercises that address similar problems.
    • The course can include a project spread across several weeks with a follow-up session at the end of the period.
    • The final cost quotation will be based on the number of attendees, days of training (plus additional support/project guidance if needed), location of the training and the extent of course customisation.
    • If you are interested in this in-house training course please Contact Us to discuss your specific needs and desirable outcomes of the training.