Apache Spark has revolutionised the way in which large organisations ingest and process Big Data and is quickly becoming the industry standard for large-scale and near real-time analytics. During this 2-day course, attendees will be introduced to the basics of the Spark architecture, its data structures and its compatibility with other Big Data and analytical tools (e.g. Hadoop, Hive, SQL, R and Python). They will also be provided with essential skills in the Scala language, allowing them to design and deploy Spark applications on a multi-node, parallel computing cluster. The course will also provide an introduction to the machine learning techniques available in Spark (through Spark ML and MLlib) and examples of network/graph analytics (using GraphX).
During the training course, the attendees will learn:
to use the Scala language, the Spark engine and its libraries for data import/export from/to various file formats and storage systems (e.g. standard file formats like csv, tab, txt, or Hadoop, Amazon S3 buckets, Hive etc.),
to understand the structure of, and the operations applicable to, Resilient Distributed Datasets, DataFrames and other Spark data structures and objects, including Spark transformations and actions,
to manipulate data within Spark – converting between various data structures, recoding values, joins and merges, working with timestamps and strings, preparing data for further processing, applying Spark ML transformers e.g. normalisation or standardisation,
to calculate descriptive statistics and carry out essential exploratory data analysis including data aggregation and summaries, cross-tabulations, frequency/contingency tables etc.,
to deploy fully functional Big Data machine learning Spark applications using Spark ML pipelines – multiple linear regression for predicting a continuous numeric target variable and Generalized Linear Models, e.g. logistic regression for binary classification – a tutorial on Spark ML and MLlib libraries,
to carry out model cross-validation and to calculate and interpret model evaluation metrics, e.g. accuracy, recall, precision, R squared, ROC curve, MSE and RMSE,
to manipulate and extract information from graphs and networks, estimate essential network/graph parameters e.g. degrees, triangles, or (strongly) connected components and apply graph algorithms e.g. PageRank or label propagation – a tutorial on Spark GraphX,
to understand and appreciate the compatibility of Spark with other Big Data and data science tools (e.g. Hadoop, Hive, RDBMSs) and programming languages (e.g. Java, Python and R).
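As a flavour of the model evaluation metrics listed above, here is a minimal sketch of how accuracy, precision, recall, MSE, RMSE and R squared are calculated. It is written in plain Python rather than Spark, purely for illustration: the function names and the toy data are hypothetical and are not taken from the course materials or from the Spark ML API.

```python
import math


def classification_metrics(actual, predicted):
    """Accuracy, precision and recall for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives were predicted/observed.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def regression_metrics(actual, predicted):
    """MSE, RMSE and R squared for a continuous numeric target."""
    n = len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        # R squared is undefined for a constant target; report 0.0 in that case (an assumption).
        "r2": 1 - ss_res / ss_tot if ss_tot else 0.0,
    }


# Toy data, made up for this illustration only.
clf = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
reg = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

In Spark itself these quantities would normally come from the library's own evaluator classes rather than hand-written functions; the sketch only shows what the numbers mean.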
The course will run for two days from 9:30am until ~5:00pm on each day and will consist of alternating lecture-style presentations and practical tutorials. The example datasets used during tutorial sessions will come from the social sciences, economics and business fields; however, the contents may vary depending on the specific interests of participants (based on the Participant’s Skills Inventory). There will be two 15-minute coffee/tea breaks and one 1-hour lunch break on each day.
What is included?
Apart from the contents of the course, Mind Project will provide the participants with the following:
- a digital (USB memory stick) Course Manual including all presentation slides, Spark/Scala course script files, datasets and a list of reference books and online resources,
- additional home exercises and all data sets available to download,
- Wi-Fi access,
- Central London location – a 1-min walk from the Barbican station, 5 minutes away from Farringdon and St. Paul’s stations, 15 minutes from the Liverpool Street Station,
- networking opportunity,
- Mind Project course attendance certificate.
In order to fully benefit from the training course, we recommend that attendees bring their personal WiFi-enabled laptops to the session with at least one of the following web browsers installed: Chrome, Safari, Mozilla Firefox or Internet Explorer. The laptops should also be equipped with a simple text editor suitable for typing code/scripts, e.g. Notepad++ (for Windows users) or TextWrangler (for Mac users). Please be advised that we do not recommend the following applications: WordPad, Gedit or TextEdit.
This course is targeted at IT-literate users with an interest in Big Data processing and architecture. No prior exposure to the Spark engine or the Hadoop ecosystem and its tools is required.
Participants are encouraged to complete the online Participant’s Skills Inventory to allow Mind Project and our course tutors to customise the contents of the course depending on the level of participants’ knowledge and their areas of interest. The data obtained through the Participant’s Skills Inventory will be kept fully confidential and will only be used to provide quality data analysis training.
Deadline for registrations
The deadline for registrations on this training course is Friday, 9th of March 2018 at 16:00 London (UK) time. However, Mind Project reserves the right to end the registration process earlier if all places are booked before the deadline.
Prices and discounts
- £450 + VAT (£540) per person for the whole course (regular fee).
- £325 + VAT (£390) per person for the whole course for UK registered undergraduate and postgraduate students, and representatives of registered charitable organisations (discounted fee).
- For group bookings of 4 or more participants, please contact us directly.
Please note that the course fee DOES NOT include the following:
- transport to and from the venue,
- accommodation and lunch.
Please contact us should you have any questions about this course. You may also want to visit the Training Courses – Frequently Asked Questions page, which gives further practical details about Mind Project training courses. You can book your place on the course by clicking the Book ticket button in the top section of the course page. Please note that we accept all major credit/debit cards (through the PayPal and Stripe systems) and BACS payments. We can only confirm fully paid bookings. Please contact us for other payment options, e.g. if a Purchase Order is required. Please read the Training & Events Terms & Conditions before your purchase.
The course will be held at CAP House, 1st Floor, 9-12 Long Lane, London, EC1A 9HA. Please see the map below.