Practical Guide to Data Engineering

Date: Tue 17 Sep 2024
Register by: 13 September 2024

Navigating the variety of solutions in the data engineering space can be overwhelming. There are many choices that cover a wide variety of use-cases depending on the volume, variability and velocity of your data, and the technical language and claims made by technology vendors can be a barrier to understanding. 

Aimed at intermediate learners, this course will help demystify this space with plain, accessible introductions to topics including when to use relational vs NoSQL databases, data streams and moving data, message queues and topics, querying strategies, and data warehouses vs data lakes.

In this intermediate-level course, you will cover:

The Many Varieties of Database and Choosing a Suitable Persistence Solution 

We will introduce and explain, in plain, accessible language, the main types of database using open-source solutions such as Postgres, Cassandra, MongoDB and Neo4j. We will explain the fundamental differences between them and the factors involved in choosing a particular solution for a given use case. We will explain important concepts such as data formats, data files, partitioning, primary and secondary indexes, data normalisation vs de-normalisation, views and materialised views, and the CAP theorem (Consistency, Availability, Partition tolerance). We will cover classic relational database management systems (RDBMS) and the diverse types of NoSQL database, including document stores, wide-column stores, time-series databases and graph databases.

Caching and In-Memory Data 

When latency is of critical concern, in-memory data caches can significantly increase the performance of your application. We will introduce the main caching strategies using open-source solutions including Redis and in-memory databases.  

Moving Data: Streams, Queues, Topics and Brokers 

We will introduce, in plain, accessible language, the main concepts and patterns behind data streaming, messaging and asynchronous message channels, using open-source solutions such as Kafka, Pulsar and RabbitMQ. We will explain a range of patterns, such as Topics vs Queues, Pub-Sub, Competing Consumers and Message Routing, and the specific use cases each can address.

There are no prerequisites for this course.

Create a free account on our Training Portal to register for a course and browse all available training courses.

