SQL, NoSQL, Big Data and Hadoop
What you'll learn
- Build an intuition that spans RDBMS, NoSQL, Big Data on the cloud and the Hadoop platform
- Understand various distributed database classifications
- Understand when and how to use Redis or Key-Value Stores
- Understand when and how to use MongoDB or Document-oriented databases
- Understand and use HBase as a Wide-Columnar Store
- Understand and use a time-series database (InfluxDB)
- Understand and use Elasticsearch as a search engine
- Understand and use Neo4j as a Graph Database Management System
- Understand large scale distributed data storage and processing in Hadoop
- Understand when and how to build a streaming architecture with Apache Kafka
- Use Apache Hive and understand where it fits within big data platforms
- Understand a number of SQL-on-Hadoop Engines and how they work
- Understand how to use data engineering capabilities to enable a data-driven organization
Requirements
- No strict requirements, but knowledge of relational databases will be helpful.
- A Windows, Linux or Mac machine to set up a lab
- Any Hadoop vendor sandbox, such as Cloudera QuickStart or the HDP VM
Description
A comprehensive look at the wide landscape of database systems and how to make a good choice in your next project
The first time we ask or answer questions about databases is usually when building an application. The next is when our choice of database becomes a bottleneck or when we need to do large-scale data analytics.
This course covers almost every class of database and data storage platform, and when to consider using each. It is a broad journey through databases that will suit software developers, big data engineers, data analysts and decision makers alike. It is not an in-depth look at each database, but it aims to get you up and running with your first project for each class.
In this course, we are going to cover:
- Relational database systems, their features, use cases and limitations
- Why NoSQL?
- CAP theorem
- Key-value stores and their use cases
- Document-oriented databases and their use cases
- Wide-column stores and their use cases
- Time-series databases and their use cases
- Search engines and their use cases
- Graph databases and their use cases
- Distributed logs and real-time streaming systems
- Hadoop and its use cases
- SQL-on-Hadoop tools and their use cases
- How to make informed decisions in building a good data storage platform
What is the target audience?
- Chief data officers
- Application developers
- Data analysts
- Data architects
- Data engineers
- Students
- Anyone who wants to understand Hadoop from a database perspective
What does this course not cover?
This course does not approach any of the databases from an administrative perspective, so we don't cover administrative tasks like security, backup, recovery, migration and the like.
It also does not cover very in-depth features of the specific databases discussed. For example, we will not go into the different storage engines for MySQL or how to write stored procedures.
What are the requirements?
The lab for this course can be carried out on any machine (Microsoft Windows, Linux, macOS).
However, the training on HBase or Hadoop will require you to have a Hadoop environment. We suggest using a pre-installed sandbox, using a cloud offering or installing your own custom sandbox.
What do I need to know to get the best out of this course?
This course does not assume any knowledge of NoSQL or data engineering.
However, a little knowledge of RDBMS (even Microsoft Access) is enough to put you in the best position for this course.
Who this course is for:
- Chief Data Officers
- IT Decision Makers
- Database Architects
- Software Developers
- Big Data Engineers
- Anyone who wants to understand where each class of NoSQL database best fits.
- Anyone who is curious about NoSQL or Big Data Systems
Instructor
Michael is a big data specialist residing in Johannesburg, South Africa, with 15 years of experience in various areas of software development, including enterprise, mobile and database applications. He currently holds a Lead Data Engineer role at Black Swan Data, where he uses software and big data technologies to build and maintain a data platform for social predictions. His passions include learning new technologies, implementing enterprise software and/or data systems, and sharing the knowledge.
For the past four years, Michael has also been a Hadoop and Big Data instructor/trainer at Dezyre (.com) academy, where he has trained over 300 students on 4 different continents in topics such as Hadoop, NoSQL and other big data technologies. These training sessions usually take the form of small-group or one-on-one webinar meetings.
When he is not writing or learning software, Michael spends most of his time with family, or writing and producing music with his wife.