Understand "Big Data" and grasp why, if you are a Developer, Database Administrator, Software Architect or an IT Consultant, you should be looking at this technology stack.
There are more job opportunities in Big Data management and analytics than there were last year, and many IT professionals are prepared to invest time and money in training.
In the old days… you know… a few years ago, we would use systems to extract, transform and load data (ETL) into giant data warehouses that had business intelligence solutions built over them for reporting. Periodically, all the systems would back up and combine their data into a database where reports could be run and everyone could get insight into what was going on.
The problem was that the database technology simply couldn’t handle multiple, continuous streams of data. It couldn’t handle the volume of data. It couldn’t modify the incoming data in real time. And the reporting tools couldn’t handle anything but a relational query on the back end. Big Data solutions offer cloud hosting, highly indexed and optimized data structures, automatic archival and extraction capabilities, and reporting interfaces designed to provide more accurate analyses that enable businesses to make better decisions.
Better business decisions mean that companies can reduce risk, cut costs, and increase marketing and sales effectiveness.
This infographic from Informatica walks through the risks and opportunities associated with leveraging big data in corporations.
Big Data is Timely – Knowledge workers spend a large percentage of each workday attempting to find and manage data.
Big Data is Accessible – Senior executives report that accessing the right data is difficult.
Big Data is Holistic – Information is currently kept in silos within the organization. Marketing data, for example, might be found in web analytics, mobile analytics, social analytics, CRMs, A/B testing tools, email marketing systems, and more… each focused on its own silo.
Big Data is Trustworthy – Organizations measure the monetary cost of poor data quality. Things as simple as monitoring multiple systems for customer contact information updates can save millions of dollars.
Big Data is Relevant – Organizations are dissatisfied with their tools' ability to filter out irrelevant data. Something as simple as filtering existing customers out of your web analytics can provide a ton of insight into your acquisition efforts.
Big Data is Authoritative – Organizations struggle with multiple versions of the truth depending on the source of their data. By combining multiple, vetted sources, more companies can produce highly accurate intelligence sources.
Big Data is Actionable – Outdated or bad data results in organizations making bad decisions that can cost billions.
Here is the curriculum for my Cloudera courses as it currently stands.
My Hadoop courses are based on Vagrant so that you can practice and destroy your virtual environment before applying the installation onto real servers/VMs.
For those with little or no knowledge of the Hadoop ecosystem. Udemy course : Big Data Intro for IT Administrators, Devs and Consultants
I would first practice with Vagrant so that you can carve out a virtual environment on your local desktop. You don't want to corrupt your physical servers if you do not understand the steps or make a mistake. Udemy course : Real World Vagrant For Distributed Computing
I would then, on the virtual servers, deploy Cloudera Manager plus agents. Agents are the guys that will sit on all the slave nodes, ready to deploy your Hadoop services. Udemy course : Real World Vagrant - Automate a Cloudera Manager Build
Then deploy the Hadoop services across your cluster (via the Cloudera Manager installed in the previous step). We look at the logic regarding the placement of master and slave services. Udemy course : Real World Hadoop - Deploying Hadoop with Cloudera Manager
If you want to play around with HDFS commands (hands-on distributed file manipulation), this is the course for that. Udemy course : Real World Hadoop - Hands on Enterprise Distributed Storage.
You can also automate the deployment of the Hadoop services via Python (using the Cloudera Manager Python API). But this is an advanced step and thus I would make sure that you understand how to manually deploy the Hadoop services first. Udemy course : Real World Hadoop - Automating Hadoop install with Python!
There is also the upgrade step. Once you have a running cluster, how do you upgrade to a newer Hadoop version (both for Cloudera Manager and the Hadoop services)? Udemy course : Real World Hadoop - Upgrade Cloudera and Hadoop hands on
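To give a flavour of the hands-on HDFS step in the course list above, here is a minimal Python sketch that builds the three `hdfs dfs` shell commands typically used to create a directory, upload a local file, and list the result. It only constructs and prints the commands; it does not run them, so no cluster is needed. The function name, the file `sales.csv` and the directory `/user/demo/input` are hypothetical placeholders and are not taken from the courses.

```python
import shlex


def hdfs_upload_commands(local_path, hdfs_dir):
    """Build the `hdfs dfs` commands to create a directory, upload a file, and list it.

    Returns argument lists only; nothing is executed here.
    """
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],      # create the target directory (and parents)
        ["hdfs", "dfs", "-put", local_path, hdfs_dir],  # copy the local file into HDFS
        ["hdfs", "dfs", "-ls", hdfs_dir],               # list the directory to confirm the upload
    ]


if __name__ == "__main__":
    # Hypothetical file and directory names, for illustration only.
    for cmd in hdfs_upload_commands("sales.csv", "/user/demo/input"):
        print(shlex.join(cmd))
```

On a node with the Hadoop client installed, each argument list could be passed to `subprocess.run(cmd, check=True)`; inside a Vagrant sandbox that is a safe place to experiment.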
As a Developer, Administrator or Architect - Why should you consider "Big Data"?
Part II - Whiteboarding some of the Hadoop Services
Part III - Whiteboarding some of the Hadoop Services
We look at some Hadoop distributors - apache.org, Cloudera, Hortonworks and MapR
Here we look at some of the pros and cons of Hadoop cloud deployments: Amazon EMR and Microsoft Azure.
Here we target some database tables and show how we can move tables from MySQL into Hadoop.
Here we use the HIVE service to provide us with a logical database within Hadoop. As Hadoop can handle petabytes of data, you should be able to imagine a logical database within Hadoop that can crunch petabytes of data.
HIVE SERVICE II - We apply SQL statements within Hadoop on the copied data.
HDFS SERVICE - We move some files into HDFS, ready for SPARK processing
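As a rough illustration of the MySQL-to-Hadoop and HIVE steps in the outline above, the sketch below builds two command lines in Python. It assumes Apache Sqoop as the import tool, which is one common choice; the outline itself does not name the tool used in the course. The JDBC URL, user, password file and table names are hypothetical, and the code only constructs the commands without executing them.

```python
import shlex


def sqoop_import_command(jdbc_url, username, password_file, table,
                         hive_table=None, mappers=1):
    """Build a `sqoop import` command that copies one relational table into Hadoop.

    Assumption: Sqoop is the import tool. If hive_table is given, the data is
    also registered as a Hive table so it can be queried with SQL.
    """
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # a file holding the password, kept off the command line
        "--table", table,
        "--num-mappers", str(mappers),     # number of parallel import tasks
    ]
    if hive_table:
        cmd += ["--hive-import", "--hive-table", hive_table]
    return cmd


def hive_query_command(sql):
    """Build a `hive -e` command that runs a single SQL statement."""
    return ["hive", "-e", sql]


if __name__ == "__main__":
    # All connection details below are made up for illustration.
    imp = sqoop_import_command(
        "jdbc:mysql://dbhost/shop", "etl", "/user/etl/.mysql-password",
        "customers", hive_table="customers",
    )
    qry = hive_query_command("SELECT COUNT(*) FROM customers")
    print(shlex.join(imp))
    print(shlex.join(qry))
```

Building the commands as argument lists rather than one long string keeps quoting safe if they are later handed to `subprocess.run` on a cluster node.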
I spent 6 years at "Royal Bank of Scotland" and 5 years at the investment bank "BNP Paribas" developing and managing Interest Rate Derivatives services as well as engineering and deploying In Memory DataBases (Oracle Coherence), NoSQL and Hadoop clusters (Cloudera) into production.
In 2016, I left to start my own training business, POC-D ("Proof Of Concept - Delivered"), which focuses on delivering training on IMDB (In Memory Database), NoSQL, BigData and DevOps technology.
From Q3 2017, this will also include FinTech Training in Capital Markets using Microsoft Excel (Windows), JVM languages (Java/Scala) as well as .NET (C#, VB.NET, C++/CLI, F# and IronPython).
I have a YouTube Channel where I publish snippets of my videos. These are not courses, simply ad-hoc videos discussing various distributed computing ideas.
Check out my website and/or YouTube for more info.
See you inside ...