Programming Hadoop with Apache Pig
4.3 (3 ratings)
388 students enrolled
Last updated 12/2015
English
Includes:
  • 1.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
Learn Pig basics
Programming Hadoop with Apache Pig
Requirements
  • Basic Hadoop terminology
Description

Apache Pig is an open-source tool for analyzing large datasets. It converts Pig Latin scripts into Hadoop MapReduce jobs.

This course teaches Pig basics. The tutorials cover Pig fundamentals, Hortonworks Sandbox installation, and a number of important examples.

If you want to develop Hadoop MapReduce jobs easily, you should take this course.

Who is the target audience?
  • Big Data Developers
  • Hadoop Developers
Curriculum For This Course
17 Lectures 01:17:12
Pig Basics and Installation of Hortonworks Sandbox
3 Lectures 12:06


Sample data
-------------------
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d24,US
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d25,EN
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d26,BE
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d27,TR
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d28,US

Pig script
-----------------------------
PC_INFO = LOAD '/user/data/sampledata1' USING PigStorage(',') AS
(
    date_time: long,
    id_computer: chararray,
    country: chararray
);
DUMP PC_INFO;
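To make the LOAD step concrete: PigStorage(',') splits each input line on the delimiter and casts every field to the declared schema type. A minimal Python sketch of that behavior (the load_pc_info helper is illustrative, not a Pig API):

```python
# Sketch of LOAD ... USING PigStorage(',') with the schema
# (date_time: long, id_computer: chararray, country: chararray).
def load_pc_info(lines):
    rows = []
    for line in lines:
        date_time, id_computer, country = line.strip().split(",")
        rows.append((int(date_time), id_computer, country))  # long -> int, chararray -> str
    return rows

sample = [
    "1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d24,US",
    "1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d25,EN",
]
for row in load_pc_info(sample):
    print(row)  # DUMP PC_INFO prints one tuple per input line
```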

Running Example Pig File in Hortonworks Sandbox
03:45
Data Types and Operators
4 Lectures 22:32

log file
--------
11,1144121498080,133.5f,122123.45,US,010001
12,2144121498080,133.6f,122123.46,TR,111100
13,3144121498080,133.7f,122123.47,US,110001
14,4144121498080,133.8f,122123.48,EN,111100
15,5144121498080,133.9f,122123.49,RU,011001
16,6144121498080,133.0f,122123.40,US,110100

pig script
---------------
TYPED_DATA = LOAD '/user/data/sampledata2' USING PigStorage(',') AS
(
    intTypedData: int,
    longTypedData: long,
    floatTypedData: float,
    doubleTypedData: double,
    chararrayTypedData: chararray,
    bytearrayTypedData: bytearray
);
DUMP TYPED_DATA;
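Pig's simple types map naturally onto Python types. The sketch below casts one row of the log file the way the schema above declares it; the parse_typed_row helper and its handling of the 'f' suffix on float literals are illustrative assumptions, not Pig internals:

```python
# One row, cast to (int, long, float, double, chararray, bytearray).
def parse_typed_row(line):
    f = line.strip().split(",")
    return (
        int(f[0]),                 # int
        int(f[1]),                 # long (Python ints are unbounded)
        float(f[2].rstrip("fF")),  # float: strip the 'f' suffix, e.g. "133.5f"
        float(f[3]),               # double
        f[4],                      # chararray
        f[5].encode("utf-8"),      # bytearray: raw bytes
    )

print(parse_typed_row("11,1144121498080,133.5f,122123.45,US,010001"))
```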

Data Types
05:08

Operators
04:39

Example Data

Distinct
--------
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d24,US
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d25,EN
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d24,US
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d27,TR
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d24,US

Pig
---
DATA = LOAD '/user/data/sampledata3' USING PigStorage(',') AS
(
    timestamp: long,
    computerID: chararray,
    countryCode: chararray
);
DUMP DATA;
DISTINCT_DATA = DISTINCT DATA;
DUMP DISTINCT_DATA;

Filter tuples with given condition
----------------------------------
DATA = LOAD '/user/data/sampledata3' USING PigStorage(',') AS
(
    timestamp: long,
    computerID: chararray,
    countryCode: chararray
);
DUMP DATA;
FILTERED_DATA = FILTER DATA BY countryCode == 'US';
DUMP FILTERED_DATA;
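DISTINCT removes fully duplicate tuples, while FILTER keeps only the tuples that satisfy the condition. A minimal Python sketch, with the computer IDs shortened for readability:

```python
# DISTINCT drops exact duplicate tuples; FILTER keeps matching tuples.
rows = [
    (1441214980800, "d24", "US"),
    (1441214980800, "d25", "EN"),
    (1441214980800, "d24", "US"),  # duplicate of the first tuple
    (1441214980800, "d27", "TR"),
    (1441214980800, "d24", "US"),  # duplicate again
]

distinct_rows = list(dict.fromkeys(rows))           # like DISTINCT DATA
filtered_rows = [r for r in rows if r[2] == "US"]   # like FILTER ... BY countryCode == 'US'

print(distinct_rows)  # 3 unique tuples remain
print(filtered_rows)  # 3 US tuples remain
```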

Relational Operators (DISTINCT, FILTER)
04:05

log file 1
----------
1,US,5,https://www.google.com/?gfe_rd=cr&ei=djNDVsWFOsqo8wf70JP4Bw&gws_rd=cr#q=car
2,US,25,https://www.google.com/?gfe_rd=cr&ei=djNDVsWFOsqo8wf70JP4Bw&gws_rd=cr#q=watch
3,TR,20,https://www.google.com/?gfe_rd=cr&ei=djNDVsWFOsqo8wf70JP4Bw&gws_rd=cr#q=table
4,EN,10,https://www.google.com/?gfe_rd=cr&ei=djNDVsWFOsqo8wf70JP4Bw&gws_rd=cr#q=car+games
5,PL,16,https://www.google.com/?gfe_rd=cr&ei=djNDVsWFOsqo8wf70JP4Bw&gws_rd=cr#q=car+games+download
6,US,24,https://www.google.com/?gfe_rd=cr&ei=djNDVsWFOsqo8wf70JP4Bw&gws_rd=cr#q=home
7,US,36,https://www.google.com/?gfe_rd=cr&ei=djNDVsWFOsqo8wf70JP4Bw&gws_rd=cr#q=travel
8,EN,48,https://www.google.com/?gfe_rd=cr&ei=djNDVsWFOsqo8wf70JP4Bw&gws_rd=cr#q=car

code 1
------
DATA = LOAD '/user/data/sampledata4' USING PigStorage(',') AS
(
    id: int,
    countryCode: chararray,
    durationTime: int,
    url: chararray
);
DATA_GROUPED = GROUP DATA BY countryCode;
DUMP DATA_GROUPED;

code 2
------
DATA = LOAD '/user/data/sampledata4' USING PigStorage(',') AS
(
    id: int,
    countryCode: chararray,
    durationTime: int,
    url: chararray
);
DATA_GROUPED = GROUP DATA BY countryCode;
RESULT = FOREACH DATA_GROUPED {
    GENERATE
        group,
        AVG(DATA.durationTime);
}
DUMP RESULT;
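GROUP collects the tuples into one bag per key, and AVG inside the FOREACH averages a column of each bag. The same computation sketched in Python, using the durations from log file 1:

```python
from collections import defaultdict

# GROUP DATA BY countryCode builds one bag of tuples per country;
# AVG(DATA.durationTime) averages the durationTime column of each bag.
rows = [
    (1, "US", 5), (2, "US", 25), (3, "TR", 20), (4, "EN", 10),
    (5, "PL", 16), (6, "US", 24), (7, "US", 36), (8, "EN", 48),
]

bags = defaultdict(list)
for _id, country, duration in rows:
    bags[country].append(duration)

averages = {country: sum(ds) / len(ds) for country, ds in bags.items()}
print(averages)  # US -> (5+25+24+36)/4 = 22.5
```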


log file 2
----------
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d24,US
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d25,EN
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d24,US
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d27,TR
1441214980800,2664c6bb-9261-42bf-a5c4-534436f50d24,US

code 3
------
DATA = LOAD '/user/data/sampledata3' USING PigStorage(',') AS
(
    timestamp: long,
    computerID: chararray,
    countryCode: chararray
);
DATA_GROUPED = GROUP DATA BY countryCode;
RESULT = FOREACH DATA_GROUPED {
    GENERATE
        group AS countryCode,
        COUNT(DATA);
}
DUMP RESULT;

code 4
------
DATA = LOAD '/user/data/sampledata3' USING PigStorage(',') AS
(
    timestamp: long,
    computerID: chararray,
    countryCode: chararray
);
DATA_GROUPED = GROUP DATA ALL;
RESULT = FOREACH DATA_GROUPED {
    S = FILTER DATA BY countryCode == 'US';
    GENERATE COUNT(S);
}
DUMP RESULT;
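code 4 counts the US tuples: GROUP ... ALL puts everything into one bag, and the nested FILTER trims that bag before COUNT runs. The same computation in Python (shortened IDs, values from log file 2):

```python
# GROUP ... ALL -> a single bag containing every tuple; the nested
# FILTER keeps only US tuples inside the bag before COUNT.
rows = [
    ("d24", "US"), ("d25", "EN"), ("d24", "US"), ("d27", "TR"), ("d24", "US"),
]
us_bag = [r for r in rows if r[1] == "US"]
us_count = len(us_bag)
print(us_count)  # 3
```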







Relational Operators (GROUP, FOREACH)
08:40
FUNCTIONS
3 Lectures 12:46

log file
--------
1,US,5,https://www.google.com.tr/#q=apache+pig,apache,pig
2,US,25,https://www.google.com.tr/#q=apache+hive,apache,hive
3,TR,20,https://www.google.com.tr/#q=apache+hadoop,apache,hadoop
4,EN,10,https://www.google.com.tr/#q=apache+oozie,apache,oozie
5,PL,16,https://www.google.com.tr/#q=apache%20flume,apache,flume
6,US,24,https://www.google.com.tr/#q=apache+spark,apache,spark
7,US,36,https://www.google.com.tr/#q=apache+kafka,apache,kafka
8,EN,48,https://www.google.com.tr/#q=apache+storm,apache storm in hadoop ,storm

avg pig
-------
DATA = LOAD '/user/data/samplelogfile' USING PigStorage(',') AS
(
    id: int,
    countryCode: chararray,
    durationTime: int,
    url: chararray,
    keyword1: chararray,
    keyword2: chararray
);
GROUPED_DATA = GROUP DATA BY countryCode;
RESULT = FOREACH GROUPED_DATA {
    GENERATE
        group,
        AVG(DATA.durationTime);
}
DUMP RESULT;


concat pig
----------
DATA = LOAD '/user/data/samplelogfile' USING PigStorage(',') AS
(
    id: int,
    countryCode: chararray,
    durationTime: int,
    url: chararray,
    keyword1: chararray,
    keyword2: chararray
);
RESULT = FOREACH DATA {
    GENERATE
        url,
        CONCAT(keyword1, keyword2) AS combinedKeywords;
}
DUMP RESULT;
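CONCAT returns the two chararrays joined end to end; with a null input, Pig's built-in functions generally produce null, which the sketch below mirrors with None (the concat helper is illustrative, not a Pig API):

```python
# Sketch of CONCAT(keyword1, keyword2): join two strings,
# propagating null (None) if either input is null.
def concat(a, b):
    if a is None or b is None:
        return None
    return a + b

print(concat("apache", "pig"))  # "apachepig"
print(concat(None, "spark"))    # None
```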


Eval Functions 1 (AVG, CONCAT)
04:23

log file
--------
1,US,5,https://www.google.com.tr/#q=apache+pig,apache,pig
2,US,25,https://www.google.com.tr/#q=apache+hive,apache,hive
3,TR,20,https://www.google.com.tr/#q=apache+hadoop,apache,hadoop
4,EN,10,https://www.google.com.tr/#q=apache+oozie,apache,oozie
5,PL,16,https://www.google.com.tr/#q=apache%20flume,apache,flume
6,US,24,https://www.google.com.tr/#q=apache+spark,,spark
7,US,36,https://www.google.com.tr/#q=apache+kafka,,kafka
8,EN,48,https://www.google.com.tr/#q=apache+storm,apache storm in hadoop ,storm


max min functions
-----------------
DATA = LOAD '/user/data/samplelogfile' USING PigStorage(',') AS
(
    id: int,
    countryCode: chararray,
    durationTime: int,
    url: chararray,
    keyword1: chararray,
    keyword2: chararray
);
GROUPED_DATA = GROUP DATA BY countryCode;
RESULT = FOREACH GROUPED_DATA {
    GENERATE
        group,
        MAX(DATA.durationTime) AS maxDurationTime,
        MIN(DATA.durationTime) AS minDurationTime;
}
DUMP RESULT;

size pig
--------
DATA = LOAD '/user/data/samplelogfile' USING PigStorage(',') AS
(
    id: int,
    countryCode: chararray,
    durationTime: int,
    url: chararray,
    keyword1: chararray,
    keyword2: chararray
);
RESULT = FOREACH DATA {
    GENERATE
        SIZE(keyword1) AS numberOfCharacters;
}
DUMP RESULT;
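SIZE on a chararray returns the number of characters. Sketched in Python (the size helper is illustrative; note that the trailing space in row 8's keyword1 counts too):

```python
# Sketch of SIZE(chararray): number of characters, null in -> null out.
def size(chararray):
    return None if chararray is None else len(chararray)

print(size("apache"))                   # 6
print(size("apache storm in hadoop "))  # 23, trailing space included
```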

Eval Functions 2 (MAX, MIN, SIZE)
Preview 04:07

log file
--------
1,US,5,https://www.google.com.tr/#q=apache+pig,apache,pig
2,US,25,https://www.google.com.tr/#q=apache+hive,apache,hive
3,TR,20,https://www.google.com.tr/#q=apache+hadoop,apache,hadoop
4,EN,10,https://www.google.com.tr/#q=apache+oozie,apache,oozie
5,PL,16,https://www.google.com.tr/#q=apache%20flume,apache,flume
6,US,24,https://www.google.com.tr/#q=apache+spark,,spark
7,US,36,https://www.google.com.tr/#q=apache+kafka,,kafka
8,EN,48,https://www.google.com.tr/#q=apache+storm,apache storm in hadoop ,storm


sum pig
-------
DATA = LOAD '/user/data/samplelogfile' USING PigStorage(',') AS
(
    id: int,
    countryCode: chararray,
    durationTime: int,
    url: chararray,
    keyword1: chararray,
    keyword2: chararray
);
GROUPED_DATA = GROUP DATA BY countryCode;
RESULT = FOREACH GROUPED_DATA {
    GENERATE
        group,
        SUM(DATA.durationTime) AS totalDurationTime;
}
DUMP RESULT;

tokenize pig
------------
DATA = LOAD '/user/data/samplelogfile' USING PigStorage(',') AS
(
    id: int,
    countryCode: chararray,
    durationTime: int,
    url: chararray,
    keyword1: chararray,
    keyword2: chararray
);
FILTERED_DATA = FILTER DATA BY id == 8;
RESULT = FOREACH FILTERED_DATA {
    GENERATE
        TOKENIZE(keyword1);
}
DUMP RESULT;
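TOKENIZE splits a chararray into a bag of words. Pig also splits on a few punctuation characters; this sketch handles only whitespace, which is enough for the keyword1 value of the row with id 8, and a Python list stands in for the bag:

```python
# Sketch of TOKENIZE(chararray): split on whitespace into a bag of words.
def tokenize(chararray):
    return chararray.split()

print(tokenize("apache storm in hadoop"))  # ['apache', 'storm', 'in', 'hadoop']
```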



Eval Functions 3 (SUM, TOKENIZE)
04:16
JOIN and UNION Operators
4 Lectures 16:02

Customer log
------------
John,27,1
David,30,2
Peter,29,3

Department log
--------------
1,Sales
2,Marketing
3,Engineering

Pig code
--------
CUSTOMER = LOAD '/user/data/customerlog' USING PigStorage(',') AS
(
    name: chararray,
    age: int,
    dept_id: int
);
DEPARTMENT = LOAD '/user/data/departmentlog' USING PigStorage(',') AS
(
    dept_id: int,
    dept_name: chararray
);
JOINED_DATA = JOIN CUSTOMER BY dept_id, DEPARTMENT BY dept_id;
DUMP JOINED_DATA;

foreach code
------------
FINAL = FOREACH JOINED_DATA {
    GENERATE
        CUSTOMER::name,
        CUSTOMER::age,
        DEPARTMENT::dept_name;
}
DUMP FINAL;
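An inner join keeps only the tuples whose dept_id occurs in both relations, pairing each match. Sketched in Python with the sample rows above (a dict stands in for the department relation, which assumes dept_id is unique):

```python
# Sketch of JOIN ... BY dept_id, DEPARTMENT BY dept_id (inner join).
customers = [("John", 27, 1), ("David", 30, 2), ("Peter", 29, 3)]
departments = {1: "Sales", 2: "Marketing", 3: "Engineering"}

joined = [
    (name, age, departments[dept_id])
    for name, age, dept_id in customers
    if dept_id in departments  # unmatched dept_ids would simply be dropped
]
print(joined)
```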

JOIN Operator 1 (INNER JOIN)
03:54

Customers log
-------------
1,John
2,David
3,Peter

Orders log
----------
100,1,2014-01-29 23:56:57.700
200,4,2015-02-29 01:56:57.700
300,3,2013-03-29 23:56:57.700

Pig Script
----------
CUSTOMER = LOAD '/user/data/customerlog' USING PigStorage(',') AS
(
    customerId: int,
    name: chararray
);
ORDERS = LOAD '/user/data/orderlog' USING PigStorage(',') AS
(
    orderId: int,
    customerId: int,
    orderDate: chararray
);
JOINED_DATA = JOIN CUSTOMER BY customerId LEFT OUTER, ORDERS BY customerId;
DUMP JOINED_DATA;


right join code
---------------
CUSTOMER = LOAD '/user/data/customerlog' USING PigStorage(',') AS
(
    customerId: int,
    name: chararray
);
ORDERS = LOAD '/user/data/orderlog' USING PigStorage(',') AS
(
    orderId: int,
    customerId: int,
    orderDate: chararray
);
JOINED_DATA = JOIN CUSTOMER BY customerId RIGHT OUTER, ORDERS BY customerId;
DUMP JOINED_DATA;
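The two outer joins differ only in which side is kept in full: LEFT OUTER keeps every customer and pads missing orders with null, RIGHT OUTER keeps every order and pads missing customers. A Python sketch using None for null, with the order timestamps shortened to dates for readability:

```python
# LEFT OUTER: every customer survives; None pads customers without orders.
# RIGHT OUTER: every order survives; None pads orders without a customer.
customers = [(1, "John"), (2, "David"), (3, "Peter")]
orders = [(100, 1, "2014-01-29"), (200, 4, "2015-02-29"), (300, 3, "2013-03-29")]

left = []
for cid, name in customers:
    matches = [o for o in orders if o[1] == cid]
    if matches:
        left.extend((cid, name) + o for o in matches)
    else:
        left.append((cid, name, None, None, None))       # David has no order

right = []
for oid, ocid, odate in orders:
    matches = [c for c in customers if c[0] == ocid]
    if matches:
        right.extend(c + (oid, ocid, odate) for c in matches)
    else:
        right.append((None, None, oid, ocid, odate))     # order 200 has no customer

print(left)
print(right)
```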


JOIN Operator 2 (LEFT JOIN, RIGHT JOIN)
05:05

Customers log
-------------
1,John
2,David
3,Peter

Orders log
----------
100,1,2014-01-29 23:56:57.700
200,4,2015-02-29 01:56:57.700
300,3,2013-03-29 23:56:57.700

Pig Script
----------
CUSTOMER = LOAD '/user/data/customerlog' USING PigStorage(',') AS
(
    customerId: int,
    name: chararray
);
ORDERS = LOAD '/user/data/orderlog' USING PigStorage(',') AS
(
    orderId: int,
    customerId: int,
    orderDate: chararray
);
JOINED_DATA = JOIN CUSTOMER BY customerId FULL OUTER, ORDERS BY customerId;
DUMP JOINED_DATA;

JOIN Operator 3 (FULL OUTER JOIN)
03:22

Example

Customer log
------------
1,Sam
2,John

Employee log
------------
1,John
2,David
3,Peter

Pig Script
----------
CUSTOMER = LOAD '/user/data/customerlog' USING PigStorage(',') AS
(
    customerId: int,
    name: chararray
);
EMPLOYEE = LOAD '/user/data/employeelog' USING PigStorage(',') AS
(
    employeeId: int,
    employeeName: chararray
);
MERGED_DATA = UNION CUSTOMER, EMPLOYEE;
DUMP MERGED_DATA;
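UNION simply concatenates the tuples of the two relations; it does not deduplicate. Sketched in Python with the sample logs:

```python
# Sketch of UNION CUSTOMER, EMPLOYEE: concatenate the two tuple lists.
customers = [(1, "Sam"), (2, "John")]
employees = [(1, "John"), (2, "David"), (3, "Peter")]

merged = customers + employees  # no deduplication happens
print(len(merged))  # 5
```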

UNION Operator
03:41
Pig Commands and UDF
3 Lectures 13:46

Example log
-----------
1,John,US
2,Sarah,TR
3,John,US
4,David,EN
5,Peter,PL

Pig Script
----------
CUSTOMER = LOAD '/user/data/customerlog' USING PigStorage(',') AS
(
    customerId: int,
    name: chararray,
    country: chararray
);
RESULT = FOREACH CUSTOMER {
    GENERATE
        name,
        country;
}
STORE RESULT INTO '/user/data/customeroutput' USING PigStorage(',');


Storing into a MySQL Database
-----------------------------
%declare PARAM_DB_URL 'jdbc:mysql://192.168.1.1:3306/my_db'
%declare PARAM_DB_USERNAME 'username1'
%declare PARAM_DB_PASSWORD 'password1'

REGISTER /user/lib/mysql-connector-java-5.1.21.jar

CUSTOMER = LOAD '/user/data/customerlog' USING PigStorage(',') AS
(
    customerId: int,
    name: chararray,
    country: chararray
);
RESULT = FOREACH CUSTOMER {
    GENERATE
        name,
        country;
}
STORE RESULT INTO 'my_table' USING org.apache.pig.piggybank.storage.DBStorage(
    'com.mysql.jdbc.Driver', '$PARAM_DB_URL', '$PARAM_DB_USERNAME', '$PARAM_DB_PASSWORD',
    'insert into my_table (name,country) values (?,?)');


STORE Command
04:28

Example log
-----------
1,John,US
2,Sarah,TR
3,John,US
4,David,EN
5,Peter,PL

Pig Script
----------
CUSTOMER = LOAD '/user/data/customer' USING PigStorage(',') AS
(
    customerId: int,
    name: chararray,
    country: chararray
);
RESULT = FOREACH CUSTOMER {
    GENERATE
        name,
        country;
}
fs -mkdir /user/data/temp
fs -cp /user/data/customer /user/data/temp

File Commands (mv, cp, mkdir, ...)
03:45

Log file
--------
1,John,US
2,Sarah,TR
3,John,US
4,David,EN
5,Peter,PL

Step 2 - Add Hadoop- and Pig-based dependencies

<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.2</version>
</dependency>


java code
---------
package com.test;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Uppercase extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}


pig code
--------
REGISTER com.test.jar

-- Define function for use.
DEFINE Uppercase com.test.Uppercase();

CUSTOMER = LOAD '/user/data/customer' USING PigStorage(',') AS
(
    customerId: int,
    name: chararray,
    country: chararray
);
RESULT = FOREACH CUSTOMER {
    GENERATE
        Uppercase(name);
}
DUMP RESULT;



User Defined Functions (UDF)
05:33
About the Instructor
Serkan Sakınmaz
4.4 Average rating
34 Reviews
1,858 Students
2 Courses

I was born in Ankara and graduated from the computer engineering department of Istanbul University. I have been working on Java and Java web technology projects for 7 years, and I am also a Java & Android trainer at a private education company. In my free time, I write about Java, Big Data, and Android on my personal blog.
