[Korean Subtitles] Streaming Big Data with Spark Streaming and Scala (Hands-On)

Rating: 4.7 (14 reviews)

A Spark Streaming tutorial covering Spark Structured Streaming, Kafka integration, and streaming big data in real time

Created by Sundog Education by Frank Kane, Frank Kane, Sundog Education Team, 웅진씽크빅 글로벌

Last updated 9/2024

Korean

What you'll learn

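Process massive streams of real-time data using Spark Streaming
Integrate Spark Streaming with data sources such as Kafka, Flume, and Kinesis
Use Spark 2's Structured Streaming API
Create Spark applications using the Scala programming language
Output transformed real-time data to Cassandra or to file systems
Integrate Spark Streaming with Spark SQL to query streaming data in real time
Train machine learning models in real time with streaming data, and use them to make real-time predictions
Ingest Apache access log data and transform streams of it
Receive real-time streams of Twitter feeds
Maintain stateful data across a continuous stream of input data
Query streaming data across sliding windows of time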

Course content

9 sections • 38 lectures • 6h 25m total length

Tip: Apply for a Twitter developer account now! (0:41)
Introduction and Getting Set Up (13:20)
A brief introduction to the course, and then we'll get your development environment for Spark and Scala all set up on your desktop. A quick test application will confirm Spark is working on your system!
[Activity] Stream live tweets with Spark Streaming! (14:27)
Get set up with a Twitter developer account, and run your first Spark Streaming application to listen to and print out live Tweets as they happen!
Udemy 101: Getting the Most From This Course (2:10)

[Activity] Scala Basics: Part 1 (24:27)
We start our crash course in the Scala programming language by covering some basics of the language: types and variables, printing, and boolean comparisons.
[Exercise] Flow Control in Scala (9:28)
Our Scala crash course continues, illustrating various means of flow control in Scala: for loops, do/while loops, while loops, and so on.
[Exercise] Functions in Scala (9:08)
Scala is a functional programming language, and so understanding how functions work and are treated in Scala is hugely important! This lecture covers the fundamentals, and lets you put it into practice.
[Exercise] Data Structures in Scala (22:28)
We wrap up our Scala crash course with data structures commonly used in Spark with Scala: tuples, lists, and maps.
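
The crash-course topics listed above (types and variables, printing, boolean comparisons, flow control, functions, and tuples, lists, and maps) fit in a few lines of Scala. The following is a minimal sketch, not material taken from the course; the object name `ScalaBasics` and every value in it are made up for illustration.

```scala
object ScalaBasics {
  // An immutable value with an explicit type, and a boolean comparison
  val answer: Int = 42
  val isBig: Boolean = answer > 10

  // Flow control: a for loop that sums 1..n into a mutable accumulator
  def sumTo(n: Int): Int = {
    var total = 0
    for (i <- 1 to n) total += i
    total
  }

  // Functions are values: applyTwice takes another function as a parameter
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Data structures: a tuple, a list, and a map built from the list
  val pair: (String, Int) = ("spark", 2)
  val tools: List[String] = List("spark", "kafka", "cassandra")
  val nameLengths: Map[String, Int] = tools.map(t => t -> t.length).toMap

  def main(args: Array[String]): Unit = {
    println(s"answer=$answer, isBig=$isBig")
    println(sumTo(10))
    println(applyTwice(_ * 2, 3))
    println(s"${pair._1} -> ${pair._2}")
    println(nameLengths)
  }
}
```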

Introduction to Spark (7:06)
Before you can learn about Spark Streaming, you need to understand how Spark itself works at a high level! This covers the why & how of Apache Spark, of which Spark Streaming is a component.
The Resilient Distributed Dataset (RDD) (10:40)
The fundamental object of Spark programming is the Resilient Distributed Dataset (RDD), and this is used not just in Spark but also within Spark Streaming scripts. This lecture explains what they are, and what you can do with them.
[Activity] RDDs in Action: A Simple Word Count Application (8:02)
Let's walk through and actually run a simple Spark script that counts the number of occurrences of each word in a book.
Introduction to Spark Streaming (6:32)
We finally have all the prerequisite knowledge to start talking about Spark Streaming itself in more detail! We'll cover how it works, what it's for, and its architecture.
[Activity] Revisiting the PrintTweets Application (7:31)
Now that we know more, let's go revisit that first Spark Streaming application we ran in lecture 2, and dive into how it really works.
Windowing: Aggregating Data Over Longer Periods of Time (5:00)
Windowing allows you to analyze streaming data over a sliding window of time, which lets you do much more than just transform streaming data and store it someplace else. We'll cover the concepts of the batch, window, and slide intervals, and how they work together to let you aggregate streaming data over some period of time.
Fault Tolerance in Spark Streaming (6:06)
How can Spark Streaming do so much work continuously in a reliable manner? We'll uncover some of its tricks for reliability, as well as tips for configuring Spark Streaming to be as reliable as possible.
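
To make the relationship between the batch, window, and slide intervals from the windowing lecture concrete, here is a plain-Scala model of it. It deliberately does not use Spark: each inner `Seq` stands for the data received in one batch interval, and the window and slide intervals are given as whole numbers of batches (Spark Streaming likewise requires both to be multiples of the batch interval). In Spark itself this kind of aggregation is usually expressed with windowed operations such as `reduceByKeyAndWindow`. The names `WindowSketch` and `windowedCounts` are hypothetical.

```scala
object WindowSketch {
  // batches:       one inner Seq per batch interval, in arrival order
  // windowBatches: window interval, measured in batches
  // slideBatches:  slide interval, measured in batches
  // Returns one word-count map per window position.
  def windowedCounts(
      batches: Seq[Seq[String]],
      windowBatches: Int,
      slideBatches: Int
  ): List[Map[String, Int]] =
    batches
      .sliding(windowBatches, slideBatches)
      .map { window =>
        window.flatten
          .groupBy(identity)
          .map { case (word, hits) => word -> hits.size }
      }
      .toList

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq("a", "b"), Seq("a"), Seq("c"), Seq("a", "c"), Seq("b"))
    // A window of 3 batches that slides forward 2 batches at a time
    windowedCounts(batches, windowBatches = 3, slideBatches = 2).foreach(println)
  }
}
```

With five batches, a window of three batches, and a slide of two, the model produces two windows: batches 1 to 3 and batches 3 to 5, so the middle batch is counted in both.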

[Exercise] Saving Tweets to Disk (13:24)
We'll build on our "print tweets" example to actually store the incoming Tweets to disk, and illustrate how Spark Streaming can handle file output.
[Exercise] Tracking the Average Tweet Length (10:17)
Compute the average length of a Tweet, using windowing in Spark Streaming.
[Exercise] Tracking the Most Popular Hashtags (15:52)
This is a fun one! We'll track the most popular hashtags in Twitter over time, and watch how they change in real time!
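
The per-window logic behind the average-length and popular-hashtag exercises can be sketched without Spark or a Twitter account. The code below is an assumption about what such logic might look like, not the course's implementation: it treats a batch of tweets as a plain `Seq[String]`, and the names `TweetStats`, `topHashtags`, and `averageLength` are hypothetical.

```scala
object TweetStats {
  // Split tweets into words, keep the hashtags, count them,
  // and return the n most frequent (ties broken alphabetically).
  def topHashtags(tweets: Seq[String], n: Int): List[(String, Int)] =
    tweets
      .flatMap(_.split("\\s+"))
      .filter(word => word.startsWith("#") && word.length > 1)
      .groupBy(identity)
      .map { case (tag, hits) => tag -> hits.size }
      .toList
      .sortBy { case (tag, count) => (-count, tag) }
      .take(n)

  // Average number of characters per tweet; 0.0 for an empty batch
  def averageLength(tweets: Seq[String]): Double =
    if (tweets.isEmpty) 0.0
    else tweets.map(_.length).sum.toDouble / tweets.size

  def main(args: Array[String]): Unit = {
    val tweets = Seq("I love #spark", "#spark and #scala", "no tags here")
    println(topHashtags(tweets, 2))
    println(averageLength(tweets))
  }
}
```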

[Exercise] Tracking the Top Requested URLs (14:19)
We'll simulate an incoming stream of Apache access logs, and use Spark Streaming to keep track of the most-requested web pages in real time!
[Exercise] Alarming on Log Errors (13:09)
This example will listen to an Apache access log stream, and raise an alarm if too many errors are returned by the server in real time.
[Exercise] Integrating Spark Streaming with Spark SQL (15:37)
We'll integrate Spark Streaming with Spark SQL, allowing us to run SQL queries on data as it is streamed in! Again we will use Apache logs as an example.
Introduction to Structured Streaming (8:27)
Spark 2.0 introduced experimental support for Structured Streaming, a new Dataset-based API for Spark Streaming that is bound to become increasingly important. Learn how it works.
[Activity] Analyzing Apache Log Files with Structured Streaming (11:32)
As an example, we'll stream Apache access logs in from a directory, and use Structured Streaming to count up status codes over a one-hour moving window.
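
The access-log examples in this section all start by turning a raw log line into fields such as the URL and status code. As a rough illustration, the sketch below parses lines in the common Apache access log layout with a regular expression, then computes the most-requested URLs and the share of server errors. It is plain Scala with no Spark dependency; the regular expression, the choice of "status 500 and above" as the error definition, and the names `AccessLogSketch`, `Request`, `topUrls`, and `errorRate` are assumptions made for this example.

```scala
object AccessLogSketch {
  final case class Request(ip: String, method: String, url: String, status: Int)

  // ip, two ignored fields, [timestamp], "METHOD url protocol", status, size
  private val LogLine =
    """^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) \S+" (\d{3}) \S+.*$""".r

  // Returns None for lines that do not match the expected layout
  def parse(line: String): Option[Request] = line match {
    case LogLine(ip, method, url, status) =>
      Some(Request(ip, method, url, status.toInt))
    case _ => None
  }

  // The n most-requested URLs among the parseable lines
  def topUrls(lines: Seq[String], n: Int): List[(String, Int)] =
    lines
      .flatMap(line => parse(line).toList)
      .groupBy(_.url)
      .map { case (url, hits) => url -> hits.size }
      .toList
      .sortBy { case (url, count) => (-count, url) }
      .take(n)

  // Fraction of parseable requests that returned a 5xx status
  def errorRate(lines: Seq[String]): Double = {
    val parsed = lines.flatMap(line => parse(line).toList)
    if (parsed.isEmpty) 0.0
    else parsed.count(_.status >= 500).toDouble / parsed.size
  }
}
```

An alarm like the one described in the log-errors exercise could then compare `errorRate` for the current window against a threshold of your choosing.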

Integrating with Apache Kafka (12:20)
Apache Kafka is a popular and robust technology for publishing messages across a cluster on a large scale. We'll show how to get Spark Streaming to listen to Kafka topics, and process them in real time.
Integrating with Apache Flume (8:51)
Flume is a popular technology for publishing log information at large scale, especially on a Hadoop cluster. We'll illustrate how to set up both push-based and pull-based Flume configurations with Spark Streaming, and discuss the tradeoffs of each.
Integrating with Amazon Kinesis (4:46)
Amazon's Kinesis Streaming service is basically Kafka on AWS. If you're working with an AWS/EC2 cluster, you'll want to know how to integrate Spark Streaming with Kinesis - and that's what this lecture covers.
[Activity] Writing Custom Data Receivers (5:53)
What if you need to integrate Spark Streaming with some proprietary system that does not have an existing connection library? Well, you can always write your own Receiver class. This example shows you how and actually lets you build and run one.
Integrating with Cassandra (7:35)
Cassandra is a popular "NoSQL" database that can be used to provide fast access to massive data sets to real-time applications. Dumping data transformed by Spark Streaming into a Cassandra database can expose that data to your larger, real-time services. We'll show you how and actually run a simple example.

[Exercise] Stateful Information in Spark Streaming (14:22)
Spark has the ability to track arbitrary state across streams of data as they come in, such as web sessions, running totals, etc. This example shows you how it all works, and challenges you to track your own state using our example as a baseline.
[Activity] Streaming K-Means Clustering (15:40)
Spark Streaming integrates with some of Spark's MLlib (Machine Learning Library) capabilities. This example builds a real-time K-Means clustering application; it does unsupervised machine learning that continually gets better as more training data feeds into it.
[Activity] Streaming Linear Regression (12:20)
Spark Streaming can also feed data in real time to linear regression models that get better over time as more data is fed into them. This example shows linear regression in action with Spark Streaming.
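
The stateful-information exercise at the start of this section is about carrying values such as running totals from one batch to the next. The sketch below models that idea in plain Scala, without Spark. The `updateTotal` function has the same shape as the update function that Spark's `updateStateByKey` expects (the new values for a key in this batch, plus the previous state if there is one); everything else, including the names `StateSketch`, `applyBatch`, and `run`, is a hypothetical stand-in for what Spark does across the cluster.

```scala
object StateSketch {
  // New values seen for one key in this batch, plus its previous total
  def updateTotal(newValues: Seq[Int], previous: Option[Int]): Option[Int] =
    Some(previous.getOrElse(0) + newValues.sum)

  // Apply one batch of (key, value) pairs to the state carried so far.
  // Keys absent from this batch are still updated, with no new values.
  def applyBatch(state: Map[String, Int], batch: Seq[(String, Int)]): Map[String, Int] = {
    val grouped: Map[String, Seq[Int]] =
      batch.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }
    val keys = (state.keySet ++ grouped.keySet).toList
    keys.flatMap { key =>
      updateTotal(grouped.getOrElse(key, Seq.empty), state.get(key))
        .map(total => key -> total)
        .toList
    }.toMap
  }

  // Fold the batches in arrival order, starting from empty state
  def run(batches: Seq[Seq[(String, Int)]]): Map[String, Int] =
    batches.foldLeft(Map.empty[String, Int])(applyBatch)

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq("u1" -> 1, "u2" -> 2), Seq("u1" -> 3), Seq.empty)
    println(run(batches))
  }
}
```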

[Activity] Packaging and Running Your Spark Code (9:39)
Your production applications won't be run from within the Scala IDE; you'll need to run them from a command line, and potentially on a cluster. The spark-submit command is used for this. We'll show you how to package up your application and run it using spark-submit from a command prompt.
[Activity] Packaging Your Code with SBT (11:45)
If your Spark Streaming application has external library dependencies that might not be already present on every machine in your cluster, the SBT tool can manage those dependencies for you, and package them into the JAR file you run with spark-submit. We'll show you how it works with a real example.
Running on a Real Hadoop Cluster with EMR (15:48)
We'll run our simple word count example on a real cluster, using Amazon's Elastic MapReduce service! This just shows you what's involved in running a Spark Streaming job on a real cluster as opposed to your desktop; there are a few parameters to spark-submit you need to worry about, and getting your scripts and data in the right place is also something you need to deal with.
Troubleshooting and Tuning Spark Jobs (12:35)
Spark jobs rarely run perfectly, if at all, on the first try - some tuning and debugging is usually required, and arriving at the right scale of your cluster is also necessary. We'll cover some performance tips, and how to troubleshoot what's going on with a Spark Streaming job running on a cluster.
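
As a rough idea of what an SBT build definition for such a project can look like, here is a minimal `build.sbt` sketch. The project name and all version numbers are placeholders chosen for illustration, not values from the course, and should be matched to the Spark and Scala versions on your cluster. Spark itself is marked `provided` because the cluster supplies it at run time, while an external connector (the Kafka integration, used here purely as an example) is left as a regular dependency so it ships with the application. Bundling dependencies into a single JAR additionally assumes a packaging plugin such as sbt-assembly is configured, which is not shown.

```scala
// build.sbt: a minimal sketch; name and versions are illustrative placeholders
name := "streaming-example"
version := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Supplied by the cluster at run time, so kept out of the packaged JAR
  "org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "3.5.1" % "provided",
  // Example of an external library that must ship with the application
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "3.5.1"
)
```

The resulting JAR is what you would hand to `spark-submit`, typically with `--class` naming the application's main class.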

Requirements

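A personal computer is required (the setup steps are shown on Windows, but installation is also possible on Linux or macOS)
The course introduces Spark, so no prior knowledge of it is required
Scala programming experience is helpful (the course includes lectures on Scala, so you can follow along without prior knowledge)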

Description

Stream big data with Spark Streaming and Scala!
Tackle massive data sets!
Apply what you learn directly on the job!

Why choose Streaming Big Data with Spark Streaming and Scala (Hands-On)

Now updated for the IntelliJ IDE!

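"Big data" analysis is a popular and highly valuable skill. The catch is that "big data" never stops flowing! Spark Streaming is a new and rapidly developing technology for processing massive data sets as they are created. Why wait for a nightly analysis to run when you can update your analysis in real time, all the time? Whether it is clickstream data from a large website, sensor data from a massive "Internet of Things" deployment, financial data, or something else, Spark Streaming is a powerful technology for transforming and analyzing that data as it is created.

You'll be learning from a former engineer and senior manager at Amazon and IMDb.

In this course you'll work with real live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You'll write and run Spark Streaming jobs at home on your own computer, and toward the end of the course we'll show you how to take those jobs to a real Hadoop cluster and run them in a production environment.

This course is very hands-on, built around activities you can carry out right away to reinforce what you learn. By the end of the course, you'll be confidently writing Spark Streaming scripts in Scala and be prepared to tackle massive amounts of data in a whole new way. You'll be surprised at how much Spark Streaming makes possible!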

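In Streaming Big Data with Spark Streaming and Scala (Hands-On), you will:

Take a crash course in the Scala programming language
Learn how Apache Spark operates on a cluster
Set up discretized streams with Spark Streaming and transform them as data is received
Use Structured Streaming to stream into DataFrames in real time
Analyze streaming data over sliding windows of time
Maintain stateful information across streams of data
Connect Spark Streaming to highly scalable data sources such as Kafka, Flume, and Kinesis
Dump streams of data in real time to NoSQL databases such as Cassandra
Run SQL queries on streamed data in real time
Train machine learning models in real time with streaming data, and use them to make predictions that keep improving over time
Package, deploy, and run self-contained Spark Streaming code on a real Hadoop cluster using Amazon's managed cluster platform for big data frameworks (Elastic MapReduce)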

You are welcome to post any questions about the course in the Q&A, but please be sure to write them in English so that we can answer them. :)

See you in the course!

Who this course is for:

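Students with some prior programming or scripting ability
People who work at, or would like to work at, a company where "big data" is generated continuously
Students with no prior software engineering or programming experience should take an introductory programming course first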

Course content

Getting Started (4 lectures • 31min)

Scala Crash Course (4 lectures • 1hr 6min)

Spark Streaming Concepts (7 lectures • 51min)

Spark Streaming Examples with Twitter (3 lectures • 40min)

Spark Streaming Examples with Clickstream / Apache Access Log Data (5 lectures • 1hr 3min)

Integrating with Other Systems (5 lectures • 39min)

Advanced Spark Streaming Examples (3 lectures • 42min)

Spark Streaming in Production (4 lectures • 50min)

You Made It! (3 lectures • 5min)
