Learn Apache Solr with Big Data and Cloud Computing

Apache Solr, Zookeeper, Clusters, Replication, Cloud, Big data, Search algorithms and Much More
4.0 (77 ratings)
810 students enrolled
$20
Take This Course
  • Lectures 55
  • Contents Video: 5 hours
    Other: 1 min
  • Skill Level All Levels
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

How taking a course works

Discover

Find online courses made by experts from around the world.

Learn

Take your courses with you and learn anywhere, anytime.

Master

Learn and practice real-world skills and achieve your goals.

About This Course

Published 9/2014 English

Course Description

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene™ project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

Solr Features

Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
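
A minimal sketch of that HTTP workflow in Python, using the requests library; the core name "demo" and the field names are assumptions for illustration, not part of the course:

```python
# Index one JSON document over HTTP, then query it back as JSON.
# Assumptions: a Solr core named "demo" on the default port 8983.
import requests

SOLR = "http://localhost:8983/solr/demo"

# Index ("add") a document as JSON and commit so it becomes searchable.
doc = {"id": "1", "title": "Learning Apache Solr", "category": "search"}
requests.post(f"{SOLR}/update?commit=true", json=[doc]).raise_for_status()

# Query via HTTP GET and receive JSON results.
resp = requests.get(f"{SOLR}/select", params={"q": "title:solr", "wt": "json"})
for hit in resp.json()["response"]["docs"]:
    print(hit["id"], hit["title"])
```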

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces - XML, JSON and HTTP
  • Comprehensive HTML Administration Interfaces
  • Server statistics exposed over JMX for monitoring
  • Linearly scalable, auto index replication, auto failover and recovery
  • Near Real-time indexing
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture

Solr Uses the Lucene™ Search Library and Extends It!

  • A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
  • Powerful Extensions to the Lucene Query Language
  • Faceted Search and Filtering
  • Geospatial Search with support for multiple points per document and geo polygons
  • Advanced, Configurable Text Analysis
  • Highly Configurable and User Extensible Caching
  • Performance Optimizations
  • External Configuration via XML
  • An AJAX based administration interface
  • Monitorable Logging
  • Fast near real-time incremental indexing and index replication
  • Highly Scalable Distributed search with sharded index across multiple hosts
  • JSON, XML, CSV/delimited-text, and binary update formats
  • Easy ways to pull in data from databases and XML files from local disk and HTTP sources
  • Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika
  • Apache UIMA integration for configurable metadata extraction
  • Multiple search indices

Detailed Features

Schema

  • Defines the field types and fields of documents (a short example follows this list)
  • Can drive more intelligent processing
  • Declarative Lucene Analyzer specification
  • Dynamic Fields enable on-the-fly addition of new fields
  • CopyField functionality allows indexing a single field multiple ways, or combining multiple fields into a single searchable field
  • Explicit types eliminate the need to guess field types
  • External file-based configuration of stopword lists, synonym lists, and protected word lists
  • Many additional text analysis components including word splitting, regex and sounds-like filters
  • Pluggable similarity model per field
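
As a rough illustration of explicit fields, dynamic fields and copyField, the sketch below defines them over HTTP through Solr's Schema API. It assumes a core named "demo" that uses a managed schema, an existing "text_general" field type and a catch-all "text" field; the course itself edits schema.xml by hand, which achieves the same result.

```python
# A hedged sketch of schema definition via Solr's Schema API.
import requests

SCHEMA = "http://localhost:8983/solr/demo/schema"

commands = {
    # Explicit field with a concrete type -- no guessing of field types.
    "add-field": {"name": "title", "type": "text_general",
                  "indexed": True, "stored": True},
    # Dynamic field: any field name ending in "_s" is treated as a string.
    "add-dynamic-field": {"name": "*_s", "type": "string", "stored": True},
    # copyField: index the title a second way inside the catch-all field.
    "add-copy-field": {"source": "title", "dest": "text"},
}
requests.post(SCHEMA, json=commands).raise_for_status()
```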

Query

  • HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby, PHP, Velocity, CSV, binary)
  • Sort by any number of fields, and by complex functions of numeric fields
  • Advanced DisMax query parser for high relevancy results from user-entered queries (combined with other query features in the sketch after this list)
  • Highlighted context snippets
  • Faceted Searching based on unique field values, explicit queries, date ranges, numeric ranges or pivot
  • Multi-Select Faceting by tagging and selectively excluding filters
  • Spelling suggestions for user queries
  • More Like This suggestions for a given document
  • Function Query - influence the score by user specified complex functions of numeric fields or query relevancy scores.
  • Range filter over Function Query results
  • Date Math - specify dates relative to "NOW" in queries and updates
  • Dynamic search results clustering using Carrot2
  • Numeric field statistics such as min, max, average, standard deviation
  • Combine queries derived from different syntaxes
  • Auto-suggest functionality for completing user queries
  • Allow configuration of top results for a query, overriding normal scoring and sorting
  • Simple join capability between two document types
  • Performance Optimizations
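
A hedged sketch combining several of the query features above in a single request: eDisMax parsing of a user query, a date-math filter, faceting and highlighting. The core name and field names ("title", "body", "published", "category") are assumptions:

```python
# Each parameter maps to a feature named in the list above.
import requests

params = {
    "q": "solr cloud",                      # the user-entered query
    "defType": "edismax",                   # DisMax-family query parser
    "qf": "title^2 body",                   # boost title matches over body
    "fq": "published:[NOW-1YEAR TO NOW]",   # date math relative to NOW
    "facet": "true",
    "facet.field": "category",              # facet on unique field values
    "hl": "true",
    "hl.fl": "body",                        # highlighted context snippets
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/demo/select", params=params)
print(resp.json()["facet_counts"]["facet_fields"]["category"])
```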

Core

  • Dynamically create and delete document collections without restarting
  • Pluggable query handlers and extensible XML data format
  • Pluggable user functions for Function Query
  • Customizable component based request handler with distributed search support
  • Document uniqueness enforcement based on unique key field
  • Duplicate document detection, including fuzzy near duplicates
  • Custom index processing chains, allowing document manipulation before indexing
  • User configurable commands triggered on index changes
  • Ability to control where docs with the sort field missing will be placed
  • "Luke" request handler for corpus information

Caching

  • Configurable Query Result, Filter, and Document cache instances
  • Pluggable Cache implementations, including a lock free, high concurrency implementation
  • Cache warming in background: when a new searcher is opened, configurable searches are run against it to warm it up and avoid slow first hits; during warming, the current searcher handles live requests
  • Autowarming in background: the most recently accessed items in the caches of the current searcher are re-populated in the new searcher, enabling high cache hit rates across index/searcher changes
  • Fast/small filter implementation
  • User level caching with autowarming support

SolrCloud

  • Centralized Apache ZooKeeper based configuration (see the sketch after this list)
  • Automated distributed indexing/sharding - send documents to any node and they will be forwarded to the correct shard
  • Near Real-Time indexing with immediate push-based replication (also support for slower pull-based replication)
  • Transaction log ensures no updates are lost even if the documents are not yet indexed to disk
  • Automated query failover, index leader election and recovery in case of failure
  • No single point of failure
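
As one small illustration, the cluster state that SolrCloud keeps in ZooKeeper can be inspected over HTTP through the Collections API; a hedged sketch assuming a SolrCloud node on localhost:8983:

```python
import requests

# Ask any node for the cluster state held in ZooKeeper.
resp = requests.get("http://localhost:8983/solr/admin/collections",
                    params={"action": "CLUSTERSTATUS", "wt": "json"})
print(list(resp.json()["cluster"]["collections"].keys()))
```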

Admin Interface

  • Comprehensive statistics on cache utilization, updates, and queries
  • Interactive schema browser that includes index statistics
  • Replication monitoring
  • SolrCloud dashboard with graphical cluster node status
  • Full logging control
  • Text analysis debugger, showing result of every stage in an analyzer
  • Web Query Interface w/ debugging output
  • Parsed query output
  • Lucene explain() document score detailing
  • Explain score for documents outside of the requested range to debug why a given document wasn't ranked higher.

What are the requirements?

  • An Internet connection
  • OS X, Windows or Linux

What am I going to get from this course?

  • Integrate search functionality into any web or mobile app
  • Understand cloud computing
  • Solve the search problems of big data
  • Build your own search engine

What is the target audience?

  • Developers
  • Engineers
  • Data Scientists

What do you get with this course?

Not for you? No problem.
30 day money back guarantee.

Forever yours.
Lifetime access.

Learn on the go.
Desktop, iOS and Android.

Get rewarded.
Certificate of completion.

Curriculum

Section 1: Introduction
Introduction
Preview
06:29
“SOLR” Pronunciation
Preview
00:36
Section 2: Big Data Fundamentals
What is Big Data
Preview
03:10
What Big Data problems does Apache Solr solve?
07:06
Section 3: Cloud Computing Fundamentals
What is Cloud Computing?
02:14
How does Solr fit into Cloud?
01:48
Section 4: Fundamentals of Solr
Apache Solr Architecture
04:02
Downloading and Installing Solr
04:21
Solr basic Files
02:20
Basic Solr concepts
02:41
Starting up Solr
02:13
HTTP Requests and Responses with Solr
01:36
Solr Admin UI
05:21
Section 5: Search Algorithms
Inverted Index
06:10
Forward Index
02:51
Section 6: Creating a Core
Creating a Core via Admin Panel
03:42
Understanding Structure of Schema.xml
04:37
Define fieldType
08:08
Define field
03:48
Field properties
17:37
copyField
02:15
dynamicField
04:58
unique fields
06:43
docValues vs FieldCache
07:21
Analyzers, Tokenizers and Filters
09:48
Character Filters
02:18
Section 7: Indexing Documents
Adding documents
06:54
Commit and Optimize
05:55
06:31

commit=true : Locks the index and deletes the document immediately.

commit=false : Queues the delete operation; the document is deleted when Solr is free of requests or during a reload/optimize of the index.
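
A small sketch of both behaviours over HTTP, assuming a core named "demo" and Python's requests library:

```python
import requests

UPDATE = "http://localhost:8983/solr/demo/update"

# commit=true: the delete is applied and made visible immediately.
requests.post(f"{UPDATE}?commit=true",
              json={"delete": {"query": "category:obsolete"}}).raise_for_status()

# commit=false: the delete is buffered and only takes effect at the next
# commit (explicit, autoCommit, or a reload/optimize of the index).
requests.post(f"{UPDATE}?commit=false",
              json={"delete": {"id": "42"}}).raise_for_status()
```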

Updating document Values
08:09
Section 8: Querying Documents
Search Fundamentals
05:42
05:59

I made a mistake in the video.

Replace timeAllowed:0 with timeAllowed=0

That is, replace the colon with an equals sign.
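
In other words, timeAllowed is passed as a request parameter (name=value), not written inside the query string. A quick sketch, assuming the "demo" core:

```python
import requests

params = {
    "q": "*:*",
    "timeAllowed": 0,   # 0 means no time limit on the search
    "wt": "json",
}
requests.get("http://localhost:8983/solr/demo/select", params=params)
```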

Understanding search components and request handlers in solrconfig.xml
06:05
q Parameter in depth
11:45
Range searching
01:31
Function Queries
03:43
Faceting
06:19
Highlighting
05:46
Spell Checking
19:42
05:19

Google's auto-suggestion is based on user queries, i.e., queries are stored and suggestions are drawn from those stored queries.

In Solr, suggestions are based on the values of a particular field in the documents.

06:34

To make this feature faster, set termVectors="true" on the fields being compared.
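
For example, a hedged More Like This request against an assumed "demo" core with a "body" field; marking that field with termVectors="true" in schema.xml is what speeds up the comparison:

```python
import requests

params = {
    "q": "id:1",        # the seed document to find similar documents for
    "mlt": "true",      # enable the More Like This component
    "mlt.fl": "body",   # field(s) to compare on (faster with termVectors)
    "mlt.count": 5,     # number of similar documents per result
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/demo/select", params=params)
print(resp.json().get("moreLikeThis", {}))
```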

Result grouping
03:49
Spatial search, terms component, stats component and query elevation component
06:33
Section 9: Modifying schema
06:51

Altering schema.xml:

  1. Whenever you remove, change or add fields or fieldTypes in schema.xml, make sure you reload the core (or restart Solr) so the updated schema.xml is picked up (see the sketch after this list).
  2. A fieldType is only used to validate the data format during indexing (insertion) and querying (retrieval). On the filesystem, everything is stored as strings.
  3. Terms in the index are all represented as strings.
  4. If you modify the fieldType of a field, errors may occur if Solr cannot interpret the old data format (stored values) during querying.
  5. If you change the stored attribute from false to true, only new documents will have their field values stored; there is no way to retrieve the raw field values of the old documents.
  6. If you change the indexed attribute from false to true, only new documents will be indexed for that field, and only those newly indexed terms will be visible during querying.
  7. If you are making a lot of changes, or querying and indexing do not seem to work well after altering schema.xml, create a new core and reindex all documents from the old core.
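
For point 1, the reload can be done from the Admin UI or over HTTP through the Core Admin API; a minimal sketch assuming a core named "demo":

```python
import requests

# Reload the core so the edited schema.xml is picked up without restarting Solr.
requests.get("http://localhost:8983/solr/admin/cores",
             params={"action": "RELOAD", "core": "demo"}).raise_for_status()
```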

Section 10: Miscellaneous
Solr Logging
03:20
Solr Security
07:26
Section 11: Clustering and Replication
SolrCloud Concepts
Article
Clustering
05:29
Replication
02:27
Section 12: ZooKeeper
Understanding the Need for ZooKeeper
02:53
Setting up ZooKeeper
22:24
Adding More Configs and Collections
00:57
Section 13: SolrCloud
Setting Up SolrCloud
11:43
Section 14: Final Thoughts
Conclusion
01:10
Exercise Files
1 page

Instructor Biography

QScutter Tutorials, a place to learn technology

QScutter is an India-based company that offers an ever-growing range of high-quality eLearning solutions, teaching with studio-quality narrated videos backed up by practical hands-on examples. The emphasis is on teaching real-life skills that are essential in today's commercial environment. We provide tutorials for almost all IT topics.

Ready to start learning?
Take This Course