Sharding and Ultrascale

Packt Publishing
A free video tutorial from Packt Publishing
Tech Knowledge in Motion
4.0 instructor rating • 1570 courses • 297,954 students

Lecture description

What to do when there is too much data to achieve good performance on a single VM.

Learn more from the full course

Learning MongoDB

A comprehensive guide to MongoDB for ultra-fast, fault-tolerant management of big data, including advanced data analysis

03:25:42 of on-demand video • Updated April 2015

  • Install MongoDB on Linux and Windows, both manually and using packages
  • Configure MongoDB to autostart and access your data using the command line and GUI clients
  • Learn how to manage databases, including creation, pruning, backup, and recovery to fulfil your big data needs
  • Master how to create map and reduce functions using step-by-step diagrams and examples
  • Understand replica sets, failover verification, responsiveness, and load balancing for large scale applications
  • Discover how redundancy and filesystem choices impact security
  • Delve into advanced topics such as monitoring, automated deployment, sharding, and caching to boost your application
English [Auto] This section looks at several advanced topics in MongoDB. The section begins with a review of sharding for large-scale applications. The focus then turns to the MongoDB Management Service, or MMS, which can be used to deploy, monitor, upgrade, and back up MongoDB installations. The section finishes with a review of caching techniques to make MongoDB even faster and more scalable.

This video focuses on sharding and scaling in MongoDB. Sharding is reviewed at a structural level, to understand how data is partitioned, including shard keys, and to understand the deployment implications of sharding, such as an increased hardware footprint.

The typical deployment discussed in previous videos has mongod processes running on servers that are then assembled into a replica set; client connections interact directly with the mongod processes. This scenario is straightforward and provides some load balancing among the nodes in the replica set.

In a sharded scenario, several aspects of the deployment change. The first and most notable is that the client no longer communicates with the mongod instances directly; instead, a new component called mongos is introduced. The reason for this is that after data has been sharded, which is to say partitioned, the client doesn't know on which replica set a given piece of data is stored. The mongos instance is designed to route queries to the appropriate mongod processes. The mongos process will most often run alongside the application, or on a very lightweight server, since all it does is route traffic.

mongos knows how to route queries thanks to the config servers, which store metadata about the sharded data. It is recommended that at least three config servers be deployed for reliability; each config server contains the same information. A complete replica set is now required for each shard, which provides reliability and failover at the data level. The balancer, using the metadata on the config servers, works to keep data evenly distributed across the shards. (A minimal sketch of this deployment appears after this transcript.)

Note that if a large volume of queries targets data that lives on a single shard, the other shards sit largely idle and do not increase throughput. For this reason, it is important to choose the right shard key. A shard key is a field of a document that is used to partition the data. For example, if a collection held documents representing people and the lastName field were chosen as the shard key, then MongoDB would use the lastName field to distribute the data. Keep in mind that if a collection of people documents contains many last names between A and C, sharding would not necessarily split in the middle of the alphabet; it would split in the middle of the data. It is very important to choose a shard key that is likely to spread queries across the shards. (A sketch of declaring a shard key follows below.)

This section introduced sharding in MongoDB. Sharding is how MongoDB partitions data to distribute work, and data is segmented based on a shard key. The number of servers required to set up a sharded cluster is much higher than what is required for just a replica set. The next video will present a sharding example.
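As a rough sketch of the extra processes described above, the commands below start three config servers and a mongos router, then register two shards, each of which is a complete replica set. The hostnames, ports, and replica set names (cfg1 through cfg3, rsShardA, rsShardB) are placeholder assumptions, not names from the course; the syntax matches the MongoDB 3.0-era tools covered here.

    # Start a config server on each of three hosts (cfg1, cfg2, cfg3);
    # all three hold the same sharding metadata
    mongod --configsvr --dbpath /data/configdb --port 27019

    # Start a mongos router pointed at all three config servers;
    # mongos is lightweight and often runs alongside the application
    mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019 --port 27017

    # From a mongo shell connected to mongos, register each shard,
    # naming the replica set and one of its members as a seed
    sh.addShard("rsShardA/hostA1:27018")
    sh.addShard("rsShardB/hostB1:27018")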
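To make the last-name example concrete, here is a minimal sketch of declaring a shard key from a mongo shell connected to mongos. The database name mydb, collection name people, and field name lastName are illustrative assumptions; sh.enableSharding and sh.shardCollection are the standard shell helpers.

    // Allow collections in this database to be sharded
    sh.enableSharding("mydb")

    // If the collection already holds data, the shard key needs an index
    db.getSiblingDB("mydb").people.createIndex({ lastName: 1 })

    // Partition the collection on lastName; chunk boundaries are chosen
    // from the actual data, not from the midpoint of the alphabet
    sh.shardCollection("mydb.people", { lastName: 1 })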
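Once documents are being inserted, the balancer's work can be observed from the same shell. Both calls below are standard shell helpers, shown against the hypothetical people collection from the sketch above.

    // Summarize shards, chunks per shard, and balancer activity
    sh.status()

    // Report document and chunk counts per shard for one collection
    use mydb
    db.people.getShardDistribution()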