MongoDB Sharding

?
R
Bash

Motivation: Read performance can be affected when running a single instance with millions of records We shard a collection, not a database. ShardKey is the fragmentation key. Do not shard all collections Once you enable sharding, collection will be splitted amongst the shards. Once you enable sharding, to change you need to dump the collections and restore the collections later

1Along with sharding, rebalancing, auto-compression, aggregate-level distributed lock, and many such features, MongoDB has come miles ahead is today the software architect’s first choice.
2
3## Rules
40. Do not cause an imbalance in the system at any time
51. Choose a key with high cardinality (ex. email address, address line 1). 
61.1 Bad example: country, firstName
71.2 Bad example: Do not use the default monotonically increasing keys _id. this is a rookie mistake.
82. Go for hashed sharding key scheme => creates a sharding key by hashing one of the provided fields
9db.collection.createIndex( { _id: hashedValue } )
103. Run shard balancer often
11```
12 use config 
13db.settings.update( 
14   { _id: "balancer" }, 
15   { $set: { activeWindow : { start : "03:00", stop : "05:00" } } }, 
16   { upsert: true } 
17)
18```
19
20## Performance Gains
21Shard with hashed key of 32 digit of GUID.
22
23I tested in our environnement of “perf” insertion of 1000000 documents per bulk continuously:
24One shard with replica set took ~ 3 minutes per bulk
25Two shard with replica set took ~ 1 minutes 40 secondes per bulk
26
27After 18000000 documents inserted i noticed degradation of performance :
28One shard with replica set took ~ 5 minutes per bulk
29Two shard with replica set took ~ 2 minutes 40 secondes per bulk
30
31We can conclude that with two shard we multiplied performance of insertion by 2.
32
33------
34
35MONGODB Sharding (shard is a fragment)
36  ------  ------  ------  ------  ------  ------  ------  ------  ------  ------
37  * Shard Cluster 1: Replicate set: 3 mongodb instances (1 Primary, 2 Secondary)
38  * Shard Cluster 2: Replicate set: 3 mongodb instances (1 Primary, 2 Secondary)
39  * Config Servers: Replicate set: 3 mongodb instances (1 Primary, 2 Secondary) **metadata (which shard contains which data)
40  * MongoS Router (reads info from config servers, caches locally and routes traffic to the intended shard)
41  
42  mongo mongodb://192.168.1.81:60000
43  show dbs
44  sh.status()
45  
46  use sharddemo
47  db //sharddemo
48  show collections
49  db.createCollection("movies")
50  db.createCollection("movies1")
51  show collections
52  db.movies.getShardDistribution() // not sharded
53  
54  sh.shardCollection("sharddemo.movies", { "title": "hashed" })
55  
56  // sharding not enabled 
57  // primary shard (collections without sharding)
58  sh.enableSharding("sharddemo")
59  
60  sh.shardCollection("sharddemo.movies", { "title": "hashed" })
61  
62  db.movies.getShardDistribution()
63  for i in {1..50} do echo -e "use shardddemo \n db.movies.insertOne({"title": "Spider Man $i", "language": "English"}) | mongo mongodb//192.0.0.0:3000; done" 
64  
65  use sharddemo
66  db.movies.find()
67  db.movies.count()
68  db.movies.getShardDistribution()
69  
70  // 23 docs to shard1 and 23 docs to shard 3
71  
72  
73  // not sharded
74  db.movies1.find();
75  for i in {1..50} do echo -e "use shardddemo \n db.movies1.insertOne({"title": "Spider Man $i", "language": "English"}) | mongo mongodb//192.0.0.0:3000; done" 
76  mongos
77  db.movies1.getShardDistribution();
78  
79  sh.shardCollection("sharddemo.movies1", { "title": "hashed" })
80  // Error, data must be indexed
81  
82  db.movies.createIndex({"title": "hashed"})
83  sh.shardCollection("sharddemo.movies1", { "title": "hashed" })
84  db.movies1.getShardDistribution();
85  
86  // Moved all the data to 1 RS shard (primary one)
87
88
89
90
91--
92MongoDB Sharding
93
94When to shard?
95If you have more data than one machine can hold on its drives
96If your application is write heavy and you are experiencing too much latency.
97 If your working set outgrows the memory you can allocate to a single machine.
98
99Examples
100Documents older than one year need to be kept, but are rarely used:
101Shardkey - timestamp
1021 year ago records on a slower machine
103You are required to keep certain data in its home country:
104Shardkey - country
105Maintain data centers within each country that house the appropriate shards
106You have customers who want to pay for a “premium” tier:
107Shardkey - isPremium
108Premium users  use high performance servers
109

Created on 6/23/2020