Week 6: Application Engineering
Mongo Application Engineering
1. Durability of Writes
2. Availability / Fault Tolerance
3. Scaling
WriteConcern
Traditionally, an insert or update is performed as fire and forget: the client
sends the operation and does not wait for a result. The Mongo shell, however,
wants to know whether the operation succeeded, so it calls getLastError after
every write.
getLastError takes a couple of arguments that control when a write is
acknowledged:
w:1 -- wait for a write acknowledgement. This is still not durable: the server
acknowledges as soon as the change has been made in memory, not necessarily
after it has been written to disk. If the system fails before the data reaches
disk, the write is lost.
j:1 -- journal. The acknowledgement is returned only after the write has been
committed to the journal on disk, so the write is guaranteed. The operation can
be replayed from the journal if it would otherwise be lost.
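The difference between the two acknowledgements can be illustrated with a toy model. This is not MongoDB code: `ToyServer`, `write` and `crash_and_recover` are hypothetical names, and the model assumes the crash happens before any un-journaled change reaches disk.

```python
class ToyServer:
    """Toy stand-in for a database node (hypothetical, not MongoDB's implementation)."""

    def __init__(self):
        self.memory = []   # changes applied in RAM, lost on a crash
        self.journal = []  # changes persisted to the on-disk journal

    def write(self, doc, j=False):
        """Apply a write and return an acknowledgement.

        j=False mimics w:1 -> acknowledge once the change is in memory.
        j=True  mimics j:1 -> acknowledge only after the journal holds it.
        """
        self.memory.append(doc)
        if j:
            self.journal.append(doc)
        return True

    def crash_and_recover(self):
        """Simulate a failure: memory is wiped, then the journal is replayed."""
        self.memory = list(self.journal)


server = ToyServer()
server.write({"_id": 1}, j=False)  # acknowledged, but only in memory
server.write({"_id": 2}, j=True)   # acknowledged after journaling
server.crash_and_recover()
print(server.memory)  # [{'_id': 2}]
```

Both writes were acknowledged, but only the journaled one survives the simulated crash.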
Api.mongodb.org
Network Errors
Even with w=1, j=1 set, other factors can leave you unsure whether the state
was saved. Say you do an insert over a connection that has j=1, w=1, and the
driver issues getLastError. The write completes on the server, but the network
connection is reset before the acknowledgement reaches you. In that case you do
not know whether the write completed, because you never received the
acknowledgement.
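One way to cope with a lost acknowledgement, sketched here as an assumption rather than something the notes prescribe, is to choose the `_id` on the client and retry the insert: a duplicate-key error on the retry then means the first attempt did reach the server. `ToyCollection` and `insert_with_retry` are hypothetical names and no real driver is involved.

```python
class ToyCollection:
    """In-memory stand-in for a collection whose first acknowledgement is lost."""

    def __init__(self, drop_first_ack=True):
        self.docs = {}
        self._drop_next_ack = drop_first_ack

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise KeyError("duplicate _id")
        self.docs[doc["_id"]] = doc  # the write itself succeeds
        if self._drop_next_ack:
            self._drop_next_ack = False
            # The client never sees the acknowledgement for this write.
            raise ConnectionError("connection reset before acknowledgement")


def insert_with_retry(coll, doc, attempts=3):
    """Retry an insert with a fixed _id until its outcome is known."""
    for _ in range(attempts):
        try:
            coll.insert(doc)
            return "acknowledged"
        except ConnectionError:
            continue  # outcome unknown: try again with the same _id
        except KeyError:
            return "already written"  # an earlier attempt did succeed
    return "unknown"


coll = ToyCollection()
result = insert_with_retry(coll, {"_id": "order-42", "total": 10})
print(result, len(coll.docs))  # already written 1
```

This only works for inserts with a client-chosen `_id`; an update such as an increment is not safe to repeat blindly.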
Replication:
Replica sets: a replica set is a group of mongo nodes that act together and
mirror each other, with one primary and multiple secondaries. Data written to
the primary is asynchronously replicated to the secondaries. Which node is
primary is decided dynamically. By default the application, through its driver,
always connects to the primary. If the primary goes down, the remaining nodes
hold an election to choose a new primary, and the winner needs a strict
majority of the votes in the set.
The minimum number of nodes for a replica set that can still elect a primary
after one node fails is 3 (2 of 3 is a majority).
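The strict-majority rule explains why 3 is the smallest useful size. A minimal sketch of the arithmetic, with hypothetical function names and assuming every node has one vote:

```python
def votes_needed(set_size):
    """Strict majority of the configured set: more than half of all votes."""
    return set_size // 2 + 1


def can_elect_primary(set_size, reachable):
    """True if the nodes that can still see each other form a strict majority."""
    return reachable >= votes_needed(set_size)


for size in (2, 3, 5):
    survivors = size - 1  # one node, for example the primary, has failed
    print(size, votes_needed(size), can_elect_primary(size, survivors))
# 2 2 False
# 3 2 True
# 5 3 True
```

With 2 nodes, losing one leaves a single vote out of two, which is not a strict majority, so no primary can be elected.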
Types of Replica Set Nodes:
1. Regular
2. Arbiter (voting only, holds no data)
3. Delayed / Regular (disaster recovery node; it cannot be a primary node)
4. Hidden (often used for analytics; cannot be a primary node)
MongoDB does not offer eventual consistency by default; it offers strong
consistency. In the default configuration both writes and reads go to the
primary, so a read sees what was just written. If we change reads to go to a
secondary, there might be discrepancies, because replication is asynchronous
and the secondary may lag behind.
Failover usually takes about 3 seconds
rs.slaveOk()
rs.isMaster()
seedlist
rs.stepDown()
w: 'majority'
rs.status()
rs.conf()
rs.help()
Read preference: by default reads go to the primary. When you have many nodes
and also want to read from secondaries, you set the read preference. Read
preferences are set in the driver (PyMongo has 4; other drivers have others).
List of read preferences allowed:
1. Primary
2. Secondary
3. Primary preferred
4. Secondary preferred
5. Nearest
6. Tagged
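How the modes differ can be sketched with a toy member selector. This is not driver code: `eligible_members`, the member dictionaries and the `ping_ms` field are hypothetical, "nearest" is simplified to the single lowest-latency member, and tagged reads are not modeled.

```python
def eligible_members(members, mode):
    """Return the members a read may be sent to under a read preference mode."""
    primaries = [m for m in members if m["role"] == "primary"]
    secondaries = [m for m in members if m["role"] == "secondary"]
    if mode == "primary":
        return primaries
    if mode == "primaryPreferred":
        return primaries or secondaries  # fall back if no primary is available
    if mode == "secondary":
        return secondaries
    if mode == "secondaryPreferred":
        return secondaries or primaries  # fall back if no secondary is available
    if mode == "nearest":
        return sorted(members, key=lambda m: m["ping_ms"])[:1]
    raise ValueError("unknown read preference mode: " + mode)


members = [
    {"name": "a", "role": "primary", "ping_ms": 20},
    {"name": "b", "role": "secondary", "ping_ms": 5},
    {"name": "c", "role": "secondary", "ping_ms": 30},
]
for mode in ("primary", "secondaryPreferred", "nearest"):
    print(mode, [m["name"] for m in eligible_members(members, mode)])
# primary ['a']
# secondaryPreferred ['b', 'c']
# nearest ['b']
```

Any mode that can return a secondary gives up the guarantee of reading your own latest write.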
Sharding
There can be more than one mongos
Shards can be arranged range-based
The data is located by the shard key
Shell help for sharding:
sh.help()
Implications of sharding on development
1. Every document includes the shard key
2. The shard key is immutable: it cannot be changed, so choose it carefully
3. There must be an index that starts with the shard key
4. When you do an update, the shard key has to be specified or multi set to true
   a. With multi, the update is sent to all of the nodes
5. No shard key in a query means it is sent to all nodes => scatter gather
6. No unique key unless it is part of the shard key
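The difference between a targeted query and scatter gather can be sketched with a toy range-based router. The boundaries, shard names and the `user_id` shard key below are made up for illustration; a real mongos keeps this mapping in the cluster metadata.

```python
import bisect

# Hypothetical chunk layout: exclusive upper bounds of each range of user_id.
BOUNDS = [1000, 2000, 3000]
SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # shard3 holds 3000 and above


def route(query, shard_key="user_id"):
    """Return the shards a query must be sent to."""
    if shard_key in query:
        # Targeted: the shard key value picks exactly one range.
        idx = bisect.bisect_right(BOUNDS, query[shard_key])
        return [SHARDS[idx]]
    # No shard key: every shard has to be asked (scatter gather).
    return list(SHARDS)


print(route({"user_id": 1500}))          # ['shard1']
print(route({"email": "a@example.com"}))  # ['shard0', 'shard1', 'shard2', 'shard3']
```

The second query still returns correct results, but its cost grows with the number of shards.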
Choosing a shard key
1. Sufficient cardinality
2. Avoid hot spotting: a monotonically increasing key sends all new writes to
   the same shard
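Hot spotting can be seen in a small simulation. The range boundaries and shard count are hypothetical, and the MD5-based placement is only a contrast to show an even spread; it is not MongoDB's own hashing.

```python
import bisect
import hashlib
from collections import Counter

BOUNDS = [1000, 2000, 3000]  # hypothetical range boundaries; shard 3 holds 3000+
NUM_SHARDS = 4


def range_shard(key):
    """Range-based placement: the key's position among the boundaries."""
    return bisect.bisect_right(BOUNDS, key)


def hashed_shard(key):
    """Contrast case: place by a hash of the key instead of its order."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Monotonically increasing keys, e.g. a counter or a timestamp.
new_keys = range(5000, 6000)
range_load = Counter(range_shard(k) for k in new_keys)
hashed_load = Counter(hashed_shard(k) for k in new_keys)

print(dict(range_load))   # every insert lands on shard 3
print(dict(hashed_load))  # inserts are spread over several shards
```

With range-based placement the last shard takes all 1000 inserts while the other three stay idle.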
Import
mongoimport --db dbName --collection collectionName --file fileName.json
doc=db.thinks.findOne();
for (key in doc) print(key);
Week 7: Case Studies
Jon Hoffman from Foursquare
Scala, MongoDB
5 million check-ins a day
Over 2.5 billion
AWS is used for the application servers
The database is hosted on their own racks, SSD based
Migrated the database off AWS because of performance issues in the past; AWS
has since fixed those with its SSD offering
Ryan Bubinski from Codecademy
Ruby for server side
JavaScript for client side and some server side
API in Ruby
App layer in Ruby and JavaScript
All client side is JavaScript
Mongoid ODM (object document mapper)
Rails for application layer
Rack API
nginx
10gen MMS
Cookie-based session storage
Redis session store (in-memory, key-value based session store)
Millions of submissions
The submissions vary from hundreds of kilobytes to megabytes
1st gen: O(1 million), order of magnitude of 1 million
  Hosted service
2nd gen: O(10 million)
  EC2
  Quad extra large memory instances
  EBS
  4X large memory
  Provisioned IOPS
  Replica sets
  Single primary
  2 secondaries
  Writes to primary
  Reads from secondaries
  To scale the read load horizontally and use one machine to handle the write load
  Sharded temporarily: 2 shards with replica sets
3rd gen: O(100+ million)
  S3-backed answer storage
  Used S3 as a key-value store
writeConcern
For all writes that involve a confirmation or user acknowledgement, use safe mode
For logging and other event-based writes, disable safe mode
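That policy can be written down as a small helper. A minimal sketch, assuming that safe mode corresponds to an acknowledged write (w=1) and disabled safe mode to an unacknowledged one (w=0); the function name and the write kinds are hypothetical, not Codecademy's code.

```python
# Hypothetical kinds of writes whose result is shown to the user.
USER_FACING = {"submission", "signup", "payment"}


def write_concern_for(kind):
    """Pick write-concern options for a kind of write."""
    if kind in USER_FACING:
        return {"w": 1}  # safe mode: wait for the server's acknowledgement
    return {"w": 0}      # fire and forget: fine for logs and events


print(write_concern_for("submission"))  # {'w': 1}
print(write_concern_for("page_view"))   # {'w': 0}
```

The trade-off is latency against certainty: unacknowledged writes return sooner, but a failed log write goes unnoticed.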
Rsync for replication
Heroku
The application layer and API layer, which handle both reads and writes, are
hosted on Heroku
Heroku is AWS backed
Both Codecademy and Heroku (AWS) are hosted in the same availability zone
Please Note : This is a series of 6
Reference: All the material credit goes to the course hosted by Mongo
Now feels good to have the course certificate: