
Mongo Learning Series 6

Week 6: Application Engineering


Mongo Application Engineering
1.       Durability of Writes
2.       Availability / Fault Tolerance
3.       Scaling

WriteConcern


Traditionally, an insert or update is performed as fire and forget: the client sends the operation and does not wait to hear whether it succeeded. The mongo shell, however, wants to know whether the operation succeeded, so it calls getLastError after every write.
getLastError takes a couple of arguments that control when a write is acknowledged:
w:1 -- wait for a write acknowledgement. This is still not durable: it returns true once the change has been made in memory, not necessarily after it has been written to disk. If the system fails before the write reaches disk, the data is lost.
j:1 -- journal. The acknowledgement is returned only after the write has reached the journal on disk, so the write is durable. The operation can be replayed from the journal after a crash.
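To make the difference between w:1 and j:1 concrete, here is a toy Python model. It is not MongoDB code: the ToyServer class and its methods are made up for illustration, and it assumes the crash happens before an unjournaled write ever reaches disk.

```python
class ToyServer:
    """Hypothetical stand-in for a mongod: memory is volatile, the journal survives a crash."""

    def __init__(self):
        self.memory = {}    # in-memory data, lost on crash
        self.journal = []   # on-disk journal, survives a crash

    def write(self, key, value, j=False):
        """Apply a write and acknowledge it (w:1). With j=True, journal it before acknowledging."""
        self.memory[key] = value
        if j:
            self.journal.append((key, value))
        return True  # acknowledgement sent back to the client

    def crash_and_recover(self):
        """Lose everything in memory, then replay the journal."""
        self.memory = {}
        for key, value in self.journal:
            self.memory[key] = value


server = ToyServer()
server.write("a", 1)           # w:1 only: acknowledged from memory
server.write("b", 2, j=True)   # w:1, j:1: acknowledged after journaling
server.crash_and_recover()
print(sorted(server.memory))   # ['b']
```

Both writes were acknowledged, but only the journaled one survives the simulated crash.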


Api.mongodb.org



Network Errors

Even with w=1, j=1 set, other factors can leave you unsure whether a write was saved. Say you did an insert over a connection that had j=1, w=1, and the driver issued getLastError. The write completed on the server, but the network connection was reset before the acknowledgement reached you. In that case you do not know whether the write completed, because you never received the acknowledgement.
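One way to cope with a lost acknowledgement on an insert is to choose the _id on the client and retry with the same _id: a duplicate key error on the retry tells you the earlier attempt landed. Below is a minimal sketch of that idea in plain Python. FlakyCollection, DuplicateKey and insert_with_retry are hypothetical names for this example, not driver APIs.

```python
import uuid


class DuplicateKey(Exception):
    """Raised by the toy collection when an _id already exists."""


class FlakyCollection:
    """Hypothetical collection whose first insert succeeds but loses its acknowledgement."""

    def __init__(self):
        self.docs = {}
        self.drop_next_ack = True

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKey(doc["_id"])
        self.docs[doc["_id"]] = doc
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise ConnectionError("connection reset before acknowledgement")


def insert_with_retry(coll, doc, attempts=3):
    """Retry an insert; a duplicate key on retry means an earlier attempt landed."""
    doc.setdefault("_id", uuid.uuid4().hex)  # client chooses the _id up front
    for _ in range(attempts):
        try:
            coll.insert(doc)
            return "inserted"
        except ConnectionError:
            continue  # outcome unknown, try again with the same _id
        except DuplicateKey:
            return "already inserted"
    raise ConnectionError("gave up after retries")


coll = FlakyCollection()
result = insert_with_retry(coll, {"name": "check-in"})
print(result, len(coll.docs))  # already inserted 1
```

This only works because the insert is safe to repeat. An update that is not idempotent, such as an increment, cannot be retried blindly in the same way.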

Replication:
Replica sets: A replica set is a set of mongo nodes that act together and mirror each other. There is one primary and multiple secondaries. Data written to the primary is asynchronously replicated to the secondaries. Which node is primary is decided dynamically. The application and its drivers always connect to the primary. If the primary goes down, the remaining nodes hold an election to choose a new primary, and the winner needs a strict majority of the original number of nodes.



The minimum number of nodes to form a replica set is 3.
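The strict-majority rule is also why 3 is the minimum: with only 2 nodes, the one survivor cannot form a majority on its own. A small arithmetic sketch (the function names are made up for this example):

```python
def votes_needed(total_voting_nodes):
    """Strict majority of the configured voting members."""
    return total_voting_nodes // 2 + 1


def can_elect_primary(total_voting_nodes, reachable_nodes):
    """True if the reachable members hold a strict majority of all votes."""
    return reachable_nodes >= votes_needed(total_voting_nodes)


for total in (2, 3, 5):
    survivors = total - 1  # one node (the primary) has gone down
    print(total, votes_needed(total), can_elect_primary(total, survivors))
# 2 2 False
# 3 2 True
# 5 3 True
```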
Types of replica set nodes:
1.       Regular
2.       Arbiter (exists only to vote in elections; holds no data)
3.       Delayed / Regular (disaster recovery node; it cannot be a primary node)
4.       Hidden (often used for analytics; cannot be a primary node)
MongoDB is not eventually consistent by default.
In the default configuration both writes and reads go to the primary, so a read always sees the latest acknowledged write. If we change reads to go to a secondary, there might be some discrepancies: because replication is asynchronous, a secondary can return stale data.
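The stale-read problem can be seen in a toy model where replication is an explicit, separate step. This is an illustration only: ToyReplicaSet is a made-up class, and real replication happens continuously in the background rather than on demand.

```python
class ToyReplicaSet:
    """Hypothetical two-node set: writes go to the primary, replication is a separate step."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes not yet applied on the secondary

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply pending writes on the secondary (asynchronous in practice)."""
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()

    def read(self, key, from_secondary=False):
        node = self.secondary if from_secondary else self.primary
        return node.get(key)


rs = ToyReplicaSet()
rs.write("score", 10)
print(rs.read("score"))                       # 10
print(rs.read("score", from_secondary=True))  # None
rs.replicate()
print(rs.read("score", from_secondary=True))  # 10
```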
Failover usually takes about 3 seconds.
rs.slaveOk()
rs.isMaster()
seedlist
rs.stepDown()
w:'majority'
rs.status()
rs.conf()
rs.help()

Read preference: by default reads go to the primary. When you have a lot of nodes and want to read from the secondaries as well, you set the read preference. Read preferences are set in the driver (Pymongo has 4; other drivers offer others).
Read preferences allowed:
1.       Primary
2.       Secondary
3.       Primary Preferred
4.       Secondary preferred
5.       Nearest
6.       Tagged
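As a rough picture of how these modes differ, here is a simplified selection function in Python. It is a sketch, not driver code: choose_node and the node fields are hypothetical, the Tagged mode is left out, and ties are broken by lowest ping time, which is a simplification chosen for this example.

```python
def choose_node(mode, nodes):
    """Pick a node name for a read.

    nodes: list of dicts with 'name', 'role' ('primary' or 'secondary'),
    'up' (bool) and 'ping_ms'. All field names here are illustrative.
    """
    up = [n for n in nodes if n["up"]]
    primaries = [n for n in up if n["role"] == "primary"]
    secondaries = [n for n in up if n["role"] == "secondary"]
    order = {
        "primary": [primaries],
        "primaryPreferred": [primaries, secondaries],
        "secondary": [secondaries],
        "secondaryPreferred": [secondaries, primaries],
        "nearest": [up],
    }[mode]
    for candidates in order:
        if candidates:
            # Among eligible nodes, take the lowest ping time.
            return min(candidates, key=lambda n: n["ping_ms"])["name"]
    return None  # no eligible node is available


nodes = [
    {"name": "a", "role": "primary", "up": False, "ping_ms": 5},
    {"name": "b", "role": "secondary", "up": True, "ping_ms": 20},
    {"name": "c", "role": "secondary", "up": True, "ping_ms": 8},
]
print(choose_node("primary", nodes))           # None
print(choose_node("primaryPreferred", nodes))  # c
print(choose_node("nearest", nodes))           # c
```

With the primary down, a strict Primary read has nowhere to go, while Primary Preferred falls back to a secondary.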

Sharding


There can be more than one mongos (the router that the application connects to)
Shards can be arranged range based: each shard holds a range of shard key values
Each document is located by its shard key
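Range-based placement can be sketched as a lookup over sorted range boundaries. The boundaries and shard names below are invented for the example; a real cluster splits and moves ranges over time.

```python
import bisect

# Hypothetical layout: upper bounds (exclusive) for the first two shards;
# the last shard takes everything at or above the final bound.
UPPER_BOUNDS = [1000, 2000]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(shard_key_value):
    """Return the shard whose range contains the given shard key value."""
    return SHARDS[bisect.bisect_right(UPPER_BOUNDS, shard_key_value)]


print(shard_for(42))     # shard0
print(shard_for(1000))   # shard1
print(shard_for(99999))  # shard2
```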


Shard help
sh.help()

Implications of sharding on development
1.       Every document must include the shard key
2.       The shard key is immutable: once set it cannot be changed, so choose it carefully
3.       There must be an index that starts with the shard key
4.       When you do an update, either specify the shard key or set multi to true
a.       With multi, the update is sent to all of the nodes
5.       A query with no shard key is sent to all nodes => scatter gather
6.       No unique index unless it is also part of the shard key
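The difference between a targeted query and scatter gather comes down to whether the query contains the shard key. A hedged sketch of that routing decision follows. The shard key name, shard names and target_shards function are assumptions, and placement uses a simple hash purely to keep the example short.

```python
import zlib

SHARD_KEY = "user_id"                    # hypothetical shard key
SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def target_shards(query):
    """Return the shards a router would have to contact for this query."""
    if SHARD_KEY in query:
        # Shard key present: the query can be sent to a single shard.
        index = zlib.crc32(str(query[SHARD_KEY]).encode()) % len(SHARDS)
        return [SHARDS[index]]
    # No shard key: every shard must be asked (scatter gather).
    return list(SHARDS)


print(len(target_shards({"user_id": 17, "city": "NYC"})))  # 1
print(target_shards({"city": "NYC"}))  # ['shard0', 'shard1', 'shard2']
```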

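Choosing a shard key
1.       Sufficient cardinality: enough distinct values to spread documents across the shards
2.       Avoid hot spotting: a monotonically increasing key sends every new insert to the same shard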
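Hot spotting is easy to see with numbers. The sketch below counts where 1000 new documents land when the key keeps increasing under range-based placement, and contrasts that with hashing the same keys. The boundaries are invented, and the hashed variant is an alternative assumed here for contrast, not something the notes above prescribe.

```python
import bisect
import hashlib
from collections import Counter

UPPER_BOUNDS = [1000, 2000]  # hypothetical range boundaries for three shards


def range_shard(key):
    """Shard index under range-based placement."""
    return bisect.bisect_right(UPPER_BOUNDS, key)


def hashed_shard(key):
    """Shard index when the key is hashed before placement."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % 3


new_keys = range(5000, 6000)  # monotonically increasing, e.g. a counter
by_range = Counter(range_shard(k) for k in new_keys)
by_hash = Counter(hashed_shard(k) for k in new_keys)
print(dict(by_range))  # {2: 1000}
print(len(by_hash))    # number of shards that received inserts
```

Under range placement every insert hits the last shard, while the other shards sit idle for writes.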
Import
mongoimport --db dbName --collection collectionName --file fileName.json
doc=db.thinks.findOne();
for (key in doc) print(key);




Week 7: Case Studies


Jon Hoffman from Foursquare

Scala, MongoDB
5 million check-ins a day
Over 2.5 billion
AWS is used for the application servers
The database is hosted on their own racks, SSD based
They migrated the database off AWS because of performance issues at the time; AWS has since fixed those with its SSD offering

Ryan Bubinski from Codecademy

Ruby for server side
Javascript for client side and some server side
API in Ruby
App layer in Ruby and Javascript
All client side is javascript
Mongoid ODM (Object document mapper)
Rails for application layer
Rack api
nginx 
10Gen MMS
Cookie-based session storage
Redis session store (in-memory session store – key value based)
Millions of submissions
The submissions vary from hundreds of kilobytes to megabytes
1st gen O(1 million): order of magnitude of 1 million
    Hosted service
2nd gen O(10 million)
    EC2
    Quad extra large memory instances
    EBS
    4X large memory
    Provisioned IOPS
    Replica sets
    Single primary
    2 secondaries
    Writes to primary
    Reads from secondary
    To handle horizontal scale on the read load and use one machine to handle the write load
    Sharded temporarily:
    2 shards with replica sets
3rd gen O(100+ million)
    S3 backed answer storage
    Used S3 as a key value store
writeConcern
    For all writes that involve a confirmation or user acknowledgement, use safe mode
    For logging and other event based writes, disable safe mode
Rsync for replication
Heroku
    The application layer and API layer, which handle both reads and writes, are hosted on Heroku
    Heroku is AWS backed
    Both Codecademy and Heroku (AWS) are hosted in the same availability zone

Please note: This is a series of 6.
Reference: All the material credit goes to the course hosted by Mongo
Now it feels good to have the course certificate.

