Monday, December 23, 2013

Why NOSQL? OK. But Why So Many?

1. Introduction


NOSQL: Not Only SQL, a term that generally refers to data stores that are not SQL-centric relational databases

2. Why NOSQL?

Necessity is the mother of invention. Here is a look at what prompted the creation of NOSQL databases.
1.       Exorbitant growth of data:
a.        Large datasets become onerous to manage when stored in relational databases
b.       Query execution time increases, creating performance bottlenecks
2.       Data model/structure mismatch: Storing hierarchical, graph or relationship data as rows and columns is highly inefficient, and so is storing serialized objects
3.       Distributed caching infrastructure introduced on top of relational data storage for performance, and the consistency problems that come with it
4.       Heavy usage of blob storage defeats the purpose of a relational database
5.       Massive Scale out
6.       High Availability: the need to always be able to write, with massive write performance and small, continuous, volatile reads and writes
7.       Need for faster key-value access
8.       Difficulty in handling volatility in schema and data types, some of it due to changes in the business and some due to data acquisition
9.       Complexity in Partitioning/Sharding, which is done mostly for manageability, performance or availability
10.    Performance in large databases
11.    Relational databases are too generic; there is a need for specialist databases
12.    Cost-based optimization simplifies things for naïve developers, but it is unpredictable, more so when resource-heavy queries are executed concurrently
13.    Resource contention, resource concurrency, blocking queries, index updates, and concurrent disk activity such as log backups and checkpointing

Is NOSQL the answer to everything stated above? No, but it certainly helps in resolving a few of these issues.
In short, what NOSQL promises is high performance and flexibility, along with high availability and scalability.

3. Why so Many?


What NOSQL databases don’t promise is ACID. NOSQL database implementations vary in the consistency semantics they conform to, and most tend to conform to BASE. Let’s look at what these are.

ACID
“Atomic: All operations in a transaction succeed or every operation is rolled back.
Consistent: On transaction completion, the database is structurally sound.
Isolated: Transactions do not contend with one another. Contentious access to state is moderated by the database so that transactions appear to run sequentially.
Durable: The results of applying a transaction are permanent, even in the presence of failures - Wikipedia”

BASE
“Basic availability: The store appears to work most of the time.
Soft-state: Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency: Stores exhibit consistency at some later point (e.g., lazily at read time) – O’Reilly”
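
The soft-state and eventual-consistency ideas above can be illustrated with a minimal sketch. It assumes two in-memory replicas and a last-write-wins rule based on a caller-supplied timestamp; the names Replica and anti_entropy are hypothetical and are not taken from any particular NOSQL product, and real stores use a variety of conflict-resolution strategies.

```python
class Replica:
    """A toy replica that stores key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Last-write-wins (an assumption of this sketch): keep the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state in both directions so that the two replicas converge."""
    for source, target in ((a, b), (b, a)):
        for key, (timestamp, value) in list(source.data.items()):
            target.write(key, value, timestamp)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart:42", ["book"], timestamp=1)
r2.write("cart:42", ["book", "pen"], timestamp=2)

stale = r1.read("cart:42")   # r1 has not yet seen the newer write (soft state)
anti_entropy(r1, r2)         # background synchronisation
converged = r1.read("cart:42") == r2.read("cart:42")
```

Before the synchronisation step the two replicas disagree; afterwards both return the newer value, which is the "consistency at some later point" described in the quote.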

It is important to note that not all NOSQL databases conform to eventual consistency.
Apart from the need for specialist databases supporting specialised data structures, the CAP theorem also helps explain why there are so many. Let’s look at it.

“The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
According to the theorem, a distributed system cannot satisfy all three of these guarantees at the same time”
-          Wikipedia

With drastically different business dynamics and priorities amongst enterprises, NOSQL databases tend to pick two of the three characteristics mentioned above.
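
To make the trade-off concrete, here is a minimal sketch of two replicated nodes that lose contact with each other. The "CP" and "AP" labels, the Node class and the run helper are assumptions made for illustration only; real databases implement far more nuanced behaviour.

```python
class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels assumed for this sketch)
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the link to the peer is down

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Prefer consistency: refuse the write rather than diverge.
                return False
            # Prefer availability: accept locally, so replicas now disagree.
            self.value = value
            return True
        # Healthy network: apply the write on both nodes.
        self.value = value
        self.peer.value = value
        return True


def run(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    a.write("v1")                          # replicated to both nodes
    a.partitioned = b.partitioned = True   # the network partition happens
    accepted = a.write("v2")
    return accepted, a.value, b.value


cp_result = run("CP")   # write rejected, both nodes still hold "v1"
ap_result = run("AP")   # write accepted, nodes hold different values
```

In the "CP" run the store stays consistent but turns the write away; in the "AP" run it stays available but the two nodes diverge until they can be reconciled.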

Given the need for flexibility in data structure, there are a multitude of NOSQL databases being introduced, see figure below



4. Types of NOSQL databases

1.       Wide Column Store (Column Families): The data model stores columns of data together instead of rows, and is optimized for queries over large datasets
2.       Document Store: Pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents
3.       Key Value/Tuple Store: Every single item in the database is stored as an attribute name (or "key"), together with its value
4.       Graph Databases: A graph is a set of nodes and the relationships that connect them. Some graph databases use native graph storage, while others serialize the graph data and store it in a relational, object or other data store
5.       Multi Model Databases: Serve multiple data models
6.       Object Databases: Data is persisted in the form of objects
7.       Grid and Cloud Database Solutions: Data persisted across multiple servers that work together to manage information and related operations
8.       XML Databases: Data persisted in XML format
9.       Multidimensional Databases: A type of database that is optimized for data warehouse and online analytical processing (OLAP) applications
10.    Multi Value Databases: Data is persisted as keys with multiple values; these databases support and encourage attributes that can take a list of values, rather than all attributes being single-valued
11.    Event Sourcing: Persists the application's state by storing the history of events that determines its current state
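
As a rough illustration of how some of the types above differ, the sketch below shapes the same hypothetical customer for a key-value, document, wide column and graph model, and then replays a small event log. All names and values are invented for this example, and plain Python structures stand in for the actual stores.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"customer:42": json.dumps({"name": "Ada", "city": "London"})}

# Document store: the value is a structured document that can be nested.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "London"},
    "orders": [{"sku": "book", "qty": 1}, {"sku": "pen", "qty": 3}],
}

# Wide column store: row key -> column family -> columns.
wide_column = {
    "42": {
        "profile": {"name": "Ada", "city": "London"},
        "orders": {"2013-12-01": "book", "2013-12-15": "pen"},
    }
}

# Graph database: nodes plus the relationships that connect them.
nodes = {
    42: {"label": "Customer", "name": "Ada"},
    7: {"label": "Product", "name": "book"},
}
edges = [(42, "BOUGHT", 7)]


def bought_by(customer_id):
    """Follow BOUGHT relationships from a customer node to product names."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == customer_id and rel == "BOUGHT"
    ]


# Event sourcing: the current state is derived by replaying stored events.
events = [("deposited", 100), ("withdrew", 30), ("deposited", 5)]


def replay(event_log):
    balance = 0
    for kind, amount in event_log:
        balance += amount if kind == "deposited" else -amount
    return balance
```

The key-value form can only be fetched as a whole by its key, the document and wide column forms expose structure that the store can query, the graph form answers relationship questions such as bought_by(42), and the event log rebuilds state with replay(events).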

5. Key Aspects


1.       NOSQL is not an all-in-one solution; certain scenarios mentioned above naturally fit the NOSQL semantics. NOSQL is certainly not a replacement for relational stores
2.       Consider NOSQL for Real time analytics on operational data
3.       Consider NOSQL when there are many source systems, including streaming data
4.       NoSQL databases provide a linear approach to database scaling, making scaling easier and more intuitive
5.       Most NOSQL databases are designed to be distributed, scalable databases
6.       Data duplication and denormalization are the norm
7.       Consider NOSQL for hierarchical data, content caching, distributed file systems, social networking, recommendation engines and graph-like data
8.       NOSQL databases can support unstructured and unpredictable data
9.       NOSQL databases use a cluster of servers to store data. Data and the operations are usually spread across clusters
10.    Consider NOSQL databases which provide Integrated Caching
11.    NOSQL is developed for continuous availability
12.    Certain NOSQL implementations provide configurable consistency models (strong vs eventual), but this will have performance implications
13.    Only a few NOSQL databases support ACID
14.    Only a few NOSQL databases support transactions
15.    Consider NOSQL databases when you have large amounts of data, large enough to not fit in one physical server
16.    Consider NOSQL databases when you have an object-relational impedance mismatch
17.    Many NOSQL databases trade off consistency for efficiency
18.    Consider NOSQL databases when you need schema flexibility
19.    Consider NOSQL database if you are looking for massive write performance
20.    Consider NOSQL database if you are looking for fast key value access
21.    NOSQL provides horizontal scaling  
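
The points above about spreading data across a cluster and scaling horizontally can be sketched with simple hash-based sharding. This is a minimal illustration that assumes plain modulo placement over a SHA-256 hash; shard_for and distribute are hypothetical helper names, and actual NOSQL databases use their own partitioning schemes.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(keys, shard_count):
    """Group keys by the shard that would own them."""
    shards = {i: [] for i in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


keys = [f"user:{i}" for i in range(1000)]
three = distribute(keys, 3)   # a cluster of three servers
four = distribute(keys, 4)    # the same keys after adding a fourth server

# Count the keys that change owner when the cluster grows from 3 to 4 shards.
moved = sum(1 for k in keys if shard_for(k, 3) != shard_for(k, 4))
```

With plain modulo placement a large share of keys changes owner whenever a server is added, which is one reason distributed stores often rely on schemes such as consistent hashing instead.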

6. NewSQL


“NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (read-write) workloads while still maintaining the ACID guarantees of a traditional database system – Wikipedia”
As we have seen above, NOSQL databases have been developed to serve different purposes, with one of the main advantages being scale out. NewSQL is an attempt to provide all the benefits of NOSQL while continuing to support ACID.
Google Spanner is one of the main contenders with a semi-relational data model, while NuoDB achieves it by splitting the transactional (in-memory) and the storage tier accompanied by peer-to-peer coordination.


7. Conclusion


Be it mergers and acquisitions, changes in business dynamics, or agility in development, large enterprises are bound to have hybrid solutions. Having multiple RDBMSs, data warehouses and data marts in one environment is not unseen or unheard of. It is more than likely that enterprises will add NOSQL/NewSQL databases into the mix. Be on the lookout for true shared-nothing distributed architectures!

Prashanth B Panduranga (Shan)
Associate Director-Technology  |  725-976-7006  |  pandurangap@aditi.com