1. Introduction
NOSQL (Not Only SQL) is a term generally used for data stores that are not centered on SQL and the relational model.
2. Why NOSQL?
Necessity is the mother of invention. Here is a look at what prompted the creation of NOSQL databases:
1. Exorbitant growth of data:
   a. Large datasets become onerous when stored in relational databases
   b. Query execution time increases, creating performance bottlenecks
2. Data model/structure mismatch: Storing hierarchical, graph, or relationship data as rows and columns is highly inefficient, and so is storing serialized objects
3. Introduction of distributed caching infrastructure on top of relational data storage for performance, and its related consistency problems
4. Heavy usage of blob storage defeats the purpose of a relational store
5. Massive scale-out
6. High availability: always be able to write, with massive write performance and small, continuous, volatile reads and writes
7. Need for faster key-value access
8. Difficulty in handling volatility in schema and data types, some relating to changes in the business and some due to data acquisition
9. Complexity in partitioning/sharding: done mostly for manageability, performance, or availability
10. Performance in large databases
11. Too generic: need for specialist databases
12. Cost-based optimization, though it simplified things for naïve developers, is unpredictable, more so when resource-intensive queries are being executed concurrently
13. Resource contention, resource concurrency, blocking queries, index updates, and concurrent disk issues such as log backups and checkpointing
Is NOSQL the answer to everything stated above? No, but it certainly helps in resolving a few of these issues.
What NOSQL promises, in short, is high performance and flexibility with high availability and scalability.
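To make the data model mismatch from point 2 in the list above concrete, here is a minimal sketch in Python. The order data, the field names, and the `rows_to_document` helper are all made up for illustration. It shows the same hierarchical record once as flat relational-style rows that must be joined, and once as a single nested document of the kind a document store would hold.

```python
from collections import defaultdict

# Relational-style storage: one parent row plus child rows linked by a key.
# (Hypothetical sample data, for illustration only.)
orders = [{"order_id": 1, "customer": "Asha"}]
order_lines = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
]


def rows_to_document(order_rows, line_rows):
    """Join parent and child rows into nested documents, one per order."""
    lines_by_order = defaultdict(list)
    for line in line_rows:
        lines_by_order[line["order_id"]].append(
            {"sku": line["sku"], "qty": line["qty"]}
        )
    # Each document carries its own child lines, so reading an order
    # needs no join at query time.
    return [
        {**order, "lines": lines_by_order[order["order_id"]]}
        for order in order_rows
    ]


documents = rows_to_document(orders, order_lines)
print(documents[0])
```

The join that `rows_to_document` performs is work a relational store repeats on every read of the hierarchy; a document store pays that cost once, at write time.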
3. Why so Many?
What NOSQL databases don’t promise is ACID. NOSQL database implementations vary in the consistency semantics they conform to; most tend to conform to BASE. Let’s look at what these are.
ACID
“Atomic: All operations in a transaction succeed or every operation is rolled back.
Consistent: On transaction completion, the database is structurally sound.
Isolated: Transactions do not contend with one another. Contentious access to state is moderated by the database so that transactions appear to run sequentially.
Durable: The results of applying a transaction are permanent, even in the presence of failures - Wikipedia”
BASE
“Basic availability: The store appears to work most of the time.
Soft-state: Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency: Stores exhibit consistency at some later point (e.g., lazily at read time) - O’Reilly”
It is important to note that not all NOSQL databases conform to eventual consistency.
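The soft-state and eventual-consistency ideas in BASE can be sketched with a toy in-memory store. This is only an illustration under simplifying assumptions: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, writes always land on the first replica, and a single global version counter stands in for real conflict resolution.

```python
class Replica:
    """A single copy of the data, holding key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def apply(self, key, version, value):
        # Keep the write only if it is newer than what this replica has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


class EventuallyConsistentStore:
    """Toy store: writes land on one replica and propagate later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica(f"r{i}") for i in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas
        self.version = 0

    def write(self, key, value):
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        self.pending.append((key, self.version, value))

    def sync(self):
        # Propagation step: push every pending write to every replica.
        for key, version, value in self.pending:
            for replica in self.replicas:
                replica.apply(key, version, value)
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart", ["book"])
stale = [r.read("cart") for r in store.replicas]  # replicas disagree here
store.sync()
fresh = [r.read("cart") for r in store.replicas]  # replicas agree here
print(stale, fresh)
```

Between `write` and `sync` the replicas are not mutually consistent (soft state); after `sync` every replica returns the same value (eventual consistency).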
Apart from the need for specialist databases supporting specialised data structures, the other reason lies in the CAP theorem:
“The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
According to the theorem, a distributed system cannot satisfy all three of these guarantees at the same time” - Wikipedia
With drastically different business dynamics and priorities among enterprises, NOSQL databases tend to pick two of the three characteristics above.
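As a rough illustration of that choice, the sketch below models two nodes and a network partition. The `TwoNodeStore` class and its `"CP"`/`"AP"` modes are hypothetical and heavily simplified; real systems make this trade-off with far more nuance. In CP mode the store rejects writes during a partition to stay consistent, while in AP mode it keeps accepting writes and lets the nodes diverge.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class TwoNodeStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False

    def write(self, node, key, value):
        if not self.partitioned:
            # Healthy network: replicate synchronously to both nodes.
            for data in self.nodes:
                data[key] = value
        elif self.mode == "CP":
            # Prefer consistency: reject the write rather than diverge.
            raise Unavailable("cannot reach the other node")
        else:
            # Prefer availability: accept locally; nodes may now disagree.
            self.nodes[node][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)


ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)  # accepted, but only node 0 sees it
ap_reads = (ap.read(0, "x"), ap.read(1, "x"))

cp = TwoNodeStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_rejected = False
except Unavailable:
    cp_rejected = True
cp_reads = (cp.read(0, "x"), cp.read(1, "x"))
print(ap_reads, cp_rejected, cp_reads)
```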
Given the need for flexibility in data structure, a multitude of NOSQL databases has been introduced; see the figure below.
Data Reference: http://nosql-database.org/
4. Types of NOSQL databases
1. Wide Column Store (Column Families): The data model stores columns of data together instead of rows, and is optimized for queries over large datasets
2. Document Store: Pairs each key with a complex data structure known as a document. Documents can contain many different key-value pairs, key-array pairs, or even nested documents
3. Key Value/Tuple Store: Every single item in the database is stored as an attribute name (or "key"), together with its value
4. Graph Databases: A graph is a set of nodes and the relationships that connect them. Some graph databases use native graph storage, while others serialize the graph data and store it in a relational, object, or other data store
5. Multi Model Databases: Serve multiple data models
6. Object Databases: Data is persisted in the form of objects
7. Grid and Cloud Database Solutions: Data is persisted across multiple servers that work together to manage information and related operations
8. XML Databases: Data is persisted in XML format
9. Multidimensional Databases: A type of database that is optimized for data warehouse and online analytical processing (OLAP) applications
10. Multi Value Databases: Data is persisted as keys with multiple values; these databases have features that support and encourage the use of attributes which can take a list of values, rather than all attributes being single-valued
11. Event Sourcing: Persists an application's state by storing the history that determines the current state of the application
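The event sourcing idea in the last item can be sketched in a few lines of Python. The bank-account domain, the `Event` type, and the event names are assumptions chosen for illustration; the point is only that the current state is never stored directly and is instead derived by replaying the stored history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn" (hypothetical event names)
    amount: int


def apply(balance, event):
    """Return the new balance after applying one event."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, initial=0):
    """Rebuild state by folding the event history from the start."""
    balance = initial
    for event in events:
        balance = apply(balance, event)
    return balance


# The append-only log is the only thing that is persisted.
log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
current = replay(log)
as_of_second_event = replay(log[:2])
print(current, as_of_second_event)
```

Because the log is kept in full, replaying a prefix of it (as in `replay(log[:2])`) reconstructs the state as it was at an earlier point in time.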
5. Key Aspects
1. NOSQL is not an all-in-one solution; certain scenarios mentioned above naturally fit NOSQL semantics. NOSQL is certainly not a replacement for relational stores
2. Consider NOSQL for real-time analytics on operational data
3. Consider NOSQL when data comes from many systems, including streaming data
4. NOSQL databases provide a linear approach to database scaling, making scaling easier and more intuitive
5. All NOSQL databases are developed to be distributed, scalable databases
6. Data duplication and denormalization are the norm
7. Consider NOSQL for hierarchical data, content caching, distributed file systems, social networking, recommendation engines, and graph-like data
8. NOSQL databases can support unstructured and unpredictable data
9. NOSQL databases use a cluster of servers to store data. Data and operations are usually spread across the cluster
10. Consider NOSQL databases which provide integrated caching
11. NOSQL is developed for continuous availability
12. Certain NOSQL implementations provide configurable consistency models (strong vs eventual), but this will have performance implications
13. Only a few NOSQL databases support ACID
14. Only a few NOSQL databases support transactions
15. Consider NOSQL databases when you have large amounts of data, large enough to not fit on one physical server
16. Consider a NOSQL database when you have an object-relational impedance mismatch
17. NOSQL databases trade off consistency for efficiency
18. Consider NOSQL databases when you need schema flexibility
19. Consider a NOSQL database if you are looking for massive write performance
20. Consider a NOSQL database if you are looking for fast key-value access
21. NOSQL provides horizontal scaling
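Several of the points above (data spread across a cluster, data too large for one server, horizontal scaling) come down to partitioning keys across nodes. Below is a minimal sketch of hash-based sharding, assuming a hypothetical in-memory `ShardedStore` where each shard is a plain dictionary standing in for a separate server. It is not how any particular NOSQL product implements partitioning.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy key-value store; each dict stands in for one server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

sizes = [len(shard) for shard in store.shards]  # keys held by each shard
print(sizes)
```

A stable hash is used instead of Python's built-in `hash`, which is randomized per process for strings. Note that simple modulo placement moves most keys when the shard count changes, which is why consistent hashing is commonly used in practice.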
6. NewSQL
“NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (read-write) workloads while still maintaining the ACID guarantees of a traditional database system - Wikipedia”
As we have seen above, NOSQL databases have been developed to serve different purposes, with one of the main advantages being scale-out. NewSQL is an attempt to provide the benefits of NOSQL while continuing to support ACID.
Google Spanner is one of the main
contenders with a semi-relational data model, while NuoDB achieves it by
splitting the transactional (in-memory) and the storage tier accompanied by
peer-to-peer coordination.
7. Conclusion
Be it mergers and acquisitions, changes in business dynamics, or agility in development, large enterprises are bound to have hybrid solutions. Having multiple RDBMSs, data warehouses, and data marts in one environment is not unseen or unheard of. Enterprises are more than likely to add NOSQL/NewSQL databases into the mix. Be on the lookout for true shared-nothing distributed architectures!
Prashanth B Panduranga (Shan)