BigData World: Know About Cassandra

Cassandra:

This post gives a brief introduction to Cassandra, one of the NoSQL databases.

Cassandra in 50 Words or Less
“Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable.”

Distributed
Cassandra is distributed, which means that it is capable of running on multiple machines while appearing to users as a unified whole.

Decentralized
Cassandra is decentralized, meaning that every node is identical. No node acts as master or slave, and no Cassandra node performs organizing operations distinct from any other node. Instead, Cassandra features a peer-to-peer protocol and uses gossip to maintain and keep in sync a list of nodes that are alive or dead.

The fact that Cassandra is decentralized means that there is no single point of failure. All of the nodes in a Cassandra cluster function exactly the same. This is sometimes referred to as “server symmetry.”
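
To make the gossip idea concrete, here is a minimal sketch in Python. It is not Cassandra's actual implementation: the node names, the single integer heartbeat per node, and the helpers `merge`, `exchange`, and `gossip_round` are all assumptions made for illustration. Each node starts out knowing only itself, and on every round it swaps its view with one randomly chosen peer, with no coordinator involved.

```python
import random


def merge(local, remote):
    """Keep the highest heartbeat version seen for every known node."""
    merged = dict(local)
    for node, version in remote.items():
        if version > merged.get(node, -1):
            merged[node] = version
    return merged


def exchange(states, a, b):
    """Push-pull exchange: afterwards both nodes hold the merged view."""
    merged = merge(states[a], states[b])
    states[a] = dict(merged)
    states[b] = dict(merged)


def gossip_round(states, rng):
    """Every node contacts one randomly chosen peer (hypothetical schedule)."""
    names = sorted(states)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        exchange(states, name, peer)


# Each node starts out knowing only its own heartbeat.
states = {name: {name: 1} for name in ("n1", "n2", "n3", "n4")}
rng = random.Random(42)
rounds = 0
while any(len(view) < len(states) for view in states.values()) and rounds < 100:
    gossip_round(states, rng)
    rounds += 1
print(f"views after {rounds} round(s):", states["n1"])
```

Because every node runs the same code and any node can talk to any other, there is no special node whose loss would stop membership information from spreading.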

Elastic Scalability
Scalability is an architectural feature of a system that can continue serving a greater number of requests with little degradation in performance.

There are two types of scaling: vertical and horizontal.
Vertical scaling, which simply means adding more hardware capacity and memory to your existing machine, is the easiest way to achieve this.

Horizontal scaling means adding more machines that have all or some of the data on them so that no one machine has to bear the entire burden of serving requests.

With horizontal scaling, however, the software itself must have an internal mechanism for keeping its data in sync with the other nodes in the cluster.

Elastic scalability refers to a special property of horizontal scalability. It means that your cluster can seamlessly scale up and scale back down. To do this, the cluster must be able to accept new nodes that can begin participating by getting a copy of some or all of the data and start serving new user requests without major disruption or reconfiguration of the entire cluster. You don’t have to restart your process. You don’t have to change your application queries. You don’t have to manually rebalance the data yourself.
Just add another machine—Cassandra will find it and start sending it work.
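
One way to see why adding a machine does not force a full rebalance is a consistent-hashing ring, the general technique behind Dynamo-style distribution. The sketch below is only an illustration under stated assumptions: the `Ring` class, the node names, the 8 ring positions per node, and the use of MD5 as a stable hash are all made up for the example and are not Cassandra's partitioner.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 is just a stable hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, vnodes=8):
        self.vnodes = vnodes  # ring positions per node (assumed value)
        self.entries = []     # sorted list of (token, node name)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.entries, (token(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.entries = [e for e in self.entries if e[1] != node]

    def owner(self, key):
        """The first ring position clockwise from the key's token owns the key."""
        idx = bisect.bisect_left(self.entries, (token(key), ""))
        return self.entries[idx % len(self.entries)][1]


ring = Ring()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # scale up by one machine
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
print("all moved keys went to node-d:", all(after[k] == "node-d" for k in moved))
```

Keys only ever move to the newly added node, and removing that node again hands them back to their previous owners, which is the property that lets a cluster grow and shrink without reshuffling everything.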

Scaling down, of course, means removing some of the processing capacity from your cluster.

High Availability

In general architecture terms, the availability of a system is measured according to its ability to fulfill requests.

Cassandra is highly available. You can replace failed nodes in the cluster with no downtime, and you can replicate data to multiple data centers to offer improved local performance and prevent downtime if one data center experiences a catastrophe such as fire or flood.

The replication factor lets you decide how much you want to pay in performance to gain more consistency. You set the replication factor to the number of nodes in the cluster you want the updates to propagate to (here an update means any add, update, or delete operation).
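
As a rough illustration of what a replication factor means, the sketch below picks that many distinct nodes for each key. It is deliberately simplified and is not how Cassandra places replicas: the function `replicas_for`, the node names, and the modulo-based starting position are assumptions for the example.

```python
import hashlib


def replicas_for(key, nodes, replication_factor):
    """Pick `replication_factor` distinct nodes, walking the node list as a ring."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    # Stable hash of the key decides where on the ring the walk starts.
    start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(replicas_for("user:42", nodes, 3))  # three distinct nodes hold this key
```

With a replication factor of 3, any single node can fail and two copies of the key remain, at the cost of every update being sent to three machines instead of one.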

Tuneable Consistency
Consistency essentially means that a read always returns the most recently written value.
Cassandra is more accurately termed “tuneably consistent,” which means it allows you to easily decide the level of consistency you require, in balance with the level of availability.
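
The trade-off can be sketched with a little arithmetic. ONE, QUORUM, and ALL are real Cassandra consistency levels; the helper functions below are hypothetical names written for this post, and the rule they encode is the usual one for replicated stores: a read is guaranteed to touch at least one replica that has the latest write when the number of replicas written plus the number of replicas read is greater than the replication factor.

```python
def quorum(replication_factor):
    """A majority of the replicas."""
    return replication_factor // 2 + 1


def required_acks(level, replication_factor):
    """Number of replicas that must respond at a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (W + R > RF)."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return w + r > replication_factor


rf = 3
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, reads_see_latest_write(write_level, read_level, rf))
```

With a replication factor of 3, QUORUM writes plus QUORUM reads overlap (2 + 2 > 3), while ONE plus ONE does not (1 + 1 is not greater than 3). The lower levels answer faster and tolerate more failed nodes, which is the balance against availability described above.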

Wednesday, 21 August 2013