January 28, 2016

How to Install Apache Cassandra on Ubuntu 14.04

Want your very own server? Get our 1GB memory, Xeon V4, 25GB SSD VPS for £10.00 / month.
Get a Cloud Server

Overview

Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. You can deploy a Cassandra cluster in a single datacenter or span it across multiple datacenters.

Apache Cassandra was initially developed at Facebook by Avinash Lakshman and Prashant Malik to power the Facebook Inbox Search feature. Lakshman was also one of the authors of Amazon’s Dynamo paper.

Cassandra was released as an open source project on Google Code in July 2008. In March 2009 it became an Apache Incubator project, and on February 17, 2010 it graduated to an Apache top-level project.

Cassandra is named after the prophetess of Greek mythology.

Apache Cassandra is currently released under the Apache License, Version 2.0.

Cassandra’s key features include:

  • Massively scalable architecture – a masterless design where all nodes are the same, which provides operational simplicity and easy scale-out.

  • Active everywhere design – all nodes may be written to and read from.

  • Linear scale performance – the ability to add nodes without going down produces predictable increases in performance.

  • Continuous availability – offers redundancy of both data and node function, which eliminates single points of failure and provides constant uptime.

  • Transparent fault detection and recovery – nodes that fail can easily be restored or replaced.

  • Flexible and dynamic data model – supports modern data types with fast writes and reads.

  • Strong data protection – a commit log design ensures no data loss, and built-in security with backup/restore keeps data protected and safe.

  • Tunable data consistency – support for strong or eventual data consistency across a widely distributed cluster.

  • Multi-data center replication – cross data center (in multiple geographies) and multi-cloud availability zone support for writes/reads.

  • Data compression – data compressed up to 80% without performance overhead.

  • CQL (Cassandra Query Language) – an SQL-like language that makes moving from a relational database very easy.
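The tunable consistency feature is commonly explained with a simple overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) is greater than the replication factor (RF), and a QUORUM is a majority of the replicas, floor(RF/2) + 1. The shell sketch below only does that arithmetic; the function names are hypothetical and nothing here talks to a real cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating tunable consistency arithmetic only.

# quorum RF: size of a majority of RF replicas, floor(RF / 2) + 1.
# Bash integer division already rounds down.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# is_strongly_consistent R W RF: succeed (exit status 0) when the replicas
# consulted on a read (R) plus the replicas that acknowledged a write (W)
# exceed the replication factor, so every read set overlaps every write set.
is_strongly_consistent() {
  (( $1 + $2 > $3 ))
}
```

For example, with RF = 3, `quorum 3` prints 2, so QUORUM reads plus QUORUM writes (2 + 2 > 3) give strong consistency, while ONE/ONE (1 + 1 = 2) is only eventually consistent.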

See the References section at the end of this article to learn more about the basics and details of Cassandra.

Update Base System

Before we install any prerequisites and Cassandra, let’s update the system with the latest available updates by running the commands below.

$ sudo apt-get update
$ sudo apt-get upgrade

Install JDK 8

Apache Cassandra runs on top of the Java Virtual Machine (JVM), so we’ll install Oracle JDK 8 on the system before installing Apache Cassandra. Apache Cassandra can also run on OpenJDK, the IBM JVM, and the Azul Zing JVM.

We will install Oracle JDK using the WebUpd8 Team PPA repository.

First, add the webupd8team PPA repository:

$ sudo add-apt-repository ppa:webupd8team/java
...
Press [ENTER] to continue or ctrl-c to cancel adding it

...
OK

Press Enter to continue adding the webupd8team PPA repository. The output above has been truncated to show only the most important part.
Next, let apt-get download and read the metadata of the webupd8 repository:

$ sudo apt-get update

Install JDK 8.

$ sudo apt-get -y install oracle-java8-installer

The -y option above automatically answers yes when apt-get asks you to confirm the packages to be installed, including dependencies. If you want to review which packages will be installed first, remove the -y option.

On the package configuration screen, choose OK.

When asked to accept the Oracle Binary Code License Terms, choose Yes.

After Java 8 is installed, you can check the current Java version by running the command below:

$ java -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
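If you script this setup, you may want to confirm the Java major version automatically. The sketch below assumes only the output format shown above (java version "1.8.0_66") and the newer format used from Java 9 onward (for example "11.0.2"); the function name java_major_version is hypothetical, not a JDK tool.

```shell
#!/usr/bin/env bash
# java_major_version TEXT: print the major Java version found in `java -version`
# output. Hypothetical helper. Java 8 and earlier report "1.8.0_66"-style
# strings, Java 9+ report "11.0.2"-style strings, so both schemes are handled.
java_major_version() {
  local version
  # Take the quoted string after the word "version" on the first matching line.
  version=$(printf '%s\n' "$1" | sed -n -E 's/.*version "([^"]+)".*/\1/p' | head -n 1)
  case "$version" in
    1.*) printf '%s\n' "$version" | cut -d. -f2 ;;  # legacy: 1.8.0_66 -> 8
    *)   printf '%s\n' "$version" | cut -d. -f1 ;;  # modern: 11.0.2 -> 11
  esac
}
```

Because java -version writes to stderr, call it as `java_major_version "$(java -version 2>&1)"`, which should print 8 for the output above.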

Install Apache Cassandra

Now that the system is up to date and Oracle JDK 8 is installed, we can start installing Apache Cassandra.

First of all, let’s add the DataStax repository key.

$ curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -

Add the DataStax Cassandra repository to a new apt source list:

$ echo "deb http://debian.datastax.com/community stable main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
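Note that tee -a appends, so running the command above twice leaves a duplicate deb line in the source list. A minimal sketch of an idempotent alternative follows; add_line_once is a hypothetical helper name, and because sudo cannot call a shell function directly, it assumes you run it from a root shell (for example after sudo -i).

```shell
#!/usr/bin/env bash
# add_line_once FILE LINE: append LINE to FILE unless an identical line is
# already present, so re-running the setup does not create duplicate entries.
# Hypothetical helper; FILE under /etc requires a root shell.
add_line_once() {
  local file="$1" line="$2"
  touch "$file" || return 1
  # -x: match whole lines, -F: treat LINE as a fixed string, not a regex.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For this tutorial you would call `add_line_once /etc/apt/sources.list.d/cassandra.sources.list "deb http://debian.datastax.com/community stable main"`.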

Make apt-get read the metadata of the Cassandra repository:

$ sudo apt-get update

Now let’s install Cassandra 2.2, the latest stable version of Cassandra at the time of writing. You can check the Planet Cassandra download page for information on the latest stable release of Apache Cassandra.

$ sudo apt-get install dsc22=2.2.3-1 cassandra=2.2.3

Optionally, install the Cassandra utilities:

$ sudo apt-get install cassandra-tools=2.2.3

You can check the status of the Cassandra service using the command below:

$ sudo service cassandra status
 * could not access pidfile for Cassandra

This message is misleading: the Cassandra process is running, but the init script reports that it could not access the Cassandra pidfile. This is due to a bug in the Cassandra init script, which we’ll fix in the next section.

We can also check the cluster status using the nodetool command:

$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns    Host ID                               Rack
UN  127.0.0.1  179.29 KB  256          ?       7cd1bdc4-8bfa-49d9-a453-e0cf83bf956f  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
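As the legend above shows, the first column combines the node Status (U for Up, D for Down) with its State (N, L, J or M for Normal, Leaving, Joining or Moving). A minimal sketch for scripts that need a quick health check, assuming the column layout shown above; count_up_normal is a hypothetical name, not part of nodetool.

```shell
#!/usr/bin/env bash
# count_up_normal: read `nodetool status` output on stdin and print how many
# nodes have the two-letter code UN (Status Up, State Normal).
# Hypothetical helper; relies on the code being the first field of node lines.
count_up_normal() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

On this single-node installation, `nodetool status | count_up_normal` should print 1.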

Let’s try connecting to the Cassandra server using cqlsh:

$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.3 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> quit

We won’t do anything else for now, so just type quit in the cqlsh shell.

Fixing Service Status

On Ubuntu, the init script reports the wrong result when it is used to check the service status. You can confirm this by running the command below:

$ sudo service cassandra status
 * could not access pidfile for Cassandra

The Cassandra team has already acknowledged this bug; see issue CASSANDRA-9822 or Issue 63 on GitHub.
The fix is simple. Open /etc/init.d/cassandra (you need sudo to edit this file) and find the line below (it should be on line 60):

CMD_PATT="cassandra.+CassandraDaemon"

Change it to:

CMD_PATT="cassandra"
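If you prefer not to edit the init script by hand, the same one-line change can be made with sed. This is a sketch assuming the line appears exactly as shown above; fix_cmd_patt is a hypothetical helper that writes the result back into the original file so the script keeps its owner and executable bit. Run it from a root shell.

```shell
#!/usr/bin/env bash
# fix_cmd_patt FILE: replace CMD_PATT="cassandra.+CassandraDaemon" with
# CMD_PATT="cassandra" in FILE. Hypothetical helper; FILE would normally be
# /etc/init.d/cassandra.
fix_cmd_patt() {
  local file="$1" tmp status
  tmp=$(mktemp) || return 1
  # In a basic regex, \. is a literal dot and + is an ordinary character,
  # so this matches the original line literally, keeping any indentation.
  sed 's/^\([[:space:]]*\)CMD_PATT="cassandra\.+CassandraDaemon"/\1CMD_PATT="cassandra"/' \
    "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # write back in place, keeping owner and mode
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Running it a second time is harmless, because the pattern no longer matches once the line has been changed.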

After you change this line, checking the service status should return the correct status:

$ sudo service cassandra status
 * Cassandra is running

Configure the Apache Cassandra Cluster Name

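The default cluster name for Cassandra is Test Cluster. In this section, we’ll change it to something else. First, stop Cassandra and delete the system keyspace data, where the current cluster name is stored; otherwise Cassandra refuses to start because the saved name no longer matches the configured one.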

$ sudo service cassandra stop
$ sudo rm -rf /var/lib/cassandra/data/system/*

Next, edit the Cassandra configuration. Open /etc/cassandra/cassandra.yaml (again using sudo) and find the line below:

cluster_name: 'Test Cluster'

Change it to the cluster name you want to use. In this example we change the cluster name to HostPresto Cluster:

cluster_name: 'HostPresto Cluster'
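This edit can also be scripted. A minimal sketch, assuming cluster_name sits at the start of its line as in the default cassandra.yaml; set_cluster_name is a hypothetical helper, and it rejects names containing ', /, & or \ because those would break either the single-quoted YAML value or the sed expression.

```shell
#!/usr/bin/env bash
# set_cluster_name FILE NAME: rewrite the cluster_name line of a cassandra.yaml
# file. Hypothetical helper; FILE would normally be /etc/cassandra/cassandra.yaml.
set_cluster_name() {
  local file="$1" name="$2" tmp status
  # Reject characters that would break the quoted YAML value or the sed script.
  case "$name" in
    *"'"* | */* | *"&"* | *"\\"*)
      echo "unsupported character in cluster name: $name" >&2
      return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  sed "s/^cluster_name:.*/cluster_name: '${name}'/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # write back in place, keeping owner and permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

From a root shell you would run `set_cluster_name /etc/cassandra/cassandra.yaml 'HostPresto Cluster'` while Cassandra is stopped and after its system data has been removed, as described above.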

After changing the cluster name, start Cassandra and check the service status:

$ sudo service cassandra start
$ sudo service cassandra status
 * Cassandra is running

To make sure the cluster name has changed, connect again using cqlsh:

$ cqlsh
Connected to HostPresto Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.3 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> 

As shown above, cqlsh is now connected to HostPresto Cluster.

Using Apache Cassandra

Let’s try out our Cassandra installation by creating a movie database. First, let’s create a keyspace, which is a namespace for tables. The keyspace below is named moviedb.

cqlsh> CREATE KEYSPACE moviedb
   ... WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

Use the keyspace that we just created.

cqlsh> use moviedb;

Create the movies table.

cqlsh:moviedb> CREATE TABLE movies (
           ...  id int PRIMARY KEY,
           ...  title text,
           ...  year text
           ... );

Let's describe the table that we just created:

cqlsh:moviedb> DESC movies;

CREATE TABLE moviedb.movies (
    id int PRIMARY KEY,
    title text,
    year text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

The table is ready; now it’s time to add some data.

cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (1, 'Birdman','2014');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (2, '12 Years a Slave','2013');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (3, 'Argo','2012');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (4, 'The Artist','2011');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (5, 'The King''s Speech','2010');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (6, 'The Hurt Locker','2009');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (7, 'Slumdog Millionaire','2008');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (8, 'No Country for Old Men','2007');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (9, 'The Departed','2006');
cqlsh:moviedb> INSERT INTO movies (id,title,year) VALUES (10, 'Crash','2005');
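Note the doubled quote in 'The King''s Speech': CQL escapes a single quote inside a string literal by writing it twice. If you generate statements like these from a shell script, a small helper can apply that rule for you. This is a hypothetical sketch (cql_quote and movie_insert are not part of Cassandra) that keeps year as text, matching the table definition above.

```shell
#!/usr/bin/env bash
# cql_quote VALUE: print VALUE as a CQL string literal, doubling any embedded
# single quotes. Hypothetical helper.
cql_quote() {
  local q="'"
  local s="${1//$q/$q$q}"   # replace every ' with ''
  printf "'%s'" "$s"
}

# movie_insert ID TITLE YEAR: print one INSERT statement for the movies table.
# Hypothetical helper; year is quoted because the column type is text.
movie_insert() {
  printf 'INSERT INTO movies (id, title, year) VALUES (%d, %s, %s);\n' \
    "$1" "$(cql_quote "$2")" "$(cql_quote "$3")"
}
```

For example, `movie_insert 5 "The King's Speech" 2010` prints a statement equivalent to the fifth INSERT above; you could redirect several such lines into a file and run it with `cqlsh -k moviedb -f FILE`.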

Let's see everything that we added to the table using SELECT:

cqlsh:moviedb> SELECT * from movies;

 id | title                  | year
----+------------------------+------
  5 |      The King's Speech | 2010
 10 |                  Crash | 2005
  1 |                Birdman | 2014
  8 | No Country for Old Men | 2007
  2 |       12 Years a Slave | 2013
  4 |             The Artist | 2011
  7 |    Slumdog Millionaire | 2008
  6 |        The Hurt Locker | 2009
  9 |           The Departed | 2006
  3 |                   Argo | 2012

(10 rows)


Because title is not part of the primary key, we need to create an index on the title column before we can search by title.

cqlsh:moviedb> CREATE INDEX on movies (title);
cqlsh:moviedb> SELECT * FROM movies WHERE title = 'Argo';

 id | title | year
----+-------+------
  3 |  Argo | 2012

(1 rows)
cqlsh:moviedb>
That covers the basic usage of Apache Cassandra.

Try Cassandra Online

If you want to try Cassandra without installing it on your computer or server, you can use Try Cassandra.

References

Summary

In this tutorial we learned how to install Apache Cassandra from the DataStax repository, apply some basic configuration, and use Cassandra’s basic features. Now you can start exploring this single-node installation of Apache Cassandra. Have fun!

