
Insert data into Cassandra using the Hector API

In this article, I will show how to insert data into Cassandra with the Hector API.
We start with a simple use case that helps to understand the basic Cassandra terms “Column Family” and “Row Key”,
look at how the data is stored in the database and how to create a Cassandra data model for the use case,
and then write a simple Java program that inserts the data into a Cassandra database.

Tools Used :

1) Apache-cassandra-2.1.6
2) Eclipse Luna 4.4.1
3) Maven 3.3.3
4) JDK 1.6 or above

The Cassandra data model depends on the search criteria,
that is, on how the end user queries the data.

Simple Use Case :

The use case is searching for a property by different search criteria:
City, State, Owner and Description.

The available properties are P-1 to P-7; their values are listed in the
Property column family further below.

Here we will save Property as a column family. For understanding, we can think of
a column family as a table in a traditional RDBMS.

In a NoSQL database like Cassandra, the data is stored as key-value pairs.
Here, the keys of the individual properties are P-1, P-2, P-3, P-4, P-5, P-6
and P-7 (we call these keys “Row Keys”). Each key maps to a value, which is a list
of columns (City, State, Owner and Description).

Each column is again a key-value pair, with the column name as the key.
Here the keys are City, State, Owner and Description, and the corresponding values for P-1 are
C1, S1, O1 and “2BHK, 1200 sq ft”.

This data will be stored as key-value pairs, something like this:

ColumnFamily: Property
We'll store all the Property data here.

Row Key => P-1, P-2 ... P-7 (implies that the property id must be unique)
Column Name: An attribute of the entry (City, State, Owner, Description)
Column Value: Value of the associated attribute

Access: get Property by property_id (get all columns from a specific row)

Property : { // CF
	P-1 : { // row key
		City: C1,
		State: S1,
		Owner: O1,
		Description: "2BHK, 1200 sq ft"
	},
	P-2 : { // row key
		City: C2,
		State: S2,
		Owner: O2,
		Description: "2BHK, 1000 sq ft"
	},
	P-3 : { // row key
		City: C3,
		State: S1,
		Owner: O3,
		Description: "1BHK, 800 sq ft"
	},
	P-4 : { // row key
		City: C4,
		State: S3,
		Owner: O4,
		Description: "1BHK, 750 sq ft"
	},
	P-5 : { // row key
		City: C2,
		State: S4,
		Owner: O1,
		Description: "3BHK, 1450 sq ft"
	},
	P-6 : { // row key
		City: C3,
		State: S1,
		Owner: O5,
		Description: "2BHK, 950 sq ft"
	},
	P-7 : { // row key
		City: C1,
		State: S2,
		Owner: O5,
		Description: "3BHK, 1750 sq ft"
	},
	..........
}
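
To make the shape of this model concrete, here is a minimal in-memory sketch in plain Java. It is only an illustration of "row key maps to a list of columns" and says nothing about how Cassandra stores rows internally. The class name PropertyModelSketch and its helper methods are made up for this example, the column names follow the insert program (City, State, Owner, Description), and Java 8 or later is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory stand-in for the "Property" column family:
// row key -> (column name -> column value).
class PropertyModelSketch {

    // Builds the columns of one row.
    static Map<String, String> row(String city, String state, String owner, String description) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("City", city);
        columns.put("State", state);
        columns.put("Owner", owner);
        columns.put("Description", description);
        return columns;
    }

    // Two of the sample rows from the listing above.
    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> property = new LinkedHashMap<>();
        property.put("P-1", row("C1", "S1", "O1", "2BHK, 1200 sq ft"));
        property.put("P-2", row("C2", "S2", "O2", "2BHK, 1000 sq ft"));
        return property;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> property = sampleRows();
        // "get Property by property_id": one lookup by row key returns all columns of that row.
        System.out.println(property.get("P-1"));
        // A single column is a second key-value lookup inside the row.
        System.out.println(property.get("P-2").get("Owner")); // O2
    }
}
```

The point of the sketch is the access pattern: everything is reached through the row key first, and only then through the column name.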

How the end user queries the database :

There are multiple search criteria by which the end user can query for a property, for example:
1) Get all properties in City C1.
2) Get all properties in City C3 and State S1.
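
As a rough illustration of these two queries, the sketch below answers them against an in-memory copy of the sample data. It is plain Java, not Hector or Cassandra code, and the class PropertyQuerySketch and its find() method are invented for this example. Notice that with rows keyed only by property id, the sketch has to look at every row to answer a query by City, which is one reason the data model should be designed around the queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PropertyQuerySketch {

    // Row key -> {City, State}; values copied from the sample data
    // (Owner and Description are left out to keep the sketch short).
    static Map<String, String[]> rows() {
        Map<String, String[]> rows = new LinkedHashMap<>();
        rows.put("P-1", new String[] {"C1", "S1"});
        rows.put("P-2", new String[] {"C2", "S2"});
        rows.put("P-3", new String[] {"C3", "S1"});
        rows.put("P-4", new String[] {"C4", "S3"});
        rows.put("P-5", new String[] {"C2", "S4"});
        rows.put("P-6", new String[] {"C3", "S1"});
        rows.put("P-7", new String[] {"C1", "S2"});
        return rows;
    }

    // Returns the row keys whose City matches; a null state means "any state".
    static List<String> find(String city, String state) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : rows().entrySet()) {
            String[] columns = entry.getValue();
            boolean cityMatches = columns[0].equals(city);
            boolean stateMatches = state == null || columns[1].equals(state);
            if (cityMatches && stateMatches) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(find("C1", null)); // [P-1, P-7]
        System.out.println(find("C3", "S1")); // [P-3, P-6]
    }
}
```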

Steps to write a Java program to insert data :

1) Create a simple Maven project.

2) Add the dependencies.

3) Write a simple program to insert data into the Cassandra database.

4) Start the Cassandra server.

5) Run the program and verify the data in Cassandra.

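Add the following dependency for the Hector API to pom.xml:

	<dependency>
		<groupId>me.prettyprint</groupId>
		<artifactId>hector-core</artifactId>
		<version>0.8.0-2</version>
	</dependency>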

Write a simple program to insert data into Cassandra Database :

This program will do the following,

1) Create a Cluster object.

Cluster cluster = null;
cluster = HFactory.getOrCreateCluster( "Cassandra DB Operations Cluster", "localHost:9160" );

HFactory is the Hector convenience class with a bunch of static methods.
getOrCreateCluster() is a static method that returns a Cluster instance for an
existing Cassandra cluster.

If another class has already called getOrCreateCluster() with the same cluster name, the factory returns the cached instance.
If the instance doesn’t exist in memory, a new ThriftCluster is created and cached.

This method expects two parameters:
(a) clusterName – This should be a unique name (we should not have two clusters with the same name);
it is used as the key under which the Cluster object is stored in the map of clusters.
(b) hostIp – From the provided hostIp value, the method internally creates a CassandraHostConfigurator
instance and passes it on as the second parameter.
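
The get-or-create behaviour described above can be pictured with a small cache keyed by cluster name. This is a sketch of the general pattern in plain Java, not Hector's actual implementation; the names ClusterCacheSketch and FakeCluster are invented for the example, and Java 8 or later is assumed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ClusterCacheSketch {

    // Hypothetical stand-in for a cluster handle; this is not Hector's Cluster type.
    static final class FakeCluster {
        final String name;
        final String hosts;

        FakeCluster(String name, String hosts) {
            this.name = name;
            this.hosts = hosts;
        }
    }

    // Cluster name -> cached instance.
    private static final Map<String, FakeCluster> CLUSTERS = new ConcurrentHashMap<>();

    // Returns the cached instance for the name, creating it only on the first call.
    static FakeCluster getOrCreate(String clusterName, String hosts) {
        return CLUSTERS.computeIfAbsent(clusterName, name -> new FakeCluster(name, hosts));
    }

    public static void main(String[] args) {
        FakeCluster first = getOrCreate("Cassandra DB Operations Cluster", "localhost:9160");
        FakeCluster second = getOrCreate("Cassandra DB Operations Cluster", "otherhost:9160");
        System.out.println(first == second); // true
        System.out.println(second.hosts);    // localhost:9160
    }
}
```

In this sketch the second call gets the first instance back, so the host string passed the second time is ignored. That is why the cluster name needs to be unique.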

2) Create or use an existing keyspace

    Keyspace keySpace = HFactory.createKeyspace("devjavasource", cluster);

Here “devjavasource” is an existing keyspace, and I am using it.
The createKeyspace() static method in the HFactory class creates a Keyspace object (a client-side
handle; the keyspace itself must already exist in Cassandra) with the default consistency level
policy and the default failover policy (ON_FAIL_TRY_ALL_AVAILABLE).

ON_FAIL_TRY_ALL_AVAILABLE is a failover policy (Hector's FailoverPolicy class), not a consistency level.
A failover policy answers this question:

What should the client do if a call to a Cassandra node fails
and we suspect that the node is down
(e.g. it’s a communication error, not an application error)?

There are three different failover policies:

(a) FAIL_FAST – On communication failure, just return the error to the client and don’t retry.

(b) ON_FAIL_TRY_ONE_NEXT_AVAILABLE – On communication error, try one more server before giving up.

(c) ON_FAIL_TRY_ALL_AVAILABLE – On communication error, try all known servers before giving up.
With this policy the client gets a communication failure only if all known nodes are down,
which helps in building robust applications.
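
The difference between the three options is easiest to see in a small simulation. The sketch below is plain Java and only imitates the retry behaviour described above; it is not Hector's code. The class FailoverSketch, its Policy enum and the execute() method are invented for this example, and "down" hosts are simply names in a set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FailoverSketch {

    // Hypothetical policies mirroring the three options above; the value is the
    // number of additional hosts to try after the first failure.
    enum Policy {
        FAIL_FAST(0),
        ON_FAIL_TRY_ONE_NEXT_AVAILABLE(1),
        ON_FAIL_TRY_ALL_AVAILABLE(Integer.MAX_VALUE);

        final int retries;

        Policy(int retries) {
            this.retries = retries;
        }
    }

    // Returns the host that served the request, or null if every allowed attempt failed.
    static String execute(List<String> hosts, Set<String> downHosts, Policy policy) {
        // First attempt plus the allowed retries, capped at the number of known hosts.
        int attempts = (int) Math.min((long) policy.retries + 1, hosts.size());
        for (int i = 0; i < attempts; i++) {
            String host = hosts.get(i);
            if (!downHosts.contains(host)) {
                return host;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("node1", "node2", "node3");
        Set<String> down = new HashSet<>(Arrays.asList("node1", "node2"));
        System.out.println(execute(hosts, down, Policy.FAIL_FAST));                      // null
        System.out.println(execute(hosts, down, Policy.ON_FAIL_TRY_ONE_NEXT_AVAILABLE)); // null
        System.out.println(execute(hosts, down, Policy.ON_FAIL_TRY_ALL_AVAILABLE));      // node3
    }
}
```

With two of the three nodes down, only the "try all" option reaches the node that is still up.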

3) Create a Column Family Definition :

A simple way to do this is to define a column family and add it to the cluster.

// Define a column family
ColumnFamilyDefinition cfDefinition = HFactory.createColumnFamilyDefinition(
		"devjavasource", "property",
		ComparatorType.UTF8TYPE);

// Add the column family to the cluster
cluster.addColumnFamily(cfDefinition);

Here createColumnFamilyDefinition() is a static method in the HFactory class.
It takes three arguments, but the third argument, the ComparatorType value, is optional.

If we do not provide the third argument, the default value
ComparatorType.BYTESTYPE is used.

This creates a column family definition for the given keyspace
with the default configuration:

// Default column family definition
columnMetadata = Collections.emptyList();
columnType = ColumnType.STANDARD;
comparatorType = ComparatorType.BYTESTYPE;
readRepairChance = CFMetaData.DEFAULT_READ_REPAIR_CHANCE;
keyCacheSize = CFMetaData.DEFAULT_KEY_CACHE_SIZE;
keyCacheSavePeriodInSeconds = CFMetaData.DEFAULT_KEY_CACHE_SAVE_PERIOD_IN_SECONDS;
gcGraceSeconds = CFMetaData.DEFAULT_GC_GRACE_SECONDS;
minCompactionThreshold = CFMetaData.DEFAULT_MIN_COMPACTION_THRESHOLD;
maxCompactionThreshold = CFMetaData.DEFAULT_MAX_COMPACTION_THRESHOLD;
memtableFlushAfterMins = CFMetaData.DEFAULT_MEMTABLE_LIFETIME_IN_MINS;
memtableThroughputInMb = CFMetaData.DEFAULT_MEMTABLE_THROUGHPUT_IN_MB;
memtableOperationsInMillions = CFMetaData.DEFAULT_MEMTABLE_OPERATIONS_IN_MILLIONS;
replicateOnWrite = CFMetaData.DEFAULT_REPLICATE_ON_WRITE;
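
To give a feeling for what the comparator type is for: it decides how the column names inside a row are ordered. The sketch below imitates that with a sorted map in plain Java. It uses Java's natural String ordering as a stand-in for a UTF-8 string comparator, which is an assumption that holds for plain ASCII names like ours, and the class ColumnOrderSketch is invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ColumnOrderSketch {

    // Returns the column names of one row in the order a string comparator keeps them.
    static List<String> sortedColumnNames() {
        TreeMap<String, String> row = new TreeMap<>(); // sorted by column name
        // Inserted in the same order as in the insert program ...
        row.put("City", "C1");
        row.put("State", "S1");
        row.put("Owner", "O1");
        row.put("Description", "2BHK, 1200 sq ft");
        // ... but kept in comparator order.
        return new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        System.out.println(sortedColumnNames()); // [City, Description, Owner, State]
    }
}
```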

4) Insert a key and a list of columns into the column family.

The syntax is:

Mutator<String> mutator = HFactory.createMutator(keySpace, SE);
mutator.insert("key-name", columnFamilyName, 
		HFactory.createStringColumn("column-name", "column-value"));

Here SE is the StringSerializer instance defined in the complete program below.
A Mutator inserts or deletes values from the cluster.

There are two main ways to use a mutator:

1) Use the insert/delete methods to immediately insert or delete values.

2) Use the addInsertion/addDeletion methods to schedule batch operations
and then execute() all of them in one batch. This is for batch updates.

Note that the Mutator class is not thread-safe, so do not share one instance between threads.
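
The difference between the two styles can be shown with a small in-memory stand-in. The class below is NOT Hector's Mutator. It is a made-up BatchMutatorSketch in plain Java that borrows the method names insert, addInsertion and execute only to show when a write becomes visible. The columnCount() helper is also invented for the example, and Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in that mimics the two usage styles; not Hector's Mutator.
class BatchMutatorSketch {

    // Row key -> (column name -> value): the "stored" data.
    private final Map<String, Map<String, String>> store = new LinkedHashMap<>();
    // Queued {rowKey, column, value} triples that have not been applied yet.
    private final List<String[]> pending = new ArrayList<>();

    // Immediate style: the value is written as soon as the call returns.
    void insert(String rowKey, String column, String value) {
        store.computeIfAbsent(rowKey, key -> new LinkedHashMap<>()).put(column, value);
    }

    // Batch style: only queue the change; nothing is written yet.
    BatchMutatorSketch addInsertion(String rowKey, String column, String value) {
        pending.add(new String[] {rowKey, column, value});
        return this;
    }

    // Applies every queued change, clears the queue and returns how many were applied.
    int execute() {
        int applied = pending.size();
        for (String[] change : pending) {
            insert(change[0], change[1], change[2]);
        }
        pending.clear();
        return applied;
    }

    // Number of columns currently stored for the row (0 if the row is unknown).
    int columnCount(String rowKey) {
        Map<String, String> columns = store.get(rowKey);
        return columns == null ? 0 : columns.size();
    }

    public static void main(String[] args) {
        BatchMutatorSketch mutator = new BatchMutatorSketch();
        mutator.addInsertion("P-1", "City", "C1").addInsertion("P-1", "State", "S1");
        System.out.println(mutator.columnCount("P-1")); // 0, nothing applied yet
        System.out.println(mutator.execute());          // 2
        System.out.println(mutator.columnCount("P-1")); // 2
    }
}
```

In the real program a batch would save round trips to the server, since the four columns of a property could be sent together instead of in four separate insert calls.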

The complete source code is here:
App.java

package com.devjavasource.cassandra.CassandraDbService;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class App {
	// Serializer for the String row keys
	private static final StringSerializer SE = StringSerializer.get();

	public static void main(String[] args) {
		Cluster cluster = null;
		try {
			cluster = HFactory.getOrCreateCluster(
					"production", "localHost:9160");
			// If the keyspace does not exist, you have to create one with the name
			// "devjavasource" first
			Keyspace keySpace = HFactory.createKeyspace("devjavasource", cluster);

			// Define a column family
			ColumnFamilyDefinition cfDefinition = HFactory.createColumnFamilyDefinition(
					"devjavasource", "property",
					ComparatorType.UTF8TYPE);
			// Add the column family to the cluster
			// (expect an error here if "property" already exists, e.g. on a second run)
			cluster.addColumnFamily(cfDefinition);

			System.out.println("Insert Data into Cassandra Database with Hector Api ...");
			System.out.println("========================================================");
			// Insert data into the Cassandra database
			final String columnFamilyName = cfDefinition.getName();
			Mutator<String> mutator = HFactory.createMutator(keySpace, SE);
			// Insert property P-1 -> C1, S1, O1, 2BHK, 1200 sq ft
			mutator.insert("P-1", columnFamilyName, HFactory.createStringColumn("City", "C1"));
			mutator.insert("P-1", columnFamilyName, HFactory.createStringColumn("State", "S1"));
			mutator.insert("P-1", columnFamilyName, HFactory.createStringColumn("Owner", "O1"));
			mutator.insert("P-1", columnFamilyName, HFactory.createStringColumn("Description", "2BHK, 1200 sq ft"));
			System.out.println("Propert with key P-1 is inserted...");
			// Insert property P-2 -> C2, S2, O2, 2BHK, 1000 sq ft
			mutator.insert("P-2", columnFamilyName, HFactory.createStringColumn("City", "C2"));
			mutator.insert("P-2", columnFamilyName, HFactory.createStringColumn("State", "S2"));
			mutator.insert("P-2", columnFamilyName, HFactory.createStringColumn("Owner", "O2"));
			mutator.insert("P-2", columnFamilyName, HFactory.createStringColumn("Description", "2BHK, 1000 sq ft"));
			System.out.println("Propert with key P-2 is inserted...");
			// Insert property P-3 -> C3, S1, O3, 1BHK, 800 sq ft
			mutator.insert("P-3", columnFamilyName, HFactory.createStringColumn("City", "C3"));
			mutator.insert("P-3", columnFamilyName, HFactory.createStringColumn("State", "S1"));
			mutator.insert("P-3", columnFamilyName, HFactory.createStringColumn("Owner", "O3"));
			mutator.insert("P-3", columnFamilyName, HFactory.createStringColumn("Description", "1BHK, 800 sq ft"));
			System.out.println("Propert with key P-3 is inserted...");
			// Insert property P-4 -> C4, S3, O4, 1BHK, 750 sq ft
			mutator.insert("P-4", columnFamilyName, HFactory.createStringColumn("City", "C4"));
			mutator.insert("P-4", columnFamilyName, HFactory.createStringColumn("State", "S3"));
			mutator.insert("P-4", columnFamilyName, HFactory.createStringColumn("Owner", "O4"));
			mutator.insert("P-4", columnFamilyName, HFactory.createStringColumn("Description", "1BHK, 750 sq ft"));
			System.out.println("Propert with key P-4 is inserted...");
			// Insert property P-5 -> C2, S4, O1, 3BHK, 1450 sq ft
			mutator.insert("P-5", columnFamilyName, HFactory.createStringColumn("City", "C2"));
			mutator.insert("P-5", columnFamilyName, HFactory.createStringColumn("State", "S4"));
			mutator.insert("P-5", columnFamilyName, HFactory.createStringColumn("Owner", "O1"));
			mutator.insert("P-5", columnFamilyName, HFactory.createStringColumn("Description", "3BHK, 1450 sq ft"));
			System.out.println("Propert with key P-5 is inserted...");
			// Insert property P-6 -> C3, S1, O5, 2BHK, 950 sq ft
			mutator.insert("P-6", columnFamilyName, HFactory.createStringColumn("City", "C3"));
			mutator.insert("P-6", columnFamilyName, HFactory.createStringColumn("State", "S1"));
			mutator.insert("P-6", columnFamilyName, HFactory.createStringColumn("Owner", "O5"));
			mutator.insert("P-6", columnFamilyName, HFactory.createStringColumn("Description", "2BHK, 950 sq ft"));
			System.out.println("Propert with key P-6 is inserted...");
			// Insert property P-7 -> C1, S2, O5, 3BHK, 1750 sq ft
			mutator.insert("P-7", columnFamilyName, HFactory.createStringColumn("City", "C1"));
			mutator.insert("P-7", columnFamilyName, HFactory.createStringColumn("State", "S2"));
			mutator.insert("P-7", columnFamilyName, HFactory.createStringColumn("Owner", "O5"));
			mutator.insert("P-7", columnFamilyName, HFactory.createStringColumn("Description", "3BHK, 1750 sq ft"));
			System.out.println("Propert with key P-7 is inserted...");

			System.out.println("All Properties are inserted...");
		} catch (Exception exp) {
			exp.printStackTrace();
		} finally {
			// cluster is still null if getOrCreateCluster() itself failed
			if (cluster != null) {
				cluster.getConnectionManager().shutdown();
			}
		}
	}
}

Start the Cassandra server :

The Cassandra server should be up and running.
If the server is not running, start it using the following command.

The command to start the Cassandra server is:
C:\apache-cassandra-2.1.6\bin>cassandra.bat -f

Run the Maven project :

Select App.java and choose Run As -> Java Application.

Output :

Insert Data into Cassandra Database with Hector Api ...
========================================================
Propert with key P-1 is inserted...
Propert with key P-2 is inserted...
Propert with key P-3 is inserted...
Propert with key P-4 is inserted...
Propert with key P-5 is inserted...
Propert with key P-6 is inserted...
Propert with key P-7 is inserted...
All Properties are inserted...

You can download the complete project here:

CassandraDbService

*** Venkat – Happy learning ****