Program 1:
Implement the following data structures in Java:
a) Lists
b) Stacks
c) Queues
sol:
a) List
(i) ArrayList
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;

public class ArrayListExample {
    public static void main(String[] args) {
        // Creating an ArrayList
        ArrayList<String> list = new ArrayList<>();

        // Adding elements
        list.add("Apple");
        list.add("Banana");
        list.add("Cherry");
        list.add("Mango");
        System.out.println("Initial List: " + list);

        // Accessing an element
        System.out.println("Element at index 2: " + list.get(2));

        // Updating an element
        list.set(1, "Blueberry");
        System.out.println("After updating index 1: " + list);

        // Removing an element
        list.remove("Mango");
        System.out.println("After removing 'Mango': " + list);

        // Checking if an element exists
        System.out.println("Contains 'Apple'? " + list.contains("Apple"));

        // Sorting the list
        Collections.sort(list);
        System.out.println("Sorted List: " + list);

        // Iterating using for-each loop
        System.out.println("Iterating using for-each loop:");
        for (String item : list) {
            System.out.println(item);
        }

        // Iterating using Iterator
        System.out.println("Iterating using Iterator:");
        Iterator<String> iterator = list.iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next());
        }

        // Getting size of the list
        System.out.println("Size of list: " + list.size());

        // Clearing the list
        list.clear();
        System.out.println("After clearing: " + list);
    }
}
Output:
Initial List: [Apple, Banana, Cherry, Mango]
Element at index 2: Cherry
After updating index 1: [Apple, Blueberry, Cherry, Mango]
After removing 'Mango': [Apple, Blueberry, Cherry]
Contains 'Apple'? true
Sorted List: [Apple, Blueberry, Cherry]
Iterating using for-each loop:
Apple
Blueberry
Cherry
Iterating using Iterator:
Apple
Blueberry
Cherry
Size of list: 3
After clearing: []
(ii) LINKED LIST
import java.util.LinkedList;

public class LinkedListExample {
    public static void main(String[] args) {
        System.out.println("\nLinkedList Example:");
        LinkedList<String> list = new LinkedList<>();

        // Adding elements
        list.add("A");
        list.add("B");
        list.add("C");
        list.add("D");
        System.out.println("Initial LinkedList: " + list);

        // Adding elements at first and last positions
        list.addFirst("Start");
        list.addLast("End");
        System.out.println("After adding at first and last: " + list);

        // Accessing elements
        System.out.println("First Element: " + list.getFirst());
        System.out.println("Last Element: " + list.getLast());

        // Removing elements
        System.out.println("Removed First: " + list.removeFirst());
        System.out.println("Removed Last: " + list.removeLast());
        System.out.println("LinkedList after removals: " + list);

        // Checking if an element exists
        System.out.println("Contains 'B'? " + list.contains("B"));

        // Getting size
        System.out.println("Size of LinkedList: " + list.size());

        // Iterating through the LinkedList
        System.out.println("Iterating through LinkedList:");
        for (String item : list) {
            System.out.println(item);
        }

        // Clearing the LinkedList
        list.clear();
        System.out.println("LinkedList after clearing: " + list);
    }
}
Output:
LinkedList Example:
Initial LinkedList: [A, B, C, D]
After adding at first and last: [Start, A, B, C, D, End]
First Element: Start
Last Element: End
Removed First: Start
Removed Last: End
LinkedList after removals: [A, B, C, D]
Contains 'B'? true
Size of LinkedList: 4
Iterating through LinkedList:
A
B
C
D
LinkedList after clearing: []
(iii) VECTOR
import java.util.Vector;

public class VectorExample {
    public static void main(String[] args) {
        System.out.println("\nVector Example:");
        Vector<Integer> vector = new Vector<>();

        // Adding elements
        vector.add(10);
        vector.add(20);
        vector.add(30);
        vector.add(40);
        System.out.println("Initial Vector: " + vector);

        // Adding at a specific index
        vector.add(1, 15);
        System.out.println("After adding 15 at index 1: " + vector);

        // Replacing an element
        vector.set(2, 25);
        System.out.println("After updating index 2: " + vector);

        // Removing elements
        System.out.println("Removed Element: " + vector.remove(0));
        System.out.println("Vector after removals: " + vector);

        // Checking if an element exists
        System.out.println("Contains 20? " + vector.contains(20));

        // Getting an element
        System.out.println("Element at index 1: " + vector.get(1));

        // Getting size and capacity
        System.out.println("Size: " + vector.size());
        System.out.println("Capacity: " + vector.capacity());

        // Iterating through the Vector
        System.out.println("Iterating through Vector:");
        for (Integer num : vector) {
            System.out.println(num);
        }

        // Clearing the Vector
        vector.clear();
        System.out.println("Vector after clearing: " + vector);
    }
}
Output:
Vector Example:
Initial Vector: [10, 20, 30, 40]
After adding 15 at index 1: [10, 15, 20, 30, 40]
After updating index 2: [10, 15, 25, 30, 40]
Removed Element: 10
Vector after removals: [15, 25, 30, 40]
Contains 20? false
Element at index 1: 25
Size: 4
Capacity: 10
Iterating through Vector:
15
25
30
40
Vector after clearing: []
b) STACK
import java.util.Stack;

public class StackExample {
    public static void main(String[] args) {
        // Creating a Stack
        Stack<String> stack = new Stack<>();

        // PUSH operation (adding elements)
        stack.push("Apple");
        stack.push("Banana");
        stack.push("Cherry");
        System.out.println("Stack after push: " + stack);

        // PEEK operation (view top element)
        System.out.println("Top element (peek): " + stack.peek());

        // POP operation (removing top element)
        System.out.println("Popped element: " + stack.pop());
        System.out.println("Stack after pop: " + stack);

        // SEARCH operation (find position of element)
        int position = stack.search("Apple"); // Returns the 1-based position from the top of the stack
        System.out.println("Position of 'Apple': " + position);

        // CHECK if Stack is empty
        System.out.println("Is stack empty? " + stack.isEmpty());
    }
}
Output:
Stack after push: [Apple, Banana, Cherry]
Top element (peek): Cherry
Popped element: Cherry
Stack after pop: [Apple, Banana]
Position of 'Apple': 2
Is stack empty? false
c) QUEUE
(i) PRIORITY QUEUE
import java.util.PriorityQueue;

public class PriorityQueueExample {
    public static void main(String[] args) {
        System.out.println("\nPriorityQueue Example:");
        PriorityQueue<Integer> pq = new PriorityQueue<>();

        // Adding elements
        pq.add(30);
        pq.add(10);
        pq.add(20);
        pq.add(40);
        System.out.println("Initial PriorityQueue: " + pq);

        // Accessing the head element
        System.out.println("Peek (Head Element): " + pq.peek());

        // Removing elements
        System.out.println("Poll (Removing Head): " + pq.poll());
        System.out.println("PriorityQueue after poll: " + pq);

        // Checking if an element exists
        System.out.println("Contains 20? " + pq.contains(20));

        // Getting size
        System.out.println("Size of PriorityQueue: " + pq.size());

        // Iterating through the PriorityQueue
        System.out.println("Iterating through PriorityQueue:");
        for (Integer num : pq) {
            System.out.println(num);
        }

        // Clearing the PriorityQueue
        pq.clear();
        System.out.println("PriorityQueue after clearing: " + pq);
    }
}
Output:
PriorityQueue Example:
Initial PriorityQueue: [10, 30, 20, 40]
Peek (Head Element): 10
Poll (Removing Head): 10
PriorityQueue after poll: [20, 30, 40]
Contains 20? true
Size of PriorityQueue: 3
Iterating through PriorityQueue:
20
30
40
PriorityQueue after clearing: []
(ii) DEQUE
import java.util.Deque;
import java.util.ArrayDeque;

public class DequeExample {
    public static void main(String[] args) {
        System.out.println("\nDeque Example:");
        Deque<String> deque = new ArrayDeque<>();
        deque.add("A");
        deque.addFirst("Start");
        deque.addLast("End");
        deque.add("B");
        System.out.println("Deque: " + deque);
        System.out.println("First Element: " + deque.getFirst());
        System.out.println("Last Element: " + deque.getLast());
        System.out.println("Removed First: " + deque.removeFirst());
        System.out.println("Removed Last: " + deque.removeLast());
        System.out.println("Deque after removals: " + deque);
        System.out.println("Contains 'A'? " + deque.contains("A"));
        System.out.println("Size: " + deque.size());
        for (String item : deque) {
            System.out.println(item);
        }
        deque.clear();
        System.out.println("Deque after clearing: " + deque);
    }
}
Output:
Deque Example:
Deque: [Start, A, End, B]
First Element: Start
Last Element: B
Removed First: Start
Removed Last: B
Deque after removals: [A, End]
Contains 'A'? true
Size: 2
A
End
Deque after clearing: []
(iii) ArrayDeque
import java.util.ArrayDeque;

public class ArrayDequeExample {
    public static void main(String[] args) {
        // Creating an ArrayDeque
        ArrayDeque<String> deque = new ArrayDeque<>();

        // Adding elements at the end
        deque.add("Apple");
        deque.add("Banana");
        deque.add("Cherry");

        // Adding an element at the front
        deque.addFirst("Mango");
        // Adding an element at the end
        deque.addLast("Orange");

        // Printing the deque
        System.out.println("Deque after additions: " + deque);

        // Removing elements
        deque.removeFirst(); // Removes "Mango"
        deque.removeLast();  // Removes "Orange"

        // Printing the deque after removals
        System.out.println("Deque after removals: " + deque);

        // Accessing elements
        System.out.println("First element: " + deque.getFirst());
        System.out.println("Last element: " + deque.getLast());
    }
}
Output:
Deque after additions: [Mango, Apple, Banana, Cherry, Orange]
Deque after removals: [Apple, Banana, Cherry]
First element: Apple
Last element: Cherry
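Part (c) above covers PriorityQueue and Deque; a plain FIFO queue can also be obtained through the java.util.Queue interface, with LinkedList as one common implementation. A minimal sketch:

import java.util.LinkedList;
import java.util.Queue;

public class QueueExample {
    public static void main(String[] args) {
        // LinkedList used through the Queue interface gives FIFO behaviour
        Queue<String> queue = new LinkedList<>();

        // offer() adds elements at the tail
        queue.offer("Apple");
        queue.offer("Banana");
        queue.offer("Cherry");
        System.out.println("Queue: " + queue);

        // peek() looks at the head without removing it
        System.out.println("Head (peek): " + queue.peek());

        // poll() removes and returns the head, in FIFO order
        System.out.println("Removed (poll): " + queue.poll());
        System.out.println("Queue after poll: " + queue);

        System.out.println("Is queue empty? " + queue.isEmpty());
    }
}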
Program 2:
Implement the following data structures in Java:
a) Map
b) Set
sol:
a) Map
(i) HashMap
import java.util.HashMap;

public class HashMapExample {
    public static void main(String[] args) {
        System.out.println("\nHashMap Example:");
        HashMap<Integer, String> map = new HashMap<>();
        map.put(1, "Apple");
        map.put(2, "Banana");
        map.put(3, "Cherry");
        map.put(4, "Date");
        System.out.println("Initial HashMap: " + map);
        System.out.println("Get key 2: " + map.get(2));
        map.remove(3);
        System.out.println("After removing key 3: " + map);
        System.out.println("Contains key 1? " + map.containsKey(1));
        System.out.println("Contains value 'Banana'? " + map.containsValue("Banana"));
        System.out.println("Size: " + map.size());
        System.out.println("Iterating through HashMap:");
        for (var entry : map.entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        map.clear();
        System.out.println("HashMap after clearing: " + map);
    }
}
Output:
HashMap Example:
Initial HashMap: {1=Apple, 2=Banana, 3=Cherry, 4=Date}
Get key 2: Banana
After removing key 3: {1=Apple, 2=Banana, 4=Date}
Contains key 1? true
Contains value 'Banana'? true
Size: 3
Iterating through HashMap:
Key: 1, Value: Apple
Key: 2, Value: Banana
Key: 4, Value: Date
HashMap after clearing: {}
(ii) LinkedHashMap
import java.util.LinkedHashMap;

public class LinkedHashMapExample {
    public static void main(String[] args) {
        System.out.println("\nLinkedHashMap Example:");
        LinkedHashMap<String, Integer> map = new LinkedHashMap<>();
        map.put("One", 1);
        map.put("Two", 2);
        map.put("Three", 3);
        map.put("Four", 4);
        System.out.println("Initial LinkedHashMap: " + map);
        System.out.println("Get value for 'Two': " + map.get("Two"));
        map.remove("Three");
        System.out.println("After removing 'Three': " + map);
        System.out.println("Contains key 'One'? " + map.containsKey("One"));
        System.out.println("Contains value 4? " + map.containsValue(4));
        System.out.println("Size: " + map.size());
        System.out.println("Iterating through LinkedHashMap:");
        for (var entry : map.entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        map.clear();
        System.out.println("LinkedHashMap after clearing: " + map);
    }
}
Output:
LinkedHashMap Example:
Initial LinkedHashMap: {One=1, Two=2, Three=3, Four=4}
Get value for 'Two': 2
After removing 'Three': {One=1, Two=2, Four=4}
Contains key 'One'? true
Contains value 4? true
Size: 3
Iterating through LinkedHashMap:
Key: One, Value: 1
Key: Two, Value: 2
Key: Four, Value: 4
LinkedHashMap after clearing: {}
(iii) TreeMap
import java.util.TreeMap;

public class TreeMapExample {
    public static void main(String[] args) {
        System.out.println("\nTreeMap Example:");
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(5, "Eagle");
        map.put(1, "Apple");
        map.put(3, "Cherry");
        map.put(2, "Banana");
        System.out.println("Initial TreeMap: " + map);
        System.out.println("Get value for key 2: " + map.get(2));
        map.remove(3);
        System.out.println("After removing key 3: " + map);
        System.out.println("Contains key 1? " + map.containsKey(1));
        System.out.println("Contains value 'Eagle'? " + map.containsValue("Eagle"));
        System.out.println("Size: " + map.size());
        System.out.println("Iterating through TreeMap:");
        for (var entry : map.entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        System.out.println("First Key: " + map.firstKey());
        System.out.println("Last Key: " + map.lastKey());
        map.clear();
        System.out.println("TreeMap after clearing: " + map);
    }
}
Output:
TreeMap Example:
Initial TreeMap: {1=Apple, 2=Banana, 3=Cherry, 5=Eagle}
Get value for key 2: Banana
After removing key 3: {1=Apple, 2=Banana, 5=Eagle}
Contains key 1? true
Contains value 'Eagle'? true
Size: 3
Iterating through TreeMap:
Key: 1, Value: Apple
Key: 2, Value: Banana
Key: 5, Value: Eagle
First Key: 1
Last Key: 5
TreeMap after clearing: {}
b) Set
(i) HashSet
import java.util.HashSet;

public class HashSetExample {
    public static void main(String[] args) {
        System.out.println("\nHashSet Example:");
        HashSet<String> set = new HashSet<>();
        set.add("Apple");
        set.add("Banana");
        set.add("Cherry");
        set.add("Date");
        System.out.println("Initial HashSet: " + set);
        set.remove("Cherry");
        System.out.println("After removing 'Cherry': " + set);
        System.out.println("Contains 'Apple'? " + set.contains("Apple"));
        System.out.println("Size: " + set.size());
        System.out.println("Iterating through HashSet:");
        for (String item : set) {
            System.out.println(item);
        }
        set.clear();
        System.out.println("HashSet after clearing: " + set);
    }
}
Output:
HashSet Example:
Initial HashSet: [Apple, Cherry, Date, Banana]
After removing 'Cherry': [Apple, Date, Banana]
Contains 'Apple'? true
Size: 3
Iterating through HashSet:
Apple
Date
Banana
HashSet after clearing: []
(ii) LinkedHashSet
import java.util.LinkedHashSet;

public class LinkedHashSetExample {
    public static void main(String[] args) {
        System.out.println("\nLinkedHashSet Example:");
        LinkedHashSet<Integer> set = new LinkedHashSet<>();
        set.add(10);
        set.add(20);
        set.add(30);
        set.add(40);
        System.out.println("Initial LinkedHashSet: " + set);
        set.remove(30);
        System.out.println("After removing 30: " + set);
        System.out.println("Contains 20? " + set.contains(20));
        System.out.println("Size: " + set.size());
        System.out.println("Iterating through LinkedHashSet:");
        for (Integer num : set) {
            System.out.println(num);
        }
        set.clear();
        System.out.println("LinkedHashSet after clearing: " + set);
    }
}
Output:
LinkedHashSet Example:
Initial LinkedHashSet: [10, 20, 30, 40]
After removing 30: [10, 20, 40]
Contains 20? true
Size: 3
Iterating through LinkedHashSet:
10
20
40
LinkedHashSet after clearing: []
(iii) TreeSet
import java.util.TreeSet;

public class TreeSetExample {
    public static void main(String[] args) {
        // Creating a TreeSet
        TreeSet<Integer> treeSet = new TreeSet<>();

        // Adding elements to the TreeSet
        treeSet.add(20);
        treeSet.add(10);
        treeSet.add(40);
        treeSet.add(30);
        treeSet.add(50);

        // Printing the TreeSet (it will be sorted)
        System.out.println("TreeSet: " + treeSet);

        // Removing an element
        treeSet.remove(30);
        System.out.println("After removing 30: " + treeSet);

        // Checking if an element exists
        System.out.println("Does TreeSet contain 20? " + treeSet.contains(20));

        // Retrieving first and last elements
        System.out.println("First element: " + treeSet.first());
        System.out.println("Last element: " + treeSet.last());

        // Getting subsets (headSet, tailSet, subSet)
        System.out.println("Elements less than 40: " + treeSet.headSet(40));
        System.out.println("Elements greater than or equal to 20: " + treeSet.tailSet(20));
        System.out.println("Elements between 10 and 40: " + treeSet.subSet(10, 40));

        // Checking size of the TreeSet
        System.out.println("Size of TreeSet: " + treeSet.size());

        // Clearing the TreeSet
        treeSet.clear();
        System.out.println("After clearing, is empty? " + treeSet.isEmpty());
    }
}
Output:
TreeSet: [10, 20, 30, 40, 50]
After removing 30: [10, 20, 40, 50]
Does TreeSet contain 20? true
First element: 10
Last element: 50
Elements less than 40: [10, 20]
Elements greater than or equal to 20: [20, 40, 50]
Elements between 10 and 40: [10, 20]
Size of TreeSet: 4
After clearing, is empty? true
(iv) SortedSet
import java.util.SortedSet;
import java.util.TreeSet;

public class SortedSetExample {
    public static void main(String[] args) {
        System.out.println("\nSortedSet Example:");
        SortedSet<Integer> set = new TreeSet<>();
        set.add(50);
        set.add(10);
        set.add(40);
        set.add(20);
        set.add(30);
        System.out.println("Initial SortedSet: " + set);
        set.remove(30);
        System.out.println("After removing 30: " + set);
        System.out.println("First Element: " + set.first());
        System.out.println("Last Element: " + set.last());
        System.out.println("Contains 20? " + set.contains(20));
        System.out.println("Size: " + set.size());
        System.out.println("Iterating through SortedSet:");
        for (Integer num : set) {
            System.out.println(num);
        }
        set.clear();
        System.out.println("SortedSet after clearing: " + set);
    }
}
Output:
SortedSet Example:
Initial SortedSet: [10, 20, 30, 40, 50]
After removing 30: [10, 20, 40, 50]
First Element: 10
Last Element: 50
Contains 20? true
Size: 4
Iterating through SortedSet:
10
20
40
50
SortedSet after clearing: []
Program 3:
Implement the following file management tasks in Hadoop:
· Adding files and directories
· Retrieving files
· Deleting files

Create a Directory in HDFS:
hdfs dfs -mkdir /user/gowthu/data
(Creates a directory named data under /user/gowthu/)

Upload (Add) a File to HDFS:
hdfs dfs -put localfile.txt /user/gowthu/data/
(Uploads localfile.txt from the local system to HDFS /user/gowthu/data/)

Copy a File from Local to HDFS:
hdfs dfs -copyFromLocal example.txt /user/gowthu/data/
(Copies example.txt from the local system to /user/gowthu/data/ in HDFS)

List Files in HDFS:
hdfs dfs -ls /user/gowthu/data/
(Lists all files inside /user/gowthu/data/)

Retrieve a File from HDFS to Local:
hdfs dfs -get /user/gowthu/data/example.txt
(Downloads example.txt from HDFS to the current local directory)

Copy a File from HDFS to Local:
hdfs dfs -copyToLocal /user/gowthu/data/example.txt /home/hasan/
(Copies example.txt from HDFS to /home/hasan/ on the local system)

Delete a File in HDFS:
hdfs dfs -rm /user/gowthu/data/in1.txt
(Deletes in1.txt from HDFS)

Delete a Directory in HDFS:
hdfs dfs -rm -r /user/gowthu/data/
(Recursively deletes /user/gowthu/data/ and all its files)
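The same add, retrieve, and delete tasks can also be done programmatically through the Hadoop FileSystem API. A minimal sketch, assuming the cluster configuration is on the classpath and reusing the HDFS paths above (the local file names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileOps {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Adding a directory and a file
        fs.mkdirs(new Path("/user/gowthu/data"));
        fs.copyFromLocalFile(new Path("localfile.txt"),
                             new Path("/user/gowthu/data/localfile.txt"));

        // Retrieving a file back to the local file system
        fs.copyToLocalFile(new Path("/user/gowthu/data/localfile.txt"),
                           new Path("localfile_copy.txt"));

        // Deleting a file (false = non-recursive) and a directory (true = recursive)
        fs.delete(new Path("/user/gowthu/data/in1.txt"), false);
        fs.delete(new Path("/user/gowthu/data"), true);

        fs.close();
    }
}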
Program 4:
Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
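Before the job can be submitted, the source is compiled against the Hadoop libraries and packaged into the wordCount.jar used below. A minimal sketch, assuming the hadoop command is on the PATH and WordCount.java is in the current directory:

$ mkdir -p wordcount_classes
$ javac -classpath $(hadoop classpath) -d wordcount_classes WordCount.java
$ jar -cvf /home/cloudera/wordCount.jar -C wordcount_classes/ .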
a) Open Terminal
§ Run the following commands step by step.
b) Check Current Directory
§ ls
§ pwd
c) Create an Input File
§ cat > /home/cloudera/processfile1.txt
§ Enter some text, for example:
Hadoop is good for Big Data
Hadoop is not for Small Data
It is a Java-based framework
d) Upload Input File to HDFS
§ hdfs dfs -mkdir /inputfolder1
§ hdfs dfs -put /home/cloudera/processfile1.txt /inputfolder1/
e) Verify Input File in HDFS
§ hdfs dfs -cat /inputfolder1/processfile1.txt
f) Run the MapReduce Job
§ hadoop jar /home/cloudera/wordCount.jar WordCount /inputfolder1/processfile1.txt /output1
g) Check Output Directory in HDFS
§ hdfs dfs -ls /output1
h) View Final Word Count Output
§ hdfs dfs -cat /output1/part-r-00000
i) Cross-check with Original File
§ cat /home/cloudera/processfile1.txt
Output:
Big 1
Data 2
Hadoop 2
It 1
Java-based 1
Small 1
a 1
for 2
Program 5:
Run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your data.

$ pig
grunt> titanic = LOAD 'titanic_sample.csv' USING PigStorage(',')
       AS (PassengerId:int, Survived:int, Pclass:int, Name:chararray, Sex:chararray, Age:int,
           SibSp:int, Parch:int, Ticket:chararray, Fare:float, Cabin:chararray, Embarked:chararray);
grunt> DUMP titanic;
output:
(1,0,3,Braund,male,22,1,0,A/5 21171,7.25,S)
(2,1,1,Cumings,female,38,1,0,PC 17599,71.2833,C85,C)
(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,S)
(4,1,1,Futrelle,female,35,1,0,113803,53.1,C123,S)
(5,0,3,Allen,male,35,0,0,373450,8.05,S)
Sort Passengers by Age:
grunt> sorted_data = ORDER titanic BY Age ASC;
grunt> DUMP sorted_data;
output:
(1,0,3,Braund,male,22,1,0,A/5 21171,7.25,S)
(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,S)
(4,1,1,Futrelle,female,35,1,0,113803,53.1,C123,S)
(5,0,3,Allen,male,35,0,0,373450,8.05,S)
(2,1,1,Cumings,female,38,1,0,PC 17599,71.2833,C85,C)
Group Passengers by Survival Status:
grunt> grouped_data = GROUP titanic BY Survived;
grunt> DUMP grouped_data;
output:
(0,{(1,0,3,Braund,male,22,1,0,A/5 21171,7.25,S),(5,0,3,Allen,male,35,0,0,373450,8.05,,S)})
(1,{(2,1,1,Cumings,female,38,1,0,PC 17599,71.2833,C85,C),(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,S),(4,1,1,Futrelle,female,35,1,0,113803,53.1,C123,S)})
Project (Select) Only Specific Columns:
grunt> projected_data = FOREACH titanic GENERATE PassengerId, Name, Age;
grunt> DUMP projected_data;
output:
(1,Braund,22)
(2,Cumings,38)
(3,Heikkinen,26)
(4,Futrelle,35)
(5,Allen,35)
Filter Female Passengers Below Age 30:
grunt> filtered_data = FILTER titanic BY Sex == 'female' AND Age < 30;
grunt> DUMP filtered_data;
output:
(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,S)
Join Titanic Data with Ticket Prices:
Sample Ticket Price Dataset (ticket_prices.csv):
Ticket,Price
PC 17599,71.28
113803,53.1
STON/O2. 3101282,7.92

grunt> ticket_data = LOAD 'ticket_prices.csv' USING PigStorage(',') AS (Ticket:chararray, Price:float);
grunt> joined_data = JOIN titanic BY Ticket, ticket_data BY Ticket;
grunt> DUMP joined_data;
output:
(2,1,1,Cumings,female,38,1,0,PC 17599,71.2833,C85,C,PC 17599,71.28)
(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,S,STON/O2. 3101282,7.92)
(4,1,1,Futrelle,female,35,1,0,113803,53.1,C123,S,113803,53.1)
Clean Data:
grunt> cleaned_data = FILTER titanic BY Age IS NOT NULL AND Fare IS NOT NULL;
grunt> cleaned_data = FOREACH cleaned_data GENERATE PassengerId, Survived, Pclass, LOWER(Sex) AS Sex, Age, Fare, Embarked;
grunt> DUMP cleaned_data;
output:
(1,0,3,male,22,7.25,S)
(2,1,1,female,38,71.2833,C)
(3,1,3,female,26,7.925,S)
(4,1,1,female,35,53.1,S)
(5,0,3,male,35,8.05,S)
Normalize Fare (Scale between 0 and 1):
grunt> fare_stats = FOREACH (GROUP cleaned_data ALL) GENERATE MIN(cleaned_data.Fare) AS min_fare, MAX(cleaned_data.Fare) AS max_fare;
grunt> normalized_data = FOREACH cleaned_data GENERATE PassengerId, Survived, Pclass, Sex, Age,
       (Fare - fare_stats.min_fare) / (fare_stats.max_fare - fare_stats.min_fare) AS NormalizedFare, Embarked;
grunt> DUMP normalized_data;
output:
(1,0,3,male,22,0.0,S)
(2,1,1,female,38,1.0,C)
(3,1,3,female,26,0.0112,S)
(4,1,1,female,35,0.774,S)
(5,0,3,male,35,0.0133,S)
Store the result into a new file:
grunt> STORE normalized_data INTO 'output/normalized_titanic' USING PigStorage(',');
grunt> exit;
$ hdfs dfs -cat output/normalized_titanic/part-*
output:
1,0,3,male,22,0.0,S
2,1,1,female,38,1.0,C
3,1,3,female,26,0.0112,S
4,1,1,female,35,0.774,S
5,0,3,male,35,0.0133,S
Program 6:
Run Hive, then use Hive to create, alter, and drop databases, tables, views, functions, and indexes.
1 Create database:
hive> create database csea;
hive> create database cseb;
2 Show databases:
hive> show databases;
Output:
OK
csea
cseb
default
Time taken: 0.985 seconds, Fetched: 3 row(s)
3 Use database:
hive> use csea;
4 Alter database:
hive> alter database csea set DBPROPERTIES ('creator'='abc');
Output:
OK
Time taken: 0.196 seconds
5 Drop database:
hive> DROP DATABASE csea;
Output:
OK
Time taken: 0.196 seconds
6 Create index:
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    product STRING,
    category STRING,
    price DOUBLE,
    order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE INDEX category_index ON TABLE orders (category)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
Output:
OK
Time taken: 0.167 seconds
7 Alter index:
ALTER INDEX category_index ON orders REBUILD;
Output:
OK
Time taken: 0.167 seconds
8 Drop index:
DROP INDEX category_index ON orders;
Output:
OK
Time taken: 0.161 seconds
9 Create table:
hive> use csea;
hive(csea)> create table student(sno int, sna string)
          > row format delimited
          > fields terminated by '\t'
          > stored as textfile;
Output:
OK
Time taken: 0.343 seconds
10 Alter table (rename a column):
hive(csea)> ALTER TABLE student CHANGE sno redg_no INT;
Output:
OK
Time taken: 0.042 seconds
11 Drop table:
hive(csea)> DROP TABLE student;
Output:
OK
Time taken: 0.432 seconds
12 Create view:
hive> CREATE VIEW 2012_emp_view (empno, empname, joining_yr) AS
    > SELECT eno, ena, year FROM employee WHERE year = 2012;
Output:
OK
Time taken: 0.079 seconds
13 Alter view:
hive> ALTER VIEW 2012_emp_view AS
    > SELECT eno, year FROM employee WHERE year = 2012;
Output:
OK
Time taken: 0.117 seconds
14 Drop view:
hive> DROP VIEW 2012_emp_view;
Output:
OK
Time taken: 0.808 seconds
15 Create function:
hive> CREATE TEMPORARY FUNCTION abc AS 'com.example.hive.udf.PrimeCheckUDF';
Output:
OK
Time taken: 0.908 seconds
16 Alter function:
hive> ALTER FUNCTION abc USING JAR '/new/path/to/updated_prime_check_udf.jar';
Output:
OK
Time taken: 0.704 seconds
17 Drop function:
hive> DROP FUNCTION abc;
Output:
OK
Time taken: 0.808 seconds
CTAS in Hive (Create Table As Select):
CREATE TABLE high_salary_employees AS
SELECT emp_id, emp_name, salary
FROM employee
WHERE salary > 50000;

Create partitioned table:
CREATE TABLE sales_partitioned (
    sale_id INT,
    product_id INT,
    amount FLOAT
)
PARTITIONED BY (sale_date STRING)
STORED AS PARQUET;
Output:
OK
Time taken: 0.135 seconds
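Data normally goes into a specific partition of such a table. A minimal sketch, where the partition value and the sales_staging source table are assumptions for illustration:

-- Static partition insert: the partition value is fixed in the statement
INSERT INTO TABLE sales_partitioned PARTITION (sale_date = '2025-01-01')
SELECT sale_id, product_id, amount
FROM sales_staging
WHERE sale_date = '2025-01-01';

-- A partition can also be added explicitly without loading data
ALTER TABLE sales_partitioned ADD PARTITION (sale_date = '2025-01-02');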
Creating a Bucketed Table:
CREATE TABLE customers_bucketed (
    customer_id INT,
    name STRING,
    email STRING
)
CLUSTERED BY (customer_id) INTO 4 BUCKETS
STORED AS ORC;
Output:
OK
Time taken: 0.197 seconds
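Bucketing pays off when sampling: Hive can read just one bucket instead of the whole table. A minimal sketch against the table created above:

-- Reads roughly one quarter of the table: bucket 1 of the 4 buckets on customer_id
SELECT * FROM customers_bucketed TABLESAMPLE (BUCKET 1 OUT OF 4 ON customer_id);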
Joins:
Step1: Create Patients Table
hive> CREATE TABLE patients (
    >   patient_id INT,
    >   name STRING,
    >   age INT
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE;
OK
Time taken: 4.409 seconds

Step2: Create Diagnosis Table
hive> CREATE TABLE diagnosis (
    >   diagnosis_id INT,
    >   patient_id INT,
    >   disease STRING
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE;
OK
Time taken: 0.111 seconds

Step3: Prepare Input Data Files
[cloudera@quickstart ~]$ cat > patients.txt
1,john,45
2,bipin,44
3,rahul,23
4,neer,45
5,ram,24^Z
[1]+  Stopped    cat > patients.txt
[cloudera@quickstart ~]$ cat patients.txt
1,john,45
2,bipin,44
3,rahul,23
4,neer,45

Step4: [cloudera@quickstart ~]$ cat > diagnosis1.txt
101,1,diabtes
103,2,hypertension
104,4,sugar
105,3,BP
^Z
[3]+  Stopped    cat > diagnosis1.txt
[cloudera@quickstart ~]$ cat diagnosis1.txt
101,1,diabtes
103,2,hypertension
104,4,sugar
105,3,BP
Step5: Upload Data to HDFS
1. Create a directory
[cloudera@quickstart ~]$ hdfs dfs -mkdir -p /user/hive/warehouse/patients_data
2. Put the files into the directory
[cloudera@quickstart ~]$ hdfs dfs -put /home/cloudera/patients.txt /user/hive/warehouse/patients_data
[cloudera@quickstart ~]$ hdfs dfs -put /home/cloudera/diagnosis1.txt /user/hive/warehouse/patients_data
3. Verify that the files were loaded into HDFS
[cloudera@quickstart ~]$ hdfs dfs -ls /user/hive/warehouse/patients_data/
Found 2 items
-rw-r--r--   1 cloudera supergroup   54   2025-03-21 05:47   /user/hive/warehouse/patients_data/diagnosis1.txt
-rw-r--r--   1 cloudera supergroup   42   2025-03-21 05:46   /user/hive/warehouse/patients_data/patients.txt
Step6: Load data from the files into the tables
hive> LOAD DATA INPATH '/user/hive/warehouse/patients_data/patients.txt'
    > INTO TABLE patients;
Loading data to table default.patients
Table default.patients stats: [numFiles=1, totalSize=42]
OK
Time taken: 0.977 seconds
hive> LOAD DATA INPATH '/user/hive/warehouse/patients_data/diagnosis1.txt'
    > INTO TABLE diagnosis;
Loading data to table default.diagnosis
Table default.diagnosis stats: [numFiles=1, totalSize=54]
OK
Time taken: 0.38 seconds

Step7: Check whether the data was loaded
hive> select * from patients;
Output:
OK
1    john    45
2    bipin   44
3    rahul   23
4    neer    45
Time taken: 0.44 seconds, Fetched: 4 row(s)
hive> select * from diagnosis;
Output:
OK
101    1    diabtes
103    2    hypertension
104    4    sugar
105    3    BP
Time taken: 0.071 seconds, Fetched: 4 row(s)

Step8: Run the queries
Inner Join (Only Matching Records)
hive> SELECT p.patient_id, p.name, p.age, d.disease
    > FROM patients p
    > JOIN diagnosis d
    > ON p.patient_id = d.patient_id;
Output:
Query ID = cloudera_20250321055050_d4edf790-2711-4070-8e25-2ea1285b59cd
Total jobs = 1
Execution completed successfully
MapredLocal task succeeded
OK
1    john     45    diabtes
2    bipin    44    hypertension
4    neer     45    sugar
3    rahul    23    BP
Time taken: 42.857 seconds, Fetched: 4 row(s)
Left Join (All Patients, Even Without Diagnosis)
hive> SELECT p.patient_id, p.name, p.age, d.disease
    > FROM patients p
    > LEFT JOIN diagnosis d
    > ON p.patient_id = d.patient_id;
Output:
Query ID = cloudera_20250321055252_3858002d-eae4-406a-9b01-87d332efabc2
Total jobs = 1
Execution completed successfully
MapredLocal task succeeded
OK
1    john     45    diabtes
2    bipin    44    hypertension
3    rahul    23    BP
4    neer     45    sugar
Time taken: 36.066 seconds, Fetched: 4 row(s)

Right Join (All Diagnoses, Even Without Patient)
hive> SELECT p.patient_id, p.name, p.age, d.disease
    > FROM patients p
    > RIGHT JOIN diagnosis d
    > ON p.patient_id = d.patient_id;
Output:
Query ID = cloudera_20250321055454_b59fbe32-c0b0-4b52-b2ae-199db6988f36
Total jobs = 1
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 1.77 sec HDFS Read: 6561 HDFS Write: 72 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 770 msec
OK
1    john     45    diabtes
2    bipin    44    hypertension
4    neer     45    sugar
3    rahul    23    BP
Time taken: 31.549 seconds, Fetched: 4 row(s)
Full Outer Join (All Records, Filling Missing Values)
hive> SELECT p.patient_id, p.name, p.age, d.disease
    > FROM patients p
    > FULL OUTER JOIN diagnosis d
    > ON p.patient_id = d.patient_id;
Output:
Query ID = cloudera_20250321055555_9279366d-fae8-425a-997c-725517746534
Total jobs = 1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 5.34 sec   HDFS Read: 13287   HDFS Write: 72   SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 340 msec
OK
1    john     45    diabtes
2    bipin    44    hypertension
3    rahul    23    BP
4    neer     45    sugar
Time taken: 48.205 seconds, Fetched: 4 row(s)