Tuesday, 29 April 2025

BIG DATA CD 362

Program 1:

Implement the following data structures in Java:
a) Lists
b) Stacks
c) Queues

sol:

a)  List

(i) ArrayList

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;

public class ArrayListExample {
    public static void main(String[] args) {
        // Creating an ArrayList
        ArrayList<String> list = new ArrayList<>();

        // Adding elements
        list.add("Apple");
        list.add("Banana");
        list.add("Cherry");
        list.add("Mango");
        System.out.println("Initial List: " + list);

        // Accessing an element
        System.out.println("Element at index 2: " + list.get(2));

        // Updating an element
        list.set(1, "Blueberry");
        System.out.println("After updating index 1: " + list);

        // Removing an element
        list.remove("Mango");
        System.out.println("After removing 'Mango': " + list);

        // Checking if an element exists
        System.out.println("Contains 'Apple'? " + list.contains("Apple"));

        // Sorting the list
        Collections.sort(list);
        System.out.println("Sorted List: " + list);

        // Iterating using for-each loop
        System.out.println("Iterating using for-each loop:");
        for (String item : list) {
            System.out.println(item);
        }

        // Iterating using Iterator
        System.out.println("Iterating using Iterator:");
        Iterator<String> iterator = list.iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next());
        }

        // Getting size of the list
        System.out.println("Size of list: " + list.size());

        // Clearing the list
        list.clear();
        System.out.println("After clearing: " + list);
    }
}

Output:

Initial List: [Apple, Banana, Cherry, Mango]
Element at index 2: Cherry
After updating index 1: [Apple, Blueberry, Cherry, Mango]
After removing 'Mango': [Apple, Blueberry, Cherry]
Contains 'Apple'? true
Sorted List: [Apple, Blueberry, Cherry]
Iterating using for-each loop:
Apple
Blueberry
Cherry
Iterating using Iterator:
Apple
Blueberry
Cherry
Size of list: 3
After clearing: []

 

(ii) LINKED LIST

import java.util.LinkedList;

public class LinkedListExample {
    public static void main(String[] args) {
        System.out.println("\nLinkedList Example:");
        LinkedList<String> list = new LinkedList<>();

        // Adding elements
        list.add("A");
        list.add("B");
        list.add("C");
        list.add("D");
        System.out.println("Initial LinkedList: " + list);

        // Adding elements at first and last positions
        list.addFirst("Start");
        list.addLast("End");
        System.out.println("After adding at first and last: " + list);

        // Accessing elements
        System.out.println("First Element: " + list.getFirst());
        System.out.println("Last Element: " + list.getLast());

        // Removing elements
        System.out.println("Removed First: " + list.removeFirst());
        System.out.println("Removed Last: " + list.removeLast());
        System.out.println("LinkedList after removals: " + list);

        // Checking if an element exists
        System.out.println("Contains 'B'? " + list.contains("B"));

        // Getting size
        System.out.println("Size of LinkedList: " + list.size());

        // Iterating through the LinkedList
        System.out.println("Iterating through LinkedList:");
        for (String item : list) {
            System.out.println(item);
        }

        // Clearing the LinkedList
        list.clear();
        System.out.println("LinkedList after clearing: " + list);
    }
}

Output:

LinkedList Example:
Initial LinkedList: [A, B, C, D]
After adding at first and last: [Start, A, B, C, D, End]
First Element: Start
Last Element: End
Removed First: Start
Removed Last: End
LinkedList after removals: [A, B, C, D]
Contains 'B'? true
Size of LinkedList: 4
Iterating through LinkedList:
A
B
C
D
LinkedList after clearing: []

 

 

(iii) VECTOR

import java.util.Vector;

public class VectorExample {
    public static void main(String[] args) {
        System.out.println("\nVector Example:");
        Vector<Integer> vector = new Vector<>();

        // Adding elements
        vector.add(10);
        vector.add(20);
        vector.add(30);
        vector.add(40);
        System.out.println("Initial Vector: " + vector);

        // Adding at a specific index
        vector.add(1, 15);
        System.out.println("After adding 15 at index 1: " + vector);

        // Replacing an element
        vector.set(2, 25);
        System.out.println("After updating index 2: " + vector);

        // Removing elements
        System.out.println("Removed Element: " + vector.remove(0));
        System.out.println("Vector after removals: " + vector);

        // Checking if an element exists
        System.out.println("Contains 20? " + vector.contains(20));

        // Getting an element
        System.out.println("Element at index 1: " + vector.get(1));

        // Getting size and capacity
        System.out.println("Size: " + vector.size());
        System.out.println("Capacity: " + vector.capacity());

        // Iterating through the Vector
        System.out.println("Iterating through Vector:");
        for (Integer num : vector) {
            System.out.println(num);
        }

        // Clearing the Vector
        vector.clear();
        System.out.println("Vector after clearing: " + vector);
    }
}

Output:


Vector Example:

Initial Vector: [10, 20, 30, 40]

After adding 15 at index 1: [10, 15, 20, 30, 40]

After updating index 2: [10, 15, 25, 30, 40]

Removed Element: 10

Vector after removals: [15, 25, 30, 40]

Contains 20? false

Element at index 1: 25

Size: 4

Capacity: 10

Iterating through Vector:

15

25

30

40

Vector after clearing: []

b)  STACK

import java.util.Stack;

public class StackExample {
    public static void main(String[] args) {
        // Creating a Stack
        Stack<String> stack = new Stack<>();

        // PUSH operation (adding elements)
        stack.push("Apple");
        stack.push("Banana");
        stack.push("Cherry");
        System.out.println("Stack after push: " + stack);

        // PEEK operation (view top element)
        System.out.println("Top element (peek): " + stack.peek());

        // POP operation (removing top element)
        System.out.println("Popped element: " + stack.pop());
        System.out.println("Stack after pop: " + stack);

        // SEARCH operation (1-based position counted from the top of the stack)
        int position = stack.search("Apple");
        System.out.println("Position of 'Apple': " + position);

        // CHECK if Stack is empty
        System.out.println("Is stack empty? " + stack.isEmpty());
    }
}

Output:

Stack after push: [Apple, Banana, Cherry]
Top element (peek): Cherry
Popped element: Cherry
Stack after pop: [Apple, Banana]
Position of 'Apple': 2
Is stack empty? false
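Note: java.util.Stack is a legacy synchronized class, and its own documentation recommends using a Deque implementation for LIFO stack behaviour instead. A minimal sketch of the same push/peek/pop flow with ArrayDeque (the class name here is chosen just for illustration):

import java.util.ArrayDeque;
import java.util.Deque;

public class DequeAsStackExample {
    public static void main(String[] args) {
        // A Deque used as a LIFO stack: push, peek and pop all work on the head
        Deque<String> stack = new ArrayDeque<>();
        stack.push("Apple");
        stack.push("Banana");
        stack.push("Cherry");
        System.out.println("Top element (peek): " + stack.peek()); // Cherry
        System.out.println("Popped element: " + stack.pop());      // Cherry
        System.out.println("Is stack empty? " + stack.isEmpty());  // false
    }
}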

 

c)  QUEUE

(i) PRIORITY QUEUE

import java.util.PriorityQueue;

public class PriorityQueueExample {
    public static void main(String[] args) {
        System.out.println("\nPriorityQueue Example:");
        PriorityQueue<Integer> pq = new PriorityQueue<>();

        // Adding elements
        pq.add(30);
        pq.add(10);
        pq.add(20);
        pq.add(40);
        System.out.println("Initial PriorityQueue: " + pq);

        // Accessing the head element
        System.out.println("Peek (Head Element): " + pq.peek());

        // Removing elements
        System.out.println("Poll (Removing Head): " + pq.poll());
        System.out.println("PriorityQueue after poll: " + pq);

        // Checking if an element exists
        System.out.println("Contains 20? " + pq.contains(20));

        // Getting size
        System.out.println("Size of PriorityQueue: " + pq.size());

        // Iterating through the PriorityQueue
        System.out.println("Iterating through PriorityQueue:");
        for (Integer num : pq) {
            System.out.println(num);
        }

        // Clearing the PriorityQueue
        pq.clear();
        System.out.println("PriorityQueue after clearing: " + pq);
    }
}

Output:

PriorityQueue Example:
Initial PriorityQueue: [10, 30, 20, 40]
Peek (Head Element): 10
Poll (Removing Head): 10
PriorityQueue after poll: [20, 30, 40]
Contains 20? true
Size of PriorityQueue: 3
Iterating through PriorityQueue:
20
30
40
PriorityQueue after clearing: []
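Note that the printed order [10, 30, 20, 40] is the internal binary-heap layout, not sorted order: only the head returned by peek()/poll() is guaranteed to be the smallest element, and iteration order is unspecified. To consume elements in priority order, poll until the queue is empty. A minimal sketch (class name chosen for illustration):

import java.util.PriorityQueue;

public class PriorityQueueDrainExample {
    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        pq.add(30);
        pq.add(10);
        pq.add(20);
        pq.add(40);
        // poll() always removes the current minimum, so this prints 10 20 30 40
        while (!pq.isEmpty()) {
            System.out.println(pq.poll());
        }
    }
}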

 

 

(ii) DEQUE

import java.util.ArrayDeque;
import java.util.Deque;

public class DequeExample {
    public static void main(String[] args) {
        System.out.println("\nDeque Example:");
        Deque<String> deque = new ArrayDeque<>();

        // Adding elements
        deque.add("A");
        deque.addFirst("Start");
        deque.addLast("End");
        deque.add("B");
        System.out.println("Deque: " + deque);

        // Accessing first and last elements
        System.out.println("First Element: " + deque.getFirst());
        System.out.println("Last Element: " + deque.getLast());

        // Removing first and last elements
        System.out.println("Removed First: " + deque.removeFirst());
        System.out.println("Removed Last: " + deque.removeLast());
        System.out.println("Deque after removals: " + deque);

        // Checking membership and size
        System.out.println("Contains 'A'? " + deque.contains("A"));
        System.out.println("Size: " + deque.size());

        // Iterating through the Deque
        for (String item : deque) {
            System.out.println(item);
        }

        // Clearing the Deque
        deque.clear();
        System.out.println("Deque after clearing: " + deque);
    }
}

Output:

Deque Example:

Deque: [Start, A, End, B]
First Element: Start
Last Element: B
Removed First: Start
Removed Last: B
Deque after removals: [A, End]
Contains 'A'? true
Size: 2
A
End
Deque after clearing: []

 

 

(iii) ArrayDeque

import java.util.ArrayDeque;

public class ArrayDequeExample {
    public static void main(String[] args) {
        // Creating an ArrayDeque
        ArrayDeque<String> deque = new ArrayDeque<>();

        // Adding elements at the end
        deque.add("Apple");
        deque.add("Banana");
        deque.add("Cherry");

        // Adding elements at the front and at the end
        deque.addFirst("Mango");
        deque.addLast("Orange");

        // Printing the deque
        System.out.println("Deque after additions: " + deque);

        // Removing elements
        deque.removeFirst(); // Removes "Mango"
        deque.removeLast();  // Removes "Orange"

        // Printing the deque after removals
        System.out.println("Deque after removals: " + deque);

        // Accessing elements
        System.out.println("First element: " + deque.getFirst());
        System.out.println("Last element: " + deque.getLast());
    }
}

Output:

Deque after additions: [Mango, Apple, Banana, Cherry, Orange]
Deque after removals: [Apple, Banana, Cherry]
First element: Apple
Last element: Cherry


Program 2:

Implement the following data structures in Java:
a) Map
b) Set

sol:

a) Map

 

(i) HashMap

import java.util.HashMap;

public class HashMapExample {
    public static void main(String[] args) {
        System.out.println("\nHashMap Example:");
        HashMap<Integer, String> map = new HashMap<>();

        // Adding key-value pairs
        map.put(1, "Apple");
        map.put(2, "Banana");
        map.put(3, "Cherry");
        map.put(4, "Date");
        System.out.println("Initial HashMap: " + map);

        // Accessing, removing and checking entries
        System.out.println("Get key 2: " + map.get(2));
        map.remove(3);
        System.out.println("After removing key 3: " + map);
        System.out.println("Contains key 1? " + map.containsKey(1));
        System.out.println("Contains value 'Banana'? " + map.containsValue("Banana"));
        System.out.println("Size: " + map.size());

        // Iterating through the HashMap
        System.out.println("Iterating through HashMap:");
        for (var entry : map.entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }

        // Clearing the HashMap
        map.clear();
        System.out.println("HashMap after clearing: " + map);
    }
}

Output:

HashMap Example:

Initial HashMap: {1=Apple, 2=Banana, 3=Cherry, 4=Date}
Get key 2: Banana
After removing key 3: {1=Apple, 2=Banana, 4=Date}
Contains key 1? true
Contains value 'Banana'? true
Size: 3
Iterating through HashMap:
Key: 1, Value: Apple
Key: 2, Value: Banana
Key: 4, Value: Date
HashMap after clearing: {}

 

 

 

 

(ii) LinkedHashMap

import java.util.LinkedHashMap;

public class LinkedHashMapExample {
    public static void main(String[] args) {
        System.out.println("\nLinkedHashMap Example:");
        LinkedHashMap<String, Integer> map = new LinkedHashMap<>();

        // Adding key-value pairs (insertion order is preserved)
        map.put("One", 1);
        map.put("Two", 2);
        map.put("Three", 3);
        map.put("Four", 4);
        System.out.println("Initial LinkedHashMap: " + map);

        // Accessing, removing and checking entries
        System.out.println("Get value for 'Two': " + map.get("Two"));
        map.remove("Three");
        System.out.println("After removing 'Three': " + map);
        System.out.println("Contains key 'One'? " + map.containsKey("One"));
        System.out.println("Contains value 4? " + map.containsValue(4));
        System.out.println("Size: " + map.size());

        // Iterating through the LinkedHashMap
        System.out.println("Iterating through LinkedHashMap:");
        for (var entry : map.entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }

        // Clearing the LinkedHashMap
        map.clear();
        System.out.println("LinkedHashMap after clearing: " + map);
    }
}

Output:

LinkedHashMap Example:

Initial LinkedHashMap: {One=1, Two=2, Three=3, Four=4}
Get value for 'Two': 2
After removing 'Three': {One=1, Two=2, Four=4}
Contains key 'One'? true
Contains value 4? true
Size: 3
Iterating through LinkedHashMap:
Key: One, Value: 1
Key: Two, Value: 2
Key: Four, Value: 4
LinkedHashMap after clearing: {}

 

 

(iii) TreeMap

import java.util.TreeMap;

public class TreeMapExample {
    public static void main(String[] args) {
        System.out.println("\nTreeMap Example:");
        TreeMap<Integer, String> map = new TreeMap<>();

        // Adding key-value pairs (keys are kept in sorted order)
        map.put(5, "Eagle");
        map.put(1, "Apple");
        map.put(3, "Cherry");
        map.put(2, "Banana");
        System.out.println("Initial TreeMap: " + map);

        // Accessing, removing and checking entries
        System.out.println("Get value for key 2: " + map.get(2));
        map.remove(3);
        System.out.println("After removing key 3: " + map);
        System.out.println("Contains key 1? " + map.containsKey(1));
        System.out.println("Contains value 'Eagle'? " + map.containsValue("Eagle"));
        System.out.println("Size: " + map.size());

        // Iterating through the TreeMap
        System.out.println("Iterating through TreeMap:");
        for (var entry : map.entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }

        // First and last keys
        System.out.println("First Key: " + map.firstKey());
        System.out.println("Last Key: " + map.lastKey());

        // Clearing the TreeMap
        map.clear();
        System.out.println("TreeMap after clearing: " + map);
    }
}

Output:

TreeMap Example:

Initial TreeMap: {1=Apple, 2=Banana, 3=Cherry, 5=Eagle}
Get value for key 2: Banana
After removing key 3: {1=Apple, 2=Banana, 5=Eagle}
Contains key 1? true
Contains value 'Eagle'? true
Size: 3
Iterating through TreeMap:
Key: 1, Value: Apple
Key: 2, Value: Banana
Key: 5, Value: Eagle
First Key: 1
Last Key: 5
TreeMap after clearing: {}
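The three Map implementations above differ mainly in iteration order: HashMap gives no ordering guarantee, LinkedHashMap preserves insertion order, and TreeMap keeps keys sorted. A minimal side-by-side sketch (class name chosen for illustration):

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderingExample {
    public static void main(String[] args) {
        int[] keys = {5, 1, 3, 2};
        Map<Integer, String> hash = new HashMap<>();
        Map<Integer, String> linked = new LinkedHashMap<>();
        Map<Integer, String> tree = new TreeMap<>();
        for (int k : keys) {
            hash.put(k, "v" + k);
            linked.put(k, "v" + k);
            tree.put(k, "v" + k);
        }
        System.out.println("HashMap (no guaranteed order): " + hash);
        System.out.println("LinkedHashMap (insertion order): " + linked); // {5=v5, 1=v1, 3=v3, 2=v2}
        System.out.println("TreeMap (sorted by key): " + tree);           // {1=v1, 2=v2, 3=v3, 5=v5}
    }
}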

 

b) Set

SET

(i) HashSet

import java.util.HashSet;

public class HashSetExample {
    public static void main(String[] args) {
        System.out.println("\nHashSet Example:");
        HashSet<String> set = new HashSet<>();

        // Adding elements (duplicates are ignored)
        set.add("Apple");
        set.add("Banana");
        set.add("Cherry");
        set.add("Date");
        System.out.println("Initial HashSet: " + set);

        // Removing and checking elements
        set.remove("Cherry");
        System.out.println("After removing 'Cherry': " + set);
        System.out.println("Contains 'Apple'? " + set.contains("Apple"));
        System.out.println("Size: " + set.size());

        // Iterating through the HashSet
        System.out.println("Iterating through HashSet:");
        for (String item : set) {
            System.out.println(item);
        }

        // Clearing the HashSet
        set.clear();
        System.out.println("HashSet after clearing: " + set);
    }
}

Output:

HashSet Example:

Initial HashSet: [Apple, Cherry, Date, Banana]
After removing 'Cherry': [Apple, Date, Banana]
Contains 'Apple'? true
Size: 3
Iterating through HashSet:
Apple
Date
Banana
HashSet after clearing: []

 

 

(ii) LinkedHashSet

import java.util.LinkedHashSet;

public class LinkedHashSetExample {
    public static void main(String[] args) {
        System.out.println("\nLinkedHashSet Example:");
        LinkedHashSet<Integer> set = new LinkedHashSet<>();

        // Adding elements (insertion order is preserved)
        set.add(10);
        set.add(20);
        set.add(30);
        set.add(40);
        System.out.println("Initial LinkedHashSet: " + set);

        // Removing and checking elements
        set.remove(30);
        System.out.println("After removing 30: " + set);
        System.out.println("Contains 20? " + set.contains(20));
        System.out.println("Size: " + set.size());

        // Iterating through the LinkedHashSet
        System.out.println("Iterating through LinkedHashSet:");
        for (Integer num : set) {
            System.out.println(num);
        }

        // Clearing the LinkedHashSet
        set.clear();
        System.out.println("LinkedHashSet after clearing: " + set);
    }
}


Output:

LinkedHashSet Example:

Initial LinkedHashSet: [10, 20, 30, 40]

After removing 30: [10, 20, 40]

Contains 20? true

Size: 3

Iterating through LinkedHashSet:

10

20

40

LinkedHashSet after clearing: []

 

 

(iii) TreeSet

import java.util.TreeSet;

public class TreeSetExample {
    public static void main(String[] args) {
        // Creating a TreeSet
        TreeSet<Integer> treeSet = new TreeSet<>();

        // Adding elements to the TreeSet
        treeSet.add(20);
        treeSet.add(10);
        treeSet.add(40);
        treeSet.add(30);
        treeSet.add(50);

        // Printing the TreeSet (elements are kept sorted)
        System.out.println("TreeSet: " + treeSet);

        // Removing an element
        treeSet.remove(30);
        System.out.println("After removing 30: " + treeSet);

        // Checking if an element exists
        System.out.println("Does TreeSet contain 20? " + treeSet.contains(20));

        // Retrieving first and last elements
        System.out.println("First element: " + treeSet.first());
        System.out.println("Last element: " + treeSet.last());

        // Getting subsets (headSet, tailSet, subSet)
        System.out.println("Elements less than 40: " + treeSet.headSet(40));
        System.out.println("Elements greater than or equal to 20: " + treeSet.tailSet(20));
        System.out.println("Elements between 10 and 40: " + treeSet.subSet(10, 40));

        // Checking size of the TreeSet
        System.out.println("Size of TreeSet: " + treeSet.size());

        // Clearing the TreeSet
        treeSet.clear();
        System.out.println("After clearing, is empty? " + treeSet.isEmpty());
    }
}


Output:

TreeSet: [10, 20, 30, 40, 50]

After removing 30: [10, 20, 40, 50]
Does TreeSet contain 20? true
First element: 10
Last element: 50
Elements less than 40: [10, 20]
Elements greater than or equal to 20: [20, 40, 50]
Elements between 10 and 40: [10, 20]
Size of TreeSet: 4

After clearing, is empty? true

 

 

(iv) SortedSet

import java.util.SortedSet;
import java.util.TreeSet;

public class SortedSetExample {
    public static void main(String[] args) {
        System.out.println("\nSortedSet Example:");
        SortedSet<Integer> set = new TreeSet<>();

        // Adding elements (kept in ascending order)
        set.add(50);
        set.add(10);
        set.add(40);
        set.add(20);
        set.add(30);
        System.out.println("Initial SortedSet: " + set);

        // Removing an element
        set.remove(30);
        System.out.println("After removing 30: " + set);

        // First and last elements, membership and size
        System.out.println("First Element: " + set.first());
        System.out.println("Last Element: " + set.last());
        System.out.println("Contains 20? " + set.contains(20));
        System.out.println("Size: " + set.size());

        // Iterating through the SortedSet
        System.out.println("Iterating through SortedSet:");
        for (Integer num : set) {
            System.out.println(num);
        }

        // Clearing the SortedSet
        set.clear();
        System.out.println("SortedSet after clearing: " + set);
    }
}

Output:

SortedSet Example:

Initial SortedSet: [10, 20, 30, 40, 50]

After removing 30: [10, 20, 40, 50]

First Element: 10

Last Element: 50

Contains 20? true

Size: 4

Iterating through SortedSet:
10

20

40

50

SortedSet after clearing: []


Program 3:

 

Implement the following file management tasks in Hadoop:

·       Adding files and directories

·       Retrieving files

·       Deleting files

 

Create a Directory in HDFS:

hdfs dfs -mkdir /user/gowthu/data

(Creates a directory named data under /user/gowthu/)

Upload (Add) a File to HDFS:

hdfs dfs -put localfile.txt /user/gowthu/data/

(Uploads localfile.txt from the local system to HDFS /user/gowthu/data/)

Copy a File from Local to HDFS:

hdfs dfs -copyFromLocal example.txt /user/gowthu/data/

(Copies example.txt from the local system to /user/gowthu/data/ in HDFS)

List Files in HDFS:

hdfs dfs -ls /user/gowthu/data/

(Lists all files inside /user/gowthu/data/)

Retrieve a File from HDFS to Local:

hdfs dfs -get /user/gowthu/data/example.txt

(Downloads example.txt from HDFS to the current local directory)

Copy a File from HDFS to Local:

hdfs dfs -copyToLocal /user/gowthu/data/example.txt /home/hasan/

(Copies example.txt from HDFS to /home/hasan/ on the local system)

Delete a File in HDFS:

hdfs dfs -rm /user/gowthu/data/in1.txt

(Deletes in1.txt from HDFS)


Delete a Directory in HDFS:

hdfs dfs -rm -r /user/gowthu/data/

(Recursively deletes /user/gowthu/data/ and all its files)
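The same file-management tasks can also be done programmatically through the Hadoop FileSystem API. A minimal sketch, assuming the Hadoop client libraries are on the classpath and using the same illustrative paths as above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileManagement {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the core-site.xml found on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Adding a directory and a file
        fs.mkdirs(new Path("/user/gowthu/data"));
        fs.copyFromLocalFile(new Path("localfile.txt"), new Path("/user/gowthu/data/"));

        // Listing the directory
        for (FileStatus status : fs.listStatus(new Path("/user/gowthu/data"))) {
            System.out.println(status.getPath());
        }

        // Retrieving a file back to the local file system
        fs.copyToLocalFile(new Path("/user/gowthu/data/localfile.txt"), new Path("."));

        // Deleting a file, then the whole directory (recursive = true)
        fs.delete(new Path("/user/gowthu/data/localfile.txt"), false);
        fs.delete(new Path("/user/gowthu/data"), true);

        fs.close();
    }
}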


Program 4:

Run a basic WordCount MapReduce program to understand the MapReduce paradigm.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

 

a)   Open Terminal

§  Run the following commands step by step.

b)   Check Current Directory

§  ls

§  pwd

c)   Create an Input File

§  cat > /home/cloudera/processfile1.txt

§  Enter some text:

(Example:
Hadoop is good for Big Data
Hadoop is not for Small Data
It is a Java-based framework)

d) Upload Input File to HDFS

§  hdfs dfs -mkdir /inputfolder1

§  hdfs dfs -put /home/cloudera/processfile1.txt /inputfolder1/

e)   Verify Input File in HDFS

§  hdfs dfs -cat /inputfolder1/processfile1.txt

f)   Run the MapReduce Job

§  hadoop jar /home/cloudera/wordCount.jar WordCount /inputfolder1/processfile1.txt /output1

g) Check Output Directory in HDFS

§  hdfs dfs -ls /output1

h)   View Final Word Count Output

§  hdfs dfs -cat /output1/part-r-00000


i) Cross-check with Original File

 

 

 

§  cat /home/cloudera/processfile1.txt

Output:

 

Big           1
Data          2
Hadoop        2
It            1
Java-based    1
Small         1
a             1
for           2
framework     1
good          1
is            3
not           1
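To see what the mapper and reducer do conceptually, the same counting logic can be sketched as plain Java over the sample text (class name chosen for illustration; a TreeMap is used so the keys come out in the same order as the job's output):

import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    public static void main(String[] args) {
        String input = "Hadoop is good for Big Data Hadoop is not for Small Data It is a Java-based framework";
        Map<String, Integer> counts = new TreeMap<>();
        // "Map" step: tokenize; "Reduce" step: sum the counts per word
        StringTokenizer itr = new StringTokenizer(input);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}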


Program 5:

Run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your data.

$pig

 

grunt> titanic = LOAD 'titanic_sample.csv' USING PigStorage(',')
    AS (PassengerId:int, Survived:int, Pclass:int, Name:chararray, Sex:chararray,
        Age:int, SibSp:int, Parch:int, Ticket:chararray, Fare:float,
        Cabin:chararray, Embarked:chararray);
grunt> DUMP titanic;

output:

(1,0,3,Braund,male,22,1,0,A/5 21171,7.25,,S)
(2,1,1,Cumings,female,38,1,0,PC 17599,71.2833,C85,C)
(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,,S)
(4,1,1,Futrelle,female,35,1,0,113803,53.1,C123,S)
(5,0,3,Allen,male,35,0,0,373450,8.05,,S)

 

 

Sort Passengers by Age:

grunt> sorted_data = ORDER titanic BY Age ASC;
grunt> DUMP sorted_data;

output:

(1,0,3,Braund,male,22,1,0,A/5 21171,7.25,,S)
(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,,S)
(4,1,1,Futrelle,female,35,1,0,113803,53.1,C123,S)
(5,0,3,Allen,male,35,0,0,373450,8.05,,S)
(2,1,1,Cumings,female,38,1,0,PC 17599,71.2833,C85,C)


Group Passengers by Survival Status:

grunt> grouped_data = GROUP titanic BY Survived;
grunt> DUMP grouped_data;

output:

(0,{(1,0,3,Braund,male,22,1,0,A/5 21171,7.25,,S),(5,0,3,Allen,male,35,0,0,373450,8.05,,S)})
(1,{(2,1,1,Cumings,female,38,1,0,PC 17599,71.2833,C85,C),(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,,S),(4,1,1,Futrelle,female,35,1,0,113803,53.1,C123,S)})

 

 

Project (Select) Only Specific Columns:

grunt> projected_data = FOREACH titanic GENERATE PassengerId, Name, Age;
grunt> DUMP projected_data;

output:

(1,Braund,22)
(2,Cumings,38)
(3,Heikkinen,26)
(4,Futrelle,35)
(5,Allen,35)

 

 

Filter Female Passengers Below Age 30:

grunt> filtered_data = FILTER titanic BY Sex == 'female' AND Age < 30;
grunt> DUMP filtered_data;

output:

(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,,S)

 

 

Join Titanic Data with Ticket Prices:

Sample Ticket Price Dataset (ticket_prices.csv)


Ticket,Price

PC 17599,71.28

113803,53.1

STON/O2. 3101282,7.92

 

 

grunt> ticket_data = LOAD 'ticket_prices.csv' USING PigStorage(',')
    AS (Ticket:chararray, Price:float);
grunt> joined_data = JOIN titanic BY Ticket, ticket_data BY Ticket;
grunt> DUMP joined_data;

output:

(2,1,1,Cumings,female,38,1,0,PC 17599,71.2833,C85,C,PC 17599,71.28)
(3,1,3,Heikkinen,female,26,0,0,STON/O2. 3101282,7.925,,S,STON/O2. 3101282,7.92)
(4,1,1,Futrelle,female,35,1,0,113803,53.1,C123,S,113803,53.1)

 

Clean Data:

grunt> cleaned_data = FILTER titanic BY Age IS NOT NULL AND Fare IS NOT NULL;

grunt> cleaned_data = FOREACH cleaned_data GENERATE PassengerId, Survived, Pclass, LOWER(Sex) AS Sex, Age, Fare, Embarked;

grunt> DUMP cleaned_data;

output:

(1,0,3,male,22,7.25,S)

(2,1,1,female,38,71.2833,C)

(3,1,3,female,26,7.925,S)

(4,1,1,female,35,53.1,S)

(5,0,3,male,35,8.05,S)


Normalize Fare (Scale between 0-1):

grunt> fare_stats = FOREACH (GROUP cleaned_data ALL) GENERATE
           MIN(cleaned_data.Fare) AS min_fare,
           MAX(cleaned_data.Fare) AS max_fare;
grunt> normalized_data = FOREACH cleaned_data GENERATE
           PassengerId, Survived, Pclass, Sex, Age,
           (Fare - fare_stats.min_fare) / (fare_stats.max_fare - fare_stats.min_fare) AS NormalizedFare,
           Embarked;
grunt> DUMP normalized_data;

output:

(1,0,3,male,22,0.0,S)

(2,1,1,female,38,1.0,C)

(3,1,3,female,26,0.0112,S)

(4,1,1,female,35,0.774,S)

(5,0,3,male,35,0.0133,S)

 

 

Store the data into a new file:

grunt> STORE normalized_data INTO 'output/normalized_titanic' USING PigStorage(',');
grunt> exit;

$ hdfs dfs -cat output/normalized_titanic/part*

output:

1,0,3,male,22,0.0,S

2,1,1,female,38,1.0,C

3,1,3,female,26,0.0112,S

4,1,1,female,35,0.774,S

5,0,3,male,35,0.0133,S


Program 6:

Run Hive, then use Hive to create, alter, and drop databases, tables, views, functions, and indexes.

1  Create database:

hive> create database csea;
hive> create database cseb;

2  Show database:

hive> show databases ;

Output:

OK

csea
cseb
default

Time taken: 0.985 seconds, Fetched: 3 row(s)

3  Use database:

hive> use csea;

4  Alter database:

hive> alter database csea set DBPROPERTIES ('creator'='abc');

Output:

OK

Time taken: 0.196 seconds

5  Drop database:

hive> DROP DATABASE csea;

Output:

OK

Time taken: 0.196 seconds

6  Create Index:


CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    product STRING,
    category STRING,
    price DOUBLE,
    order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

 

CREATE INDEX category_index ON TABLE orders (category)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

 

Output:

OK

Time taken: 0.167 seconds

 

 

7  Altering Index:

ALTER INDEX category_index ON orders REBUILD;

Output:

OK

Time taken: 0.167 seconds


8  Drop Index:

DROP INDEX category_index ON orders;

Output:

OK

Time taken: 0.161 seconds

 

 

9  Create table:

hive> use csea;

hive(csea)> create table student(sno int, sna string)
          > row format delimited
          > fields terminated by '\t'
          > stored as textfile;

Output:

OK

Time taken: 0.343 seconds

 

 

10  Altering a table:

hive(csea)> ALTER TABLE student CHANGE COLUMN sno redg_no INT;

Output:

OK

Time taken: 0.042 seconds

 

 

11  Drop Table:

hive(csea)> DROP table student;

Output:

OK


Time taken: 0.432 seconds

 

 

12  Create view:

hive> CREATE VIEW 2012_emp_view (empno,empname,Joining_yr) AS

>  SELECT eno,ena,year FROM employee WHERE year=2012;

Output:

OK

Time taken: 0.079 seconds

 

 

13  Alter view:

hive> ALTER VIEW 2012_emp_view AS

>  SELECT eno,year FROM employee WHERE year=2012;

Output:

OK

Time taken: 0.117 seconds

 

 

14  Drop View:

hive> DROP VIEW 2012_emp_view;

Output:

OK

Time taken: 0.808 seconds

15  Create function:

hive> CREATE TEMPORARY FUNCTION abc AS 'com.example.hive.udf.PrimeCheckUDF';

Output:

OK

Time taken: 0.908 seconds


16  Altering function:

Hive has no ALTER FUNCTION statement; to point a function at a new JAR, drop it and re-create it:

hive> DROP TEMPORARY FUNCTION abc;
hive> CREATE TEMPORARY FUNCTION abc AS 'com.example.hive.udf.PrimeCheckUDF'
    > USING JAR '/new/path/to/updated_prime_check_udf.jar';

Output:

OK

Time taken: 0.704 seconds

17  Drop function:

hive> drop FUNCTION abc;

Output:

OK

Time taken: 0.808 seconds

 

 

CTAS in Hive (Create Table As Select):

CREATE TABLE high_salary_employees AS
SELECT emp_id, emp_name, salary
FROM employee
WHERE salary > 50000;

Create partitioned table:

CREATE TABLE sales_partitioned (
    sale_id INT,
    product_id INT,
    amount FLOAT
)
PARTITIONED BY (sale_date STRING)
STORED AS PARQUET;


Output:

OK

Time taken: 0.135 seconds

Creating a Bucketed Table

CREATE TABLE customers_bucketed (
    customer_id INT,
    name STRING,
    email STRING
)
CLUSTERED BY (customer_id) INTO 4 BUCKETS
STORED AS ORC;

Output:

OK

Time taken: 0.197 seconds

Joins:

Step 1: Create Patients Table

hive> CREATE TABLE patients (
    >     patient_id INT,
    >     name STRING,
    >     age INT
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE;
OK
Time taken: 4.409 seconds


Step 2: Create Diagnosis Table

hive> CREATE TABLE diagnosis (
    >     diagnosis_id INT,
    >     patient_id INT,
    >     disease STRING
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE;
OK
Time taken: 0.111 seconds

Step 3: Prepare Input Data Files

[cloudera@quickstart ~]$ cat > patients.txt
1,john,45
2,bipin,44
3,rahul,23
4,neer,45
5,ram,24^Z
[1]+ Stopped                 cat > patients.txt
[cloudera@quickstart ~]$ cat patients.txt
1,john,45
2,bipin,44
3,rahul,23
4,neer,45

 

Step 4:

[cloudera@quickstart ~]$ cat > diagnosis1.txt
101,1,diabtes
103,2,hypertension
104,4,sugar
105,3,BP
^Z
[3]+ Stopped                 cat > diagnosis1.txt
[cloudera@quickstart ~]$ cat diagnosis1.txt
101,1,diabtes
103,2,hypertension
104,4,sugar
105,3,BP

Step 5: Upload Data to HDFS

1. Create a directory

[cloudera@quickstart ~]$ hdfs dfs -mkdir -p /user/hive/warehouse/patients_data

2. Put the files into the directory

[cloudera@quickstart ~]$ hdfs dfs -put /home/cloudera/patients.txt /user/hive/warehouse/patients_data
[cloudera@quickstart ~]$ hdfs dfs -put /home/cloudera/diagnosis1.txt /user/hive/warehouse/patients_data

3. Verify the files were loaded into HDFS

[cloudera@quickstart ~]$ hdfs dfs -ls /user/hive/warehouse/patients_data/
Found 2 items
-rw-r--r-- 1 cloudera supergroup          54 2025-03-21 05:47 /user/hive/warehouse/patients_data/diagnosis1.txt
-rw-r--r-- 1 cloudera supergroup          42 2025-03-21 05:46 /user/hive/warehouse/patients_data/patients.txt

Step 6: Loading data from files into tables

hive> LOAD DATA INPATH '/user/hive/warehouse/patients_data/patients.txt'
    > INTO TABLE patients;


Loading data to table default.patients

Table default.patients stats: [numFiles=1, totalSize=42]
OK

Time taken: 0.977 seconds

 

 

hive> LOAD DATA INPATH '/user/hive/warehouse/patients_data/diagnosis1.txt'

>  INTO TABLE diagnosis;

Loading data to table default.diagnosis

Table default.diagnosis stats: [numFiles=1, totalSize=54]
OK
Time taken: 0.38 seconds

Step 7: Check that the data loaded

hive> select * from patients;

Output:

OK

 

1    john     45
2    bipin    44
3    rahul    23
4    neer     45
Time taken: 0.44 seconds, Fetched: 4 row(s)

hive> select * from diagnosis;

Output:

OK

 

101    1    diabtes
103    2    hypertension
104    4    sugar
105    3    BP
Time taken: 0.071 seconds, Fetched: 4 row(s)

Step 8: Run the queries

Inner Join (Only Matching Records)

hive> SELECT p.patient_id, p.name, p.age, d.disease

>  FROM patients p

>  JOIN diagnosis d

>  ON p.patient_id = d.patient_id;

Output:

Query ID = cloudera_20250321055050_d4edf790-2711-4070-8e25-2ea1285b59cd
Total jobs = 1
Execution completed successfully
MapredLocal task succeeded

OK

 

1    john     45    diabtes
2    bipin    44    hypertension
4    neer     45    sugar
3    rahul    23    BP

Time taken: 42.857 seconds, Fetched: 4 row(s)

Left Join (All Patients, Even Without Diagnosis)

hive> SELECT p.patient_id, p.name, p.age, d.disease

>  FROM patients p

>  LEFT JOIN diagnosis d

>  ON p.patient_id = d.patient_id;

Output:

Query ID = cloudera_20250321055252_3858002d-eae4-406a-9b01-87d332efabc2
Total jobs = 1
Execution completed successfully

Execution completed successfully


MapredLocal task succeeded
OK

1    john     45    diabtes
2    bipin    44    hypertension
3    rahul    23    BP
4    neer     45    sugar
Time taken: 36.066 seconds, Fetched: 4 row(s)

Right Join (All Diagnoses, Even Without Patient)

hive> SELECT p.patient_id, p.name, p.age, d.disease

>  FROM patients p

>  RIGHT JOIN diagnosis d

>  ON p.patient_id = d.patient_id;

Output:

Query ID = cloudera_20250321055454_b59fbe32-c0b0-4b52-b2ae-199db6988f36
Total jobs = 1
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1  Cumulative CPU: 1.77 sec  HDFS Read: 6561  HDFS Write: 72  SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 770 msec
OK

1    john     45    diabtes
2    bipin    44    hypertension
4    neer     45    sugar
3    rahul    23    BP

Time taken: 31.549 seconds, Fetched: 4 row(s)

Full Outer Join (All Records, Filling Missing Values)

hive> SELECT p.patient_id, p.name, p.age, d.disease


>  FROM patients p

>  FULL OUTER JOIN diagnosis d

>  ON p.patient_id = d.patient_id;

Output:

Query ID = cloudera_20250321055555_9279366d-fae8-425a-997c-725517746534

Total jobs = 1

MapReduce Jobs Launched:

Stage-Stage-1: Map: 2  Reduce: 1  Cumulative CPU: 5.34 sec  HDFS Read: 13287  HDFS Write: 72  SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 340 msec
OK

1    john     45    diabtes
2    bipin    44    hypertension
3    rahul    23    BP
4    neer     45    sugar

Time taken: 48.205 seconds, Fetched: 4 row(s)
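The same queries can also be issued from Java through the HiveServer2 JDBC driver. A minimal sketch, assuming HiveServer2 is running on localhost:10000, the hive-jdbc driver is on the classpath, and the user name shown is illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver and connect to the default database
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection con = DriverManager.getConnection(url, "cloudera", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT p.patient_id, p.name, d.disease " +
                     "FROM patients p JOIN diagnosis d ON p.patient_id = d.patient_id")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1) + "\t" + rs.getString(2) + "\t" + rs.getString(3));
            }
        }
    }
}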

