Hello Friends,
I am glad to present this blog, which covers the analysis of a Weather Report POC whose input is in Excel format. This POC was given to me by one of my friends, who asked me to complete it.
Most of the time we get data in Excel format and have to adapt our code accordingly. So, in this POC I have modified my previous code to accept the Excel data, to make the concept easier for you all to understand.
NOTE:- Although this POC is meant to read EXCEL data, I have not actually used the Excel input format in my code, and it still worked. (I have no idea how or why that happened. Kindly share if you know anything about it.)
I worked out this POC on the same system I used for my previous POC, so all the jar files required for Excel reading were already present in the Hadoop lib folder.
If you face any problem in reading the input file, kindly use the EXCEL INPUT FORMAT from my previous blog to read the data.
Problem Statement:
1. The system receives the temperatures of various cities, captured at regular intervals of time on each day, in an input file.
2. Weather information for all cities for a week will be fed to the system in a single input file.
3. The system will process the input data file and generate a report with the Maximum and Minimum temperatures of each day.
4. It generates a separate output report for each Month.
Ex: January-r-00000
February-r-00000
March-r-00000
5. Develop a PIG Script to filter the Map Reduce Output in the below fashion
- Provide the Unique data
- Sort the Unique data based on RETAIL_ID in DESC order
6. EXPORT the same PIG Output from HDFS to MySQL using SQOOP
7. Store the same PIG Output in a HIVE External Table.
Input File Format:- .xls (EXCEL Format)
The input file and problem statement for this POC were shared with me by Mr. Amol Wani. The file contains temperature statistics with time for multiple months. The schema of the record set is shown in the picture below:-
https://drive.google.com/file/d/0BzYUKIo7aWL_WkFYdWU5QWdJLTA/view?usp=sharing
DOWNLOAD MY INPUT FILE FROM BELOW LINK:
https://drive.google.com/file/d/0BzYUKIo7aWL_WkFYdWU5QWdJLTA/view?usp=sharing
1. TO TAKE INPUT DATA ON HDFS
hadoop fs -mkdir /InputData
hadoop fs -put WeatherReport.xls /InputData
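To confirm that the file actually landed on HDFS, you can list the directory:
hadoop fs -ls /InputData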
2. MAP REDUCE CODES:-
WEATHER REPORT PROCESSOR
(DRIVER CLASS)
NOTE:- If you face any problem in reading the input file, kindly uncomment the following lines and add the necessary classpath entries and jar files.
// job.setInputFormatClass(ExcelInputFormat.class);
// job.setOutputFormatClass(TextOutputFormat.class);
// LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
Please go through my previous blog on reading any Excel data.
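For reference, the jars needed for Excel reading just have to be visible to Hadoop; I had already copied them into the Hadoop lib folder for the previous POC. A rough sketch of that step is below; the jar names are only placeholders, so use whichever jars your ExcelInputFormat build actually depends on (typically the Apache POI jars plus the jar containing com.poc.ExcelInputFormat).
cp poi-<version>.jar $HADOOP_HOME/lib/
cp excel-inputformat.jar $HADOOP_HOME/lib/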
package com.poc.weather;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import com.poc.ExcelInputFormat;
public class WeatherReportProcessor {
public static String January = "January";
public static String February = "February";
public static String March = "March";
public static String April = "April";
public static String May = "May";
public static String June = "June";
public static String July = "July";
public static String August = "August";
public static String September = "September";
public static String October = "October";
public static String November = "November";
public static String December = "December";
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "Weather Report");
job.setJarByClass(WeatherReportProcessor.class);
job.setMapperClass(WeatherMapper.class);
job.setReducerClass(WeatherReducer.class);
// job.setInputFormatClass(ExcelInputFormat.class);
// job.setOutputFormatClass(TextOutputFormat.class);
// LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// Register one named output per month so the Reducer can write each day's record into its month's file
MultipleOutputs.addNamedOutput(job, January, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, February, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, March, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, April, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, May, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, June, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, July, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, August, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, September, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, October, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, November, TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, December, TextOutputFormat.class, Text.class, Text.class);
// job.setNumReduceTasks(0);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
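In case you want to build WeatherPoc.jar yourself, a compile-and-package sequence along the following lines should work. It assumes the sources are laid out under com/poc/ and com/poc/weather/ as per their package names; adjust the paths to your own setup.
mkdir -p classes
javac -cp "$(hadoop classpath)" -d classes com/poc/*.java com/poc/weather/*.java
jar -cvf WeatherPoc.jar -C classes .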
WEATHER MAPPER
(HAVING MAPPER LOGIC)
In the Mapper, after reading the input data, I first discard the first two lines, which do not contain any relevant data. I then split the remaining data and emit only the Date and the Temperatures as the Mapper output, which will be used as the input for the Reducer.
package com.poc.weather;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WeatherMapper extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
try {
if (value.toString().contains("ID") || value.toString().contains("mm"))
return;
else {
String[] str = value.toString().split(" ");
StringBuilder data = new StringBuilder();
// Keep only non-empty tokens; consecutive spaces in the input produce empty strings
for (int i = 0; i < str.length; i++) {
if (str[i] != null && !str[i].trim().isEmpty()) {
data.append(str[i]).append(" ");
}
}
// Normalise the remaining whitespace to single tabs and split into fields
String trimmed = data.toString().trim().replaceAll("\\s+", "\t");
String[] split = trimmed.split("\t");
// Fields 1-5 together form the date; fields 9 and 10 are the two temperature readings
String date = split[1] + split[2] + split[3] + split[4] + split[5];
String temp = split[9] + "\t" + split[10];
context.write(new Text(date), new Text(temp));
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
WEATHER REDUCER
(HAVING REDUCER LOGIC)
In the Reducer phase, I take the output from the Mapper, split the temperature values to get the max and min temperatures, and compare them with the readings from the other hours of the same day to find that day's max and min temperatures.
After getting the max and min temperatures, I check the date to sort the records into the different months.
package com.poc.weather;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
public class WeatherReducer extends Reducer<Text, Text, Text, Text> {
MultipleOutputs<Text, Text> mos;
public void setup(Context context) {
mos = new MultipleOutputs<Text, Text>(context);
}
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
// f1 holds the day's maximum (starting from 0) and f2 the minimum (starting from an assumed upper bound of 50)
float f1 = 0, f2 = 50;
Text result = new Text();
for (Text value : values) {
String sr = value.toString();
String[] str1 = sr.split("\t");
float max = Float.parseFloat(str1[0]);
float min = Float.parseFloat(str1[1]);
if (max > f1) {
f1 = max;
}
// Check the minimum independently; using "else if" here would skip the minimum whenever a new maximum is found
if (min < f2) {
f2 = min;
}
}
result = new Text(Float.toString(f1) + "\t" + Float.toString(f2));
String fileName = "";
// Pick the named output (month file) based on the month portion of the date key
if (key.toString().contains("/01/")) {
fileName = WeatherReportProcessor.January;
} else if (key.toString().contains("/02/")) {
fileName = WeatherReportProcessor.February;
} else if (key.toString().contains("/03/")) {
fileName = WeatherReportProcessor.March;
} else if (key.toString().contains("/04/")) {
fileName = WeatherReportProcessor.April;
} else if (key.toString().contains("/05/")) {
fileName = WeatherReportProcessor.May;
} else if (key.toString().contains("/06/")) {
fileName = WeatherReportProcessor.June;
} else if (key.toString().contains("/07/")) {
fileName = WeatherReportProcessor.July;
} else if (key.toString().contains("/08/")) {
fileName = WeatherReportProcessor.August;
} else if (key.toString().contains("/09/")) {
fileName = WeatherReportProcessor.September;
} else if (key.toString().contains("/10/")) {
fileName = WeatherReportProcessor.October;
} else if (key.toString().contains("/11/")) {
fileName = WeatherReportProcessor.November;
} else if (key.toString().contains("/12/")) {
fileName = WeatherReportProcessor.December;
}
// String strArr[] = key.toString().split("_");
// key.set(strArr[1]);
mos.write(fileName, key, result);
}
@Override
public void cleanup(Context context) throws IOException, InterruptedException {
mos.close();
}
}
3. EXECUTING THE MAP REDUCE CODE
hadoop jar WeatherPoc.jar com.poc.weather.WeatherReportProcessor /InputData/WeatherReport.xls /WeatherOutput
We can clearly see that the input had 8986 records while the output has only 365: the data has been reduced to one record per day of the year, and the days have been written into separate month files as specified in the code.
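If you want to cross-check the 365 figure yourself, the month files can be listed and counted straight from HDFS (the output path is the one used in the command above):
hadoop fs -ls /WeatherOutput
hadoop fs -cat /WeatherOutput/*-r-* | wc -l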
4. PIG SCRIPT
PigScript1.pig
A = LOAD '/WeatherOutput/' USING PigStorage('\t') AS (date:chararray, maxtemp:float, mintemp:float);
B = DISTINCT A;
DUMP B;
PigScript2.pig
A = LOAD '/WeatherOutput/' USING PigStorage('\t') AS (date:chararray, maxtemp:float, mintemp:float);
B = DISTINCT A;
-- store the unique records where the Sqoop export and the Hive external table below expect them
STORE B INTO '/WeatherPOC' USING PigStorage('\t');
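Both scripts can be run from the terminal with the Pig client, for example:
pig -f PigScript1.pig
pig -f PigScript2.pig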
5. EXPORT the PIG Output from HDFS to MySQL using SQOOP
sqoop eval --connect jdbc:mysql://localhost/ --username root --password root --query "create database if not exists WEATHERPOC;";
sqoop eval --connect jdbc:mysql://localhost/ --username root --password root --query "use WEATHERPOC;";
sqoop eval --connect jdbc:mysql://localhost/ --username root --password root --query "grant all privileges on WEATHERPOC.* to 'localhost'@'%';";
sqoop eval --connect jdbc:mysql://localhost/WEATHERPOC --username root --password root --query "create table weatherpoc(date varchar(50), maxtemp float, mintemp float);";
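The eval statements above only prepare the database and the table; the actual transfer still needs a sqoop export. Below is a sketch of that command, assuming the Pig output of PigScript2.pig was stored at /WeatherPOC as tab-separated text:
sqoop export --connect jdbc:mysql://localhost/WEATHERPOC --username root --password root --table weatherpoc --export-dir /WeatherPOC --input-fields-terminated-by '\t'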
6. STORE THE PIG OUTPUT IN A HIVE EXTERNAL TABLE
Go to the Hive shell using the command:
hive
show databases;
create database WeatherPOC;
use WeatherPOC;
create external table weatherpoc(Name string, maxtemp float, mintemp float)
row format delimited
fields terminated by '\t'
stored as textfile location '/WeatherPOC';
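As a quick sanity check, the new table can also be queried directly from the terminal:
hive -e "use WeatherPOC; select * from weatherpoc limit 10;"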