1. Load data from another table
INSERT OVERWRITE TABLE Batting
SELECT
regexp_extract(col_value,'^(?:([^,]*)\,?){1}',1) player_id,
regexp_extract(col_value,'^(?:([^,]*)\,?){2}',1) year,
regexp_extract(col_value,'^(?:([^,]*)\,?){9}',1) runs
FROM temp_batting;
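The repetition count in each pattern selects which comma-separated field ends up in capture group 1. A small Python sketch of the same idea (Python's re is assumed here to behave like Hive's Java regex for this pattern; the sample row is illustrative):

```python
import re

# The quantifier {n} repeats the non-capturing group n times; group 1
# keeps only the last repetition, i.e. the nth comma-separated field.
def extract_field(line, n):
    m = re.match(r'^(?:([^,]*),?){%d}' % n, line)
    return m.group(1)

row = "aardsda01,2004,0,0,0,0,0,0,0"
print(extract_field(row, 1))  # player_id -> aardsda01
print(extract_field(row, 2))  # year      -> 2004
print(extract_field(row, 9))  # 9th field -> 0
```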
2. Load data from File
LOAD DATA INPATH '/user/hue/Batting.csv' OVERWRITE INTO TABLE temp_batting;
May 26, 2015
May 25, 2015
HIVE: Create Table
CREATE TABLE TABLE_NAME (COL1 STRING, COL2 INT);
create table Batting
(PlayerId STRING, Year INT,Runs INT);
May 24, 2015
Hive: Select Statements
1. SELECT * FROM Table_Name;
2. SELECT * FROM Table_Name WHERE Column_Name = '';
3. SELECT COUNT(*) FROM Table_Name;
4. SELECT Col_1, MAX(Col_2) FROM Table_Name
GROUP BY Col_1;
5. SELECT a.* FROM Table_Name a
JOIN (SELECT * FROM Table_Name) b
ON (a.Col_1 = b.Col_1);
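The query shapes above can be tried quickly outside Hive against an in-memory SQLite database; a rough sketch (the table and rows are illustrative, and SQLite's dialect differs from HiveQL in places):

```python
import sqlite3

# Hedged sketch: exercising the SELECT patterns above on SQLite instead
# of Hive; table name and data are made up for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Batting (PlayerId TEXT, Year INT, Runs INT)")
cur.executemany("INSERT INTO Batting VALUES (?, ?, ?)",
                [("p1", 2004, 10), ("p1", 2005, 25), ("p2", 2004, 7)])

# 3. COUNT(*)
total = cur.execute("SELECT COUNT(*) FROM Batting").fetchone()[0]

# 4. MAX per group
best = cur.execute(
    "SELECT PlayerId, MAX(Runs) FROM Batting GROUP BY PlayerId").fetchall()

# 5. self-join through a subquery
joined = cur.execute(
    "SELECT a.* FROM Batting a "
    "JOIN (SELECT * FROM Batting) b ON (a.PlayerId = b.PlayerId)").fetchall()

print(total)           # 3
print(sorted(best))    # [('p1', 25), ('p2', 7)]
```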
May 23, 2015
Hive: Hive Commands
How to invoke Hive
hive
How to exit from Hive
quit;
List all databases present in Hive
show databases;
Execute commands in a database
use database_name;
List all tables present in a database
show tables;
How to view table structure
describe table_name;
May 22, 2015
SQOOP: Jobs
How to get help of SQOOP JOB command
sqoop help job
How to view saved SQOOP jobs
sqoop job --list
How to view content of saved SQOOP job
sqoop job --show job_name
How to delete a saved SQOOP Job
sqoop job --delete job_name
May 21, 2015
SQOOP: EXPORT data from HDFS to MYSQL
How to export data from HDFS (Hadoop) to MYSQL
The database and table must already exist. The HDFS file is tab-delimited in the following command.
sqoop export --connect jdbc:mysql://sandbox.hortonworks.com/custrm --username hive --password hive --table customer --fields-terminated-by '\t' --lines-terminated-by '\n' --export-dir /user/root/customer/ -m 1
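As a sanity check, a file in the layout this export expects can be produced with a short Python sketch (the customer columns and values are illustrative):

```python
# Hedged sketch: the export above reads tab-delimited ('\t'),
# newline-terminated records from the --export-dir. This writes one
# part file in that layout; rows and columns are made up.
rows = [
    (1, "Alice", "New York"),
    (2, "Bob", "Chicago"),
]
with open("part-m-00000", "w") as f:
    for r in rows:
        f.write("\t".join(str(v) for v in r) + "\n")
```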
May 15, 2015
SQOOP: Import data from MYSQL database to HDFS and HIVE
To Get MYSQL Server hostname and port number
Use the following commands inside MySQL
use information_schema;
select * from GLOBAL_VARIABLES where variable_name like 'hostname';
select * from GLOBAL_VARIABLES where variable_name like 'port';
Display all the database names present in MYSQL Server
sqoop list-databases --connect jdbc:mysql://localhost --username=hive --password=hive
IMPORT data from MYSQL TABLE to HIVE TABLE
sqoop import --connect jdbc:mysql://localhost/database_name --username=hive --password=hive --table table_name --hive-import
database_name -- Replace this with MYSQL database name
table_name -- Replace this with table name
IMPORT data from MYSQL TABLE with no primary key to HIVE TABLE
sqoop import --connect jdbc:mysql://localhost/database_name --username=hive --password=hive --table table_name --hive-import -m 1
database_name -- Replace with MYSQL Database
table_name -- Replace with table name in MYSQL
-m 1 -- allocates one mapper.
--split-by column_name -- splits the work by the given column.
You can use either the -m option or the --split-by option.
To import data from MYSQL to an existing folder in HDFS
sqoop import --connect jdbc:mysql://localhost/Database_Name --username=hive --password=hive --table table_name --append
database_name -- Replace with MYSQL Database
table_name -- Replace with table name in MYSQL
The --append option writes files into the existing folder.
IMPORT data using MYSQL Query to HDFS
sqoop import --connect jdbc:mysql://localhost/Database_Name
--username=hive --password=hive -e 'select * from employee where $CONDITIONS' --target-dir /user/root/database/tablename
The --target-dir option path starts from the home folder.
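The $CONDITIONS token in the free-form query is where Sqoop injects a per-mapper range predicate on the split column. A rough Python sketch of the idea (the boundary arithmetic is illustrative, not Sqoop's exact algorithm):

```python
# Illustrative only: Sqoop replaces $CONDITIONS with one range
# predicate per mapper, derived from the split column's min/max.
def split_conditions(col, lo, hi, mappers):
    step = (hi - lo + 1) // mappers
    conds = []
    for i in range(mappers):
        start = lo + i * step
        if i == mappers - 1:
            conds.append(f"{col} >= {start} AND {col} <= {hi}")
        else:
            conds.append(f"{col} >= {start} AND {col} < {start + step}")
    return conds

query = "select * from employee where $CONDITIONS"
for cond in split_conditions("id", 1, 100, 4):
    print(query.replace("$CONDITIONS", cond))
```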
May 10, 2015
Hadoop: Setting up single node Cluster on Linux desktop
The following are helpful links for setting up Hadoop on Linux. I successfully set up the Hadoop cluster on a Linux desktop using them.
1. http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
http://qursaan-howto.blogspot.com/2014/12/how-to-install-hadoop-as-single-node-on.html
http://www.geekyboy.com/archives/629
2. VI command to save read-only files
:w !sudo tee %
3. VI editor commands for editing configuration files
http://www.cs.rit.edu/~cslab/vi.html
http://www.cs.colostate.edu/helpdocs/vi.html
January 20, 2015
Hadoop: Install Rstudio Server on Horton Works Hadoop Sandbox
Steps to Install RSTUDIO in HortonWorks Sandbox
- Login to HortonWorks sandbox with root user
- Install RStudio Server
- Create a different user (sandbox)
- Add port forward in VirtualBox
- Restart Hortonworks Hadoop Sandbox
- Login to Sandbox other than root user (Use user created in step 3)
- Open the RStudio in web browser.
- Open Google Chrome and type http://127.0.0.1:8787
Login to HortonWorks Sandbox
· Start the Sandbox
· Press ALT+F5
· Login with user root and pwd hadoop
INSTALL RSTUDIO SERVER
Run the following command in the sandbox console:
sudo yum install --nogpgcheck rstudio-server-0.97.332-x86_64.rpm
CREATE ANOTHER USER
useradd sandbox
passwd sandbox
This is because you cannot use root to log into RStudio Server. Here I use sandbox as the username, but you can choose whatever you want.
ADDING PORT FORWARD IN VIRTUALBOX
1. Go To Devices → Network → Network Settings
2. Click on Advanced → Port Forwarding (If Advanced is expanded you can see Port Forwarding)
3. Click on the Add button
4. Add the RStudio port forwarding entries
5. You can browse RStudio from the browser using http://127.0.0.1:8787
6. You need to use the Google Chrome browser.
January 19, 2015
Hadoop: Adding PORT forward to access Rstudio server from Hortonworks Hadoop Sandbox in VirtualBox
After installing RStudio Server on Hortonworks Hadoop Sandbox, to access RStudio from the browser (Google Chrome), need to set up port forwarding in VirtualBox.
Use the following steps for setting up port forwarding.
ADDING PORT FORWARD IN VIRTUALBOX
- Go To Devices → Network → Network Settings
- Click on Advanced → Port Forwarding (If Advanced is expanded you can see Port Forwarding)
- Click on Add button
- Add the RStudio port forwarding entries
- You can browse the RSTUDIO from the browser using http://127.0.0.1:8787
- You need to use Google Chrome browser.
[Screenshots: Network Settings | Port Forwarding | Adding Port for forwarding]
January 18, 2015
Hadoop: Exploring Hadoop Sandbox files from Windows System
Problem: Users working on Windows with a Hadoop sandbox or a Hadoop cluster are not able to view or move files from Windows to the Hadoop sandbox.
Solution:
1. You can use the HUE interface to upload files from Windows to Hadoop.
2. You can install HDFS Explorer on Windows and explore the files. This will help you identify where the files are present in the Hadoop sandbox.
3. You can use the following Hadoop command:
hadoop fs -ls /<<folderpath>>
Install HDFS Explorer from the following link.
http://bigdata.red-gate.com/
http://hadoop.drawbackz.com/stack/12821/how-to-transfer-files-from-windows-to-hortonworks-sandbox.html
http://bigdata.red-gate.com/connecting-to-sandboxes.html