May 26, 2015

Hive: Insert OR Load data

1. Load data from another table
insert overwrite table batting
SELECT
regexp_extract(col_value,'^(?:([^,]*)\,?){1}',1) player_id,
regexp_extract(col_value,'^(?:([^,]*)\,?){2}',1) year,
regexp_extract(col_value,'^(?:([^,]*)\,?){9}',1) run
FROM temp_batting;
2. Load data from a file
LOAD DATA INPATH '/user/hue/Batting.csv' OVERWRITE INTO TABLE temp_batting;
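
Both commands assume a staging table temp_batting that holds each raw CSV line in a single string column, and a batting table with the three extracted columns. A minimal sketch of those definitions (column names inferred from the query above):

CREATE TABLE temp_batting (col_value STRING);
CREATE TABLE batting (player_id STRING, year INT, runs INT);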

May 25, 2015

HIVE: Create Table

CREATE TABLE TABLE_NAME (COL1 STRING, COL2 INT);

create table Batting (PlayerId STRING, Year INT, Runs INT);
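
If the table will be loaded straight from a delimited file (as in the LOAD DATA example), the row format can be declared when the table is created. A minimal sketch, assuming a comma-delimited text file (batting_csv is only an illustrative name):

CREATE TABLE batting_csv (player_id STRING, year INT, runs INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;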

May 24, 2015

Hive: Select Statements

1. SELECT * FROM Table_Name;
2. SELECT * FROM Table_Name WHERE Column_Name = '';
3. SELECT COUNT(*) FROM Table_Name;
4. SELECT Col_1, MAX(Col_2) FROM Table_Name
GROUP BY Col_1;
5. SELECT a.* FROM Table_Name a
JOIN (SELECT * FROM Table_Name) b
ON (a.Col_1 = b.Col_1);
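
As a concrete illustration, the same patterns applied to the Batting table from the other posts might look like this (the player id value is only a sample):

SELECT * FROM batting;
SELECT * FROM batting WHERE player_id = 'aaronha01';
SELECT COUNT(*) FROM batting;
SELECT year, MAX(runs) FROM batting GROUP BY year;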

May 23, 2015

Hive: Hive Commands

How to invoke Hive
hive

How to exit from Hive
quit;

List all databases present in Hive
show databases;

Execute commands in a database
use database_name;

List all tables present in a database
show tables;

How to view table structure
describe table_name;
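
Putting a few of these together, a short interactive session in the hive shell (using the database and table names from the other posts) might look like:

use default;
show tables;
describe batting;
quit;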


May 22, 2015

SQOOP: Jobs

How to get help on the SQOOP JOB command
sqoop help job

How to view saved SQOOP jobs
sqoop job --list

How to view the content of a saved SQOOP job
sqoop job --show job_name

How to delete a saved SQOOP job
sqoop job --delete job_name
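
For reference, a saved job is created with sqoop job --create and run with sqoop job --exec. A minimal sketch (the job name, database, table, and credentials are placeholders):

sqoop job --create import_customer -- import --connect jdbc:mysql://localhost/database_name --username hive --password hive --table customer -m 1
sqoop job --exec import_customer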

May 21, 2015

SQOOP: EXPORT data from HDFS to MYSQL

How to export data from HDFS (Hadoop) to MYSQL

The target database and table must already exist in MySQL. In the following command, the HDFS files are tab-delimited.


sqoop export --connect jdbc:mysql://sandbox.hortonworks.com/custrm --username hive --password hive  --table customer --fields-terminated-by '\t' --lines-terminated-by '\n' --export-dir /user/root/customer/ -m 1
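
Because the target table must exist before the export, a matching table has to be created in MySQL first. A sketch, using the custrm database from the command above; the column names and types are illustrative and must match the layout of the HDFS files:

CREATE DATABASE custrm;
USE custrm;
CREATE TABLE customer (customer_id INT, name VARCHAR(100), city VARCHAR(50));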

May 15, 2015

SQOOP: Import data from MYSQL database to HDFS and HIVE

To get the MySQL server hostname and port number, use the following commands inside MySQL:


use information_schema;
select * from GLOBAL_VARIABLES where variable_name like 'hostname';
select * from GLOBAL_VARIABLES where variable_name like 'port';


Display all the database names present in MYSQL Server

sqoop list-databases --connect jdbc:mysql://localhost --username=hive --password=hive


IMPORT data from MYSQL TABLE to HIVE TABLE
sqoop import --connect jdbc:mysql://localhost/database_name --username=hive --password=hive --table table_name --hive-import

database_name -- Replace with the MySQL database name
table_name -- Replace with the MySQL table name


IMPORT data from MYSQL TABLE with no primary key to HIVE TABLE

sqoop import --connect jdbc:mysql://localhost/database_name --username=hive --password=hive --table table_name --hive-import -m 1

database_name -- Replace with the MySQL database name
table_name -- Replace with the MySQL table name

-m 1 -- allocates a single mapper, which is required when the table has no primary key (unless --split-by is given).
--split-by column_name -- alternatively, tells Sqoop which column to use to split the work across mappers.
You can use either the -m option or the --split-by option; see the sketch below.
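
For example, the same import with --split-by instead of -m 1 (column_name is a placeholder and should be a fairly evenly distributed numeric column, such as the table's id):

sqoop import --connect jdbc:mysql://localhost/database_name --username=hive --password=hive --table table_name --hive-import --split-by column_name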

To import data from MYSQL to an existing folder in HDFS

sqoop import --connect jdbc:mysql://localhost/Database_Name --username=hive --password=hive --table table_name  --append

database_name -- Replace with the MySQL database name
table_name -- Replace with the MySQL table name

The --append option writes the imported files into the existing target folder instead of failing because the directory already exists.

IMPORT data using a MYSQL query to HDFS

sqoop import --connect jdbc:mysql://localhost/Database_Name --username=hive --password=hive -e 'select * from employee where $CONDITIONS' -m 1 --target-dir /user/root/database/tablename

database_name -- Replace with the MySQL database name
A free-form query (-e / --query) requires --target-dir, and either -m 1 or --split-by column_name; $CONDITIONS is a placeholder that Sqoop replaces with its own split conditions.
--target-dir -- the HDFS directory where the imported data is written.
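
When the query has its own filter, the filter is simply ANDed with $CONDITIONS. A sketch, assuming a hypothetical employee table with salary and employee_id columns:

sqoop import --connect jdbc:mysql://localhost/Database_Name --username=hive --password=hive --query 'select * from employee where salary > 50000 and $CONDITIONS' --split-by employee_id --target-dir /user/root/database/employee_filtered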
May 10, 2015

Hadoop: Setting up single node Cluster on Linux desktop


The following links were helpful for setting up Hadoop on a Linux desktop; using them, I successfully set up a single-node Hadoop cluster.

1. http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
    http://qursaan-howto.blogspot.com/2014/12/how-to-install-hadoop-as-single-node-on.html
2. Editing a read-only file in the VI editor
    http://www.geekyboy.com/archives/629
Command to save a read-only file from within vi (then press O at the prompt to continue):
:w !sudo tee %

3. VI editor commands for editing configuration files
    http://www.cs.rit.edu/~cslab/vi.html
    http://www.cs.colostate.edu/helpdocs/vi.html
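
Once the configuration files have been edited, a typical sequence to initialize and verify the single-node cluster (standard Hadoop commands; exact script locations depend on your install) is:

hdfs namenode -format      # one-time format of the NameNode metadata
start-dfs.sh               # start the HDFS daemons (NameNode, DataNode)
start-yarn.sh              # start the YARN daemons (ResourceManager, NodeManager)
jps                        # list the running Java daemons to confirm startup
hadoop fs -ls /            # confirm HDFS is reachable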

January 20, 2015

Hadoop: Install RStudio Server on Hortonworks Hadoop Sandbox



Steps to Install RSTUDIO in HortonWorks Sandbox

  1. Log in to the HortonWorks sandbox as the root user
  2. Install RStudio Server
  3. Create a different user (sandbox)
  4. Add a port forward in VirtualBox
  5. Restart the Hortonworks Hadoop Sandbox
  6. Log in to the sandbox as a user other than root (use the user created in step 3)
  7. Open RStudio in a web browser: open Google Chrome and go to http://127.0.0.1:8787

Login to HortonWorks Sandbox

· Start the Sandbox
· Press ALT+F5
· Log in with user root and password hadoop

INSTALL RSTUDIO SERVER

Run the following command in the sandbox console
sudo yum install --nogpgcheck rstudio-server-0.97.332-x86_64.rpm
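
The command above assumes the RStudio Server rpm has already been downloaded into the current directory on the sandbox. A typical download step for this version (the URL is an assumption; check the RStudio site for the correct link):

wget http://download2.rstudio.org/rstudio-server-0.97.332-x86_64.rpm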

CREATE ANOTHER USER
useradd sandbox
passwd sandbox

This is needed because you cannot use root to log in to RStudio Server. Here I use sandbox as the username, but you can choose whatever you want.

ADDING PORT FORWARD IN VIRTUALBOX

1. Go To Devices -> Network -> Network Settings
2. Click on Advanced -> Port Forwarding (if Advanced is expanded you can see Port Forwarding)
3. Click on the Add button
4. Add the RStudio port forwarding entries
5. You can browse RStudio from the browser using http://127.0.0.1:8787
6. You need to use the Google Chrome browser


January 19, 2015

Hadoop: Adding PORT forward to access Rstudio server from Hortonworks Hadoop Sandbox in VirtualBox


After installing RStudio Server on the Hortonworks Hadoop Sandbox, you need to set up port forwarding in VirtualBox to access RStudio from the browser (Google Chrome).

Use the following steps for setting up port forwarding.

ADDING PORT FORWARD IN VIRTUALBOX
  1. Go To Devices -> Network -> Network Settings
  2. Click on Advanced -> Port Forwarding (if Advanced is expanded you can see Port Forwarding)
  3. Click on the Add button
  4. Add the RStudio port forwarding entries
  5. You can browse RStudio from the browser using http://127.0.0.1:8787
  6. You need to use the Google Chrome browser.
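
If you prefer the command line, the same rule can be added with VBoxManage instead of the GUI. A sketch, assuming the VM is named "Hortonworks Sandbox" (adjust the VM name and rule label to your setup):

VBoxManage controlvm "Hortonworks Sandbox" natpf1 "rstudio,tcp,127.0.0.1,8787,,8787"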
(Screenshots: Network Settings, Port Forwarding, adding a port forwarding entry)

January 18, 2015

Hadoop: Exploring Hadoop Sandbox files from Windows System

Problem: Users working on Windows with a Hadoop sandbox (or Hadoop) are unable to view or move files from Windows to the Hadoop sandbox.

Solution:
1. You can use the HUE web interface to upload files from Windows to Hadoop.
2. You can install HDFS Explorer on Windows and explore the files. This will help you identify where the files are located in the Hadoop sandbox.
3. You can use the Hadoop filesystem commands, for example:
hadoop fs -ls /<<folderpath>>
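
To actually move a file from Windows into the sandbox without HUE, one common route is to copy it over SSH and then put it into HDFS. A sketch, assuming the sandbox's default SSH port forward on 2222 and illustrative file paths:

scp -P 2222 C:/data/Batting.csv root@127.0.0.1:/root/      (run from Windows, e.g. with pscp or an OpenSSH client)
hadoop fs -put /root/Batting.csv /user/hue/                (run inside the sandbox)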

Install HDFS Explorer from the following link.

http://bigdata.red-gate.com/

http://hadoop.drawbackz.com/stack/12821/how-to-transfer-files-from-windows-to-hortonworks-sandbox.html

http://bigdata.red-gate.com/connecting-to-sandboxes.html