1. Load data from another table
INSERT OVERWRITE TABLE Batting
SELECT
regexp_extract(col_value,'^(?:([^,]*)\,?){1}',1) player_id,
regexp_extract(col_value,'^(?:([^,]*)\,?){2}',1) year,
regexp_extract(col_value,'^(?:([^,]*)\,?){9}',1) runs
FROM temp_batting;
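The repetition count in each pattern selects which comma-separated field ends up in capture group 1. A small Python sketch of the same idea (Python's re is assumed here to behave like Hive's Java regex for this pattern; the sample row is illustrative):

```python
import re

# The quantifier {n} repeats the non-capturing group n times; group 1
# keeps only the last repetition, i.e. the nth comma-separated field.
def extract_field(line, n):
    m = re.match(r'^(?:([^,]*),?){%d}' % n, line)
    return m.group(1)

row = "aardsda01,2004,0,0,0,0,0,0,0"
print(extract_field(row, 1))  # player_id -> aardsda01
print(extract_field(row, 2))  # year      -> 2004
print(extract_field(row, 9))  # 9th field -> 0
```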
2. Load data from File
LOAD DATA INPATH '/user/hue/Batting.csv' OVERWRITE INTO TABLE temp_batting;
May 26, 2015
May 25, 2015
HIVE: Create Table
CREATE TABLE TABLE_NAME (COL1 STRING, COL2 INT);
create table Batting
(PlayerId STRING, Year INT,Runs INT);
May 24, 2015
Hive: Select Statements
1. SELECT * FROM Table_Name;
2. SELECT * FROM Table_Name WHERE Column_Name = '';
3. SELECT COUNT(*) FROM Table_Name;
4. SELECT Col_1, MAX(Col_2) FROM Table_Name
GROUP BY Col_1;
5. SELECT a.* FROM Table_Name a
JOIN (SELECT * FROM Table_Name) b
ON (a.Col_1 = b.Col_1);
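The query shapes above can be tried quickly outside Hive against an in-memory SQLite database; a rough sketch (the table and rows are illustrative, and SQLite's dialect differs from HiveQL in places):

```python
import sqlite3

# Hedged sketch: exercising the SELECT patterns above on SQLite instead
# of Hive; table name and data are made up for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Batting (PlayerId TEXT, Year INT, Runs INT)")
cur.executemany("INSERT INTO Batting VALUES (?, ?, ?)",
                [("p1", 2004, 10), ("p1", 2005, 25), ("p2", 2004, 7)])

# 3. COUNT(*)
total = cur.execute("SELECT COUNT(*) FROM Batting").fetchone()[0]

# 4. MAX per group
best = cur.execute(
    "SELECT PlayerId, MAX(Runs) FROM Batting GROUP BY PlayerId").fetchall()

# 5. self-join through a subquery
joined = cur.execute(
    "SELECT a.* FROM Batting a "
    "JOIN (SELECT * FROM Batting) b ON (a.PlayerId = b.PlayerId)").fetchall()

print(total)           # 3
print(sorted(best))    # [('p1', 25), ('p2', 7)]
```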
May 23, 2015
Hive: Hive Commands
How to invoke Hive
hive
How to exit from Hive
quit;
List all databases present in Hive
show databases;
Execute commands in a database
use database_name;
List all tables present in a database
show tables;
How to view table structure
describe table_name;
May 22, 2015
SQOOP: Jobs
How to get help of SQOOP JOB command
sqoop help job
How to view saved SQOOP jobs
sqoop job --list
How to view content of saved SQOOP job
sqoop job --show job_name
How to delete a saved SQOOP Job
sqoop job --delete job_name
May 21, 2015
SQOOP: EXPORT data from HDFS to MYSQL
How to export data from HDFS (Hadoop) to MYSQL
The database and table must already exist. The HDFS file is tab-delimited in the following command.
sqoop export --connect jdbc:mysql://sandbox.hortonworks.com/custrm --username hive --password hive --table customer --fields-terminated-by '\t' --lines-terminated-by '\n' --export-dir /user/root/customer/ -m 1
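As a sanity check, a file in the layout this export expects can be produced with a short Python sketch (the customer columns and values are illustrative):

```python
# Hedged sketch: the export above reads tab-delimited ('\t'),
# newline-terminated records from the --export-dir. This writes one
# part file in that layout; rows and columns are made up.
rows = [
    (1, "Alice", "New York"),
    (2, "Bob", "Chicago"),
]
with open("part-m-00000", "w") as f:
    for r in rows:
        f.write("\t".join(str(v) for v in r) + "\n")
```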
May 15, 2015
SQOOP: Import data from MYSQL database to HDFS and HIVE
To Get MYSQL Server hostname and port number
Use the following commands inside MySQL
use information_schema;
select * from GLOBAL_VARIABLES where variable_name like 'hostname';
select * from GLOBAL_VARIABLES where variable_name like 'port';
Display all the database names present in MYSQL Server
sqoop list-databases --connect jdbc:mysql://localhost --username=hive --password=hive
IMPORT data from MYSQL TABLE to HIVE TABLE
sqoop import --connect jdbc:mysql://localhost/database_name --username=hive --password=hive --table table_name --hive-import
database_name -- Replace this with MYSQL database name
table_name -- Replace this with table name
IMPORT data from MYSQL TABLE with no primary key to HIVE TABLE
sqoop import --connect jdbc:mysql://localhost/database_name --username=hive --password=hive --table table_name --hive-import -m 1
database_name -- Replace with MYSQL Database
table_name -- Replace with table name in MYSQL
-m 1 -- allocates one mapper.
--split-by column_name -- splits the work by the given column.
You can use either the -m option or the --split-by option.
To import data from MYSQL to an existing folder in HDFS
sqoop import --connect jdbc:mysql://localhost/Database_Name --username=hive --password=hive --table table_name --append
database_name -- Replace with MYSQL Database
table_name -- Replace with table name in MYSQL
The --append option writes files into the existing folder.
IMPORT data using MYSQL Query to HDFS
sqoop import --connect jdbc:mysql://localhost/Database_Name
--username=hive --password=hive -e 'select * from employee where $CONDITIONS' --target-dir /user/root/database/tablename
The --target-dir option path starts from the home folder.
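The $CONDITIONS token in the free-form query is where Sqoop injects a per-mapper range predicate on the split column. A rough Python sketch of the idea (the boundary arithmetic is illustrative, not Sqoop's exact algorithm):

```python
# Illustrative only: Sqoop replaces $CONDITIONS with one range
# predicate per mapper, derived from the split column's min/max.
def split_conditions(col, lo, hi, mappers):
    step = (hi - lo + 1) // mappers
    conds = []
    for i in range(mappers):
        start = lo + i * step
        if i == mappers - 1:
            conds.append(f"{col} >= {start} AND {col} <= {hi}")
        else:
            conds.append(f"{col} >= {start} AND {col} < {start + step}")
    return conds

query = "select * from employee where $CONDITIONS"
for cond in split_conditions("id", 1, 100, 4):
    print(query.replace("$CONDITIONS", cond))
```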
May 10, 2015
Hadoop: Setting up single node Cluster on Linux desktop
The following are helpful links for setting up Hadoop on Linux. I successfully set up the Hadoop cluster on a Linux desktop using them.
1. http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
http://qursaan-howto.blogspot.com/2014/12/how-to-install-hadoop-as-single-node-on.html
http://www.geekyboy.com/archives/629
2. VI command to save read-only files
:w !sudo tee %
3. VI editor commands for editing configuration files
http://www.cs.rit.edu/~cslab/vi.html
http://www.cs.colostate.edu/helpdocs/vi.html
January 20, 2015
Hadoop: Install Rstudio Server on Horton Works Hadoop Sandbox
Steps to Install RSTUDIO in HortonWorks Sandbox
- Login to HortonWorks sandbox with root user
- Install RStudio Server
- Create a different user (sandbox)
- Add port forward in VirtualBox
- Restart Hortonworks Hadoop Sandbox
- Login to Sandbox other than root user (Use user created in step 3)
- Open the RStudio in web browser.
- Open Google Chrome and type http://127.0.0.1:8787
Login to HortonWorks Sandbox
· Start the Sandbox
· Press ALT+F5
· Login with user root and pwd hadoop
INSTALL RSTUDIO SERVER
Run the following command in the sandbox console:
sudo yum install --nogpgcheck rstudio-server-0.97.332-x86_64.rpm
CREATE ANOTHER USER
useradd sandbox
passwd sandbox
This is because you cannot use root to log into RStudio Server. Here I use sandbox as the username, but you can choose whatever you want.
ADDING PORT FORWARD IN VIRTUALBOX
1. Go To Devices → Network → Network Settings
2. Click on Advanced → Port Forwarding (If Advanced is expanded you can see Port Forwarding)
3. Click on the Add button
4. Add the RStudio port forwarding entries
5. You can browse RStudio from the browser using http://127.0.0.1:8787
6. You need to use the Google Chrome browser.
January 19, 2015
Hadoop: Adding PORT forward to access Rstudio server from Hortonworks Hadoop Sandbox in VirtualBox
After installing RStudio Server on Hortonworks Hadoop Sandbox, to access RStudio from the browser (Google Chrome), need to set up port forwarding in VirtualBox.
Use the following steps for setting up port forwarding.
ADDING PORT FORWARD IN VIRTUALBOX
- Go To Devices → Network → Network Settings
- Click on Advanced → Port Forwarding (If Advanced is expanded you can see Port Forwarding)
- Click on Add button
- Add the RStudio port forwarding entries
- You can browse the RSTUDIO from the browser using http://127.0.0.1:8787
- You need to use Google Chrome browser.
[Screenshots: Network Settings | Port Forwarding | Adding Port for forwarding]
January 18, 2015
Hadoop: Exploring Hadoop Sandbox files from Windows System
Problem: Users working on Windows with a Hadoop sandbox or a Hadoop cluster are not able to view or move files from Windows to the Hadoop sandbox.
Solution:
1. You can use the HUE interface to upload files from Windows to Hadoop.
2. You can install HDFS Explorer on Windows and explore the files. This will help you identify where the files are present in the Hadoop sandbox.
3. You can use the following Hadoop command:
hadoop fs -ls /<<folderpath>>
Install HDFS Explorer from the following link.
http://bigdata.red-gate.com/
http://hadoop.drawbackz.com/stack/12821/how-to-transfer-files-from-windows-to-hortonworks-sandbox.html
http://bigdata.red-gate.com/connecting-to-sandboxes.html