Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Leverage Google Cloud Platform features for your Hadoop deployments
February 27, 2014
Accenture recently
demonstrated
Google Compute Engine as an excellent place to run
Hadoop
for performance and price. They demonstrated both familiar HDFS-based deployments as well as performance and cost advantages of using the Google Cloud Storage
connector
.
If you use
Apache Hive
with Hadoop, we want to show you another great feature of the Google Cloud Platform that will save you time and money. The default Hive installation uses a local single-session database as the Hive Metastore to store Hive table and column names, column types, and so on. This local database is fine for single-user/single-session usage, but to use Hive with multiple concurrent connections, you need to install and maintain a relational database such as MySQL.
Google Cloud SQL
is managed MySQL in the cloud, which
now supports the MySQL wire protocol
, so you can connect to Google Cloud SQL using common tools from anywhere. Why not use it to run your MySQL database in the cloud too, and save yourself the time and cost of having to install and maintain MySQL? Let Google take care of it for you.
We provide a
sample application
to help you get started running Apache Hive and
Pig
on Google Compute Engine. The
README
includes detailed instructions for using Cloud SQL for the Hive Metastore.
The rest of this blog post hits the high points for what you need to do to use Google Cloud SQL for the Hive Metastore.
Create a Google Cloud SQL instance
In the
Google Cloud Console
, create a
Google Cloud SQL instance
. Make sure to:
Choose the Compute Engine zone of the Compute Engine instance running Hive as the preferred location for the Cloud SQL instance.
Select "Assign an IP Address" to the Cloud SQL instance.
Add the external IP address of the Compute Engine instance as an Authorized IP Address. Be sure to append "/32" to the IP address entered in this field.
Define a root password for MySQL (it's good practice).
Install MySQL client and MySQL JDBC Driver
Connect to the Google Compute Engine instance that is running Hive and install both the MySQL client and the MySQL JDBC driver. Add the JDBC driver JAR file to Hive's CLASSPATH. The simplest method is to copy the file to the hive/lib/directory.
Create the hivemeta database and the user to access it
Use the mysql command line tool to:
Connect to the Google Cloud SQL instance
Create a database called hivemeta (Hive requires the database to use latin1 character encoding, which you'll need to specify explicitly).
Create the database user (for example hdpuser).
Issue grants on the hivemeta database to the database user.
Configure Hive to use Cloud SQL
Update your hive/conf/hive-site.xml file to connect to the Google Cloud SQL database. Add the following configuration, replacing cloud-sql-ip-addr with the assigned IP address of the Cloud SQL instance, and replacing password with the hdpuser database user password:
Hive is now configured to use the Google Cloud SQL database as its metastore; there's no need for you to maintain a MySQL installation.
Other software that uses MySQL can similarly use Google Cloud SQL. Since Cloud SQL is MySQL, you won't need to migrate code - just establish connectivity and you are off and running.
-Posted by Matt Bookman, Cloud Solutions Architect
No comments :
Post a Comment
Free Trial
Labels
Android
Announcement
api
app engine
Atmosphere Live
bigquery
BigTable
CDN
Cloud Console
Cloud Dataflow
Cloud Datastore
cloud endpoints
Cloud Pub/Sub
Cloud SDK
cloud sql
cloud storage
Cloudera
Compute
Compute Engine
container cluster
customer
Dev Tools
developer tools
developer-insights
Developers
Developers Console
devfests
Disaster Recovery
Encryption Keys
ESG
Event
events
GA
Go Client
Google App Engine
Google Apps
Google BigQuery
Google Cloud Deployment Manager
Google Cloud Networking
Google Cloud Platform
Google Cloud Storage
Google Compute Engine
Google Container Engine
gRPC
hadoop
Hardware
Helium
how to
IO2013
iOS
Kubernetes
Levyx
Local SSD
mapreduce
Media
Nearline
networking
open source
PaaS Solution
Partner
Pricing
Research
round-up
Server
Siggraph
solutions
Startup
Tableau
TCO
Technical
Windows
Wowza
Zync
Archive
2015
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2010
Dec
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2009
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2008
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Feed
Technical questions? Check us out on
Stack Overflow
.
Subscribe to
our monthly newsletter
.
Follow @googlecloud
No comments :
Post a Comment