Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Announcing Google BigQuery and Datastore Connectors for Hadoop
April 16, 2014
Today, we are making it easier for you to run Hadoop jobs directly against your data in Google BigQuery and Google Cloud Datastore with the Preview release of the Google BigQuery connector and the Google Cloud Datastore connector for Hadoop. The Google BigQuery and Google Cloud Datastore connectors implement Hadoop’s InputFormat and OutputFormat interfaces for accessing data. These two connectors complement the existing Google Cloud Storage connector for Hadoop, which implements the Hadoop Distributed File System interface for accessing data in Google Cloud Storage.
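To make the role of an input format more concrete, here is a minimal sketch of the pattern in plain Java. It does not use Hadoop's actual classes: `ToySplit`, `ToyInputFormat`, and `InMemoryTableInputFormat` are hypothetical names invented for illustration, and the real InputFormat API involves more pieces (job contexts, record readers, serialization). The sketch only shows the core idea, assuming a data source that can be addressed by row index: the input format divides the source into splits, then returns the records for each split so that separate tasks can process them independently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Small demo: split five rows into at most two chunks and print each chunk.
class ToyInputFormatDemo {
    public static void main(String[] args) {
        InMemoryTableInputFormat format =
                new InMemoryTableInputFormat(Arrays.asList("a", "b", "c", "d", "e"));
        for (ToySplit split : format.getSplits(2)) {
            System.out.println(format.readSplit(split));
        }
    }
}

// Hypothetical, simplified stand-ins for the pattern; NOT Hadoop's real API.
final class ToySplit {
    final int start; // index of the first row in the split (inclusive)
    final int end;   // index just past the last row (exclusive)

    ToySplit(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

interface ToyInputFormat<T> {
    // Divide the source into independent chunks, at most desiredSplits of them.
    List<ToySplit> getSplits(int desiredSplits);

    // Return the records that belong to a single split.
    List<T> readSplit(ToySplit split);
}

// Treats an in-memory list of rows as the "table" being read.
class InMemoryTableInputFormat implements ToyInputFormat<String> {
    private final List<String> rows;

    InMemoryTableInputFormat(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public List<ToySplit> getSplits(int desiredSplits) {
        List<ToySplit> splits = new ArrayList<>();
        if (rows.isEmpty()) {
            return splits;
        }
        int n = Math.max(1, desiredSplits);
        // Ceiling division so every row lands in exactly one split.
        int size = (rows.size() + n - 1) / n;
        for (int start = 0; start < rows.size(); start += size) {
            splits.add(new ToySplit(start, Math.min(start + size, rows.size())));
        }
        return splits;
    }

    @Override
    public List<String> readSplit(ToySplit split) {
        // Copy the sub-range so callers cannot modify the underlying table.
        return new ArrayList<>(rows.subList(split.start, split.end));
    }
}
```

An output format is the mirror image of this: it accepts the records a task produces and writes them to the destination service.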
The connectors can be automatically installed and configured when you deploy your Hadoop cluster with bdutil. Simply include the extra “env” file for each connector you want (BigQuery only, Datastore only, or both):
./bdutil deploy bigquery_env.sh
./bdutil deploy datastore_env.sh
./bdutil deploy bigquery_env.sh datastore_env.sh
[Figure: Diagram of Hadoop on Google Cloud Platform]
These three connectors allow you to directly access data stored in Google Cloud Platform’s storage services from Hadoop and from other open source Big Data software that uses Hadoop's IO abstractions. As a result, your valuable data is available simultaneously to multiple Big Data clusters and other services, without duplication. This should dramatically simplify the operational model for your Big Data processing on Google Cloud Platform.
Here are some word-count MapReduce code samples to get you started:
Using the BigQuery connector
Using the Datastore connector
Using the Datastore connector for reading data and using the BigQuery connector for publishing results
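As a rough, connector-independent illustration of what a word-count job computes, the following plain-Java sketch runs the map, shuffle, and reduce steps in memory. It uses neither Hadoop nor either connector, and `WordCountSketch` and its methods are hypothetical names chosen for this example. In a real job, the input lines would come from a connector's InputFormat and the totals would be written through its OutputFormat.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch; it uses neither Hadoop nor the connectors.
class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and reduce steps: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the whole pipeline over a list of input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "To be is to do");
        // Prints {be=3, do=1, is=1, not=1, or=1, to=4}
        System.out.println(countWords(lines));
    }
}
```

In Hadoop itself the map and reduce steps run as separate tasks on different machines, and the framework performs the grouping between them. The sketch collapses all of that into one process to keep the logic visible.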
As always, we would love to hear your feedback and ideas on improving these connectors and making Hadoop run better on Google Cloud Platform.
-Posted by Pratul Dublish, Product Manager