Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Easier, faster and lower cost Big Data processing with the Google Cloud Storage Connector for Hadoop
January 14, 2014
Google Compute Engine VMs provide a fast and reliable way to run Apache Hadoop. Today, we’re making it easier to run Hadoop on Google Cloud Platform with the Preview release of the Google Cloud Storage connector for Hadoop, which lets you focus on your data processing logic instead of on managing a cluster and file system.
Diagram of Hadoop on Google Cloud Platform. HDFS and the NameNode are optional when storing data in Google Cloud Storage
In the 10 years since we first introduced Google File System (GFS), the basis for Hadoop Distributed File System (HDFS), Google has continued to improve our storage system for large data processing. The latest iteration is Colossus.
Today’s launch brings that storage system to Hadoop. Using a simple connector library, Hadoop can now run directly against Google Cloud Storage, an object store built on Colossus. That means you benefit from Google’s expertise in large data processing.
Here are a few other benefits of running Hadoop with Google Cloud Storage:
Compatibility:
The Google Cloud Storage connector for Hadoop is code-compatible with Hadoop. Just change the URL to point to your data.
Quick startup:
Your data is ready to process. You don’t have to wait minutes or longer while your data is copied over to HDFS and the NameNode comes out of safe mode, and you don’t have to pay for the VM time spent copying data either.
Greater availability and scalability:
Google Cloud Storage is globally replicated and has higher availability than HDFS because it’s independent of the compute nodes and the NameNode. If the VMs are turned down (or, cloud forbid, crash) your data lives on.
Lower costs:
Save on storage and compute: storage, because there’s no need to maintain two copies of your data, one for backups and one for running Hadoop; compute, because you don’t need to keep VMs going just to serve data. And with per-minute billing, you can run Hadoop jobs faster on more cores and know your costs aren’t getting rounded up to a whole hour.
No storage management overhead:
Whereas HDFS requires routine maintenance -- like file system checks, rebalancing, upgrades, rollbacks and NameNode restarts -- Google Cloud Storage just works. Your data is safe and consistent with no extra effort.
Interoperability:
By keeping your data in Google Cloud Storage, you can benefit from all of the other Google services that already play nicely together.
Performance:
Google’s infrastructure delivers high performance from Google Cloud Storage that’s comparable to HDFS -- without the overhead and maintenance.
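To make the Compatibility point concrete: the connector addresses data with the gs:// scheme, so a job's input and output paths change from an HDFS URI to a Cloud Storage URI while the job code stays the same. The snippet below is only a sketch of that path change. The helper name to_gcs_uri, the bucket example-bucket, and the sample paths are hypothetical and not part of the connector.

```python
from urllib.parse import urlsplit


def to_gcs_uri(hdfs_uri: str, bucket: str) -> str:
    """Rewrite an hdfs:// URI as a gs:// URI in `bucket`, keeping the path.

    Hypothetical helper for illustration; the connector itself only needs
    the job to be given gs:// paths.
    """
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {hdfs_uri!r}")
    # The NameNode host:port is dropped; the bucket name takes its place.
    return f"gs://{bucket}{parts.path}"


if __name__ == "__main__":
    # Assumed example paths, not taken from the post.
    print(to_gcs_uri("hdfs://namenode:8020/logs/2014/01/part-00000", "example-bucket"))
```

A job that previously read from hdfs://namenode:8020/logs/ would, under these assumptions, be pointed at gs://example-bucket/logs/ instead.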
To see the benefits for yourself, give Hadoop on Google Cloud Platform a try by following the simple tutorial.
We would love to hear your feedback and ideas on how to make Hadoop and MapReduce run even better on Google Cloud Platform.
-Posted by Jonathan Bingham, Product Manager