Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Big data is easier than ever with Google Cloud Dataflow
April 16, 2015
Big data applications can provide extremely valuable insights, but extracting that value often carries high overhead: significant deployment, tuning, and operational effort, plus a mix of diverse systems and programming models. As a result, work other than the actual programming and data analysis dominates the time needed to build and maintain a big data application. The industry has come to accept these pains and inefficiencies as an unavoidable cost of doing business. We believe you deserve better.
In Google’s systems infrastructure team, we’ve been tackling challenging big data problems for more than a decade and are well aware of the difference that simple yet powerful data processing tools make. We have translated our experience from MapReduce, FlumeJava, and MillWheel into a single product, Google Cloud Dataflow. It's designed to reduce operational overhead and make programming and data analysis your only job, whether you’re a data scientist, data analyst or data-centric software developer. Along with other Google Cloud Platform big data services, Cloud Dataflow embodies the kind of highly productive and fully managed services designed to use big data, the cloud way.
Today we’re pleased to make Google Cloud Dataflow available in beta, for use by anyone on Google Cloud Platform. With Cloud Dataflow, you can:
Merge your batch and stream processing pipelines thanks to a unified and convenient programming model. The model and the underlying managed service let you easily express data processing pipelines, make powerful decisions, obtain insights and eliminate the switching cost between batch and continuous stream processing.
Finely tune the desired correctness model for your data processing needs through powerful API primitives for handling late arriving data. You can process data based on event time as well as clock time and gracefully deal with upstream data latency when processing data from unbounded sources.
Leverage a fully-managed service, complete with dynamically adaptive auto-scaling and auto-tuning, that offers attractive performance out of the box. Whether you’re a developer or systems operator, you no longer need to invest time worrying about resource provisioning or attempting to optimize resource usage. Automation, a fully managed service, and the programming model work together to significantly lower both CAPEX and OPEX.
Enjoy reduced complexity of managing and debugging highly parallelized processes with a simplified monitoring interface that’s logically mapped to your processing logic as opposed to how your code’s mapped to the underlying execution plane.
Benefit from integrated processing of data across the Google Cloud Platform with optimized support for services such as Google Cloud Storage, Google Cloud Datastore, Google Cloud Pub/Sub, and Google BigQuery.
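To make the event-time and late-data ideas above a little more concrete, here is a minimal sketch in plain Python. It does not use the Cloud Dataflow SDK, and every name in it (`Event`, `windowed_sums`, `window_size`, `allowed_lateness`) is hypothetical and chosen only for illustration. It assumes each record carries both an event time and an arrival time, and it treats "arrival time past the window's end plus an allowed lateness" as a simplified stand-in for how a real system might decide that data is too late.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (event_time_s, arrival_time_s, value).
Event = Tuple[int, int, int]


def windowed_sums(
    events: Iterable[Event], window_size: int, allowed_lateness: int
) -> Tuple[Dict[int, int], List[Event]]:
    """Sum values per fixed event-time window and collect too-late events.

    Windows are keyed by their start time. An event is accepted when it
    arrives no later than its window's end plus allowed_lateness; this is
    an illustrative rule, not the behavior of any particular service.
    """
    sums: Dict[int, int] = defaultdict(int)
    dropped: List[Event] = []
    for event_time, arrival_time, value in events:
        # Assign the event to a window by when it happened, not when it arrived.
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if arrival_time <= window_end + allowed_lateness:
            sums[window_start] += value
        else:
            dropped.append((event_time, arrival_time, value))
    return dict(sums), dropped


# Made-up sample data: the third event arrives out of order but within the
# allowed lateness; the fourth arrives far too late for its window.
sample_events: List[Event] = [(5, 6, 1), (12, 13, 2), (8, 14, 3), (3, 40, 4)]

# The same function accepts a bounded list ("batch") ...
batch_sums, batch_dropped = windowed_sums(sample_events, 10, 5)

# ... or any iterator that yields events one at a time ("stream").
stream_sums, stream_dropped = windowed_sums(iter(sample_events), 10, 5)
```

Because the function only iterates over its input, the bounded list and the one-at-a-time iterator go through identical logic, which is the spirit of merging batch and stream pipelines under one model. A production system would additionally need to emit results incrementally and track progress across many workers, which this sketch leaves out.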
We’re also working with major open source contributors on maturing the Cloud Dataflow ecosystem. For example, we recently announced collaborations with Data Artisans for runtime support for Apache Flink and with Cloudera for runtime support for Apache Spark.
We’d like to thank our alpha users for their numerous suggestions, reports and support along this journey. Their input has certainly made Cloud Dataflow a better product. Now, during beta, everyone can use Cloud Dataflow and we continue to welcome questions and feedback on Stack Overflow. We hope that you’ll give Google Cloud Dataflow a try and enjoy big data made easy.
-Posted by Grzegorz Czajkowski, Director of Engineering