Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Announcing General Availability of Google Cloud Dataflow and Cloud Pub/Sub
August 12, 2015
By the time you are done reading this blog post, Google Cloud Platform customers will have processed hundreds of millions of messages and analyzed thousands of terabytes of data utilizing Cloud Dataflow, Cloud Pub/Sub, and BigQuery. These fully-managed services remove the operational burden found in traditional data processing systems. They enable you to build applications on a platform that can scale with the growth of your business and drive down data processing latency, all while processing your data efficiently and reliably.
Every day, customers use Google Cloud Platform to execute business-critical big data processing workloads, including financial fraud detection, genomics analysis, inventory management, click-stream analysis, A/B user interaction testing, and cloud-scale ETL.
Today we are removing our “beta” label and making Cloud Dataflow generally available. Cloud Dataflow is specifically designed to remove the complexity of developing separate systems for batch and streaming data sources by providing a unified programming model. Based on more than a decade of Google innovation, including MapReduce, FlumeJava, and MillWheel, Cloud Dataflow is built to free you from the operational overhead related to large-scale cluster management and optimization.
Cloud Dataflow provides a unified computation model for batch and streaming processing
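To make the idea of a unified model concrete, here is a minimal plain-Python sketch. It does not use the Cloud Dataflow SDK, and the names `count_clicks` and `stream_source` are hypothetical. It only illustrates how one piece of processing logic can be written once and applied to both a bounded (batch) input and an unbounded (streaming) input.

```python
import itertools
from typing import Iterable, Iterator, Tuple


def count_clicks(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a running (user, click_count) pair for every event seen.

    The function only iterates over its input, so it behaves the same
    for a finite list (batch) and an endless generator (streaming).
    """
    counts = {}
    for user in events:
        counts[user] = counts.get(user, 0) + 1
        yield user, counts[user]


def stream_source() -> Iterator[str]:
    """Hypothetical unbounded source: cycles through users forever."""
    return itertools.cycle(["ann", "bob", "ann"])


# Batch: a bounded, in-memory collection.
batch_result = list(count_clicks(["ann", "bob", "ann"]))

# Streaming: the same logic over an unbounded source.
# islice takes only the first three outputs so the example terminates.
stream_result = list(itertools.islice(count_clicks(stream_source()), 3))
```

In this toy version the two results are identical because the processing function never needs to know whether its input ends. A real service additionally has to handle distribution, fault tolerance, and out-of-order data, none of which is modeled here.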
With Cloud Dataflow GA you get:
A fully managed, fault tolerant, highly available, SLA-backed service for batch and stream processing.
"We are utilizing Cloud Dataflow to overcome elasticity challenges with our current Hadoop cluster. Starting with some basic ETL workflow for BigQuery ingestion, we transitioned into full blown clickstream processing and analysis. This has helped us significantly improve performance of our overall system and reduce cost."
Sudhir Hasbe, Director of Software Engineering, Zulily.com
“The current iteration of Qubit’s real-time data supply chain was heavily inspired by the ground-breaking stream processing concepts described in Google’s MillWheel paper. Today we are happy to come full circle and build streaming pipelines on top of Cloud Dataflow - which has delivered on the promise of a highly-available and fault-tolerant data processing system with an incredibly powerful and expressive API.”
Jibran Saithi, Lead Architect, Qubit
A comprehensive model for balancing correctness, latency, and cost when dealing with unordered data at massive scale. These concepts power key elements of the Cloud Dataflow programming model.
"Streaming Google Cloud Dataflow perfectly fits requirements of time series analytics platform at Wix.com, in particular, its scalability, low latency data processing and fault-tolerant computing. Wide range of data collection transformations and grouping operations allow to implement complex stream data processing algorithms."
Gregory Bondar, Ph.D., Sr. Director of Data Services Platform, Wix.com
Great performance. Cloud Dataflow is 2-3x faster and cheaper than Hadoop when evaluating classic MapReduce-based pipelines, such as PageRank and WordCount. And with dynamic work rebalancing, Cloud Dataflow effectively optimizes resource utilization, which provides additional performance gains without requiring manual intervention.
An extensible SDK. We have expanded our technology partner, 3rd party connector, and service provider integration efforts, including Tamr, Salesforce, ClearStory, springML, Cloudera, and data Artisans. We also continue to support alternate runner enablement for Apache Spark and Apache Flink.
"We're excited to collaborate with Google Cloud Platform on integrations with Salesforce Wave. The integrations with Google Cloud Dataflow further enable Wave to deliver insights to business users. Businesses can now use vast, diverse datasets like machine-generated data to derive customer insights in near-real-time."
Olivier Pin, VP of Product Management, Wave Analytics, Salesforce.com
"Tamr and Google Cloud Dataflow are simplifying how people access and use crucial data and distributed computing assets in the enterprise. The combination of Cloud Dataflow and Tamr running on Google Cloud Platform enables organizations to connect and enrich their enterprise data at internet scale."
Andy Palmer, co-founder and CEO of Tamr, Inc.
Cloud Dataflow seamlessly integrates with Google Cloud Platform, third party services & data stores
Native Google Cloud Platform integration for Cloud Storage, Cloud Datastore, BigQuery, and Cloud Pub/Sub. You now get full query support for our BigQuery source. Our integration with Cloud Pub/Sub now provides source timestamp processing in addition to arrival time processing. Source timestamps, when combined with flexible Windowing and Triggering primitives, enable developers to produce more accurate windows of data output.
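The difference between source timestamps and arrival times can be sketched in plain Python. This is an illustration only, not the Cloud Dataflow windowing API: the `Message` type, its field names, and `window_counts` are assumptions made for the example, and triggers are not modeled at all. It shows how a message delayed in transit lands in a different fixed window depending on which timestamp is used.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Message(NamedTuple):
    payload: str
    source_ts: int   # seconds; when the event happened at the source
    arrival_ts: int  # seconds; when the message reached the pipeline


def window_counts(
    messages: Iterable[Message], size: int, use_source_ts: bool
) -> Dict[int, int]:
    """Count messages per fixed window, keyed by window start time."""
    counts: Dict[int, int] = defaultdict(int)
    for m in messages:
        ts = m.source_ts if use_source_ts else m.arrival_ts
        # Round the timestamp down to the start of its window.
        counts[ts - ts % size] += 1
    return dict(counts)


messages = [
    Message("a", source_ts=5, arrival_ts=6),
    Message("b", source_ts=58, arrival_ts=61),  # delayed across a window boundary
    Message("c", source_ts=62, arrival_ts=63),
]

by_source = window_counts(messages, size=60, use_source_ts=True)
by_arrival = window_counts(messages, size=60, use_source_ts=False)
```

With 60-second windows, message "b" is counted in the first window when grouped by source timestamp but in the second when grouped by arrival time, which is the kind of inaccuracy source timestamp processing is meant to avoid.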
"We are very excited about the productivity benefits offered by Cloud Dataflow and Cloud Pub/Sub. It took half a day to rewrite something that had previously taken over six months to build using Spark"
Paul Clarke, Director of Technology, Ocado
A decade of internal innovation also stands behind today’s general availability of Google Cloud Pub/Sub. Delivering over a trillion messages for our alpha and beta customers has helped tune our performance, refine our v1 API, and ensure a stable foundation for Cloud Dataflow’s streaming ingestion, Cloud Logging’s streaming export, Gmail’s Push API, and Cloud Platform customers streaming their own production workloads, at rates up to 1 million message operations per second.
Such diverse scenarios demonstrate how Cloud Pub/Sub is designed to deliver real-time and reliable messaging — in one global, managed service that helps you create simpler, more robust, and more flexible applications.
Cloud Pub/Sub connects your services to each other, to other Google APIs, and third parties.
Cloud Pub/Sub can help integrate applications and services reliably, as well as analyze big data streams in real-time. Traditional approaches require separate queueing, notification, and logging systems, each with their own APIs and tradeoffs between durability, availability, and scalability. Cloud Pub/Sub addresses a broad range of scenarios with a single API, a managed service that eliminates those tradeoffs, and remains cost-effective as you grow, with pricing as low as 5¢ per million message operations for sustained usage.
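As a rough back-of-the-envelope aid, the quoted rate can be turned into a cost estimate. The sketch below assumes a flat 5¢ per million message operations, which is only the lowest rate mentioned in this post; actual pricing may be tiered, so treat the result as a lower bound. The workload figure and the function name `estimated_cost_usd` are hypothetical.

```python
from decimal import Decimal

# Lowest rate quoted in this post; real pricing may be tiered (assumption).
USD_PER_MILLION_OPS = Decimal("0.05")


def estimated_cost_usd(message_operations: int) -> Decimal:
    """Estimated cost in USD at the flat rate above."""
    return USD_PER_MILLION_OPS * Decimal(message_operations) / Decimal(1_000_000)


# Hypothetical workload: 2,000 message operations per second
# sustained over a 30-day month.
ops_per_month = 2_000 * 60 * 60 * 24 * 30
monthly_cost = estimated_cost_usd(ops_per_month)
```

For this hypothetical workload, about 5.2 billion operations per month, the estimate comes to $259.20 at the flat rate.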
General availability is a key milestone, though hardly the end of the road.
We are continuing to innovate with the alpha release of the gcloud pubsub tool and today’s beta release of our new Identity and Access Management (IAM) APIs and Permissions Editor in the Google Developers Console. These improvements allow users to control access down to the level of particular operations on specific topics and subscriptions. IAM ACLs make it easier to connect multiple Cloud Platform projects, either within the same organization or to third-party services.
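The idea of granting a particular operation on a specific topic or subscription can be pictured with a toy model. This is not the IAM API: the resource paths, operation names, member addresses, and the `is_allowed` helper are all illustrative assumptions.

```python
from typing import Dict, Set, Tuple

# Toy policy: (resource, operation) -> members allowed to perform it.
# Resource paths, operation names, and members are illustrative only.
Policy = Dict[Tuple[str, str], Set[str]]

policy: Policy = {
    ("topics/orders", "publish"): {"service-a@example.com"},
    ("subscriptions/orders-audit", "consume"): {"partner@example.org"},
}


def is_allowed(policy: Policy, member: str, resource: str, operation: str) -> bool:
    """Return True only if the member is granted this operation on this resource."""
    return member in policy.get((resource, operation), set())


# A publisher in one project may publish to the topic...
can_publish = is_allowed(policy, "service-a@example.com", "topics/orders", "publish")

# ...but has no rights on a subscription it was never granted.
can_consume = is_allowed(
    policy, "service-a@example.com", "subscriptions/orders-audit", "consume"
)
```

Because grants are keyed by both resource and operation, a third party can be given consume rights on one subscription without gaining any access to the topic behind it.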
Get Started
We’re looking forward to this next step for Google Cloud Platform as we continue to help developers and businesses everywhere benefit from Google’s technical and operational expertise in big data. Please visit Cloud Dataflow and Cloud Pub/Sub to learn more and contact us with your feedback, ideas for new connectors, or even new public data feeds we can help you share.
- Posted by Eric Schmidt (not that Eric), PM Cloud Dataflow & Rohit Khare, PM Cloud Pub/Sub