Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Input → Transform → Output → Done!
February 18, 2014
We have published a sample App Engine application to help you move your data from one place in the cloud to another, transforming it along the way. The Data Pipeline application includes samples to get you started quickly and produce powerful pipelines right out of the gate. It also has a simple API for extending its functionality.
Data Pipeline is a Python application that uses the Google App Engine Pipeline API to control complex data processing pipelines. Pipelines are built of stages that can be wired together to process large amounts of data, with work going on in parallel. The application comes with several sample stages that use many of the Cloud Platform services, and you can easily write new stages to perform custom data processing.
The Data Pipeline app comes with built-in functionality that lets you read data from:
- URLs via HTTP
- Google Cloud Datastore
- Google Cloud Storage

transform it on:
- Google App Engine, using the Google App Engine Pipeline API
- Google Compute Engine, using Apache Hadoop

and output it to:
- BigQuery
- Google Cloud Storage
For example, one of the pre-built dataflows takes a file from a Cloud Storage bucket, transforms it using a MapReduce job on Hadoop running on Compute Engine, and uploads the output file to BigQuery. To kick off the process, simply drop the file into Cloud Storage.
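A dataflow like the one above is described declaratively in the pipeline's JSON configuration. The exact schema is defined in the project's source on GitHub; the fragment below is only a hypothetical sketch of the idea, with stage names and field names invented here for illustration:

```json
{
  "inputs": [
    {"type": "GcsInput", "object": "gs://my-bucket/input.csv"}
  ],
  "transforms": [
    {"type": "HadoopMapReduce", "project": "my-project"}
  ],
  "outputs": [
    {"type": "BigQueryOutput", "dataset": "results", "table": "word_counts"}
  ]
}
```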
We hope that you will not only use the built-in transformations, but will also create custom stages to transform data in whatever way you need. You can customize the pipelines easily by extending the Python API, which is available on GitHub.
You can also customize the input and output; for example, you could customize the output to write to Google Cloud SQL.
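To make the stage idea concrete, here is a minimal conceptual sketch of what a custom output stage might look like. The class and method names are illustrative only; the real base classes and hooks are defined in the Data Pipeline source on GitHub. For illustration, this hypothetical Cloud SQL stage renders INSERT statements instead of executing them against a real database:

```python
# Conceptual sketch of a custom output stage -- names are invented here,
# not taken from the real Data Pipeline API.

class OutputStage:
    """Minimal stand-in for a pipeline output stage."""

    def run(self, records):
        raise NotImplementedError


class CloudSqlOutput(OutputStage):
    """Hypothetical stage that would write records to Google Cloud SQL.

    Instead of connecting to a database, it just builds the
    parameterized INSERT statements it would execute.
    """

    def __init__(self, table):
        self.table = table

    def run(self, records):
        statements = []
        for record in records:
            columns = ", ".join(record)             # dict keys as column names
            placeholders = ", ".join("%s" for _ in record)
            statements.append(
                "INSERT INTO %s (%s) VALUES (%s)"
                % (self.table, columns, placeholders)
            )
        return statements


stage = CloudSqlOutput("word_counts")
print(stage.run([{"word": "cloud", "count": 42}]))
```

A real stage would also need to handle connections, batching, and retries; the point here is only the shape: each stage is a small class with a single entry point that the pipeline framework wires together.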
You create and edit pipelines in a JSON configuration file in the application's UI. The app checks that the configuration is syntactically correct and that each stage's preconditions are met. After you save the config file, click the Run button to start the pipeline execution. You'll see the progress of the running pipeline in a new window.
Editing the config file
The source code is checked into GitHub. We invite you to download it and set up your pipelines today.
- Posted by Alex K, Cloud Solutions Engineer