Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Move your big data lightning fast and at a low cost!
May 12, 2015
All applications need at least some data to function. Big data applications like gaming analytics, weather modeling, and video rendering, as well as tools such as Flume, MapReduce, and database replication, are obvious examples of software that processes and moves large amounts of data. Even a seemingly simple website might have to copy dictionaries, articles, pictures, and all sorts of data across VMs, and that can add up to a lot. Sometimes that data must be accessible through a file system, and traditional tools like secure copy (scp) might not be enough to handle the increasing data sizes.
Big data applications commonly read data from disk, transform it, then use a tool like scp to move it to another VM for further computation. Scp is limited by several factors, from its threading model to the encryption hardware in the virtual machine’s CPUs, and is ultimately capped by the Persistent Disk read and write quota per virtual machine. It can transfer close to 128MBytes/sec with a single stream or 240MBytes/sec with multiple streams.
This is what the current flow looks like:
Diagram: a common data pipeline scenario
In this post we describe a new way of transferring large amounts of data between VMs. Google Compute Engine Persistent Disks offer a feature called Snapshots, which are point-in-time copies of all data on the disk. While snapshots are commonly used for backups, they can also be rapidly turned into new disks. These new disks can be attached to a running virtual machine other than the one where they were created, thereby moving the data from the first virtual machine to the second. Transferring data using a snapshot involves three simple steps:
Create a snapshot from the source disk.
Download the snapshot to a new destination disk.
Attach and mount the destination disk to a virtual machine.
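The three steps above can be sketched as a small Python helper that assembles the corresponding gcloud commands. This is a minimal sketch, not the exact procedure used for the measurements in this post: the resource names (src-disk, xfer-snap, dst-disk, vm-b) are hypothetical, and the flag spellings reflect our understanding of the gcloud CLI, so check them against the current documentation before running anything. The helper only builds and prints the commands; it does not execute them.

```python
import shlex


def snapshot_transfer_commands(source_disk, snapshot, dest_disk, dest_vm,
                               zone, disk_type="pd-ssd"):
    """Return gcloud commands (as argument lists) for the three transfer steps.

    All resource names passed in are placeholders chosen by the caller.
    """
    return [
        # Step 1: create a snapshot from the source disk.
        ["gcloud", "compute", "disks", "snapshot", source_disk,
         "--snapshot-names", snapshot, "--zone", zone],
        # Step 2: create a new destination disk from that snapshot.
        ["gcloud", "compute", "disks", "create", dest_disk,
         "--source-snapshot", snapshot, "--type", disk_type, "--zone", zone],
        # Step 3: attach the destination disk to the target virtual machine.
        ["gcloud", "compute", "instances", "attach-disk", dest_vm,
         "--disk", dest_disk, "--zone", zone],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in snapshot_transfer_commands(
            "src-disk", "xfer-snap", "dst-disk", "vm-b", "us-central1-f"):
        print(shlex.join(cmd))
```

Mounting happens inside the destination virtual machine after the disk is attached (for example with the guest operating system's mount command), so it is left out of this sketch.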
Using Persistent Disk Snapshots, you can move data between your virtual machines at speeds upwards of 1024MBytes/sec (8Gbps). That’s up to an 8x speed increase over single-stream scp! Below is a graph that compares moving data with secure copy and with snapshots.
Diagram: Data Transfer comparisons
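To make the comparison concrete, here is a rough back-of-the-envelope calculation using only the throughput figures quoted in this post. It is a sketch under one stated assumption: 1GB is treated as 1024MBytes. The 500GB payload is just an example size.

```python
# Throughput figures quoted in this post, in MBytes/sec.
SCP_SINGLE_STREAM = 128
SCP_MULTI_STREAM = 240
SNAPSHOT = 1024

MBYTES_PER_GB = 1024  # assumption: binary units


def transfer_seconds(size_gb, rate_mbytes_per_sec):
    """Time to move size_gb of data at a sustained rate."""
    return size_gb * MBYTES_PER_GB / rate_mbytes_per_sec


def speedup(fast_rate, slow_rate):
    """How many times faster fast_rate is than slow_rate."""
    return fast_rate / slow_rate


if __name__ == "__main__":
    size_gb = 500  # example payload
    for name, rate in [("scp, single stream", SCP_SINGLE_STREAM),
                       ("scp, multiple streams", SCP_MULTI_STREAM),
                       ("snapshot", SNAPSHOT)]:
        minutes = transfer_seconds(size_gb, rate) / 60
        print(f"{name}: {minutes:.1f} min for {size_gb}GB")
    print(f"vs single stream: {speedup(SNAPSHOT, SCP_SINGLE_STREAM):.1f}x")
    print(f"vs multiple streams: {speedup(SNAPSHOT, SCP_MULTI_STREAM):.1f}x")
```

Note that the 8x figure is measured against single-stream scp; against multi-stream scp the same numbers give a little over 4x.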
The huge advantage of the snapshot-based approach stems from the performance of Google Cloud Platform’s Persistent Disk snapshotting process. The following graph shows the time it takes to snapshot Persistent Disks of increasing size, along with the effective throughput (PD-SSD was used in this experiment). The time it takes to do the snapshot is roughly the same up to 500GB (bars in the graph) and steps up at the 1TB mark. Therefore, the effective throughput (i.e., “speed”) of the snapshot process, which is shown as the line in the graph, increases almost linearly.
Google Compute Engine Persistent Disk snapshot speed stands out in the industry. Below is a comparison graph with another cloud provider that also offers snapshots. As you can see, while Google Cloud Platform’s upload times remain flat as the size increases, our competitor’s upload time grows with the size.
Google Compute Engine tests were performed in us-central1-f using PD-SSD. Snapshot sizes are: 32GB, 100GB, 200GB, 500GB and 1000GB.
There is a cost of 2.6 cents/GB/month for taking a Persistent Disk snapshot, which might seem like a lot on top of the hourly virtual machine price for copying data. However, the actual average cost comes out to about $0.003 per 500GB of data transferred, because a snapshot used for transfer is short-lived (under 10 minutes) and its pricing is prorated at a granularity of seconds. You can delete the snapshot immediately after the transfer is complete. That means for less than a penny you can move a terabyte of data at 8x the speed of traditional tools.
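The prorated cost can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 30-day billing month and that the snapshot is billed for the full size of the data (an upper bound); both are our assumptions for illustration, not billing rules stated in this post.

```python
PRICE_PER_GB_MONTH = 0.026            # dollars, the 2.6 cents/GB/month quoted above
SECONDS_PER_MONTH = 30 * 24 * 3600    # assumption: 30-day month


def snapshot_cost(size_gb, lifetime_seconds):
    """Prorated cost in dollars of keeping a snapshot for lifetime_seconds."""
    monthly_cost = size_gb * PRICE_PER_GB_MONTH
    return monthly_cost * lifetime_seconds / SECONDS_PER_MONTH


if __name__ == "__main__":
    ten_minutes = 10 * 60
    print(f"500GB for 10 min:  ${snapshot_cost(500, ten_minutes):.4f}")
    print(f"1000GB for 10 min: ${snapshot_cost(1000, ten_minutes):.4f}")
```

Under these assumptions, 500GB held for the full 10 minutes comes to roughly $0.003, and a terabyte stays under one cent.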
For hands-on practice, you can find more about snapshot commands in our documentation, as well as a previous blog post about how to safely make a Persistent Disk Snapshot. Happy Snapshotting!
-Posted by Stanley Feng, Software Engineer, Cloud Performance Team