Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
BigQuery in Practice - Loading Data Sets that are Terabytes and Beyond
January 31, 2014
We all know the story of David and Goliath. But did you know that King Saul prepared David for battle by fully arming him? He put a coat of armor and a bronze helmet on him, and gave David his sword. David tried walking around in them but they didn't feel right to him. In the end, he decided to carry five small stones and a sling instead. These were the tools that he used to fight off lions as a shepherd boy. We know what the outcome was. David showed us that picking the right tools and using them well is one of the keys to success.
Let's suppose you are tasked with starting a Big Data project. You decide to use Google BigQuery because:
Its hosting model allows you to quickly run your data analysis without having to set up a costly computing infrastructure.
The interactive speed allows your analysts to quickly validate hypotheses about their insights.
To get started, though, your Goliath is loading multiple terabytes of data into BigQuery. The technical article, BigQuery in practice - Loading Data Sets that are Terabytes and Beyond, is intended for IT professionals and data architects who are planning to deploy large data sets to Google BigQuery. When dealing with multiple terabytes to petabytes of data, managing the data processing (uploading, failure recovery, and cost and quota management) becomes paramount.
Just as David showed us the importance of using the right tools effectively, the paper presents various options and considerations to help you decide on the optimal solution. It follows the common ingestion workflow depicted in the following diagram and discusses the tools that you can use during each stage - from uploading the data to Google Cloud Storage, to running your Extract, Transform, and Load (ETL) pipelines, to loading the data into BigQuery.
Scenarios for data ingestion into BigQuery
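As a rough illustration of the three stages above and of the failure recovery concern, here is a minimal Python sketch. It is not taken from the paper: the stage functions are hypothetical placeholders that only return strings (no Google Cloud API is called), and the retry policy (three attempts with exponential backoff) is an assumption chosen for the example.

```python
import time
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], str]]


def run_with_retry(step: Callable[[], str], max_attempts: int = 3,
                   base_delay: float = 0.0) -> str:
    """Run one stage, retrying with exponential backoff if it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run the stages in order; a stage starts only after the previous one succeeds."""
    return [f"{name}: {run_with_retry(step)}" for name, step in stages]


def build_stages() -> List[Stage]:
    """Build hypothetical placeholder stages; real ones would call your own tooling."""
    etl_calls = {"count": 0}

    def upload_to_cloud_storage() -> str:
        return "staged in Cloud Storage"

    def run_etl() -> str:
        # Simulate one transient failure so the retry path is exercised.
        etl_calls["count"] += 1
        if etl_calls["count"] == 1:
            raise RuntimeError("simulated transient failure")
        return "transformed"

    def load_into_bigquery() -> str:
        return "loaded into BigQuery"

    return [
        ("upload", upload_to_cloud_storage),
        ("etl", run_etl),
        ("load", load_into_bigquery),
    ]


if __name__ == "__main__":
    for line in run_pipeline(build_stages()):
        print(line)
```

Keeping each stage behind a plain callable means the same retry wrapper applies whether a stage is an upload, an ETL job, or a load job. In a real deployment you would likely set a non-zero `base_delay` and retry only on errors known to be transient.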
When dealing with large data sets, the correct implementation can save hours or days, while an improper design may mean weeks of rework. David was so successful that King Saul gave him a high rank in the army. Similarly, we are here to help you use Google Cloud Platform successfully so your Big Data project will achieve the same level of success.
- Posted by Wally Yau, Cloud Solutions Architect