Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
What Lipstick can reveal about your Hadoop pipeline
March 10, 2014
It would be a strong statement to say that lipstick can change your life, but when it comes to supporting your data analysts who use Apache Pig, we think that Netflix Lipstick can make a big difference.
For some of us, when we see sights like this:
we get a little rush of adrenaline and know we are in the zone to debug, analyze, and optimize.
But for most of us, a little graphical visualization is in order. Lipstick makes it easier for everyone to understand our Pig data flows, spot what's inefficient, and fix what is just plain incorrect.
Get a high-level view of the sequence of Hadoop jobs your Pig script launches, and watch the data flow through them in real time:
Pop up a sample of the output from any stage of the pipeline:
With these views, data analysts can quickly spot mistakes and inefficiencies in their Pig jobs. A common finding is that rows or columns are filtered out much later in the pipeline than necessary. Eliminating that data earlier makes Pig jobs more efficient, which reduces both time to completion and cost.
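To make the "filter earlier" point concrete, here is a minimal Python sketch (not Pig Latin, and not anything Lipstick itself runs) that computes the same answer two ways: joining first and filtering afterwards, versus dropping unneeded rows and columns before the join. The sample tables, their column layouts, and the `hash_join` helper are hypothetical, and the count of fields fed into the join is only a rough stand-in for the data a real Hadoop job would shuffle between stages.

```python
from collections import defaultdict

# Hypothetical sample tables; the column layouts are for illustration only.
# USERS rows: (user_id, country, signup_year, bio)
USERS = [
    (1, "US", 2010, "long free-text bio ..."),
    (2, "CA", 2011, "long free-text bio ..."),
    (3, "US", 2012, "long free-text bio ..."),
    (4, "MX", 2013, "long free-text bio ..."),
]
# VIEWS rows: (user_id, title_id, minutes_watched, device)
VIEWS = [
    (1, "t1", 30, "tv"),
    (2, "t1", 45, "phone"),
    (3, "t2", 10, "tv"),
    (4, "t3", 60, "tablet"),
    (1, "t2", 20, "phone"),
    (2, "t3", 5, "tv"),
]


def hash_join(left, right, left_key, right_key):
    """Inner-join two lists of tuples on the given key positions.

    Returns (joined_rows, fields_in), where fields_in counts every field fed
    into the join: a crude proxy for the data a MapReduce join would shuffle.
    """
    fields_in = sum(len(row) for row in left) + sum(len(row) for row in right)
    index = defaultdict(list)
    for row in left:
        index[row[left_key]].append(row)
    joined = [l_row + r_row for r_row in right for l_row in index[r_row[right_key]]]
    return joined, fields_in


def filter_late():
    # Join the full tables, then keep US users and project (user_id, title_id, minutes).
    joined, cost = hash_join(USERS, VIEWS, 0, 0)
    # Joined row layout: 4 USERS fields followed by 4 VIEWS fields.
    result = [(row[0], row[5], row[6]) for row in joined if row[1] == "US"]
    return sorted(result), cost


def filter_early():
    # Drop non-US users and every column the final result does not need
    # *before* the join, so less data enters it.
    us_users = [(uid,) for uid, country, _year, _bio in USERS if country == "US"]
    views = [(uid, title, minutes) for uid, title, minutes, _device in VIEWS]
    joined, cost = hash_join(us_users, views, 0, 0)
    # Joined row layout: (user_id,) followed by (user_id, title_id, minutes).
    result = [(row[0], row[2], row[3]) for row in joined]
    return sorted(result), cost


if __name__ == "__main__":
    late_rows, late_cost = filter_late()
    early_rows, early_cost = filter_early()
    assert late_rows == early_rows  # both plans return the same answer
    print("fields into join, filter late: ", late_cost)
    print("fields into join, filter early:", early_cost)
```

With this sample data, both plans return the same three rows, but the early plan feeds 20 fields into the join instead of 40. In a Pig script, the analogous change is moving `FILTER` and `FOREACH ... GENERATE` projections ahead of the `JOIN`, which is the kind of opportunity Lipstick's data-flow view helps reveal.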
I first saw Lipstick at a Netflix OSS Meetup and thought it was a great tool for boosting the productivity of data analysts and software engineers alike.
If you are running Pig jobs on Google Compute Engine, we've got instructions to help you run Netflix Lipstick on Google Compute Engine. If you are not yet using Hadoop on Google Compute Engine, we have resources to help you get started.
-Posted by Matt Bookman, Solutions Architect