Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
SwiftIQ leverages Google Cloud Connectors to unleash Hadoop ecosystem
April 3, 2014
Our guest post today comes from Massimo Ilario, co-founder and principal engineer of SwiftIQ, a cloud-based API infrastructure to facilitate data accessibility and adaptive machine learning predictions.
At SwiftIQ, we unify and analyze vast amounts of disparate data and apply scalable algorithms to extract insights for our customers, enabling smarter, real-time decisions. For instance, we help many supermarkets collect and analyze customer transaction data to predict in-store shopper engagement, which helps retailers plan floor layouts, optimize promotions and recommend product upsells to their customers. Historically, most supermarkets could not even store detailed in-store basket information because of its sheer size.
Swift Predictions is our machine learning environment that makes predictions based on vast amounts of data. As we thought about scaling Swift Predictions, we knew we needed substantial infrastructure with reliable file storage, long-running processes and the ability to handle unpredictable scale. Google Cloud Platform met all our requirements for fast, powerful, cost-efficient cloud infrastructure. We started with Google App Engine for development, and we also use Google Compute Engine, Google Cloud Storage, Google BigQuery and Google Cloud Datastore.
One of our more data-intensive algorithms, frequent pattern mining (FPM), analyzes order combinations and returns the most frequent ones. For a supermarket with 50,000 items typically available in store, the FPM algorithm commonly creates a million rows of unique order combinations. Once processed, these results span tens of millions of records and are stored in BigQuery, which allows for rapid retrieval. Our infrastructure can then scale these models across dozens or hundreds of stores using Cloud Platform.
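To make the idea of counting order combinations concrete, here is a minimal single-machine sketch in Python. It is not our production Hadoop implementation; the function name `top_item_combinations`, its parameters and the sample orders are all hypothetical, and it simply counts item pairs by brute force rather than using an optimized FPM algorithm.

```python
from collections import Counter
from itertools import combinations


def top_item_combinations(orders, size=2, min_support=2, top_n=3):
    """Count how often each item combination appears across orders.

    Illustrative brute-force counting only; names and defaults are
    assumptions made for this sketch.
    """
    counts = Counter()
    for order in orders:
        # De-duplicate and sort so each combination has one canonical form.
        items = sorted(set(order))
        for combo in combinations(items, size):
            counts[combo] += 1
    # Keep only combinations seen at least `min_support` times.
    frequent = [(combo, n) for combo, n in counts.items() if n >= min_support]
    # Most frequent first; ties broken alphabetically for stable output.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent[:top_n]


# Hypothetical sample baskets.
orders = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]
result = top_item_combinations(orders)
print(result)
```

With these four sample baskets, bread and milk appear together three times and eggs and milk twice. The number of candidate combinations grows quickly with catalog size, which is why this kind of workload is distributed at store scale.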
Swift Predictions runs on Apache Hadoop, where we previously had to write and tune MapReduce jobs just to shuffle data between App Engine and Compute Engine. Google’s recent introduction of the Google Cloud Storage Connector for Hadoop has removed this obstacle. We can now freely move data between our Web-facing project and our long-running backend processes. As developers, this higher level of interoperability gives us the flexibility to pick the best tool to house our results from Hadoop: Cloud Storage or BigQuery.
One of our product features, a data mining algorithm for analyzing shopper buying patterns, creates a considerable amount of input data in a sparse matrix representation. Our input data can only be represented as large files split across the Hadoop Distributed File System (HDFS), and our output data is best represented as results within BigQuery. The Cloud Storage Connector for Hadoop has made this workflow more manageable, since our Hadoop workflow communicates with Cloud Storage and BigQuery as if they were HDFS or any other native Hadoop InputFormat and OutputFormat.
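As a rough illustration of what a sparse representation of basket data can look like, the Python sketch below turns orders into rows that list only the column indices of the items present. The function `to_sparse_rows`, the tab-separated line layout and the sample data are assumptions made for this example, not our actual file format.

```python
def to_sparse_rows(orders):
    """Map each order to the sorted column indices of the items it contains.

    Hypothetical helper: only non-zero entries are stored, which is what
    keeps a mostly empty order-by-item matrix manageable.
    """
    item_index = {}  # item name -> column index, assigned on first sight
    rows = []
    for order_id, order in enumerate(orders):
        cols = set()
        for item in order:
            if item not in item_index:
                item_index[item] = len(item_index)
            cols.add(item_index[item])
        rows.append((order_id, sorted(cols)))
    return item_index, rows


# Hypothetical sample baskets.
item_index, rows = to_sparse_rows([["bread", "milk"], ["milk", "eggs"]])

# One text line per order, a layout that splits cleanly across input files.
lines = ["%d\t%s" % (order_id, ",".join(map(str, cols)))
         for order_id, cols in rows]
print(lines)
```

Because each order is a self-contained line, a file of such rows can be split at line boundaries and processed in parallel.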
App Engine modules are ideal for packaging and isolating functionality in our system. We dedicate our default module to serving the web application and develop secondary modules to handle specific tasks that may be longer running or more resource-intensive. This has allowed us to decouple much of our source code from the main web application and simplify long-term maintenance. Code changes to an isolated module need not trigger a more comprehensive QA cycle for a deployment.
Thanks to Cloud Platform, our development team spends minimal time on the performance tuning and reliability of our machine learning environment, which allows us to focus on unlocking and delivering new insights for our customers.
-Contributed by Massimo Ilario, co-founder and Principal Engineer, SwiftIQ.