Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
We think Germany will win. But don’t take our word for it...
July 11, 2014
We’ve had a great time giving you our predictions for the World Cup (check out our posts before the quarter-finals and semi-finals). So far, we’ve gotten 13 of 14 games correct. But this isn't about us picking winners in World Cup soccer; it’s about what you can do with Google Cloud Platform. Now, we are open-sourcing our prediction model and packaging it up so you can do your own analysis and predictions.
We used Google Cloud Dataflow to ingest raw, touch-by-touch gameplay data from Opta for thousands of soccer matches. The data goes back to the 2006 World Cup and includes three years of English Barclays Premier League, two seasons of Spanish La Liga, and two seasons of U.S. MLS. We then polished the raw data into predictive statistics using Google BigQuery.
You can see BigQuery engineer Jordan Tigani (+JordanTigani) and developer advocate Felipe Hoffa (@felipehoffa) talk about how we did it in this video from Google I/O.
Our prediction for the final
It’s a narrow call, but Germany has the edge: our model gives them a 55% chance of defeating Argentina due to a number of factors. Thus far in the tournament, they’ve had better passing in the attacking half of their field, a higher number of shots (64 vs. 61) and a higher number of goals scored (17 vs. 8).
But, 55% is only a small edge. And, although we've been trumpeting our 13 of 14 record, picking winners isn't exactly the same as predicting outcomes. If you'd asked us which scenario was more likely, a 7 to 1 win for Germany against Brazil or a 0 to 1 defeat of Germany by Brazil, we wouldn't have gotten that one quite right.
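One way to see the gap between picking winners and predicting outcomes is to score forecasts two ways: by hit rate and by how good the stated probabilities were. The sketch below uses the Brier score for the second measure. This is our choice for illustration, not something the model is known to report, and the forecast numbers are hypothetical rather than real model output.

```python
def accuracy(probs, outcomes):
    """Fraction of games where the side given more than 50% actually won."""
    hits = sum((p > 0.5) == bool(o) for p, o in zip(probs, outcomes))
    return hits / len(probs)


def brier_score(probs, outcomes):
    """Mean squared gap between forecast probability and result (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical forecasts for four games (1 = the first-listed team won).
outcomes = [1, 1, 1, 0]
confident = [0.90, 0.90, 0.90, 0.10]  # strong, correct calls
hedged = [0.55, 0.55, 0.55, 0.45]     # narrow, correct calls

for name, probs in [("confident", confident), ("hedged", hedged)]:
    print(name, accuracy(probs, outcomes), round(brier_score(probs, outcomes), 4))
```

Both sets of forecasts pick every winner, yet they differ in how much they committed to each call, which is the part a win-loss record hides.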
(Oh, and we think Brazil has a tiny advantage in the third place game. They may have had a disappointing defeat on Tuesday, but the numbers still look good.)
But don’t take our word for it...
Now it’s your turn to take a stab at predicting. We have provided an IPython notebook that shows exactly how we built our model and used it to predict matches. We had to aggregate the data that we used, so you can't compute additional statistics from the raw data. However, for the real data geeks, you could try to see how well neural networks can predict the same data or try advanced techniques like principal components analysis. Alternatively, you can try adding your own features like player salaries or team travel distance. We've only scratched the surface, and there are lots of other approaches you can take.
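To give a feel for what "building a model" from match statistics can look like, here is a minimal sketch of a logistic regression that turns per-match feature differences into a win probability. It assumes a logistic model and is not a copy of the notebook, which remains the authority on what we actually did. The feature names (`pass_diff`, `shot_diff`), the coefficients used to generate the data, and the data itself are all made up. The notebook uses pandas and statsmodels; this version sticks to the standard library so it runs anywhere.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Win probability for feature vector x; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit intercept and weights by batch gradient descent on the log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = predict_proba(weights, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Synthetic matches: each row is (team A minus team B) on two made-up stats.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    pass_diff = rng.gauss(0, 1)  # hypothetical attacking-half passing edge
    shot_diff = rng.gauss(0, 1)  # hypothetical shots edge
    p_win = sigmoid(1.0 * pass_diff + 0.5 * shot_diff)
    X.append([pass_diff, shot_diff])
    y.append(1 if rng.random() < p_win else 0)

weights = fit_logistic(X, y)
print("fitted weights:", [round(w, 2) for w in weights])
print("P(win) with a modest edge:", round(predict_proba(weights, [0.5, 0.2]), 2))
```

Adding your own feature, such as travel distance, would amount to adding another column to each row of `X`.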
You might also try simulating how the USA would have done if they had beaten Belgium. Or how Germany in 2014 would fare against the unstoppable Spanish team of 2010. Or you could figure out whether the USA team is getting better by simulating the 2006 team against the 2010 and 2014 teams.
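A what-if like these can be sketched as a small Monte Carlo simulation once you have a win probability for each hypothetical match. The per-round probabilities below are placeholders we invented for illustration; in practice you would take them from a fitted model. The function name `simulate_run` is ours as well.

```python
import random
from collections import Counter


def simulate_run(round_win_probs, n_trials=20000, seed=7):
    """Simulate a knockout run; return the share of trials ending after k wins."""
    rng = random.Random(seed)
    rounds_won = Counter()
    for _ in range(n_trials):
        wins = 0
        for p in round_win_probs:
            if rng.random() >= p:
                break  # eliminated in this round
            wins += 1
        rounds_won[wins] += 1
    return {k: rounds_won[k] / n_trials for k in range(len(round_win_probs) + 1)}


# Hypothetical win probabilities for a quarter-final, semi-final, and final.
hypothetical_probs = [0.40, 0.35, 0.45]
result = simulate_run(hypothetical_probs)
for wins, share in result.items():
    print(f"won {wins} round(s): {share:.3f}")
```

The share of trials with three wins should land near the product of the three probabilities, which is a handy sanity check on the simulation.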
Here’s how you can do it
We’ve put everything on GitHub. You’ll find the IPython notebook containing all of the code (using pandas and statsmodels) to build the same machine learning models that we've used to predict the games so far. We've packaged it all up in a Docker container so that you can run your own Google Compute Engine instance to crunch the data. For the most up-to-date step-by-step instructions, check out the readme on GitHub.
-Posted by Benjamin Bechtolsheim, Product Marketing Manager