Zynga is a mobile game publisher with studios located across the globe. We have a diverse portfolio of games, and our data scientists need to provide actionable insights across diverse event taxonomies. One challenge we face is building data products that scale across our catalog of titles while minimizing the domain knowledge needed to build predictive models. Our approach is to use automated feature engineering for a variety of use cases, ranging from anomaly detection to propensity modeling to clustering.
This session will discuss how Zynga is leveraging recent machine learning libraries to take data products that scale across all of our titles from prototype to production. We are using Python libraries to translate our datasets from narrow and deep to shallow and wide representations, which empowers our data scientists to apply supervised and unsupervised learning methods. We’ll discuss how we’ve scaled up these libraries to work with massive datasets on PySpark using Pandas UDFs, and how we’ve made feature generation an accessible tool for our analytics organization. The result of this infrastructure is that our data scientists are spending less time developing models and more time using predictive models to improve our games and live services.
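As a rough illustration of the narrow-to-wide translation described above, here is a minimal pandas sketch. The event log, its column names (`player_id`, `event_type`, `value`), and the `to_wide` helper are hypothetical and are not taken from Zynga's pipeline. The hand-written aggregations only stand in for the features an automated feature engineering library would generate.

```python
import pandas as pd

# Hypothetical narrow-and-deep event log: one row per tracked event.
events = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2],
    "event_type": ["login", "purchase", "login", "login", "level_up"],
    "value": [0.0, 4.99, 0.0, 0.0, 1.0],
})


def to_wide(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-event rows into one feature row per player."""
    # Count of each event type per player, one column per event type.
    counts = pd.crosstab(events["player_id"], events["event_type"]).add_prefix("count_")
    # Simple numeric aggregates of the value column per player.
    stats = (
        events.groupby("player_id")["value"]
        .agg(["sum", "mean", "max"])
        .add_prefix("value_")
    )
    # Both frames are indexed by player_id, so the join aligns on player.
    return counts.join(stats).reset_index()


# Shallow-and-wide result: one row per player, one column per feature.
features = to_wide(events)
print(features)
```

In PySpark, a function of this shape could in principle be applied to each group of players with `groupBy(...).applyInPandas(...)`, one form of Pandas UDF. Whether the session uses that exact API is an assumption.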
We’ll provide a deep dive into our modeling infrastructure and show how we translate our tracking data into a representation that powers segmentation, propensity, and anomaly modeling. Along the way, we’ll call out pitfalls that we encountered, issues we faced when scaling up these methods, and some takeaways that your organization can apply when leveraging automated feature engineering.
Distinguished Data Scientist
12:10PM - Day 2