Topics: Personalization, Recommendation Systems, Digital Marketing, Marketing Analytics
Methodologies: Bayesian Econometrics, Machine Learning, Multi-Armed Bandits
Abstract: Real-time personalization engines help identify the optimal offer for each customer, thereby enabling effective customization in e-commerce. Yet, developing such engines is not trivial. It remains challenging to optimize an offer strategy in real time, especially in a dynamic environment where the set of available offers varies over time. The complexity increases further when situational information must be used alongside customer characteristics. We provide an easy-to-implement personalization engine to quickly learn, and serve, optimal context-dependent offers in a situation where the offer set may change over time. We formalize this personalization problem in the multi-armed bandit framework, and propose a new contextual bandit algorithm boosted by the particle filtering estimation technique. Our method allows firms to flexibly introduce new personalized offers, calibrate their impact using prior knowledge from historical data, and rapidly update these prior beliefs as new information arrives. With an application to news-article recommendation, we show that, relative to state-of-the-art competing methods, the proposed method improves lift in click-through rate and is computationally efficient.
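The combination described above can be sketched in code. The following is a minimal, illustrative implementation of Thompson sampling where each offer (arm) maintains a particle cloud over its logistic-regression coefficients; it is not the paper's exact algorithm, and all class/parameter names (`ParticleFilterBandit`, `n_particles`, `jitter`) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class ParticleFilterBandit:
    """Hedged sketch: Thompson sampling with a particle filter per arm.

    Each arm keeps particles over its logistic coefficient vector.
    Observed rewards reweight the particles; resampling plus jitter
    keeps the approximation from degenerating. Constants illustrative.
    """

    def __init__(self, n_arms, dim, n_particles=200, jitter=0.05):
        self.jitter = jitter
        # Prior belief: standard-normal particles for every arm.
        self.particles = rng.normal(size=(n_arms, n_particles, dim))
        self.weights = np.full((n_arms, n_particles), 1.0 / n_particles)

    def select(self, context):
        # Thompson step: draw one particle per arm, score the context,
        # and serve the arm with the highest sampled predicted reward.
        n_arms, n_particles, _ = self.particles.shape
        idx = [rng.choice(n_particles, p=self.weights[a]) for a in range(n_arms)]
        scores = [self.particles[a, idx[a]] @ context for a in range(n_arms)]
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        # Particle-filter step: reweight by the Bernoulli likelihood
        # of the observed click/no-click under each particle ...
        p = 1.0 / (1.0 + np.exp(-self.particles[arm] @ context))
        lik = p if reward == 1 else 1.0 - p
        w = self.weights[arm] * (lik + 1e-12)
        w /= w.sum()
        # ... then resample and add jitter to keep particle diversity.
        keep = rng.choice(len(w), size=len(w), p=w)
        self.particles[arm] = self.particles[arm][keep] + \
            rng.normal(scale=self.jitter, size=self.particles[arm].shape)
        self.weights[arm].fill(1.0 / len(w))
```

New offers can be introduced at any time by appending a fresh particle cloud, with particles drawn from a prior calibrated on historical data rather than a standard normal.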
Abstract: Personalization strategies often build on a large set of customer-specific and/or contextual variables to optimally select among many available marketing actions. Contextual multi-armed bandit algorithms can help marketers to adaptively select optimal customized actions. However, conventional contextual bandit algorithms usually consider only a small set of variables, while in real-world problems there are many potentially relevant variables. Exploration is beneficial for identifying relevant variables; yet, when faced with a surplus of variables, examining the impacts of all of them can lead to over-exploration and thus inefficiency. To address this challenge, it becomes crucial to leverage an adaptive modeling approach that supports the exploration process and effectively resolves the uncertainty in variable selection. We propose a new approach using variable selection techniques to learn both the optimal model specification and the action-selection strategy. We enhance model interpretability via feature decomposition, to effectively identify both irrelevant and relevant factors. Among relevant factors, we discern between two types: common factors, which have the same influence on consumer behavior for all actions, and hence do not impact the personalized policy, and action-specific factors, whose impact differs across the possible actions and hence do affect the policy. Our method allows firms to run cost-efficient and interpretable bandit algorithms with high-dimensional contextual data.
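The feature decomposition can be illustrated with a small sketch: each context enters the design once in a block shared by all actions (common effects, which cancel out when actions are compared) and once in an action-specific interaction block (the effects that actually drive the policy). The encoding and the helper name `decomposed_design` are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

def decomposed_design(context, arm, n_arms):
    """Hedged sketch of the common / action-specific decomposition.

    Returns a vector with one shared block plus one interaction block
    per action; only the chosen arm's interaction block is populated.
    """
    d = len(context)
    x = np.zeros(d + n_arms * d)
    x[:d] = context                  # common-effect block (same for all arms)
    start = d + arm * d
    x[start:start + d] = context     # action-specific block for this arm
    return x
```

With sparsity-inducing priors or penalties on both blocks, each variable can then be classified as irrelevant (zero in both blocks), common (nonzero only in the shared block), or action-specific (nonzero in an interaction block).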
Optimal Targeting with Multi-Faceted and Time-Varying Rewards
Data analysis in progress
Abstract: In a changing world, marketers need to continuously monitor the effectiveness of their marketing campaigns. A campaign that works well at first may have adverse effects later on, due to factors such as changes in competitors’ strategies or seasonality. This is even more important in personalized strategies that exploit relations between customer characteristics and the potential outcomes. Shifts in these relations affect the optimal personalized actions and their profitability. We document such time-varying effects of personalized promotions in the context of telecommunication marketing campaigns over a span of five months, and then develop a personalization policy to accommodate such non-stationary reward distributions. A second innovation is that we integrate potential unintended side effects into the objective function of the policy optimization, since a personalized policy optimized for one objective may harm another. For instance, we find that a personalized policy designed to incentivize contract renewal shows a 3.5% uplift in renewal success, but also leads to a 0.06% increase in churn rate, compared to a control policy. Our integrated approach optimally trades off performance across both dimensions.
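The two ingredients above can be sketched together: discounting past observations lets the policy track time-varying rewards, while a penalty weight folds the side effect (e.g., churn) into a single combined objective. This is a minimal illustrative sketch, not the paper's method; `gamma`, `lam`, and the UCB-style rule are all assumed for illustration.

```python
import numpy as np

class DiscountedMultiObjectiveBandit:
    """Hedged sketch: a discounted UCB-style bandit with a combined
    objective. Discounting handles non-stationary rewards; lam prices
    one unit of the side effect against the primary outcome.
    """

    def __init__(self, n_arms, gamma=0.99, lam=10.0):
        self.gamma = gamma               # forget old observations
        self.lam = lam                   # penalty per unit of side effect
        self.value = np.zeros(n_arms)    # discounted sum of combined rewards
        self.count = np.zeros(n_arms)    # discounted observation counts

    def select(self):
        # UCB-style rule on the discounted estimates: untried arms get
        # a large exploration bonus and are sampled first.
        total = max(self.count.sum(), 1.0)
        means = self.value / np.maximum(self.count, 1e-9)
        bonus = np.sqrt(2.0 * np.log(total + 1.0) / np.maximum(self.count, 1e-9))
        return int(np.argmax(means + bonus))

    def update(self, arm, renewal, churn):
        # Combined objective: renewal success minus penalized churn.
        reward = renewal - self.lam * churn
        self.value *= self.gamma         # decay all past evidence
        self.count *= self.gamma
        self.value[arm] += reward
        self.count[arm] += 1.0
```

Raising `lam` shifts the policy toward protecting the secondary dimension (churn) at the expense of renewal uplift, tracing out the trade-off the integrated approach optimizes.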