In his first guest article for NEXT.io, Astral Forest founder and CEO Stanisław Szostak explores how predictive analytics can best be used for the prevention of player churn.


It is no secret to any iGaming operator: acquiring new players is one thing, but making them stay is quite another.

Machine learning algorithms can come to the rescue by allowing the early detection of players who are likely to quit. This, in turn, lets operators take preventive measures before those players are lost.

Problem statement 

When discussing marketing activities with iGaming operators, we usually navigate around campaigns and promotions aimed at acquisition, retention, or reactivation. Acquisition is pretty straightforward and easily measurable: you need to generate a growing number of new registrations and new depositors.

It starts to become slightly more subtle when we discuss retention and reactivation. This is also where you would usually find much room for improvement in terms of building customer loyalty and retaining players within the operator’s brand. 

Typically, operators analyse the churn of their players by verifying how many players are still active after a given period since their first deposit. This could be three, six, 12 or 18 months.

Based on that data, you can expect to identify that, for example, 30% of your players are gone within the first quarter, the next 20% within the first six months, and the remaining half slowly quits until you reach a stable core of loyal players.
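
As a rough illustration of how such retention figures can be computed, here is a minimal pure-Python sketch. The player IDs, dates, and data layout are invented for the example; in practice this would run against the operator's activity log.

```python
# Cohort retention sketch: the share of players still active N months after
# their first deposit. The activity log below is toy data.
from datetime import date

# toy activity log: (player_id, event_date)
events = [
    (1, date(2023, 1, 5)), (1, date(2023, 4, 20)),
    (2, date(2023, 1, 9)),
    (3, date(2023, 1, 12)), (3, date(2023, 9, 1)),
]

def months_between(a: date, b: date) -> int:
    return (b.year - a.year) * 12 + (b.month - a.month)

def retention(events, horizon_months):
    first, last = {}, {}
    for pid, d in events:
        first[pid] = min(first.get(pid, d), d)
        last[pid] = max(last.get(pid, d), d)
    # a player counts as retained if they were still active at the horizon
    retained = sum(
        1 for pid in first
        if months_between(first[pid], last[pid]) >= horizon_months
    )
    return retained / len(first)

print(retention(events, 3))  # share still active 3+ months after first deposit
```

Running the same function for three, six, 12 and 18-month horizons yields exactly the kind of churn breakdown described above.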

Hence, the question could be posed as follows: “I may be getting 3,000 new players every month, but why at the end of the day do I have only 1,400 active players in total?”

Applying ML to solve the problem 

Machine learning (ML), in simple terms, is a subset of data science methods that leverage data to build predictions from a predictive model. Typically, when developing models, we follow the lifecycle below, composed of eight crucial steps.

  1. Task definition. This boils down to making the problem statement, that is to say: whose problem are we solving, and what is the issue we would like to tackle?
  2. Data exploration. Once you get your hands on the data, it’s time for some exploration. This includes data profiling (identifying basic statistics for each data point) and visual exploration via tools such as Power BI.
  3. Data cleansing. The data you have is never suitable for building a robust ML model right away. It will require at least basic standardisation, feature extraction, encoding, or other transformations.
  4. Model development. Once you have your set of features, it’s time to play with modelling. This activity includes various experiments and the testing of different hypotheses, methods, scopes of data, feature selection, sampling, and so on.
  5. Model validation. Before you show the newly created model to anyone, you need to make sure the results are robust. A series of tests must follow to validate the model and prove, for example, that it is not overly sensitive to data changes and that it forecasts accurately on a test dataset.
  6. Operationalisation. The sixth step requires a data engineering effort to automate your model and scale it to handle the full scope of data. Usually, your first model is built in a separate environment that allows quick testing; this phase is all about applying the model at scale.
  7. Deployment. When all the previous steps are completed, you are ready to go! Time to deploy the model to a production environment and let the marketing team fully benefit from its predictions.
  8. Monitoring. Finally, once the model is live, it will not remain valid forever. You need to monitor its behaviour and retrain it from time to time, especially when observing data drift.
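
As a toy end-to-end illustration of steps 4 and 5 (development and validation), here is a deliberately simplified sketch. The "model" is a single inactivity-threshold rule fitted on synthetic data, not a production algorithm; all names and numbers are invented.

```python
# Minimal development/validation loop on synthetic data: fit a churn rule
# on a training set, then check it holds up on held-out data.
import random

random.seed(0)

# Steps 1-3 condensed: one toy feature (days_inactive) and a noisy label
# where players inactive for more than 21 days mostly churn.
days = [random.randint(0, 60) for _ in range(200)]
data = [(d, 1 if d > 21 or random.random() < 0.1 else 0) for d in days]

train, test = data[:150], data[150:]

def accuracy(threshold, rows):
    # predict "churn" whenever inactivity exceeds the threshold
    return sum((d > threshold) == bool(y) for d, y in rows) / len(rows)

# Step 4, model development: pick the threshold that best fits training data.
best = max(range(60), key=lambda t: accuracy(t, train))

# Step 5, model validation: confirm the rule generalises to unseen players.
print("chosen threshold:", best)
print("test accuracy:", round(accuracy(best, test), 2))

# Steps 6-8 (operationalisation, deployment, monitoring) would wrap this
# logic in a scheduled pipeline and track accuracy drift over time.
```

A real model would of course use many features and a proper learning algorithm, but the shape of the loop (fit on one sample, validate on another) is the same.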

Data challenges 

Machine learning may seem like a black box. However, it is not a magical tool where you can simply upload your data and great results appear all of a sudden. For your model to be accurate and robust, data science teams need to tackle some very interesting data challenges. I will set out a few below.

The first challenge is defining when your player was lost. We are not in the insurance industry, where a customer simply does not renew their contract. Neither are we in video streaming, where you can cancel your subscription. In such scenarios, it is very easy to tell on which date a customer left.

However, for iGaming operators, the great majority of players do not close their account on their own initiative. Most will be dormant players: their account is active, but they have not logged in for months. The task is then to determine exactly at which point in time they decided to quit. Was it, say, two weeks after their last bet? Three weeks after their last login? The answer is in the data.
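
One way to let the data answer that question is to derive the inactivity cutoff from the gaps between a player's own sessions: if a silence far exceeds the gaps normally observed across the player base, the player is likely gone. The sketch below is a hedged illustration with toy data and an arbitrarily chosen 95th-percentile rule, not a prescribed method.

```python
# Derive a churn cutoff from observed gaps between consecutive sessions,
# then label players whose current silence exceeds it. Toy data throughout.
from datetime import date

sessions = {  # player_id -> sorted session dates
    1: [date(2023, 1, 1), date(2023, 1, 8), date(2023, 1, 20)],
    2: [date(2023, 1, 3), date(2023, 2, 25)],
    3: [date(2023, 1, 5), date(2023, 1, 6), date(2023, 1, 7)],
}

# all gaps (in days) between consecutive sessions, across all players
gaps = sorted(
    (b - a).days
    for dates in sessions.values()
    for a, b in zip(dates, dates[1:])
)

def percentile(values, p):
    idx = min(len(values) - 1, int(p * len(values)))
    return values[idx]

# a silence longer than nearly all observed gaps suggests the player quit
cutoff_days = percentile(gaps, 0.95)

def is_churned(last_session: date, today: date) -> bool:
    return (today - last_session).days > cutoff_days
```

In production the cutoff would typically be tuned per segment or per game vertical, since casual and high-frequency players have very different rhythms.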

Another challenge concerns duplicated and fake accounts. Whenever a player gets blocked, they will fairly often create another account within minutes. Such players will soon be banned again, but this kind of behaviour spoils the quality of your data, and you will need to handle such cases during data cleansing. Multiple tools and techniques exist for deduplication, such as master data platforms, fuzzy matching, etc.
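
As a small illustration of the fuzzy-matching idea, the stdlib `difflib` can flag suspiciously similar account details. The accounts and the 0.9 threshold below are invented; real deployments would rely on a master data platform or dedicated matching tooling rather than this sketch.

```python
# Fuzzy duplicate detection sketch: flag account pairs whose emails are
# suspiciously similar, a common sign of a re-registered blocked player.
from difflib import SequenceMatcher

accounts = [
    ("A1", "john.smith@example.com"),
    ("A2", "jon.smith@example.com"),   # likely the same person, re-registered
    ("A3", "maria.k@example.com"),
]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

suspects = [
    (id_a, id_b)
    for i, (id_a, email_a) in enumerate(accounts)
    for id_b, email_b in accounts[i + 1:]
    if similarity(email_a, email_b) > 0.9
]
print(suspects)  # pairs to merge or exclude during data cleansing
```

The same technique extends to names, addresses, and device fingerprints, usually combined into a weighted match score.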

Added to that, you will also have players referred to as “bonus hunters”, who jump between platforms to collect the most bonuses without any intention of becoming loyal, returning players. You also need to account for those who take a break, have hit their limits, or have been banned due to irregularities.


One of our clients wanted to evaluate how to retain more players. They were very strong in acquisition yet lacked repeatable actions to keep new players on their platform.

Since our work was part of a broader engagement, data exploration and cleansing were greatly facilitated by a well-organised, tidy data lake. Model development and validation were by far the most demanding steps, due to the complexities of feature selection and model calibration.

Also, making sure that the model is persistent and maintains a high scoring accuracy despite changes in the data required several adaptations to the modelling approach.

Ultimately, we built a model that scored existing players based on their early activity. This means that two weeks after registration, the client could accurately identify which players were most likely to churn and which were there to stay. This, in turn, helped the marketing team target specifically the players at risk and increase the LTV of those cohorts by 20%.

As per the diagram above, the lift curve indicates that targeting the top 20% of players on our list, ranked by score, was 3.5x more effective than a random selection.
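
The lift figure behind such a curve is simple to compute: compare the churn rate among the top-scored players with the overall churn rate. The sketch below uses toy scores and labels (yielding a different lift than the client's), purely to show the mechanics.

```python
# Lift calculation sketch: how much more churn is captured in the top 20%
# of players ranked by model score, versus picking players at random.
# (churn_score, actually_churned) pairs, one per player -- toy values.
scored = [
    (0.95, 1), (0.90, 1), (0.85, 1), (0.80, 0), (0.70, 1),
    (0.60, 0), (0.50, 0), (0.40, 0), (0.30, 0), (0.20, 0),
]

scored.sort(key=lambda x: x[0], reverse=True)
top = scored[: len(scored) // 5]          # top 20% by score

rate_top = sum(y for _, y in top) / len(top)      # churn rate in top 20%
rate_all = sum(y for _, y in scored) / len(scored)  # overall churn rate
lift = rate_top / rate_all
print(lift)
```

Plotting this ratio for every cut-off from the top 1% down to 100% produces the full lift curve; a random ordering sits flat at 1.0.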


Machine learning, part of the data and analytics sphere, can be applied to resolve multiple business challenges in the iGaming world. Churn is just one of the first examples to address whenever you are embarking on your data science journey. 

Remember one thing: always build your solid data foundation first and prioritise simple analytics before launching sophisticated initiatives. Otherwise, your data team will spend most of its time on data cleaning instead of modelling, work that a data warehouse or data lake should handle beforehand.

Stanisław Szostak (Stan) is the founder and CEO of Astral Forest, a consulting agency specialising in analytics, data & governance for the iGaming industry.

Stan leads the iGaming practice within Astral Forest, working since its inception with online casino operators and other industry actors across Switzerland, France, Poland and the UK. He specialises in leveraging data to drive business growth and marketing effectiveness, and to ensure compliance and responsible gaming standards.
