Use Case #1: A Glimpse Into The World of Retail (part 1)

February 11, 2022
Photo by Jordan Nix on Unsplash


Part 1: Customer Segmentation

Written by Hajar AIT EL KADI, Koffi Cornelis

The Big Picture

How do you create the best customer experience?

Know Your Customer: Why It Matters

Nowadays, companies have more data than they know what to do with. And data hold potential.

The better you know your customers, the better you understand their needs, and the better you can anticipate them.

The most successful companies are those that know their customers best and use that knowledge to provide a high-quality customer experience, thus creating strong customer engagement. Providing a unique, individual experience tailored to the customer’s preferences and needs is key to building lifelong customer relationships. Look no further than Spotify and Netflix, whose recommendation algorithms are at the core of their business.

Data are therefore indispensable for gaining insight into customer behaviour. And what better way to extract those insights than machine learning?

In this article, we use a retail dataset as an example to try to understand and anticipate consumer behaviour, using a combination of data analysis and machine learning.

We will first start by flirting a little with the customer profiles, getting to know them a bit better, before we take them on a second date and look at their transactions and ask about their history. If all goes well and we build a deeper connection, we’ll be able to label our relationship. Then things will get serious and we’ll be comfortable enough to recommend their next purchases 😉😏.

The literature offers a variety of recommendation systems, depending on the available data, whether it concerns users, items, or the interactions between them:

  • Content-based recommendation recommends items similar to what the user likes, based on their previous actions or explicit feedback.
  • Collaborative filtering relies on similarities in purchase patterns between users.
  • Knowledge-based recommendation, which is not covered here, relies on explicit knowledge about a user (budget, brand preferences, etc.).
  • Hybrid systems combine two or more recommendation strategies.

In part 2 of this article, we cover two recommendation systems, content-based and collaborative filtering, and compare them to the two baselines that made the most sense to us.

With a pinch of knowledge, a dash of luck and a whole lot of determination, we are diving right in 🚀 (get it?)!

Our Plan of Attack

We will cover the following topics.

For those of you with commitment issues, we have provided an estimated read time for each topic:

  1. Project Scope (Not even 1 min)
  2. Data Exploration (5 min)
  3. Customer Segmentation (RFM Analysis) (5 min)
  4. Recommendation System Approaches (9 min)

Let’s get this party started!🤩

1. Project Scope

What we want:

  • Profile the customers (age, sex, city)
  • Understand their behaviour (purchases, returns)
  • Group them based on their behaviour (customer segmentation)
  • Recommend their next purchases

What we need:

  • Some data: we use a Kaggle dataset from a retail store, describing the day-to-day transactions of a set of customers over a period of about three years.
  • A lot of luck, and a dash of patience

What we use:

  • Python (duh!):
      ◦ pandas, NumPy
      ◦ Plotly, Seaborn, Matplotlib
      ◦ scikit-learn, Implicit
      ◦ ml_metrics, recmetrics

  • Jupyter Notebook
  • Google Colab
  • A dash of luck, and a whole lot of patience

2. Data Exploration

From here on, we focus on the results rather than the detailed code, which can be found in its entirety here (Google Colab) and here (GitHub).

Dataset:

We used a Kaggle dataset containing three tables: Customers, Transactions and Products.

The dataset details retail transactions spanning from Jan 2011 to Feb 2014. We will look at the tables separately and then together.

a. Customers’ table

This table contains the general customer data, and accounts for 5647 unique customers.

Customer dataframe
  • Customers By Gender:

The customers are uniformly distributed across both genders.

Customers’ gender
  • Customers By Age:

We calculated each customer’s age as of the end of 2014 (the last recorded transactions date from 2014), based on their date of birth. The average customer is 33 years old.

The sweet spot: young enough to still be fun and adventurous with their buys, but old enough to have a reliable job to finance those impulse purchases 😌.

Customers’ age in 2014
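A minimal pandas sketch of the age calculation, assuming the customers table has a date-of-birth column named DOB in day-month-year format (both are assumptions; the sample rows are made up). It computes completed years as of 31 December 2014:

```python
import pandas as pd

# Made-up sample rows; the column names customer_Id and DOB are assumptions.
customers = pd.DataFrame({
    "customer_Id": [1, 2, 3],
    "DOB": ["26-09-1981", "11-05-1973", "27-07-1992"],
})
customers["DOB"] = pd.to_datetime(customers["DOB"], format="%d-%m-%Y")

reference = pd.Timestamp("2014-12-31")  # end of 2014

# Whether each customer's birthday has already passed by the reference date.
had_birthday = (customers["DOB"].dt.month < reference.month) | (
    (customers["DOB"].dt.month == reference.month)
    & (customers["DOB"].dt.day <= reference.day)
)

# Completed years: year difference, minus one if the birthday is still ahead.
customers["age"] = (
    reference.year - customers["DOB"].dt.year - (~had_birthday).astype(int)
)
print(customers)
```

With the full table, customers["age"].mean() gives the average age.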
  • Customers By City:

There are 10 city codes in the dataset, and the customers are uniformly distributed across them.

Customers per city

b. Products’ table

The dataset describes 23 products organised across 6 categories and 18 subcategories.

Products dataframe and description

c. Transactions’ table

This table records about 23k transactions that took place between Jan 2011 and Feb 2014. We consider transactions with a negative total amount to be returns that customers made following their purchases.

Transactions dataframe

We will have a little fun plotting the data.

  • Monthly Sales distribution:
Monthly sales

Sales are more or less uniformly distributed across months. The highest revenues seem to be generated around March and April of each year. The highest recorded monthly revenue is 1.5M, in January 2014.
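One way to build the monthly series behind this plot, sketched with pandas under assumed column names (tran_date, total_amt) and made-up rows. Here returns, having negative amounts, reduce the month's total; the article does not say whether its plot netted them out:

```python
import pandas as pd

# Made-up rows; tran_date and total_amt are assumed column names.
transactions = pd.DataFrame({
    "tran_date": ["28-02-2014", "15-02-2014", "25-01-2014", "03-03-2013"],
    "total_amt": [4265.30, -4265.30, 1200.00, 850.50],
})
transactions["tran_date"] = pd.to_datetime(transactions["tran_date"], format="%d-%m-%Y")

# Sum of total_amt per calendar month; negative amounts (returns) lower the total.
monthly_sales = (
    transactions
    .groupby(transactions["tran_date"].dt.to_period("M"))["total_amt"]
    .sum()
)
print(monthly_sales)
```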

  • Annual Sales per customer:

We group the transactions per customer and sum their purchases for each year. The following plot shows how customers’ behaviour evolves over the three years (keep in mind that we only have two months’ worth of data for 2014, hence the low revenue).

Some customers are consistent over the years, like customer 7, who buys more or less the same amount each year. Others buy less and less from year to year, like customers 2 and 9. Some customers seem to have lost interest over the years but bounced back in 2014, like customers 1 and 6, who also generate higher revenues than the others. Still others buy more and more each year, like customers 3 and 4.

Annual revenue per customer
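The per-customer yearly totals can be sketched as a pandas pivot table with one row per customer and one column per year. The column names (cust_id, tran_date, total_amt) and rows are hypothetical:

```python
import pandas as pd

# Made-up rows; cust_id, tran_date and total_amt are assumed column names.
transactions = pd.DataFrame({
    "cust_id": [1, 1, 1, 2, 2],
    "tran_date": pd.to_datetime(
        ["2011-03-01", "2012-06-10", "2012-07-22", "2011-01-05", "2013-11-30"]
    ),
    "total_amt": [500.0, 300.0, 200.0, 1000.0, 250.0],
})

# One row per customer, one column per year, summed amounts (0 when no purchase).
annual_per_customer = (
    transactions
    .assign(year=transactions["tran_date"].dt.year)
    .pivot_table(index="cust_id", columns="year", values="total_amt",
                 aggfunc="sum", fill_value=0)
)
print(annual_per_customer)
```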
  • Annual Revenue per product category:

After merging the products dataframe with the transactions dataframe, we group the transactions by category and year. The plot below shows the most popular categories: ‘Books’ (no, seriously! we checked the math a couple of times over 🤷‍♀️) and ‘Electronics’ (no surprise there), which hold the top sales over the years. The ‘Clothing’ and ‘Bags’ categories generate the least revenue.

Annual revenue by category
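A sketch of the merge and grouping in pandas. The column names are assumptions modelled on the Kaggle tables, and the join uses both the category and subcategory codes on the assumption that subcategory codes are only unique within a category:

```python
import pandas as pd

# Made-up samples; all column names are assumptions.
products = pd.DataFrame({
    "prod_cat_code": [5, 3, 1],
    "prod_sub_cat_code": [3, 4, 1],
    "prod_cat": ["Books", "Electronics", "Clothing"],
})
transactions = pd.DataFrame({
    "prod_cat_code": [5, 5, 3, 1],
    "prod_subcat_code": [3, 3, 4, 1],
    "tran_date": pd.to_datetime(["2012-01-10", "2013-05-02", "2013-05-03", "2012-08-19"]),
    "total_amt": [900.0, 1100.0, 1500.0, 200.0],
})

# Join on both codes so each transaction gets its category name.
merged = transactions.merge(
    products,
    left_on=["prod_cat_code", "prod_subcat_code"],
    right_on=["prod_cat_code", "prod_sub_cat_code"],
    how="left",
)

# Revenue per category (rows) and year (columns).
revenue_by_cat_year = (
    merged.groupby(["prod_cat", merged["tran_date"].dt.year.rename("year")])["total_amt"]
    .sum()
    .unstack(fill_value=0)
)
print(revenue_by_cat_year)
```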
  • Does a customer buy an item more than once?

Returns aside, let’s see if the customers buy the same item more than once. The histogram shows that most clients do not come back for seconds.
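A sketch of how the data behind such a histogram could be computed: drop returns, count how many times each customer bought each product, then count how often each repeat count occurs. Treating a (category, subcategory) pair as a "product" and the column names are assumptions:

```python
import pandas as pd

# Made-up rows; column names are assumptions.
transactions = pd.DataFrame({
    "cust_id": [1, 1, 1, 2, 3, 3],
    "prod_cat_code": [5, 5, 3, 1, 5, 5],
    "prod_subcat_code": [3, 3, 4, 1, 3, 3],
    "total_amt": [900.0, 950.0, 1500.0, 200.0, 800.0, -800.0],
})

# Returns aside: keep only purchases (positive amounts).
purchases = transactions[transactions["total_amt"] > 0]

# Number of times each customer bought each product.
times_bought = purchases.groupby(["cust_id", "prod_cat_code", "prod_subcat_code"]).size()

# Histogram data: how many (customer, product) pairs were bought once, twice, ...
repeat_counts = times_bought.value_counts().sort_index()
print(repeat_counts)
```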

  • Store Type:

The e-Shop records the most transactions. The rest of the stores have about the same number of transactions.

Store types
  • Returns:

We treat transactions with a negative total_amt as returns. About 10% of the recorded transactions are returns.
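In pandas, flagging returns and computing their share takes a couple of lines. This sketch assumes a total_amt column and uses ten made-up transactions, one of which is a return:

```python
import pandas as pd

# Made-up rows; total_amt is an assumed column name.
transactions = pd.DataFrame({
    "transaction_id": range(1, 11),
    "total_amt": [500.0, 320.0, -320.0, 1200.0, 75.0, 640.0, 990.0, 410.0, 150.0, 880.0],
})

# A negative total amount is treated as a return.
transactions["is_return"] = transactions["total_amt"] < 0

# The mean of a boolean column is the fraction of True values.
return_share = transactions["is_return"].mean()
print(f"{return_share:.0%} of transactions are returns")
```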

Return rates at physical stores typically range from 8% to 10%, but rise to approximately 20% for e-commerce. For Amazon, they can go as high as 40% for certain categories.

We should be glad that our customers are loyal (or just lazy) enough not to bother with the hassle of returns 👀.

The returns are consistent over the years. The low returns recorded for 2014 are due to having only 2 months of transactional data available.

‘Books’ and ‘Electronics’ have the most returns per year, which is consistent with the fact that they record the most transactions overall.

Returns by store type
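Counting transactions and returns per store type, and returns per year, follows the same pattern. Again a sketch on made-up rows, with assumed column names (Store_type, tran_date, total_amt):

```python
import pandas as pd

# Made-up rows; column names are assumptions.
transactions = pd.DataFrame({
    "Store_type": ["e-Shop", "e-Shop", "e-Shop", "MBR", "MBR",
                   "TeleShop", "Flagship store", "e-Shop"],
    "tran_date": pd.to_datetime(["2011-02-01", "2012-03-04", "2012-03-09", "2013-07-20",
                                 "2013-07-25", "2012-10-11", "2014-01-15", "2013-12-01"]),
    "total_amt": [300.0, 700.0, -700.0, 450.0, -450.0, 120.0, 860.0, 210.0],
})
transactions["is_return"] = transactions["total_amt"] < 0

# Transactions and returns per store type (named aggregation on one column).
by_store = transactions.groupby("Store_type")["is_return"].agg(
    transactions="count", returns="sum"
)

# Returns per year.
returns = transactions[transactions["is_return"]]
returns_per_year = returns.groupby(returns["tran_date"].dt.year).size()

print(by_store)
print(returns_per_year)
```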

At this point, we are just out of the awkward phase. We know our customers a little better: their age (so we never forget to wish them a happy birthday), where they live, their favourite store, when they make the most purchases and what they spend most of their money on: books (we just hope they’re not pretending to be avid readers to impress us; it will be awkward for both of us when they get a book discount code as a birthday gift).

3. Customer Segmentation

Now that we know our customers’ collective behaviour, we want to get up close and personal with them. It is time to put a label on them (loyal 👰🤵 or a player, forward 😉 or shy, big spender 💸 or cheapskate, etc.), to shine a light on their qualities and pretend we’re okay with their bad choices.

Communication is the basis of any healthy relationship, and customer relationships are no different.

But how do you communicate effectively with nearly 6000 customers?

Divide and conquer

In order to optimise communication, we will divide our customer base into groups with similar behaviour. The ultimate goal is to get to know the customers on a deeper level and separate them into categories so we can tailor the content to their needs. The fancy marketing term for that is customer segmentation.

Birds of a feather flock together

To build those groups, we will mostly rely on the customers’ purchase history. One of the most intuitive and flexible ways to do that is RFM analysis, which stands for Recency, Frequency, Monetary value.

Recency (R): How recently did the customer make a transaction?

Frequency (F): How many transactions did they make?

Monetary Value (M): How much did they spend?

We compute those three criteria and consider them as scores. Those scores will, in turn, serve to cluster the customer base.

  • We compute the RFM columns for each customer (the full code is in the notebook).
  • We get the following dataframe as a result:
RFM columns description
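A minimal sketch of the RFM computation with pandas named aggregation. Assumptions: the column names, the made-up rows, measuring recency in days against the last transaction date in the data, and counting every transaction (returns included) towards frequency; the article does not specify these choices:

```python
import pandas as pd

# Made-up rows; cust_id, tran_date and total_amt are assumed column names.
transactions = pd.DataFrame({
    "cust_id": [1, 1, 2, 3, 3, 3],
    "tran_date": pd.to_datetime(["2013-12-01", "2014-02-20", "2012-05-14",
                                 "2013-06-01", "2013-09-15", "2014-01-30"]),
    "total_amt": [1000.0, 500.0, 300.0, 200.0, 400.0, 600.0],
})

# Reference date for recency: the last recorded transaction.
snapshot = transactions["tran_date"].max()

rfm = transactions.groupby("cust_id").agg(
    recency=("tran_date", lambda d: (snapshot - d.max()).days),  # days since last transaction
    frequency=("tran_date", "count"),                           # number of transactions
    monetary=("total_amt", "sum"),                              # total amount spent
)
print(rfm)
```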

a. Recency

Recency averages 282 days. The least active customer hasn’t made a transaction in 3 years. The recency distribution is positively (right) skewed.

Recency histogram

b. Frequency

The average customer made about 4 transactions over the course of the three years.

Frequency histogram

c. Monetary value

The average customer spent 2224 over the course of the three years. The monetary value distribution is also right-skewed.

Monetary value histogram

Now that we have the three columns on hand, we move on to clustering by feature. For this purpose, we call on the magic of machine learning: K-means, an unsupervised learning algorithm that assigns each data point to the cluster whose centroid is nearest.

To determine the optimal number of clusters to feed into K-means, we use SSE (sum of squared errors) plots, also known as elbow plots.

Accio elbow plots!

SSE plots for recency, frequency and monetary value
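An elbow plot boils down to fitting K-means for a range of k and recording the SSE, which scikit-learn exposes as the fitted model's inertia_ attribute. A sketch on synthetic recency values (three made-up groups), printing the values instead of plotting them:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic recency values in days; KMeans expects a 2-D array (n_samples, n_features).
rng = np.random.default_rng(0)
recency = np.concatenate([
    rng.normal(30, 10, 100),
    rng.normal(300, 40, 100),
    rng.normal(800, 60, 100),
]).reshape(-1, 1)

# SSE (inertia) for each candidate number of clusters.
sse = {}
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(recency)
    sse[k] = km.inertia_

for k, value in sse.items():
    print(k, round(value))
```

Plotting sse against k (e.g. with Matplotlib) gives the elbow plot; the elbow is where adding clusters stops reducing the SSE sharply.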

Three is the magic number for the three features.

There is nothing left for us but to cluster them customers 👨‍👩‍👧‍👦.

We use K-means to create the clusters based on recency, and similar code for frequency and monetary value.

We get three clusters based on each criterion.

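Since K-means assigns cluster labels arbitrarily, the labels are reordered by each cluster’s mean so that the cluster number can serve as a customer score for each feature. In the case of recency, for example, cluster ‘0’ represents the least active customers, whereas cluster ‘2’ represents the most active.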
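A sketch of clustering one feature and relabelling the clusters so the label reads as a score. order_clusters is a hypothetical helper, and the recency values are made up:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up recency values (days since last transaction).
rfm = pd.DataFrame({"recency": [5, 12, 20, 280, 300, 320, 900, 950, 1000]})

km = KMeans(n_clusters=3, n_init=10, random_state=0)
rfm["recency_cluster"] = km.fit_predict(rfm[["recency"]])


def order_clusters(df, cluster_col, value_col, ascending):
    """Hypothetical helper: renumber K-means labels 0..k-1 by the mean of value_col."""
    means = df.groupby(cluster_col)[value_col].mean().sort_values(ascending=ascending)
    mapping = {old: new for new, old in enumerate(means.index)}
    return df[cluster_col].map(mapping)


# High recency means an inactive customer, so sort descending:
# cluster 0 = least active, cluster 2 = most active.
rfm["recency_cluster"] = order_clusters(rfm, "recency_cluster", "recency", ascending=False)
print(rfm)
```

For frequency and monetary value, the same helper with ascending=True makes cluster 2 the highest-value group.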

The clusters’ descriptions for recency and frequency
The clusters’ description for monetary value

Now that we have a score (cluster) for each of the three features, we compute an overall score for each customer by summing the three scores. Since each score ranges from 0 to 2, the overall score ranges from 0 to 6. Below are the mean values of each feature per overall score.

Mean per cluster per feature based on overall scores
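The overall score is a row-wise sum. A sketch with made-up per-feature scores (already ordered so that higher is better) and assumed column names:

```python
import pandas as pd

# Made-up raw features and per-feature scores (0-2); column names are assumptions.
rfm = pd.DataFrame({
    "recency":   [700, 400, 150, 30, 10, 5],
    "frequency": [1, 2, 3, 4, 6, 8],
    "monetary":  [200.0, 900.0, 1500.0, 2500.0, 4000.0, 6000.0],
    "recency_cluster":   [0, 0, 1, 2, 2, 2],
    "frequency_cluster": [0, 0, 1, 1, 2, 2],
    "monetary_cluster":  [0, 1, 1, 1, 2, 2],
})

# Overall score: sum of the three per-feature scores (0 to 6).
score_cols = ["recency_cluster", "frequency_cluster", "monetary_cluster"]
rfm["overall_score"] = rfm[score_cols].sum(axis=1)

# Mean of each raw feature per overall score.
means_per_score = rfm.groupby("overall_score")[["recency", "frequency", "monetary"]].mean()
print(means_per_score)
```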

In order to better visualise our clusters, we group the overall scores into three segments:

0 and 1: Low-Value Customers

2, 3 and 4: Mid-Value Customers

5 and 6: High-Value Customers (we stan 💕)
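The score-to-segment mapping above can be expressed with pd.cut. A sketch, where the bin edges are simply one way to encode the 0-1 / 2-4 / 5-6 split:

```python
import pandas as pd

# Every possible overall score (sum of three 0-2 scores).
scores = pd.Series([0, 1, 2, 3, 4, 5, 6], name="overall_score")

# Right-closed bins: (-1, 1] -> Low, (1, 4] -> Mid, (4, 6] -> High.
segments = pd.cut(
    scores,
    bins=[-1, 1, 4, 6],
    labels=["Low-Value", "Mid-Value", "High-Value"],
)
print(pd.DataFrame({"overall_score": scores, "segment": segments}))
```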

This segmentation allows us to plot the three features against each other in terms of customer value.

Recency and monetary value by segment
Frequency and monetary value by segment

We can clearly deduce the behaviour of the three distinct groups of customers:

  • The low-value customers have mostly high recency, low frequency and low monetary value.
  • The mid-value customers are a mixed bag. Their behaviour ranges from low frequency, low recency and high monetary value to medium recency, medium frequency and medium monetary value.
  • The high-value customers range from low recency, high frequency and medium monetary value, to low recency, medium frequency and high monetary value.
Recency, frequency and monetary value by segment

And voilà! Beautifully segmented clusters.

We can see that the most impactful factor is recency: it clearly separates the three customer segments. The lower the recency, the higher the customer value. The low-value customers also have the lowest monetary value.

What is left to do, now, is build marketing campaigns tailored to the needs of those groups of customers.

And there you have it!

We have, somewhat successfully, been able to profile our customers and segment them into groups that have similar history and behaviour.

But fear not…

In the next chapter, we will dig deeper into our customer relationship, and take things further by engaging our customers and recommending their next purchases.

In order to do so, we will compare two recommendation system approaches: content-based and collaborative filtering.

In the words of Queen B, it’s time to put a 💍 on it!

For the brave of heart, join us in part 2.

Links (for those who weren’t paying attention)

Google Colab


Kaggle (dataset)

Part 2





CodeWorks, an IT services company (ESN) model that works towards greater social justice.

Our Manifesto guarantees the rights and duties of every CodeWorker and the commitments CodeWorks makes to each member.
It aims to be realistic, implemented, shared and part of a process of continuous improvement.

Join us!

If you want to share your knowledge and time with empathetic peers, embody a shared vision of software excellence and take an active part in an alternative business model, join us.