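RFM (recency, frequency, monetary) analysis is a behavior-based technique used to segment customers by examining their transaction history, such as:

- how recently a customer has purchased (recency)
- how often they purchase (frequency)
- how much they spend (monetary)
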
It is based on the marketing axiom that 80% of your business comes from 20% of your customers. RFM helps to identify customers who are more likely to respond to promotions by segmenting them into various categories.
To calculate the RFM score for each customer, we need transaction data that includes the following:

- a unique customer id
- the date of each transaction/order
- the transaction/order amount

The rfm package includes a sample data set, rfm_data_orders, which contains these details:
## # A tibble: 4,906 x 3
##    customer_id         order_date revenue
##    <chr>               <date>       <dbl>
##  1 Mr. Brion Stark Sr. 2004-12-20      32
##  2 Ethyl Botsford      2005-05-02      36
##  3 Hosteen Jacobi      2004-03-06     116
##  4 Mr. Edw Frami       2006-03-15      99
##  5 Josef Lemke         2006-08-14      76
##  6 Julisa Halvorson    2005-05-28      56
##  7 Judyth Lueilwitz    2005-03-09     108
##  8 Mr. Mekhi Goyette   2005-09-23     183
##  9 Hansford Moen PhD   2005-09-07      30
## 10 Fount Flatley       2006-04-12      13
## # ... with 4,896 more rows

So how is the RFM score computed for each customer? The steps below explain the process:
A recency score is assigned to each customer based on date of most recent purchase. The score is generated by binning the recency values into a number of categories (default is 5). For example, if you use four categories, the customers with the most recent purchase dates receive a recency ranking of 4, and those with purchase dates in the distant past receive a recency ranking of 1.
A frequency ranking is assigned in a similar way. Customers with a high purchase frequency are assigned a higher score (4 or 5) and those with the lowest frequency are assigned a score of 1.
A monetary score is assigned on the basis of the total revenue generated by the customer in the period under consideration for the analysis. Customers with the highest revenue/order amount are assigned a higher score, while those with the lowest revenue are assigned a score of 1.
A fourth score, the RFM score, is generated by concatenating the three individual scores into a single value. For example, a recency score of 3, a frequency score of 4 and a monetary score of 3 give an RFM score of 343.
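The steps above can be sketched in base R on a toy data set. This is only an illustration under stated assumptions: the order values are made up, `score_bins()` is a hypothetical helper, and its rank-based binning is one plausible way to assign scores, not necessarily the method rfm_table_order() uses internally.

```r
# Toy transaction data: hypothetical values, not taken from rfm_data_orders.
orders <- data.frame(
  customer_id = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D"),
  order_date  = as.Date(c(
    "2006-10-01", "2006-11-15", "2006-12-20",
    "2005-03-10", "2005-06-01",
    "2006-07-04",
    "2006-01-05", "2006-02-10", "2006-05-20", "2006-09-30"
  )),
  revenue = c(120, 80, 200, 40, 35, 60, 90, 110, 70, 150)
)
analysis_date <- as.Date("2006-12-31")

# One row per customer: days since last order, number of orders, total revenue.
days_ago <- as.numeric(analysis_date - orders$order_date)
customers <- data.frame(
  customer_id       = sort(unique(orders$customer_id)),
  recency_days      = as.vector(tapply(days_ago, orders$customer_id, min)),
  transaction_count = as.vector(tapply(orders$revenue, orders$customer_id, length)),
  amount            = as.vector(tapply(orders$revenue, orders$customer_id, sum))
)

# Hypothetical helper: rank the values and spread the ranks over n_bins scores.
# Assumption: rank-based bins; the rfm package may bin differently.
score_bins <- function(x, n_bins, higher_is_better = TRUE) {
  r <- rank(if (higher_is_better) x else -x, ties.method = "min")
  as.integer(ceiling(r * n_bins / length(x)))
}

n_bins <- 4
# Fewer days since the last purchase is better, so recency is scored in reverse.
customers$recency_score   <- score_bins(customers$recency_days, n_bins, higher_is_better = FALSE)
customers$frequency_score <- score_bins(customers$transaction_count, n_bins)
customers$monetary_score  <- score_bins(customers$amount, n_bins)

# Concatenate the three single-digit scores into one value.
customers$rfm_score <- customers$recency_score * 100 +
  customers$frequency_score * 10 +
  customers$monetary_score

print(customers)
```

Building the RFM score arithmetically works here because each individual score is a single digit; with more than nine bins, pasting the scores together as text would be the safer choice.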
The customers with the highest RFM scores are most likely to respond to an offer. Now that we have understood how the RFM score is computed, it is time to put it into practice. Use rfm_table_order() to generate the score for each customer from the sample data set rfm_data_orders.
rfm_table_order() takes 8 inputs:
- data: a data set with the customer id, transaction date and transaction amount columns
- customer_id: name of the customer id column
- order_date: name of the transaction date column
- revenue: name of the transaction amount column
- analysis_date: date of analysis
- recency_bins: number of rankings for recency score (default is 5)
- frequency_bins: number of rankings for frequency score (default is 5)
- monetary_bins: number of rankings for monetary score (default is 5)

analysis_date <- lubridate::as_date("2006-12-31", tz = "UTC")
rfm_result <- rfm_table_order(rfm_data_orders, customer_id, order_date, revenue, analysis_date)
rfm_result

| customer_id | date_most_recent | recency_days | transaction_count | amount | recency_score | frequency_score | monetary_score | rfm_score |
|---|---|---|---|---|---|---|---|---|
| Abbey O’Reilly DVM | 2006-06-09 | 205 | 6 | 472 | 3 | 4 | 3 | 343 | 
| Add Senger | 2006-08-13 | 140 | 3 | 340 | 4 | 1 | 2 | 412 | 
| Aden Lesch Sr. | 2006-06-20 | 194 | 4 | 405 | 3 | 2 | 3 | 323 | 
| Admiral Senger | 2006-08-21 | 132 | 5 | 448 | 4 | 3 | 3 | 433 | 
| Agness O’Keefe | 2006-10-02 | 90 | 9 | 843 | 5 | 5 | 5 | 555 | 
| Aileen Barton | 2006-10-08 | 84 | 9 | 763 | 5 | 5 | 5 | 555 | 
| Ailene Hermann | 2006-03-25 | 281 | 8 | 699 | 3 | 5 | 5 | 355 | 
| Aiyanna Bruen PhD | 2006-04-29 | 246 | 4 | 157 | 3 | 2 | 1 | 321 | 
| Ala Schmidt DDS | 2006-01-16 | 349 | 3 | 363 | 2 | 1 | 2 | 212 | 
| Alannah Borer | 2005-04-21 | 619 | 4 | 196 | 1 | 2 | 1 | 121 | 
rfm_table_order() will return the following columns as seen in the above table:
- customer_id: unique customer id
- date_most_recent: date of most recent visit
- recency_days: days since the most recent visit
- transaction_count: number of transactions of the customer
- amount: total revenue generated by the customer
- recency_score: recency score of the customer
- frequency_score: frequency score of the customer
- monetary_score: monetary score of the customer
- rfm_score: RFM score of the customer

The heat map shows the average monetary value for different categories of recency and frequency scores. Higher scores of frequency and recency are characterized by higher average monetary value, as indicated by the darker areas in the heat map.
Use rfm_bar_chart() to generate the distribution of monetary scores for the different combinations of frequency and recency scores.
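Use rfm_histograms() to examine the relative distribution of recency (days since the most recent visit), frequency (transaction count) and monetary value (total revenue) across customers.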
Visualize the distribution of customers across orders.
The best customers are those who:

- bought most recently
- buy most often
- spend the most

Now let us examine the relationship between the above.
Customers who visited more recently generated more revenue than those who visited in the distant past. Customers who visited in the recent past are more likely to return than those who visited a long time ago, as most of the latter would be lost customers. As such, higher revenue is associated with the most recent visits.
As the frequency of visits increases, the revenue generated also increases. Customers who visit more frequently are your champion customers, loyal customers or potential loyalists, and they drive higher revenue.
Customers with low frequency visited in the distant past, while those with high frequency have visited in the recent past. Again, customers who visited in the recent past are more likely to return than those who visited a long time ago. As such, higher frequency is associated with the most recent visits.
Let us classify our customers based on the individual recency, frequency and monetary scores.
| Segment | Description | R | F | M | 
|---|---|---|---|---|
| Champions | Bought recently, buy often and spend the most | 4 - 5 | 4 - 5 | 4 - 5 | 
| Loyal Customers | Spend good money. Responsive to promotions | 2 - 5 | 3 - 5 | 3 - 5 | 
| Potential Loyalist | Recent customers, spent good amount, bought more than once | 3 - 5 | 1 - 3 | 1 - 3 | 
| New Customers | Bought more recently, but not often | 4 - 5 | <= 1 | <= 1 | 
| Promising | Recent shoppers, but haven’t spent much | 3 - 4 | <= 1 | <= 1 | 
| Need Attention | Above average recency, frequency & monetary values | 2 - 3 | 2 - 3 | 2 - 3 | 
| About To Sleep | Below average recency, frequency & monetary values | 2 - 3 | <= 2 | <= 2 | 
| At Risk | Spent big money, purchased often but a long time ago | <= 2 | 2 - 5 | 2 - 5 | 
| Can’t Lose Them | Made big purchases and often, but a long time ago | <= 1 | 4 - 5 | 4 - 5 | 
| Hibernating | Low spenders, low frequency, purchased a long time ago | 1 - 2 | 1 - 2 | 1 - 2 | 
| Lost | Lowest recency, frequency & monetary scores | <= 2 | <= 2 | <= 2 | 
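The score ranges in the table overlap (every Champions combination also satisfies the Loyal Customers ranges, for instance), so any implementation has to decide which rule wins. The base R sketch below encodes the table as a rule set and lets the last matching rule win. That ordering is an inference, not something the table states, and `segment_rules` and `classify_segment()` are hypothetical names rather than part of the rfm package. Combinations that match no rule are labelled "Others".

```r
# The segment table as inclusive score ranges ("<= 1" becomes 1 to 1, and so on).
segment_rules <- data.frame(
  segment = c("Champions", "Loyal Customers", "Potential Loyalist",
              "New Customers", "Promising", "Need Attention",
              "About To Sleep", "At Risk", "Can't Lose Them",
              "Hibernating", "Lost"),
  r_lo = c(4, 2, 3, 4, 3, 2, 2, 1, 1, 1, 1),
  r_hi = c(5, 5, 5, 5, 4, 3, 3, 2, 1, 2, 2),
  f_lo = c(4, 3, 1, 1, 1, 2, 1, 2, 4, 1, 1),
  f_hi = c(5, 5, 3, 1, 1, 3, 2, 5, 5, 2, 2),
  m_lo = c(4, 3, 1, 1, 1, 2, 1, 2, 4, 1, 1),
  m_hi = c(5, 5, 3, 1, 1, 3, 2, 5, 5, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the segment of one (R, F, M) score triple.
# Assumption: when several rules match, the one listed last in the table wins.
classify_segment <- function(r, f, m, rules = segment_rules) {
  hit <- r >= rules$r_lo & r <= rules$r_hi &
    f >= rules$f_lo & f <= rules$f_hi &
    m >= rules$m_lo & m <= rules$m_hi
  if (any(hit)) rules$segment[max(which(hit))] else "Others"
}

# A few illustrative score combinations.
scores <- data.frame(
  r = c(5, 5, 1, 1, 5),
  f = c(5, 1, 1, 5, 1),
  m = c(5, 1, 1, 5, 5)
)
scores$segment <- mapply(classify_segment, scores$r, scores$f, scores$m)
print(scores)
```

Replacing `max(which(hit))` with `min(which(hit))` would make the first matching rule win instead, which changes the result for overlapping ranges, so it is worth fixing the precedence explicitly before comparing segment counts.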
We can use the segmented data to identify groups such as our best customers, loyal customers, at-risk customers and lost customers.
Once we have classified a customer into a particular segment, we can take appropriate action to increase his/her lifetime value.
Now that we have defined and segmented our customers, let us examine the distribution of customers across the segments. Ideally, we should have very few or no customers in segments such as At Risk or Need Attention.
## # A tibble: 10 x 2
##    Segment            Count
##    <chr>              <int>
##  1 Loyal Customers      288
##  2 Lost                 185
##  3 At Risk              180
##  4 Potential Loyalist   147
##  5 About To Sleep        58
##  6 Others                48
##  7 Need Attention        34
##  8 Promising             21
##  9 Can't Lose Them       17
## 10 New Customers         17

We can also examine the median recency, frequency and monetary value across segments to ensure that the logic used for customer classification is sound and practical.
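As a rough sketch of both checks, the base R snippet below counts customers per segment and computes the per-segment medians. The `segmented` data frame and its values are hypothetical stand-ins for a table that already carries a segment label per customer; they are not results from rfm_data_orders.

```r
# Hypothetical segmented customers (made-up values for illustration only).
segmented <- data.frame(
  segment = c("Loyal Customers", "Loyal Customers", "Lost", "Lost", "Lost", "At Risk"),
  recency_days      = c(40, 90, 600, 520, 710, 450),
  transaction_count = c(8, 6, 2, 1, 3, 5),
  amount            = c(700, 560, 90, 40, 150, 480)
)

# Number of customers in each segment, largest segment first.
segment_counts <- as.data.frame(table(Segment = segmented$segment), responseName = "Count")
segment_counts <- segment_counts[order(-segment_counts$Count), ]
print(segment_counts)

# Median recency, frequency and monetary value for each segment.
segment_medians <- aggregate(
  cbind(recency_days, transaction_count, amount) ~ segment,
  data = segmented,
  FUN = median
)
print(segment_medians)
```

If the classification is sound, segments described as recent buyers should show low median recency_days, and segments described as big spenders should show high median amount.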