Suppose we have some monthly exposures that we would like to add premium data to.
exposures_PM <- addExposures(records, type = "PM")
head(exposures_PM)| key | duration | policy_month | start_int | end_int | exposure | 
|---|---|---|---|---|---|
| B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | 
| B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 
| B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 
| B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | 
| B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | 
| B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | 
Simulated premium data “trans” comes with the package.
head(trans)| key | trans_date | amt | 
|---|---|---|
| B10251C8 | 2012-12-04 | 199 | 
| B10251C8 | 2013-12-28 | 197 | 
| B10251C8 | 2015-12-30 | 177 | 
| B10251C8 | 2019-05-07 | 192 | 
| B10251C8 | 2012-04-15 | 206 | 
| B10251C8 | 2019-04-02 | 220 | 
The addStart function adds the start date of the appropriate exposure interval to the transactions.
trans_with_interval <- addStart(trans, exposures_PM)
head(trans_with_interval)| start_int | key | trans_date | amt | 
|---|---|---|---|
| 2010-05-10 | B10251C8 | 2010-05-28 | 190 | 
| 2010-06-10 | B10251C8 | 2010-07-04 | 189 | 
| 2010-11-10 | B10251C8 | 2010-11-21 | 179 | 
| 2011-04-10 | B10251C8 | 2011-05-08 | 210 | 
| 2011-07-10 | B10251C8 | 2011-07-12 | 198 | 
| 2012-01-10 | B10251C8 | 2012-01-14 | 194 | 
We can group and aggregate by key and start_int to get unique transaction rows corresponding to intervals in exposures_PM.
trans_to_join <- trans_with_interval %>% group_by(start_int, key) %>% summarise(premium = sum(amt))
head(trans_to_join)| start_int | key | premium | 
|---|---|---|
| 2005-06-01 | D68554D5 | 97 | 
| 2005-10-01 | D68554D5 | 169 | 
| 2005-12-01 | D68554D5 | 96 | 
| 2006-01-01 | D68554D5 | 193 | 
| 2006-02-01 | D68554D5 | 107 | 
| 2006-03-01 | D68554D5 | 119 | 
Then we can join this to the exposures using a left join without duplicating any exposures.
premium_study <- exposures_PM %>% left_join(trans_to_join, by = c("key", "start_int"))
head(premium_study, 10)| key | duration | policy_month | start_int | end_int | exposure | premium | 
|---|---|---|---|---|---|---|
| B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | NA | 
| B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 | 
| B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 | 
| B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | NA | 
| B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | NA | 
| B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | NA | 
| B10251C8 | 1 | 7 | 2010-10-10 | 2010-11-09 | 0.08487 | NA | 
| B10251C8 | 1 | 8 | 2010-11-10 | 2010-12-09 | 0.08214 | 179 | 
| B10251C8 | 1 | 9 | 2010-12-10 | 2011-01-09 | 0.08487 | NA | 
| B10251C8 | 1 | 10 | 2011-01-10 | 2011-02-09 | 0.08487 | NA | 
Change the NA values resulting from the join to zeros using an if_else.
premium_study <- premium_study %>% mutate(premium = if_else(is.na(premium), 0, premium))
head(premium_study, 10)| key | duration | policy_month | start_int | end_int | exposure | premium | 
|---|---|---|---|---|---|---|
| B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | 0 | 
| B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 | 
| B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 | 
| B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | 0 | 
| B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | 0 | 
| B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | 0 | 
| B10251C8 | 1 | 7 | 2010-10-10 | 2010-11-09 | 0.08487 | 0 | 
| B10251C8 | 1 | 8 | 2010-11-10 | 2010-12-09 | 0.08214 | 179 | 
| B10251C8 | 1 | 9 | 2010-12-10 | 2011-01-09 | 0.08487 | 0 | 
| B10251C8 | 1 | 10 | 2011-01-10 | 2011-02-09 | 0.08487 | 0 | 
Now we are free to do any calculations we want. For a simple example we calculate the average premium in the first two policy months. Refer to the section on adding additional information for more creative policy splits.
premium_study %>% filter(policy_month %in% c(1,2)) %>% group_by(policy_month) %>% summarise(avg_premium = mean(premium))| policy_month | avg_premium | 
|---|---|
| 1 | 60.46 | 
| 2 | 66.88 | 
Suppose we were interested in what the last premium paid by a policy was for some predictive analytics project. Again we left join the premium to the exposure frame.
previous_premium_unfilled <- exposures_PM %>% left_join(trans_to_join, by = c("key", "start_int"))
head(previous_premium_unfilled)| key | duration | policy_month | start_int | end_int | exposure | premium | 
|---|---|---|---|---|---|---|
| B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | NA | 
| B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 | 
| B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 | 
| B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | NA | 
| B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | NA | 
| B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | NA | 
This time we fill in NA values with the previous paid premium instead of 0. The first interval is NA because there are no prior premiums.
previous_premium <- previous_premium_unfilled %>%
tidyr::fill(premium, .direction = "down")| key | duration | policy_month | start_int | end_int | exposure | premium | 
|---|---|---|---|---|---|---|
| B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | NA | 
| B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 | 
| B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 | 
| B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | 189 | 
| B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | 189 | 
| B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | 189 | 
Data manipulations similar to this can be used to engineer features for anything varying with time: account values, guarantees, planned premiums, etc…