CTR Analysis

Analyzing the drivers of click-through rate (CTR) for a location-based marketing agency. The report provides insights and recommendations for targeting the right audience by location and maximizing advertising revenue.

Data file

INDEX:

1: Data processing
2: Descriptive statistics
3: Visualizations
4: Logistic regression
5: Conclusion

This assignment broadly deals with location-based mobile marketing. We have data from a location-based marketing agency which handles geo-fencing campaigns on behalf of advertisers. Due to the very large volume of data, we are given a random sample for two campaigns of a single advertiser – AMC Theaters. The advertising impressions are inserted into the mobile app being used on the device. The data include the following elements: impression size (e.g., 320x50 pixels), app category (e.g., IAB1), app review volume and valence, device OS (e.g., iOS), geo-fence lat/long coordinates, mobile device lat/long coordinates, and click outcome (0 or 1).

Reading data file

In [59]:
data <- read.csv("Geo-Fence Analytics.csv")
head(data)
imp_size app_id app_name app_pub app_topcat app_review_vol app_review_val device_lat device_lon device_zip device_os geofence_lat geofence_lon gepfence_radius didclick
320x50 62 Dictionary.com Android APP - First look inventory IAB5 63228 4.6 33.84575 -117.8250 92807 Android 33.83165 -117.8487 11.263 0
320x50 62 Dictionary.com Android APP - First look inventory IAB5 63228 4.6 33.85843 -117.8605 92806 Android 33.88024 -117.8604 11.263 0
320x50 62 Dictionary.com Android APP - First look inventory IAB5 63228 4.6 33.77808 -117.8752 92705 Android 33.81814 -117.9010 11.263 0
320x50 62 Dictionary.com Android APP - First look inventory IAB5 63228 4.6 33.87387 -117.8174 92886 Android 33.86065 -117.8011 11.263 0
320x50 62 Dictionary.com Android APP - First look inventory IAB5 63228 4.6 33.94048 -117.9675 90631 Android 33.92266 -117.9600 11.263 0
320x50 62 Dictionary.com Android APP - First look inventory IAB5 63228 4.6 33.80514 -118.0167 90630 Android 33.84287 -118.0128 11.263 0

Importing libraries

In [73]:
library("dplyr")
library("aspace")
library("psych")
library("graphics")
library("stats")

Data processing

a) Creating the dummy variable imp_large for the large impression size (728x90):

In [61]:
data$imp_large<- ifelse(data$imp_size == "728x90",1,0)

b) Creating dummy variables cat_entertainment, cat_social and cat_tech for app categories and os_ios for iOS devices:

In [62]:
data$cat_entertainment<-ifelse(data$app_topcat %in% c("IAB1", "IAB1-6"),1,0)
data$cat_social<- ifelse(data$app_topcat  == "IAB14",1,0)
data$cat_tech<- ifelse(data$app_topcat  == "IAB19-6",1,0)                              
data$os_ios<-ifelse(data$device_os == "iOS",1,0)

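c) Creating the variable distance (in km) from each pair of device and geo-fence latitude/longitude coordinates. The code below uses the spherical law of cosines, a great-circle formula closely related to the haversine formula, with an Earth radius of 6371 km.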

In [63]:
data$distance <- 6371*acos(cos(as_radians(data$device_lat)) * cos(as_radians(data$geofence_lat)) * 
                                  cos(as_radians(data$device_lon) - as_radians(data$geofence_lon)) + 
                                  sin(as_radians(data$device_lat)) * sin(as_radians(data$geofence_lat)))

data$distance_squared<-data$distance^2
data$ln_app_review_vol<-log(data$app_review_vol)
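
The expression above is the spherical law of cosines form of the great-circle distance. As a cross-check, here is a minimal base-R sketch of the haversine form, which tends to be numerically better behaved for very short distances, where the acos argument is close to 1. The helper names `deg2rad` and `haversine_km` are hypothetical and not part of the original notebook; 6371 km is the same Earth radius used above.

```r
# Hypothetical helper: convert degrees to radians (base R only)
deg2rad <- function(deg) deg * pi / 180

# Hypothetical helper: haversine great-circle distance in km
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  phi1 <- deg2rad(lat1)
  phi2 <- deg2rad(lat2)
  dphi <- phi2 - phi1
  dlambda <- deg2rad(lon2 - lon1)
  a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# First row of the sample shown earlier: device vs. geo-fence coordinates
haversine_km(33.84575, -117.8250, 33.83165, -117.8487)
```

Both forms should agree closely on this data, since all distances are well within a few tens of kilometers.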

d) Binning distance into groups

In [76]:
# Assign each impression to a distance bin (km); upper bounds are inclusive.
# Comparing the numeric distance column directly avoids the list conversion.
newdf <- data %>%
  mutate(distance_group = case_when(
    distance > 0   & distance <= 0.5 ~ "1",
    distance > 0.5 & distance <= 1   ~ "2",
    distance > 1   & distance <= 2   ~ "3",
    distance > 2   & distance <= 4   ~ "4",
    distance > 4   & distance <= 7   ~ "5",
    distance > 7   & distance <= 10  ~ "6",
    distance > 10                    ~ "7",
    TRUE                             ~ NA_character_  # missing or non-positive distance
    ))
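
The same binning can be written more compactly with base R's `cut()`, which uses right-closed intervals by default and so matches the `<=` upper bounds above. A minimal sketch on a hypothetical toy vector `toy_distance` (values invented for illustration, not taken from the data set):

```r
# Hypothetical toy distances in km (not from the data set)
toy_distance <- c(0.3, 0.5, 0.7, 3, 11)

# Break points mirror the seven bins used above; Inf leaves the last bin open
breaks <- c(0, 0.5, 1, 2, 4, 7, 10, Inf)

# right = TRUE (the default) gives intervals (0, 0.5], (0.5, 1], ...
toy_group <- cut(toy_distance, breaks = breaks, labels = as.character(1:7))

# Cross-tabulate to confirm how many values fall in each bin
table(toy_group)
```

Unlike `case_when`, `cut()` returns an ordered set of factor levels, which keeps the bins in order when plotting.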

Descriptive statistics

In [74]:
# Columns, in order (reported as X1..X9 in the output below):
# didclick, distance, imp_large, cat_entertainment, cat_social,
# cat_tech, os_ios, ln_app_review_vol, app_review_val
variables<-cbind(data$didclick,data$distance, data$imp_large, data$cat_entertainment, data$cat_social, data$cat_tech, data$os_ios, data$ln_app_review_vol, data$app_review_val)
describe(variables)
   vars      n         mean         sd    median    trimmed     mad        min      max     range        skew    kurtosis           se
X1    1 121567  0.006811059 0.08224794  0.000000 0.00000000 0.00000 0.00000000  1.00000  1.000000 11.99263761 141.8245234 0.0002358942
X2    2 121567  2.983737139 2.64852620  2.020864 2.50377167 1.77427 0.02075894 11.78666 11.765904  1.47823715   1.5108904 0.0075962006
X3    3 121567  0.230876800 0.42139550  0.000000 0.16360084 0.00000 0.00000000  1.00000  1.000000  1.27728459  -0.3685471 0.0012085985
X4    4 121567  0.283925736 0.45090308  0.000000 0.22991106 0.00000 0.00000000  1.00000  1.000000  0.95839881  -1.0814806 0.0012932287
X5    5 121567  0.125124417 0.33086130  0.000000 0.03141227 0.00000 0.00000000  1.00000  1.000000  2.26604020   3.1349640 0.0009489386
X6    6 121567  0.517846126 0.49968347  1.000000 0.52230734 0.00000 0.00000000  1.00000  1.000000 -0.07142914  -1.9949143 0.0014331351
X7    7 121567  0.250363997 0.43322443  0.000000 0.18795949 0.00000 0.00000000  1.00000  1.000000  1.15244631  -0.6718730 0.0012425249
X8    8 121567 10.056798904 0.63696194 10.087225 9.98384595 0.00000 7.08086790 12.93770  5.856834  1.78926780   6.1773255 0.0018268615
X9    9 121567  3.654872622 0.36081251  3.400000 3.62465889 0.00000 1.40000000  4.70000  3.300000  0.01724566   1.9459931 0.0010348413
In [75]:
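#Looking at the relationships between variables. No variable is strongly correlated with didclick (all |r| < 0.02).
#Some predictors are correlated with each other, e.g. cat_tech with app_review_val (-0.73) and cat_entertainment with app_review_val (0.64)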
cor(variables)
 1.000000000 -0.006628356 -0.004786218 -0.007117972 -0.005623417  0.01245437 -0.002147325  0.003982875 -0.006523592
-0.006628356  1.000000000  0.020024918 -0.028992663  0.060484490  0.02349954 -0.060281389 -0.157864184  0.022481133
-0.004786218  0.020024918  1.000000000 -0.254731873 -0.185311155  0.41404927 -0.190194050  0.049929790 -0.321439020
-0.007117972 -0.028992663 -0.254731873  1.000000000 -0.238133905 -0.65257568  0.312647684 -0.105545185  0.642212363
-0.005623417  0.060484490 -0.185311155 -0.238133905  1.000000000 -0.39192721  0.513672844 -0.115376574  0.194394425
 0.012454366  0.023499545  0.414049273 -0.652575678 -0.391927215  1.00000000 -0.598919227  0.049503835 -0.732067145
-0.002147325 -0.060281389 -0.190194050  0.312647684  0.513672844 -0.59891923  1.000000000 -0.013523794  0.366139311
 0.003982875 -0.157864184  0.049929790 -0.105545185 -0.115376574  0.04950383 -0.013523794  1.000000000  0.014457854
-0.006523592  0.022481133 -0.321439020  0.642212363  0.194394425 -0.73206714  0.366139311  0.014457854  1.000000000

Visualizations

In [68]:
#The highest CTR is observed in distance groups 1 and 2 and the lowest in group 4. This is consistent with
#users being more likely to click on an ad when they are close to the advertised location
newdf$impressions<-c(1)
plot(newdf %>% 
  group_by(distance_group) %>% 
  summarise(ctr = sum(didclick)/sum(impressions)))
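
Because `impressions` is 1 for every row, `sum(didclick)/sum(impressions)` is simply the mean of `didclick` within each group. With an overall CTR below 1%, it also helps to check how many impressions and clicks each bin contains before comparing rates. A base-R sketch on a hypothetical toy data frame `toy` (values invented for illustration):

```r
# Hypothetical toy data: bin label and click outcome per impression
toy <- data.frame(
  distance_group = c("1", "1", "1", "1", "2", "2", "3", "3", "3", "3"),
  didclick       = c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Per-bin impressions, clicks, and CTR (CTR = mean of the 0/1 indicator)
ctr_by_group <- data.frame(
  impressions = as.vector(table(toy$distance_group)),
  clicks      = as.vector(tapply(toy$didclick, toy$distance_group, sum)),
  ctr         = as.vector(tapply(toy$didclick, toy$distance_group, mean)),
  row.names   = sort(unique(toy$distance_group))
)
ctr_by_group
```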
In [69]:
#CTR tends to increase with app review valence
plot(newdf %>% 
       group_by(app_review_val) %>% 
       summarise(ctr = sum(didclick)/sum(impressions)))
In [70]:
#The mean of ln_app_review_vol (log of app review volume) is about 10. The highest click-through rate is
#observed for values of ln_app_review_vol in the range 10-11.5
plot(newdf %>% 
       group_by(ln_app_review_vol) %>% 
       summarise(ctr = sum(didclick)/sum(impressions)))
In [77]:
#Standardizing distance (z-score) before building the squared term, so both terms are on a comparable scale
newdf$norm_dist<-(newdf$distance-mean(newdf$distance))/sd(newdf$distance)
newdf$norm_dist_squared<-newdf$norm_dist^2

Logistic regression

In [72]:
summary(glm(didclick ~ norm_dist + norm_dist_squared + imp_large + cat_entertainment + cat_social + cat_tech +
      os_ios + ln_app_review_vol + app_review_val, family="binomial", data=newdf))
Call:
glm(formula = didclick ~ norm_dist + norm_dist_squared + imp_large + 
    cat_entertainment + cat_social + cat_tech + os_ios + ln_app_review_vol + 
    app_review_val, family = "binomial", data = newdf)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.1510  -0.1272  -0.1148  -0.1042   3.4025  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -6.88708    0.89185  -7.722 1.14e-14 ***
norm_dist         -0.16746    0.05761  -2.907 0.003652 ** 
norm_dist_squared  0.06430    0.03059   2.102 0.035583 *  
imp_large         -0.35216    0.09178  -3.837 0.000125 ***
cat_entertainment -0.09614    0.17894  -0.537 0.591069    
cat_social        -0.22669    0.21139  -1.072 0.283550    
cat_tech           0.68766    0.17631   3.900 9.61e-05 ***
os_ios             0.38589    0.12636   3.054 0.002259 ** 
ln_app_review_vol  0.03051    0.06304   0.484 0.628368    
app_review_val     0.32383    0.18666   1.735 0.082757 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9912.5  on 121566  degrees of freedom
Residual deviance: 9857.1  on 121557  degrees of freedom
AIC: 9877.1

Number of Fisher Scoring iterations: 8
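
The estimates above are on the log-odds scale, so exponentiating them gives odds ratios, which are easier to read. A small sketch that copies the four significant predictor estimates from the printed summary into a hypothetical named vector `coefs`; if the fitted model were stored in an object, say a hypothetical `model`, `exp(coef(model))` would give the same values for all terms.

```r
# Estimates copied from the regression output above (log-odds scale)
coefs <- c(
  norm_dist = -0.16746,
  imp_large = -0.35216,
  cat_tech  =  0.68766,
  os_ios    =  0.38589
)

# Odds ratio = exp(coefficient); values below 1 lower the odds of a click
odds_ratios <- exp(coefs)
round(odds_ratios, 2)
```

An odds ratio close to 2 for cat_tech, for example, means the odds of a click are roughly twice as high for those apps, holding the other predictors fixed.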

Conclusion

Distance (variable “norm_dist”) has an inverse relationship with clicks: its estimated coefficient is negative and its p-value (0.004) is below 0.05, so it is significant in the model. The coefficient on “norm_dist_squared” is positive and significant at the 5% level (p = 0.036), though less strongly than the linear term; together the two terms imply that the decline in click likelihood flattens as distance grows. Imp_large is significant and its relationship with the dependent variable “didclick” is inverse: a large impression (imp_large = 1) lowers the log-odds of a click by 0.35216. Cat_entertainment, cat_social and ln_app_review_vol have high p-values and are therefore not significant in this model, while app_review_val is only marginally significant (p = 0.083). Cat_tech and os_ios are significant and positively related to “didclick”: they raise the log-odds of a click by 0.68766 and 0.38589, respectively.
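
Because the model includes both norm_dist and its square, the implied log-odds curve in distance is a parabola, b1*z + b2*z^2, which opens upward since b2 is positive and has its minimum at z = -b1 / (2*b2). The sketch below locates that point and converts it back to kilometers, using the estimates from the regression output and the mean and standard deviation of distance from the descriptive statistics. All variable names here are hypothetical.

```r
# Estimates from the regression output
b1 <- -0.16746   # norm_dist
b2 <-  0.06430   # norm_dist_squared

# Mean and sd of distance (km) from the descriptive statistics
dist_mean <- 2.983737139
dist_sd   <- 2.64852620

# Vertex of the parabola b1*z + b2*z^2, in standardized units
z_min <- -b1 / (2 * b2)

# Convert back to kilometers
km_min <- dist_mean + z_min * dist_sd
c(z_min = z_min, km_min = km_min)
```

Under these estimates the fitted click odds fall with distance up to roughly 6.4 km and level off beyond that. The squared term is only weakly significant, so the exact location of this point is uncertain.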
Taken together, the likelihood of a click depends on the following factors: 1) How far the user is from the AMC theater. The farther the user is from the advertised location, the less likely a click is. 2) The size of the impression. Large (728x90) impressions are less likely to be clicked than smaller ones. 3) Device type. iOS users click on ads more often than Android users. 4) App. Pinger app users show a higher click rate than users of other apps. Based on this analysis, I recommend that AMC Theaters target iOS users in close proximity to its theaters and use small to medium impression sizes. The Pinger app proved to be a good advertising platform for these campaigns. With this as the focus, it is also recommended to customize and target ads to other customer segments as well.