CTR Analysis

Analyzing drivers of click-through rate (CTR) for a location-based marketing agency. The report provides insights and recommendations for targeting the right audience based on location and maximizing advertising revenue.

Data file

INDEX:

1: Data processing
2: Descriptive statistics
3: Visualizations
4: Logistic regression
5: Conclusion

This assignment broadly deals with location-based mobile marketing. We have data from a location-based marketing agency which handles geo-fencing campaigns on behalf of advertisers. Due to the very large volume of data, we are given a random sample for two campaigns of a single advertiser – AMC Theaters. The advertising impressions are inserted into the mobile app being used on the device. The data include the following elements: impression size (e.g., 320x50 pixels), app category (e.g., IAB1), app review volume and valence, device OS (e.g., iOS), geo-fence lat/long coordinates, mobile device lat/long coordinates, and click outcome (0 or 1).

Reading data file

data <- read.csv("Geo-Fence Analytics.csv")
head(data)

Importing libraries

library("dplyr")
library("aspace")
library("psych")
library("graphics")
library("stats")

Data processing

a) Creating dummy variable imp_large for the large impression size (728x90)

data$imp_large<- ifelse(data$imp_size == "728x90",1,0)

b) Creating dummy variables cat_entertainment, cat_social and cat_tech for app categories and os_ios for iOS devices:

data$cat_entertainment <- ifelse(data$app_topcat %in% c("IAB1", "IAB1-6"), 1, 0)
data$cat_social <- ifelse(data$app_topcat == "IAB14", 1, 0)
data$cat_tech <- ifelse(data$app_topcat == "IAB19-6", 1, 0)
data$os_ios <- ifelse(data$device_os == "iOS", 1, 0)

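c) Creating variable distance, the great-circle distance in kilometers between the device and geo-fence latitude/longitude coordinates. The expression below is the spherical law of cosines, a common alternative to the haversine formula.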

data$distance <- 6371*acos(cos(as_radians(data$device_lat)) * cos(as_radians(data$geofence_lat)) * 
                                  cos(as_radians(data$device_lon) - as_radians(data$geofence_lon)) + 
                                  sin(as_radians(data$device_lat)) * sin(as_radians(data$geofence_lat)))

data$distance_squared<-data$distance^2
data$ln_app_review_vol<-log(data$app_review_vol)
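The distance above is computed with an arccosine expression (the spherical law of cosines). The haversine formula is an equivalent way to get the same great-circle distance and tends to be numerically more stable for points that are very close together. The following is a minimal sketch in base R; the function name `haversine_km` and its arguments are hypothetical and not part of the original analysis, and the coordinates are taken from the first row of the sample data.

```r
# Hypothetical helper (not from the original notebook): great-circle distance
# in kilometers via the haversine formula, using base R only.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- (lat2 - lat1) * to_rad
  dlambda <- (lon2 - lon1) * to_rad
  a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  # pmin() guards against rounding pushing `a` marginally above 1
  2 * radius_km * asin(sqrt(pmin(1, a)))
}

# Device and geo-fence coordinates from the first row of the sample data
d_first <- haversine_km(33.84575, -117.8250, 33.83165, -117.8487)
print(d_first)
```

The function is vectorized, so under these assumptions it could be called with whole columns such as `data$device_lat` and `data$geofence_lat`.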

d) Binning distance into groups

#Assigning each impression to a distance bin (in km); distance is compared directly, no list conversion is needed
newdf <- data %>%
  mutate(distance_group = case_when(
    distance > 0 & distance <= 0.5 ~ "1",
    distance > 0.5 & distance <= 1 ~ "2",
    distance > 1 & distance <= 2   ~ "3",
    distance > 2 & distance <= 4   ~ "4",
    distance > 4 & distance <= 7   ~ "5",
    distance > 7 & distance <= 10  ~ "6",
    distance > 10                  ~ "7",
    TRUE                           ~ NA_character_  #missing or non-positive distance
    ))
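The same bins can also be expressed with base R's `cut()`, which takes a vector of break points instead of one condition per group. This is only a sketch on made-up distances (the vector `distance` below is toy data, not the campaign data); with `right = TRUE` the intervals are closed on the right, matching the `<=` comparisons used above.

```r
# Toy distances in km, one per bin (illustrative values only)
distance <- c(0.3, 0.8, 1.5, 3, 6, 9, 11.5)

# Break points reproduce the bins (0, 0.5], (0.5, 1], ..., (10, Inf)
breaks <- c(0, 0.5, 1, 2, 4, 7, 10, Inf)

distance_group <- cut(distance, breaks = breaks,
                      labels = as.character(1:7), right = TRUE)
print(distance_group)
```

Values outside the break range (for example a distance of exactly 0) come back as `NA`, which mirrors the fallback branch of the `case_when()` call.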

Descriptive statistics

variables<-cbind(data$didclick,data$distance, data$imp_large, data$cat_entertainment, data$cat_social, data$cat_tech, data$os_ios, data$ln_app_review_vol, data$app_review_val)
describe(variables)

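#Looking at the relationships between variables. No variable is strongly correlated with didclick (all absolute correlations are below 0.02).
#Among the predictors, the largest correlations are between cat_tech and app_review_val (-0.73) and between cat_entertainment and app_review_val (0.64)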
cor(variables)

Visualizations

#The highest CTR is observed in distance groups 1 and 2 and the lowest in group 4. This is broadly consistent with
#users being more likely to click on an ad when they are close to the advertised location
newdf$impressions<-c(1)
plot(newdf %>% 
  group_by(distance_group) %>% 
  summarise(ctr = sum(didclick)/sum(impressions)))
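Because every row is a single impression, dividing the sum of clicks by the sum of impressions within a group is the same as taking the mean of `didclick` in that group. The sketch below illustrates this on a small made-up data frame (`toy` is hypothetical and not the campaign data), using base R's `tapply()`.

```r
# Toy data: one row per impression (illustrative values only)
toy <- data.frame(
  distance_group = c("1", "1", "1", "2", "2", "3", "3", "3", "3"),
  didclick       = c(1, 0, 0, 0, 1, 0, 0, 0, 1)
)

# CTR per group = clicks / impressions = mean of the 0/1 click indicator
ctr_by_group <- tapply(toy$didclick, toy$distance_group, mean)
print(ctr_by_group)
```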

#The higher the app review valence, the higher the CTR
plot(newdf %>% 
       group_by(app_review_val) %>% 
       summarise(ctr = sum(didclick)/sum(impressions)))

#The mean of the log of app review volume is around 10. The highest CTR is observed where the log of
#app review volume is between 10 and 11.5
plot(newdf %>% 
       group_by(ln_app_review_vol) %>% 
       summarise(ctr = sum(didclick)/sum(impressions)))

#Standardizing distance (z-score) so that the linear and squared terms are on a comparable scale
newdf$norm_dist<-(newdf$distance-mean(newdf$distance))/sd(newdf$distance)
newdf$norm_dist_squared<-newdf$norm_dist^2
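One practical reason to center a variable before squaring it is that the raw variable and its square are usually highly correlated, while the centered variable and its square are much less so. The sketch below shows this on simulated data; the uniform draw is an assumption chosen for illustration, and the real distance variable is skewed, so the reduction there may be smaller.

```r
set.seed(1)

# Simulated distances in km (illustrative only, not the campaign data)
x <- runif(1000, min = 0, max = 12)

# z-score standardization, the same transformation as norm_dist
z <- (x - mean(x)) / sd(x)

# Correlation between the linear and squared terms, before and after
cor_raw <- cor(x, x^2)
cor_std <- cor(z, z^2)
print(c(raw = cor_raw, standardized = cor_std))
```

Base R's `scale()` performs the same centering and scaling; `as.numeric(scale(x))` gives the same values as `z`.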

Logistic regression

summary(glm(didclick ~ norm_dist + norm_dist_squared + imp_large + cat_entertainment + cat_social + cat_tech +
      os_ios + ln_app_review_vol + app_review_val, family="binomial", data=newdf))

Call:
glm(formula = didclick ~ norm_dist + norm_dist_squared + imp_large + 
    cat_entertainment + cat_social + cat_tech + os_ios + ln_app_review_vol + 
    app_review_val, family = "binomial", data = newdf)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.1510  -0.1272  -0.1148  -0.1042   3.4025  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -6.88708    0.89185  -7.722 1.14e-14 ***
norm_dist         -0.16746    0.05761  -2.907 0.003652 ** 
norm_dist_squared  0.06430    0.03059   2.102 0.035583 *  
imp_large         -0.35216    0.09178  -3.837 0.000125 ***
cat_entertainment -0.09614    0.17894  -0.537 0.591069    
cat_social        -0.22669    0.21139  -1.072 0.283550    
cat_tech           0.68766    0.17631   3.900 9.61e-05 ***
os_ios             0.38589    0.12636   3.054 0.002259 ** 
ln_app_review_vol  0.03051    0.06304   0.484 0.628368    
app_review_val     0.32383    0.18666   1.735 0.082757 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9912.5  on 121566  degrees of freedom
Residual deviance: 9857.1  on 121557  degrees of freedom
AIC: 9877.1

Number of Fisher Scoring iterations: 8

Conclusion

Distance (variable “norm_dist”) has an inverse relationship with clicks: its estimated coefficient is negative (-0.16746) and its p-value (0.0037) is below 0.05, so it is significant in the model. The squared term “norm_dist_squared” has a positive coefficient (0.06430) and is also significant at the 0.05 level (p = 0.036), though less strongly than distance itself; this suggests that the negative effect of distance weakens as distance grows. The imp_large variable is significant and its relationship with the dependent variable is inverse: the large impression (imp_large = 1) is associated with a 0.35216 decrease in the log-odds of didclick. Cat_entertainment, cat_social and ln_app_review_vol have high p-values (0.59, 0.28 and 0.63), so they are not significant in this model, and app_review_val is only marginally significant (p = 0.083). Cat_tech and os_ios are significant and have positive relationships with didclick: they are associated with increases of 0.68766 and 0.38589, respectively, in the log-odds of a click.
Taken together, the likelihood of a click depends on the following factors: 1) How far the user is from the AMC theater. The farther users are from the advertised location, the less likely they are to click on the ad. 2) The size of the impression. The large impression size is associated with a lower chance of a click. 3) Device type. iOS users click on ads more often than non-iOS (e.g. Android) users. 4) Pinger app users show a higher click rate than users of other apps. Based on this analysis, I recommend that AMC Theaters target iOS users in close proximity to its theaters and use small to medium impression sizes. The Pinger app proved to be a good advertising platform for these campaigns. With this as the focus, it is also recommended to customize and target ads to other customer segments as well.
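Log-odds coefficients can be hard to read directly, so it may help to convert them to odds ratios by exponentiating. The sketch below does this for the three significant dummy variables, with the coefficients copied by hand from the regression output; the vector name `log_odds` is made up for this illustration.

```r
# Coefficients copied from the glm summary (log-odds scale)
log_odds <- c(imp_large = -0.35216, cat_tech = 0.68766, os_ios = 0.38589)

# exp() turns a log-odds coefficient into an odds ratio
odds_ratio <- exp(log_odds)
print(round(odds_ratio, 3))
```

An odds ratio below 1 (imp_large, roughly 0.70) means lower odds of a click, while values above 1 (cat_tech at roughly 1.99 and os_ios at roughly 1.47) mean higher odds, holding the other variables fixed.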

Output of head(data):
imp_size	app_id	app_name	app_topcat	app_review_vol	app_review_val	device_lat	device_lon	device_zip	device_os	geofence_lat	geofence_lon	gepfence_radius
320x50	62	Dictionary.com Android APP - First look inventory	IAB5	63228	4.6	33.84575	-117.8250	92807	Android	33.83165	-117.8487	11.263
320x50	62	Dictionary.com Android APP - First look inventory	IAB5	63228	4.6	33.85843	-117.8605	92806	Android	33.88024	-117.8604	11.263
320x50	62	Dictionary.com Android APP - First look inventory	IAB5	63228	4.6	33.77808	-117.8752	92705	Android	33.81814	-117.9010	11.263
320x50	62	Dictionary.com Android APP - First look inventory	IAB5	63228	4.6	33.87387	-117.8174	92886	Android	33.86065	-117.8011	11.263
320x50	62	Dictionary.com Android APP - First look inventory	IAB5	63228	4.6	33.94048	-117.9675	90631	Android	33.92266	-117.9600	11.263
320x50	62	Dictionary.com Android APP - First look inventory	IAB5	63228	4.6	33.80514	-118.0167	90630	Android	33.84287	-118.0128	11.263

Output of describe(variables), where X1 = didclick, X2 = distance, X3 = imp_large, X4 = cat_entertainment, X5 = cat_social, X6 = cat_tech, X7 = os_ios, X8 = ln_app_review_vol, X9 = app_review_val:
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
X1	1	121567	0.006811059	0.08224794	0.000000	0.00000000	0.00000	0.00000000	1.00000	1.000000	11.99263761	141.8245234	0.0002358942
X2	2	121567	2.983737139	2.64852620	2.020864	2.50377167	1.77427	0.02075894	11.78666	11.765904	1.47823715	1.5108904	0.0075962006
X3	3	121567	0.230876800	0.42139550	0.000000	0.16360084	0.00000	0.00000000	1.00000	1.000000	1.27728459	-0.3685471	0.0012085985
X4	4	121567	0.283925736	0.45090308	0.000000	0.22991106	0.00000	0.00000000	1.00000	1.000000	0.95839881	-1.0814806	0.0012932287
X5	5	121567	0.125124417	0.33086130	0.000000	0.03141227	0.00000	0.00000000	1.00000	1.000000	2.26604020	3.1349640	0.0009489386
X6	6	121567	0.517846126	0.49968347	1.000000	0.52230734	0.00000	0.00000000	1.00000	1.000000	-0.07142914	-1.9949143	0.0014331351
X7	7	121567	0.250363997	0.43322443	0.000000	0.18795949	0.00000	0.00000000	1.00000	1.000000	1.15244631	-0.6718730	0.0012425249
X8	8	121567	10.056798904	0.63696194	10.087225	9.98384595	0.00000	7.08086790	12.93770	5.856834	1.78926780	6.1773255	0.0018268615
X9	9	121567	3.654872622	0.36081251	3.400000	3.62465889	0.00000	1.40000000	4.70000	3.300000	0.01724566	1.9459931	0.0010348413

Output of cor(variables), with rows and columns in the same variable order as above:
1.000000000	-0.006628356	-0.004786218	-0.007117972	-0.005623417	0.01245437	-0.002147325	0.003982875	-0.006523592
-0.006628356	1.000000000	0.020024918	-0.028992663	0.060484490	0.02349954	-0.060281389	-0.157864184	0.022481133
-0.004786218	0.020024918	1.000000000	-0.254731873	-0.185311155	0.41404927	-0.190194050	0.049929790	-0.321439020
-0.007117972	-0.028992663	-0.254731873	1.000000000	-0.238133905	-0.65257568	0.312647684	-0.105545185	0.642212363
-0.005623417	0.060484490	-0.185311155	-0.238133905	1.000000000	-0.39192721	0.513672844	-0.115376574	0.194394425
0.012454366	0.023499545	0.414049273	-0.652575678	-0.391927215	1.00000000	-0.598919227	0.049503835	-0.732067145
-0.002147325	-0.060281389	-0.190194050	0.312647684	0.513672844	-0.59891923	1.000000000	-0.013523794	0.366139311
0.003982875	-0.157864184	0.049929790	-0.105545185	-0.115376574	0.04950383	-0.013523794	1.000000000	0.014457854
-0.006523592	0.022481133	-0.321439020	0.642212363	0.194394425	-0.73206714	0.366139311	0.014457854	1.000000000