In today's post I will show how to get geographical data (longitude and latitude) from Google Analytics, reshape it into a form we can work with, and visualize it with the help of R.
1. Project Description
This project came to my mind when I was thinking about how to better understand our data in Google Analytics and what I could do to optimize the performance of our marketing channels. The project is also a great way to test and improve my R skills, from data manipulation to data visualization. As a data analyst, I always look for ways to learn and practice, but in many cases it is up to you to take action and create your own practice project.
What I am seeking as an end result is this:
This is an example of geospatial analysis for the US. I will try to do the same for the Slovak administrative regions and plot our data from Google Analytics.
I grabbed a dataset from my friend's Google Analytics account with the help of R. The dataset contains geospatial data (longitude and latitude). As a metric I chose sessions for the sake of clarity, but the same approach works with other metrics.
A. Download data from Google Analytics to R
Let's jump to RStudio and load the necessary libraries for this project. We will be using three packages: googleAnalyticsR (a fantastic package from Mark Edmondson), ggmap, and tidyverse for data manipulation and visualization. We also load the raster package, which will provide us with the boundaries of the administrative regions of Slovakia.
# Start by loading the libraries we'll want to use.
install.packages("googleAnalyticsR")  # only needed once
library(googleAnalyticsR)
library(ggmap)
library(tidyverse)
library(raster)

# Account authentication
ga_auth(new_user = TRUE)

# Set the view ID that we'll be using
view_id <- 181351934
If all went fine, you should have loaded the libraries, authenticated your Google account, and stored the Google Analytics view ID in the variable view_id. Let's continue and download our data from Google Analytics to R. The code below does the magic (for now I will use the simple sessions metric to keep things simple):
# Pull the data for a fixed date range (2017-08-01 to 2019-07-10).
gadata <- google_analytics_4(
  view_id,
  date_range = c("2017-08-01", "2019-07-10"),
  metrics = c("sessions"),
  dimensions = c("date", "longitude", "latitude"),
  anti_sample = TRUE
)
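If you would rather pull a rolling window (for example the last 400 days) than fixed dates, the range can be computed in base R. This is only a sketch: `days_back`, `start_date`, `end_date` and `date_range` are names I made up for illustration, and I am assuming the resulting vector is then passed as the date_range argument of the same call as above.

```r
# Build a rolling window ending today instead of hard-coding the dates.
# `days_back` is a hypothetical helper variable, not part of googleAnalyticsR.
days_back <- 400
end_date <- Sys.Date()
start_date <- end_date - days_back

# Format both dates as "YYYY-MM-DD" strings, the same shape as the fixed dates above
date_range <- format(c(start_date, end_date), "%Y-%m-%d")
date_range
```

The two fixed dates used in this post stay in place for the rest of the walkthrough, so the numbers you see will match only if you use the same range.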
Okay, now we have downloaded the data from Google Analytics into R and stored it as gadata. The table should look similar to the one below, with, in our case, 4 variables (columns) and 11492 observations (rows).
B. Get geospatial data and borders of administrative regions of Slovakia
It took me a long search to figure this one out, but thanks to Stack Overflow and Google I was happy to find an answer. First of all, there are multiple ways to get a polygon map of a certain location. You can generate a shapefile online, or you can use the raster package, where you specify the location and the level of detail you need.
We have already loaded the raster library, so now I can show you how to get a map of Slovakia (or any other country) and its regions or districts. The code below generates the map you are looking for. In my case, I want data for Slovakia, and just for learning purposes I want both the administrative regions and the districts.
slovakia_level_1 <- getData('GADM', country='SVK', level=1)
slovakia_level_2 <- getData('GADM', country='SVK', level=2)
slovakia_level_1 holds the polygon data frame at level=1, the first level of administrative division below the country, which for Slovakia means the regions. Running plot(slovakia_level_1) gives the map below, which nicely shows the administrative regions of Slovakia. We get this level of detail because we set level=1 in the code above.
If we want a more detailed map, split into smaller districts, we set level=2. Running plot(slovakia_level_2) then gives the following map:
If you only want to visualize certain parts of Slovakia (in my case, I want to know the incoming visits only for the Bratislavský, Trenčiansky and Trnavský regions), you can use the following code to subset the data to just these parts.
western_regions <- subset(
  slovakia_level_2,
  NAME_1 == "Bratislavský" | NAME_1 == "Trenčiansky" | NAME_1 == "Trnavský"
)
plot(western_regions)
After plotting western_regions we get the following map. This way we have only the western regions with their districts. Later on I will also show you how to visualize the data on this map.
To sum up this part: we are now able to draw the borders of the regions on a polygon map, which I found pretty valuable for getting a deeper understanding of which region brings the most traffic. Now we can finally proceed to plotting our Google Analytics data onto these polygon maps and see, via a heat map, which parts of Slovakia work best.
C. Plot Google Analytics data into polygon map
Yes, the time has come to see our data on polygon maps that show how the different regions of Slovakia bring traffic to the website. Before we jump into the code and visualization, there is one thing we need to do first: check in what format our data is stored after downloading it from Google Analytics. We would assume that longitude and latitude are stored in a numeric format, since we can see they are made up of numbers.
The first time I did this project, I could not make it run for quite a while just because I did not check the format, but we are here to learn, even though the hard way. Now I know, and from now on it is the first thing I do to be sure all my variables are in the format I need. To check the format of our variables, we can use the function glimpse(). In our case that is glimpse(gadata), and you will receive output similar to the image below. As the image shows, longitude and latitude are stored as character. In this form we cannot plot our Google Analytics data onto the polygon map, because character values will not be read as coordinates.
For this reason, we need to convert the data frame columns from character to numeric. We will do this with as.numeric().
gadata$longitude <- as.numeric(gadata$longitude)
gadata$latitude <- as.numeric(gadata$latitude)
Now it is smart to check whether the conversion went through successfully. We will use the same function, glimpse(), as before.
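Besides looking at the column types, it can help to check whether any value failed to convert. Here is a minimal base R sketch on a made-up data frame; `toy`, `toy_clean` and the "(not set)" value are hypothetical and only illustrate what happens when a string cannot be parsed as a number.

```r
# Made-up rows standing in for gadata; the values are illustrative only.
toy <- data.frame(
  longitude = c("17.1077", "18.0335", "(not set)"),
  latitude  = c("48.1486", "48.8945", "(not set)"),
  sessions  = c(12, 7, 3),
  stringsAsFactors = FALSE
)

# as.numeric() turns strings it cannot parse into NA (and raises a warning),
# so the warning is silenced here and the NAs are counted explicitly instead.
toy$longitude <- suppressWarnings(as.numeric(toy$longitude))
toy$latitude  <- suppressWarnings(as.numeric(toy$latitude))

# Column types after the conversion
str(toy)

# How many coordinates could not be converted?
sum(is.na(toy$longitude) | is.na(toy$latitude))

# Keep only the rows with usable coordinates
toy_clean <- toy[!is.na(toy$longitude) & !is.na(toy$latitude), ]
nrow(toy_clean)
```

Dropping such rows before building the spatial points is a design choice on my side; you may prefer to inspect them first to see where they come from.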
As we can see, our variables now have the proper format and we can start to plot them. Below is the code we will use to plot our Google Analytics data onto the polygon map.
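# Turn the GA rows into spatial points; columns 2:3 are longitude and latitude
spdf <- SpatialPointsDataFrame(
  coords = gadata[2:3],
  data = gadata,
  proj4string = CRS(proj4string(slovakia_level_1))
)

# Sum the sessions of all points that fall inside each region
ppl.sum <- aggregate(x = spdf["sessions"], by = slovakia_level_1, FUN = sum)

# Draw the regions coloured by their total sessions
spplot(ppl.sum, "sessions", main = "Sessions in Slovakia")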
So what does the code above actually do? First of all, coords defines which columns to use for longitude and latitude. They must be in this order, longitude first and latitude second (x, then y); it did not work for me the other way round. Then data=gadata defines the data we want to attach to the points, in our case the sessions. The aggregate call sums the sessions of all points that fall within each region. The spplot function (see its documentation for a description) is a wrapper that helps us plot multiple layers over the map, and it gives us the map we wanted. What I like about this map is that it gives me an aggregated view of sessions on the website based on location and administrative region in Slovakia.
For example, if a company wants to sell all over Slovakia, then it is probably good to know why certain regions bring fewer visits than others.
Questions to be asked:
1. Is there a stronger local competition?
2. Are we spending our marketing budget on regions with stronger purchasing power?
3. Are we doing enough for our local search rankings?
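To make the aggregation step behind the map more concrete, here is a toy sketch using the sp package, which provides the spatial classes used above. Everything in it is made up for illustration: the two unit squares are not real Slovak borders, and `make_square`, `regions`, `toy`, `pts` and `region_totals` are hypothetical names. It assumes sp is installed.

```r
library(sp)

# Hypothetical helper: a 1 x 1 square whose lower-left corner is (xmin, 0).
# The ring is listed clockwise and hole = FALSE marks it as a solid polygon.
make_square <- function(xmin, id) {
  ring <- cbind(
    c(xmin, xmin, xmin + 1, xmin + 1, xmin),
    c(0, 1, 1, 0, 0)
  )
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}

# Two made-up "regions" side by side
regions <- SpatialPolygons(list(make_square(0, "west"), make_square(1, "east")))

# Made-up points with a sessions count; longitude (x) comes before latitude (y)
toy <- data.frame(
  longitude = c(0.2, 0.7, 1.5),
  latitude  = c(0.5, 0.4, 0.5),
  sessions  = c(10, 5, 7)
)
pts <- SpatialPointsDataFrame(
  coords = toy[, c("longitude", "latitude")],
  data = toy
)

# Sum the sessions of the points that fall inside each square
region_totals <- aggregate(x = pts["sessions"], by = regions, FUN = sum)
region_totals$sessions
```

Selecting the coordinate columns by name, as in this sketch, rather than by position is a small safeguard against swapping longitude and latitude by accident.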
If you are only interested in the western regions of Slovakia, you can do it the following way. The only change is that we replace the previous polygon data frame with our new subsetted data frame, western_regions.
spdf3 <- SpatialPointsDataFrame(
  coords = gadata[2:3],
  data = gadata,
  proj4string = CRS(proj4string(western_regions))
)
ppl.sum1 <- aggregate(x = spdf3["sessions"], by = western_regions, FUN = sum)
spplot(ppl.sum1, "sessions", main = "Sessions in Slovakia")
After running spplot, we get the following map.
The goal of this post was to download geographical Google Analytics data and visualize it by region and district in Slovakia. I have learnt (and I hope you have as well) how to get the administrative regions of a particular country. There are other ways to get a polygon map of administrative areas, for example from a shapefile. For this reason, I am preparing follow-up posts on how to do it with a shapefile.
In conclusion, this is what I set out to do:
- download data from Google Analytics with the help of the googleAnalyticsR package
- download a polygon map of the administrative regions of Slovakia
- manipulate the data into a format we can work with (e.g. check that the Google Analytics data is stored in the proper format)
- plot our Google Analytics data onto the polygon map
- visualize our aggregated traffic data as a heat map on the polygon map
I hope this is in some way useful for visualizing your own data. For the sake of simplicity I have used the sessions metric from Google Analytics, but surely there are smarter metrics or KPIs that you want to know. If you are investing heavily in Google Ads, for example, it would be smart to visualize the ROI metric, or at least the cost metric, for a particular district and optimize accordingly. The process is the same; you just use different metrics. However, there is a disadvantage with Google Ads: you cannot query longitude, latitude and cost from Google Analytics in one report, as they come from two different sources. One way to overcome this is to collect the location data in a custom dimension and then query it alongside the Google Ads metrics you need.
If you have any questions, please do not hesitate to contact me. If you spot a mistake, feel free to let me know; I am only human and I make mistakes, and I will quickly fix it. Lastly, if you have any further recommendations on how to push this visualization forward, I will be happy to hear them.
With respect, Mayo