Search Engine Optimization

Using Python to recover SEO site traffic (Part one)

Helping a client recover from a bad redesign or site migration is probably one of the most critical jobs you can face as an SEO.

The traditional approach of conducting a full forensic SEO audit works well most of the time, but what if there was a way to speed things up? You could potentially save your client a lot of money in opportunity cost.

Last November, I spoke at TechSEO Boost and presented a technique my team and I regularly use to analyze traffic drops. It allows us to pinpoint this painful problem quickly and with surgical precision. As far as I know, there are no tools that currently implement this technique. I coded this solution using Python.

This is the first part of a three-part series. In part two, we will manually group the pages using regular expressions, and in part three we will group them automatically using machine learning techniques. Let's walk through part one and have some fun!

Winners vs losers

[Image: SEO traffic takes a hit after a switch to Shopify]

Last June we signed up a client that moved from Ecommerce V3 to Shopify, and their SEO traffic took a massive hit. The owner set up 301 redirects between the old and new sites, but made a number of unwise changes, like merging a large number of categories and rewriting titles during the move.

When traffic drops, some parts of the site underperform while others don't. I like to isolate them in order to 1) focus all efforts on the underperforming parts, and 2) learn from the parts that are doing well.

I call this analysis the "Winners vs Losers" analysis. Here, winners are the parts that do well, and losers the ones that do badly.

[Image: visual analysis of winners and losers to figure out why traffic changed]

A visualization of the analysis looks like the chart above. I was able to narrow down the issue to the category pages (Collection pages) and found that the main problem was caused by the site owner merging and eliminating too many categories during the move.

Let's walk through the steps to put this type of analysis together in Python.

You can reference my carefully documented Google Colab notebook here.

Getting the data

We want to programmatically compare two separate time frames in Google Analytics (before and after the traffic drop), and we're going to use the Google Analytics API to do it.

Google Analytics Query Explorer provides the simplest way to do this in Python.

Head on over to the Google Analytics Query Explorer.
Click the button at the top that says "Click here to Authorize" and follow the steps provided.
Use the dropdown menu to select the website you want to get data from.
Fill in the "metrics" parameter with "ga:newUsers" in order to track new visits.
Fill in the "dimensions" parameter with "ga:landingPagePath" in order to get the page URLs.
Fill in the "segment" parameter with "gaid::-5" in order to track organic search visits.
Hit "Run Query" and let it run.
Scroll down to the bottom of the page and look for the text box that says "API Query URI."
Check the box below it that says "Include current access_token in the Query URI (will expire in ~60 minutes)."
At the end of the URL in the text box you should now see access_token=string-of-text-here. You will use this string of text in the code snippet below as the variable called token (make sure to paste it inside the quotes).

Now, scroll back up to where we built the query, and look for the parameter that was filled in for you called "ids." You will use this in the code snippet below as the variable called "gaid." Again, it should go inside the quotes.
Run the cell once you've filled in the gaid and token variables to instantiate them, and we're good to go!

First, let's define placeholder variables to pass to the API.

metrics = ",".join(["ga:users", "ga:newUsers"])
dimensions = ",".join(["ga:landingPagePath", "ga:date"])
segment = "gaid::-5"

# Required, please fill in with your own GA information. Example: ga:23322342
gaid = "ga:23322342"

# Example: string-of-text-here from the access_token step above
token = ""

# Example: https://www.example.com (your site's base URL)
base_site_url = ""

# You can change the start and end dates as you like
start = "2017-06-01"
end = "2018-06-30"

The first function combines the placeholder variables we filled in above with an API URL to get Google Analytics data. We make additional API requests and merge them in case the results exceed the 10,000 row limit.

import requests

def GAData(gaid, start, end, metrics, dimensions,
           segment, token, max_results=10000):
  """Creates a generator that yields GA API data
     in chunks of size `max_results`"""
  # Build the Core Reporting API (v3) URI with placeholder params
  api_uri = ("https://www.googleapis.com/analytics/v3/data/ga"
             "?ids={gaid}&start-date={start}&end-date={end}"
             "&metrics={metrics}&dimensions={dimensions}"
             "&segment={segment}&access_token={token}"
             "&max-results={max_results}")

  # Insert URI params
  api_uri = api_uri.format(
      gaid=gaid, start=start, end=end,
      metrics=metrics, dimensions=dimensions,
      segment=segment, token=token,
      max_results=max_results)

  # Using yield to make a generator in an
  # attempt to be memory efficient, since data is downloaded in chunks
  r = requests.get(api_uri)
  data = r.json()
  yield data
  if data.get("nextLink", None):
    while data.get("nextLink"):
      new_uri = data.get("nextLink")
      new_uri += "&access_token={token}".format(token=token)
      r = requests.get(new_uri)
      data = r.json()
      yield data

In the second function, we load the Google Analytics API response into a pandas DataFrame to simplify our analysis.

import pandas as pd

def to_df(gadata):
  """Takes in a generator from GAData()
     creates a dataframe from the rows"""
  df = None
  for data in gadata:
    if df is None:
      df = pd.DataFrame(
          data['rows'],
          columns=[x['name'] for x in data['columnHeaders']])
    else:
      newdf = pd.DataFrame(
          data['rows'],
          columns=[x['name'] for x in data['columnHeaders']])
      # Concatenate this chunk onto what we have so far
      df = pd.concat([df, newdf])
    print("Gathered {} rows".format(len(df)))
  return df

Now, we can call the functions to load the Google Analytics data.

data = GAData(gaid=gaid, metrics=metrics, start=start,
              end=end, dimensions=dimensions, segment=segment,
              token=token)

data = to_df(data)

Analyzing the data

Let's start by just taking a look at the data. We'll use the .head() method of DataFrames to peek at the first few rows. Think of this as glancing at only the top few rows of an Excel spreadsheet.

data.head()

This displays the first five rows of the data frame.

Much of the data is not in the right format for proper analysis, so let's perform some data transformations.

First, let's convert the date to a datetime object and the metrics to numeric values.

data['ga:date'] = pd.to_datetime(data['ga:date'])
data['ga:users'] = pd.to_numeric(data['ga:users'])
data['ga:newUsers'] = pd.to_numeric(data['ga:newUsers'])
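This conversion matters because the Analytics API returns every value as a string, even for numeric metrics; summing a string column concatenates instead of adding. A quick toy illustration (made-up values, not client data):

```python
import pandas as pd

# GA API responses arrive as strings, even for numeric metrics
raw = pd.Series(["10", "25", "5"])

# Summing strings concatenates them instead of adding
assert raw.sum() == "10255"

# After pd.to_numeric, arithmetic behaves as expected
assert pd.to_numeric(raw).sum() == 40
```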

Next, we'll need the landing page URLs, which are relative and include URL parameters, in two additional formats: 1) as absolute URLs, and 2) as relative paths (without the URL parameters).

from urllib.parse import urlparse, urljoin

data['path'] = data['ga:landingPagePath'].apply(lambda x: urlparse(x).path)
data['url'] = data['path'].apply(lambda p: urljoin(base_site_url, p))
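If the two helpers are unfamiliar, here is what they do on a hypothetical landing page URL (example.com is a placeholder, not the client's site):

```python
from urllib.parse import urlparse, urljoin

landing = "https://www.example.com/collections/shoes?utm_source=google"

# urlparse().path strips the scheme, domain, and query string
path = urlparse(landing).path
assert path == "/collections/shoes"

# urljoin rebuilds an absolute URL from a base site and a relative path
assert urljoin("https://www.example.com", path) == \
    "https://www.example.com/collections/shoes"
```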

Now the fun part begins.

The goal of our analysis is to see which pages lost traffic after a particular date (compared to the period before that date), and which gained traffic after it.

The example date chosen below corresponds to the exact midpoint of the start and end variables we used above to gather the data, so that the data before and after the date is equally sized.

We begin the analysis by grouping the URLs by their path and adding up the newUsers for each URL. We do this with the built-in pandas method .groupby(), which takes a column name as input and groups together each unique value in that column.

The .sum() method then takes the sum of every other column in the data frame within each group.

For more information on these methods, please see the pandas documentation for groupby.

For those who might be familiar with SQL, this is analogous to a GROUP BY clause with a SUM in the select clause.
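To make the analogy concrete, here is the same groupby-and-sum pattern on a toy data frame with made-up paths and counts:

```python
import pandas as pd

toy = pd.DataFrame({
    "ga:landingPagePath": ["/a", "/b", "/a", "/b", "/c"],
    "ga:newUsers": [10, 5, 3, 2, 7],
})

# Equivalent to: SELECT path, SUM(newUsers) FROM toy GROUP BY path
grouped = toy.groupby("ga:landingPagePath").sum()

assert grouped.loc["/a", "ga:newUsers"] == 13  # 10 + 3
assert grouped.loc["/c", "ga:newUsers"] == 7
```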

# Change this depending on your needs
MIDPOINT_DATE = "2017-12-15"

before = data[data['ga:date'] < pd.to_datetime(MIDPOINT_DATE)]
after = data[data['ga:date'] >= pd.to_datetime(MIDPOINT_DATE)]

# Traffic totals before the Shopify switch
totals_before = before[["ga:landingPagePath", "ga:newUsers"]]\
                .groupby("ga:landingPagePath").sum()
totals_before = totals_before.reset_index()\
                .sort_values("ga:newUsers", ascending=False)

# Traffic totals after the Shopify switch
totals_after = after[["ga:landingPagePath", "ga:newUsers"]]\
               .groupby("ga:landingPagePath").sum()
totals_after = totals_after.reset_index()\
               .sort_values("ga:newUsers", ascending=False)

You can check the totals before and after with this code, and double check against the Google Analytics numbers.

print("Traffic Totals Before: ")
print("Row count: ", len(totals_before))

print("Traffic Totals After: ")
print("Row count: ", len(totals_after))

Next up we merge the two data frames, so that we have a single column corresponding to the URL, and two columns corresponding to the totals before and after the date.

We have different options when merging. Here, we use an "outer" merge, because even if a URL didn't show up in the "before" period, we still want it to be part of the merged dataframe. We'll fill in the blanks with zeros after the merge.
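The effect of the outer merge is easiest to see on a toy example: a URL that exists in only one period still survives the merge, with NaN in the missing column, which fillna then turns into zero. (The paths and totals below are made up.)

```python
import pandas as pd

toy_before = pd.DataFrame({"url": ["/a", "/b"], "ga:newUsers": [10, 5]})
toy_after = pd.DataFrame({"url": ["/a", "/c"], "ga:newUsers": [8, 4]})

merged = toy_after.merge(toy_before, on="url", how="outer",
                         suffixes=["_after", "_before"])
merged.fillna(0, inplace=True)

# "/c" only exists after the switch; its "before" total becomes 0
row_c = merged[merged["url"] == "/c"].iloc[0]
assert row_c["ga:newUsers_before"] == 0
assert row_c["ga:newUsers_after"] == 4
```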

# Comparing pages from before and after the switch
change = totals_after.merge(totals_before,
                            on="ga:landingPagePath",
                            suffixes=["_after", "_before"],
                            how="outer")
change.fillna(0, inplace=True)

Difference and percent change

Pandas dataframes make simple calculations on whole columns easy. We can take the difference of two columns or divide one column by another, and pandas will perform that operation on every row for us. We will take the difference of the two totals columns, and divide by the "before" column to get the percent change before and after our midpoint date.

Using this percent_change column we can then filter our dataframe to get the winners, the losers, and the URLs with no change.

change['difference'] = change['ga:newUsers_after'] - change['ga:newUsers_before']
change['percent_change'] = change['difference'] / change['ga:newUsers_before']

winners = change[change['percent_change'] > 0]
losers = change[change['percent_change'] < 0]
no_change = change[change['percent_change'] == 0]
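One caveat worth knowing: for URLs that had zero traffic before the midpoint (for example, brand-new pages), the division produces inf rather than a number, so you may want to inspect those rows separately. A minimal illustration with made-up numbers:

```python
import numpy as np
import pandas as pd

change = pd.DataFrame({
    "ga:newUsers_before": [10, 0],
    "ga:newUsers_after": [5, 8],
})
change["difference"] = change["ga:newUsers_after"] - change["ga:newUsers_before"]
change["percent_change"] = change["difference"] / change["ga:newUsers_before"]

# The first page lost half of its traffic
assert change["percent_change"].iloc[0] == -0.5

# The brand-new page divides by zero, yielding inf
assert np.isinf(change["percent_change"].iloc[1])
```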

Sanity check

Finally, we do a quick sanity check to make sure that all the traffic from the original data frame is still accounted for after all of our analysis. To do this, we simply take the sum of all traffic for both the original data frame and the two columns of our change dataframe.

# Checking that the total traffic adds up
data['ga:newUsers'].sum() == change[['ga:newUsers_after', 'ga:newUsers_before']].sum().sum()

It should be True.


Sorting by the difference in our losers data frame and taking the .head(10), we can see the top 10 losers in our analysis. In other words, these pages lost the most total traffic between the two periods before and after the midpoint date.

losers.sort_values("difference").head(10)


You can do the same to review the winners and try to learn from them.

winners.sort_values("difference", ascending=False).head(10)

You can export the losing pages to a CSV or Excel file, for example:

losers.to_csv("./losing_pages.csv")
This seems like a lot of work to analyze just one site, and it is!

The magic happens when you reuse this code on new clients and simply need to replace the placeholder variables at the top of the script.

In part two, we'll make the output more useful by grouping the losing (and winning) pages by their types to get the chart I included above.
