If you’ve ever seen a big spike in traffic to your site only to find that your referral reports show it’s mostly from low-quality or spammy sites, you may have been hit with referrer spam or ghost referrals.
Because this type of spam has a significant effect on the information you receive, keeping you from getting a clear picture of the health of your website, it’s important to filter out as much of this referral spam as possible. Here’s a primer on what it is and what you can do about it.
What are referrer spam and ghost referrals?
Here’s the most basic definition of this problem:
referrer spam (n.): traffic to your site from bots and crawlers that impersonate a referral visit or referral link
There are two main types of referrer spam:
Crawlers and bots: These are the web crawlers and robots that visit sites. Normally, they’re harmless and are used to index your site’s pages and content. Good bots and crawlers typically identify themselves and as such do not show up in your analytics reports. But the not-so-good guys? The spammy bots do not identify themselves and your analytics records them as a visit with a 100% bounce rate—leaving without spending any time (less than a second) on your site.
Ghost referrals: These aren’t like Casper the Friendly Ghost. Evil might be too strong a word for these types of referrals, but they’re certainly bad. Ghost referrals never actually visit your site (hence the “ghost” moniker). Spammers have found a way to exploit Google Analytics servers and spoof a session on your site—they’re even able to spoof organic visits and events. They’re able to do this by guessing your analytics code (UA-XXXXXX-1) and tricking analytics into thinking they hit your site even though they never went near you.
Why would anyone want to send you fake traffic?
The general idea is to trick anyone who reads their analytics reports into clicking through and visiting the spammer’s site. The referral works much like a link would.
Here’s how Matt Cutts, Head of Google Webspam, describes it:
“A referrer is just a simple HTTP header that is passed along when a browser goes from one page to another page, and is normally used to indicate where the user is coming from. Now people can use that and change the referrer to be anything they want … some people will set the referrer to be a page they want to promote and then they will just visit tons of pages around the web—all the people who look at their referrers see that and say ‘Oh, maybe I should go check that out’ … [and] whenever there’s a link, it doesn’t mean there was necessarily a link … there are some people who try to drive traffic by visiting a ton of websites, even with an automated script, and setting the referrer to be the URL they want to promote …”
So ghost and referrer spam may be promoting a site and trying to drive clicks and traffic to it. That’s not the only way a spammer can benefit, though. A more vile reason for sending fake traffic is to possibly harm a competitor’s site—think how angry you’re going to be when you see the URLs coming into your analytics from referrer spam. Are you going to have good thoughts about those sites?
The bad thing is, referrer spam cannot be authenticated—you cannot track it back to its actual source. Spammers can mask the real URL and make it look as if it’s someone else’s. With that in mind, think about how it could be used to harm the site and reputation of a spammer’s competitor.
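To see why a referrer can’t be trusted, it helps to look at how little it takes to set one. The following is a minimal sketch using Python’s standard library; both URLs are made-up placeholders, and nothing is actually sent over the network.

```python
from urllib.request import Request

# Hypothetical URLs, used only for illustration.
target = "https://yoursite.example/"
fake_referrer = "http://spammy-site.example/promo"

# The client picks the Referer value; the receiving server cannot verify it.
request = Request(target, headers={"Referer": fake_referrer})

# No request is sent here. We only inspect the header it would carry.
print(request.get_header("Referer"))
```

Because the header is whatever the client says it is, the domain you see in a referral report may have nothing to do with whoever generated the hit.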
Another more likely and more damaging reason that spammers use referrals is to expose you to malware and other software designed to steal your information.
The moral of the story: Don’t click through to the sites listed in your referral reports.
How to block referrers in your analytics
OK, we know why you’re still reading this … you’re dying to know how to block these annoying spammers and keep them from jacking up your analytics.
The answer: Block it before it can get to your site.
There are really two ways to get rid of referral spam in your analytics: One is to use filters in your analytics, and the other is to block the traffic at your server.
Consider this: Filters simply hide the data from your reports. This doesn’t mean the traffic and referrals aren’t still visiting your site; it just means that after a visit occurs, the referral isn’t recorded in your reports.
That makes filtering more of a band-aid than a true solution. Where it’s possible, the better fix is to block the referring sites altogether.
Using .htaccess to block referrer spam
You’ll have to edit your .htaccess file with the following code:
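## STOP REFERRER SPAM
RewriteEngine On
RewriteCond %{HTTP_REFERER} example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} example2\.com [NC,OR]
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
RewriteRule .* - [F]
Copy the code above, replace ‘example\.com’ with the domain you want to block, and add it to your .htaccess file. Add one “RewriteCond” line for every domain you want to block, and make sure to include the “\” before each dot. Every “RewriteCond” line except the last one needs the OR flag; if the last line has it too, the rule can end up matching every request and locking out real visitors. Also make sure the dash in the “RewriteRule” line is a plain hyphen, not a typographic dash pasted in from a web page.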
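If your block list gets long, you may prefer to generate these lines rather than type them by hand. Here is a minimal sketch, assuming a hypothetical helper named build_block_rules that is not part of any library. It escapes the dots for you and leaves the OR flag off the final condition, since a trailing OR can make the rule match every request.

```python
def build_block_rules(domains):
    """Return .htaccess lines that answer 403 to requests referred by the given domains."""
    if not domains:
        return []
    lines = ["## STOP REFERRER SPAM", "RewriteEngine On"]
    for index, domain in enumerate(domains):
        # Escape dots so they match a literal "." instead of any character.
        pattern = domain.replace(".", r"\.")
        # Every condition except the last is chained with OR.
        is_last = index == len(domains) - 1
        flags = "[NC]" if is_last else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    # A plain hyphen means "no substitution"; [F] returns 403 Forbidden.
    lines.append("RewriteRule .* - [F]")
    return lines


# Example domains are placeholders taken from the snippet above.
rules = build_block_rules(["example.com", "example2.com", "semalt.com"])
print("\n".join(rules))
```

Paste the printed lines into your .htaccess file, and test on a staging copy of your site first if you can.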
Using filters to block ghost referrals
While blocking via .htaccess may be the best solution against crawlers accessing your site, most of these spammers have gotten smart and use ghost referrers now.
If you recall, ghosts aren’t actually visiting your site, and if they’re not making a visit and crawling your site, there’s nothing to block via .htaccess.
This is where filtering comes in. There are quite a few tutorials online discussing different filters and methods for getting rid of ghost referrals but, unfortunately, many of them are confusing or contradictory, and some are just plain wrong.
The easiest and quickest method to implement is to exclude invalid Hostnames.
The first thing to do is to look at your analytics traffic by Hostname. Ghost referrals typically show up with an invalid Hostname in your analytics. A valid Hostname is usually going to be your own domain name (e.g., yourdomain.com).
To find out which hostnames are associated with your site (and determine valid and invalid hostnames), log in to your analytics account.
- Click on the “Reporting” tab
- Click on “Audience”
- Click the dropdown for “Technology” then click “Network”
- Under “Primary Dimension” choose “Hostname”
You will now see a list of the hostnames recorded for hits to your property during the period you are checking. The hostnames you determine are invalid (essentially anything not associated with your domain) are what you are going to block with filters.
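If the list is long, sorting it by hand gets tedious. Below is a rough sketch of the sorting logic, assuming that only your own domain and its subdomains count as valid. The function name and the sample hostnames are made up for illustration, and your own list of valid hostnames may need to include other services you legitimately use.

```python
def is_valid_hostname(hostname, own_domain):
    """Return True if hostname is own_domain itself or one of its subdomains."""
    host = hostname.strip().lower()
    domain = own_domain.lower()
    return host == domain or host.endswith("." + domain)


# Hypothetical hostnames, as they might appear in the Hostname report.
report_hostnames = [
    "yourdomain.com",
    "www.yourdomain.com",
    "(not set)",
    "spammy-site.example",
    "notyourdomain.com",
]

valid = [h for h in report_hostnames if is_valid_hostname(h, "yourdomain.com")]
invalid = [h for h in report_hostnames if not is_valid_hostname(h, "yourdomain.com")]

print("Valid:", valid)
print("Invalid:", invalid)
```

Note that the check looks for a leading dot before the domain, so a lookalike such as notyourdomain.com is treated as invalid.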
Hostname exclusion filter
Please note: We recommend that before you fully apply these filters, you perform a test to ensure you’re blocking the right information. Set up a test view or copy of your data in analytics and apply these filters to the test view first. Let them run for a few weeks to compare and contrast the traffic being blocked. Once you’re sure your filter is set up correctly, apply it to your regular analytics view.
This is probably the only filter you really need.
It has two drawbacks: Some spammers are able to spoof your Hostname and will still get past it, and if you don’t set up your filter properly, you may accidentally exclude valid Hostnames (and data).
Still, it’s by far the best option for getting rid of ghost referrals to date.
- Access your Google Analytics account
- Click the “Admin” tab
- Click “View”
- Click the dropdown for your views. If you do not have multiple views set up, we recommend setting up a “No Filter” view and a “Ghost Referral Filter” or “Test View” (see our note at the top of this section).
- Choose the “Test View” or “Ghost Referral Filter” view to work with
- Click on “Filters”
- Click “+ Add Filter” to create a new filter
- Name it something like “Referrer Spam,” “Valid Hostnames,” or “Hostname Filter”
- For Filter Type, select “Custom”
- Click “Include” and for Filter Field use “Hostname” (Note: You use an “include” filter rather than an “exclude” filter because with an inclusion filter you won’t have to constantly update it as more spammy hostnames come online; your include filter should block out all the ghosts)
- In “Filter Pattern” you are going to use a Regular Expression (RegEx) to list all of the Hostnames to include, which are primarily your canonical domain and any subdomains you have
- Your RegEx pattern should look similar to: yoursite\.com|sub\.yoursite\.com (basically any Hostname where your tracking code legitimately runs). As in the .htaccess example, the “\” before each dot makes it match a literal dot.
- Do not add spaces between terms. Instead, use a pipe (|) to separate terms.
- Click “Verify” to make sure the filter is working properly
- Verification will show you a table with the last seven days’ worth of data and the Hostnames appearing before and after the filter is applied. Make sure the Hostnames being blocked are correct before saving.
- Click “Save”
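Before you rely on the “Verify” step alone, you can sanity-check your pattern offline. This is only a sketch: it assumes the filter matches the pattern anywhere in the Hostname (a partial match, like Python’s re.search) and ignores case. The sample hostnames are invented for illustration.

```python
import re

# The include pattern from the steps above, with dots escaped.
include_pattern = re.compile(r"yoursite\.com|sub\.yoursite\.com", re.IGNORECASE)


def passes_include_filter(hostname):
    """Return True if the hostname would be kept by the include pattern."""
    return include_pattern.search(hostname) is not None


# Hypothetical hostnames to try the pattern against.
sample_hostnames = [
    "yoursite.com",
    "sub.yoursite.com",
    "www.yoursite.com",
    "spammy-site.example",
    "(not set)",
]

kept = [h for h in sample_hostnames if passes_include_filter(h)]
dropped = [h for h in sample_hostnames if not passes_include_filter(h)]

print("Kept:", kept)
print("Dropped:", dropped)
```

Under the partial-match assumption, www.yoursite.com is kept even though it isn’t listed, because it contains yoursite.com. If a valid Hostname lands in the dropped list, fix the pattern before saving the filter.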
A big note here: This filter will only affect your reporting going forward; it does not apply to historical data. To see your past analytics with ghost referrals filtered out, you will have to apply an advanced segment to the unfiltered view of your site in your reporting screen.
Now, check this filter view over the next few weeks to make sure the proper traffic is being filtered out. Once you are comfortable the Hostname filter is doing its job properly, you can apply the filter to your regular view or simply leave this view in place and use it for determining valid analytics data.
There you go: The two best ways to block all the spammy referral traffic your site is getting hit with on a daily basis.
It should be noted that this will not solve the issue entirely, as that is out of all of our hands. The only real solution can come from Google, as it’s their analytics that is being hit, but until they come up with a true solution, this is your best option for cleaning up your reporting.
Brian Valentin
Myilraj G says
Hello admin,
Thanks for this post. I have employed this method on my blog.
This generally takes one line per domain. If we have more than 300 domains, the .htaccess file will get quite large.
Is there any alternate solution for this issue?