Blog! Thoughts, news, reviews, and interviews.

02

Dec 2016

Google Analytics for Webcomics: Part 3 – Excluding Specific Spammers and “Secret.Google.com” Language Spam


Alright, time for a follow-up to my previous tutorial about removing “Ghost Spam” from your Google Analytics. This time we’ll talk about excluding specific spammers from your analytics, as well as the very recent phenomenon of spammers spoofing Google’s referral magic and setting the language to a weird URL.

The other day I was checking my analytics, as I totally don’t always obsessively do, and noticed something strange.  I was getting a ton of visits in a short time from some weird site, lifehacker.com.

lifehacker-com

A quick check revealed that in no way was I getting featured there, and a Google search revealed it was the latest in new spammers, spoofing a fake source to try to get you to click on a malicious link of some kind. In this case, lifehacker was misspelled with an odd character that would have directed your browser to an unhappy end.

I did a little bit of digging, and realized that not only were almost all of the hits listing Russia as the country of origin, but there was something VERY strange about the language listed.

lifehackerrussiasecretgoogle

That’s… not a language. And it’s a weird political statement. And it’s another carefully altered url that is probably malicious. Don’t test it ;P
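If you’re curious how these lookalike URLs are built without clicking on them, a small script can list any non-ASCII characters hiding in a hostname. This is just a sketch using Python’s standard unicodedata module; the function name is made up for illustration, and the sample string is the fake domain from that language field.

```python
import unicodedata

def suspicious_chars(hostname):
    """Return (char, code point, Unicode name) for every non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in hostname
        if ord(ch) > 127
    ]

# The fake domain from the spam language string (do not visit it).
print(suspicious_chars("secret.ɢoogle.com"))
# A genuine all-ASCII hostname yields an empty list.
print(suspicious_chars("google.com"))
```

Anything this reports for a domain that is supposed to be plain English is a good reason to stay away from it.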

For some reason, over the past month some spammer has been testing Google Analytics using this weird language, hoping to get someone to try the phony URL. It was only a few hits here or there, and I found it mostly amusing and didn’t think too much about it since it wasn’t seriously messing with my numbers. But a few days ago it went from a couple of hits a day to HUNDREDS of hits a day, and it started spoofing a genuinely useful referrer of mine: reddit.

secretgooglesources

This is annoying, because not only has the number of hits reached the point of significantly messing with my analytics, it’s also overlapping with a sometimes legitimate source of traffic. Time to act.

So I set about creating some filters to stop this incoming fake traffic from being tracked.

 

Exclusionary Spam Filters

filters

I talked about hosts and ghost spam in my previous tutorial, so we won’t focus on that right now. I created Crawler Spam #1 and #2 from a known database of common spammers to exclude.

crawlerspam1

Basically, set Custom->Exclude->Campaign Source, and then you can paste in a list of spam sources that blocking ghost spam didn’t cover for you. You know better than anyone else what spam sources you are seeing, or you can google a suggested list or two. I’ve done both, with my filter “More Spam Exclusions” detailing a couple that I saw coming through, like lifehacker.

exclude-lifehacker

Once I set that up, my analytics stopped counting any visits (see the sharp cutoff in the very first image in this post) from lifehacker, phony or real (although I doubt the real site will be bringing me any traffic so no loss there).
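If your list of spam sources gets long, gluing it together by hand gets tedious. Here’s a rough Python sketch of one way to build the Filter Pattern text: escape the dots, join the entries with |, and start a new pattern whenever the next entry would push past the field’s 255 character max. The function name and the .xyz domains are hypothetical; swap in whatever shows up in your own reports.

```python
def build_filter_patterns(sources, max_len=255):
    """Join spam sources into '|'-separated patterns, each at most max_len chars."""
    patterns, current = [], ""
    for source in sources:
        escaped = source.replace(".", r"\.")  # a bare dot would match any character
        if len(escaped) > max_len:
            raise ValueError(f"{source!r} is too long for one pattern")
        candidate = f"{current}|{escaped}" if current else escaped
        if len(candidate) > max_len:
            patterns.append(current)  # current pattern is full, start a new one
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns

# Hypothetical spam sources; only lifehacker.com comes from this post.
spam_sources = ["lifehacker.com", "spam-example-one.xyz", "spam-example-two.xyz"]
for pattern in build_filter_patterns(spam_sources):
    print(len(pattern), pattern)
```

Each printed line would go into its own exclude filter. Note that no pattern starts or ends with |, since in a regular expression an empty alternative matches every source.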

That felt great, but then the spammer switched to spoofing reddit! I’m not just going to block all traffic coming from reddit, so I had to get a bit creative. Noticing that all of these spam visits were using that fake language setting, I made a filter around that.

secretgooglefilter

I just set that up, and will update with a nice picture of its success tomorrow, but I can tell from monitoring my real-time data that it’s working. However, all it would take is for the spammer to change it a little bit and it wouldn’t work anymore. So, let’s make an INCLUDE filter, with only real language codes.

Update: Success! It somehow even killed all the hits from yesterday, and there are none today!

secretgoogleupdate

 

Inclusionary Language Filters – real languages only please

This took a bit of work, but here are the 69 language codes I’ve seen on my site over the past month across 17K sessions. I had to break it into two because the Filter Pattern field has a 255 character max.

en-us|en-gb|de|da|de-de|fr|ru|pl|en-ca|nl|es|en|en-au|ru-ru|sv|fi-fi|nb|c|cs|pt-br|nl-nl|sv-sv|hu-hu|it|nb-no|es-es|fr-fr|it-it|fi|en-ie|hu|el|en-nz|en-za|es-419|nl-be|

|es-mx|pt-pt|da-dk|es-cl|sl|tr|de-at|es-do|id|en-ai|ko|pl-pl|(not-set)|english|th|uk|zh-cn|ar|cs-cz|hr|ko-kr|zh-tw|bg|ca|de-ch|el-gr|en_us|en-sg|fr-ca|id-id|if|ja|ro|

reallnaguages1

Copy those in, add your view, and click save, and going forward you should only be accepting analytics with those languages. Of course, the problem is that in the future I may very well miss some views from people using other language codes, so if that worries you, you can create a new “View” without any of the filters, and occasionally compare the two to see if you’re missing anything significant.
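Before trusting a pattern like this, it’s worth testing it outside of Analytics. The sketch below uses Python’s re module and assumes the Filter Pattern field behaves like an unanchored regular-expression search; that is my understanding, but check Google’s filter documentation to be sure. Under that assumption, a | at the very start or end of a pattern adds an empty alternative that matches everything, and a short code like c matches any text containing that letter. The shortened code list, the lowercased spam string, and the helper name are only for illustration.

```python
import re

def keeps_hit(pattern, language):
    """Simulate an include filter: keep the hit if the pattern is found anywhere."""
    return re.search(pattern, language) is not None

spam = "secret.ɢoogle.com you are invited!"  # shortened, lowercased spam language
codes = "en-us|en-gb|de|fr|c|"  # short excerpt, stray trailing pipe kept

# The trailing "|" is an empty alternative, so everything passes, spam included.
print(keeps_hit(codes, spam))

# Strip stray pipes and anchor the whole list so only exact codes pass.
anchored = "^(" + codes.strip("|") + ")$"
print(keeps_hit(anchored, spam))     # spam no longer passes
print(keeps_hit(anchored, "en-gb"))  # a real code still passes
```

If that assumption holds, stripping the stray pipes and wrapping the list in ^( and )$ makes only exact codes pass. Also, as far as I know a view only keeps hits that pass every Include filter, so it’s worth confirming in your real-time data, and against the unfiltered view, that a two-part include list isn’t dropping genuine visitors.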

create-new-view

Switching which View you’re looking at is pretty simple too: just use the dropdown at the top left of any analytics backend page.

switch-view

Here’s a side by side comparison of my two views, showing my filters in action blocking the tracking of these fake hits. Pics taken a few seconds apart, but you can see that those spoofed “reddit” visits from Russia are no longer being tracked.

unfiltered-real-time
filtered-real-time

 

EDIT: Some Analytics SEO blogger online contacted me and told me about a simpler way to exclude all non-normal Languages from the settings using a REGEX filter string. You can read more about it here, but it seems to work, and you don’t have to worry about missing a language.
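The general idea can be sketched like this: real language codes are short and never contain a dot, so anything long or URL-like gets excluded. The pattern below is my own illustrative guess, not the expression from the linked article, and it’s tested in Python on the assumption that Analytics treats filter patterns as regular expressions.

```python
import re

# Illustrative exclude pattern: 12 or more characters, or any dot.
# This is an assumption for demonstration, not the linked article's expression.
FAKE_LANGUAGE = re.compile(r".{12,}|\.")

def is_fake_language(language):
    """Return True if the language value looks like spam rather than a code."""
    return FAKE_LANGUAGE.search(language) is not None

samples = ["en-us", "es-419", "(not-set)", "english",
           "Secret.ɢoogle.com You are invited!"]
for value in samples:
    print(f"{value!r}: {'exclude' if is_fake_language(value) else 'keep'}")
```

The threshold of 12 is arbitrary; the longest value in the two lists above is 9 characters, so there’s a little headroom for codes I haven’t seen yet.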

 

Using Segments to Ignore Past Spam

Now that we’ve blocked the spam, all’s well, right? Well, mostly. Unfortunately you may have thousands of spam hits in your analytics history clogging things up horribly. Not to worry, we can fix that by adding a new Segment to view in our analytics.

addsegment-1

You may never have paid attention to that “All Users – 100%” wheel at the top of the analytics readouts, but that there is the default Segment of your views that you’re looking at. You can actually make all sorts of Segments based on demographics, source of traffic, whatever fits your fancy. Today, I made a Segment excluding “Secret Google” language visitors.

addsegment2

Select Add Segment->+ New Segment->Language, then copy in that horrible fake language, which I’ll reproduce here for ease but please no one follow this link:

Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!

Now, when we save and apply this Segment, and compare it to All Users, we can see that it has removed all that annoying spam and given us a cleaner view of our analytics.

segmentcomparison

Now, the whole Segment thing is a little annoying for this purpose, but it will let me get a more accurate picture of my stats for this period. A month from now, with my filters hopefully preventing the spam from showing up in the first place, I can go back to just looking at “All Users”, or use Segments for something else interesting, like tracking differences in how visitors from different countries read, or whatever XD
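If you’d rather double-check the Segment’s numbers outside of Analytics, the same comparison can be roughed out on an exported Language report. This is only a sketch: the rows are made-up numbers, not my real stats, and the function name and the substring test for spotting the spam are assumptions for illustration.

```python
# Made-up rows standing in for an exported Language report: (language, sessions).
rows = [
    ("en-us", 120),
    ("en-gb", 30),
    ("Secret.ɢoogle.com You are invited!", 400),
]

def segment_sessions(rows, is_spam):
    """Total sessions for all users and for the spam-free segment."""
    all_users = sum(sessions for _, sessions in rows)
    clean = sum(sessions for language, sessions in rows if not is_spam(language))
    return all_users, clean

# Treat any language value containing "oogle.com" as the spam language.
all_users, clean = segment_sessions(rows, lambda language: "oogle.com" in language)
print(f"All Users: {all_users} sessions, spam-free segment: {clean} sessions")
print(f"Real share of traffic: {clean / all_users:.1%}")
```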

 

Anyways, I hope this has been helpful, and can help you make your analytics a powerful tool to manage your website. If you found this useful, check out the previous Google Analytics tutorials or my tutorials on Promoting Your Webcomic!

  • Ugh, thanks for the info about this. I haven’t checked my analytics too in-depth to know if I’m getting those secret dot google messages, but I did see a spate of .xyz domains which I’m sure are spam. Recently I noticed that GA stopped tracking half the URLs from my site and I had to update my analytics code. I think they may have switched to using their universal analytics, but it probably means I can mess with event tracking for specific button clicks and such on my comic reader and around my site.

  • NickDA

    Groso!

  • Awesome, thanks for posting this. I realized today that I’ve been getting the exact same spam language in my analytics. D:

    • Man, they’re giving the secret google prize to everyone! ;)

  • Melissa J Massey

    Worked once I did the audience view thing. I ended up importing a solution from the Google Analytics Solutions gallery that worked pretty well, though sadly all but 10.5% of my traffic from the last month is spam. What a bummer. Thanks for the helpful tutorials, Dan!

    • Oof, that mega sucks, but better to rip that bandaid off and look at real numbers, methinks.

      • Melissa J Massey

        Absolutely! Now I have confidence that my reports are accurate, and that’s the most important thing.