Solved

Search spider requests are being logged (Yandex)

  • February 11, 2022
  • 3 replies
  • 1101 views

I’ve noticed some strange behaviour: we’ve started to receive lots of events (around 40K per day) from IPs belonging to Yandex (a Russian search engine). It looks like their spiders run our JavaScript code and trigger the corresponding events, and it’s ruining our analytics :(

This happens only with the Yandex spider. Do you have any idea why this happens and how we can fix it?

 

PS: I know that I can block events coming from certain IPs, but I think ignoring search bots should be done automatically.

 

Best answer by sydney.koh


sydney.koh
Team Member
  • Amplitude Support
  • February 15, 2022

Hi!


Currently, we only block Google’s web crawler, which identifies itself with “Googlebot” in the User-Agent string. We don’t block anything else automatically. Since we do not filter or block traffic from Yandex at this time, I recommend blocking or filtering by IP address. More information on how to block/filter data can be found here: https://help.amplitude.com/hc/en-us/articles/360016338212#h_88c3bdf1-84fd-4e14-8d00-c540d1596569
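If you maintain the blocklist yourself (for example, in a layer you control before events are sent on), the IP check could look roughly like the sketch below. The `BLOCKED_NETWORKS` ranges and the `is_blocked_ip` helper are hypothetical names for illustration; the ranges are RFC 5737 documentation placeholders, not real Yandex ranges, so you would replace them with the IPs you actually see in your data.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real Yandex ranges.
# Replace them with the crawler IPs/ranges observed in your own event data.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked_ip(ip: str, networks=BLOCKED_NETWORKS) -> bool:
    """Return True if `ip` falls inside any of the blocked networks."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Unparseable input is treated as "not blocked" in this sketch.
        return False
    return any(address in network for network in networks)
```

A caller would drop the event when `is_blocked_ip(client_ip)` is true and forward it otherwise.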

 

There are some other options that could help you but would involve changing your implementation:
 

  • Sending data into Amplitude server-side is the most direct alternative. 
  • Some customers have also routed their data through a proxy server before sending it to Amplitude. Some documentation on how to set that up can be found here. Note that this will remove some end-user data (originating IP address, location, userID, etc.).
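If you go the server-side or proxy route, that layer is also a natural place to drop crawler traffic by User-Agent before anything is forwarded. The following is only a minimal sketch of that idea, not Amplitude's own filtering: `should_forward` and `BOT_PATTERN` are hypothetical names, “Googlebot” is the token mentioned above, and matching on “YandexBot” is an assumption about how the Yandex crawler identifies itself, so check it against the User-Agent strings in your own logs.

```python
import re

# "Googlebot" comes from the reply above; "YandexBot" is an assumed token.
# Extend this pattern with whatever crawler tokens show up in your logs.
BOT_PATTERN = re.compile(r"googlebot|yandexbot", re.IGNORECASE)


def should_forward(headers: dict) -> bool:
    """Return False when the request's User-Agent looks like a known crawler."""
    # Header names are case-insensitive, so normalise the keys before lookup.
    normalised = {str(key).lower(): value for key, value in headers.items()}
    user_agent = normalised.get("user-agent") or ""
    return BOT_PATTERN.search(user_agent) is None
```

The proxy would call `should_forward(request_headers)` for each incoming event request and only relay it when the result is true. User-Agent strings can be spoofed, so this complements IP-based blocking and does not replace it.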

Best,

Sydney



I have the same issue with the Russian search engine on this website, https://apkomex.com/, but sydney.koh’s reply solved it. Thanks!


Thanks, you solved my problem. I had also been looking for a fix for a long time. It’s a great problem-solving post.

