The editorial department of well-known Dutch broadcaster RTL News recently asked for my assistance. Multiple tax files of Dutch citizens had been published via www.docplayer.nl and no one could explain – where did these files come from? Who would have thought that my research would lead to the discovery of one of the world’s most frequently visited websites (!).

TL;DR

Watch the 4-minute RTL News item (in Dutch) on our joint investigation.

Exploring the site

I started looking at the site, which basically contains a lot of PDF files and a search form:

When navigating through the site, I noticed that it contains a lot of PDF files that shouldn’t be there, and the diversity of files is enormous. Almost every file on the site is uploaded by a different user. The users seem to be fake. The site is very simple. You can search and view a file, you can register yourself and upload a file. That’s it. Nothing more, nothing less. You can build such a site in a week.

Advertisements everywhere

Each document on the site is accompanied by advertisements:

Registering as a new user

To learn more about the site I registered myself as a new user. I filled in a non-existent mail address, password a, and I was all set! Afterwards a very minimalistic user-interface is displayed:

I tried uploading Word, PDF and PowerPoint files in different browsers but couldn’t upload them. The interface and functionality are very basic and partially broken. I get the impression that the owner doesn’t want users to actually use it. It just contains an upload button after you log in, and then you get intentionally demotivated because the upload button doesn’t work.

Documents scraped from other sites

I was wondering how many documents were stored on the site, so I asked Google. It turned out that Google had indexed 375,000 web pages. That’s quite a lot! From looking through these documents it was clear that this site was copying (scraping) these documents from other sites.

I even found my own hacking guide that I wrote in 2004 when I was in high school! It has been viewed 290 times in the past 2 years. So that’s 290 visitors that haven’t visited my weblog. This is now getting personal.

Business case

If you host a search-engine-optimized site with 375,000 PDF files, then you’ll attract a lot of visitors. The average click-through rate for advertisements on the Google AdWords display network is 0.35%. That means that 3.5 clicks will be generated per 1,000 visitors per advertisement. With 4 advertisements placed on docplayer.nl, it might drive up the click-through percentage towards 1%.

The average price per click for advertisers on the AdWords network is between $0.50 and $1. This revenue is split between Google and the website owner that hosts the advertisement.
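The arithmetic above can be written down as a small back-of-the-envelope calculation. This is only a sketch: the function names are made up for illustration, the 1% rate for four ads is the rough guess from the text, and the result is what advertisers pay in total, before the split between Google and the site owner (the article does not state the split ratio).

```python
def estimate_ad_clicks(visitors: int, click_through_rate: float) -> float:
    """Expected number of ad clicks for a given number of visitors."""
    return visitors * click_through_rate


def estimate_ad_spend(clicks: float, price_per_click: float) -> float:
    """Total amount advertisers pay for those clicks, before any revenue split."""
    return clicks * price_per_click


if __name__ == "__main__":
    visitors = 1_000
    # 0.35% is the average display CTR for one ad; 1% is the rough guess for four ads.
    for label, ctr in (("one ad", 0.0035), ("four ads", 0.01)):
        clicks = estimate_ad_clicks(visitors, ctr)
        low = estimate_ad_spend(clicks, 0.50)   # lower bound of the price per click
        high = estimate_ad_spend(clicks, 1.00)  # upper bound of the price per click
        print(f"{label}: {clicks:.1f} clicks, ${low:.2f} to ${high:.2f} advertiser spend")
```

Plugging in a monthly visitor estimate instead of 1,000 gives a rough range for a whole site, with all the uncertainty that the input estimates carry.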

Total estimated visitors & ad revenue per month

According to Alexa the site is ranked as the 209,334th most visited site in the world, and the 3,945th most popular site in the Netherlands. Not bad! 59% of the visitors seem to be Dutch and 24% Belgian. This is logical because the site contains mostly Dutch content. Unfortunately Alexa doesn’t have data on the number of visitors for this site.

Another site that estimates traffic data is Informer.com. They estimate that docplayer.nl receives 160,230 unique visitors a month, while ChkWorth.com states they receive 264,753 and SimilarWeb.com states 416,640 visitors a month. The real number is probably somewhere in between.

The estimated advertisement revenue is $988 according to ChkWorth.com. Not a lot.

So who’s behind the site?

Besides a lot of PDF files, the site contains a privacy policy, terms of service and feedback form. The only contact information on the site is found in the terms of service:

And a bit further:

According to the terms of service the website owner is DocPlayer Inc. and based in Virginia. I started googling but couldn’t find a company called DocPlayer Inc. and no one is talking about this company, like it doesn’t exist.

Whois to the rescue!

If you register a domain name, then you have to supply information about who you are and where you live. This information is then submitted to an open domain name ownership registration database which can be queried. Registration information of docplayer.nl revealed that someone called Vladimir Nesterenko living in Moscow is the owner. Doesn’t sound like an American company to me!
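For readers who want to query such a database themselves, here is a minimal sketch of a raw whois lookup. It assumes the plain WHOIS protocol (a text query over TCP port 43, as described in RFC 3912). The function names are made up for this example, and the server hostname is deliberately left as a parameter because it differs per registry.

```python
import socket


def build_whois_query(domain: str) -> bytes:
    """A WHOIS query is just the domain name followed by CRLF."""
    return domain.strip().lower().encode("ascii") + b"\r\n"


def whois_lookup(domain: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query to the given server and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_whois_query(domain))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_lookup("docplayer.nl", server)` with the whois server of the .nl registry returns the registration record as plain text. How much ownership detail is shown depends on the registry and on whether the record is anonymized.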

The website domaintools.com offers the neat possibility (for paid users) to search for which domain names someone owns. So I searched for all the domain names that belong to Vladimir Nesterenko. Together with some further digging a lot of new domain names appeared related to this platform:

That’s quite a lot of domain names! 54 to be precise, in 19 different countries. This enterprise is way bigger than I initially thought!

DocPlayer & SlidePlayer

I started visiting each site and eventually understood that there were two platforms here. One for displaying PDF files called DocPlayer, and one that displays PowerPoint presentations called SlidePlayer.

Each platform spiders websites in a specific country and looks for PDF and PowerPoint files, copies them, and sorts all the files by the language they’re written in. All the Dutch files are published via docplayer.nl, all French content via docplayer.fr, etc. This strategy is excellent for getting these sites to score high in search engines: it’s localized per country, contains a lot of content in the same language and no content is duplicated across these sites.

Obfuscated Google Analytics code

None of the individual sites linked to another. The owner behind the platform took careful steps to mask the international reach of his platform. I even found obfuscated Google Analytics code to hide the analytics IDs and domain names that were in use by Doc-/SlidePlayer:

If you de-obfuscate the JavaScript code above, you’ll see that it contains Google Analytics configuration code, including IDs:

I’ve omitted some of the JavaScript above to not make the list too large. The following Analytics IDs per domain could be extracted from the de-obfuscated code:

I checked if I could fill in the missing IDs in above list by performing a reverse look-up on other sites belonging to Google Analytics account UA-34773609, but unfortunately I couldn’t find anything useful.
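Pulling the IDs out of de-obfuscated code is easy to automate. The sketch below assumes the classic `UA-<account>-<property>` ID format and groups every ID it finds by account number. The function name and the sample snippet, including its IDs and domains, are made up for illustration and are not the site's real configuration.

```python
import re
from collections import defaultdict

# Classic Google Analytics property IDs look like UA-<account>-<property>.
GA_ID_PATTERN = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")


def extract_ga_ids(javascript: str) -> dict[str, list[str]]:
    """Group every UA-style tracking ID found in the text by its account number."""
    by_account: dict[str, list[str]] = defaultdict(list)
    for match in GA_ID_PATTERN.finditer(javascript):
        full_id, account = match.group(0), match.group(1)
        if full_id not in by_account[account]:  # keep each ID once, in order of appearance
            by_account[account].append(full_id)
    return dict(by_account)


if __name__ == "__main__":
    # Hypothetical de-obfuscated snippet; IDs and domains are placeholders.
    sample = (
        "ga('create', 'UA-12345678-1', 'example.nl');"
        "ga('create', 'UA-12345678-2', 'example.fr');"
    )
    print(extract_ga_ids(sample))
```

IDs that share the same account number belong to the same Analytics account, which is what ties separate domains to a single operator.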

Oh, so you want to download a presentation? Hold on!

I explored the SlidePlayer website further. What made me laugh is that if you want to download a PowerPoint file, you first have to click on a share button:

Next, the ‘download’ button becomes active. When you click it, you have to solve a puzzle:

And to finish you need to wait for 60 seconds:

If you’ve made it through the whole process, you’re rewarded with a downloadable PowerPoint file. Now that’s a hell of a customer journey!

Of course these hurdles are there to make you go away. They purposefully create a bad customer experience so you don’t pay attention to the site.

The ideal visitor according to DocPlayer

From the website’s perspective, the ideal visitor comes from Google and lands on a webpage that hosts a PDF or PowerPoint file. Hopefully the visitor clicks on an advertisement surrounding it and leaves the site. The worst-case scenario is that the visitor stays on the site, creates an account and starts using the platform. This will draw attention to the site. Visitors might become aware of what’s going on, and that’s the last thing the owner wants. His platform is full of PDF files that are copied from other sites and re-hosted. This is illegal, and if someone makes a fuss about this, it could be the end of his business.

Reported sensitive documents are taken down

Doc- and SlidePlayer have a complaint form attached to each document. If the spider copied documents from other sites that shouldn’t be on the internet in the first place, the owner of those documents won’t be happy if these documents are re-hosted on the internet by Doc-/SlidePlayer, made searchable by Google and archived by the Internet Archive.

RTL News found out that the staff behind Doc- and SlidePlayer respond quickly to requests to take down sensitive content from the sites. They seem to have absolutely no interest in hosting sensitive files, as this draws negative attention to them that could blow the whole cover of their operation.

And that is exactly what happened: RTL News got a complaint from someone that sensitive tax files were hosted on the site, and they asked me to join the team to get to the bottom of this.

An empire arises

So how many visitors a month and thus how much money are these two platforms generating exactly?

To get a somewhat reliable estimate of the impact of this operation, I started noting down statistics in a spreadsheet, copied from sites such as Google, Alexa, Informer, ChkWorth and SimilarWeb that analyze and track the popularity of websites:

Very interesting statistics arise

  • 45 domain names are in active use in 19 different countries.
  • 42 dedicated servers in Germany run the whole operation.
  • 24.3 million PDF and PowerPoint files are hosted on all sites combined.
  • These sites have at least 12,843 incoming links.
  • 23 to 29 million unique monthly visitors for all the sites combined.
  • Estimated 100 million page views per month.
  • The sites generate a roughly estimated ad revenue of $92,210 each month.
  • slideplayer.com is ranked as the 6,047th and myshared.ru as the 11,806th most visited site in the world. 11 other sites are also ranked in the top 100,000 list.

Looking at the number of dedicated servers that support the infrastructure, and the fact that multiple sources all report roughly the same statistics, it’s safe to say that some serious money is being made with this simple but very scalable and effective infrastructure.

Meet Vladimir Nesterenko

When looking at the ownership information of all the domain names, one name keeps popping up:

Some whois information is anonymized, but most isn’t. The same phone number and address in Moscow are listed on all domains where these properties are visible. According to another source, a public address book at locatefamily.com, someone under the name Vladimir S. Nesterenko is living at Snayperskaya st, 2-1-31 in Moscow. According to Google Street View, it’s an apartment complex far away from the Moscow city center:

A correspondent from RTL News paid him a visit in Moscow, but he wasn’t home. People around there confirmed he lived there.

Back to the terms of agreement and privacy policy

The privacy policy and terms of agreement on Doc-/SlidePlayer are extensive and look professional. I bet they copied those as well! I searched for a few lines from the privacy policy and terms of agreement and found out that they were copied from slideboom.com. They also copied the SlideBoom logo and slightly modified it:

The resemblance between the SlideBoom and Doc-/SlidePlayer logos is remarkable:

I bet our Vladimir got the idea of creating a website that hosts PowerPoint files from visiting SlideBoom.com. But instead of waiting for a long time for users to upload content, Vladimir took the shortcut and just copied all the PowerPoints he could find on the internet.

Has He Been Pwned?

The whois information contained two e-mail addresses: mustaf@list.ru and seorent@gmail.com. I searched for hits on these addresses in known data breaches on haveibeenpwned.com.

mustaf@list.ru is hit in the Exploit.In and VK data breach, and seorent@gmail.com is also hit in the Exploit.In, Onliner Spambot and more interestingly: the Bitcoin Forum and BTC leak. The Bitcoin exchange BTC-E was hacked in 2014 and 568k accounts were exposed. The data included email and IP addresses, wallet balances and hashed passwords.

If you earn $92,210 each month by illegally hosting 24.3 million PDF files, that’s a hard story to sell to the tax authorities. Bitcoin is a much stealthier way of storing wealth, and our Vladimir seems to be well aware of that.

Google: partners in crime

Google is partners in crime with Vladimir. They bring him all the visitors and split the cut in advertisement revenue. Google also profits when people click on their advertisements. They’ve made millions in the last few years hosting their ads on Doc-/SlidePlayer.

RTL News contacted Google spokespersons, but they didn’t want to look into this matter and seem to be fine with the current situation. If nobody complains further, they earn half a million dollars a year, so why take these reports seriously? The media is not law enforcement.

As RTL couldn’t get through, I also tried contacting Google. Their spokesperson doesn’t want to comment on the matter. It seems Google is fine with the situation. Why bother? It’s very profitable!

Rounding up

What started with a few tax files that were hosted on docplayer.nl led to the discovery of an empire that makes a million dollars a year by illegally hosting 24.3 million files copied from other sites. This cover/fake site is an elephant in the room that nobody is aware of.

I think what these guys are doing is wrong. They’re basically stealing 30 million visitors a month from the sites that authored the original content, which adds up to at least 1 million dollars a year stolen from all those websites that got copied.

Now the mystery behind the site is solved, I reported back to RTL. Today they presented our research on Dutch national TV and their website.

Update October 9, 2017

Dutch political party D66 asked Dutch minister to take action
Questions from member Verhoeven (D66) to the Minister of the Interior and Kingdom Relations about the news item ‘Russian whizkid gets rich from your documents’:

  1. Are you familiar with the news item ‘Russian whizkid gets rich from your documents’?
  2. Is it true that the owner obtained the files illegally?
  3. What action do you intend to take against this website?
  4. Are you willing to consult with Google about taking action against this website?
  5. Are you aware that some parliamentary papers (Kamerstukken) link to the website in question? Are you willing to ensure that this no longer happens in the future?

Update October 11, 2017

Another Dutch political party asks minister questions
Questions from member Bruins Slot (CDA) to the Minister of the Interior and Kingdom Relations about the news item ‘Russian whizkid gets rich from your documents’

  1. Have you seen the RTL News item about docplayer.nl?
  2. How is it possible that this site contains several tax returns that include a citizen service number (burgerservicenummer)? To what extent is someone allowed to put another person’s personal information on their own site?
  3. How does Dutch content end up on a site that is apparently run by a Russian?
  4. How great is the danger that the available documents containing citizen service numbers and names lead to identity fraud?
  5. What are the risks that viruses can be spread quickly via this site? How desirable is it that the government also uses this site when making references in parliamentary papers (for example Kamerstuk 34595, nr. 33, p. 5)?
  6. What options does the government have to get unwanted content, such as completed tax returns with a citizen service number or the staff magazine of the intelligence branch of the Military Intelligence and Security Service (Militaire Inlichtingen- en Veiligheidsdienst), removed from the site?
  7. What options do individuals have, where possible supported by the government, to get unwanted content, such as completed tax returns with a citizen service number, removed from the site?
  8. What responsibility can be expected of Google here? Does the Dutch government have ways to get Google to take action? If not, why not?

Source: Sijmen Ruwhof’s blog

About the Author: Dennis Nuijens
