VideoLock increases AI use within its platform to tackle the new wave of sophisticated pirate technology.

2019-07-08

Our systems rely on accurate, up-to-date information about the domains and sites they process. This comes from our in-house 'Sites System', a database of known domains. At a simple level, it stores which domains are infringing, along with specific data about their type. Deeper down, there is a wealth of data about the context of each domain. This contextual data covers the relationships between domains, historic DNS information and traffic information, and extends right down to details of the underlying hosting architecture.
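To make the shape of such a record concrete, here is a minimal sketch of what one entry in a domain database of this kind might look like. It is an illustration only, not the actual Sites System schema: the class name `DomainRecord`, every field name and all sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DomainRecord:
    """Hypothetical record for one known domain (all field names are assumptions)."""
    domain: str
    infringing: bool                      # the "simple level": is the domain infringing?
    site_type: Optional[str] = None       # e.g. "linking", "stream-host"
    related_domains: List[str] = field(default_factory=list)  # relationships to other domains
    dns_history: List[str] = field(default_factory=list)      # previously seen IP addresses
    monthly_visits: Optional[int] = None  # traffic estimate, if known
    hosting: Dict[str, str] = field(default_factory=dict)     # hosting architecture details


# Placeholder values only; the addresses come from the reserved documentation ranges.
record = DomainRecord(
    domain="linking-site.example",
    infringing=True,
    site_type="linking",
    related_domains=["mirror-site.example"],
    dns_history=["192.0.2.10", "198.51.100.7"],
    hosting={"provider": "unknown", "cdn": "unknown"},
)
```

Keeping the contextual fields optional reflects the idea that a newly found domain may start with only the basic classification and gain context over time.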

So many new domains are created daily, primarily due to the wide-scale availability of very cheap, out-of-the-box content-linking sites (DooPlay and Psyplay, for example), that keeping the data accurate and processing newly found domains previously required a vast amount of manual resource.

We tackled this in 2017 by deploying the very first AI to be used within our systems. This technology enabled newly found domains to be processed without delay and with unlimited horizontal scalability. Processing new domains is not as simple as it might first appear, partly because of the broad range of defences that infringing sites and platforms use to obfuscate and mislead automated systems. These defences are continually updated, and their sole purpose is to stop us from achieving our objective.

More recently, VideoLock has seen a growing number of sites use what is best described as a 'sacrificial' content link. This is a plain content link that is very easy to find within the basic page HTML, so an automated system may discover it and move on without finding the others that are hidden. This can happen when a bot expects at least one link, and potentially more, but has no definite expected count. Once it finds one link, it assumes that any others would have been located in the same way. The expected minimum has been met, so it simply moves on and misses those that are hidden. The user, however, sees a player on the web page offering multiple high-quality options to stream the content from many hosts, and the 'sacrificial' link is never shown as an option at all.
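The 'sacrificial' link behaviour can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's actual crawler: the `data-player` attribute, the base64 encoding, the function names and the sample URLs are all hypothetical stand-ins for whatever hiding technique a real site uses.

```python
import base64
import re
from typing import Callable, List

Extractor = Callable[[str], List[str]]


def plain_html_links(html: str) -> List[str]:
    """Links visible as ordinary href attributes in the raw HTML."""
    return re.findall(r'href="(https?://[^"]+)"', html)


def encoded_player_links(html: str) -> List[str]:
    """Links hidden as base64 in a hypothetical data-player attribute."""
    blobs = re.findall(r'data-player="([A-Za-z0-9+/=]+)"', html)
    return [base64.b64decode(blob).decode("utf-8") for blob in blobs]


def naive_collect(html: str, extractors: List[Extractor], minimum: int = 1) -> List[str]:
    """Stop at the first extractor that meets the expected minimum."""
    for extract in extractors:
        links = extract(html)
        if len(links) >= minimum:
            return links  # expectation satisfied, so the bot moves on
    return []


def exhaustive_collect(html: str, extractors: List[Extractor]) -> List[str]:
    """Run every extractor and keep each distinct link once."""
    found: List[str] = []
    for extract in extractors:
        for link in extract(html):
            if link not in found:
                found.append(link)
    return found


def _encode(url: str) -> str:
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


# Hypothetical page: one plain decoy link plus two hidden player options.
SAMPLE_PAGE = (
    '<a href="https://decoy-host.example/v/1">Watch</a>'
    f'<div data-player="{_encode("https://host-a.example/e/abc")}"></div>'
    f'<div data-player="{_encode("https://host-b.example/e/xyz")}"></div>'
)

EXTRACTORS = [plain_html_links, encoded_player_links]
naive = naive_collect(SAMPLE_PAGE, EXTRACTORS)
full = exhaustive_collect(SAMPLE_PAGE, EXTRACTORS)
# Links present in plain HTML but absent from the player are decoy candidates.
player = encoded_player_links(SAMPLE_PAGE)
suspected_sacrificial = [link for link in plain_html_links(SAMPLE_PAGE) if link not in player]
```

In this sketch the naive collector returns only the decoy, while the exhaustive one also finds both player links. Comparing what the plain HTML exposes with what the player offers is one simple way to flag a link as a likely decoy.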

Our analysis shows that this happens quite often on sites using this method of protection. Lumen data shows vendors sending notices for both the actual referring page URL and the 'sacrificial' link, all of which appear in the Lumen database. However, none of the content links shown to the user appear in this data. This suggests that they are simply being missed, in a way similar to that described above.

The leap forward in complexity has happened within a relatively short time. Not long ago, most infringing sites placed direct locker or stream-host content links, often unprotected, within the page HTML. The move away from that approach has made it essential that vendors providing content protection services also adapt and use new technology to tackle this new wave of content distribution technologies. Hybrid platforms are now the most widely used across key high-traffic infringing sites.

At VideoLock, we took our use of AI further and developed a system capable of analysing these sites. Our technologies learn from the various techniques used to hide links or mislead. Our systems can adapt to how multiple players or APIs are used on these sites in layers, and to how other platforms are used as failover options should links be taken down, with those failover options doing the same in turn. At times, a layer further down the chain links back to the top layer or to another higher layer. The layers can work in any arrangement the site operator wishes, so every site can look quite different when you first view it. The AI system has a huge advantage over human resource for this task: it can see the patterns and elements that make a layer uniquely identifiable, even when it is masked by previously unknown domains.
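Because a lower layer can link back to a higher one, any automated walk through these layers has to guard against loops. The following is a minimal sketch of that idea using a breadth-first traversal with a visited set. It is not VideoLock's system: the `embeds` mapping, the function name and all node names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def resolve_layers(start: str, embeds: Dict[str, List[str]]) -> List[str]:
    """Walk players, APIs and failovers from `start`; return the terminal hosts reached.

    `embeds` maps each layer to the layers it points to (an assumed, pre-collected map).
    A node with no outgoing entries is treated as a final content host.
    """
    visited: Set[str] = set()
    queue = deque([start])
    endpoints: List[str] = []
    while queue:
        node = queue.popleft()
        if node in visited:
            continue  # already handled; this is what breaks loops back to higher layers
        visited.add(node)
        children = embeds.get(node, [])
        if not children:
            endpoints.append(node)
        queue.extend(children)
    return endpoints


# Hypothetical layout: api-1 links back to the site, and the failover links back to player-a.
EMBEDS = {
    "site": ["player-a", "player-b"],
    "player-a": ["api-1"],
    "api-1": ["host-1", "site"],
    "player-b": ["failover"],
    "failover": ["host-2", "player-a"],
}

hosts = resolve_layers("site", EMBEDS)
```

Without the visited set, the back-links in this layout would send the walk around the same layers indefinitely. With it, each layer is processed once regardless of how the operator has arranged them.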

What comes from this system is a way not only to tackle each of the layers that contain the players, APIs, proxy-streamed embed players and so on, but also to map every site and the relationships between these sources of content links. This is vital. It is a massive part of what is needed to identify which of the various distribution systems and platforms are harvesting links from others. Our data shows where links are collected from, creating a map of the entities that serve as key hubs in the flow of content links. The whole ecosystem is very cannibalistic in terms of where each entity locates its content links. This introduces a critical vulnerability and a very efficient way to block at the very roots: in systems and sources that are seemingly unrelated to many of the well-known end-user content access entities, including TeaTV, BeeTV, Orion and several Android apps including Vidcorn and Plusdede.
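One way to picture the hub idea is as a directed graph of who harvests links from whom, ranked by how many entities depend on each source directly or indirectly. The sketch below is an assumption-laden illustration, not the actual mapping system: the function name, the pair format and all entity names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def downstream_reach(harvests: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count, for each source, the entities that depend on it directly or indirectly.

    `harvests` holds (consumer, source) pairs: the consumer collects links from the source.
    """
    consumers_of: Dict[str, Set[str]] = defaultdict(set)
    for consumer, source in harvests:
        consumers_of[source].add(consumer)

    reach: Dict[str, int] = {}
    for source in list(consumers_of):
        seen: Set[str] = set()
        stack = [source]
        while stack:
            node = stack.pop()
            for consumer in consumers_of.get(node, ()):
                if consumer != source and consumer not in seen:
                    seen.add(consumer)  # each dependant is counted once, even in cycles
                    stack.append(consumer)
        reach[source] = len(seen)
    return reach


# Hypothetical ecosystem: two apps harvest from an aggregator, which harvests from a root hub.
HARVESTS = [
    ("app-1", "aggregator-x"),
    ("app-2", "aggregator-x"),
    ("aggregator-x", "root-hub"),
    ("site-3", "root-hub"),
]

reach = downstream_reach(HARVESTS)
ranked = sorted(reach, key=reach.get, reverse=True)
```

In this example the root hub ranks first because everything else ultimately draws from it, which is the intuition behind blocking at the roots rather than at each end-user entity.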