September 20, 2023

Statistics can be affected by how you collect data

APNIC has developed a system to analyze the use of DNS resolvers with measurements based on advertising platforms. After February 27, 2022, when Google suspended ads for users from Russia, these measurements changed significantly in terms of their completeness and relevance. This significantly limited the possibility of testing various hypotheses regarding the use of the DNS system by Russian publ
Pavel Khramtsov, head of the DNS project at MSK-IX

Pavel Khramtsov, head of the DNS project at the MSK-IX traffic exchange platform and research project manager at the InData Foundation for Development of Networking Technologies, presented a report titled Statistics of Open Resolvers: Comparison of Outside and Inside View at the TLDCON 2023 conference.

He posed a classic statistical problem: determining which resolvers end users rely on. His example showed that the results of measurements depend heavily on whether the actual conditions of data collection match those described in the methodology.

A DNS resolver is a tool that finds the requested address in the DNS, a distributed information system.

To analyze the use of DNS resolvers, APNIC has developed a measurement system that runs on advertising platforms. It relies on two protocols, HTTP and DNS. A script is placed on an advertising platform such as Google and loaded from APNIC’s HTTP server. The script can determine the end user’s IP address. The script then accesses an APNIC authoritative name server via a DNS resolver, so the authoritative name server can determine the IP address of the resolver. Both sets of data are then entered into a single database, where they are matched and analyzed.
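The matching step can be illustrated with a minimal Python sketch. It is not APNIC's actual implementation: the record layout, the field names, and the idea that both logs share a per-measurement identifier (`measurement_id`) are assumptions made for illustration, and the addresses come from ranges reserved for documentation.

```python
from collections import Counter

# Hypothetical records seen on the HTTP side (end user's address).
# The shared "measurement_id" key is an assumption, not a documented APNIC field.
http_log = [
    {"measurement_id": "m1", "client_ip": "192.0.2.10"},
    {"measurement_id": "m2", "client_ip": "192.0.2.11"},
    {"measurement_id": "m3", "client_ip": "192.0.2.12"},
]

# Hypothetical records seen by the authoritative name server (resolver's address).
dns_log = [
    {"measurement_id": "m1", "resolver_ip": "198.51.100.1"},
    {"measurement_id": "m2", "resolver_ip": "198.51.100.1"},
    {"measurement_id": "m3", "resolver_ip": "203.0.113.53"},
]


def match_clients_to_resolvers(http_records, dns_records):
    """Pair each end-user address with the resolver that carried its DNS query."""
    resolver_by_id = {r["measurement_id"]: r["resolver_ip"] for r in dns_records}
    pairs = []
    for record in http_records:
        resolver = resolver_by_id.get(record["measurement_id"])
        if resolver is not None:  # skip measurements seen on only one side
            pairs.append((record["client_ip"], resolver))
    return pairs


def resolver_usage(pairs):
    """Count how many matched measurements went through each resolver."""
    return Counter(resolver for _, resolver in pairs)


pairs = match_clients_to_resolvers(http_log, dns_log)
usage = resolver_usage(pairs)
print(usage.most_common())
```

The point of the sketch is that a resolver is counted only when a measurement shows up in both logs. If the script stops being delivered to a group of users, their resolvers simply disappear from the tally.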

“But if you look at the final picture, the statistics around the world and in Russia are markedly different. In February 2022, the amount of Russian traffic that reached the APNIC script plummeted to a fraction of what it was,” Pavel Khramtsov argued.

At the same time, Google’s popularity with Russian users has not changed. It was Google that suspended ads for Russian users, and that move affected the work of the APNIC script.

In this regard, a fair question arises as to whether the data that APNIC has collected since February 27, 2022, is still relevant. If the plan was to find out which resolvers are accessed by Russian users, this data set is incomplete.
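A simple sanity check for this kind of problem is to compare the sample volume before and after a suspected change in collection conditions. The Python sketch below is only an illustration: the daily counts are invented, and the 0.5 threshold is an arbitrary assumption, not a value taken from the report or from APNIC.

```python
from datetime import date
from statistics import mean


def sample_volume_ratio(daily_samples, cutoff):
    """Return mean daily sample count on or after the cutoff divided by the mean before it."""
    before = [n for day, n in daily_samples.items() if day < cutoff]
    after = [n for day, n in daily_samples.items() if day >= cutoff]
    if not before or not after:
        raise ValueError("need samples on both sides of the cutoff")
    return mean(after) / mean(before)


# Invented daily sample counts, for illustration only.
daily_samples = {
    date(2022, 2, 25): 1000,
    date(2022, 2, 26): 1000,
    date(2022, 2, 27): 100,
    date(2022, 2, 28): 100,
}

ratio = sample_volume_ratio(daily_samples, cutoff=date(2022, 2, 27))

# Arbitrary threshold (an assumption): flag the data set for review
# if the volume fell to less than half of its earlier level.
is_suspect = ratio < 0.5
print(f"volume ratio: {ratio:.2f}, needs review: {is_suspect}")
```

A low ratio does not by itself prove that the sample is unrepresentative, but it signals that the collection conditions may no longer match the methodology and that the results should be checked against other sources.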

“When we analyze any data, it is always necessary to ask ourselves whether the methods we use are applicable; whether the sample is representative; and whether we have enough sources of measurements to get reliable data,” Pavel Khramtsov summed up.
