DNS Data - Satellite¶
Satellite/Iris is Censored Planet’s remote measurement technique that detects DNS interference using open DNS resolvers. Below, we provide an overview of Satellite and its data format. Refer to our academic papers for in-depth details about Satellite.
Satellite-v2.2-raw¶
To provide raw data for easy data analysis, we made the following changes:
Split data based on the country of resolvers so that it is easier to select and download data according to users’ country of interest.
Separated the data collection phase from the data analysis phase. The Satellite data currently published on our raw measurement data website is faithful to the data as collected, without further analysis. We deprecated the “anomaly” field because it was often misread as indicating censorship.
Added new data containing further metadata fields and flattened nested data for easy analysis. Modified field names for disambiguation purposes.
domain
String. The test domain being queried.
domain_is_control
Boolean. Equals true if the queried domain is the root server used for the liveness test.
test_url
String. The URL of the queried domain.
date
String. The date of the measurement.
start_time
String. The start time of the measurement.
end_time
String. The end time of the measurement.
resolver_ip
String. The IP address of the vantage point (a DNS resolver).
resolver_name
String. The hostname of the vantage point.
resolver_is_trusted
Boolean. Equals true if the resolver is a control resolver.
resolver_netblock
String. The netblock the vantage point belongs to.
resolver_asn
String. The AS number of the AS the vantage point resides in.
resolver_as_name
String. The name of the AS the vantage point resides in.
resolver_as_full_name
String. The full name of the AS the vantage point resides in.
resolver_as_class
String. The class of the AS the vantage point resides in.
resolver_country
String. The country the vantage point resides in.
resolver_organization
String. The IP organization the vantage point resides in.
received_error
String. Flattened error messages from the received responses.
received_rcode
Integer. Flattened rcode from the received responses. The response code maps to success (0) or errors (-1 for connection error, > 0 for errors specified in RFC 2929).
source
String. Tar file name of the measurement.
answers
JSON object. The resolver’s returned answers for the queried domain.
ip
: String. Returned IP.
asn
: String. The AS number of the AS the returned IP resides in.
as_name
: String. The AS name of the AS the returned IP resides in.
censys_http_body_hash
: String. The hash of the HTTP body from Censys.
censys_ip_cert
: String. The hash of the TLS certificate from Censys.
http_error
: String. Parsed HTTP page error message from the fetch module.
http_response_status
: String. Parsed HTTP page status code from the fetch module.
http_response_headers
: String. Parsed HTTP page headers from the fetch module.
http_response_body
: String. Parsed HTTP page body from the fetch module.
https_error
: String. Parsed HTTPS page error message from the fetch module.
https_response_status
: String. Parsed HTTPS page status code from the fetch module.
https_response_headers
: String. Parsed HTTPS page headers from the fetch module.
https_response_body
: String. Parsed HTTPS page body from the fetch module.
https_tls_version
: String. Parsed TLS version from the fetch module.
https_tls_cipher_suite
: String. Parsed TLS cipher suite from the fetch module.
https_tls_cert
: String. Parsed TLS certificate from the fetch module.
https_tls_cert_common_name
: String. Parsed common name field from the TLS certificate.
https_tls_cert_alternative_names
: String. Parsed alternative name field from the TLS certificate.
https_tls_cert_issuer
: String. Parsed issuer field from the TLS certificate.
https_tls_cert_start_date
: String. Parsed start date of the TLS certificate.
https_tls_cert_end_date
: String. Parsed end date of the TLS certificate.
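As a rough illustration of working with the flattened fields above, the sketch below tallies received_rcode values per resolver_country while skipping control domains and trusted resolvers. It assumes the raw data is stored as one JSON object per line, which this page does not confirm; the function name and the sample rows are invented for the example.

```python
import json
from collections import Counter


def rcode_counts_by_country(lines):
    """Count received_rcode values per resolver_country.

    Assumes (hypothetically) one JSON object per line, using the
    field names listed in the Satellite-v2.2-raw section.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        # Skip liveness-test domains and control (trusted) resolvers.
        if row.get("domain_is_control") or row.get("resolver_is_trusted"):
            continue
        country = row.get("resolver_country")
        counts.setdefault(country, Counter())[row.get("received_rcode")] += 1
    return counts


# Invented sample rows, for illustration only.
sample_lines = [
    json.dumps({"domain": "example.com", "domain_is_control": False,
                "resolver_is_trusted": False, "resolver_country": "US",
                "received_rcode": 0}),
    json.dumps({"domain": "example.org", "domain_is_control": False,
                "resolver_is_trusted": False, "resolver_country": "CN",
                "received_rcode": 2}),
    json.dumps({"domain": "example.org", "domain_is_control": False,
                "resolver_is_trusted": True, "resolver_country": "DE",
                "received_rcode": 0}),
    json.dumps({"domain": "a.root-servers.net", "domain_is_control": True,
                "resolver_is_trusted": False, "resolver_country": "US",
                "received_rcode": 0}),
]

counts = rcode_counts_by_country(sample_lines)
```

A non-zero or -1 count in this tally is only a starting point for analysis; as noted elsewhere on this page, the raw data alone does not establish censorship.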
Satellite-v1 (deprecated)¶
Figure - Overview of Satellite-v1
Satellite-v1 is the first version of Satellite, which we operated from August 2018 to February 2021. The primary function of Satellite is to detect incorrect DNS resolutions from open DNS resolvers in many countries.
From a measurement machine at the University of Michigan, we send a DNS query for a website whose reachability we’re interested in, to an open DNS resolver in a country of interest (1). The response from the DNS resolver is our Test IP (2).
We also send a DNS query for the same website to trusted control resolvers (3), and record their response as the control IP (4).
We then compare the test and control responses using several heuristics, including a direct IP address comparison, and comparison of the AS number, AS names, HTTP content hashes, and TLS certificates associated with the test and control IP addresses (5). Satellite-v1 only labels a measurement as an anomaly when all of the heuristics mismatch.
Our various publications and reports have used Satellite-v1 to detect many cases of DNS manipulation. For instance, in our recent investigation into the filtering of COVID-19 websites, Satellite-v1 found many networks using website filtering products to manipulate DNS responses of COVID-related websites.
Limitations¶
Although Satellite-v1 was extremely useful in detecting DNS interference at large scale, it suffered from several limitations, which motivated the improvements in Satellite-v2.x.
Satellite-v1 could not detect DNS censorship where A records were not available. It primarily focused on detecting incorrect DNS resolutions through the resolved IP address, and did not contain heuristics to measure DNS manipulation that manifested through timeouts, NXDOMAIN responses, SERVFAIL responses, etc.
Satellite-v1 required post-processing to remove false positives and confirm the presence of anomalies, such as through using post-measurement heuristics and blockpage regexes. Satellite-v2 has the inbuilt capability to perform most post-processing measurements.
Satellite-v2 (deprecated)¶
Figure - Overview of Satellite-v2
Satellite-v2 is our new version of Satellite, in which we’ve made several modifications to the measurement technique and data format to facilitate accurate and efficient remote DNS interference measurements. Below, we detail the major changes we’ve made in Satellite-v2.
Fetching HTML pages hosted at resolved IPs marked as an anomaly - Satellite-v2 has an in-built fetch feature that performs HTTP and HTTPS GET requests to resolved IPs that fail our heuristics. In Satellite-v1, this was performed as a post-processing step. This addition helps in quickly identifying blockpages such as the example shown in the figure below. Moreover, we are in the process of developing a technique to use TLS certificates to detect DNS manipulation. Reach out to censoredplanet@umich.edu for more information.
Measuring DNS interference without A records - In Satellite-v2, we have added a sandwiched retry mechanism to our Satellite measurements in order to detect DNS interference that results in a non-zero rcode response. A description of the method is shown in the figure below. We first make a control query to the open DNS resolver, providing a domain name that we do not expect to be blocked (e.g. www.example.com). After the control query, we make up to 4 attempts of the test DNS query, providing the test domain name. If an A record is detected, we stop the test measurement. At the end, we perform another control query similar to the first one. The control queries ensure that the resolver is behaving correctly for an innocuous domain, and the multiple attempts account for temporary errors in the network. With the help of the sandwiched retry mechanism, Satellite-v2 is able to detect DNS interference that manifests as timeouts, NXDOMAIN, SERVFAIL, etc. From our preliminary analysis of Satellite-v2 data, we’ve already found several cases of DNS interference that can be identified using this method. For example, from the Satellite-v2 scan performed on 2021-03-17, we are able to identify 174,795 responses from China that have non-zero rcodes, which make up 15.6% of the responses marked as interference. This kind of DNS interference was previously missed by Satellite-v1. Shown below is an example measurement that passed the sandwich control tests but received a server failure rcode. This could be an indicator of censorship or geoblocking.
Adding scan-level heuristics to exclude false positives - This is another step from the post-processing pipeline of Satellite-v1 that is now built into Satellite-v2. We exclude potentially false positive anomalies by using scan-level heuristics, such as the number of domains resolving to the anomalous IP address, or the anomalous IP address being part of a big CDN. Note that this step may lead to Satellite-v2 missing certain censorship.
Other changes - We updated the heuristics that determine whether a DNS response has been interfered with. Satellite-v2 now includes a new “confidence” field, which expresses the certainty of interference according to the state of comparison between responses from the test resolvers and the control resolvers. We also make sure that IPs with no metadata information from Censys are not marked as interference.
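The sandwiched retry mechanism described above can be sketched as follows. This is a minimal illustration, not Satellite’s implementation: the function name, the query callback, and the result dictionary keys are all hypothetical. Stopping after a failed first control query mirrors the rule stated for the query step later on this page, and issuing the closing control query unconditionally is an assumption.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical callback: takes (resolver_ip, domain) and returns a dict with
# "rcode" (int, -1 for a connection error) and "has_type_a" (bool).
QueryFn = Callable[[str, str], Dict]


def sandwiched_query(resolver: str, test_domain: str, query: QueryFn,
                     control_domain: str = "www.example.com",
                     max_attempts: int = 4) -> Dict:
    """Control query, up to max_attempts test queries, then a control query."""
    trials: List[Dict] = []
    control_after: Optional[Dict] = None

    control_before = query(resolver, control_domain)
    if control_before["has_type_a"]:
        for _ in range(max_attempts):
            result = query(resolver, test_domain)
            trials.append(result)
            if result["has_type_a"]:
                break  # an A record was returned, so stop retrying
        # Assumption: the closing control query always runs.
        control_after = query(resolver, control_domain)

    return {"control_before": control_before,
            "trials": trials,
            "control_after": control_after}


def fake_query(resolver: str, domain: str) -> Dict:
    """Stand-in resolver: answers the control domain, SERVFAILs the rest."""
    if domain == "www.example.com":
        return {"rcode": 0, "has_type_a": True}
    return {"rcode": 2, "has_type_a": False}


outcome = sandwiched_query("192.0.2.1", "blocked.example", fake_query)
```

In this stand-in scenario both control queries succeed while all four test attempts return a non-zero rcode, which is the pattern the mechanism is designed to surface.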
Satellite-v2.1¶
Satellite-v2.1 incorporates minor changes from Satellite-v2.0, starting after April 14, 2021. Most of these changes relate to data formats.
Satellite-v2.2¶
Satellite-v2.2 incorporates major changes in code and data structure from Satellite-v2.1, but no major changes in the functionality of Satellite. The changes took effect after June 7, 2021, and include:
Stored information generated by the query, tag, detect, and verify modules in memory, producing a single output file (results.json) instead of one output per module. Renamed query-tag-detect-verify to the “test” module, and probe-filter to “discovery”.
Updated the test module so that it first queries the control resolvers, and then queries, tags, and runs detection on the test resolvers in batches.
Satellite v2 is divided into three parts:
- discovery: consists of the probe and filter modules.
- test: consists of the query, tag, and detect modules.
- verification and blockpage fetching: consists of fetch and verify.
Generate a DNS A query packet for a controlled domain (dns.pkt).
Perform a ZMap (Internet-wide) scan with the probe packet for open DNS resolvers.
resolvers_raw.json contains the ZMap output:
saddr
String. IP address of a DNS resolver.
data
String. Raw response to probe domain.
Perform PTR queries on the IPs of resolvers found by ZMap and filter out the ones without PTR records.
Perform a liveness test on the infrastructural resolvers and filter out the ones that fail.
Add predefined “control” and “special” resolvers to form the final set of vantage points.
Tag each resolver with the location from MaxMind.
resolvers.json contains the infrastructure, “control”, and “special” resolvers.
vp
String. The IP address of the vantage point (a DNS resolver).
name
String. Result from PTR query (if infrastructure), “control”, or “special”.
location
JSON object.
country_name
: String. The full name of the country where the resolver is located.
country_code
: String. The two-letter ISO 3166 code of the country where the resolver is located.
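To make the resolvers.json layout concrete, here is a small sketch that groups infrastructure vantage points by country code and sets aside the “control” and “special” entries. It assumes one JSON object per line, which is not stated on this page; the helper name and the sample entries are invented.

```python
import json
from collections import defaultdict


def group_infrastructure_resolvers(lines):
    """Map country_code -> list of vantage point IPs, skipping the
    predefined "control" and "special" resolvers.

    Assumes (hypothetically) one JSON object per line.
    """
    by_country = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("name") in ("control", "special"):
            continue
        # location may be missing or empty if geolocation failed.
        code = (entry.get("location") or {}).get("country_code", "")
        by_country[code].append(entry["vp"])
    return dict(by_country)


# Invented sample entries, for illustration only.
resolver_lines = [
    json.dumps({"vp": "192.0.2.10", "name": "ns1.example.net",
                "location": {"country_name": "United States",
                             "country_code": "US"}}),
    json.dumps({"vp": "198.51.100.20", "name": "ns.example.de",
                "location": {"country_name": "Germany",
                             "country_code": "DE"}}),
    json.dumps({"vp": "203.0.113.1", "name": "control",
                "location": {"country_name": "United States",
                             "country_code": "US"}}),
]

grouped = group_infrastructure_resolvers(resolver_lines)
```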
Make DNS queries for each test domain to each resolver. The query for the test domain is attempted up to four times in case of connection error. To check the status of the resolver, a control measurement is conducted before the queries for the test domain. If the first control measurement fails, no further measurements will be conducted for the same (resolver, domain) pair. If all 4 trials for the test domain fail, another control measurement will be conducted.
Parse and separate responses from control resolvers and non-control resolvers.
Tag each answer IP with information from Censys.
Note: Fields may have empty strings if the information was not available on Censys.
Compare query responses between non-control resolvers and control resolvers to identify interference. When running Satellite v2 as a whole module, detect does not output any files. However, when run separately, detect outputs results.json with the excluded field set to false and the excluded_reason field set to null by default. (See the output structure in the verify section.)
Note: For each response, the answer IPs and their tags are compared to the set of answer IPs and tags from all the control resolvers for the same query domain. A response is classified as an anomaly if there is no overlap between the two.
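The overlap rule in the note above can be sketched as a small function. This is an illustrative reading of the rule rather than the detect module’s actual code: the function name and argument shapes are hypothetical, the tag keys are borrowed from the results.json description, and returning false when no tags are available follows the conservative behavior described for the anomaly field.

```python
from typing import Dict, Set

# Tag names borrowed from the results.json description on this page.
TAG_KEYS = ("http", "cert", "asname", "asnum")


def is_anomaly(answers: Dict[str, Dict], control_ips: Set[str],
               control_tags: Dict[str, Set]) -> bool:
    """Return True only if no answer IP or tag overlaps the control set.

    answers maps each answer IP to its tags; control_tags maps each tag
    name to the set of values seen across all control resolvers.
    """
    if not any(control_tags.get(key) for key in TAG_KEYS):
        return False  # no control tags to compare against
    answer_has_tags = False
    for ip, tags in answers.items():
        if ip in control_ips:
            return False  # direct IP match with a control answer
        for key in TAG_KEYS:
            value = tags.get(key)
            if value in ("", None):
                continue  # tag unavailable for this IP
            answer_has_tags = True
            if value in control_tags.get(key, set()):
                return False  # at least one tag matches the control set
    # With no tags on any answer, conservatively report no anomaly.
    return answer_has_tags


# Invented control data, using documentation-reserved IPs and AS numbers.
control_ips = {"192.0.2.80"}
control_tags = {"http": {"hash-a"}, "cert": {"hash-b"},
                "asname": {"EXAMPLE-CDN"}, "asnum": {64496}}

mismatch = is_anomaly(
    {"203.0.113.7": {"http": "hash-z", "cert": "", "asname": "OTHER-NET",
                     "asnum": 64500}},
    control_ips, control_tags)
same_as = is_anomaly(
    {"203.0.113.8": {"http": "", "cert": "", "asname": "EXAMPLE-CDN",
                     "asnum": 64496}},
    control_ips, control_tags)
```

Here the first response shares nothing with the control set and is flagged, while the second shares an AS with the control answers and is not.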
Perform HTTP(S) GET requests to the IPs identified as anomalies.
blockpages.json contains the responses:
ip
String. The IP address from an anomalous DNS response.
keyword
String. The domain queried for the anomalous DNS response.
http
Object. HTTP response.
https
Object. HTTPS response.
fetched
Boolean. Equals true if a page is successfully fetched.
start_time
String. The start time of the measurement.
end_time
String. The end time of the measurement.
Apply new heuristics to exclude possible cases of erroneous answers from resolvers. Currently, verify excludes answer IPs that are part of big CDNs (Note: this could lead to false negatives) and answer IPs that appear for a low number of domains (<=2).
results.json contains all the information when running full mode.
vp
String. The IP address of the vantage point (a DNS resolver).
test_url
String. The test domain being queried.
location
JSON object.
country_name
: String. The full name of the country where the resolver is located.
country_code
: String. The two-letter ISO 3166 code of the country where the resolver is located.
passed_liveness
Boolean. Equals false if both control queries were unsuccessful.
in_control_group
Boolean. Equals true if at least one control resolver had a valid response for this test domain.
connect_error
Boolean. Equals true if all test domain query attempts returned errors. This field is also set to true if the first control measurement fails and no further measurements for the test domain are conducted. Use this field in conjunction with the passed_liveness field to find anomalies.
anomaly
Boolean. Equals true if an anomaly is detected. If there are no tags for the answers or the control, this field is conservatively marked as false.
start_time
String. The start time of the measurement.
end_time
String. The end time of the measurement.
response
: JSON object. The resolver’s returned answers for the queried domain are the keys.
url
: String. The domain being queried in this trial, either the control domain for the liveness test or test_url. The liveness test DNS responses are only recorded if they do not contain a type-A RR.
has_type_a
: Boolean. Equals true if the query returned a valid A resource record.
error
: String. Contains error information.
rcode
: Integer. Response code mapping to success (0) or errors (-1 for connection error, > 0 for errors specified in RFC 2929).
response
: JSON object. Consists of a map between the IPs the resolver returned for the queried domain and their tags from MaxMind:
http
String. The hash of the HTTP body.
cert
String. The hash of the TLS certificate.
asname
String. The autonomous system (AS) name.
asnum
Integer. The autonomous system (AS) number.
matched
Array. An array of its tags that matched the control tags. If the IP is in the control set, “ip” is appended, and if the IP has no tags, “no_tags” is appended.
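One way to read the passed_liveness, connect_error, and anomaly fields together is sketched below. The function name and the category labels are hypothetical and are not part of the Satellite output; the ordering of the checks is a judgment call based on the field descriptions above.

```python
def triage(record: dict) -> str:
    """Assign a coarse, hypothetical label to one results.json record."""
    if not record.get("passed_liveness", False):
        # Both control queries failed: the resolver itself looks unreliable.
        return "resolver_unreliable"
    if record.get("connect_error", False):
        # The resolver answered a control query, but every attempt for the
        # test domain returned an error.
        return "test_queries_failed"
    if record.get("anomaly", False):
        # Answers were returned but did not overlap the control set.
        return "answer_mismatch"
    return "no_signal"


# Invented records, for illustration only.
labels = [
    triage({"passed_liveness": False, "connect_error": True,
            "anomaly": False}),
    triage({"passed_liveness": True, "connect_error": True,
            "anomaly": False}),
    triage({"passed_liveness": True, "connect_error": False,
            "anomaly": True}),
    triage({"passed_liveness": True, "connect_error": False,
            "anomaly": False}),
]
```

Checking passed_liveness first keeps records from unreliable resolvers out of the other buckets, since connect_error is also set when the first control measurement fails.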
Notes¶
While Satellite includes multiple control resolvers intended to avoid false inferences, there is still a possibility that certain measurements are incorrectly marked as anomalies. To confirm censorship, it is critical that the raw DNS responses are compared to known blockpage fingerprints. The blockpage fingerprints currently recorded by Censored Planet are available here. Moreover, aggregations can be used to avoid anomalous vantage points and domains. Please use our analysis pipeline to process the data before using it.
Censored Planet detects network interference of websites using remote measurements to infrastructural vantage points within networks (e.g. institutions). Note that this raw data cannot determine the entity responsible for the blocking or the intent behind it. Please exercise caution when using the data, and reach out to us at censoredplanet@umich.edu if you have any questions.