DNS Data - Satellite

Satellite/Iris is Censored Planet’s remote measurement technique that detects DNS interference using open DNS resolvers. Below, we provide an overview of Satellite and its data format. Refer to our academic papers for in-depth details about Satellite.

Satellite-v2.2-raw

To make the raw data easier to analyze, we made the following changes:

  1. Split data based on the country of resolvers so that it is easier to select and download data according to users’ country of interest.

  2. Separated the data collection phase from the data analysis phase. The Satellite data on our raw measurement data website now reflects the data as collected, without further analysis. We deprecated the “anomaly” field because it was often misunderstood as representing censorship.

  3. Added further metadata fields and flattened nested data for easier analysis, and renamed fields to remove ambiguity. The resulting fields are:

    • domain: String

      The test domain being queried.

    • domain_is_control: Boolean

      Equals true if the queried domain is the root server for the liveness test.

    • test_url: String

      The URL of the queried domain.

    • date: String

      The date of the measurement.

    • start_time: String

      The start time of the measurement.

    • end_time: String

      The end time of the measurement.

    • resolver_ip: String

      The IP address of the vantage point (a DNS resolver).

    • resolver_name: String

      The hostname of the vantage point.

    • resolver_is_trusted: Boolean

      Equals true if the resolver is a control resolver.

    • resolver_netblock: String

      The netblock the vantage point belongs to.

    • resolver_asn: String

      The AS number of the AS the vantage point resides in.

    • resolver_as_name: String

      The name of the AS the vantage point resides in.

    • resolver_as_full_name: String

      The full name of the AS the vantage point resides in.

    • resolver_as_class: String

      The class of the AS the vantage point resides in.

    • resolver_country: String

      The country the vantage point resides in.

    • resolver_organization: String

      The IP organization the vantage point resides in.

    • received_error: String

      Flattened error messages from the received responses.

    • received_rcode: Integer

      Flattened rcode from the received responses. The response code maps to success (0) or errors (-1 for connection error, > 0 for errors specified in RFC 2929).

    • source: String

      Tar file name of the measurement.

    • answers: JSON object

      The resolver’s returned answers for the queried domain.

      • ip: String

        Returned IP.

      • asn: String

        The AS number of the AS the returned IP resides in.

      • as_name: String

        The AS name of the AS the returned IP resides in.

      • censys_http_body_hash: String

        The hash of the HTTP body from Censys.

      • censys_ip_cert: String

        The hash of the TLS certificate from Censys.

      • http_error: String

        Parsed HTTP page error message from fetch module.

      • http_response_status: String

        Parsed HTTP page status code from fetch module.

      • http_response_headers: String

        Parsed HTTP page headers from fetch module.

      • http_response_body: String

        Parsed HTTP page body from fetch module.

      • https_error: String

        Parsed HTTPS page error message from fetch module.

      • https_response_status: String

        Parsed HTTPS page status code from fetch module.

      • https_response_headers: String

        Parsed HTTPS page headers from fetch module.

      • https_response_body: String

        Parsed HTTPS page body from fetch module.

      • https_tls_version: String

        Parsed TLS version from fetch module.

      • https_tls_cipher_suite: String

        Parsed TLS cipher suite from fetch module.

      • https_tls_cert: String

        Parsed TLS certificate from fetch module.

      • https_tls_cert_common_name: String

        Parsed common name field from TLS certificate.

      • https_tls_cert_alternative_names: String

        Parsed alternative name field from TLS certificate.

      • https_tls_cert_issuer: String

        Parsed issuer field from TLS certificate.

      • https_tls_cert_start_date: String

        Parsed start date of the TLS certificate.

      • https_tls_cert_end_date: String

        Parsed end date of the TLS certificate.
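
As a rough illustration of working with the fields above, the following sketch tallies responses by `received_rcode` for each resolver country, skipping control domains and trusted resolvers. It assumes the records are stored as one JSON object per line, which this page does not state. The sample records and the `describe_rcode` and `rcode_counts` helpers are hypothetical.

```python
import json
from collections import Counter

# Hypothetical sample records; only fields documented above are used.
SAMPLE = """\
{"domain": "example.com", "domain_is_control": false, "resolver_country": "US", "resolver_is_trusted": false, "received_rcode": 0}
{"domain": "example.com", "domain_is_control": false, "resolver_country": "CN", "resolver_is_trusted": false, "received_rcode": 2}
{"domain": "example.com", "domain_is_control": false, "resolver_country": "CN", "resolver_is_trusted": false, "received_rcode": -1}
{"domain": "a.root-servers.net", "domain_is_control": true, "resolver_country": "CN", "resolver_is_trusted": false, "received_rcode": 0}
"""


def describe_rcode(rcode):
    """Label an rcode using the mapping given for received_rcode."""
    if rcode == 0:
        return "success"
    if rcode == -1:
        return "connection_error"
    return "dns_error"


def rcode_counts(lines):
    """Count (country, label) pairs for test-domain rows from untrusted resolvers."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        # Skip liveness-test rows and rows from control resolvers.
        if row["domain_is_control"] or row["resolver_is_trusted"]:
            continue
        label = describe_rcode(row["received_rcode"])
        counts[(row["resolver_country"], label)] += 1
    return counts


counts = rcode_counts(SAMPLE.splitlines())
for (country, label), n in sorted(counts.items()):
    print(country, label, n)
```

A non-zero count for `dns_error` or `connection_error` is only a starting point for analysis; on its own it does not indicate censorship.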

Satellite-v1 (deprecated)

Figure - Overview of Satellite-v1

Satellite-v1 is the first version of Satellite, which we operated from August 2018 to February 2021. The primary function of Satellite is to detect incorrect DNS resolutions from open DNS resolvers in many countries.

  • From a measurement machine at the University of Michigan, we send a DNS query for a website whose reachability we’re interested in to an open DNS resolver in a country of interest (1). The response from the DNS resolver is our test IP (2).

  • We also send a DNS query for the same website to trusted control resolvers (3), and record their response as the control IP (4).

  • We then compare the test and control responses using several heuristics, including a direct IP address comparison, and comparison of the AS number, AS names, HTTP content hashes, and TLS certificates associated with the test and control IP addresses (5). Satellite-v1 only labels a measurement as an anomaly when all of the heuristics mismatch.
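
The comparison in the last step can be sketched as follows. This is a minimal illustration of the "anomaly only when all heuristics mismatch" rule, not the Satellite-v1 implementation: the tag names in `HEURISTICS`, the `is_anomaly_v1` function, and the sample values are hypothetical, and a single control response stands in for the set of control resolvers.

```python
# Hypothetical tag names; the text lists these heuristics but not a schema.
HEURISTICS = ("ip", "asn", "as_name", "http_hash", "cert_hash")


def is_anomaly_v1(test, control):
    """Return True only when every heuristic differs between test and control."""
    return all(test.get(key) != control.get(key) for key in HEURISTICS)


control = {"ip": "192.0.2.10", "asn": 64500, "as_name": "EXAMPLE-AS",
           "http_hash": "aaa", "cert_hash": "bbb"}

# Different IP and content hashes but the same AS: two heuristics match,
# so the measurement is not labeled an anomaly.
same_network = dict(control, ip="192.0.2.11", http_hash="ccc", cert_hash="ddd")

# Nothing matches the control response: labeled an anomaly.
unrelated = {"ip": "203.0.113.7", "asn": 64511, "as_name": "OTHER-AS",
             "http_hash": "eee", "cert_hash": "fff"}

print(is_anomaly_v1(same_network, control))
print(is_anomaly_v1(unrelated, control))
```

Requiring every heuristic to mismatch keeps false positives low, for example when a CDN returns different IPs within the same AS, at the cost of missing interference that happens to share one attribute with the control.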

Our publications and reports have used Satellite-v1 to detect many cases of DNS manipulation. For instance, in our recent investigation into the filtering of COVID-19 websites, Satellite-v1 found many networks using website filtering products to manipulate DNS responses for COVID-related websites.

Limitations

Although Satellite-v1 was extremely useful in detecting DNS interference at large scale, it suffered from several limitations, which motivated the improvements in Satellite-v2.x.

  • Satellite-v1 could not detect DNS censorship where A records were not available. It primarily focused on detecting incorrect DNS resolutions through the resolved IP address, and had no heuristics to measure DNS manipulation that manifests as timeouts, NXDOMAIN responses, SERVFAIL responses, etc.

  • Satellite-v1 required post-processing to remove false positives and confirm the presence of anomalies, for example through post-measurement heuristics and blockpage regexes. Satellite-v2 has the built-in capability to perform most of this post-processing.

Satellite-v2 (deprecated)

Figure - Overview of Satellite-v2.0

Satellite-v2 is a redesigned version of Satellite, in which we’ve made several modifications to the measurement technique and data format to facilitate accurate and efficient remote DNS interference measurements. Below, we detail the major changes we’ve made in Satellite-v2.

  • Fetching HTML pages hosted at resolved IPs marked as an anomaly - Satellite-v2 has a built-in fetch feature that performs HTTP and HTTPS GET requests to resolved IPs that fail our heuristics. In Satellite-v1, this was a post-processing step. This addition helps in quickly identifying blockpages such as the example shown in the figure below. Moreover, we are in the process of developing a technique to use TLS certificates to detect DNS manipulation. Reach out to censoredplanet@umich.edu for more information.

  • Measuring DNS interference without A records - In Satellite-v2, we have added a sandwiched retry mechanism to detect DNS interference that results in a non-zero rcode response. A description of the method is shown in the figure below. We first make a control query to the open DNS resolver for a domain name that we do not expect to be blocked (e.g. www.example.com). After the control query, we make up to 4 attempts at the test DNS query for the test domain name, stopping as soon as an A record is returned. At the end, we perform another control query identical to the first. The control queries ensure that the resolver is behaving correctly for an innocuous domain, and the retries account for temporary errors in the network. With the help of the sandwiched retry mechanism, Satellite-v2 is able to detect DNS interference that manifests as timeouts, NXDOMAIN, SERVFAIL, etc. Our preliminary analysis of Satellite-v2 data has already found several cases of DNS interference that can be identified using this method. For example, in the Satellite-v2 scan performed on 2021-03-17, we identified 174,795 responses from China with non-zero rcodes, which make up 15.6% of the responses marked as interference. Satellite-v1 previously missed this kind of DNS interference. Shown below is an example measurement that passed the sandwich control tests but received a server failure rcode. This could be an indicator of censorship or geoblocking.

  • Adding scan-level heuristics to exclude false positives - This is another step from the post-processing pipeline of Satellite-v1 that is now built into Satellite-v2. We exclude potential false-positive anomalies using scan-level heuristics, such as the number of domains resolving to the anomalous IP address, or whether the anomalous IP address is part of a big CDN. Note that this step may lead to Satellite-v2 missing certain censorship.

  • Other changes - We updated the heuristics that determine whether a DNS response has been interfered with. Satellite-v2 now includes a new “confidence” field, which expresses the certainty of interference based on the comparison between responses from the test resolvers and the control resolvers. We also make sure that IPs with no metadata information from Censys are not marked as interference.
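
The sandwiched retry mechanism described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions rather than Satellite's code: the `Answer` class, `sandwiched_query`, and `fake_query` are hypothetical names, and the DNS lookup is replaced by an injectable function so the example runs without network access.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    rcode: int                                 # 0 success, -1 connection error, > 0 DNS error
    ips: list = field(default_factory=list)    # A records returned, if any

    @property
    def has_type_a(self):
        return self.rcode == 0 and bool(self.ips)


def sandwiched_query(query, resolver, test_domain,
                     control_domain="www.example.com", max_tries=4):
    """Control query, up to max_tries test queries, then a closing control query."""
    trials = [("control", query(resolver, control_domain))]
    for _ in range(max_tries):
        answer = query(resolver, test_domain)
        trials.append(("test", answer))
        if answer.has_type_a:                  # stop retrying once an A record arrives
            break
    trials.append(("control", query(resolver, control_domain)))
    return trials


def fake_query(resolver, domain):
    # Hypothetical stand-in for a real DNS lookup: the control domain resolves,
    # while the test domain always receives SERVFAIL (rcode 2).
    if domain == "www.example.com":
        return Answer(rcode=0, ips=["192.0.2.1"])
    return Answer(rcode=2)


trials = sandwiched_query(fake_query, "198.51.100.53", "blocked.example")
controls_ok = all(a.has_type_a for kind, a in trials if kind == "control")
test_failed = all(not a.has_type_a for kind, a in trials if kind == "test")
print(len(trials), controls_ok and test_failed)
```

A measurement where both control queries succeed but every test attempt fails is the pattern of interest here, since the failure cannot be attributed to the resolver being down.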

Satellite-v2.1

Satellite-v2.1, in use after April 14, 2021, incorporates minor changes to Satellite-v2.0. Most of these changes relate to data formats.

Satellite-v2.2

Satellite-v2.2 incorporates major changes in code and data structure from Satellite-v2.1, but no major changes in the functionality of Satellite. The changes took effect after June 7, 2021, and include:

  • Information generated by the query, tag, detect, and verify modules is now stored in memory, producing only one file (results.json) as output instead of separate outputs for every module. Query-tag-detect-verify was renamed the “test” module, and probe-filter was renamed “discovery”.

  • The test module was updated so that it first queries the control resolvers, and then queries, tags, and runs detection on the test resolvers in batches.

Satellite-v2 is divided into three parts:

  1. discovery: consists of the probe and filter modules.

  2. test: consists of the query, tag and detect modules.

  3. verification and blockpage fetching: consists of the fetch and verify modules.

Probe

  1. Generate a DNS A query packet for a controlled domain (dns.pkt).

  2. Perform a ZMap (Internet-wide) scan with the probe packet for open DNS resolvers.

    resolvers_raw.json contains the ZMap output:

    • saddr: String

      IP address of a DNS resolver.

    • data: String

      Raw response to probe domain.

Filter

  1. Perform PTR queries on the IPs of the resolvers found by ZMap and filter out those without PTR records.

  2. Perform a liveness test on the infrastructural resolvers and filter out those that fail.

  3. Add predefined “control” and “special” resolvers to form the final set of vantage points.

  4. Tag each resolver with its location from MaxMind.

    resolvers.json contains the infrastructure, “control”, and “special” resolvers.

    • vp: String

      The IP address of the vantage point (a DNS resolver).

    • name: String

      Result from PTR query (if infrastructure), “control”, or “special”.

    • location: JSON object
      • country_name: String

        The full name of the country where the resolver is located.

      • country_code: String

        The two-letter ISO 3166 code of the country where the resolver is located.
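
As a small illustration of the resolvers.json structure above, the sketch below groups infrastructure resolvers by country code and leaves out the “control” and “special” entries. It assumes one JSON object per line, which this page does not specify. The sample entries and the `resolvers_by_country` helper are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical entries following the vp / name / location fields described above.
SAMPLE_RESOLVERS = """\
{"vp": "192.0.2.53", "name": "ns1.example.net", "location": {"country_name": "Germany", "country_code": "DE"}}
{"vp": "198.51.100.53", "name": "control", "location": {"country_name": "United States", "country_code": "US"}}
{"vp": "203.0.113.53", "name": "ns.example.org", "location": {"country_name": "Germany", "country_code": "DE"}}
"""


def resolvers_by_country(lines):
    """Map country code -> list of infrastructure resolver IPs."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        # "control" and "special" resolvers are predefined, not discovered.
        if entry["name"] in ("control", "special"):
            continue
        groups[entry["location"]["country_code"]].append(entry["vp"])
    return dict(groups)


by_country = resolvers_by_country(SAMPLE_RESOLVERS.splitlines())
print(by_country)
```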

Query

  1. Make DNS queries for each test domain to each resolver. The query for the test domain is attempted up to four times in case of connection errors. To check the status of the resolver, a control measurement is conducted before the queries for the test domain. If this first control measurement fails, no further measurements are conducted for the same (resolver, domain) pair. If all four attempts for the test domain fail, another control measurement is conducted.

  2. Parse and separate responses from control resolvers and non-control resolvers.

Tag

  1. Tag each answer IP with information from Censys.

    Note:

    • Fields may have empty strings if the information was not available on Censys.

Detect

  1. Compare query responses between non-control resolvers and control resolvers to identify interference. When Satellite-v2 runs as a whole, detect does not output any files. When run separately, however, detect outputs results.json with the excluded field set to false and the excluded_reason field set to null by default (see the output structure in the verify section).

    Note:

    • For each response, the answer IPs and their tags are compared to the set of answer IPs and tags from all the control resolvers for the same query domain. A response is classified as an anomaly if there is no overlap between the two.
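
The overlap rule in the note above can be sketched as below. This is a simplified reading, not the detect module itself: `TAG_KEYS`, `matched_tags`, `is_anomaly`, and the sample data are hypothetical, empty tags are ignored, and a response with no answers is treated as not anomalous (an assumption made here to keep the example conservative).

```python
# Hypothetical tag keys for an answer IP.
TAG_KEYS = ("http", "cert", "asnum", "asname")


def matched_tags(ip, tags, control):
    """List which attributes of one answer also appear in the control set.

    `control` maps "ip" and each tag key to the set of values seen across
    all control resolvers for the same query domain.
    """
    matched = []
    if ip in control["ip"]:
        matched.append("ip")
    for key in TAG_KEYS:
        value = tags.get(key)
        # Empty strings mean the tag was unavailable, so they never match.
        if value not in (None, "") and value in control[key]:
            matched.append(key)
    return matched


def is_anomaly(answers, control):
    """True when no answer IP or tag overlaps with the control set."""
    if not answers:
        return False  # nothing to compare (assumption of this sketch)
    return not any(matched_tags(ip, tags, control) for ip, tags in answers.items())


control = {"ip": {"192.0.2.10"}, "http": {"h1"}, "cert": {"c1"},
           "asnum": {64500}, "asname": {"EXAMPLE-AS"}}

same_as = {"192.0.2.99": {"http": "", "cert": "", "asnum": 64500, "asname": "EXAMPLE-AS"}}
unrelated = {"203.0.113.7": {"http": "h9", "cert": "c9", "asnum": 64511, "asname": "OTHER-AS"}}

print(is_anomaly(same_as, control), is_anomaly(unrelated, control))
```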

Fetch

  1. Perform HTTP(S) GET requests to the IPs identified as anomalies.

    blockpages.json contains the responses:

    • ip: String

      The IP address from an anomalous DNS response.

    • keyword: String

      The domain queried for the anomalous DNS response.

    • http: Object

      HTTP response.

    • https: Object

      HTTPS response.

    • fetched: Boolean

      Equals true if a page is successfully fetched.

    • start_time: String

      The start time of the measurement.

    • end_time: String

      The end time of the measurement.

Verify

  1. Apply new heuristics to exclude possible cases of erroneous answers from resolvers. Currently, verify excludes answer IPs that are part of big CDNs (note: this could lead to false negatives) and answer IPs that appear for a low number of domains (<=2).

    results.json contains all the information when running in full mode.

    • vp: String

      The IP address of the vantage point (a DNS resolver).

    • test_url: String

      The test domain being queried.

    • location: JSON object
      • country_name: String

        The full name of the country where the resolver is located.

      • country_code: String

        The two-letter ISO 3166 code of the country where the resolver is located.

    • passed_liveness: Boolean

      Equals false if both control queries were unsuccessful.

    • in_control_group: Boolean

      Equals true if at least one control resolver had a valid response for this test domain.

    • connect_error: Boolean

      Equals true if all test domain query attempts returned errors. This field is also set to true if the first control measurement fails and no further measurements for the test domain are conducted. Use this field in conjunction with the passed_liveness field to find anomalies.

    • anomaly: Boolean

      Equals true if an anomaly is detected. If there are no tags for the answers or the control, this field is conservatively marked as false.

    • start_time: String

      The start time of the measurement.

    • end_time: String

      The end time of the measurement.

    • response: JSON object

      The resolver’s returned answers for the queried domain are the keys.

      • url: String

        The domain being queried in this trial, either the control domain for liveness test or test_url. The liveness test DNS responses are only recorded if they do not contain a type-A RR.

      • has_type_a: Boolean

        Equals true if the query returned a valid A resource record.

      • error: String

        Contains error information.

      • rcode: Integer

        Response code mapping to success (0) or errors (-1 for connection error, > 0 for errors specified in RFC 2929).

      • response: JSON object

        Consists of a map from each IP the resolver returned for the queried domain to its tags from Maxmind:

        • http: String

          The hash of the HTTP body.

        • cert: String

          The hash of the TLS certificate.

        • asname: String

          The autonomous system (AS) name.

        • asnum: Integer

          The autonomous system (AS) number.

        • matched: Array

          An array of its tags that matched the control tags - if the IP is in the control set, “ip” is appended and if the IP has no tags, “no_tags” is appended.
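
The boolean fields of results.json can be combined to sort records before closer inspection. The sketch below is one possible reading of the field descriptions above, not Censored Planet's analysis pipeline: the `triage` function, its labels, and the sample records are hypothetical.

```python
def triage(record):
    """Assign a coarse, illustrative label to one results.json record."""
    if not record["passed_liveness"]:
        return "resolver_unreliable"    # both control queries were unsuccessful
    if record["connect_error"]:
        return "test_queries_failed"    # liveness passed, yet no test query succeeded
    if not record["in_control_group"]:
        return "no_control_baseline"    # no control resolver answered for this domain
    return "answer_mismatch" if record["anomaly"] else "no_anomaly"


# Hypothetical records containing only the boolean fields used here.
records = [
    {"passed_liveness": False, "connect_error": True, "in_control_group": True, "anomaly": False},
    {"passed_liveness": True, "connect_error": True, "in_control_group": True, "anomaly": False},
    {"passed_liveness": True, "connect_error": False, "in_control_group": True, "anomaly": True},
    {"passed_liveness": True, "connect_error": False, "in_control_group": True, "anomaly": False},
]

labels = [triage(r) for r in records]
print(labels)
```

Records labeled `test_queries_failed` or `answer_mismatch` are candidates for further checks, such as comparison against blockpage fingerprints. Neither label confirms censorship.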

Notes

While Satellite includes multiple control resolvers intended to avoid false inferences, there is still a possibility that certain measurements are incorrectly marked as anomalies. To confirm censorship, it is critical to compare the raw DNS responses against known blockpage fingerprints. The blockpage fingerprints currently recorded by Censored Planet are available here. Moreover, aggregations can be used to avoid anomalous vantage points and domains. Please use our analysis pipeline to process the data before using it.

Censored Planet detects network interference of websites using remote measurements to infrastructural vantage points within networks (e.g. institutions). Note that this raw data cannot determine the entity responsible for the blocking or the intent behind it. Please exercise caution when using the data, and reach out to us at censoredplanet@umich.edu if you have any questions.