What Information Is Transmitted to the Cloud when the Bitdefender Filter Performs an Antispam Scan on a Message

This article explains how the Bitdefender antispam filter works and what information is transmitted to the cloud during a scan.

Solution

Cloud-based spam detection systems rely on specific information about each message to identify and filter spam effectively. This article lists the information sent to the Bitdefender cloud for each scanned email message. No content from the original email body and no personally identifiable information is transmitted, which protects privacy and data.
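To make the idea of sending derived data instead of content more concrete, the following is a minimal Python sketch. It is not Bitdefender's implementation: the function names, the normalization step, and the way characters are grouped are assumptions made for illustration only. It uses MD5 because the list below names MD5 for several fields, and it counts character classes in the spirit of the body statistics described below.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Return the MD5 hex digest of a string.

    Assumption: the value is trimmed and lowercased before hashing.
    The real normalization used by the filter is not documented here.
    """
    normalized = value.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def body_stats(body: str) -> dict:
    """Count character classes in a message body without keeping the text.

    Assumption: "other" stands in for everything that is neither a
    letter nor whitespace; the actual categories may differ.
    """
    letters = sum(ch.isalpha() for ch in body)
    whitespace = sum(ch.isspace() for ch in body)
    other = len(body) - letters - whitespace
    return {"letters": letters, "whitespace": whitespace, "other": other}


# Hypothetical record built from a message: only a digest and counts,
# never the address or the body text itself.
record = {
    "from_md5": md5_hex("Alice@Example.com"),
    "body_stats": body_stats("Hello, world!\n"),
}
print(record)
```

The resulting record holds a 32-character digest and three integers. Neither the address nor the body text can be read back from it directly, which is the property the article describes.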

  1. Anti-Spam SDK Information: The cloud-based spam detection system collects information about the anti-spam software development kit (SDK) used, including the SDK version, engine version, and the platform it was built on. This information helps in maintaining and updating the spam detection capabilities.
     
  2. Sender IP Address: The sender’s IP address, extracted from the email headers, is transmitted to the cloud. This IP address aids in identifying potential spam sources and enhancing the accuracy of spam detection algorithms.
     
  3. Email Message Fingerprint: A cryptographic hash-based fingerprint of the email message is generated by combining hashes of different parts of the email headers and body. These irreversible hashes, devoid of the original email content, are transmitted to the cloud for analysis and comparison against known spam patterns.
     
  4. Statistical Information about the Message Body: Various statistical metrics about the email body are collected, such as the number of letters, non-text characters, and whitespace characters. These statistics help the spam detection algorithms recognize patterns and anomalies in the message body.
     
  5. Statistical Information about the Message Headers: Aggregate statistical information related to the message headers is collected, such as a bitmask indicating the presence or absence of specific headers. Neither the headers themselves nor any associated sensitive information is transmitted.
     
  6. Statistical Information about the Scanned Email: Additional statistical data about the scanned email is gathered, including the number of attachments, MIME parts, and binary parts. This information aids in identifying suspicious email structures and patterns.
     
  7. HTML Content: If an email contains HTML, the cloud-based system captures the background and foreground text colors. This data contributes to the analysis of HTML-based spam emails.
     
  8. DKIM Information: If present, the cloud-based system collects MD5 hashes of certain components related to the DomainKeys Identified Mail (DKIM) signature, such as From, Reply-To, Return-Path, and the signing domain. This information assists in verifying the authenticity of the email and detecting potential email spoofing.
     
  9. URLs, Email Addresses, Phone Numbers, and QQ Identifiers: MD5 hashes are generated for URLs, email addresses, phone numbers, and QQ identifiers found within the body of the scanned email. These hashes enable efficient comparison against known spam patterns without transmitting the actual sensitive information.
     
  10. Header-Based Hashes: MD5 hashes of the FROM address, FROM domain, and REPLY-TO address (obtained from the email headers) are generated and transmitted. This information helps in identifying patterns associated with spam sources.
     
  11. Attachments: When attachments are present, the cloud-based system collects various data points without transmitting the actual attachment content. This includes: 
    1. Hashes of images embedded within the email to detect image-based spam. 
    2. MD5 and in-house hashes of specific attachment types (e.g., office documents, PDFs, executables) to detect spam messages concealed within attachments.
    3. File names of attachments flagged as potentially harmful, such as Windows executables. 
    4. File sizes of attachments in bytes.
       
  12. Zipped Archives: If an attachment is a zipped archive, additional information is collected, including the full path of the file within the archive and the compression ratio (compressed vs. uncompressed file size). This aids in analyzing potentially harmful files concealed within archives.
     
  13. Cryptocurrency Wallets: To detect extortion scams, the cloud-based system collects the hash