DataX Formats, Schemas, Identifiers & Limits

      Abstract

      Describes the ingestion formats, partner identifiers, rate-limit quotas for taxonomy and audience posts, and audience data retention for DataX partners. It also describes error code responses.

      Ingestion Formats

      Partners call the POST /v1/usermatch endpoint with their assigned OAuth 2.0 credentials.

      Partner match data is shared using one of the following methods:

      CSV File With bzip2 Compression

      Define the schema in the header row of the .csv file.

      Example 1:

          PXID,SHA256EMAIL
          Partner ID 1,HashedEmail Value 1
          Partner ID 2,HashedEmail Value 2

      Example 2:

          PXID,SHA256PHONE
          Partner ID 1,HashedPhone Value 1
          Partner ID 2,HashedPhone Value 2

      Example 3:

          PXID,SHA256EMAIL,SHA256PHONE
          Partner ID 1,HashedEmail Value 1,HashedPhone Value 1
          Partner ID 2,HashedEmail Value 2,HashedPhone Value 2
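      As a minimal sketch of the file layout in these examples, the Java class below builds the uncompressed CSV body: a header row naming the schema, then one comma-delimited row per record. The class and method names are hypothetical, and the "\n" line ending is an assumption. Compressing the result to .bz2 is a separate step; the JDK has no built-in bzip2 encoder, so you would typically use the bzip2 command-line tool or a library such as Apache Commons Compress (BZip2CompressorOutputStream).

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds the uncompressed CSV body for a usermatch upload.
final class UserMatchCsv {
    private UserMatchCsv() {}

    // Header row from the schema, then one row per record, each ending in "\n".
    static String build(List<String> schema, List<List<String>> rows) {
        StringBuilder out = new StringBuilder(joinRow(schema, schema.size()));
        for (List<String> row : rows) {
            out.append(joinRow(row, schema.size()));
        }
        return out.toString();
    }

    // Joins fields with commas; rejects rows whose field count differs from the
    // schema and fields containing a comma, since either would break the format.
    private static String joinRow(List<String> fields, int expectedSize) {
        if (fields.size() != expectedSize) {
            throw new IllegalArgumentException(
                    "expected " + expectedSize + " fields but got " + fields.size());
        }
        StringJoiner joiner = new StringJoiner(",", "", "\n");
        for (String field : fields) {
            if (field.contains(",")) {
                throw new IllegalArgumentException("field contains a comma: " + field);
            }
            joiner.add(field);
        }
        return joiner.toString();
    }
}
```

      For example, build(List.of("PXID", "SHA256EMAIL"), List.of(List.of("p1", "abc"))) returns "PXID,SHA256EMAIL\np1,abc\n".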

      HTTP POST

      Note that before uploading your file, you must hash each email address and phone number with SHA-256. That is, convert your email and phone number list to SHA-256 hashes and place the hashed values in your file. For example, if the original email is [email protected], the hashed value in the file would be:

      d48adb3c108a657adf7597921f3bfc591ee3f00d658d2d288e0bb396ac0d5964

      Similarly, the phone number +911112341234 would have the hashed value:

      8e4b975211195dc6ddc68c4faaad597a715f30dd46e06f6c20e8ebd363105232

      Important

      Hashed emails: Before hashing, normalize each email address to lowercase and remove all spaces.
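      The email normalization and hashing steps can be sketched with the JDK's standard MessageDigest API. This is a sketch under stated assumptions: the address is lowercased and stripped of all whitespace, the normalized string is encoded as UTF-8, and the digest is written as 64 lowercase hexadecimal characters. The class name EmailHasher is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper: normalizes an email address and hashes it with SHA-256.
final class EmailHasher {
    private EmailHasher() {}

    // Removes all whitespace and lowercases (Locale.ROOT avoids locale-specific casing).
    static String normalize(String email) {
        return email.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    // SHA-256 over the UTF-8 bytes, rendered as 64 lowercase hex characters.
    static String sha256Hex(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    static String hashEmail(String rawEmail) {
        return sha256Hex(normalize(rawEmail));
    }
}
```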

      Hashed phone numbers: Before hashing phone numbers with SHA-256, normalize them according to the following E.164 rules:

      • E.164 phone numbers can have a maximum of 15 digits.

      • Normalized E.164 phone numbers use the following syntax, with no spaces, hyphens, parentheses or other special characters:

        • [+] [country code] [subscriber number including area code]

        • Examples:

          • US: 1 (123) 456-7890 is normalized to +11234567890.

          • Singapore: 65 1243 5678 is normalized to +6512435678.

          • Sydney, Australia: (02) 1234 5678 is normalized to +61212345678 by dropping the leading zero of the area code and adding the country code (61).

      • Additional external documentation:
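      A minimal sketch of the E.164 normalization rules above follows. It assumes you already know each number's country calling code, and that the only national prefix to remove is a single leading trunk "0" (as in the Sydney example); that rule does not hold in every country, so a dedicated library such as Google's libphonenumber is a safer choice for production data. The class and method names are hypothetical.

```java
// Hypothetical helpers for the E.164 normalization rules.
final class PhoneNormalizer {
    private static final int MAX_DIGITS = 15;

    private PhoneNormalizer() {}

    // For numbers already written with their country code, e.g. "1 (123) 456-7890":
    // keep only the digits and prefix "+".
    static String fromInternational(String raw) {
        return withPlus(raw.replaceAll("\\D", ""));
    }

    // For national-format numbers, e.g. "(02) 1234 5678" in Australia:
    // drop one leading trunk "0" (assumption: the country uses "0" as its trunk
    // prefix), then prepend "+" and the country calling code.
    static String fromNational(String countryCode, String raw) {
        String national = raw.replaceAll("\\D", "");
        if (national.startsWith("0")) {
            national = national.substring(1);
        }
        return withPlus(countryCode.replaceAll("\\D", "") + national);
    }

    // Enforces the E.164 maximum of 15 digits before adding the "+".
    private static String withPlus(String digits) {
        if (digits.isEmpty() || digits.length() > MAX_DIGITS) {
            throw new IllegalArgumentException("not a valid E.164 length: " + digits);
        }
        return "+" + digits;
    }
}
```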

      Additional Hashed Phone Number Test Dataset

      The phone numbers below have already been normalized. Use them to validate your phone number hashing.

      Phone Number | SHA256 Hash
      +12112341234 | af97f05c18bb00fabc6efb687f266f81922dc5774f4fb9ab3b36220d322bcfe2
      +14012341234 | cfe2df2ecc018ca1515564cf8b50317240e1087e9923e332a8e480e123036e63
      +16012341234 | cf2ef0b305c6bbbeb6049d05e478339f08636c856ccd7b87f9af55cdfab23f8c
      +15112341234 | 322ae4d32f4c78b010ef3c1c5c791b14443a263f9d20708fee0b4572954cd429
      +13412341234 | dcbe9a054609dda663dbbfc0aa942f51ed688aa9dfb9cf49c4184ec38e56bf87
      +551112341234 | 25b5811f7cb1f84de828e85dbf5b9104d5640bd5a8622d6f8d8f1df960f5cf4c
      +56212341234 | 452e9628adda15274224bf33ca45890fc34b81fead821f1732f7aad3bf4ee73d
      +442012341234 | df20b3aff6ce273ee28f7cf03f747a47116fed8ce898e3df95f87efca7bdecdb
      +33112341234 | 745a510945dfcaa1af05a68571ad388c09f1d9a01368ded47e0667acd5feed0b
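      To check your hashing against this dataset, you might compute the SHA-256 digest of each number and compare it with the listed value. The sketch below assumes each number is hashed exactly as shown, including the leading "+", encoded as UTF-8. The class name is hypothetical, and only two rows from the table are included for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical checker: compares computed SHA-256 hashes with expected values.
final class PhoneHashCheck {
    private PhoneHashCheck() {}

    // SHA-256 of the UTF-8 bytes, as lowercase hex.
    static String sha256Hex(String value) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Returns each input whose computed hash differs from the expected one
    // (compared case-insensitively), mapped to the hash actually computed.
    static Map<String, String> mismatches(Map<String, String> expected) {
        Map<String, String> bad = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            String actual = sha256Hex(entry.getKey());
            if (!actual.equalsIgnoreCase(entry.getValue())) {
                bad.put(entry.getKey(), actual);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, String> dataset = new LinkedHashMap<>();
        dataset.put("+12112341234",
                "af97f05c18bb00fabc6efb687f266f81922dc5774f4fb9ab3b36220d322bcfe2");
        dataset.put("+442012341234",
                "df20b3aff6ce273ee28f7cf03f747a47116fed8ce898e3df95f87efca7bdecdb");
        Map<String, String> bad = mismatches(dataset);
        System.out.println(bad.isEmpty() ? "All test hashes match." : "Mismatches: " + bad);
    }
}
```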

      Regular Expression to Validate Normalized Phone Number

      import java.util.regex.Pattern;
      Pattern I164_PATTERN = Pattern.compile("^\\+[1-9]\\d{1,14}$");
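      The pattern accepts a "+", a non-zero first digit, and 1 to 14 further digits, which is at most 15 digits in total. A minimal usage sketch wraps it in a hypothetical helper class:

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the validation pattern from this article.
final class PhoneValidator {
    // "+", a digit 1-9, then 1 to 14 more digits (at most 15 digits in total).
    static final Pattern I164_PATTERN = Pattern.compile("^\\+[1-9]\\d{1,14}$");

    private PhoneValidator() {}

    // True only for fully normalized numbers: no spaces, hyphens, or parentheses.
    static boolean isNormalized(String phone) {
        return I164_PATTERN.matcher(phone).matches();
    }
}
```

      For example, isNormalized("+12112341234") returns true, while "12112341234" (no "+") and "+1 211 234 1234" (spaces) are rejected.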

      DataX supports only pre-hashed data. Because email addresses and phone numbers are hashed with SHA-256 before the files are uploaded, the personal data in the files is protected and no raw email addresses or phone numbers are ever stored.

      If an email address or phone number is not hashed in the proper format, DataX will not process the audience records.

      Follow these steps:

      1. Define the schema parameter in the POST call.

      2. Set the schema parameter to a comma-separated list that defines the column names and their order.

      3. The schema MUST be defined in the header of the file itself.

      Example

      POST /v1/usermatch?schema=PXID,SHA256EMAIL,SHA256PHONE
              Partner ID1,HashedEmail Value 1,HashedPhone Value 1
              Partner ID2,HashedEmail Value 2,HashedPhone Value 2
              Partner ID3,HashedEmail Value 3,HashedPhone Value 3
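      The POST call above could be assembled with the JDK's java.net.http client roughly as follows. This is a sketch, not the documented client: the base URL, the Bearer-token Authorization header (assumed to carry the access token obtained with your OAuth 2.0 credentials), the Content-Type, and sending the .bz2 bytes as the raw request body are all assumptions to confirm against your onboarding details. The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical request builder for POST /v1/usermatch with a schema parameter.
final class UserMatchRequest {
    private UserMatchRequest() {}

    static HttpRequest build(String baseUrl, String accessToken,
                             List<String> schema, byte[] bz2File) {
        // Schema column names are plain uppercase identifiers, so the
        // comma-separated list can go into the query string unencoded.
        URI uri = URI.create(baseUrl + "/v1/usermatch?schema=" + String.join(",", schema));
        return HttpRequest.newBuilder(uri)
                // Assumption: the OAuth 2.0 access token is sent as a Bearer token.
                .header("Authorization", "Bearer " + accessToken)
                // Assumption: the .bz2 file is sent as the raw request body.
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(bz2File))
                .build();
    }
}
```

      The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).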

      Notes:

      • Partners can define the schema however they choose.

      • When a schema is defined in more than one place, the following precedence applies: schema parameter > schema header in the file > default schema.

      • If a schema is not defined during ingestion, the default schema format will be PXID|SHA256EMAIL.
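      The precedence rule in these notes can be expressed as a small resolver. In this hypothetical sketch, a first line counts as a header only if every field is one of the column names used in this article (PXID, SHA256EMAIL, SHA256PHONE); that detection rule, and reading the default PXID|SHA256EMAIL as the two columns PXID and SHA256EMAIL, are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical resolver for: schema parameter > header row in file > default schema.
final class SchemaResolver {
    // Column names used in this article; a first line made only of these is a header.
    private static final Set<String> KNOWN_FIELDS = Set.of("PXID", "SHA256EMAIL", "SHA256PHONE");
    static final List<String> DEFAULT_SCHEMA = List.of("PXID", "SHA256EMAIL");

    private SchemaResolver() {}

    static List<String> resolve(String schemaParam, String firstLine) {
        if (schemaParam != null && !schemaParam.isBlank()) {
            return split(schemaParam);                 // 1. schema parameter
        }
        if (firstLine != null && !firstLine.isBlank()) {
            List<String> header = split(firstLine);
            if (KNOWN_FIELDS.containsAll(header)) {
                return header;                         // 2. header row in the file
            }
        }
        return DEFAULT_SCHEMA;                         // 3. default schema
    }

    private static List<String> split(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }
}
```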

      Data Validation

      When you call the POST /v1/usermatch endpoint, DataX validates the format and content of the uploaded bz2 file, checking the header line and up to the first 10 content lines. If validation fails, one of the following error codes and messages is returned:

      Error Code | Error Message | Description
      400 DxFileIsNull | No data file provided in the request | Make sure a data file is provided.
      400 DxFileIsEmpty | Empty data file provided in the request | Check the content of the file.
      400 DxFileNotValidBZ2 | Bz2 file is malformed | Check that the file is a valid bz2 file.
      400 DxFileNotValidSchema | Insufficient number of valid schema fields | Check that the file schema has at least two fields and that the delimiter is a comma.
      400 DxFileHeaderOnly | File with only header line | Check the content lines in the file.
      400 DxFileInvalidDelimiter | Invalid delimiter | Check that the delimiter is a comma.
      400 DxFileInconsistentFieldAndSchema | Field numbers not consistent with schema numbers | Check that each content line has the same number of fields as the schema.
      400 DxFileInvalidPxid | Invalid pxid | Make sure the PXID contains only ASCII characters 33 to 126 (the comma, ASCII 44, is excluded).
      400 DxFileInvalidHashedEmail | Invalid hashedEmail | Make sure the hashed email is 64 characters long and contains only 'a'-'f', 'A'-'F', and '0'-'9'.
      400 DxFileInconsistentContent | Inconsistent schema and content | Make sure the header field order matches the content field order.
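      To catch some of these errors before uploading, you could run a local pre-check that mirrors the documented rules on the header line and up to the first 10 content lines. The sketch below is an approximation, not DataX's validator: it covers only a subset of the error names in the table, treats any header with fewer than two fields as DxFileNotValidSchema, and does not check SHA256PHONE values because the table lists no rule for them. The input is assumed to be already decompressed and split into lines; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local pre-check approximating the documented validation rules.
final class UserMatchFileCheck {
    private static final int MAX_CONTENT_LINES = 10;

    private UserMatchFileCheck() {}

    // `lines` holds the decompressed file: header first, then content lines.
    // Returns the matching error names from the table (empty list if none).
    static List<String> check(List<String> lines) {
        List<String> errors = new ArrayList<>();
        if (lines.isEmpty()) {
            errors.add("DxFileIsEmpty");
            return errors;
        }
        String[] schema = lines.get(0).split(",", -1);
        if (schema.length < 2) {
            errors.add("DxFileNotValidSchema");
            return errors;
        }
        if (lines.size() == 1) {
            errors.add("DxFileHeaderOnly");
            return errors;
        }
        // Check at most the first 10 content lines after the header.
        int end = Math.min(lines.size(), 1 + MAX_CONTENT_LINES);
        for (int i = 1; i < end; i++) {
            String[] fields = lines.get(i).split(",", -1);
            if (fields.length != schema.length) {
                errors.add("DxFileInconsistentFieldAndSchema");
                continue;
            }
            for (int c = 0; c < schema.length; c++) {
                String column = schema[c].trim();
                if (column.equals("PXID") && !isValidPxid(fields[c])) {
                    errors.add("DxFileInvalidPxid");
                } else if (column.equals("SHA256EMAIL") && !isSha256Hex(fields[c])) {
                    errors.add("DxFileInvalidHashedEmail");
                }
            }
        }
        return errors;
    }

    // ASCII 33..126 only; the comma (44) is excluded. Rejecting an empty PXID is an assumption.
    static boolean isValidPxid(String pxid) {
        if (pxid.isEmpty()) {
            return false;
        }
        for (int i = 0; i < pxid.length(); i++) {
            char ch = pxid.charAt(i);
            if (ch < 33 || ch > 126 || ch == ',') {
                return false;
            }
        }
        return true;
    }

    // Exactly 64 characters from 0-9, a-f, A-F.
    static boolean isSha256Hex(String value) {
        return value.matches("[0-9a-fA-F]{64}");
    }
}
```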

      DataX Partner Identifier

      Variable | Syntax
      PXID - Partner Cross Identifier | PXID = Partner Cross Identifier (case-sensitive).

      Note that data ingestion performs an exact match on PXID values.

      Rate Limitation

      DataX is rate-limited per the following quota limits for taxonomy and audience posts:

      Error Code | Error Message | Description
      429 Too many requests | Rate Limit exceeded per hour (Limit: 100) | Number of requests allowed in an hour per provider.
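      To stay under the 100-requests-per-hour quota, a client can count its own requests. The sketch below uses a rolling one-hour window; whether DataX counts requests in a rolling window or per clock hour is not stated here, so the rolling window is an assumption, chosen because it never allows more than the limit within any 60-minute span. It also assumes calls arrive with non-decreasing timestamps. The class name is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical client-side budget: at most `limit` requests in any rolling `window`.
final class HourlyRequestBudget {
    private final int limit;
    private final Duration window;
    private final Deque<Instant> sent = new ArrayDeque<>();

    HourlyRequestBudget(int limit, Duration window) {
        this.limit = limit;
        this.window = window;
    }

    // Returns true and records the request if it fits in the budget at `now`;
    // returns false (recording nothing) if sending now could exceed the quota.
    synchronized boolean tryAcquire(Instant now) {
        Instant cutoff = now.minus(window);
        // Forget requests that are at least one full window old.
        while (!sent.isEmpty() && !sent.peekFirst().isAfter(cutoff)) {
            sent.pollFirst();
        }
        if (sent.size() >= limit) {
            return false;
        }
        sent.addLast(now);
        return true;
    }
}
```

      For the documented quota, you would create new HourlyRequestBudget(100, Duration.ofHours(1)) and call tryAcquire(Instant.now()) before each POST, waiting instead of sending when it returns false.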

      Data Retention

      Segment Expiration:

      • Audiences will be linked to the active segment for 45 days.

      • Audience refreshes can be posted at any time (for example, daily, weekly, or monthly).

      • If audiences are not refreshed before the default 45-day expiration TTL elapses, they are unlinked from the segment(s).

      User Match Expiration:

      • Partner IDs will be stored in-house for measurement.

      • In-house data will be stored and linked to a Verizon Media ID for 90 days to support audience refreshes.
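      The retention periods translate into simple date arithmetic. This sketch assumes both periods are counted from the most recent refresh or upload, which the text implies but does not state exactly; the names are hypothetical. For example, an audience refreshed at 2024-01-01T00:00:00Z would stay linked until 2024-02-15T00:00:00Z.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helpers for the 45-day segment link and 90-day user match periods.
final class RetentionWindows {
    static final Duration SEGMENT_LINK_TTL = Duration.ofDays(45);
    static final Duration USER_MATCH_TTL = Duration.ofDays(90);

    private RetentionWindows() {}

    // Time at which the audience is unlinked unless refreshed before then.
    static Instant segmentLinkExpiry(Instant lastRefresh) {
        return lastRefresh.plus(SEGMENT_LINK_TTL);
    }

    // Time after which the stored user match would no longer support refreshes.
    static Instant userMatchExpiry(Instant lastUpload) {
        return lastUpload.plus(USER_MATCH_TTL);
    }

    // True while `now` is strictly before the segment link expiry.
    static boolean isSegmentLinkActive(Instant lastRefresh, Instant now) {
        return now.isBefore(segmentLinkExpiry(lastRefresh));
    }
}
```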

      Error Responses

      Error Code | Error Message | Description
      400 DxInvalidRequest | <urnType> is not supported | The urnType used is not supported.
      400 DxJobNotFound | Cannot Find Job with id <request_id> | The request_id was not found in the DataX database.
      400 Bad Request | Bad Application Id | The application ID is not correct.
      500 DxInternalError | Unable to Create Job | The upload job could not be created.
      500 UNABLE_TO_PROCESS_REQUEST | Failed to process. Try again after some time | The server was unavailable while processing the request.
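      A client might use these error tables to decide which failures to retry. In this sketch, only 429 responses and 500 UNABLE_TO_PROCESS_REQUEST (whose message says to try again later) are retried, with capped exponential backoff; 400-class errors are treated as requiring a fix to the request or file. Treating DxInternalError as non-retryable, and the backoff parameters, are assumptions; the class name is hypothetical.

```java
import java.time.Duration;

// Hypothetical retry policy derived from the documented error responses.
final class UploadRetryPolicy {
    private UploadRetryPolicy() {}

    static boolean isRetryable(int status, String errorCode) {
        if (status == 429) {
            return true; // hourly quota exhausted: wait, then retry
        }
        // The message for this error explicitly says to try again later.
        return status == 500 && "UNABLE_TO_PROCESS_REQUEST".equals(errorCode);
    }

    // Exponential backoff: base * 2^attempt, never longer than cap.
    static Duration backoff(int attempt, Duration base, Duration cap) {
        long factor = 1L << Math.min(Math.max(attempt, 0), 20);
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(cap) > 0 ? cap : delay;
    }
}
```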

