User and Audience Data Metadata

      User Data resource collectors define an extension over the Base DataX Metadata structure.

      JSON Structure

      {

          (Base DataX Metadata)

          "extensions" : {
              "urnType" : {string},
              "ts" : {number},
              "exp" : {number},
              "taxonomyEntities" : [ {string} ],
              "view" : {string},
              "links" : [{
                  "rel": "https://datax.yahooapis.com/rels/taxonomy",
                  "href" : {string}
              },{
                  "rel": "https://datax.yahooapis.com/rels/stats",
                  "href" : {string}
              },{
                  "rel": "https://datax.yahooapis.com/rels/errors",
                  "href" : {string}
              },{
                  "rel": "https://datax.yahooapis.com/rels/retry/errors",
                  "href" : {string}
              },{
                  "rel": "https://datax.yahooapis.com/rels/errors",
                  "href" : {string}
              }],
              "stats" : {
                  "records" : {
                      "err":0,
                      "total":5,
                      "%ok":100
                  },
                  "cells" : {
                      "err":0,
                      "total":5,
                      "%ok":100
                  },
                  "seg" : {
                      "summary" : {
                          "err":0,
                          "total":5,
                          "%ok":100
                      },
                      "details" : {
                          "datax-segment_test2" : {
                              "err":0,
                              "total":5,
                              "%ok":100
                          }
                      }
                  }
              }
          }
      }

      JSON Property Descriptions

      Property Name

      Type

      Description

      status.state (Base DataX Metadata)

      String

      [output-only] Current state of the Data Upload within DataX. One of these:

      • UPLOADING: Contents are being received over HTTPS.

      • PROCESSING: The data is being moved, uncompressed, validated, etc.

      • ACCEPTED: The data has been processed and will be live soon.

      • ACCEPTED-WITH-ERRORS: Data was processed with some errors.

        Successful records have been accepted and will be live soon. Please see stats for the degree of acceptance. If errors are due to the data itself, please upload new, fixed data. If errors are due to the taxonomy, please fix the taxonomy and re-submit the failed records for processing. Note: re-submitting will only process failed records. Successfully uploaded records will not be processed again.

      • ERROR: Data processing failed completely.

        If errors are due to the data itself, please upload new, fixed data. If errors are due to the taxonomy, please fix the taxonomy and re-submit this data for processing.
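      As an illustration of how a client might react to each state, here is a minimal Python sketch. The helper name next_step and the assumption that the state sits at metadata["status"]["state"] (inferred from the property name status.state) are not part of the DataX documentation.

```python
# Hypothetical helper: maps each documented status.state value to a
# suggested client-side next step. The wording of the steps is ours.
NEXT_STEP = {
    "UPLOADING": "wait",
    "PROCESSING": "wait",
    "ACCEPTED": "done",
    "ACCEPTED-WITH-ERRORS": "check stats and errors, then fix and retry failed records",
    "ERROR": "fix the data or taxonomy, then re-upload or re-submit",
}


def next_step(metadata: dict) -> str:
    """Return a suggested action for the state found in the metadata."""
    # Assumption: the state is nested as {"status": {"state": ...}}.
    state = metadata.get("status", {}).get("state")
    if state not in NEXT_STEP:
        raise ValueError(f"unknown status.state: {state!r}")
    return NEXT_STEP[state]
```

      A polling loop would typically keep fetching the metadata while next_step returns "wait".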

      extensions.urnType

      String

      Represents the type of URN used to identify a user in the data. One of:

      1. EYUID

      2. DXID

      3. YAHID

      4. ZIP4

      5. IXID

      6. Mobile IDs:

      • IDFA (iOS ID for Advertisers)

      • GPADVID (Google Play Advertising ID)

      Note: All of the above except for the mobile IDs are Yahoo IDs. The flavor of ID that you work with depends on the tech stack you use to perform a user match with Yahoo.
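      A client may want to check the urnType value before uploading. The following Python sketch groups the values listed above. The function name is hypothetical, and exact upper-case matching is an assumption, since this article does not say whether the API is case-sensitive.

```python
# URN types as listed in this article.
YAHOO_URN_TYPES = {"EYUID", "DXID", "YAHID", "ZIP4", "IXID"}
MOBILE_URN_TYPES = {"IDFA", "GPADVID"}


def classify_urn_type(urn_type: str) -> str:
    """Return "yahoo" or "mobile"; raise ValueError for unlisted values."""
    # Assumption: values must match the listed spelling exactly.
    if urn_type in YAHOO_URN_TYPES:
        return "yahoo"
    if urn_type in MOBILE_URN_TYPES:
        return "mobile"
    raise ValueError(f"unsupported urnType: {urn_type!r}")
```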

      extensions.ts

      Number

      [optional] Timestamp associated with each cell of this upload, unless overridden by the cell itself. It must be a UNIX epoch value in seconds, e.g., 1376244670 for “Sun Aug 11 18:11:10 2013”.

      Its value must represent a time in the past. This value can also be overridden at a cell level in API/formats that take a “ts” value (datax-audience).

      A cell is one data point – one or more per line of input in the uploaded data. In datax-segment and datax-attribute formats, each record has at most one data point/cell, whereas in datax-audience format, each line of input (record) may have more than one data point/cell (multiple segment qualifications/attribute values per user record).
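      The two constraints on ts (UNIX epoch seconds, in the past) can be checked on the client side before uploading. This is a minimal Python sketch; the function name validate_ts is hypothetical. The example value 1376244670 matches the quoted wall-clock time when interpreted as UTC.

```python
import time
from datetime import datetime, timezone


def validate_ts(ts, now=None):
    """Check that ts is epoch seconds in the past; return it as a UTC datetime."""
    now = time.time() if now is None else now
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(ts, bool) or not isinstance(ts, (int, float)):
        raise TypeError("ts must be a number of seconds since the UNIX epoch")
    if ts >= now:
        raise ValueError("ts must represent a time in the past")
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```

      For example, validate_ts(1376244670) returns the datetime 2013-08-11 18:11:10 UTC.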

      extensions.view

      String

      [output-only] The view of this upload. Currently one value:

      1. INCREMENTAL (default)

      All UserData API calls are implicitly incremental unless otherwise stated by an API caller with inclusion of the query parameter “?snapshot” (not supported currently).

      A snapshot view implies exclusion of all users from the set of “taxonomyEntities” in a payload, unless a user is specifically “included” in that taxonomy-entity.

      If or when snapshots are supported by DataX, all API calls and JSON representations will look exactly the same, but with an extra step of establishing a logical upload first to help limit the size of snapshot files to chunks less than 10 GB per part-upload (compare with part-files).

      extensions.links

      Array

      [output-only] Useful link extensions. The relation types currently defined are listed under extensions.links[].rel. DataX may define more, to be discovered at development time. Any new links will contain URI-style link relation extensions, with precise documentation at the URI specified in the “rel” property.

      extensions.links[].rel

      URI

      [output-only] Currently defines the following relation types:

      1. https://datax.yahooapis.com/rels/taxonomy – The Taxonomy resource is defined earlier in this document.

      2. https://datax.yahooapis.com/rels/stats – This link provides statistics on the input data after processing. Please see Stats Link (below) for a JSON layout. Most of this information, except for the “seg” and “att” details, is also available in the metadata body itself, under the “stats” section. Please see “extensions.stats” below for a detailed description of these statistics. “seg” and “att” are a further breakdown of the “cells” stats on a per-taxonomy-ID basis.

      3. https://datax.yahooapis.com/rels/errors - This link provides some sample errors found while processing the payload. Currently, it captures two kinds of errors: a) Cells that failed taxonomy validation, b) User IDs that we failed to identify. For (a), if the errors are due to an incorrect Taxonomy, please post a corrected Taxonomy and use the retryerrors link to reprocess failed records as described below.

      4. https://datax.yahooapis.com/rels/retryerrors - Use this to re-process records that failed due to an incorrect taxonomy. Please do so only after posting a corrected Taxonomy. This link may be made available only if recoverable errors were found during processing of this upload.

      5. https://datax.yahooapis.com/rels/errors - This link provides some sample errors found that cannot be recovered without re-uploading fixed data. These are usually badly formatted JSON records.

      extensions.links[].href

      URI

      [output-only] Links to the reference taxonomy, data statistics and failed record processing, in this version of DataX, plus more in the future.
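      To follow one of these links, a client looks up the entry whose “rel” matches the relation type and reads its “href”. The Python sketch below shows one way to do that; the function name and the example hrefs are hypothetical. It returns a list because the same relation type can appear more than once in the links array.

```python
REL_BASE = "https://datax.yahooapis.com/rels/"


def find_hrefs(metadata: dict, rel_name: str) -> list:
    """Return every href in extensions.links whose rel is REL_BASE + rel_name."""
    rel = REL_BASE + rel_name
    links = metadata.get("extensions", {}).get("links", [])
    return [link["href"] for link in links if link.get("rel") == rel]


# Example metadata with made-up hrefs (not real DataX URLs).
example = {
    "extensions": {
        "links": [
            {"rel": REL_BASE + "taxonomy", "href": "https://example.invalid/taxonomy"},
            {"rel": REL_BASE + "stats", "href": "https://example.invalid/stats"},
        ]
    }
}
stats_hrefs = find_hrefs(example, "stats")
```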

      extensions.stats

      Object

      [output-only] Stats summary generated after processing. Note: More stats may be available in the audience-stats link. Each unit of stat is described by three properties in the context of an element, which could be records, users, cells, segments, attributes, etc., depending on the context:

      • total: Total number of elements found.

      • err: Total number of elements that failed processing due to errors.

      • %ok: Percentage of successfully processed elements.
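      The relationship between the three properties can be sketched in Python as follows. This assumes that %ok is computed as 100 × (total − err) / total, which is consistent with the sample values in this article (err 0, total 5, %ok 100) but is not stated explicitly. The rounding DataX applies is unknown, so the check uses a tolerance. Both function names are hypothetical.

```python
from typing import Optional


def percent_ok(total: int, err: int) -> Optional[float]:
    """Percentage of elements processed without error, or None if total is 0."""
    if total < 0 or err < 0 or err > total:
        raise ValueError("expected 0 <= err <= total")
    if total == 0:
        return None  # a percentage of nothing is undefined
    return 100.0 * (total - err) / total


def unit_is_consistent(unit: dict, tolerance: float = 0.5) -> bool:
    """Check a {"total", "err", "%ok"} unit against the assumed formula."""
    expected = percent_ok(unit["total"], unit["err"])
    if expected is None:
        return True
    return abs(unit["%ok"] - expected) <= tolerance
```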

      extensions.stats.records

      Object

      [output-only] Stats on records. A record is defined as a line of input in the ingested data.

      extensions.stats.urn

      Object

      [output-only] Stats on number of unique users found in successfully processed records. A user may fail processing if it could not be converted/mapped to an internal Yahoo user.

      extensions.stats.cells

      Object

      [output-only] Stats on number of data cells found in successfully processed records against valid user IDs. A cell is a unit of data; e.g., in the datax-audience format, a single record may contain multiple segment qualifications and attribute values for a given user. Each qualification and attribute value counts as an individual cell (unit of data). In the datax-segment format, for example, a single record is a single cell/unit of data.

      extensions.stats.seg

      Object

      [output-only] This is a subset of cells stats for segments data only. Includes a summary stat as well as per-segment stats.

      extensions.stats.attr

      Object

      [output-only] This is a subset of cells stats for attribute data only. Includes a summary stat as well as per-attribute stats.

      extensions.stats.%accepted

      Number

      [output-only] Percentage of the upload that was successfully processed. This is an approximate representation, as records that fail to load may have contained multiple cells, e.g., multiple segment inclusions/exclusions & attributes for a single user. Similarly, even though a record is loaded successfully, individual cells within it may fail to load due to various errors. Please see Audience stats link below for more details.

      Stats Link

      This link provides statistics on the input data after processing. Most of this information, except for the ‘seg’ and ‘att’ details, is also available in the metadata body itself, under the ‘stats’ section. Please see ‘extensions.stats’ in the previous section for a detailed description of these statistics. ‘seg’ and ‘att’ are a further breakdown of the ‘cells’ stats on a per-taxonomy-ID basis.

      {
        "records" : {"total":{n},"err":{n},"%ok":{n}},
        "urn"     : {"total":{n},"err":{n},"%ok":{n}},
        "cells"   : {"total":{n},"err":{n},"%ok":{n}},
        "seg"     : {"summary":{"total":{n},"err":{n},"%ok":{n}},
                     "details":{"taxoId":{"total":{n},"err":{n},"%ok":{n}}}},
        "att"     : {"summary":{"total":{n},"err":{n},"%ok":{n}},
                     "details":{"taxoId":{"total":{n},"err":{n},"%ok":{n}}}},
        "%accepted" : {number}
      }
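      The per-taxonomy breakdown is useful for finding which segments or attributes had failures. The Python sketch below walks the “details” maps of a stats document shaped like the layout above. The function name and the sample numbers are hypothetical.

```python
def failing_taxonomy_ids(stats: dict, min_err: int = 1) -> dict:
    """Map (kind, taxonomy id) to its stat unit when err >= min_err."""
    failing = {}
    for kind in ("seg", "att"):
        details = stats.get(kind, {}).get("details", {})
        for taxo_id, unit in details.items():
            if unit.get("err", 0) >= min_err:
                failing[(kind, taxo_id)] = unit
    return failing


# Made-up stats document following the layout shown above.
sample_stats = {
    "seg": {
        "summary": {"total": 10, "err": 2, "%ok": 80},
        "details": {
            "413551": {"total": 5, "err": 2, "%ok": 60},
            "413552": {"total": 5, "err": 0, "%ok": 100},
        },
    },
    "att": {"summary": {"total": 0, "err": 0, "%ok": 100}, "details": {}},
}
failing = failing_taxonomy_ids(sample_stats)
```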

      Errors Link (New: Combination of errors and errors)

      This link provides a file with sample errors found while processing the payload. Currently, it captures two kinds of errors:

      1. Cells that failed taxonomy validation

      2. User IDs that we failed to identify

      For (1), if the errors are due to an incorrect Taxonomy, please post a corrected Taxonomy prior to reprocessing the failed audience data.

      The file also contains sample errors that cannot be recovered without re-uploading fixed data. These are usually badly formatted JSON records, represented as:

      {
      "errorCode" : "DxAudBadJson",
      "errorSample" : "[{\"cell_error_info\":\"{something really wrong\"}]"
      },

      Please note that escaped quotes (\") are used because the failed (JSON) record is represented as-is within the error JSON for easy programmatic extraction.
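      Because the failed record is embedded as an escaped string, extracting it takes two decoding steps: one for the errorSample string, and one for each cell_error_info value, which may or may not be valid JSON itself. A minimal Python sketch follows, with a hypothetical function name. Values that do not decode to a JSON object or array are returned as raw strings.

```python
import json


def decode_error_sample(entry: dict) -> list:
    """Decode errorSample and, where possible, each nested cell_error_info."""
    cells = json.loads(entry["errorSample"])  # first level: list of cells
    decoded = []
    for cell in cells:
        raw = cell["cell_error_info"]
        try:
            parsed = json.loads(raw)  # second level: the failed record
        except json.JSONDecodeError:
            parsed = raw  # badly formatted JSON stays as a string
        # Keep plain values such as "413552" as strings, not numbers.
        decoded.append(parsed if isinstance(parsed, (dict, list)) else raw)
    return decoded
```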

      Previous Format:

      {"err":"DxUrnMissing","val":"[{\"cell_error_info\":\"{\\\"something\\\":\\\"wrong\\\"}\"}]"}
      {"err":"DxAudBadJson","val":"[{\"cell_error_info\":\"{something really wrong\"}]"}
      {"err":"DxAudAttributeNotInRange","val":"[{\"cell_error_info\":\"something\"}]"}
      {"err":"DxAudBadTaxoId","val":"[{\"cell_error_info\":\"413552\"}]"}
      {"err":"DxAudBadTaxoId","val":"[{\"cell_error_info\":\"413551\"}]"}

      New Format (JSON Array Structure):

      [ {
        "errorCode" : "DxUrnMissing",
        "errorSample" : "[{\"cell_error_info\":\"{\\\"something\\\":\\\"wrong\\\"}\"}]"
      }, {
        "errorCode" : "DxAudBadJson",
        "errorSample" : "[{\"cell_error_info\":\"{something really wrong\"}]"
      }, {
        "errorCode" : "DxAudBadTaxoId",
        "errorSample" : "[{\"cell_error_info\":\"413551\"}]"
      }, {
        "errorCode" : "DxAudAttributeNotInRange",
        "errorSample" : "[{\"cell_error_info\":\"something\"}]"
      }, {
        "errorCode" : "DxAudBadTaxoId",
        "errorSample" : "[{\"cell_error_info\":\"413552\"}]"
      } ]
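      Clients that still hold error files in the previous format can convert them to the new structure. The Python sketch below assumes, based only on the two samples above, that “err” corresponds to “errorCode” and “val” to “errorSample”, and that the previous format has one JSON object per line. The function name is hypothetical.

```python
import json


def convert_previous_format(text: str) -> list:
    """Convert one-object-per-line error text into the new array structure."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        old = json.loads(line)
        # Assumed mapping: err -> errorCode, val -> errorSample.
        entries.append({"errorCode": old["err"], "errorSample": old["val"]})
    return entries
```

      Passing the result to json.dumps(entries, indent=2) produces a JSON array similar to the new format shown above.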

