User and Audience Data Metadata
User Data resource collections define an extension over the Base DataX Metadata structure.
JSON Structure
{
(Base DataX Metadata)
"extensions" : {
"urnType" : {string},
"ts" : {number},
"exp" : {number},
"taxonomyEntities" : [ {string} ],
"view" : {string},
"links" : [{
"rel": "https://datax.yahooapis.com/rels/taxonomy",
"href" : {string}
},{
"rel": "https://datax.yahooapis.com/rels/stats",
"href" : {string}
},{
"rel": "https://datax.yahooapis.com/rels/errors",
"href" : {string}
},{
"rel": "https://datax.yahooapis.com/rels/retry/errors",
"href" : {string}
},{
"rel": "https://datax.yahooapis.com/rels/errors",
"href" : {string}
}],
"stats" :
{
"records":{
"err":0,
"total":5,
"%ok":100
},
"cells":{
"err":0,
"total":5,
"%ok":100
},
"seg":{
"summary":{
"err":0,
"total":5,
"%ok":100
},
"details":{
"datax-segment_test2":{
"err":0,
"total":5,
"%ok":100
}
}
}
}
}
}
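DataX may add links over time, so a client is better off finding a link by its "rel" URI than by its position in the array. Below is a minimal Python sketch. It assumes the metadata has already been parsed into a dict with the shape shown above. The helper name `find_link` is hypothetical.

```python
from typing import Any, Optional

# Relation URI as it appears in the JSON structure above.
STATS_REL = "https://datax.yahooapis.com/rels/stats"


def find_link(metadata: dict[str, Any], rel: str) -> Optional[str]:
    """Return the href of the first link whose "rel" equals `rel`, or None.

    Assumes links are nested under "extensions", as in the structure above.
    """
    links = metadata.get("extensions", {}).get("links", [])
    for link in links:
        # Match on the relation URI, never on array position.
        if link.get("rel") == rel:
            return link.get("href")
    return None
```

For example, `find_link(metadata, STATS_REL)` returns the stats URL, or `None` if the link is absent. The structure above lists the errors relation twice, and this sketch simply returns the first match.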
JSON Property Descriptions
Property Name | Type | Description |
---|---|---|
status.state (Base DataX Metadata) | String | [output-only] Current state of the Data Upload within DataX. One of these: |
extensions.urnType | String | Represents the type of URN used to identify a user in the data. One of: Note: All of the above except for the mobile IDs are Yahoo IDs. The flavor of ID that you work with depends on the tech stack you use to perform a user match with Yahoo. |
extensions.ts | Number | [optional] Timestamp associated with each cell of this upload, unless overridden by the cell itself. It must be a UNIX epoch value in seconds, e.g., 1376244670 for “Sun Aug 11 18:11:10 2013” (UTC), and it must represent a time in the past. In APIs/formats that accept a “ts” value (datax-audience), it can be overridden at the cell level. A cell is one data point; each line of input in the uploaded data contains one or more. In the datax-segment and datax-attribute formats, each record has at most one data point (cell), whereas in the datax-audience format each line of input (record) may have more than one (multiple segment qualifications/attribute values per user record). |
extensions.view | String | [output-only] for: All UserData API calls are implicitly incremental unless the caller states otherwise by including the query parameter “?snapshot” (not currently supported). A snapshot view implies that all users are excluded from the set of “taxonomyEntities” in a payload unless a user is specifically “included” in that taxonomy entity. If and when DataX supports snapshots, all API calls and JSON representations will look exactly the same, but with an extra step of first establishing a logical upload, to help limit snapshot files to chunks of less than 10 GB per part-upload (compare with part-files). |
extensions.links | Array | [output-only] Useful link extensions. Currently defines several (see extensions.links[].rel). DataX may define more, to be discovered at development time. Any new link will use a URI-style link relation, with precise documentation at the URI given in its “rel” property. |
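extensions.links[].rel | URI | [output-only] Currently defines relation types including those shown in the JSON structure above: https://datax.yahooapis.com/rels/taxonomy, https://datax.yahooapis.com/rels/stats, https://datax.yahooapis.com/rels/errors, and https://datax.yahooapis.com/rels/retry/errors. |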
extensions.links[].href | URI | [output-only] Links to the reference taxonomy, the data statistics, and failed-record processing in this version of DataX; more may be added in the future. |
extensions.stats | Object | [output-only] Stats summary generated after processing. Note: more stats may be available through the stats link. Each stat unit describes an element, which may be records, users, cells, segments, attributes, etc., depending on the context, using three properties: “total” (count of elements), “err” (count that failed), and “%ok” (percentage processed successfully). |
extensions.stats.records | Object | [output-only] Stats on records. A record is defined as a line of input in the ingested data. |
extensions.stats.urn | Object | [output-only] Stats on number of unique users found in successfully processed records. A user may fail processing if it could not be converted/mapped to an internal Yahoo user. |
extensions.stats.cells | Object | [output-only] Stats on the number of data cells found in successfully processed records against valid user IDs. A cell is a unit of data. For example, in the datax-audience format a single record may contain multiple segment qualifications and attribute values for a given user, and each qualification and attribute value counts as an individual cell. In the datax-segment format, by contrast, a single record is a single cell. |
extensions.stats.seg | Object | [output-only] This is a subset of cells stats for segments data only. Includes a summary stat as well as per-segment stats. |
extensions.stats.attr | Object | [output-only] This is a subset of cells stats for attribute data only. Includes a summary stat as well as per-attribute stats. |
extensions.stats.%accepted | Number | [output-only] Percentage of the upload that was successfully processed. This is an approximation. A record that fails to load may have contained multiple cells (e.g., multiple segment inclusions/exclusions and attributes for a single user). Conversely, even when a record loads successfully, individual cells within it may fail to load due to various errors. See the Stats Link section below for more details. |
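The rules for extensions.ts can be sketched as follows. A cell-level “ts” takes precedence where the format supports one. The value must be seconds since the UNIX epoch and must lie in the past. The function names below are hypothetical. The checks are a client-side sketch rather than DataX's own validation, and they accept only whole seconds, which is an assumption.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def effective_ts(upload_ts: Optional[int], cell_ts: Optional[int] = None) -> Optional[int]:
    # A cell-level "ts" (only in formats that accept one, e.g. datax-audience)
    # takes precedence over the upload-wide extensions.ts.
    return cell_ts if cell_ts is not None else upload_ts


def validate_ts(ts: int, now: Optional[float] = None) -> None:
    # Assumption: only whole seconds since the UNIX epoch are accepted here.
    if isinstance(ts, bool) or not isinstance(ts, int):
        raise ValueError("ts must be an integer number of seconds since the epoch")
    # The timestamp must represent a time in the past.
    current = time.time() if now is None else now
    if ts >= current:
        raise ValueError(f"ts {ts} is not in the past")


example = 1376244670  # the example value from the table above
print(datetime.fromtimestamp(example, tz=timezone.utc).strftime("%a %b %d %H:%M:%S %Y UTC"))
```

A value accidentally supplied in milliseconds (e.g., 1376244670000) fails the past-time check. Read as seconds, it lies tens of thousands of years in the future.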
Stats Link
This link provides statistics on the input data after processing. Most of this information, except for the ‘seg’ and ‘att’ details, is also available in the metadata body itself, under the ‘stats’ section. See ‘extensions.stats’ in the previous section for a detailed description of these statistics. ‘seg’ and ‘att’ break the ‘cells’ stats down further, per taxonomy ID.
{
"records" :{"total":{n},"err":{n},"%ok":{n}},
"urn" :{"total":{n},"err":{n},"%ok":{n}},
"cells" :{"total":{n},"err":{n},"%ok":{n}},
"seg" :{"summary":{"total":{n},"err":{n},"%ok":{n}},
"details":{"taxoId":{"total":{n},"err":{n},"%ok":{n}}}},
"att" :{"summary":{"total":{n},"err":{n},"%ok":{n}},
"details":{"taxoId":{"total":{n},"err":{n},"%ok":{n}}}},
"%accepted" :{number}
}
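The per-taxonomy details are what the stats link adds on top of the metadata body. A client might use them to find the taxonomy IDs that need attention. Below is a minimal sketch. It assumes the stats link response has been parsed into a dict with the keys shown above. `failing_taxonomies` and the sample numbers are hypothetical.

```python
from typing import Any


def failing_taxonomies(stats: dict[str, Any], section: str = "seg") -> list[tuple[str, int, int]]:
    # "details" maps each taxonomy ID to its own {"total", "err", "%ok"} stat unit.
    details = stats.get(section, {}).get("details", {})
    rows = [
        (taxo_id, unit.get("err", 0), unit.get("total", 0))
        for taxo_id, unit in details.items()
        if unit.get("err", 0) > 0
    ]
    # Highest error count first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical stats link response, for illustration only.
stats = {
    "seg": {
        "summary": {"total": 10, "err": 3, "%ok": 70},
        "details": {
            "413551": {"total": 4, "err": 1, "%ok": 75},
            "413552": {"total": 6, "err": 2, "%ok": 66.7},
        },
    }
}
for taxo_id, err, total in failing_taxonomies(stats):
    print(f"taxonomy {taxo_id}: {err} of {total} cells failed")
```

Passing `"att"` as the section applies the same breakdown to attribute data.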
Errors Link (New: Combination of errors and errors)
This link provides a file with sample errors found while processing the payload. Currently, it captures two kinds of errors:
1. Cells that failed taxonomy validation
2. User IDs that we failed to identify
For (1), if the errors are due to an incorrect Taxonomy, please post a corrected Taxonomy prior to reprocessing the failed audience data.
The file also contains sample errors that cannot be recovered without re-uploading corrected data. These are usually badly formatted JSON records, represented as:
{
"errorCode" : "DxAudBadJson",
"errorSample" : "[{\"cell_error_info\":\"{something really wrong\"}]"
}
Note that the quotes inside “errorSample” are escaped (\") because the failed (JSON) record is embedded as-is, as a string, within the error JSON, which makes it easy to extract programmatically.
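Because errorSample carries the failed record as an escaped JSON string, recovering the record itself takes a second decoding pass. Below is a minimal Python sketch using only the standard library. `decode_error_sample` is a hypothetical name.

```python
import json
from typing import Any


def decode_error_sample(entry: dict[str, str]) -> Any:
    # errorSample is a JSON document serialized as a string, so it needs its own json.loads.
    return json.loads(entry["errorSample"])


# One entry from the error file, copied from the example above (raw string keeps the backslashes).
error_file_entry = r'{"errorCode" : "DxAudBadJson", "errorSample" : "[{\"cell_error_info\":\"{something really wrong\"}]"}'
entry = json.loads(error_file_entry)  # first pass: the error JSON itself
record = decode_error_sample(entry)   # second pass: the embedded failed record
print(entry["errorCode"], record)
```

The value of cell_error_info is left as a plain string because it may itself be malformed, as in this DxAudBadJson sample. Where it is valid JSON, as in the DxUrnMissing sample, it can be decoded once more.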
Previous Format:
{"err":"DxUrnMissing","val":"[{\"cell_error_info\":\"{\\\"something\\\":\\\"wrong\\\"}\"}]"}
{"err":"DxAudBadJson","val":"[{\"cell_error_info\":\"{something really wrong\"}]"}
{"err":"DxAudAttributeNotInRange","val":"[{\"cell_error_info\":\"something\"}]"}
{"err":"DxAudBadTaxoId","val":"[{\"cell_error_info\":\"413552\"}]"}
{"err":"DxAudBadTaxoId","val":"[{\"cell_error_info\":\"413551\"}]"}
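The previous format has one JSON object per line. Its err and val fields appear to map one-to-one onto errorCode and errorSample in the new array format shown next. Assuming that mapping, here is a sketch of a converter for tooling that still stores the previous format. The function name is hypothetical.

```python
import json


def convert_previous_format(lines: str) -> str:
    # Assumed mapping: "err" -> "errorCode", "val" -> "errorSample".
    entries = []
    for line in lines.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        old = json.loads(line)
        entries.append({"errorCode": old["err"], "errorSample": old["val"]})
    # The new format is a single JSON array of objects.
    return json.dumps(entries, indent=2)
```

Entries keep their input order. The examples in this section list the same errors in different orders, so ordering does not appear to be significant.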
New Format (JSON Array Structure):
[ {
"errorCode" : "DxUrnMissing",
"errorSample" : "[{\"cell_error_info\":\"{\\\"something\\\":\\\"wrong\\\"}\"}]"
}, {
"errorCode" : "DxAudBadJson",
"errorSample" : "[{\"cell_error_info\":\"{something really wrong\"}]"
}, {
"errorCode" : "DxAudBadTaxoId",
"errorSample" : "[{\"cell_error_info\":\"413551\"}]"
}, {
"errorCode" : "DxAudAttributeNotInRange",
"errorSample" : "[{\"cell_error_info\":\"something\"}]"
}, {
"errorCode" : "DxAudBadTaxoId",
"errorSample" : "[{\"cell_error_info\":\"413552\"}]"