Yahoo Conversion API

      The Yahoo Conversion API sends transaction data to Yahoo for campaign measurement, attribution, and optimization. Transaction data includes the product ID, product quantity, dollar amount, and other conversion-related attributes. Yahoo matches transaction data to campaign exposure data from the DSP using In-Flight Sales Analysis (ISA).

      Important

      Before you can send transaction data using the Conversion API, the product catalog must be sent via the S3/Partner Data Store (PDS) endpoint.

      Transaction data can be shared in one of the following three ways, which together cover the common implementation use cases:

      • Sending both online and offline purchase data via the Conversion API or Pixel API (preferred method).

      • Sending both online and offline purchase data through S3/PDS.

      • Sending purchase data using multiple methods (e.g., online data via DOT/Pixel API and offline data via S3).

      Deliver Product Catalog Through S3/Partner Data Store (PDS)

      Deliver the product catalog to the Yahoo S3 endpoint and update it on a recurring basis (preferably daily). This catalog contains a 3-level product hierarchy that users can search and select from in the DSP UI when creating ISA rules. It includes the exact product IDs/SKUs provided in the sales data feed, ensuring precise tracking and accurate attribution of product-level sales transactions. The following section details the high-level architecture and the steps to set up delivery of product catalog data to Yahoo.

      Architecture & Data Flow

      File Format and Contents

      File Type: Tab delimited gzip/bzip2 file

      File Schema: Product ID, Product Owner, Product Brand, Product Name, Partner Attributes

      Data type: STRING for all columns.

      | Column Name (not case sensitive) | Mandatory? | Description |
      | --- | --- | --- |
      | Product ID | Yes | Unique ID that identifies a UPC or SKU |
      | Product Owner | Yes | Brand name of the product owner, e.g. GMI, Samsung |
      | Product Brand | Yes | Brand family that the product/UPC belongs to, e.g. Pillsbury. If the product/UPC does not have a brand, populate this field as NA |
      | Product Name | Yes | Descriptive name of the product |
      | FLEXIBLE_VARIABLE_<attribute name 1> | No | Catch-all for partner-specific attributes. For example, “FLEXIBLE_VARIABLE_new_model”, “flexible_variable_new_model”, “Flexible_Variable_NewModel”, etc. |
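
      As an illustration of the file format above, the following Python sketch writes catalog rows as a tab-delimited, bzip2-compressed payload. The function name, the flexible column name, and the sample row are hypothetical. Whether a header row is expected is not stated here, so the sketch writes data rows only.

```python
import bz2

# Column order follows the file schema above; the flexible column is a made-up example.
COLUMNS = ["Product ID", "Product Owner", "Product Brand", "Product Name",
           "FLEXIBLE_VARIABLE_new_model"]


def catalog_bytes(rows):
    """Return bzip2-compressed, tab-delimited catalog content for a list of dicts."""
    lines = []
    for row in rows:
        # All columns are STRING, so every value is converted with str().
        values = [str(row.get(col, "")) for col in COLUMNS]
        # A product without a brand is populated as NA.
        if not row.get("Product Brand"):
            values[COLUMNS.index("Product Brand")] = "NA"
        # Tabs or newlines inside a value would break the delimited layout.
        if any("\t" in v or "\n" in v for v in values):
            raise ValueError("values must not contain tabs or newlines")
        lines.append("\t".join(values))
    return bz2.compress(("\n".join(lines) + "\n").encode("utf-8"))
```

      The returned bytes can be written to a file such as file_1.csv.bz2 before upload.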

      File format and Contents of “_manifest”

      The unit of the file size is “byte”.

      <.schema size><SPACE><.schema> --- <SPACE> delimited

      <file_1 size><SPACE><file_1.csv.bz2>

      <file_2 size> <file_2.csv.bz2>

      ...

      <file_n size> <file_n.csv.bz2>
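
      The manifest layout above (one space-delimited "size name" pair per line, sizes in bytes, schema file first) can be sketched in Python as follows. The helper names are hypothetical, and the trailing newline on the last line is an assumption.

```python
import os


def entries_for(directory, names):
    """Pair each file name with its size in bytes, read from the local directory."""
    return [(name, os.path.getsize(os.path.join(directory, name))) for name in names]


def manifest_text(entries):
    """Render _manifest content: '<size in bytes> <file name>' per line."""
    return "".join(f"{size} {name}\n" for name, size in entries)
```

      For example, manifest_text(entries_for("out", [".pig_header", "file_1.csv.bz2"])) produces the text to store as _manifest once the other files exist locally.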

      File format and Contents of “<.schema>”

      The schema file, <.schema>, can be in either of the following two formats:

      1. A Pig header file, “.pig_header”, in CSV column header format with extended data type support: <column_header:data_type>[,<column_header:data_type>]{*}

      2. The standard Apache Pig schema file, “.pig_schema”, in JSON file format.

      Requirements from Data Provider

      Please provide the following to allow for load estimation of the data you plan to send:

      1. Upload frequency (daily or hourly)

      2. Number of files per batch upload and their size after compression, bz2 preferred.

      3. Projected data volume per day

      Partners can either load data into their own S3 bucket and provide necessary access to Yahoo, or upload data to a Yahoo S3 bucket. If using a Yahoo S3 bucket to deliver the data, Yahoo will need the public IP address ranges (in the form of IP CIDRs) of the hosts from which the data provider will upload files to the S3 location provided. A sample IP range list: "188.12.111.0/24, 188.12.112.0/30, 198.11.23.23/30".
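
      Before sending an IP range list, it can help to check that every entry parses as a CIDR block. The sketch below uses Python's standard ipaddress module. The function name is hypothetical, and normalizing entries with host bits set (rather than rejecting them) is a choice made for this example.

```python
import ipaddress


def normalize_cidrs(raw):
    """Parse a comma-separated CIDR list and return canonical network strings."""
    networks = []
    for item in raw.split(","):
        # strict=False masks any host bits instead of raising ValueError,
        # so "198.11.23.23/30" becomes the enclosing network.
        network = ipaddress.ip_network(item.strip(), strict=False)
        networks.append(str(network))
    return networks
```

      Applied to the sample list above, the first two entries are returned unchanged, and 198.11.23.23/30 is reported as its enclosing network, 198.11.23.20/30.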

      High Level Guidelines: Privacy, Security, & Performance

      The following are best practices for sharing data via S3:

      1. Do not send data for opted-out users.

      2. Protect the S3 bucket/location with a security policy, such as:

        1. Disable public access.

        2. Enforce HTTPS (TLS 1.2 or above) connections.

        3. Enable server-side encryption (SSE-S3) with cipher key rotation at least every 12 months.

      3. Group files by feed type and day/hour, and upload them to folders named by feed type and date/time. See the “Directory Layout & File Format” section below.

      4. Limit the file size to around 1 GB (after bz2 compression).

        1. For small data sets, limit the number of files to under 5 per hour.

      5. Avoid many small files.

      6. Support credential rotation: annually or as needed.

      7. Secrets, such as credentials, should be delivered in encrypted format, using a GPG public key provided by the receiver of the credentials. See details in the “Sharing Secrets with External Partners using GPG” section below.

      Directory Layout & File Format

      s3://<bucket name>/<3p-m>/product_catalog/yyyyMMdd[/hh]

      _manifest -- upload this after all other files are uploaded

      <.schema>

      <file_1.csv.bz2>  --- data file; TAB, ‘\t’, delimited

      <file_2.csv.bz2>

      ...

      <file_n.csv.bz2>
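
      The layout above can be sketched in Python as a prefix builder plus an upload order that keeps _manifest last. The function names are hypothetical, and the partner path segment (shown as <3p-m> above) is treated as an opaque value supplied by the caller.

```python
from datetime import datetime


def partition_prefix(bucket, partner, when, hourly=False):
    """Build the s3://<bucket>/<partner>/product_catalog/yyyyMMdd[/hh] prefix."""
    fmt = "%Y%m%d/%H" if hourly else "%Y%m%d"
    return f"s3://{bucket}/{partner}/product_catalog/{when.strftime(fmt)}"


def upload_order(schema_name, data_files):
    """List object names in upload order; _manifest goes last as the completion marker."""
    return [schema_name, *data_files, "_manifest"]
```

      Uploading the schema before the data files is a choice made in this sketch; the only ordering stated above is that _manifest is uploaded after all other files.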

      Pig Data Types

      | Simple Types in .pig_header | Constant Value in .pig_schema | Description | Example |
      | --- | --- | --- | --- |
      | INT | 10 | Signed 32-bit integer | 10 |
      | LONG | 15 | Signed 64-bit integer | Data: 10L or 10l. Display: 10L |
      | FLOAT | 20 | 32-bit floating point | Data: 10.5F or 10.5f or 10.5e2f or 10.5E2F. Display: 10.5F or 1050.0F |
      | DOUBLE | 25 | 64-bit floating point | Data: 10.5 or 10.5e2 or 10.5E2. Display: 10.5 or 1050.0 |
      | CHARARRAY | 55 | Character array (string) in Unicode UTF-8 format | hello world |
      | BYTEARRAY | 50 | Byte array (blob) | |
      | BOOLEAN | 5 | boolean | true/false (case insensitive) |
      | DATETIME | 30 | datetime | 1970-01-01T00:00:00.000+00:00 |
      | BIGINTEGER | 65 | Java BigInteger | 200000000000 |
      | BIGDECIMAL | 70 | Java BigDecimal | 33.45678332 |

      Contacts for Operational Alerts

      <provider-specific-name>_s3_feed_alerts@<provider-specific-name>.com

      Sharing Secrets with External Partners using GPG

      Background

      This section describes how company A, the secret sender, can securely share secrets with company B, the secret recipient, over the internet using GPG encryption.

      In a nutshell:

      1. Company B uses GPG to generate a key pair: a public key, b_armor.pub, and a private key, b_armor.

        1. Company B provides a password, password_B, when generating the key pair.

        2. Company B needs this password later to decrypt the encrypted secret from company A.

      2. Company B emails b_armor.pub to company A.

      3. Company A uses b_armor.pub to encrypt the secret and emails the encrypted blob to company B.

      4. Company B uses b_armor (and password_B when prompted) to decrypt the encrypted blob.

      Detailed Steps

      1. Company A asks company B to generate (if it has not already) and export its GPG public key. Company B follows the instructions here: https://kb.iu.edu/d/awio

        1. Generate a key. In the commands below, <company_B_email> stands for company B’s email address.

          1. gpg --gen-key

            1. Choose “(1) RSA and RSA (default)”

            2. Key length: 2048

            3. Expire in 2 weeks: 2w

            4. Full Name

            5. Email address: <company_B_email>

            6. Comment: GPG key w/ company A

            7. Enter passphrase: <password_B> (company B needs this to decrypt the secret)

        2. gpg -o b_armor.pub -a --export <company_B_email>

        3. Company B emails “b_armor.pub” as an attachment to company A.

      2. Company A imports company B’s public key.

        1. gpg --import b_armor.pub

      3. Company A puts the secrets in “my_secret.txt” and encrypts it using b_armor.pub, identified by <company_B_email>.

        1. gpg -o my_secret_armor.txt -a -e -r <company_B_email> my_secret.txt

      4. Company A emails “my_secret_armor.txt” as an attachment to company B.

      5. Company B decrypts the attachment, entering password_B when prompted.

        1. gpg -o my_secret.txt -d my_secret_armor.txt

      Create a New Pixel ID (yp ID) or Assign an Existing One as the Default Pixel for ISA Conversions

      Important

      For multi-advertiser integrations, each advertiser has its own pixel ID, and all of these pixel IDs belong to the same multi-vendor integration owner. Contact your Yahoo Account team for further details.

      Create a New Pixel and Assign It as the Default Pixel

      1. Under your advertiser in the DSP, navigate to the Tracking tab.

      2. Click Create New, then Pixel. This generates a unique pixel ID to use in the POST endpoint. You don’t need to place the pixel tracking code; you only need the pixel ID.

      3. Reach out to your account team to set the pixel as the default pixel for ISA conversions.

      Assign an Existing Pixel as the Default Pixel

      Reach out to your account team to set an existing pixel as the default pixel for the ISA conversions.

      Transaction Data Delivery Through Browser to Server

      1. Place the JavaScript DOT tag, which is linked to the default pixel ID you chose, on your product website (refer to the example below).

        <script>(function(w,d,t,r){var q=[];w.ypr=function(e,t,d){q.push({e:e,t:t,d:d})};var s=d.createElement(t);s.src=r;s.async=!0;s.onload=s.onreadystatechange=function(){var y,rs=this.readyState;if(rs&&rs!="complete"&&rs!="loaded"){return};try{w.ypr=function(e,t,d){d.pixelId=<pixel_id_value_here>;YAHOO.ywa.I13N.firePostBeacon(e,t,d)};for(var n=0;n<q.length;n++){var o=q[n];w.ypr(o.e,o.t,o.d)};q=[]}catch(e){}};var scr=d.getElementsByTagName(t)[0],par=scr.parentNode;par.insertBefore(s,scr)})(window,document,"script","https://s.yimg.com/wi/ytc.js")</script>

        Important

        • To use an existing online conversion pixel or ISA pixel as the default pixel, leave the existing code on the page and append the new pixel code from the ISA rule list page. The existing and new code can coexist on the same page, but only one instance of the new code should be present.

        • If the default pixel needs to be changed at any time, contact your Yahoo Account team to update the default pixel assignment. Once the default pixel is changed, replace the existing default pixel code on your product website with the new code pointing to the new default pixel ID.

        • The default pixel can have multiple rules attached to it and the code for the default pixel only needs to be implemented once on your product website, provided the default pixel does not change.

      2. Call the JavaScript tag with transaction events (refer to the example below). For details on which fields the call requires, refer to Required and Supported Fields.

        <!-- retailer's code -->
        <script>
          var config = {
            pageUrl: 'https://example.com',
            userData: {
              ids: {
                EMAIL: ['hashed_value'], // must be sha256 hashed
                PHONE: ['hashed_value'], // must be sha256 hashed
              },
            },
            eventData: {
              totalPrice: 123.45,
              currency: "usd",
              products: [
                {
                  productId: "12345SKU", // unique id for product / service
                  unitPrice: 123.45,
                  quantity: 1, // count of product in transaction
                  customKeyValues: {
                    custom: "value",
                    key: "hole"
                  }
                }
              ]
            },
            customKeyValues: {
              key: 'value',
            },
          };

          // Sends the purchase event through the DOT tag queue.
          function fireBeacon() {
            window.ypr('collect', 'purchase', config);
          }
        </script>
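
      The EMAIL and PHONE values in the call above are sent as SHA-256 hashes. A minimal Python sketch of producing such a hash on the server side follows. The function name is hypothetical, and the normalization (trimming whitespace and lowercasing before hashing) is an assumption, not something this article specifies. Confirm the expected normalization with your Yahoo Account team.

```python
import hashlib


def sha256_hex(value):
    """Return the lowercase hex SHA-256 digest of a normalized identifier."""
    # Assumed normalization: strip surrounding whitespace and lowercase.
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

      The result is a 64-character hex string, the same shape as the EMAIL example value shown in this article.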

      Required and supported fields

      The required fields for a successful call are described in the table below.

      | Field | Type | Required/Optional | Description/Example |
      | --- | --- | --- | --- |
      | pageUrl | String | Required | The URL of the page. |
      | country | String | Optional | The alpha-2 country code. |
      | customKeyValues | Map<String, String> | Optional | The custom key value pairs provided by the advertiser. |
      | userData | Object | Required | The wrapper object for all user data. |
      | userData.ids | Map<String, List<String>> | Required | Map from ID type to the list of IDs of that type. Supported keys (types) include: IDFA (iOS advertising ID); GPSAID (Android advertising ID); DEVICE_ID (generic device identifier); EMAIL (value MUST be sha256 hashed); PHONE (value MUST be sha256 hashed); PXID (value is a combination of <provider source id>:<provider id>; the provider source ID will be provided by Yahoo to the advertiser). |
      | eventData | Object | Required | The wrapper object for all event data. |
      | eventData.totalPrice | Number | Optional | The total price of the entire purchase. |
      | eventData.currency | String | Optional | The currency code for the purchase (“USD”, for example). |
      | products | List<Product> | Required | The list of all the Product objects involved in the event, such as a purchase. |
      | product | Object | Required | The wrapper object for product details. |
      | product.id | String | Required | The unique ID for the product, such as the SKU. |
      | product.quantity | Integer | Optional | The number of units of the product. |
      | product.unitPrice | Number | Required | The price per individual product unit. |
      | customKeyValues | Map<String, String> | Optional | A map of custom keys and values. Supported keys currently include BRAND and PRODUCT_NAME. |

      Example (userData.ids):

      userData: {
        ids: {
          "IDFA": ["082a1d3e-954b-4e84-8bbd-516c20e7d0ad"],
          "EMAIL": ["4d0d74c86081eff9c3f51536dc82f0a7ab1824fb01954ae9d9516cfc36e8dc43"],
          "PXID": ["42:4c86081eff9c3f51"]
        }
      }

      Example (products):

      products: [
        {
          productId: "Yod/pur",
          unitPrice: 25.99,
          quantity: 2,
          customKeyValues: {
            "BRAND": "Yahoo",
            "PRODUCT_NAME": "Yodel button purple"
          }
        }
      ]

      Example (customKeyValues):

      customKeyValues: {
        "BRAND": "Yahoo",
        "PRODUCT_NAME": "Yodel button purple"
      }

      Transaction Data Delivery Through Server to Server

      1. The client generates a key pair and sends the public key to Yahoo to request OAuth 2.0 client credentials, as explained in the OAuth 2.0 authentication section below.

      2. Yahoo creates a new set of encrypted credentials for the client.

      3. Yahoo and the client work together to generate an allow-list with a unique Pixel ID <pixelId> for the API POST call.

      4. Yahoo and the client work together to create new rules in the Yahoo DSP.

      5. The client begins to POST events to the Yahoo Conversion API.

      Endpoint Details

      The specification below identifies the key components of a Conversion API request: the POST URL for the online and offline conversion endpoints, the headers, and the POST body.

      Online Conversions Endpoint

      When using the online conversion endpoint, data is processed multiple times a day, but the rate limit is tighter.

      POST https://dataxonline.yahoo.com:443/v1/conversions/<pixelId>/events

      Note

      The online endpoint does not accept offline conversion events. If offline conversion events are submitted to the online endpoint, a partial-success response (HTTP 206) is returned.

      • actionSource cannot be physical_store on this endpoint.

      Offline Conversions Endpoint

      When using the offline conversion endpoint, data is processed daily but the rate limit is higher.

      POST https://datax.yahoo.com:443/v1/conversions/<pixelId>/events

      Note

      If submitting online events to this endpoint, processing is slower.
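
      Because the online endpoint rejects physical_store events, a sender that handles both kinds of events may want to split each batch by endpoint before posting. The Python sketch below illustrates one way to do this. The function name is hypothetical, and routing every non-physical_store event to the online endpoint is a choice made for this example.

```python
ONLINE_URL = "https://dataxonline.yahoo.com:443/v1/conversions/{pixel_id}/events"
OFFLINE_URL = "https://datax.yahoo.com:443/v1/conversions/{pixel_id}/events"


def route_events(pixel_id, events):
    """Group events by the endpoint URL they should be posted to."""
    online, offline = [], []
    for event in events:
        # physical_store events must go to the offline conversions endpoint.
        if event.get("actionSource") == "physical_store":
            offline.append(event)
        else:
            online.append(event)
    batches = {}
    if online:
        batches[ONLINE_URL.format(pixel_id=pixel_id)] = online
    if offline:
        batches[OFFLINE_URL.format(pixel_id=pixel_id)] = offline
    return batches
```

      Each key of the returned dict is a POST URL, and each value is the list of events to send there.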

      Headers

      Content-Type: application/json
      
      Accept: application/json
      
      Authorization: Bearer <access_token>

      POST body - JSON keys

      [
        {
          "eventName": "purchase",
          "eventId": "event1",
          "eventTs": 1713545795,
          "actionSource": "web",
          "actionSourceUrl": "http://store.com",
          "country": "US",
          "region": "NA",
          "userData": {
             "email" : [ "email1_hash", "email2_hash" ],
             "gpsaid" : ["gpsaid_hash"],
             "phone" : ["phone_hash","phone2_hash"],
             "pxid" : ["pxid_key:pxid_value"],
             "ip_address" : "clientIp_hash",
             "userAgent" : "Y Browser"
          },
          "privacy" : {
            "optOut": false,
            "data_processing_options": ["LDU"],
            "data_processing_options_country": 0,
            "data_processing_options_state": 0
          },
          "order": {
            "orderId": "order1",
            "price": 1.22,
            "currency": "USD",
            "products" : [
              {
                "id": "abc1",
                "name": "duck",
                "brand": "Rubber",
                "quantity": 2,
                "unitPrice": 0.11,
                "category": "bath",
                "subCategory": "toys"
              },{  
                "id": "abc2",
                "name": "rake",
                "brand": "Wood",
                "unitPrice": 1.0,
                "category": "garden",
                "subCategory": "tools"
              }
            ]
          },
          "customData": {
            "attributes": {
              "promo": "abc1",
              "store": "store1"
            }
          }
        }
      ]

      Required and supported fields

      The required fields for a successful JSON body POST are described in the table below.

      | Field | Type | Required/Optional | Description / Example |
      | --- | --- | --- | --- |
      | eventName | Enum | Required | The type of conversion/action that triggers the event. Supported values include: purchase |
      | eventId | String | Required | The custom external ID for the transaction event, used to identify the digital event. This must be a unique value; events with a duplicate ID are dropped. |
      | eventTs | Integer | Required | The epoch timestamp of the event. |
      | actionSource | Enum [web, app, phone, email, online, physical_store] | Required | The digital or physical source of the event. If the value is physical_store, submit the event to the offline conversion endpoint. |
      | actionSourceUrl | String | Optional | The URL of the website where the conversion occurred. Null for some actionSources, such as phone or physical_store. |
      | country | String | Optional | Two characters. For example, US. |
      | region | String | Optional | Examples: APAC, NAR, EMEA, LATAM, ROW |
      | userData | Object | Required. At least one of the email, phone, gpsaid, idfa, pxid, sid, or bid fields must be supplied as well. | The wrapper object for all user data. |
      | userData.email | List<string> | Optional | Emails, sha256 hashed |
      | userData.phone | List<string> | Optional | Phone numbers, sha256 hashed |
      | userData.gpsaid | List<string> | Optional | The Android advertising ID |
      | userData.idfa | List<string> | Optional | The Apple advertising ID |
      | userData.pxid | List<string> | Optional | 3rd-party external identifier. Format: pxIdSrcId + ‘:’ + pxIdValue. The provider source ID will be provided by Yahoo to the advertiser. |
      | userData.ip_address | String | Optional | IP address, SHA hashed |
      | userData.userAgent | String | Optional | The HTTP User-Agent header passed by the browser. |
      | privacy | Object | Required if any field within the order object is sent. | The wrapper object for all privacy data. |
      | privacy.optOut | Boolean | Optional | A flag indicating that this event should not be used for ads delivery optimization. |
      | order | Object | Optional | The wrapper object for all order data. |
      | order.orderId | String | Optional | The unique ID of the sale/transaction |
      | order.price | Number | Optional | The total sales price at the order/basket/cart level. |
      | order.currency | String | Optional | The ISO 4217 currency code corresponding to price; null if unknown. |
      | order.products | Array of objects: {id: string, name: string, brand: string, quantity: integer, unitPrice: number, category: string, subCategory: string} | Required | The list of products included in an order, with each product’s ID, name, brand, quantity, unit price, category, and subcategory. quantity defaults to 1 if not defined. |
      | customData.attributes | Map<string,string> | Optional | Custom key value pairs |
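
      The required-field rules in the table above can be checked before posting. The following Python sketch is one possible pre-flight check. The function name and error strings are hypothetical, and reading "order.products is Required" as "required whenever an order object is sent" is an assumption.

```python
REQUIRED_KEYS = ("eventName", "eventId", "eventTs", "actionSource", "userData")
ID_FIELDS = ("email", "phone", "gpsaid", "idfa", "pxid", "sid", "bid")


def validation_errors(event):
    """Return a list of problems found in one event dict (empty if none)."""
    errors = [f"missing {key}" for key in REQUIRED_KEYS if key not in event]
    user_data = event.get("userData", {})
    # userData must carry at least one identifier field.
    if not any(user_data.get(field) for field in ID_FIELDS):
        errors.append("userData needs at least one identifier")
    if "order" in event:
        # privacy is required when any order field is sent.
        if "privacy" not in event:
            errors.append("privacy is required when order is sent")
        # Assumption: products is required whenever order is present.
        if not event["order"].get("products"):
            errors.append("order.products is required")
    return errors
```

      Events that return a non-empty list can be fixed or dropped locally instead of being reported back as invalid in a partial (206) response.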

      OAuth 2.0 Authentication

      The Conversion API is a server-to-server implementation that requires OAuth 2.0 authentication before any data can be posted to Yahoo Ad Tech servers.

      OAuth 2.0 relies on continuously refreshed authentication tokens, which clients provide when posting data. Note that Yahoo Ad Tech has no plans to support OAuth 1.0, which depends on static tokens.

      Follow the steps outlined below to create your Client ID and Secret for secure authentication.

      Request Client Credentials

      Note

      Requesting client credentials includes an internal allowlist approval process that may require additional time for the setup to be completed.

      To complete the steps below, you first need a Client ID and a Client Secret. Follow the steps outlined below to request them.

      1. Generate a private key.

        >> openssl genpkey -aes256 -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out private_key.pem
      2. Generate a public key using the above private key.

        >> openssl rsa -in private_key.pem -out public_key.pem -outform PEM -pubout
      3. Send the public key to your Yahoo representative.

      4. Yahoo will then send a file containing credentials encrypted with the above public key for you to use.

      5. Decrypt the file with the private key.

        >> openssl rsautl -decrypt -inkey private_key.pem -in credential.enc -out my_credentials.txt

      Post-Credential OAuth 2.0 Workflow

      Once you’ve received your encrypted credentials from your Yahoo representative, follow these steps:

      Step 1: The external provider calls the ID B2B server to get the access_token, which is valid for 60 minutes.

      Step 2: The provider calls the POST /identity/oauth2/access_token endpoint of ID B2B with a JWT created from the provided client_id and client secret.

      Sample Request
      curl -X POST 'https://id.b2b.yahooinc.com/identity/oauth2/access_token' \
      -H 'Content-Type: application/x-www-form-urlencoded' \
      --data-urlencode 'grant_type=client_credentials' \
      --data-urlencode 'client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer' \
      --data-urlencode 'client_assertion=<jwt_token>' \
      --data-urlencode 'scope=pixel-event' \
      --data-urlencode 'realm=dataxonline'

      Note

      If onboarding and using staging credentials against staging/sandbox endpoints, then use https://id-uat.b2b.yahooinc.com/identity/oauth2/access_token for the endpoint in the call.

      Sample Response
      {
        "access_token": "wcf1011c-70fe-4740-b8a1-781d2b4dd3q3",
        "scope": "pixel-event",
        "token_type": "Bearer",
        "expires_in": 3599
      }
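
      Since the access_token stays valid for about 60 minutes, a client can reuse it across POST calls rather than requesting a new one each time. The Python sketch below caches the token until shortly before expires_in elapses. The class name, the injected fetch callable, and the 60-second safety margin are assumptions for illustration.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch, clock=time.time, margin=60):
        self._fetch = fetch    # callable returning the parsed token response dict
        self._clock = clock    # injectable clock, useful for testing
        self._margin = margin  # seconds to refresh early (assumed value)
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a cached token, fetching a new one when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            response = self._fetch()
            self._token = response["access_token"]
            self._expires_at = now + response["expires_in"] - self._margin
        return self._token
```

      Here fetch would wrap the token request shown in the sample above and return its JSON response as a dict.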

      Generating JSON Web Token (JWT)

      The JSON Web Token is composed of three main parts:

      1. Header: a normalized structure specifying how the token is signed (generally using the HMAC SHA-256 algorithm).

      2. Claims: a free-form set of claims, such as client_id, aud, and the expiration date.

      3. Signature: ensures data integrity.

      The signature mechanism is HMAC_SHA256 as defined by the JOSE specifications:

      https://tools.ietf.org/html/draft-ietf-jose-json-web-signature-31

      JWT Header
      {
        "alg": "HS256",
        "typ": "JWT"
      }

      JWT Claims
      {
        "aud": "https://id.b2b.yahooinc.com/identity/oauth2/access_token?realm=dataxonline",
        "iss": "{client_id}",
        "sub": "{client_id}",
        "exp": {expiry time in seconds},
        "iat": {issued time in seconds},
        "jti": "{UUID}"
      }

      Note the following:  

      • “exp” and “iat” values should be numeric. Do not set them as strings.

      • “exp” value is currentTime + 3600 (i.e. 60 minutes).

      • Don’t use currentTime + (24 * 60 * 60). You may get a “JWT has expired or is not valid” error.

      • UUID - A Universally Unique IDentifier, https://www.ietf.org/rfc/rfc4122.txt

      Walking through manual steps to build this JWT value

      jwt_signing_string = base64url_encode(jwt_header) + '.' + base64url_encode(jwt_body)
      
      jwt_signature = base64url_encode(hmac_sha256(jwt_signing_string, client_secret))
      
      JWT = jwt_signing_string + '.' + jwt_signature

      You can find JWT libraries at https://openid.net/developers/libraries/.
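
      The manual steps above can be sketched in Python using only the standard library. The function names are hypothetical, and treating the client secret as a UTF-8 string for the HMAC key is an assumption. In production, a maintained JWT library is usually the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid

AUD = "https://id.b2b.yahooinc.com/identity/oauth2/access_token?realm=dataxonline"


def b64url(data):
    """Base64url-encode bytes without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_jwt(client_id, client_secret, now=None):
    """Build an HS256-signed JWT with the header and claims described above."""
    iat = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "aud": AUD,
        "iss": client_id,
        "sub": client_id,
        "exp": iat + 3600,  # numeric, currentTime + 60 minutes
        "iat": iat,         # numeric, not a string
        "jti": str(uuid.uuid4()),
    }
    signing_string = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        client_secret.encode("utf-8"), signing_string.encode("ascii"), hashlib.sha256
    ).digest()
    return signing_string + "." + b64url(signature)
```

      The returned string is the value to pass as client_assertion in the access token request.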

      Step 3: The client extracts the access_token from the response and makes calls to the Conversion API endpoint with the access_token in the Authorization header.

      Sample Request
      curl -X POST \
          'https://datax.yahoo.com:443/v1/conversions/<pixelId>/events' \
          -H 'authorization: Bearer 9f4c74cf-5bb9-45ce-987c-e5240e5710b8' \
          -H 'cache-control: no-cache' \
          -H 'content-type: application/json' \
          -d '[
            {
              "event_time": "1632847109",
              "action_source": "WEBSITE",
              "action_source_url": "www.test.com",
              "user_data": {
                "idfa": "e5b50a8b-3a77-4f83-aff4-68aa167f7c67",
                "vmcid": "p$g,o$96051f32-00a9-11e5-ae6b-ef933523f69b-7fa00a750700,t$2173541273654",
                "email": "17a6624c439a77854504c6987bee2a7fd2deb078aab26d48d051b2af70a4ea2f"
              },
              "custom_data": {
                "value": "test_value",
                "ec": "test_category",
                "el": "test_label",
                "ea": "test_action",
                "product_id": ["product_id1", "product_id2"],
                "user_defined": {
                  "addToCart": "true",
                  "signUp": "true",
                  "purchaseAmount": "1500"
                }
              }
            }
          ]'
      Sample Response
      {
      "success": true
      }
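
      The same call can be prepared from Python with the standard library. The sketch below only builds the request object. The function name is hypothetical, and the events argument is expected to follow the POST body keys documented in the Endpoint Details section.

```python
import json
import urllib.request


def build_events_request(base_url, pixel_id, access_token, events):
    """Build (but do not send) a POST request for a list of conversion events."""
    url = f"{base_url}/v1/conversions/{pixel_id}/events"
    return urllib.request.Request(
        url,
        data=json.dumps(events).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

      Passing the result to urllib.request.urlopen sends it. For example, build_events_request("https://datax.yahoo.com:443", pixel_id, token, events) targets the offline conversions endpoint.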

      Step 4: The API verifies the client access_token. It first checks whether the access_token in the request header is present in the ClientAccessToken cache. If it is, the API does not call the ID B2B server; it uses the cached value to determine whether the token is valid.

      However, if the client access_token is not found in the cache, the API calls the ID B2B server endpoint identity/oauth2/introspect.

      • admin_access_token is the access token for the DataX online API, which is sent in the Bearer Authorization header.

      • client_access_token is the access_token of the client or external provider that has to be verified.

      Sample Response
      {
      "active": false
      }

      Step 5: The result of the client access_token introspection with ID B2B is saved in the ClientAccessToken cache, with the client access_token as the key and the response from the ID B2B server as the value. Cache entries expire after 7 minutes.

      Error Responses

      The API can return the following response codes:

      | Code | Message | Reason |
      | --- | --- | --- |
      | 200 | Submission processed. | Valid request |
      | 206 | Submission is partially processed. | Submission included invalid events |
      | 400 | Error. Unsupported Content-Type. | Invalid Content-Type provided. |
      | 400 | Error. Missing body and no query parameters provided. | Missing query params and body. |
      | 400 | Error. Request body/params formatting error. | Unable to parse request body/params. Failed to decode KVs in request body. Failed to decode KVs in query params. |
      | 401 | Error. Invalid ‘Authorization’ HTTP Header. Request a new token. (Not enforced in initial release) | Invalid Authorization token. |
      | 429 | Request is rate limited. | |
      | 500 | Internal Server Error | |
      | 502 | External Server Error | An external call failed during serving the query. |
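
      A client typically maps these codes to a follow-up action. The Python sketch below shows one possible mapping. The function name and action labels are hypothetical, and retrying on 429, 500, and 502 is a common convention assumed here, not a policy stated in this article.

```python
def next_action(status):
    """Suggest a follow-up action for a Conversion API response code."""
    if status == 200:
        return "done"
    if status == 206:
        # Some events in the submission were invalid.
        return "inspect_events"
    if status == 400:
        # Content-Type, body, or parameter problem; resending unchanged will not help.
        return "fix_request"
    if status == 401:
        return "refresh_token"
    if status in (429, 500, 502):
        # Assumed policy: back off and retry later.
        return "retry_later"
    return "unknown"
```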

      Transaction Data Delivery Through S3/Partner Data Store (PDS)

      The following section details the high-level architecture and the steps to set up delivery of data files to designated AWS S3 locations for Yahoo to download.

      Architecture & Data Flow

      Onboarding Requirements from Data Provider

      Please provide the following to allow for load estimation of the data you plan to send.

      1. Upload frequency: assume daily.

      2. Number of files per batch upload and their size after compression, bz2 preferred.

      3. Projected data volume per day

      High Level Guidelines: Privacy, Security, & Performance

      The following are best practices for sharing data via S3:

      1. Do not send data for opted-out users.

      2. Do not send duplicate user events. If duplication cannot be avoided because of integration with multiple data onboarding endpoints (such as S3, the Conversion API, and the DOT Pixel), send an extra string field, “event_id”, with each event on all endpoints so that Yahoo can use it as a dedupe key.

      3. Protect the S3 bucket/location with a security policy, such as:

        1. Disable public access.

        2. Enforce HTTPS (TLS 1.2 or above) connections.

        3. Enable server-side encryption (SSE-S3) with cipher key rotation at least every 12 months.

      4. Group files by feed type and day/hour, and upload them to folders named by feed type and date/time. See the “Directory Layout & File Format” section below.

      5. Limit the file size to around 1 GB (after bz2 compression).

        1. For small data sets, limit the number of files to under 5 per hour.

      6. Avoid many small files.

      7. Support credential rotation: annually or as needed.

      8. Secrets, such as credentials, should be delivered in encrypted format, using a GPG public key provided by the receiver of the credentials. See details in the “Sharing Secrets with External Partners using GPG” section.

      Directory Layout & File Format

      s3://<bucket name>/<3p-m>/<feed_n>/yyyyMMdd

      _manifest – 0 byte; upload completion marker; upload this after all other files are uploaded

      <file_1.csv.bz2>  --- data file; TAB, ‘\t’, delimited

      <file_2.csv.bz2>

      ...

      <file_n.csv.bz2>

      File format and Contents of “_manifest”

      The unit of the file size is “byte”.

      <.schema size><SPACE><.schema> --- <SPACE> delimited

      <.meta size><SPACE><.meta>

      <file_1 size><SPACE><file_1.csv.bz2>

      <file_2 size> <file_2.csv.bz2>

      <file_n size> <file_n.csv.bz2>

      File format and Contents of “<.schema>”

      The schema file, <.schema>, could be either of the following two formats.

      1. A Pig header file, “.pig_header”, in CSV column header format with extended data type support:

        1. Format: <column_header:data_type>[,<column_header:data_type>]{*}

        2. Supported data types: http://pig.apache.org/docs/latest/basic.html#data-types

        3. Constant values used in .pig_schema: https://pig.apache.org/docs/r0.17.0/api/constant-values.html#org.apache.pig.data.DataType

      2. The standard Apache Pig schema file, “.pig_schema”, in JSON file format.

        Note

        The names, data types, and order of the attributes in the “.schema” should match the columns in the data files.
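
      A “.pig_header” line in the <column_header:data_type> form can be generated from the same column list used to write the data files, which helps keep names, types, and order in sync. The Python sketch below is illustrative. The function name and the sample column names are hypothetical, and the accepted type names are taken from the Pig Data Types table in this article.

```python
SIMPLE_TYPES = {"INT", "LONG", "FLOAT", "DOUBLE", "CHARARRAY", "BYTEARRAY",
                "BOOLEAN", "DATETIME", "BIGINTEGER", "BIGDECIMAL"}


def pig_header(columns):
    """Render (name, type) pairs as a comma-separated .pig_header line."""
    parts = []
    for name, data_type in columns:
        # Reject type names that are not in the Pig Data Types table.
        if data_type.upper() not in SIMPLE_TYPES:
            raise ValueError(f"unsupported data type: {data_type}")
        parts.append(f"{name}:{data_type}")
    return ",".join(parts)
```

      For instance, pig_header([("event_id", "CHARARRAY"), ("quantity", "INT")]) yields a two-column header in the same order as the data columns.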

