Yahoo Conversion API
The Yahoo Conversion API can be used to send transaction data for campaign measurement, attribution, and optimization. Transaction data includes product ID, product quantity/dollar amount and other conversion-related attributes. Yahoo matches transaction data to campaign exposure data from the DSP using In-Flight Sales Analysis.
Important
Before you can send transaction data using the Conversion API, the product catalog must be sent via the S3/Partner Data Store (PDS) endpoint.
Transaction Data can be shared in one of the following three ways.
Conversions via DOT JS Pixel (Browser to Server): Send online transaction data, including product codes for individual transactions, using your existing pixel credentials.
Conversion API for Pixel API (Server to Server): Send transaction data using your existing pixel credentials to track individual transactions.
S3/Partner Data Store (PDS): Send transaction data using S3 credentials.
These endpoints support various implementation use cases:
Sending purchase data via the Conversion API (preferred method).
Sending purchase data through S3/PDS.
Sending purchase data using multiple methods (e.g., online data via DOT and/or offline data via Conversion API or S3).
Deliver Product Catalog Through S3/Partner Data Store (PDS)
Deliver the product catalog to the Yahoo S3 endpoint and refresh it on a recurring basis (preferably daily). The catalog contains a 3-level product hierarchy that users can search and select from in the DSP UI when creating ISA rules. It includes the exact product IDs/SKUs provided in the sales data feed, which ensures precise tracking and accurate attribution of product-level sales transactions. The following section details the high-level architecture and the steps to set up delivery of product catalog data to Yahoo.
Architecture & Data Flow
File Format and Contents
File Type: Tab delimited gzip/bzip2 file
File Schema: Product ID, Product Owner, Product Brand, Product Name, Partner Attributes
Data type: STRING for all columns.
Column Name (not case sensitive) | Mandatory? | Description |
---|---|---|
Product ID | Yes | Unique ID that identifies a UPC or SKU |
Product Owner | Yes | Brand name of the product owner e.g. GMI, Samsung |
Product Brand | Yes | Brand family that the product/upc belongs to e.g. Pillsbury. If the product/upc does not have a brand this field should be populated as NA |
Product Name | Yes | Descriptive name of the Product |
FLEXIBLE_VARIABLE_<attribute name 1> | No | Catch-all for partner specific attributes. For example, “FLEXIBLE_VARIABLE_new_model”, “flexible_variable_new_model”, “Flexible_Variable_NewModel”, etc. |
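As a rough illustration of the file format above, the following Node.js sketch serializes catalog rows as tab-delimited lines and gzips them. The property names on each row, the `toCatalogLine` and `gzipCatalog` helpers, the absence of a header row, and the empty value for a missing flexible attribute are all assumptions made for illustration; they are not part of the Yahoo specification.

```javascript
const zlib = require("zlib");

// Build one tab-delimited catalog line in the schema order:
// Product ID, Product Owner, Product Brand, Product Name, then one flexible attribute.
// The property names on `row` are hypothetical.
function toCatalogLine(row) {
  for (const field of ["productId", "productOwner", "productName"]) {
    if (!row[field]) throw new Error(`Missing mandatory field: ${field}`);
  }
  const values = [
    row.productId,
    row.productOwner,
    row.productBrand || "NA", // the table asks for NA when a product has no brand
    row.productName,
    row.newModel ?? "", // optional FLEXIBLE_VARIABLE_new_model column (assumed empty if absent)
  ];
  // Tabs and line breaks inside a value would break the format, so replace them with spaces.
  return values.map((v) => String(v).replace(/[\t\r\n]+/g, " ")).join("\t");
}

// Join all lines and gzip the result (gzip is one of the two accepted compressions).
function gzipCatalog(rows) {
  const text = rows.map(toCatalogLine).join("\n") + "\n";
  return zlib.gzipSync(Buffer.from(text, "utf8"));
}

const compressed = gzipCatalog([
  { productId: "0001", productOwner: "GMI", productBrand: "Pillsbury", productName: "Biscuits" },
  { productId: "0002", productOwner: "Samsung", productName: "Phone", newModel: "yes" },
]);
console.log(zlib.gunzipSync(compressed).toString("utf8"));
```

Node.js has no built-in bzip2 support, so producing the preferred bz2 files would need an external tool or library.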
File format and Contents of “_manifest”
The unit of the file size is “byte”.
<.schema size><SPACE><.schema> --- <SPACE> delimited
<file_1 size><SPACE><file_1.csv.bz2>
<file_2 size> <file_2.csv.bz2>
...
<file_n size> <file_n.csv.bz2>
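The manifest layout above can be sketched as a small Node.js helper that emits one "size, space, file name" line per file, with the schema file first. The `buildManifest` name, the example file names and sizes, and the trailing newline are assumptions for illustration.

```javascript
// Build the `_manifest` text: one "<size in bytes> <file name>" line per file,
// with the schema file listed first.
function buildManifest(schemaFile, dataFiles) {
  return [schemaFile, ...dataFiles]
    .map(({ name, size }) => {
      if (!Number.isInteger(size) || size < 0) {
        throw new Error(`Invalid size for ${name}: ${size}`);
      }
      return `${size} ${name}`;
    })
    .join("\n") + "\n";
}

const manifest = buildManifest(
  { name: ".pig_header", size: 74 },
  [
    { name: "file_1.csv.bz2", size: 1048576 },
    { name: "file_2.csv.bz2", size: 524288 },
  ]
);
console.log(manifest);
```

In practice the sizes could be read from disk with `fs.statSync(path).size` before the manifest is written.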
File format and Contents of “<.schema>”
The schema file, <.schema>, can be in either of the following two formats:
A Pig header file, “.pig_header”, in CSV column header format with extended data type support:
<column_header:data_type>[,<column_header:data_type>]{*}
Supported data types: http://pig.apache.org/docs/latest/basic.html#data-types
https://pig.apache.org/docs/r0.17.0/api/constant-values.html#org.apache.pig.data.DataType
The standard Apache Pig schema file, “.pig_schema”, in JSON file format.
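For the “.pig_header” variant, a minimal sketch of generating the header line might look like the following. The column names are hypothetical (the exact header names Yahoo expects are not given here), and `buildPigHeader` is an illustrative helper; the type names are the Apache Pig simple types.

```javascript
// Apache Pig simple type names accepted after the colon in a .pig_header entry.
const PIG_SIMPLE_TYPES = new Set([
  "int", "long", "float", "double", "chararray",
  "bytearray", "boolean", "datetime", "biginteger", "bigdecimal",
]);

// Turn [{ name, type }] into "<column_header:data_type>,<column_header:data_type>,...".
function buildPigHeader(columns) {
  return columns
    .map(({ name, type }) => {
      if (!PIG_SIMPLE_TYPES.has(type)) {
        throw new Error(`Unsupported Pig type for ${name}: ${type}`);
      }
      return `${name}:${type}`;
    })
    .join(",");
}

// All product catalog columns are strings, which corresponds to chararray in Pig.
const header = buildPigHeader([
  { name: "product_id", type: "chararray" },    // hypothetical column name
  { name: "product_owner", type: "chararray" }, // hypothetical column name
  { name: "product_brand", type: "chararray" }, // hypothetical column name
  { name: "product_name", type: "chararray" },  // hypothetical column name
]);
console.log(header);
```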
Requirements from Data Provider
Please provide the following to allow for load estimation of the data you plan to send:
Upload frequency (daily or hourly)
Number of files per batch upload and their size after compression, bz2 preferred.
Projected data volume per day
Partners can either load data into their own S3 bucket and provide necessary access to Yahoo, or upload data to a Yahoo S3 bucket. If using a Yahoo S3 bucket to deliver the data, Yahoo will need the public IP address ranges (in the form of IP CIDRs) of the hosts from which the data provider will upload files to the S3 location provided. A sample IP range list: "188.12.111.0/24, 188.12.112.0/30, 198.11.23.23/30".
High Level Guidelines: Privacy, Security, & Performance
The following are best practices for sharing data via S3:
Do not send data for opted out users.
Protect S3 bucket/location with security policy such as
Disable public access
Enforce HTTPS (TLS1.2 or above) connection
Enable server side encryption (SSE-S3) with cipher key rotation at least every 12 months.
Group files by feed type and day/hour, and upload them to folders named by feed type and date/time. See the “Directory Layout & File Format” section below.
Limit the file size to around 1 GB (after bz2 compression).
For small data sets, limit the number of files to under 5 per hour.
Avoid many small files.
Support credential rotation: annually or as needed.
Secrets—for example, credentials—should be delivered in encrypted format: GPG public key to be provided by the receiver of the credentials. See details in the “Sharing Secrets with External Partners using GPG” section below.
Directory Layout & File Format
s3://<bucket name>/<3p-m>/product_catalog/yyyyMMdd[/hh]
_manifest -- upload this after all other files are uploaded
<.schema>
<file_1.csv.bz2> --- data file; TAB, ‘\t’, delimited
<file_2.csv.bz2>
...
<file_n.csv.bz2>
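The layout above can be sketched in Node.js as a helper that builds the dated folder prefix, plus a helper that orders uploads so `_manifest` goes last. Using UTC for the date and hour is an assumption (the time zone is not stated above), and the bucket name, `catalogPrefix`, and `orderForUpload` are illustrative.

```javascript
// Build s3://<bucket name>/<3p-m>/product_catalog/yyyyMMdd[/hh] for a given date.
// UTC is assumed here; confirm the expected time zone with Yahoo.
function catalogPrefix(bucket, partnerFolder, date, includeHour = false) {
  const pad = (n) => String(n).padStart(2, "0");
  const day = `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}`;
  const hour = includeHour ? `/${pad(date.getUTCHours())}` : "";
  return `s3://${bucket}/${partnerFolder}/product_catalog/${day}${hour}`;
}

// Return the file names with `_manifest` moved to the end, since it must be
// uploaded only after all other files.
function orderForUpload(fileNames) {
  const others = fileNames.filter((name) => name !== "_manifest");
  return fileNames.includes("_manifest") ? [...others, "_manifest"] : others;
}

console.log(catalogPrefix("example-bucket", "3p-m", new Date(Date.UTC(2024, 0, 5, 7)), true));
console.log(orderForUpload(["_manifest", ".pig_header", "file_1.csv.bz2"]));
```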
Pig Data Types
Simple Types in .pig_header | Constant Value in .pig_schema | Description | Example |
---|---|---|---|
int | 10 | Signed 32-bit integer | 10 |
long | 15 | Signed 64-bit integer | Data: 10L or 10l Display: 10L |
float | 20 | 32-bit floating point | Data: 10.5F or 10.5f or 10.5e2f or 10.5E2F Display: 10.5F or 1050.0F |
double | 25 | 64-bit floating point | Data: 10.5 or 10.5e2 or 10.5E2 Display: 10.5 or 1050.0 |
chararray | 55 | Character array (string) in Unicode UTF-8 format | hello world |
bytearray | 50 | Byte array (blob) | |
boolean | 5 | boolean | true/false (case insensitive) |
datetime | 30 | datetime | 1970-01-01T00:00:00.000+00:00 |
biginteger | 65 | Java BigInteger | 200000000000 |
bigdecimal | 70 | Java BigDecimal | 33.45678332 |
Contacts of Operational alerts
<provider-specific-name>_s3_feed_alerts@<provider-specific-name>.com
Sharing Secrets with External Partners using GPG
Background
This document describes how company A, the secret sender, can share secrets with company B, the secret recipient, over the internet securely with GPG encryption.
In a nutshell,
Company B uses GPG to generate a key pair: a public key, b_armor.pub, and a private key, b_armor.
Company B needs to provide a password, password_B, when generating the public/private keys.
Company B will need this password to decrypt the encrypted secret from company A.
Company B emails b_armor.pub to company A.
Company A uses b_armor.pub to encrypt the secret, and emails the encrypted blob to company B.
Company B uses b_armor (and password_B when prompted) to decrypt the encrypted blob.
Detailed Steps
Company A asks company B to generate (if it has not already done so) and export its GPG public key. Company B follows the instructions here: https://kb.iu.edu/d/awio
Generate a key, assume company B’s email address is: “[email protected]”
gpg --gen-key
Choose “(1) RSA and RSA (default)”
Key length: 2048
Expire in 2 weeks: 2w
Full Name
Email address: [email protected]
Comment: GPG key w/ company A
Enter passphrase: <password_B> (company B needs this to decrypt the secret)
gpg -o b_armor.pub -a --export [email protected]
Company B emails “b_armor.pub” as attachment to company A.
Company A imports company B’s public key
gpg --import "b_armor.pub"
Company A puts the secrets in “my_secret.txt” and encrypts it using b_armor.pub, identified as “[email protected]”
gpg -o my_secret_armor.txt -a -e -r [email protected] my_secret.txt
Company A emails “my_secret_armor.txt” as attachment to company B.
Create and Assign or Assign an Existing Pixel ID (yp ID) as the Default Pixel for ISA Conversions
Important
In multi-advertiser integrations, each advertiser has its own pixel ID, and all of these pixel IDs belong to the same multi-vendor integration owner. Contact your Yahoo Account team for further details.
Create a New Pixel and Assign It as the Default Pixel
Under your advertiser in the DSP, navigate to the Tracking tab.
Click Create New and Pixel. This generates the unique pixel ID to use in the POST endpoint. You do not need to place the pixel tracking code; you only need the pixel ID it contains.
Reach out to your account team to set the pixel as the default pixel for the ISA conversions.
Assign an Existing Pixel as the Default Pixel
Reach out to your account team to set an existing pixel as the default pixel for the ISA conversions.
Transaction Data Delivery Through Browser to Server
Place the JavaScript DOT tag, which is linked to the default pixel ID you chose, on your product website (refer to the example below).
<script>(function(w,d,t,r){var q=[];w.ypr=function(e,t,d){q.push({e:e,t:t,d:d})};var s=d.createElement(t);s.src=r;s.async=!0;s.onload=s.onreadystatechange=function(){var y,rs=this.readyState;if(rs&&rs!="complete"&&rs!="loaded"){return};try{w.ypr=function(e,t,d){d.pixelId=<pixel_id_value_here>;YAHOO.ywa.I13N.firePostBeacon(e,t,d)};for(var n=0;n<q.length;n++){var o=q[n];w.ypr(o.e,o.t,o.d)};q=[]}catch(e){}};var scr=d.getElementsByTagName(t)[0],par=scr.parentNode;par.insertBefore(s,scr)})(window,document,"script","https://s.yimg.com/wi/ytc.js")</script>
Important
To use an existing online conversion pixel or ISA pixel as the default pixel, leave the existing code on the page and append the new pixel code from the ISA rule list page. The existing and new codes can coexist on the same page, but only one instance of the new code should be present.
If the default pixel needs to be changed at any time, contact your Yahoo Account team to update the default pixel assignment. Once the default pixel is changed, replace the existing default pixel code on your product website with the new code pointing to the new default pixel ID.
The default pixel can have multiple rules attached to it and the code for the default pixel only needs to be implemented once on your product website, provided the default pixel does not change.
Call the JavaScript tag with transaction events (refer to the example below). For details on which fields the call requires, refer to Required and Supported Fields.
<!-- retailer's code -->
<script>
var config = {
  pageUrl: 'https://example.com',
  userData: {
    ids: {
      EMAIL: ['hashed_value'],
      PHONE: ['hashed_value'],
    },
  },
  eventData: {
    totalPrice: 123.45,
    currency: "usd",
    products: [
      {
        id: "12345SKU", // unique id for product / service
        unitPrice: 123.45,
        quantity: 1, // count of product in transaction
        customKeyValues: { custom: "value", key: "hole" }
      }
    ],
    customKeyValues: {
      key: 'value',
    }
  }
};
function fireBeacon() {
  window.ypr('collect', 'purchase', config);
}
</script>
Required and supported fields
The required fields for a successful call are described in the table below.
Field | Type | Required/Optional | Description/Example |
---|---|---|---|
pageUrl | String | Required | The URL of the page. |
country | String | Optional | The alpha-2 country code. |
userData | Object | Required | The wrapper object for all user data. |
userData.ids | Map<String, List<String>> | Required | Maps each ID type to the list of IDs of that type. Supported keys (types) include those used in the examples on this page: IDFA, EMAIL, PHONE, and PXID. Example: userData: { ids: { "IDFA": ["082a1d3e-954b-4e84-8bbd-516c20e7d0ad"], "EMAIL":["4d0d74c86081eff9c3f51536dc82f0a7ab1824fb01954ae9d9516cfc36e8dc43"], "PXID":["42:4c86081eff9c3f51"] } } |
eventData | Object | Required | The wrapper object for all event data. |
eventData.totalPrice | Number | Optional | The total price of the entire purchase. |
eventData.currency | String | Optional | The currency code for purchase (“USD” for example). |
products | List<Product> | Required | The list of all the Product objects involved in the event, such as a purchase. Example: products: [ { id: "Yod/pur", unitPrice: 25.99, quantity: 2, customKeyValues: { "BRAND": "Yahoo", "PRODUCT_NAME": "Yodel button purple" } } ] |
product | Object | Required | The wrapper object for product details. |
product.id | String | Required | The unique id for the product, such as SKU. |
product.quantity | Integer | Optional | The number of units for the product. |
product.unitPrice | Number | Required | The price per individual product unit. |
product.customKeyValues | Map<String, String> | Optional | Product level custom key value map |
eventData.customKeyValues | Map<String, String> | Optional | A map of custom keys and values. Keys shown in the example include BRAND and PRODUCT_NAME. Example: customKeyValues: { "BRAND": "Yahoo", "PRODUCT_NAME": "Yodel button purple" } |
Transaction Data Delivery Through Server to Server
The specification below identifies the key components of the Conversion API request, including the POST URL for the streaming and batch conversions endpoints, the headers, and the POST body.
Integration steps
The client creates authentication credentials via the Yahoo SSH public keys using the OAuth 2.0 credential provider, as explained in the section below.
Yahoo creates a new set of encrypted credentials for the client.
Yahoo and the client work together to generate an allow-list with a unique Pixel ID <pixelId> for the API POST call.
Yahoo and the client work together to create new rules in the Yahoo DSP.
The client begins to POST events to the Yahoo Conversion API.
Endpoint
Streaming Conversions Endpoint
When using the streaming conversion endpoint, data is processed multiple times a day, but the rate limit is tighter.
POST https://streaming.datax.yahoo.com/v1/events/<pixelId>
Batch Conversions Endpoint
When using the batch conversion endpoint, data is processed daily but the rate limit is higher.
POST https://batch.datax.yahoo.com/v1/events/<pixelId>
Headers
Content-Type: application/json
Accept: application/json
Authorization: Bearer <access_token>
POST body - JSON keys
[
{
"eventName": "addToCart",
"eventId": "event1",
"eventTs": 1713545795,
"actionSource": "web",
"actionSourceUrl": "http://store.com",
"country": "USA",
"region": "NA",
"userData": {
"email" : [ "email1_hash", "email2_hash" ],
"gpsaid" : ["gpsaid_hash"],
"phone" : ["phone_hash","phone2_hash"],
"pxid" : ["pxid_key:pxid_value"],
"ip_address" : "clientIp_hash",
"userAgent" : "Y Browser"
},
"privacy" : {
"optOut": false
},
"eventData": {
"price": 1.22,
"currency": "USD",
"products" : [
{
"id": "abc1",
"name": "duck",
"brand": "Rubber",
"quantity": 2,
"unitPrice": 0.11,
"category": "bath",
"subCategory": "toys",
"customKeyValues" : {
"vendorModel" : "abc"
}
},{
"id": "abc2",
"name": "rake",
"brand": "Wood",
"unitPrice": 1.0,
"category": "garden",
"subCategory": "tools"
}
],
"customKeyValues" : {
"orderId": "order1",
"promo": "abc1",
"store": "store1"
}
}
}
]
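Putting the endpoint, headers, and body together, a request could be assembled roughly as in the Node.js sketch below. It only builds the request and does not send it; `buildConversionRequest`, the `mode` parameter, and the example values are illustrative assumptions.

```javascript
// The two endpoints described above; the pixel ID is appended to the path.
const ENDPOINTS = {
  streaming: "https://streaming.datax.yahoo.com/v1/events/",
  batch: "https://batch.datax.yahoo.com/v1/events/",
};

// Assemble the URL and fetch-style options for a Conversion API POST.
function buildConversionRequest({ mode, pixelId, accessToken, events }) {
  const base = ENDPOINTS[mode];
  if (!base) throw new Error(`Unknown mode: ${mode}`);
  if (!Array.isArray(events) || events.length === 0) {
    throw new Error("events must be a non-empty array");
  }
  return {
    url: base + encodeURIComponent(pixelId),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Accept: "application/json",
        Authorization: `Bearer ${accessToken}`,
      },
      body: JSON.stringify(events), // the body is a JSON array of events
    },
  };
}

const exampleRequest = buildConversionRequest({
  mode: "streaming",
  pixelId: "123456",                 // placeholder pixel ID
  accessToken: "<access_token>",     // placeholder token
  events: [{ eventName: "addToCart", eventId: "event1", eventTs: 1713545795 }],
});
console.log(exampleRequest.url);
```

On Node.js 18 or later, the result can be sent with the built-in `fetch(exampleRequest.url, exampleRequest.options)`.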
Required and supported fields
The required fields for a successful JSON body POST are described in the table below.
Field | Type | Required/Optional | Description / Example |
---|---|---|---|
eventName | Enum | Required | The type of conversion/action that triggers the event. Supported values include those used in the examples on this page, such as addToCart and PURCHASE. |
eventId | String | Required | The custom external id for the transaction event used for identifying the digital event. This should be a unique value. If the ID is not unique, duplicate events will be dropped. |
eventTs | Integer | Required | The epoch timestamp of the event. |
actionSource | Enum [web, app, phone, email, online, physical_store] | Required | The digital or physical source of the event. |
actionSourceUrl | String | Optional | The URL of the website where the conversion occurred. Null in case of some actionSources, such as phone or physical_store. |
country | String | Optional | Two characters. For example, US. |
region | String | Optional | Examples: APAC, NA, EMEA, LATAM, ROW |
userData | Object | Required. Ensure that either the email, phone, gpsaid, idfa, pxid, sid or bid field is supplied as well. | The wrapper object for all user data. |
userData.email | List<string> | Optional | Emails sha256 hashed |
userData.phone | List<string> | Optional | Phone number sha256 hashed |
userData.gpsaid | List<string> | Optional | The Android advertising ID |
userData.idfa | List<string> | Optional | The Apple advertising ID |
userData.pxid | List<string> | Optional | 3rd party external identifier. Format: pxIdSrcId + ‘:’ + pxIdValue. Provider source ID will be provided by Yahoo to the advertiser. |
userData.ip_address | String | Optional | IP address, sha256 hashed |
userData.userAgent | String | Optional | The HTTP header passed by the browser. |
privacy | Object | Required if data is sent for any field within the order object. | The wrapper object for all privacy data. |
privacy.optOut | Boolean | Optional | A flag that indicates we should not use this event for ads delivery optimization. |
eventData | Object | Required | The wrapper object for commerce event data. |
eventData.price | Number | Optional | The total sales price at event level. |
eventData.currency | String | Optional | The ISO 4217 code of currency corresponding to price, null is unknown. |
eventData.products | Array products : { id : string, name : string, brand : string, quantity : integer, unitPrice : number, category : string, subCategory : string, customKeyValues : map<string,string> } | Required | The list of product(s) included in the event. This includes details on a product’s ID, name, brand, quantity, unit price, category and subcategory. quantity=1 at default if not defined |
eventData.customKeyValues | Map<string,string> | Optional | Custom key value pairs |
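The table asks for SHA-256 hashed emails and phone numbers and for at least one user identifier. A minimal Node.js sketch of both checks follows. Trimming and lowercasing the email before hashing is an assumption (the normalization rules are not stated above), and `hashEmail` and `hasIdentifier` are illustrative helpers.

```javascript
const crypto = require("crypto");

// Hex-encoded SHA-256 digest of a UTF-8 string.
function sha256Hex(value) {
  return crypto.createHash("sha256").update(value, "utf8").digest("hex");
}

// Assumed normalization: trim whitespace and lowercase before hashing.
function hashEmail(email) {
  return sha256Hex(email.trim().toLowerCase());
}

// userData must carry at least one of these identifier fields.
const IDENTIFIER_FIELDS = ["email", "phone", "gpsaid", "idfa", "pxid", "sid", "bid"];

function hasIdentifier(userData) {
  return IDENTIFIER_FIELDS.some((field) => {
    const value = userData[field];
    return Array.isArray(value) ? value.length > 0 : Boolean(value);
  });
}

const userData = { email: [hashEmail(" Shopper@Example.com ")], userAgent: "Y Browser" };
console.log(userData.email[0].length, hasIdentifier(userData));
```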
OAUTH 2.0 AUTHENTICATION
The Conversion API is a server-to-server implementation that requires OAuth 2.0 authentication before any data can be posted to Yahoo Ad Tech servers.
OAuth 2.0 is a mechanism that relies on continuously refreshing authentication tokens. Clients provide those tokens when posting data. Note that Yahoo Ad Tech has no plans to support OAuth 1.0, which depends on static tokens.
Follow the steps outlined below to create your Client ID and Secret for secure authentication.
Request Client Credentials
Note
Requesting client credentials includes an internal allowlist approval process that may require additional time for the setup to be completed.
To complete the steps below, you first need a Client ID and a Client Secret. Follow the steps outlined below to request them.
Generate a private key.
>> openssl genpkey -aes256 -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out private_key.pem
Generate a public key using the above private key.
>> openssl rsa -in private_key.pem -out public_key.pem -outform PEM -pubout
Send the public key to your Yahoo representative.
Yahoo will then send a file containing credentials encrypted with the above public key for you to use.
Decrypt the file with the private key.
>> openssl rsautl -decrypt -inkey private_key.pem -in credential.enc -out my_credentials.txt
Post-Credential OAuth 2.0 Workflow
Once you’ve received your encrypted credentials from your Yahoo representative, follow these steps:
Step 1: The external provider calls the ID B2B server to get the access_token, which is valid for 60 minutes.
Step 2: The provider calls the POST /identity/oauth2/access_token endpoint of ID B2B with the JWT token created out of the provided client_id and the client credential.
Sample Request
curl -X POST 'https://id.b2b.yahooinc.com/identity/oauth2/access_token' \
-H 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode 'client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer' \
--data-urlencode 'client_assertion=<jwt_token>' \
--data-urlencode 'scope=conversion-event' \
--data-urlencode 'realm=dataxonline'
Note
If onboarding and using staging credentials against staging/sandbox endpoints, then use https://id-uat.b2b.yahooinc.com/identity/oauth2/access_token for the endpoint in the call.
Sample Response
{
"access_token": "wcf1011c-70fe-4740-b8a1-781d2b4dd3q3",
"scope": "conversion-event",
"token_type": "Bearer",
"expires_in": 3599
}
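The token call above can be sketched in Node.js as follows. The sketch builds the form-encoded request without sending it and computes when a token should be refreshed. The helper names and the 60-second safety margin are assumptions.

```javascript
const TOKEN_URL = "https://id.b2b.yahooinc.com/identity/oauth2/access_token";

// Build the form-encoded token request shown in the curl sample.
function buildTokenRequest(jwt) {
  const form = new URLSearchParams({
    grant_type: "client_credentials",
    client_assertion_type: "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
    client_assertion: jwt,
    scope: "conversion-event",
    realm: "dataxonline",
  });
  return {
    url: TOKEN_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: form.toString(),
    },
  };
}

// Time (in ms since the epoch) after which the token should be refreshed.
// Refreshing a little early avoids using a token that expires in flight.
function refreshAtMs(tokenResponse, receivedAtMs, safetyMarginSec = 60) {
  return receivedAtMs + (tokenResponse.expires_in - safetyMarginSec) * 1000;
}

const tokenRequest = buildTokenRequest("<jwt_token>");
console.log(tokenRequest.options.body);
console.log(refreshAtMs({ expires_in: 3599 }, Date.now()));
```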
Generating JSON Web Token (JWT)
The JSON Web Token is composed of three main parts:
Header: normalized structure specifying how the token is signed (generally using the HMAC SHA-256 algorithm).
Free set of claims embedding whatever you want: client_id, aud, expiration date, etc.
Signature ensuring data integrity.
The signature mechanism is HMAC_SHA256 as defined by the JOSE specifications:
https://tools.ietf.org/html/draft-ietf-jose-json-web-signature-31
JWT Header
{
"alg": "HS256",
"typ": "JWT"
}
JWT Claims
{
"aud": "https://id.b2b.yahooinc.com/identity/oauth2/access_token?realm=dataxonline",
"iss": "{client_id}",
"sub": "{client_id}",
"exp": {expiry time in seconds},
"iat": {issued time in seconds},
"jti": "{UUID}"
}
Note the following:
“exp” and “iat” values should be numeric. Do not set them as strings.
“exp” value is currentTime + 3600 (i.e. 60 minutes).
Don’t use currentTime + (24 * 60 * 60). You may get a “JWT has expired or is not valid” error.
UUID - A Universally Unique IDentifier, https://www.ietf.org/rfc/rfc4122.txt
Walking through manual steps to build this JWT value
jwt_signing_string = base64url_encode(jwt_header) + '.' + base64url_encode(jwt_body)
jwt_signature = base64url_encode(hmac_sha256(jwt_signing_string, client_secret))
JWT = jwt_signing_string + '.' + jwt_signature
You can find JWT libraries at https://openid.net/developers/libraries/.
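The manual steps above can be sketched in Node.js with only the built-in crypto module. Treating the client secret as a plain UTF-8 string for the HMAC key is an assumption, and `buildJwt` is an illustrative helper; a maintained JWT library is usually the safer choice in production.

```javascript
const crypto = require("crypto");

// base64url: standard base64 with "+" and "/" replaced and padding removed.
function base64url(input) {
  return Buffer.from(input)
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Build an HS256-signed JWT with the claims listed above.
function buildJwt(clientId, clientSecret, nowSec = Math.floor(Date.now() / 1000)) {
  const header = { alg: "HS256", typ: "JWT" };
  const claims = {
    aud: "https://id.b2b.yahooinc.com/identity/oauth2/access_token?realm=dataxonline",
    iss: clientId,
    sub: clientId,
    exp: nowSec + 3600, // numeric, 60 minutes after iat
    iat: nowSec,        // numeric
    jti: crypto.randomUUID(),
  };
  const signingString =
    base64url(JSON.stringify(header)) + "." + base64url(JSON.stringify(claims));
  const signature = base64url(
    crypto.createHmac("sha256", clientSecret).update(signingString).digest()
  );
  return signingString + "." + signature;
}

const jwt = buildJwt("<client_id>", "<client_secret>");
console.log(jwt.split(".").length); // header, claims, signature
```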
Step 3: The client extracts the access_token from the response and makes calls to the Conversion API endpoint with the access_token in the Authorization header.
Sample Request
curl -X POST \
https://streaming.datax.yahoo.com:443/v1/events/123456 \
-H 'authorization: Bearer 9f4c74cf-5bb9-45ce-987c-e5240e5710b8' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-d '[
{
"eventName": "PURCHASE",
"eventId": "4CE3F5FD-B203-45EA-98BE-79FB28E92DF5",
"eventTs": 1733508168,
"actionSource": "web",
"actionSourceUrl": "http://store.com",
"country": "USA",
"region": "NA",
"userData": {
"email" : [ "536a09742acb5b4ec7c7d6c0e20a5d3f4318817817353b69f8ee15f27d3fc9fa",
"7ebd20af4a7ff32eb6331c108cb1aa2176c195f7f1b472fc7ebfd2051f89f8a1" ],
"gpsaid" : ["c2f11fe5-3600-4ade-901e-5cf84f2d71a5"],
"phone" : ["1036636844eea8b0c54623eee63eb5d83ad5b86c02cfa7a8da29a3c140c9b100",
"f4ef23f72996f81f2bbc90929eb7d1e6397cba597cbbd70b5afa717ad41e500f"],
"pxid" : ["999:XY50038zETeXJBOYNTRn7Z3T6VSkxDF5ZpRz3wvPEVmt1ZXHo"],
"ip_address" : "77785fc7b151a325a39d2c40a7701cb57736f2ce34b7edbef4e538f42c1509d3",
"userAgent" : "Y Browser"
},
"order": {
"orderId": "order1",
"price": 1.22,
"currency": "USD",
"products" : [
{
"id": "abc1",
"name": "duck",
"brand": "Rubber",
"quantity": 2,
"unitPrice": 0.11,
"category": "bath",
"subCategory": "toys"
},{
"id": "abc2",
"name": "rake",
"brand": "Wood",
"unitPrice": 1.0,
"category": "garden",
"subCategory": "tools"
}
]
},
"customData": {
"attributes": {
"promo": "abc1",
"store": "store1"
}
}
}
]'
Sample Response
{
"success": true
}
Step 4: The API verifies the client access_token. If the access_token is not found in the cache, the API calls the ID B2B server endpoint identity/oauth2/introspect.
Error Responses
The following return codes can come back in response.
Code | Message | Reason |
---|---|---|
200 | JSON: { success: "COMPLETE" } | Valid request. |
200 | JSON: { success: "PARTIAL", message: "{ <ERROR_TYPE>=<count> }" } | Submission included invalid events and so partially ingested. The returned success field is marked as partial and the message field contains the breakdown of error types and their respective counts. |
400 | Error. Unsupported Content-Type. | Invalid Content-Type provided. |
400 | Error. Missing body and no query parameters provided. | Missing query params and body. |
400 | Error. Request body/params formatting error. | Unable to parse request body/params. Failed to decode KVs in request body. Failed to decode KVs in query params. |
401 | Error. Invalid ‘Authorization’ HTTP Header. Request a new token. (Not enforced in initial release) | Invalid Authorization token. |
429 | Request is rate limited. | |
500 | Internal Server Error | |
502 | External Server Error | An external call failed during serving the query. |
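One way a client might react to these codes is sketched below in Node.js. The mapping from status code to action (retry on 429/500/502, refresh the token on 401) and the exponential backoff values are assumptions about reasonable client behavior, not documented Yahoo requirements.

```javascript
// Map a response to a suggested client action.
function classifyResponse(status, body = {}) {
  if (status === 200) {
    return body.success === "PARTIAL"
      ? { action: "inspect", detail: body.message } // some events were rejected
      : { action: "done" };
  }
  if (status === 401) return { action: "refresh_token" };
  if (status === 429 || status === 500 || status === 502) return { action: "retry" };
  return { action: "fix_request" }; // 400-class errors will not succeed on retry
}

// Exponential backoff delay for the given retry attempt (0-based), capped at maxMs.
function backoffMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

console.log(classifyResponse(200, { success: "PARTIAL", message: "{ INVALID_EVENT=2 }" }));
console.log(classifyResponse(429), backoffMs(3));
```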
Transaction Data Delivery Through S3/Partner Data Store (PDS)
The following section details the high level architecture and steps to set up the delivery of data files to designated AWS S3 locations for Yahoo to download.
Architecture & Data Flow
Onboarding Requirements from Data Provider
Please provide the following to allow for load estimation of the data you plan to send.
Upload frequency: assume daily.
Number of files per batch upload and their size after compression, bz2 preferred.
Projected data volume per day
High Level Guidelines: Privacy, Security, & Performance
The following are best practices for sharing data via S3:
Do not send data for opted out users.
Do not send duplicate user events. If duplication cannot be avoided because you integrate with multiple data onboarding endpoints (such as S3, the Conversion API, and the DOT Pixel), send an extra string field, “event_id”, with each event on all endpoints so that Yahoo can use it as a dedupe key.
Protect S3 bucket/location with security policy such as
Disable public access
Enforce HTTPS (TLS1.2 or above) connection
Enable server side encryption (SSE-S3) with cipher key rotation at least every 12 months.
Group files by feed type and day/hour, and upload them to folders named by feed type and date/time. See the “Directory Layout & File Format” section below.
Limit the file size to around 1 GB (after bz2 compression).
For small data sets, limit the number of files to under 5 per hour.
Avoid many small files.
Support credential rotation: annually or as needed.
Secrets, e.g., credentials, should be delivered in encrypted format: GPG public key to be provided by the receiver of the credentials. See details in the “Sharing Secrets with External Partners using GPG” section below.
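For the duplicate-event guideline above, a provider-side pass that removes repeated events before upload could look like the following Node.js sketch. It keeps the first event seen for each `event_id`. The event shape beyond `event_id` and the `dedupeByEventId` helper are assumptions for illustration.

```javascript
// Keep only the first event for each event_id; events without one are kept as-is.
function dedupeByEventId(events) {
  const seen = new Set();
  return events.filter((event) => {
    if (event.event_id === undefined) return true;
    if (seen.has(event.event_id)) return false;
    seen.add(event.event_id);
    return true;
  });
}

const unique = dedupeByEventId([
  { event_id: "order-1", product_id: "0001" },
  { event_id: "order-1", product_id: "0001" }, // same event seen via another path
  { event_id: "order-2", product_id: "0002" },
]);
console.log(unique.length);
```

Sending the same `event_id` for the same transaction on every endpoint is what lets the receiving side treat cross-endpoint copies as one event.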
Directory Layout & File Format
s3://<bucket name>/<3p-m>/<feed_n>/yyyyMMdd
_manifest – 0 byte; upload completion marker; upload this after all other files are uploaded
<file_1.csv.bz2> --- data file; TAB, ‘\t’, delimited
<file_2.csv.bz2>
...
<file_n.csv.bz2>
File format and Contents of “_manifest”
The unit of the file size is “byte”.
<.schema size><SPACE><.schema> --- <SPACE> delimited
<.meta size><SPACE><.meta>
<file_1 size><SPACE><file_1.csv.bz2>
<file_2 size> <file_2.csv.bz2>
…
<file_n size> <file_n.csv.bz2>
File format and Contents of “<.schema>”
The schema file, <.schema>, could be either of the following two formats.
A Pig header file, “.pig_header”, in CSV column header format with extended data type support:
<column_header:data_type>[,<column_header:data_type>]{*}
Supported data types: http://pig.apache.org/docs/latest/basic.html#data-types
https://pig.apache.org/docs/r0.17.0/api/constant-values.html#org.apache.pig.data.DataType
The standard Apache Pig schema file, “.pig_schema”, in JSON file format.
Note
The names, data types, and order of the attributes in the “.schema” should match the columns in the data files.
Pig Data Types
Simple Types in .pig_header | Constant Value in .pig_schema | Description | Example |
---|---|---|---|
int | 10 | Signed 32-bit integer | 10 |
long | 15 | Signed 64-bit integer | Data: 10L or 10l Display: 10L |
float | 20 | 32-bit floating point | Data: 10.5F or 10.5f or 10.5e2f or 10.5E2F Display: 10.5F or 1050.0F |
double | 25 | 64-bit floating point | Data: 10.5 or 10.5e2 or 10.5E2 Display: 10.5 or 1050.0 |
chararray | 55 | Character array (string) in Unicode UTF-8 format | hello world |
bytearray | 50 | Byte array (blob) | |
boolean | 5 | boolean | true/false (case insensitive) |
datetime | 30 | datetime | 1970-01-01T00:00:00.000+00:00 |
biginteger | 65 | Java BigInteger | 200000000000 |
bigdecimal | 70 | Java BigDecimal | 33.45678332 |
Contacts of Operational alerts
<provider-specific-name>_s3_feed_alerts@<provider-specific-name>.com
Sharing Secrets with External Partners using GPG
Background
This document describes how company A, the secret sender, can share secrets with company B, the secret recipient, over the internet securely with GPG encryption.
In a nutshell,
Company B uses GPG to generate a key pair: a public key, b_armor.pub, and a private key, b_armor.
Company B needs to provide a password, password_B, when generating the public/private keys.
Company B will need this password to decrypt the encrypted secret from company A.
Company B emails b_armor.pub to company A.
Company A uses b_armor.pub to encrypt the secret, and emails the encrypted blob to company B.
Company B uses b_armor (and password_B when prompted) to decrypt the encrypted blob.
Detailed Steps
Company A asks company B to generate (if it has not already done so) and export its GPG public key. Company B follows the instructions here: https://kb.iu.edu/d/awio
Generate a key, assume company B’s email address is: “[email protected]”
gpg --gen-key
Choose “(1) RSA and RSA (default)”
Key length: 2048
Expire in 2 weeks: 2w
Full Name
Email address: [email protected]
Comment: GPG key w/ company A
Enter passphrase: <password_B> (company B needs this to decrypt the secret)
gpg -o b_armor.pub -a --export [email protected]
Company B emails “b_armor.pub” as attachment to company A.
Company A imports company B’s public key
gpg --import "b_armor.pub"
Company A puts the secrets in “my_secret.txt” and encrypts it using b_armor.pub, identified as “[email protected]”
gpg -o my_secret_armor.txt -a -e -r [email protected] my_secret.txt
Company A emails “my_secret_armor.txt” as attachment to company B