File onboarding dataset upload file specifications

File onboarding overview

The guidelines here ensure that the Audience application properly reads your dataset file when you upload it for data onboarding. This document defines the file format requirements and content specifications of the input files you upload.

File upload location

To process watch folder and one-time datasets, you will be given SFTP account access.

- For initial analysis and custom projects, you can upload files directly into the /to cadent/ directory. Afterwards, contact your support representative for processing.
- For the user interface, you must create a dataset that includes a watch folder within the /to cadent/ directory where files can be uploaded. Alternatively, you can upload a file directly from your computer or from a directory in the Cadent SFTP account.

Review file processing status in Audience

After you begin to upload a file to Audience, its status displays on the Datasets page. The color of the status icon indicates the current file status using a red, yellow, and green color code. Roll your cursor over the icon for more detailed information.

User preference for email notification

You have the option to receive email notifications containing acknowledgment and negative acknowledgment messages for specific dataset uploads. Contact your Cadent representative if you want to receive these notifications.

File onboarding process

Match identifiers

Select the match identifier that represents the data column you want to capture from the data file. This is the match key that the platform uses to find the association between your file and the household map. After the platform associates the household map with your input ID, you can select the available output IDs using the Audience Manager user interface.

Following are the current ingest sources:

- (Household) Crosswalk ID: a Cadent-generated ID associated with a ZIP 11. You can download an Aperture household crosswalk ID mapping file, which contains ZIP 11s tied to Aperture household crosswalk IDs.
  Download the file from Setup > Crosswalk IDs.
- (Household) Postal address: the USPS postal address in the format address1, address2, city, state, zip. Not all fields are required. Note the following guidelines:
  - If you have a ZIP code, you do not need a city and state.
  - If you have a city and state, you do not need a ZIP code.
  - Address2 is optional.
- (Household) ZIP+4: the USPS ZIP+4 code. The last 4 digits represent a specific delivery route within the overall delivery area.
- (Household) ZIP11: the USPS ZIP 11, which is an 11-digit number that denotes a household.
- IP address: the IP address of the data coming from a client's device, such as a desktop, laptop, mobile device, or connected TV device.
- Device ID: either a mobile device ID, such as Apple's Identifier for Advertising or Google's Advertising ID, or a connected TV or smart TV device ID.
- (Household) Client crosswalk ID: the client-generated ID associated with the ZIP 11, which is an 11-digit number that identifies the household. To use the four client crosswalk IDs, upload a mapping file with either:
  - ZIP 11s associated with a client crosswalk ID, or
  - postal addresses associated with a client crosswalk ID.
- Mobile advertising ID: either Apple's Identifier for Advertising or Google's Advertising ID. The mobile advertising ID should be raw and unhashed.
- Hashed email: an email address that a hashing algorithm has converted into an unrecognizable string of numbers and letters.

Taxonomy and segment IDs

If you plan to onboard data that has segments, you should create a taxonomy that defines, at a minimum, an association between a segment ID and a segment name. There are three principal reasons for including a taxonomy:

- The taxonomy associates the segment IDs with the segment names.
- For more complex data, the taxonomy can also define a hierarchy, which makes the output data clearer to visualize for the end user.
- When sending data to Cadent, you use the segment ID and not the segment name. The segment name is seen by the buyer of the data in the execution platform. The execution platform may be Cadent, another data management platform, or a demand-side platform.

Taxonomy example

The following is an example of a CSV-formatted (flat) taxonomy from a used car dealership:

| Segment ID (required) | Segment name (required) | Description (optional) |
| --- | --- | --- |
| ford_owner | Ford | Purchased a Ford vehicle |
| toyota_owner | Toyota | Purchased a Toyota vehicle |
| bmw_owner | BMW | Purchased a BMW vehicle |

The segment ID cannot exceed 127 characters. It must be alphanumeric, with no special characters except for an underscore.

Example of acceptable segment ID formatting:

    segment id,segment name,description
    ford_owner,Ford,Purchased a Ford vehicle
    toyota_owner,Toyota,Purchased a Toyota vehicle
    bmw_owner,BMW,Purchased a BMW vehicle

Input data file format

In general, the input data is formatted as an input ID followed by the segment IDs that are associated with that input ID. Depending on the type of input ID you select, you may need additional delimiters to separate fields within the input ID. For example, a postal address has multiple fields (address1, address2, city, state, zip) that must all be delimited.

You can upload a file that does not have any segment IDs (sometimes done for a match test); in
that case, the deployment partner where you plan to send the files must be able to understand the set of onboarded IDs.

Input data file with segment IDs

To create the input data file with segment IDs, choose one of these input data options:

- Option 1: file without a header
- Option 2: file with header fields

Option 1: input data file with no header

- Select a delimiter that separates your input ID from your segment IDs. The input ID delimiter can be a comma, pipe, or tab. If your input ID is a postal address, use the same delimiter to separate the different fields of the postal address.
- Select a delimiter that separates your segment IDs when you have multiple segment IDs. This delimiter can only be a comma or pipe.

If you use the same delimiter for both purposes, make sure you know where your first segment ID begins.

The following examples show two methods of using delimiters. One uses two different delimiters (pipe and comma) and the other uses only commas. You must use only one of these options.

Example 1: using pipe and comma delimiters

Postal address

    Data format: address1|address2|city|state|zip|segment_id1,segment_id2,segment_id3
    Sample data: 123 Main St|Apt A|San Mateo|CA|94403|ford_owner,bmw_owner

ZIP 11

    Data format: zip11|segment_id1,segment_id2,segment_id3
    Sample data: 94403050145|ford_owner,bmw_owner

IP address

    Data format: ip_address|segment_id1,segment_id2,segment_id3
    Sample data: 123.456.789.321|ford_owner,bmw_owner

Mobile advertising ID

    Data format: maid|segment_id1,segment_id2,segment_id3
    Sample data: a4842ac167cf470e87807a7b18420e10|ford_owner,bmw_owner

Example 2: using only comma delimiters

Postal address

    Data format: address1,address2,city,state,zip,segment_id1,segment_id2,segment_id3
    Sample data: 123 Main St,Apt A,San Mateo,CA,94403,ford_owner,bmw_owner

ZIP 11

    Data format: zip11,segment_id1,segment_id2,segment_id3
    Sample data: 94403050145,ford_owner,bmw_owner

IP address

    Data format: ip_address,segment_id1,segment_id2,segment_id3
    Sample data: 123.456.789.321,ford_owner,bmw_owner

Mobile advertising ID

    Data format: maid,segment_id1,segment_id2,segment_id3
    Sample data: a4842ac167cf470e87807a7b18420e10,ford_owner,bmw_owner

Option 2: input data file with header

Adding a header to your input data file can make it easier to read. When including a header, use the following format:

    input id,segment data type1,segment data type2,segment data type3

Each segment data type is a column heading. For example, if you list someone's preferred airline, the segment data type is "preferred airline". In each row, the segment data type column holds the segment ID that is associated with the airline name.

Example: ZIP11 as input ID

    zip11,preferred airline
    99507148814,ua
    99503712805,aa

This example assumes that you have already defined the following in the taxonomy file:

    ua,United Airlines
    aa,American Airlines

The following rules apply:

- You must have a header row in your data file that includes data types as the headings.
- The first column must indicate the input ID. In the case of a postal address, you must use multiple columns.
- Use the final columns for the segment data types.
- Ensure that your segment IDs are listed in the correct columns, corresponding to the segment data types indicated in the header row.
- A blank field for a segment ID is permitted.
- Segment IDs must be unique across all of your data. For example, if you use the segment IDs m and f to indicate the genders male and female, you cannot use m and f to indicate anything else across all of your data.

Option 2 examples

| Address 1 | Address 2 | City | State | ZIP | Car #1 | Car #2 |
| --- | --- | --- | --- | --- | --- | --- |
| 123 Main St | Apt A | San Mateo | CA | 94403 | ford_owner | bmw_owner |
| 456 Broadway | | San Francisco | CA | 94014 | toyota_owner | |
| 245 5th Ave | Suite 100 | New York | NY | 10001 | bmw_owner | |

Following is the (uncompressed) CSV input file for the table above:

    address 1,address 2,city,state,zip,car #1,car #2
    123 Main St,Apt A,San Mateo,CA,94403,ford_owner,bmw_owner
    456 Broadway,,San Francisco,CA,94014,toyota_owner
    245 5th Ave,Suite 100,New York,NY,10001,bmw_owner

ZIP 11 as input ID example:

    zip11,car #1,car #2
    99801937328,ford_owner,bmw_owner
    99824532702,toyota_owner
    99501529301,bmw_owner

IP address as input ID example:

    ip address,car #1,car #2
    107.3.29.156,ford_owner,bmw_owner
    245.117.89.0,toyota_owner
    24.151.75.130,bmw_owner

Mobile advertising ID as input ID example:

    maid,car #1,car #2
    1d7f322c-c0bc-4fbe-ad4d-b8fcf768f005,ford_owner,bmw_owner
    2fe95975-33e2-47c5-98c1-bf138022b554,toyota_owner
    90a59b94-d50d-45e9-a4d3-1018197cf567,bmw_owner

Input data file without segments

The following rules apply for input data files without segment IDs:

- Place each input ID on its own line.
- For input IDs other than postal addresses, do not include any headers or delimiters.
- For postal addresses as input IDs, add a header and delimiters. The header is necessary for us to understand which postal address fields you are including, because not all fields are needed.
- Use commas or pipes as postal address delimiters, but not both.
- Duplicates are acceptable; our system removes them once the file is uploaded.

Examples of postal address as input ID

The following examples of postal addresses use commas as delimiters.

Example 1

    address1,address2,city,state,zip
    123 Main St,Apt A,San Mateo,CA,94403
    456 Broadway,,San Francisco,CA,94014
    245 5th Ave,Suite 100,New York,NY,10001

Example 2

    address1,address2,city,state
    123 Main St,Apt A,San Mateo,CA
    456 Broadway,,San Francisco,CA
    245 5th Ave,Suite 100,New York,NY

Example 3

    address1,address2,zip
    123 Main St,Apt A,94403
    456 Broadway,,94014
    245 5th Ave,Suite 100,10001

Examples for addresses that do not include address2 fields

Use the following examples 1, 2, and 3 only if none of your addresses has an address2 field; otherwise, you will lose matches.

Example 1

    address1,city,state,zip
    123 Main St,San Mateo,CA,94403
    456 Broadway,San Francisco,CA,94014
    245 5th Ave,New York,NY,10001

Example 2

    address1,city,state
    123 Main St,San Mateo,CA
    456 Broadway,San Francisco,CA
    245 5th Ave,New York,NY

Example 3

    address1,zip
    123 Main St,94403
    456 Broadway,94014
    245 5th Ave,10001

Example: ZIP 11 as input ID

    99801937328
    99824532702
    99501529301

Example: IP address as input ID

    107.3.29.156
    245.117.89.0
    24.151.75.130

Example: mobile advertising ID as input ID

You can put raw, unhashed Apple IDFAs and Google AAIDs in the same file, as in the following three examples:

    1d7f322c-c0bc-4fbe-ad4d-b8fcf768f005
    2fe95975-33e2-47c5-98c1-bf138022b554
    90a59b94-d50d-45e9-a4d3-1018197cf567

File naming and gzip compression

- Only one input ID type per file is permitted.
- The platform only accepts files compressed with gzip.
- If your raw input data file has more than one type of delimiter, name the raw file with a txt extension, then gzip the txt file to get a final input file with a gz extension to send to Audience Studio.
- If your raw input data file has only commas as delimiters, name the raw file with a csv extension, then gzip the csv file to get a final input file with a gz extension to send to Audience Studio.
- Make sure that you put the input ID type somewhere in the name of the file. The input ID types to use in the file name are:
  - cadent hh crosswalk
  - zip11
  - postal address
  - maid
  - ip
  - hashed email
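
The headerless (Option 1) format and the gzip naming rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Cadent tooling: the function names, the `dataset_` file name prefix, and the choice to use a txt extension for every file that is not comma-only are all hypothetical. The segment ID check assumes IDs with underscores, such as `ford_owner`.

```python
import gzip
import re
from pathlib import Path

# Segment ID rule from this document: at most 127 characters,
# alphanumeric, with underscore as the only special character.
SEGMENT_ID_RE = re.compile(r"[A-Za-z0-9_]{1,127}")
INPUT_ID_DELIMS = {",", "|", "\t"}   # comma, pipe, or tab
SEGMENT_DELIMS = {",", "|"}          # comma or pipe only


def is_valid_segment_id(segment_id):
    """Return True if the segment ID satisfies the length and character rule."""
    return SEGMENT_ID_RE.fullmatch(segment_id) is not None


def format_row(input_fields, segment_ids=(), id_delim="|", seg_delim=","):
    """Build one headerless row: input ID fields, then the segment IDs."""
    if id_delim not in INPUT_ID_DELIMS:
        raise ValueError("input ID delimiter must be comma, pipe, or tab")
    if seg_delim not in SEGMENT_DELIMS:
        raise ValueError("segment delimiter must be comma or pipe")
    invalid = [s for s in segment_ids if not is_valid_segment_id(s)]
    if invalid:
        raise ValueError(f"invalid segment IDs: {invalid}")
    parts = [id_delim.join(input_fields)]
    if segment_ids:
        parts.append(seg_delim.join(segment_ids))
    return id_delim.join(parts)


def write_gzip_file(rows, directory, input_id_type, comma_only=False):
    """Write rows to a gzip file whose name contains the input ID type.

    Assumption: comma-only files get a csv extension and everything else
    gets txt. The "dataset_" prefix is a hypothetical naming choice.
    """
    ext = "csv" if comma_only else "txt"
    path = Path(directory) / f"dataset_{input_id_type}.{ext}.gz"
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        for row in rows:
            handle.write(row + "\n")
    return path
```

For example, `format_row(["94403050145"], ["ford_owner", "bmw_owner"])` returns `94403050145|ford_owner,bmw_owner`, which matches the pipe and comma sample for ZIP 11. Passing `id_delim=","` produces the comma-only variant, in which case `comma_only=True` selects the csv extension.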