For one dataset specified in your email, write up a partial data encyclopedia and dictionary. You DO NOT NEED TO CREATE OR FIND A DATASET. But you do need to imagine what a dataset you’ve described in your email would look like in order to describe its fields (see below). Examples of one dataset:

Salesforce: history of salaries and bonuses for each employee
Netflix: all customer ratings for each video
Include (see Bartlett 12.2 for more definitions of some of these terms):

Purpose of Dataset
Source of dataset
Time window (that the dataset represents)
Cost of data (to the company)
Collection techniques and tools (as relevant; see also Bartlett Chapter 10)
Quality
Completeness
Data definitions and examples. For each column (field) in the dataset:
Name
Definition
Variable Classification (see Bartlett Table 12.1, p. 247)
Example data – Create at least 10 rows of example data
For any details you cannot find in the cases or through research, make up a reasonable description.
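
To make the "data definitions" part concrete, here is a minimal sketch of how the name, definition, and variable classification of each field could be recorded for the Netflix ratings example. Everything in it is an assumption for illustration: the field names are made up, and the classification labels (nominal, ordinal, interval) are common measurement levels that may not match the wording of Bartlett Table 12.1, so check them against the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldEntry:
    """One row of a data dictionary: a single column of the dataset."""
    name: str
    definition: str
    classification: str  # assumed label; verify against Bartlett Table 12.1


# Hypothetical fields for an imagined "customer ratings" dataset.
DATA_DICTIONARY = [
    FieldEntry("customer_id", "Identifier assigned to each customer account", "nominal (ID)"),
    FieldEntry("video_id", "Identifier of the video that was rated", "nominal"),
    FieldEntry("rating", "Star rating from 1 (worst) to 5 (best)", "ordinal"),
    FieldEntry("rated_on", "Date the rating was submitted (YYYY-MM-DD)", "interval"),
]


def render(entries):
    """Format the dictionary as one plain-text line per field."""
    return "\n".join(
        f"{e.name}: {e.definition} [{e.classification}]" for e in entries
    )


if __name__ == "__main__":
    print(render(DATA_DICTIONARY))
```

A table in a word processor works just as well; the point is that every column gets all three pieces of information.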

Notes

Again, do not go looking for a dataset. I want you to imagine what the dataset you’ve described in the email essay would look like, and then describe its fields.
Source of dataset. In my example, I include a link to the source of the dataset I’m describing. (I just describe the first 2 columns/fields.) This is not a good example of what I’m looking for, as I’d like you all to think a bit more deeply about the source of the dataset. This is also relevant to (5) and (6). How was the data collected? Where does it live internally (if it lives internally)? For example, a dataset of all Netflix users and what videos they watched would have been collected from the server logs or on-site scripting, and then would end up in a user activity database at Netflix. Do your best to imagine some of these details and fill in the sections accordingly. Also, don’t worry about whether it’s 100% accurate. I want to see your critical thinking skills and don’t care if you know the exact right details or names of things.
For part 9, include an ID column. (The Bartlett material doesn’t specifically mention that most datasets require an ID column.) For example, employee ID (Salesforce) or customer ID (Netflix). In the example document, the data is collected yearly, so the year is the ID field.
For part 10, I will be adding an example of how to simulate some basic data here on Wednesday or Thursday.
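
As a rough idea of what simulating basic data could look like (a sketch only, and not necessarily the approach the posted example will take), the snippet below generates 10 made-up rows for the Netflix ratings example using only the Python standard library. The column names, the number of customers and videos, and the `rating_id` row identifier are all hypothetical choices made for illustration.

```python
import random


def simulate_ratings(n_rows=10, seed=42):
    """Return n_rows of made-up ratings as a list of dicts."""
    rng = random.Random(seed)  # fixed seed so the same rows come back each run
    rows = []
    for i in range(1, n_rows + 1):
        rows.append({
            "rating_id": i,                              # unique ID for each row
            "customer_id": f"C{rng.randint(1, 5):03d}",  # 5 hypothetical customers
            "video_id": f"V{rng.randint(1, 8):03d}",     # 8 hypothetical videos
            "rating": rng.randint(1, 5),                 # 1 to 5 stars
        })
    return rows


if __name__ == "__main__":
    rows = simulate_ratings()
    header = list(rows[0].keys())
    # Print as comma-separated text that can be pasted into a spreadsheet.
    print(",".join(header))
    for row in rows:
        print(",".join(str(row[col]) for col in header))
```

Typing 10 plausible rows by hand into a table is equally acceptable; a script like this is only a convenience when you want more rows or want to regenerate them.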
Requirements

Minimum 3 columns in described dataset
