Terracotta Data Dictionary

At the end of an experiment, the researcher clicks "Export Data" to download deidentified research data to their browser in a ZIP file.   This ZIP file contains a variety of CSV files describing the experiment design, the contents of assignments, and participants' behavior and performance.  When an experiment uses informed consent, the export only includes data from consenting students; and when an experiment uses a manual selection process, the export only includes data from selected students.  The current Data Dictionary provides comprehensive documentation of the format and structure of each CSV file in the export, and the data contained in each column.
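For readers who want to work with the export programmatically, the following is a minimal Python (pandas) sketch of loading every CSV file in the ZIP into a DataFrame. The archive name is hypothetical, and the table keys simply mirror the file names documented below.

    import zipfile
    import pandas as pd

    export_path = "terracotta_export.zip"  # hypothetical name; use your downloaded export

    tables = {}
    with zipfile.ZipFile(export_path) as archive:
        for name in archive.namelist():
            if name.endswith(".csv"):
                with archive.open(name) as csv_file:
                    # Key each DataFrame by the file name without its extension,
                    # e.g. "experiment", "participants", "submissions".
                    tables[name.rsplit("/", 1)[-1][:-4]] = pd.read_csv(csv_file)

    print(sorted(tables))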

Additionally, a Terracotta data export contains a file called events.json. This JSON-formatted file contains event logs formatted according to the IMS Global (1EdTech) Caliper 1.2 standard. At this time, event logs are provided for the Assessment Profile and the Media Profile.
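As a rough illustration, the sketch below loads events.json and tallies the Caliper event types it contains. It assumes the file deserializes to a flat list of event objects, each with a type field, as in Caliper's JSON representation; if Terracotta wraps events in an envelope, adjust the access accordingly.

    import json
    from collections import Counter

    with open("events.json") as f:
        events = json.load(f)  # assumed to be a list of Caliper event objects

    # Count events by their Caliper type (e.g., AssessmentEvent, MediaEvent).
    print(Counter(event.get("type", "unknown") for event in events))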

Many of the elements in Terracotta's data exports reflect typical concepts in education and assessment. Thus, it is possible to map elements in the CSV files to the Common Education Data Standards (CEDS) vocabulary of education data.  We provide this alignment tool for anyone interested in such mappings. While experimental research concepts are not directly represented in the CEDS data model, we consider experimental conditions to be represented as different "assessment form algorithms," experimental treatments to be represented as different "assessment form versions," and exposure sets to be represented as different "assessment administrations."

Please note: the data types referenced here are MySQL data types.

Table of Contents

Schema

Data Dictionary

Schema

This article explains the data included in each .csv file in the data export. Each section begins with the file or table name, explains the data included in each column, and then gives a basic explanation of sample data. When necessary, sections also include additional clarifications the reader may find helpful.

File name/Table Name

[Purpose of the file as a whole]

Columns

Name | Data Type | Description | Options


Example Data and Explanation

Additional Clarifications

Data Dictionary

experiment.csv

The experiment.csv file gives a general overview of the experiment’s data.

Columns

Name | Data Type | Description | Options
experiment_id | INTEGER | Unique identifier for the experiment
course_id | INTEGER | Unique identifier for the course
experiment_title | TEXT | Descriptive title/name of the experiment
experiment_description | TEXT | Detailed summary of the experiment
exposure_type | TEXT | Describes the distribution of participants among the various treatment categories | BETWEEN, WITHIN
participation_type | TEXT | Describes how participants join the experiment | CONSENT, AUTOMATIC, MANUAL
distribution_type | TEXT | Describes how conditions will be distributed among participants | EVEN, CUSTOM, ALL
export_at | DATETIME | Recorded date and time when the experimental data package was exported
enrollment_cnt | INTEGER | Cumulative number of students in the course site since the course began
participant_cnt | INTEGER | Total number of participants entered into the experiment
condition_cnt | INTEGER | Number of conditions included in the experiment
created_at | DATETIME | Recorded date and time when the experiment was created
started_at | DATETIME | Recorded date and time when the experiment was started

Example Data and Explanation

experiment_id | course_id | experiment_title | experiment_description | exposure_type | participation_type | distribution_type | export_at | enrollment_cnt | participant_cnt | condition_cnt
521 | 38 | Pre Question Experiment | Do pre-questions work? | WITHIN | CONSENT | EVEN | 4/29/2022 7:27:58 PM | 105 | 89 | 2

The table above includes sample data from an experiment. The experiment and course ID numbers are 521 and 38, respectively; these IDs connect this table to the others in the export. The experiment’s title is Pre Question Experiment, with the description Do pre-questions work?. In this experiment, participants in a course were given consent forms to fill out, and 89 agreed to participate. The researchers were interested in the effects of pre-questions on participants’ ability to learn information: they wanted to see whether engaging with a quiz about lesson content before an assignment would increase the average grade on a subsequent assignment. Because this is a within-subjects design, half of the 89 participants complete the first assignment with no pre-questions as a control, while the other half complete the first assignment with pre-questions as a treatment. The two groups then switch for the second assignment. Since all participants experience both the control and the treatment, conditions are distributed evenly by default. Data for this experiment was exported at 4/29/2022 7:27:58 PM.

Additional Clarifications

  • This file contains only one row, because each export describes a single experiment. Additional experiments are packaged as their own ZIP files, each containing the exported data files for that experiment.
  • If an experiment is carried over to another course, a new experiment ID is generated for that experiment.
  • The value for enrollment_cnt is the cumulative number of LMS users with the student role who have ever been in the course site, which may be different from the course enrollment at any given moment. For example, suppose a course has 30 students enrolled, but during the first week of class, 5 students drop and 5 new students add the class in their place. There are still 30 students enrolled, yet the enrollment_cnt will be 35, because the 5 students who dropped continue to be counted alongside the 5 new enrollments. We calculate enrollment_cnt this way because it prevents a situation where the number of participants is greater than the enrollment, and because it is more consistent than alternative methods that count the number of active enrollments at an arbitrary moment in time. (The sketch after this list illustrates the count.)
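The toy Python sketch below reproduces the arithmetic in the example above with a hypothetical roster; it is only meant to illustrate the difference between active enrollment and the cumulative enrollment_cnt.

    # Hypothetical roster: 30 initial students, 5 drops, 5 adds.
    initial_students = {f"student_{i}" for i in range(1, 31)}
    dropped = {f"student_{i}" for i in range(1, 6)}
    added = {f"student_{i}" for i in range(31, 36)}

    currently_enrolled = (initial_students - dropped) | added
    ever_enrolled = initial_students | added

    print(len(currently_enrolled))  # 30 -- active enrollment at this moment
    print(len(ever_enrolled))       # 35 -- cumulative count, as reported in enrollment_cnt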

participants.csv

The participants.csv file contains deidentified consent-related data for participants.

Columns

Name | Data Type | Description | Options
participant_id | INTEGER | Unique numeric identifier for a participant
consented_at | DATETIME | Recorded date and time when the participant’s consent was submitted
consent_source | TEXT | Mode of consent for a participant to enter into an experiment | CONSENT, AUTOMATIC, MANUAL

Example Data and Explanation

participant_id | consented_at | consent_source
1185 | 3/22/2022 7:01:37 PM | CONSENT

In this row, a participant is identified by a number, 1185, instead of by their name. On March 22nd 2022, at 7:01:37 PM, the participant affirmed consent to participate in a Terracotta experiment.
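If you need the consent timestamps as real datetimes, the sketch below parses them with pandas. The format string assumes the M/D/YYYY H:MM:SS AM/PM style shown in the example row; confirm it against your own export before relying on it.

    import pandas as pd

    participants = pd.read_csv("participants.csv")
    participants["consented_at"] = pd.to_datetime(
        participants["consented_at"], format="%m/%d/%Y %I:%M:%S %p"
    )

    # Rows with a missing consent time (if any) become NaT.
    print(participants.dtypes)
    print(participants["consented_at"].min(), "to", participants["consented_at"].max())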

participant_treatment.csv

The participant_treatment.csv file describes how assignments and experimental conditions were distributed to each participant across exposures.

Columns

Name | Data Type | Description | Options
participant_id | INTEGER | Unique numeric identifier for a participant
exposure_id | INTEGER | Identifier for the specific exposure given to a participant
condition_id | INTEGER | Numeric identifier for a specific experimental condition
condition_name | TEXT | Descriptive identifier for the condition that was used
assignment_id | INTEGER | Numeric identifier for an assignment
assignment_name | TEXT | Assignment title
assignment_due_date | INTEGER | Assignment due date as set in Canvas | All dates appear in the YEAR/MONTH/DAY format, and times are set to GMT
treatment_id | INTEGER | Combination of a condition and an assignment
grade_type | TEXT | The score retained as the assignment grade | MOST RECENT, CUMULATIVE, AVERAGE, HIGHEST
attempts_allowed | TEXT | Indicates if students were allowed to retry the assignment | LIMITED, UNLIMITED
time_required_between_attempts | INTEGER | Amount of time students were required to wait before retrying the assignment
final_score | INTEGER | Score retained as a grade on the assignment

Example Data and Explanation

participant_id | exposure_id | condition_id | condition_name | assignment_id | assignment_name | treatment_id | grade_type | attempts_allowed | time_required_between_attempts | final_score
1185 | 747 | 950 | No PreQuestion | 292 | Week 11 Assignment | 431 | AVERAGE | 2 | 2.0 hours | 25
1185 | 748 | 951 | PreQuestion | 293 | Week 12 Assignment | 434 | AVERAGE | 2 | 2.0 hours | 22
1132 | 747 | 951 | PreQuestion | 292 | Week 11 Assignment | 432 | AVERAGE | 2 | 2.0 hours | 20
1132 | 748 | 950 | No PreQuestion | 293 | Week 12 Assignment | 433 | AVERAGE | 2 | 2.0 hours | 25

Let’s imagine there are two participants in our list: participant 1185 and participant 1132. There are two assignments, Week 11 Assignment with ID 292 and Week 12 Assignment with ID 293, and two possible conditions: No PreQuestion (ID 950) and PreQuestion (ID 951). During the first exposure instance, ID 747, participant 1185 is given the No PreQuestion and Week 11 Assignment combination, resulting in treatment ID 431. During that same exposure instance, participant 1132 receives the PreQuestion condition for the Week 11 Assignment, resulting in treatment ID 432.


During the next exposure instance, 748, participant 1185 is exposed to a PreQuestion for the Week 12 Assignment, resulting in a combination identified as treatment ID 434. Participant 1132, however, experiences No PreQuestion for the Week 12 Assignment, resulting in a treatment instance of ID 433.

Additional Clarifications

  • The number of rows should equal the number of participants multiplied by the number of conditions if the experiment has a WITHIN (within-subjects) design. For a BETWEEN (between-subjects) design, the number of rows should equal the number of participants.
  • An exposure is treated as a length of time during which a participant experiences a particular condition. Between-subjects designs have only one exposure, while within-subjects designs include multiple exposures, since the split groups of participants experience all treatments included in the experiment. (One way to check this structure is sketched below.)
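The sketch below is one way to check the row-count expectation above and to view each participant's condition in each exposure. Column names come from this file's documentation; the pivot is only an illustration.

    import pandas as pd

    pt = pd.read_csv("participant_treatment.csv")

    n_participants = pt["participant_id"].nunique()
    n_conditions = pt["condition_id"].nunique()
    print(f"{len(pt)} rows; {n_participants} participants x {n_conditions} conditions")

    # One row per participant, one column per exposure, showing the condition name.
    # aggfunc="first" guards against exposures that contain more than one assignment.
    print(
        pt.pivot_table(
            index="participant_id",
            columns="exposure_id",
            values="condition_name",
            aggfunc="first",
        )
    )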

submissions.csv

The submissions.csv file records each participant’s submissions and the associated assignment data.

Columns

Name | Data Type | Description | Options
submission_id | INTEGER | Numeric identifier for a particular submission
participant_id | INTEGER | Unique numeric identifier for a participant
assignment_id | INTEGER | Numeric identifier for an assignment
treatment_id | INTEGER | Combination of a condition and an assignment
submitted_at | DATETIME | Date and time when a submission was transmitted
calculated_score | DECIMAL | Score that is calculated by means of computer and/or human grading
override_score | DECIMAL | Alternative score assigned by the instructor instead of the calculated_score
final_score | DECIMAL | Score used in grade calculation

Example Data and Explanation

submission_id | participant_id | assignment_id | treatment_id | submitted_at | calculated_score | override_score | final_score
579 | 1185 | 292 | 431 | 3/26/2022 1:57:11 AM | 25 | 25 | 25

In this example, participant 1185 submitted an assignment (292) with a unique submission instance of 579. The submission has the treatment combination 431 and was submitted on March 26th, 2022, at 1:57:11 AM. The computer-calculated score for the submission was 25, and the override score was also 25, meaning the instructor made no change; the final score was therefore the original calculated score of 25.

Additional Clarifications

  • Participants can have multiple submissions. For example, imagine a participant submits an assignment 3 times; this can happen if an instructor allows participants to redo an assignment more than once. Based on course policy, the instructor could then use the best of the 3 scores, or their average, as an override score. (A sketch of summarizing repeated submissions follows below.)
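The sketch below shows one way to summarize repeated submissions per participant and assignment. The mapping of summaries to the grade_type options (highest, average, most recent) is an illustration of the idea, not Terracotta's documented grading logic.

    import pandas as pd

    submissions = pd.read_csv("submissions.csv")
    submissions["submitted_at"] = pd.to_datetime(submissions["submitted_at"])
    submissions = submissions.sort_values("submitted_at")

    summary = submissions.groupby(["participant_id", "assignment_id"])["calculated_score"].agg(
        attempts="count",
        highest="max",
        average="mean",
        most_recent="last",  # "last" works because rows are sorted by submitted_at above
    )
    print(summary.head())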

items.csv

The items.csv file lists the individual items (questions and other elements) within each assignment.

Columns

Name | Data Type | Description | Options
item_id | INTEGER | Unique identifier for a specific item
assignment_id | INTEGER | Numeric identifier for an assignment
treatment_id | INTEGER | Combination of a condition and an assignment
condition_id | INTEGER | Numeric identifier for a specific experimental condition
item_text | TEXT | Rich text content of the item, typically HTML markup that renders the item in a digital document
item_format | TEXT | Format in which the user sees and interacts with an item | MC, PAGE_BREAK, SHORT_ANSWER

Example Data and Explanation

item_id | assignment_id | treatment_id | condition_id | item_text | item_format
645 | 292 | 431 | 950 | <iframe src="https://www.youtube.com/embed/AunjHffklZI?enablejsapi=1" data-youtube-id="AunjHffklZI" height="315" width="560" frameborder="0" allowfullscreen="true" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></iframe> | MC

In assignment 292, the participant is exposed to condition 950, generating treatment ID 431. The specific item the participant interacted with, 645, has item text that displays an embedded YouTube video. The item is answered in a multiple choice (MC) format.
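As a small example of working with item_text, the sketch below pulls the embedded video id out of rows like the one above. The data-youtube-id attribute comes from this example; other items may contain different HTML, in which case the extracted value is simply missing.

    import pandas as pd

    items = pd.read_csv("items.csv")

    # Extract the id from attributes like data-youtube-id="AunjHffklZI"; rows
    # without such an attribute get NaN.
    items["youtube_id"] = items["item_text"].str.extract(
        r'data-youtube-id="([^"]+)"', expand=False
    )
    print(items[["item_id", "item_format", "youtube_id"]].head())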

Additional Clarifications

A list of the item format option meanings:

  • MC: multiple choice
  • PAGE_BREAK: not a question, but a separator for the organization of the HTML document
  • SHORT_ANSWER: text-entry format

response_options.csv

The response_options.csv file includes a collection of responses available for an item.

Columns

Name | Data Type | Description | Options
response_id | INTEGER | Unique identifier for a particular response
item_id | INTEGER | Unique identifier for a specific item
response | TEXT | Description of a response option’s value
response_position | TEXT | Ordinal position of a response
correct | TEXT | Value denoting an answer’s correctness or incorrectness | TRUE, FALSE

Example Data and Explanation

response_id | item_id | response | response_position | correct
1312 | 650 | Non-traditional arithmetic practice (like __ = 3 + 4) | A | TRUE
1313 | 650 | Traditional arithmetic practice (like 3 + 4 = __) | B | FALSE
1314 | 650 | Trick question. No experiment has shown that experience can change children’s understanding of the equal sign. | C | FALSE
1315 | 650 | Common mathematics textbooks | D | FALSE

Item 650, a multiple choice question, asks,

“An experimental study indicated that exposing children to __________ improved their understanding of the equal sign.”

The records above show four multiple choice response options. Each includes a unique response ID, the text of the response, and the response’s position relative to the other options. Three of the options are marked FALSE (incorrect), while only one is TRUE (correct). Selecting the TRUE option awards the participant points for that question.
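To see a single item's options the way a participant would, you can filter and sort this file, as in the sketch below; item 650 is the example item above, and the column names follow this file's documentation.

    import pandas as pd

    options = pd.read_csv("response_options.csv")

    item_650 = (
        options[options["item_id"] == 650]
        .sort_values("response_position")           # A, B, C, D order
        [["response_position", "response", "correct"]]
    )
    print(item_650.to_string(index=False))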

Additional Clarifications

  • This table is used exclusively for multiple choice items at the current time.
  • For the response ID, the number can never be repeated (i.e., can’t have response_id “1” for several questions).

item_responses.csv

The item_responses.csv file records participants’ actual responses and the results.

Columns

Name | Data Type | Description | Options
item_response_id | INTEGER | Unique identifier for an instance of a response
submission_id | INTEGER | Submission in which the response appears
assignment_id | INTEGER | Numeric identifier for an assignment
condition_id | INTEGER | Numeric identifier for a specific experimental condition
treatment_id | INTEGER | Combination of a condition and an assignment
participant_id | INTEGER | Unique numeric identifier for a participant
item_id | INTEGER | Unique identifier for a specific item
response_type | TEXT | The format of the question to which a response was made
response | TEXT | A description of the response option
response_position | TEXT | The ordinal placement of the chosen response option
correctness | TEXT | Signifies whether or not the chosen response was correct | TRUE, FALSE
responded_at | DATETIME | Date and time when the response was chosen
points_possible | DECIMAL | Total number of points a participant can receive when responding to a question
calculated_score | DECIMAL | Points awarded through computer-calculated scoring
override_score | DECIMAL | Manually assigned score that takes precedence over the calculated score, if a numeric override_score value is provided

Example Data and Explanation

item_response_id | submission_id | assignment_id | condition_id | treatment_id | participant_id | item_id | response_type | response | response_id | response_position | correctness | responded_at | points_possible | calculated_score | override_score
3323 | 579 | 292 | 950 | 431 | 1185 | 645 | MC | I confirm that I have watched the video. | 1298 | A | TRUE | 3/26/2022 1:57 | 25 | 25 | N/A

Here, participant 1185 made response 3323 within submission 579 on assignment 292, under condition 950 and treatment 431. Item 645 was a multiple choice question. The participant chose the option described as “I confirm that I have watched the video” (response 1298, in position A) at 1:57 AM on March 26th, 2022. The answer was correct (TRUE), and the participant was awarded the full 25 points. No override score was assigned, which leaves that cell with a value of N/A.

Additional Clarifications

The term “correctness” is used in this table because there are sometimes multiple correct answers to an item. For example, within a multiple choice question with answers A, B, C, and D, it is possible for both A and B to be correct. Thus, we use “correctness” to indicate the extent to which a chosen answer is correct without excluding other correct answers.
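As an illustration of using this file, the sketch below computes the proportion of correct responses to each item under each condition. It assumes the correctness column reads in as TRUE/FALSE values as documented above.

    import pandas as pd

    responses = pd.read_csv("item_responses.csv")

    # Normalize correctness to booleans whether pandas read it as text or bool.
    responses["is_correct"] = (
        responses["correctness"].astype(str).str.upper().eq("TRUE")
    )

    accuracy = (
        responses.groupby(["item_id", "condition_id"])["is_correct"]
        .mean()
        .rename("proportion_correct")
    )
    print(accuracy)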

outcomes.csv

An outcome (also known as a dependent variable) is a variable that may be affected by an experimental manipulation. For example, if an experimental manipulation to a homework assignment in Terracotta is expected to affect students' later scores on a midterm exam, then the midterm exam scores would be an outcome. In Terracotta, outcome scores are defined on the experiment's Status tab. Outcomes can be entered manually into Terracotta (by typing scores for each individual student), or can be imported from a Canvas gradebook item. In within-subjects designs, it may be important to define an outcome score for each exposure set. Outcome scores are typically measured after an experiment, but an outcome can also contain any numeric score that is relevant to an experiment (pretest scores, moderator variables, etc.). Outcome scores are included in Terracotta data exports.

Columns

Name | Data Type | Description | Options
outcome_id | INTEGER | Unique identifier for a specific outcome within the experiment
participant_id | INTEGER | Unique numeric identifier for a participant
exposure_id | INTEGER | Identifier for the specific exposure given to a participant
source | TEXT | Place from which the outcome data was drawn
outcome_name | TEXT | Descriptive phrase for an instance of an outcome
points_possible | DECIMAL | Total number of points that can be awarded to a participant for the outcome
outcome_score | DECIMAL | Actual score a participant received within the outcome

Example Data and Explanation

outcome_id | participant_id | exposure_id | source | outcome_name | points_possible | outcome_score
141 | 1108 | 747 | none | Exam 3 Equals Sign | 4 | 3

In outcome 141, participant 1108 experienced exposure 747 and received a score on the outcome named “Exam 3 Equals Sign.” The total points that could be awarded were 4, and this participant’s score was 3.
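Because outcomes.csv shares participant_id and exposure_id with participant_treatment.csv, outcome scores can be related to the condition each participant experienced in the corresponding exposure set. The sketch below shows one such join; the comparison of means is only an illustration, not a prescribed analysis.

    import pandas as pd

    outcomes = pd.read_csv("outcomes.csv")
    pt = pd.read_csv("participant_treatment.csv")

    conditions = pt[["participant_id", "exposure_id", "condition_name"]].drop_duplicates()
    merged = outcomes.merge(conditions, on=["participant_id", "exposure_id"], how="left")

    # Mean outcome score (and participant count) under each condition.
    print(merged.groupby("condition_name")["outcome_score"].agg(["mean", "count"]))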