Terracotta | Terracotta Data Dictionary

At the end of an experiment, the researcher clicks "Export Data" to download deidentified research data to their browser in a ZIP file. This ZIP file contains a variety of CSV files describing the experiment design, the contents of assignments, and participants' behavior and performance. When an experiment uses informed consent, the export only includes data from consenting students; and when an experiment uses a manual selection process, the export only includes data from selected students. The current Data Dictionary provides comprehensive documentation of the format and structure of each CSV file in the export, and the data contained in each column.

Additionally, a Terracotta data export contains a file called events.json. This JSON-formatted file contains event logs formatted according to the IMS Global (1EdTech) Caliper 1.2 standard. At this time, event logs are provided for the Assessment Profile and the Media Profile.
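
As a minimal sketch of working with the export, the following Python reads every CSV file in a ZIP archive into a list of row dictionaries. It assumes the CSV files are comma-delimited, UTF-8 encoded, and stored at the top level of the archive; none of this is confirmed here. The helper names read_export_csvs and build_demo_zip are hypothetical, and the demo archive (including its empty events.json) is a stand-in for a real export.

```python
import csv
import io
import zipfile


def read_export_csvs(zip_bytes):
    """Return {file name: list of row dicts} for every CSV file in the ZIP."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # non-CSV members such as events.json are skipped
            # utf-8-sig tolerates an optional byte order mark (assumption)
            text = archive.read(name).decode("utf-8-sig")
            tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables


def build_demo_zip():
    """Build a tiny in-memory stand-in for a real export (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "participants.csv",
            "participant_id,consented_at,consent_source\n"
            "1185,3/22/2022 7:01:37 PM,CONSENT\n",
        )
        archive.writestr("events.json", "[]")  # placeholder content
    return buffer.getvalue()


tables = read_export_csvs(build_demo_zip())
print(sorted(tables))
print(tables["participants.csv"][0]["consent_source"])
```

With a real export, the bytes would come from the downloaded file instead of build_demo_zip. All values are read as strings, so numeric and date columns still need converting.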

Many of the elements in Terracotta's data exports reflect typical concepts in education and assessment. Thus, it is possible to map elements in the CSV files to the Common Education Data Standards (CEDS) vocabulary of education data. We provide this alignment tool for anyone interested in such mappings. While experimental research concepts are not directly represented in the CEDS data model, we consider experimental conditions to be represented as different "assessment form algorithms," experimental treatments to be represented as different "assessment form versions," and exposure sets to be represented as different "assessment administrations."

Please note: the data types referenced here are MySQL data types.

Table of Contents

Schema

Data Dictionary

experiment.csv
participants.csv
participant_treatment.csv
submissions.csv
items.csv
response_options.csv
item_responses.csv
outcomes.csv

Schema

This article explains the data included in each .csv file in the data export. Each section begins with the file or table name, explains the data included in each column, and then gives a basic explanation of sample data. When necessary, sections also include additional clarifications the reader may find helpful.

File name/Table Name

[Purpose of the file as a whole]

Columns

Name	Data Type	Description	Options

Example Data and Explanation

Additional Clarifications

Data Dictionary

experiment.csv

The experiment.csv file gives a general overview of the experiment’s data.

Columns

Name	Data Type	Description	Options
experiment_id	INTEGER	Unique identifier for the experiment
course_id	INTEGER	Unique identifier for the course
experiment_title	TEXT	Descriptive title/name of the experiment
experiment_description	TEXT	Detailed summary of the experiment
exposure_type	TEXT	Describes the distribution of participants among the various treatment categories	BETWEEN, WITHIN
participation_type	TEXT	Describes how participants join the experiment	CONSENT, AUTOMATIC, MANUAL
distribution_type	TEXT	Describes how conditions will be distributed among participants	EVEN, CUSTOM, ALL
export_at	DATETIME	Recorded date and time when the experimental data package was exported
enrollment_cnt	INTEGER	Cumulative number of students in the course site since the course began
participant_cnt	INTEGER	Total number of participants entered into the experiment
condition_cnt	INTEGER	Number of conditions included in the experiment
created_at	DATETIME	Recorded date and time when the experiment was created
started_at	DATETIME	Recorded date and time when the experiment was started

Example Data and Explanation

experiment_id	course_id	experiment_title	experiment_description	exposure_type	participation_type	distribution_type	export_at	enrollment_cnt	participant_cnt	condition_cnt
521	38	Pre Question Experiment	Do pre-questions work?	WITHIN	CONSENT	EVEN	4/29/2022 7:27:58 PM	105	89	2

The table above includes sample data from an experiment. The experiment and course ID numbers, 521 and 38 respectively, help connect this table to others. The experiment’s title is Pre Question Experiment, with the description “Do pre-questions work?”. In this experiment, participants in a course were given consent forms to fill out; 89 agreed to participate. The researchers were interested in the effects of pre-questions on participants’ ability to learn information. They wanted to see whether engaging with a quiz about lesson content before an assignment would increase the average grade on a subsequent assignment. Because this is a within-subject design, half of the 89 participants complete the first assignment with no pre-questions as a control, and the other half complete it with pre-questions as a treatment. The two groups then swap for the second assignment. Since all participants experience both the control and the treatment, the conditions are distributed evenly by default. Data for this experiment was exported on 4/29/2022 at 7:27:58 PM.

Additional Clarifications

This file contains only one row, because each export covers a single experiment. Each additional experiment is packaged as its own ZIP file containing that experiment’s exported data files.
When experimental data is carried over to another course, a new experiment ID is generated for that experiment.
The value for enrollment_cnt is the cumulative number of LMS users with the student role who have ever been in the course site, which may differ from the course enrollment at any given moment. For example, suppose a course has 30 students enrolled, and during the first week of class 5 students drop and 5 new students add the class in their place. There are still 30 students enrolled, but enrollment_cnt will be 35, because it continues to include the 5 students who dropped while also counting the 5 new enrollments. We calculate enrollment_cnt this way because it prevents a situation where the number of participants is greater than the enrollment. This method is also more consistent than alternatives that count the number of active enrollments at an arbitrary moment in time.
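
The enrollment example above can be sketched in a few lines of Python. This only illustrates the counting rule as described; it is not Terracotta’s implementation, and the student identifiers and the function name cumulative_enrollment are hypothetical.

```python
def cumulative_enrollment(initial_students, added_students):
    """Count every student who has ever been in the course site."""
    ever_enrolled = set(initial_students) | set(added_students)
    return len(ever_enrolled)


# Hypothetical identifiers for the scenario described in the text
initial = {f"student_{i}" for i in range(1, 31)}   # 30 students at the start
dropped = {f"student_{i}" for i in range(1, 6)}    # 5 of them drop
added = {f"student_{i}" for i in range(31, 36)}    # 5 new students add

active_now = (initial - dropped) | added           # enrollment at this moment
enrollment_cnt = cumulative_enrollment(initial, added)

print(len(active_now))   # 30
print(enrollment_cnt)    # 35
```

Because dropped students are never removed from the cumulative set, the count cannot fall below the number of participants.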

participants.csv

The participants.csv file contains deidentified consent-related data for participants.

Columns

Name	Data Type	Description	Options
participant_id	INTEGER	Unique numeric identifier for a participant
consented_at	DATETIME	Recorded date and time when the participant’s consent was submitted
consent_source	TEXT	Mode of consent for a participant to enter into an experiment	CONSENT, AUTOMATIC, MANUAL

Example Data and Explanation

participant_id	consented_at	consent_source
1185	3/22/2022 7:01:37 PM	CONSENT

In this row, a participant is identified by a number, 1185, instead of by their name. On March 22nd 2022, at 7:01:37 PM, the participant affirmed consent to participate in a Terracotta experiment.
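
When analyzing the export, DATETIME values such as consented_at usually need to be parsed from text. The sketch below assumes the month/day/year, 12-hour format seen in the sample rows of this article; a real export may use a different format, and no time zone is assumed. The name parse_export_datetime is hypothetical.

```python
from datetime import datetime

# Format inferred from sample values such as "3/22/2022 7:01:37 PM" (assumption)
EXPORT_DATETIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_export_datetime(value):
    """Parse a timestamp string from the export into a naive datetime."""
    return datetime.strptime(value, EXPORT_DATETIME_FORMAT)


consented_at = parse_export_datetime("3/22/2022 7:01:37 PM")
print(consented_at.isoformat())  # 2022-03-22T19:01:37
```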

participant_treatment.csv

The participant_treatment.csv file describes the order and distribution of assignments for testing along with experimental conditions.

Columns

Name	Data Type	Description	Options
participant_id	INTEGER	Unique numeric identifier for a participant
exposure_id	INTEGER	Identifier for the specific exposure given to a participant
condition_id	INTEGER	Numeric identifier for a specific experimental condition
condition_name	TEXT	Descriptive identifier for the condition that was used
assignment_id	INTEGER	Numeric identifier for an assignment
assignment_name	TEXT	Assignment title
assignment_due_date	INTEGER	Assignment due date as set in Canvas	All dates appear in the YEAR/MONTH/DAY format, and times are set to GMT
treatment_id	INTEGER	Combination of a condition and an assignment
grade_type	TEXT	The score retained as the assignment grade	MOST RECENT, CUMULATIVE, AVERAGE, HIGHEST
attempts_allowed	TEXT	Indicates if students were allowed to retry the assignment	LIMITED, UNLIMITED
time_required_between_attempts	INTEGER	Amount of time students were required to wait before retrying the assignment
final_score	INTEGER	Score retained as a grade on the assignment

Example Data and Explanation

participant_id	exposure_id	condition_id	condition_name	assignment_id	assignment_name	treatment_id	grade_type	attempts_allowed	time_required_between_attempts	final_score
1185	747	950	No PreQuestion	292	Week 11 Assignment	431	AVERAGE	2	2.0 hours	25
1185	748	951	PreQuestion	293	Week 12 Assignment	434	AVERAGE	2	2.0 hours	22
1132	747	951	PreQuestion	292	Week 11 Assignment	432	AVERAGE	2	2.0 hours	20
1132	748	950	No PreQuestion	293	Week 12 Assignment	433	AVERAGE	2	2.0 hours	25

Suppose our list of participants contains two participants: participant 1185 and participant 1132. There are two assignments: Week 11 Assignment (ID 292) and Week 12 Assignment (ID 293). There are two possible conditions: No PreQuestion (ID 950) and PreQuestion (ID 951). During the first exposure, ID 747, participant 1185 receives the combination of No PreQuestion and the Week 11 Assignment, resulting in treatment ID 431. During that same exposure, participant 1132 receives PreQuestion for the Week 11 Assignment, resulting in treatment ID 432.

During the next exposure instance, 748, participant 1185 is exposed to a PreQuestion for the Week 12 Assignment, resulting in a combination identified as treatment ID 434. Participant 1132, however, experiences No PreQuestion for the Week 12 Assignment, resulting in a treatment instance of ID 433.

Additional Clarifications

The number of rows should be equal to the number of participants multiplied by the number of conditions if the experiment has a WITHIN subject design. For a BETWEEN subjects design, the number of rows should be equal to the number of participants.
An exposure is treated as a length of time that a participant experiences a particular condition. Between-subject designs will only have one exposure. Within-subjects designs include multiple exposures, since split groups of participants experience all treatments included in an experiment.
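
The row-count rule above can be checked with a short Python sketch. It uses a subset of the columns from the sample rows in this section, written as comma-delimited text, which is an assumption about the file format. The function name expected_row_count is hypothetical, and sorting by exposure_id to recover the order of conditions assumes that lower exposure IDs come first in time.

```python
import csv
import io

# Subset of columns from the sample rows above
SAMPLE = (
    "participant_id,exposure_id,condition_id,condition_name\n"
    "1185,747,950,No PreQuestion\n"
    "1185,748,951,PreQuestion\n"
    "1132,747,951,PreQuestion\n"
    "1132,748,950,No PreQuestion\n"
)


def expected_row_count(exposure_type, participant_cnt, condition_cnt):
    """Apply the row-count rule for WITHIN and BETWEEN designs."""
    if exposure_type == "WITHIN":
        return participant_cnt * condition_cnt
    if exposure_type == "BETWEEN":
        return participant_cnt
    raise ValueError(f"unknown exposure_type: {exposure_type}")


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
participants = {row["participant_id"] for row in rows}
conditions = {row["condition_id"] for row in rows}
rows_match = len(rows) == expected_row_count(
    "WITHIN", len(participants), len(conditions)
)
print(rows_match)

# Order of conditions per participant (assumes exposure_id order is time order)
order = {}
for row in sorted(rows, key=lambda r: int(r["exposure_id"])):
    order.setdefault(row["participant_id"], []).append(row["condition_name"])
print(order)
```

In the sample, the two participants experience the same two conditions in opposite orders, which is what a within-subject design with two exposures looks like in this file.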

submissions.csv

The submissions.csv file creates a record of participant submission and assignment data.

Columns

Name	Data Type	Description
submission_id	INTEGER	Numeric identifier for a particular submission
participant_id	INTEGER	Unique numeric identifier for a participant
assignment_id	INTEGER	Numeric identifier for an assignment
treatment_id	INTEGER	Combination of a condition and an assignment
submitted_at	DATETIME	Date and time when a submission was transmitted
calculated_score	DECIMAL	Score that is calculated by means of computer and/or human grading
override_score	DECIMAL	Alternative score assigned by the instructor instead of the calculated_score
final_score	DECIMAL	Score used in grade calculation

Example Data and Explanation

submission_id	participant_id	assignment_id	treatment_id	submitted_at	calculated_score	override_score	final_score
579	1185	292	431	3/26/2022 1:57:11 AM	25	25	25

In this example, participant 1185 submitted assignment 292, and the submission has the unique ID 579. The assignment has the treatment ID 431 and was submitted on March 26th, 2022, at 1:57:11 AM. The calculated score for the submission was 25 and the override score was also 25, meaning there were no changes. The final score was therefore 25, the same as the original calculated score.

Additional Clarifications

Participants can have multiple submissions. For example, imagine a participant submits an assignment 3 times. This can happen if an instructor allows participants to redo an assignment more than once. The instructor could then choose, based on course policy, whether to use the best score out of the 3 or an average and use that as an override score.
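
Since submissions.csv shares participant_id and treatment_id with participant_treatment.csv, each submission can be labelled with its experimental condition. The sketch below shows one way to do that lookup in plain Python, using a subset of columns and the sample values from this article as comma-delimited text (an assumption about the file format). The helper read_rows is hypothetical.

```python
import csv
import io

# Subset of columns, using sample values from this article
PARTICIPANT_TREATMENT = (
    "participant_id,treatment_id,condition_name,assignment_name\n"
    "1185,431,No PreQuestion,Week 11 Assignment\n"
    "1185,434,PreQuestion,Week 12 Assignment\n"
)
SUBMISSIONS = (
    "submission_id,participant_id,assignment_id,treatment_id,final_score\n"
    "579,1185,292,431,25\n"
)


def read_rows(text):
    """Parse comma-delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Index treatment rows by the two shared key columns
treatment_lookup = {
    (row["participant_id"], row["treatment_id"]): row
    for row in read_rows(PARTICIPANT_TREATMENT)
}

labelled = []
for submission in read_rows(SUBMISSIONS):
    key = (submission["participant_id"], submission["treatment_id"])
    treatment = treatment_lookup[key]
    labelled.append(
        (
            submission["submission_id"],
            treatment["condition_name"],
            submission["final_score"],
        )
    )

print(labelled)
```

A participant with several submissions for the same treatment would produce several labelled rows, one per submission.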

items.csv

The items.csv file is a collection of elements within an assignment.

Columns

Name	Data Type	Description	Options
item_id	INTEGER	Unique identifier for a specific item
assignment_id	INTEGER	Numeric identifier for an assignment
treatment_id	INTEGER	Combination of a condition and an assignment
condition_id	INTEGER	Numeric identifier for a specific experimental condition
item_text	TEXT	Rich text element, typically written in a format for HTML elements that builds the item into a digital document
item_format	TEXT	Format in which the user sees and interacts with an item	MC, PAGE_BREAK, SHORT_ANSWER

Example Data and Explanation

item_id	assignment_id	treatment_id	condition_id	item_text	item_format
645	292	431	950	<iframe src="https://www.youtube.com/embed/AunjHffklZI?enablejsapi=1" data-youtube-id="AunjHffklZI" height="315" width="560" frameborder="0" allowfullscreen="true" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></iframe>	MC

In assignment 292, the participant is exposed to condition 950, which corresponds to treatment ID 431. The specific item with which the participant interacted, 645, has item text that displays an embedded YouTube video. The item must be answered through multiple choice.

Additional Clarifications

A list of the item format option meanings:

MC: multiple choice
PAGE_BREAK: not a question, but a separator for the organization of the HTML document
SHORT_ANSWER: text-entry format

response_options.csv

The response_options.csv file includes a collection of responses available for an item.

Columns

Name	Data Type	Description	Options
response_id	INTEGER	Unique identifier for a particular response
item_id	INTEGER	Unique identifier for a specific item
response	TEXT	Description of a response option’s value
response_position	TEXT	Ordinal position of a response
correct	TEXT	Value denoting an answer’s correctness or incorrectness	TRUE, FALSE

Example Data and Explanation

response_id	item_id	response	response_position	correct
1312	650	Non-traditional arithmetic practice (like __ = 3 + 4)	A	TRUE
1313	650	Traditional arithmetic practice (like 3 + 4 = __)	B	FALSE
1314	650	Trick question. No experiment has shown that experience can change children’s understanding of the equal sign.	C	FALSE
1315	650	Common mathematics textbooks	D	FALSE

Item 650, a multiple choice question, asks,

“An experimental study indicated that exposing children to __________ improved their understanding of the equal sign.”

The records above show four multiple choice response options. Each includes a unique response ID, a text description of the response, and the response’s placement relative to other responses. Three of the values are FALSE or incorrect, while only one is TRUE (correct). The TRUE value selection awards the participant points for that question.

Additional Clarifications

This table is used exclusively for multiple choice items at the current time.
For the response ID, the number can never be repeated (i.e., can’t have response_id “1” for several questions).
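
As an illustrative sketch, the two properties described above (one or more options marked TRUE per item, and response IDs that never repeat) can be checked as follows. The rows reuse the sample values for item 650 with the response text omitted, and the function names are hypothetical.

```python
# Sample rows for item 650 (response text omitted for brevity)
OPTIONS = [
    {"response_id": 1312, "item_id": 650, "response_position": "A", "correct": "TRUE"},
    {"response_id": 1313, "item_id": 650, "response_position": "B", "correct": "FALSE"},
    {"response_id": 1314, "item_id": 650, "response_position": "C", "correct": "FALSE"},
    {"response_id": 1315, "item_id": 650, "response_position": "D", "correct": "FALSE"},
]


def correct_positions(options, item_id):
    """Return the positions of all options marked TRUE for one item."""
    return [
        option["response_position"]
        for option in options
        if option["item_id"] == item_id and option["correct"] == "TRUE"
    ]


def has_unique_response_ids(options):
    """Check that no response_id appears more than once."""
    ids = [option["response_id"] for option in options]
    return len(ids) == len(set(ids))


print(correct_positions(OPTIONS, 650))    # ['A']
print(has_unique_response_ids(OPTIONS))   # True
```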

item_responses.csv

The item_responses.csv file records participants’ actual responses and the results.

Columns

Name	Data Type	Description	Options
item_response_id	INTEGER	Unique identifier for an instance of a response
submission_id	INTEGER	Submission in which the response appears
assignment_id	INTEGER	Numeric identifier for an assignment
condition_id	INTEGER	Numeric identifier for a specific experimental condition
treatment_id	INTEGER	Combination of a condition and an assignment
participant_id	INTEGER	Unique numeric identifier for a participant
item_id	INTEGER	Unique identifier for a specific item
response_type	TEXT	The format of the question to which a response was made
response	TEXT	A description of the response option
response_position	TEXT	The ordinal placement of the chosen response option
correctness	TEXT	Signifying whether or not the chosen response was correct	TRUE, FALSE
responded_at	DATETIME	Date and time when the response was chosen
points_possible	DECIMAL	Total number of points a participant can receive when responding to a question
calculated_score	DECIMAL	Points awarded through computer-calculated scoring
override_score	DECIMAL	Manually assigned score that takes precedence over the calculated score, if a numeric override_score value is provided

Example Data and Explanation

item_response_id	submission_id	assignment_id	condition_id	treatment_id	participant_id	item_id	response_type	response	response_id	response_position	correctness	responded_at	points_possible	calculated_score	override_score
3323	579	292	950	431	1185	645	MC	I confirm that I have watched the video.	1298	A	TRUE	3/26/2022 1:57	25	25	N/A

Here, item response 3323 records participant 1185’s response to item 645 in submission 579, for assignment 292, under condition 950 and treatment 431. Item 645 was a multiple choice question. The participant chose response 1298, in position A, with the text “I confirm that I have watched the video,” at 1:57 AM on March 26th, 2022. The response was correct (TRUE), and the participant was awarded the full 25 points. No override score was assigned, so that cell has a value of N/A.
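
The columns table states that a numeric override_score takes precedence over the calculated score. A minimal sketch of that rule, treating N/A, empty, or otherwise non-numeric values as “no override”, might look like this. The function name effective_score is hypothetical, and how missing values actually appear in the file is an assumption based on the sample row.

```python
import math


def effective_score(calculated_score, override_score):
    """Return the override score if it is numeric, else the calculated score."""
    try:
        override = float(override_score)
    except (TypeError, ValueError):
        override = math.nan  # e.g. "N/A", "" or None means no override
    if math.isnan(override):
        return float(calculated_score)
    return override


print(effective_score("25", "N/A"))  # 25.0
print(effective_score("25", "20"))   # 20.0
```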

Additional Clarifications

The term “correctness” is used in this table as there are sometimes multiple correct answers to an item. For example, it is possible within a multiple choice question with answers A, B, C, and D for both A and B to be correct. Thus, we use “correctness” to indicate the extent to which an answer is correct without excluding other correct answers.

outcomes.csv

An outcome (also known as a dependent variable) is a variable that may be affected by an experimental manipulation. For example, if an experimental manipulation to a homework assignment in Terracotta is expected to affect students' later scores on a midterm exam, then the midterm exam scores would be an outcome. In Terracotta, outcome scores are defined on the experiment's Status tab. Outcomes can be entered manually into Terracotta (by typing scores for each individual student), or can be imported from a Canvas gradebook item. In within-subjects designs, it may be important to define an outcome score for each exposure set. Outcome scores are typically measured after an experiment, but an outcome can also contain any numeric score that is relevant to an experiment (pretest scores, moderator variables, etc.). Outcome scores are included in Terracotta data exports.

Columns

Name	Data Type	Description
outcome_id	INTEGER	Unique identifier for a specific outcome within the experiment
participant_id	INTEGER	Unique numeric identifier for a participant
exposure_id	INTEGER	Identifier for the specific exposure given to a participant
source	TEXT	Place from which the outcome data was drawn
outcome_name	TEXT	Descriptive phrase for an instance of an outcome
points_possible	DECIMAL	Total number of points that can be awarded in the experiment for a participant
outcome_score	DECIMAL	Actual score a participant received within the outcome

Example Data and Explanation

outcome_id	participant_id	exposure_id	source	outcome_name	points_possible	outcome_score
141	1108	747	none	Exam 3 Equals Sign	4	3

Outcome 141 records the score of participant 1108 on the outcome named “Exam 3 Equals Sign” for exposure 747. The total points possible for this outcome were 4, and this participant’s score was 3.
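
Because outcomes can have different point totals, it is often convenient to express each score as a proportion of points_possible before comparing outcomes. This is a sketch of one possible normalization, not something Terracotta computes in the export; the function name outcome_proportion is hypothetical.

```python
def outcome_proportion(outcome_score, points_possible):
    """Express an outcome score as a fraction of the points possible."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    return outcome_score / points_possible


# Sample row above: outcome_score 3 out of points_possible 4
print(outcome_proportion(3, 4))  # 0.75
```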
