At the end of an experiment, the researcher clicks "Export Data" to download deidentified research data to their browser in a ZIP file. This ZIP file contains a variety of CSV files describing the experiment design, the contents of assignments, and participants' behavior and performance. When an experiment uses informed consent, the export only includes data from consenting students; and when an experiment uses a manual selection process, the export only includes data from selected students. The current Data Dictionary provides comprehensive documentation of the format and structure of each CSV file in the export, and the data contained in each column.
Additionally, a Terracotta data export contains a file called events.json. This JSON-formatted file contains event logs formatted according to the IMS Global (1EdTech) Caliper 1.2 standard. At this time, event logs are provided for the Assessment Profile and the Media Profile.
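As a rough illustration of how the export described above might be read programmatically, the following Python sketch opens the ZIP file, parses every CSV file into a list of rows, and parses events.json if it is present. The function name read_export is hypothetical, and the sketch assumes the files are UTF-8 encoded; neither detail is specified by Terracotta.

```python
import csv
import io
import json
import zipfile


def read_export(zip_source):
    """Return (tables, events) from a Terracotta-style export ZIP.

    tables maps each CSV file name to a list of row dictionaries.
    events holds the parsed events.json content, or None if absent.
    Assumption: all files are UTF-8 encoded.
    """
    tables = {}
    events = None
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".csv"):
                with archive.open(name) as handle:
                    text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                    tables[name] = list(csv.DictReader(text))
            elif name.endswith("events.json"):
                events = json.loads(archive.read(name).decode("utf-8"))
    return tables, events
```

Calling read_export("terracotta_export.zip") (a hypothetical file name) would return the CSV tables keyed by file name, such as "experiment.csv", along with the parsed event log. All CSV values arrive as strings, so numeric columns need to be converted before analysis.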
Many of the elements in Terracotta's data exports reflect typical concepts in education and assessment. Thus, it is possible to map elements in the CSV files to the Common Education Data Standards (CEDS) vocabulary of education data. We provide this alignment tool for anyone interested in such mappings. While experimental research concepts are not directly represented in the CEDS data model, we consider experimental conditions to be represented as different "assessment form algorithms," experimental treatments to be represented as different "assessment form versions," and exposure sets to be represented as different "assessment administrations."
Please note: the data types referenced here are MySQL data types.
Table of Contents
- experiment.csv
- participants.csv
- participant_treatment.csv
- submissions.csv
- items.csv
- response_options.csv
- item_responses.csv
- outcomes.csv
Schema
This article explains the data included in each .csv file in the data export. Each section begins with the file or table name, explains the data included in each column, and then gives a basic explanation of sample data. When necessary, sections also include additional clarifications the reader may find helpful.
File name/Table Name
[Purpose of the file as a whole]
Columns
Example Data and Explanation
Additional Clarifications
Data Dictionary
experiment.csv
The experiment.csv file gives a general overview of the experiment’s data.
Columns
Example Data and Explanation
The table above includes sample data from an experiment. The experiment ID (521) and course ID (38) help connect this table to others. The experiment’s title is Pre Question Experiment, with the description, Do pre-questions work?. In this experiment, participants in a course were given consent forms to fill out; 89 agreed to participate. Researchers were interested in the effects of pre-questions on participants’ ability to learn information. They wanted to see if engaging with a quiz about lesson content before an assignment would increase the average grade on a subsequent assignment. Because this is a within-subjects design, half of the 89 participants complete the first assignment with no pre-questions as a control, and the other half complete the first assignment with pre-questions as a treatment. The two groups then switch for the second assignment. Because all participants experience both the control and the treatment, the treatment is distributed evenly by default. Data for this experiment was exported on 4/29/2022 at 7:27:58 PM.
Additional Clarifications
- This file contains only one row, because each export covers a single experiment. Additional experiments are packaged in their own ZIP files, each containing the exported data files for that experiment.
- In the instance where experimental data is carried over to another course, a new experiment ID will be generated for that experiment.
- The value for enrollment_cnt is the cumulative number of LMS users with the student role who have ever been in the course site, which may differ from the course enrollment at any given moment. For example, suppose a course has 30 students enrolled, and during the first week of the class 5 students drop the class and 5 new students add the class in their place. There are still 30 students enrolled, but the enrollment_cnt is 35, because it continues to include the 5 students who dropped while also adding the 5 new enrollments. We calculate enrollment_cnt this way because it prevents a situation where the number of participants is greater than the enrollment, and because this method is more consistent than alternatives that count the number of active enrollments at an arbitrary moment in time.
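The enrollment_cnt example above can be reproduced with a few lines of Python. This is only an illustration of the counting rule, not Terracotta's implementation; the idea of roster snapshots and the student labels are assumptions made for the example.

```python
def cumulative_enrollment(roster_snapshots):
    """Count every distinct student who appears in any roster snapshot."""
    ever_enrolled = set()
    for roster in roster_snapshots:
        ever_enrolled.update(roster)
    return len(ever_enrolled)


# Hypothetical student labels: 30 students at the start of the course.
week_0 = {f"student_{i}" for i in range(30)}
# During the first week, 5 students drop and 5 new students add the class.
dropped = {f"student_{i}" for i in range(5)}
added = {f"student_{i}" for i in range(30, 35)}
week_1 = (week_0 - dropped) | added

active_now = len(week_1)                                   # 30 active enrollments
enrollment_cnt = cumulative_enrollment([week_0, week_1])   # 35 students ever enrolled
print(active_now, enrollment_cnt)
```

The active enrollment stays at 30, while the cumulative count rises to 35, matching the example in the clarification above.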
participants.csv
The participants.csv file contains deidentified consent-related data for participants.
Columns
Example Data and Explanation
In this row, a participant is identified by a number, 1185, instead of by their name. On March 22, 2022, at 7:01:37 PM, the participant affirmed consent to participate in a Terracotta experiment.
participant_treatment.csv
The participant_treatment.csv file describes which assignment and experimental condition each participant received, and in what order.
Columns
Example Data and Explanation
Let’s imagine our list contains two participants: participant 1185 and participant 1132. There are two assignments: Week 11 Assignment (ID 292) and Week 12 Assignment (ID 293). There are two possible conditions: No PreQuestion (ID 950) and PreQuestion (ID 951). During the first exposure instance, ID 747, participant 1185 is given the combination of No PreQuestion and Week 11 Assignment, resulting in treatment ID 431. During that same exposure instance, participant 1132 receives the PreQuestion condition for the Week 11 Assignment, resulting in treatment ID 432.
During the next exposure instance, ID 748, participant 1185 is exposed to the PreQuestion condition for the Week 12 Assignment, a combination identified as treatment ID 434. Participant 1132, however, experiences No PreQuestion for the Week 12 Assignment, resulting in treatment ID 433.
Additional Clarifications
- The number of rows should be equal to the number of participants multiplied by the number of conditions if the experiment has a WITHIN subject design. For a BETWEEN subjects design, the number of rows should be equal to the number of participants.
- An exposure is treated as a length of time that a participant experiences a particular condition. Between-subject designs will only have one exposure. Within-subjects designs include multiple exposures, since split groups of participants experience all treatments included in an experiment.
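The row-count rule above can be written as a small sanity check to run against an export. The function name is hypothetical; the design labels follow the WITHIN and BETWEEN spellings used in this article.

```python
def expected_row_count(n_participants, n_conditions, design):
    """Expected number of rows in participant_treatment.csv.

    WITHIN designs: one row per participant per condition.
    BETWEEN designs: one row per participant.
    """
    if design == "WITHIN":
        return n_participants * n_conditions
    if design == "BETWEEN":
        return n_participants
    raise ValueError(f"Unknown design: {design!r}")


# The sample experiment: 89 consenting participants, 2 conditions, within-subjects.
print(expected_row_count(89, 2, "WITHIN"))   # 178
print(expected_row_count(89, 2, "BETWEEN"))  # 89
```

Comparing this number with the actual row count of the file is a quick way to spot an incomplete or unexpectedly filtered export.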
submissions.csv
The submissions.csv file records participants’ submissions and the associated assignment data.
Columns
Example Data and Explanation
In this example, participant 1185 submitted assignment 292, creating a unique submission with ID 579. The submission has treatment ID 431 and was submitted on March 26, 2022, at 1:57:11 AM. The score calculated automatically by the system was 25, and the override score was also 25, meaning the score was not changed. The final score was therefore 25.
Additional Clarifications
- Participants can have multiple submissions. For example, imagine a participant submits an assignment 3 times. This can happen if an instructor allows participants to redo an assignment more than once. The instructor could then choose, based on course policy, whether to use the best score out of the 3 or an average and use that as an override score.
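To make the multiple-submission scenario concrete, here is a minimal Python sketch. The attempt scores are made up for illustration, the function names are hypothetical, and the rule that an override score replaces the calculated score when one is present is an assumption inferred from the example above rather than a documented Terracotta behavior.

```python
from statistics import mean


def override_candidates(attempt_scores):
    """Two policies an instructor might use to pick an override score."""
    return {"best": max(attempt_scores), "average": mean(attempt_scores)}


def final_score(calculated_score, override_score=None):
    """Assumption: an override, when present, replaces the calculated score."""
    return calculated_score if override_score is None else override_score


# Hypothetical scores for three attempts at the same assignment.
attempts = [15, 20, 25]
candidates = override_candidates(attempts)
print(candidates)                               # best is 25, average is 20
print(final_score(attempts[-1], candidates["best"]))
print(final_score(25))                          # no override: calculated score stands
```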
items.csv
The items.csv file is a collection of elements within an assignment.
Columns
Example Data and Explanation
In assignment 292, the participant is exposed to condition 950, which corresponds to treatment ID 431. The specific item with which the participant interacted, 645, has item text that displays an embedded YouTube video. The item is answered in multiple choice format.
Additional Clarifications
A list of the item format option meanings:
- MC: multiple choice
- PAGE_BREAK: not a question, but a separator for the organization of the HTML document
- SHORT_ANSWER: text-entry format
response_options.csv
The response_options.csv file includes a collection of responses available for an item.
Columns
Example Data and Explanation
Item 650, a multiple choice question, asks,
“An experimental study indicated that exposing children to __________ improved their understanding of the equal sign.”
The records above show four multiple choice response options. Each includes a unique response ID, a text description of the response, and the response’s placement relative to other responses. Three of the values are FALSE or incorrect, while only one is TRUE (correct). The TRUE value selection awards the participant points for that question.
Additional Clarifications
- This table is used exclusively for multiple choice items at the current time.
- A response ID is unique across the whole table and is never repeated (i.e., several questions cannot each have a response_id of “1”).
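The uniqueness rule above is easy to verify after loading the file. The sketch below assumes each row is a dictionary keyed by column name (as produced by Python's csv.DictReader) and that the column is named response_id, as in the clarification above; the item_id column name and the sample rows are made up for illustration.

```python
from collections import Counter


def duplicate_response_ids(rows):
    """Return the response_id values that appear more than once, sorted."""
    counts = Counter(row["response_id"] for row in rows)
    return sorted(rid for rid, n in counts.items() if n > 1)


# Hypothetical rows: the third row wrongly reuses response_id "1298".
rows = [
    {"response_id": "1298", "item_id": "645"},
    {"response_id": "1299", "item_id": "650"},
    {"response_id": "1298", "item_id": "650"},
]
print(duplicate_response_ids(rows))  # ['1298']
```

An empty result means every response ID in the file is unique, as expected.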
item_responses.csv
The item_responses.csv file records participants’ actual responses and the results.
Columns
Example Data and Explanation
Here, participant 1185 responded to item 645 (recorded as item response 3323) in submission 579, under condition 950 and treatment 431. Item 645 was a multiple choice question. The participant chose response 1298, with the text “I confirm that I have watched the video,” shown in position A, at 1:57 AM on March 26, 2022. The response was TRUE (correct), and the participant was awarded the full 25 points. No override score was assigned, so that cell has a value of N/A.
Additional Clarifications
The term “correctness” is used in this table because an item sometimes has multiple correct answers. For example, in a multiple choice question with answers A, B, C, and D, it is possible for both A and B to be correct. Thus, we use “correctness” to indicate the extent to which an answer is correct without excluding other correct answers.
outcomes.csv
An outcome (also known as a dependent variable) is a variable that may be affected by an experimental manipulation. For example, if an experimental manipulation to a homework assignment in Terracotta is expected to affect students' later scores on a midterm exam, then the midterm exam scores would be an outcome. In Terracotta, outcome scores are defined on the experiment's Status tab. Outcomes can be entered manually into Terracotta (by typing scores for each individual student), or can be imported from a Canvas gradebook item. In within-subjects designs, it may be important to define an outcome score for each exposure set. Outcome scores are typically measured after an experiment, but an outcome can also contain any numeric score that is relevant to an experiment (pretest scores, moderator variables, etc.). Outcome scores are included in Terracotta data exports.
Columns
Example Data and Explanation
In experiment outcome 141, participant 1108, during exposure 747, has a score recorded for the outcome titled “Exam 3 Equals Sign.” The maximum possible score was 4 points, and this participant’s score was 3.
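One common use of outcome scores is to compare them across conditions. The Python sketch below links outcome rows to the condition each participant experienced during the same exposure, then averages the proportion of points earned per condition. The column names (participant_id, exposure_id, condition_id, score, max_score) are assumptions, not the documented headers, and the outcome scores are made up for illustration; only the participant, exposure, and condition IDs come from the participant_treatment.csv example earlier in this article.

```python
from collections import defaultdict


def mean_outcome_by_condition(treatment_rows, outcome_rows):
    """Average proportion of points earned, grouped by condition ID."""
    # Look up the condition a participant experienced in a given exposure.
    condition_for = {
        (row["participant_id"], row["exposure_id"]): row["condition_id"]
        for row in treatment_rows
    }
    proportions = defaultdict(list)
    for row in outcome_rows:
        key = (row["participant_id"], row["exposure_id"])
        max_score = float(row["max_score"])
        if key not in condition_for or max_score == 0:
            continue  # skip rows that cannot be linked or scaled
        proportions[condition_for[key]].append(float(row["score"]) / max_score)
    return {cond: sum(vals) / len(vals) for cond, vals in proportions.items()}


# IDs taken from the participant_treatment.csv example (values are CSV strings).
treatment_rows = [
    {"participant_id": "1185", "exposure_id": "747", "condition_id": "950"},
    {"participant_id": "1132", "exposure_id": "747", "condition_id": "951"},
    {"participant_id": "1185", "exposure_id": "748", "condition_id": "951"},
    {"participant_id": "1132", "exposure_id": "748", "condition_id": "950"},
]
# Made-up outcome scores, each out of 4 points.
outcome_rows = [
    {"participant_id": "1185", "exposure_id": "747", "score": "3", "max_score": "4"},
    {"participant_id": "1132", "exposure_id": "747", "score": "4", "max_score": "4"},
    {"participant_id": "1185", "exposure_id": "748", "score": "4", "max_score": "4"},
    {"participant_id": "1132", "exposure_id": "748", "score": "2", "max_score": "4"},
]
result = mean_outcome_by_condition(treatment_rows, outcome_rows)
print(result)
```

Dividing each score by the maximum possible score puts outcomes with different point totals on the same 0 to 1 scale before averaging.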