Terracotta is designed to lower the technical and methodological barriers to conducting education research. These barriers are well-known in the education research community. I've started collecting quotes about the difficulty of running field experiments in education. Once Terracotta enlightens education research methods, perhaps we'll all look back on these quotes as evidence of a dark age, when progress was impeded by the absence of technological innovation. If you have a quote you'd like to share for this list, send it to me and I'll include it here with appreciation!
Randomized experiments of interventions applying to entire classrooms can be extremely difficult and expensive to do.
Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational Researcher, 31(7), 15-21. https://doi.org/10.3102/0013189X031007015
Although random trials remain the gold standard for assessing intervention effectiveness, it is often impractical and sometimes unethical to conduct such trials in the everyday contexts of postsecondary institution operations.
Borden, V. M. H., & Hosch, B. J. (2018). Institutional Research and Themes, North America. In Encyclopedia of International Higher Education Systems and Institutions (pp. 1–10). https://doi.org/10.1007/978-94-017-9553-1_586-2
In addition to feasibility considerations ... In education studies, variables can rarely be controlled tightly and blinding of subjects and study personnel may be unethical or impossible.
Sullivan, G. M. (2011). Getting off the “gold standard”: Randomized controlled trials and education research. Journal of Graduate Medical Education, 3(3), 285-289. https://doi.org/10.4300/JGME-D-11-00147.1
The requisite resources are generally far in excess of what most educational researchers could hope to amass in the absence of considerable extramural funding. Consequently, researchers elect to conduct more manageable, less ambitious, and typically, less carefully-controlled classroom-based investigations
Levin, J. R. (2005). Randomized classroom trials on trial. In G. D. Phye, D. H. Robinson, & J. R. Levin (Eds.), Empirical Methods for Evaluating Educational Interventions (pp. 3–27). Burlington: Academic Press.
By the time an experiment is designed, implemented, and evaluated, it is often true that the policy debate has moved ahead and the results are no longer of direct policy interest
Schanzenbach, D. W. (2012). Limitations of experiments in education research. Education Finance and Policy, 7(2), 219-232. https://doi.org/10.1162/EDFP_a_00063
Another possibility [that would explain the rarity of experimental research] is that the decline is related to researchers’ perceptions of the rigorous methodological standards, challenging practical constraints, and needed resources associated with conducting scientifically credible educational intervention research (e.g., Levin, 2005; Mosteller & Boruch, 2002). Although Pressley and Harris (1994) and Levin (1994) argued for “better” intervention studies, the perceived obstacles and costs may dissuade investigators from conducting such research.
Hsieh, P., Acee, T., Chung, W.-H., Hsieh, Y.-P., Kim, H., Thomas, G. D., You, J.-i., Levin, J. R., & Robinson, D. H. (2005). Is Educational Intervention Research on the Decline? Journal of Educational Psychology, 97(4), 523–529. https://doi.org/10.1037/0022-0663.97.4.523
As development on Terracotta is rapidly speeding up, we've gotten ticketed — in a good way! Thanks to Atlassian's support for open source projects, Terracotta now has a free Jira instance for project management and ticketing, and Confluence for document management.
Open source projects (like Terracotta) can apply to Atlassian for free access to its cloud software. The requirements are that the project is licensed under one of the Open Source Initiative's licenses, the source code is available for download, and the project has a publicly accessible website.
After months of iterative design work, the Terracotta development effort is about to enter an exciting stage: Building the alpha version! Indiana University is partnering with Unicon Inc. for the build, tapping into Unicon's deep expertise in learning technology, LMS integrations, and data infrastructure.
Over the next three months, IU will be working closely with Unicon's development team on a series of sprints that will construct the Terracotta alpha release. All aspects of the platform will be open source from the start, with public updates along the way. Upon release, Terracotta will be a cloud-hosted pilot tool that can be integrated into any Canvas course site.
"Unicon is excited about the opportunity to partner with Indiana University to build the prototype for Terracotta," said Patty Wolfe, Sr. Director Applications, Integrations, & Data, Unicon. "We're proud to be joining Indiana University to bring forth the best technology to help instructors conduct experimental research will provide researchers and analysts the critical data they need to further educational research while eliminating many of the barriers to entry."
Unicon Inc. is a 27-year veteran agency in educational technology consulting and digital services. Based in Arizona, Unicon is a leading learner-centric technology consulting firm in the design and implementation of standards-based learning technology and has a long-running track record of developing robust open-source tools.
One of our core aims in designing and developing Terracotta is to streamline the process of conducting experimental research in education settings responsibly. For example, we've designed features within Terracotta to enable informed consent, privacy, and confidentiality. But outside Terracotta, most academic researchers also need to receive approval from an Institutional Review Board (IRB) in order to conduct a study.
As a proof-of-concept, we composed an IRB protocol for a pilot study to be carried out with Terracotta, and have recently received approval for this protocol from the Indiana University IRB. As further progress toward streamlining experimental research, we offer this protocol as a template that might facilitate future research using Terracotta.
For this pilot, we're proposing a research study comparing different versions of homework assignments in an undergraduate statistics course. We opted to design the study so that we would obtain informed consent from students before they are assigned to experimental treatment, but this requirement may not be necessary for all research in Terracotta. Under the US federal regulations for the protection of human subjects (45 CFR §46.104(d)(1)), when research only "involves normal educational practices," such as research on "the effectiveness of or the comparison among instructional techniques," the research may be exempt from needing to obtain participants' consent to participate. Nevertheless, we feel that seeking permission from students is still a good idea.
The protocol documents linked above may not provide the appropriate materials required for all IRBs, and the documents themselves have had certain details of the local implementation removed for clarity. Nevertheless, these are provided openly and publicly, in hopes that they might help streamline future research using Terracotta.
One of Terracotta's core features is that it will support within-subject experiment designs. In a within-subject design, participants are assigned to multiple conditions at different times during the experiment. Instead of a between-subject design, where half the participants might get treatment "A" and the other half get "B," a within-subject design might have half the participants get treatments "A-then-B," and the other half get "B-then-A". There are a couple of big advantages to within-subject designs (also called crossover designs):
When each participant receives all treatments, there is less concern about the research causing inequities. For example, imagine that version B turns out to be better than version A for student learning. In a between-subject design, students who received version A might end up with worse outcomes because of their experimental assignment. But in a within-subject design, all students receive the same treatments, just staggered in time.
Because each participant in a within-subject design receives both treatments, each participant is almost like their own "mini-experiment": comparing a student's performance under A with the same student's performance under B removes the large differences between students from the comparison. As a result, the research study has more statistical power to infer differences between treatments A and B.
But while within-subject designs have some big advantages, they're also much more complicated, particularly when embedded in a classroom. For example, time and ordering become important: the researcher needs to decide when the students receiving treatment A should switch to treatment B, and vice versa. The experimenter also needs multiple outcome measures, at least one for every condition. Finally, the researcher needs multiple experimental treatments, at least as many treatments as experimental conditions, in order to have a balanced design.
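To illustrate why the "mini-experiment" structure buys statistical power, here is a small simulation sketch. It is not Terracotta code, and every number in it (class size, spread of student ability, noise, effect size) is an illustrative assumption. Each simulated student has a stable ability that varies widely across the class; the sketch compares the standard error of the estimated A-versus-B difference when half the class gets each treatment against the standard error when every student gets both treatments and is compared with themselves.

```python
import random
import statistics


def simulate(n_students=200, ability_sd=10.0, noise_sd=2.0, effect=1.0, seed=1):
    """Return (between_se, within_se) for one simulated class.

    Illustrative assumptions: score = ability + noise (+ effect under treatment B),
    where ability varies widely across students and noise is small and independent.
    """
    rng = random.Random(seed)
    abilities = [rng.gauss(50.0, ability_sd) for _ in range(n_students)]

    def score(ability, treatment):
        bonus = effect if treatment == "B" else 0.0
        return ability + bonus + rng.gauss(0.0, noise_sd)

    # Between-subject: half the class gets A, the other half gets B.
    # Abilities were drawn at random, so this split is effectively random.
    half = n_students // 2
    a_scores = [score(ab, "A") for ab in abilities[:half]]
    b_scores = [score(ab, "B") for ab in abilities[half:]]
    between_se = (statistics.variance(a_scores) / len(a_scores)
                  + statistics.variance(b_scores) / len(b_scores)) ** 0.5

    # Within-subject: every student gets both treatments; analyze each
    # student's own B-minus-A difference, which cancels their baseline ability.
    diffs = [score(ab, "B") - score(ab, "A") for ab in abilities]
    within_se = statistics.stdev(diffs) / len(diffs) ** 0.5

    return between_se, within_se


between_se, within_se = simulate()
print(f"between-subject SE: {between_se:.2f}")
print(f"within-subject SE:  {within_se:.2f}")
```

With these assumed settings, the within-subject standard error comes out several times smaller, because subtracting each student's A score from their own B score removes their baseline ability from the comparison. The sketch deliberately ignores order and carryover effects, which a real crossover analysis would need to address.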
We've been thinking a lot about how to make Terracotta so that the complexities of within-subject experiments are intuitive and easy to manage. We've decided to organize things around the idea of an "exposure set." In the illustration below, imagine that you're making an experiment with three conditions: Text, Video, and Video & Text. Because there are three conditions, there will be three exposure sets. Each exposure set has a different arrangement of students-to-conditions:
An exposure set is similar to the concept of a period in a crossover design, but whereas a period strictly contains only one treatment, we imagine that an exposure set could contain multiple treatments. For example, an AABB/BBAA design would have 4 periods (A,A,B,B or B,B,A,A), but in Terracotta, it would only have 2 exposure sets (AA,BB or BB,AA). Within any exposure set, the researcher can add multiple class assignments, and Terracotta will handle the hard work of ensuring that the right students see the right versions of each assignment (according to the student's treatment condition in that exposure set).
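One way to lay out the three-condition example is a simple cyclic (Latin-square) rotation, sketched below under the assumption that students are split into one group per condition. The function name and the integer group labels are hypothetical; this illustrates the exposure-set idea and is not Terracotta's actual assignment logic.

```python
def exposure_sets(conditions):
    """Cyclic Latin-square rotation (hypothetical helper).

    Returns one exposure set per condition; each exposure set maps
    group index -> condition. Within every exposure set each condition goes
    to exactly one group, and across exposure sets every group receives
    every condition exactly once.
    """
    k = len(conditions)
    return [
        {group: conditions[(group + e) % k] for group in range(k)}
        for e in range(k)
    ]


sets = exposure_sets(["Text", "Video", "Video & Text"])
for e, arrangement in enumerate(sets, start=1):
    print(f"Exposure set {e}: {arrangement}")
# Exposure set 1: {0: 'Text', 1: 'Video', 2: 'Video & Text'}
# Exposure set 2: {0: 'Video', 1: 'Video & Text', 2: 'Text'}
# Exposure set 3: {0: 'Video & Text', 1: 'Text', 2: 'Video'}
```

An AABB/BBAA design fits the same structure: `exposure_sets(["A", "B"])` yields two exposure sets, and the researcher would place two assignments in each. A cyclic square only balances which condition each group gets in each exposure set; with three or more conditions it does not balance which condition immediately precedes which (here, Text is always followed by Video), which is what a Williams design adds.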
Finally, when mapping outcomes for a within-subject design, the researcher only needs to specify one outcome score for each exposure set, for each student (although it will be possible to add more).
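Because each student contributes one outcome per exposure set, a per-condition summary can be computed by joining each outcome with the student's condition in that exposure set. The sketch below assumes both are stored as dictionaries keyed by `(student_id, exposure_set)`; that layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def condition_means(condition_of, outcomes):
    """Average outcome per condition in a within-subject design (hypothetical helper).

    condition_of: {(student_id, exposure_set): condition}
    outcomes:     {(student_id, exposure_set): score}, one score per student per exposure set
    """
    by_condition = defaultdict(list)
    for key, score in outcomes.items():
        by_condition[condition_of[key]].append(score)
    return {cond: mean(scores) for cond, scores in by_condition.items()}


# Two students in an A-then-B / B-then-A design with two exposure sets.
condition_of = {("s1", 1): "A", ("s1", 2): "B", ("s2", 1): "B", ("s2", 2): "A"}
outcomes = {("s1", 1): 70, ("s1", 2): 80, ("s2", 1): 90, ("s2", 2): 75}
print(condition_means(condition_of, outcomes))  # {'A': 72.5, 'B': 85}
```

A real analysis would usually also account for the repeated measures within each student; this only shows the bookkeeping of one outcome per student per exposure set.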
Saying that Terracotta is an "experiment-builder" should give you a general idea about what it does, and saying that it is an experiment builder in a learning management system (LMS) should indicate where it sits, but you might still wonder how it works. How do you give teachers and researchers the flexibility to build custom experiments within an environment that is not made for experimental manipulation?
It's a tough question. During our design process, we've noticed a recurring tension between the desire to completely model the entire experimental procedure within Terracotta (to control everything), and the opposing desire to make complete use of native structures and features in the LMS (to control as little as possible). Neither of these extreme approaches will be successful. If we control everything, Terracotta would be a behemoth with a constellation of parameters that duplicate LMS features (e.g., open dates, due dates, and grading policies; Terracotta would be like an LMS within an LMS), but if we rely too heavily on the LMS, then we'll lose flexibility and unnecessarily restrain research creativity.
The Assignment "Bucket"
Our solution is to think about an assignment within the LMS like a bucket. The assignment bucket is native to the LMS, and it has useful features within the course, like open dates, due dates, and grading policies, but on its own it's just an empty container.
Now imagine that you could fill a bucket with a quicksilver-like substance, where the contents of the bucket appear differently, depending on who's looking in the bucket. That's how Terracotta works.
Terracotta will fill these buckets, populating LMS assignments with learning activities and materials that change depending on who's looking at them, and automatically managing experimental variation within the buckets. When a student assigned to Condition A opens an assignment, it will reveal one version of a learning activity, but when another student (assigned to Condition B) clicks on the same assignment, they'll see a different learning activity within the bucket. In other words, different experimental treatments will exist within a single assignment bucket, and Terracotta will keep track of who sees what.
From the student's perspective, they'll be completing assignments as normal within the LMS, with no outward appearance that the assignment is different from any other assignment.
From the teacher's perspective, there'll be a little extra work involved. Teachers will create their assignments within Terracotta, specifying which treatments will be contained within the assignment, and how these treatments correspond to students' randomly assigned experimental conditions. Treatments (what a student sees and does for the assignment) can be uploaded or built directly within Terracotta. Once an assignment is created within Terracotta, the teacher then goes to the LMS to fill a bucket, populating an LMS assignment container with the assignment they just created in Terracotta.
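The bucket idea can be modeled as a small lookup, sketched here with hypothetical names (`Bucket`, `open_bucket`) and plain strings standing in for real learning activities. One LMS assignment holds every treatment version, keyed by condition, and the student's condition in that assignment's exposure set decides which version they see. Each lookup is also logged, as a stand-in for Terracotta keeping track of who sees what.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Bucket:
    """One LMS assignment container holding every treatment version (hypothetical model)."""
    lms_assignment_id: str
    exposure_set: int
    treatments: Dict[str, str]  # condition -> treatment content
    views: List[Tuple[str, str]] = field(default_factory=list)  # (student_id, condition) log


def open_bucket(bucket: Bucket, student_id: str,
                condition_of: Dict[Tuple[str, int], str]) -> str:
    """Return the treatment this student sees when opening the assignment, and log the view."""
    condition = condition_of[(student_id, bucket.exposure_set)]
    bucket.views.append((student_id, condition))
    return bucket.treatments[condition]


hw3 = Bucket("hw-3", exposure_set=1,
             treatments={"A": "Worked-example version", "B": "Practice-problem version"})
condition_of = {("s1", 1): "A", ("s2", 1): "B"}

print(open_bucket(hw3, "s1", condition_of))  # Worked-example version
print(open_bucket(hw3, "s2", condition_of))  # Practice-problem version
print(hw3.views)  # [('s1', 'A'), ('s2', 'B')]
```

Keeping the version lookup outside the LMS container is what lets the container keep its native dates and grading settings while its contents vary by student.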
This design allows Terracotta to tightly embed an experiment within a course, leveraging native LMS features for how assignments should normally appear. It also provides researchers the same flexibility to develop experimental treatments that teachers would normally have when populating assignments.
Science is dangerous; we have to keep it most carefully chained and muzzled.
Mustapha Mond, The Controller, in Aldous Huxley’s Brave New World (1932)
In Huxley’s Brave New World, in Orwell’s 1984, in Bradbury’s Fahrenheit 451, and even in Plato’s Allegory of the Cave, stable dystopias have two common features: a lack of freedom to explore, and an ignorance of one’s own imprisonment. People are kept within a restricted range of movement spanning minimal degrees of freedom, but at the same time they are complacent. By virtue of the minor flexibilities afforded within their boundaries, people are misled into believing they are free, even when they are restrained.
Fictional renderings of dystopian societies may seem wildly out there, particularly in the context of learning technologies, but they nevertheless provide a critical lens with which to view our present-day nonfictional world. For example, this lens has been used to consider whether we are currently living in a technological dystopia (Kolitz, 2020; Morozov, 2012; Winner, 1997). As with science fiction, a benchmark commonly applied in these considerations is whether technology enables freedom to openly explore and improve, or instead whether it (intentionally or unintentionally) stifles this freedom. In this regard, an area of emerging relevance is learning technology.
Amidst the current proliferation of online learning technologies, learning scientists are grappling with the challenges of translating science into practice at scale. The central problem goes something like this: any single learning tool or platform brims with assumptions, designed in a particular way, to enable a particular kind of interaction, for a particular kind of student, to benefit a particular kind of outcome, to be measured in a particular kind of way. Studies conducted within these platforms are also influenced and constrained by these particularities, and in turn, the idiosyncrasies of a learning platform limit research flexibility and generalizability. Even if it were possible for any researcher to experimentally manipulate any element within the ASSISTments platform, for example, the resulting inferences would still be specific to the platform, or at best, to intelligent tutoring systems. Estimates of the effectiveness of intelligent tutoring systems are further limited by the local relevance of the outcome measures under analysis and their implementations within the local learning environment (Kulik & Fletcher, 2016), variables that cannot be manipulated within the tools themselves. Moreover, if a brilliant inventor stumbled upon a novel instructional system that promised improvements over intelligent tutoring systems, the research community’s reliance on existing platforms as principal tools for conducting experiments would no longer enable progress; this reliance would be a barrier to progress.
No platform developer or learning technology startup sees itself as Controller Mond, intentionally stifling science and innovation. On the contrary, we have gravitated toward the education sector out of an authentic interest in helping people, improving equity, and facilitating social progress. It is, without a doubt, a good thing that these platforms are developing new degrees of freedom to allow researchers to engineer and explore innovations within them. But if the only space to innovate is within walled gardens, we will be ever-limited in our ability to understand our potential, and to effect change.
That is why we need Terracotta. We need to make a research platform compatible with education settings, rather than focusing solely on making educational platforms compatible with research. While platforms and support tools unquestionably should be empowered to conduct rigorous experimental research on what works, they can’t be the only game in town. Accordingly, the primary goal of Terracotta is to democratize experimental learning research, and thus to advance the generalizability and translatability of research findings.
Huxley, A. (1932). Brave New World. Chatto & Windus.
Kolitz, D. (2020, August 24). Are We Already Living in a Tech Dystopia? Gizmodo. https://gizmodo.com/are-we-already-living-in-a-tech-dystopia-1844824718
Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of Intelligent Tutoring Systems: A Meta-Analytic Review. Review of Educational Research, 86(1), 42–78.
Morozov, E. (2012). The Net Delusion: The Dark Side of Internet Freedom. PublicAffairs.
Winner, L. (1997). Technology Today: Utopia or Dystopia? Social Research, 64(3), 989–1017.
Terracotta is an experiment-builder (currently in development) integrated with a learning management system (LMS), allowing a teacher or researcher to randomly assign different versions of online learning activities to students, collect informed consent, and export deidentified study data. Terracotta is a portmanteau of Tool for Educational Research with RAndomized COnTrolled TriAls, and is designed to lower the technical and methodological barriers to conducting more rigorous and responsible education research.
Terracotta is currently under development at Indiana University, with support from Schmidt Futures, a philanthropic initiative co-founded by Eric and Wendy Schmidt, and in collaboration with the Learning Agency Lab.
Why do we need an experiment builder?
To reform and improve education in earnest, we need research that conclusively and precisely identifies the instructional strategies that are most effective at improving outcomes for different learners and in different settings — what works for whom in what contexts. To test whether an instructional practice exerts a causal influence on an outcome measure, the most straightforward and compelling research method is to conduct an experiment. But despite their benefits, experiments are also exceptionally difficult and expensive to carry out in education settings (Levin, 2005; Slavin, 2002; Sullivan, 2011).
Perhaps due to this difficulty, rigor in educational research has been trending downward. Roughly 79% of published STEM education interventions fail to provide strong evidence in support of their claims (Henderson et al., 2011), and the share of published education studies that are experimental has been in steady decline, from 47% in 1983, to 40% in 1994, to 26% in 2004, even while the prevalence of causal claims is increasing (Hsieh et al., 2005; Robinson et al., 2007). When education research studies do succeed in making strong causal claims, the replicability of these findings is disappointingly low (Makel & Plucker, 2014; Schneider, 2018).
To reverse these trends, we need to eliminate the barriers that prevent teachers and researchers from conducting experimental research on instructional practices.
Can an experiment-builder improve the ethics of classroom research?
Every teacher, in every classroom, is actively experimenting on their students. Every time a teacher develops a new lesson, makes a planning decision, or tries a new way to motivate a disengaged student, the teacher is, in a sense, carrying out an experiment. These “experiments” are viewed as positive features of a teacher’s professional development, and we generally assume that these new experimental “treatments” represent innovations over default instruction. What’s obviously missing in these examples is random assignment of students to treatment conditions, and therefore these are also missing a compelling way of knowing whether the new strategy worked.
In our view, it is even more ethical to conduct experiments when these experiments are done carefully and rigorously, using random assignment to different treatment conditions (Motz et al., 2018). This is because randomized experiments enable a teacher or researcher to infer, without bias, whether the new strategy causes improvements. In the absence of random assignment to treatment conditions, a teacher is stuck making comparisons based on subjective reflection, or based on confounded comparisons with other cohorts; this can cause ineffective strategies to appear to have worked, or effective strategies to be missed.
Even so, experiments conducted recklessly can clearly be unethical, so one of Terracotta’s central goals is to streamline and normalize the features of responsible, ethical experimentation. For example, students should be assigned to experimental variants of assignments (rather than to deprived no-treatment controls); whenever possible, research should favor within-subject crossover designs where all students receive all treatments (eliminating any bias due to condition assignment); students should be empowered to provide informed consent; and student data should be deidentified prior to export and analysis. Each of these standards is easily achievable using standard web technology, but no tool currently exists that meets these standards. That’s where Terracotta comes in.
What will Terracotta do?
Based on years of experience working through the challenges of implementing ethical and rigorous experiments in authentic online learning environments, Terracotta automates the methodological features that are fundamental to responsible experimental research (but that are particularly difficult to implement without dedicated software), seamlessly integrating experimental treatments within the existing ecosystem of online learning activities. For example:
Within-Subject Research Designs. A particularly effective way to ensure balanced, equitable access to experimental treatments is to have all students receive all treatment conditions, except different groups receive these treatments in different orders. This kind of within-subject crossover design will be implementable with a single click.
Random Assignment that Automatically Tracks Enrollment Changes. If students add the class late, Terracotta will repeatedly monitor for new enrollments and automatically assign each new student to a group, maintaining balance and ensuring access to experimental manipulations.
Informed Consent that Conceals Responses from the Instructor. Informed consent is a cornerstone in the ethical conduct of research. A student's consent decision should be hidden from the teacher until grades are finalized, because many review boards are concerned that a teacher is in a position to coerce students to participate. Terracotta will collect consent responses in a way that is temporarily hidden from the instructor.
Mapping Outcomes to Assignments. An experiment examines how different treatments affect relevant outcomes, such as learning outcomes (e.g., exam scores) or behavioral outcomes (e.g., attendance). Terracotta will include a feature where instructors can import, identify in the gradebook, or manually enter outcome data into Terracotta following each treatment event (for within-subject crossover designs) or at the end of an experiment (for between-subject designs).
Export of De-identified Data from Consenting Participants. At the end of the term, Terracotta will allow an export of all study data, with student identifiers replaced with random codes, and with non-consenting participants removed. This export set will include: (1) condition assignments; (2) scores on manipulated learning activities; (3) granular clickstream data for interactions with Terracotta assignments; and (4) outcomes data as entered by the instructor. By joining these data, de-identifying them, and scrubbing non-consenting participants, Terracotta can prepare a data export that is shareable with research collaborators (includes no personally identifiable information) and that meets ethical requirements (only includes data from consenting students).
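Three of the features above (balanced assignment of late enrollments, consent hidden from the instructor, and a de-identified export restricted to consenting students) can be sketched together in a few functions. This is an illustrative sketch under stated assumptions, not Terracotta's implementation: the function and class names are hypothetical, new students are assumed to go to whichever group is currently smallest (ties broken at random), and each data row is assumed to carry no personal information other than its `student_id` field.

```python
import random
import secrets


def assign_new_enrollments(roster, assignments, groups, rng):
    """Hypothetical: give every newly enrolled student a currently smallest group.

    roster:      current student IDs (e.g., re-read from the LMS on each sync)
    assignments: dict student_id -> group, updated in place
    Only students still on the roster count toward group sizes, so drops are handled too.
    """
    enrolled = set(roster)
    counts = {g: 0 for g in groups}
    for student, g in assignments.items():
        if student in enrolled:
            counts[g] += 1
    for student in roster:
        if student not in assignments:
            smallest = min(counts.values())
            choice = rng.choice([g for g in groups if counts[g] == smallest])  # random tie-break
            assignments[student] = choice
            counts[choice] += 1


class ConsentLedger:
    """Hypothetical store: consent responses stay hidden from the instructor until grades are final."""

    def __init__(self):
        self._responses = {}  # student_id -> True (consented) / False (declined)
        self.grades_finalized = False

    def record(self, student_id, consented):
        self._responses[student_id] = bool(consented)

    def instructor_view(self):
        # Reveals nothing (None) while grades are still open.
        return dict(self._responses) if self.grades_finalized else None

    def consenting(self):
        return {s for s, ok in self._responses.items() if ok}


def deidentified_export(rows, consenting):
    """Drop non-consenting students and replace each student ID with a random code.

    rows: dicts with a 'student_id' key plus study fields (condition, scores, outcomes, ...).
    A student keeps the same code across rows so the data can still be joined;
    the code-to-ID mapping is discarded rather than exported.
    """
    codes = {}
    export = []
    for row in rows:
        sid = row["student_id"]
        if sid not in consenting:
            continue  # scrub non-consenting participants entirely
        if sid not in codes:
            code = secrets.token_hex(8)
            while code in codes.values():  # guard against a (very unlikely) collision
                code = secrets.token_hex(8)
            codes[sid] = code
        clean = {k: v for k, v in row.items() if k != "student_id"}
        clean["participant"] = codes[sid]
        export.append(clean)
    return export


rng = random.Random(0)
assignments = {}
assign_new_enrollments(["s1", "s2", "s3"], assignments, ["A", "B"], rng)
assign_new_enrollments(["s1", "s2", "s3", "s4"], assignments, ["A", "B"], rng)  # s4 enrolls late

ledger = ConsentLedger()
for sid, ok in [("s1", True), ("s2", False), ("s3", True), ("s4", True)]:
    ledger.record(sid, ok)
print(ledger.instructor_view())  # None

rows = [{"student_id": s, "condition": g, "score": 80} for s, g in assignments.items()]
export = deidentified_export(rows, ledger.consenting())
```

In a production system the concealment would be enforced by access control on the server rather than by a Python attribute; the sketch only models the policy.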
What’s the overarching goal of Terracotta?
The goal is to enable more rigorous, more responsible experimental education research. But we don’t simply need more experiments: if this were the case, we could conveniently conduct behind-the-scenes experiments within existing learning technologies. Instead, we need more experiments that are carried out in classes where it might have otherwise been inconvenient to conduct an experiment using existing technologies, to better understand the context-dependency of learning theory (Motz & Carvalho, 2019). We need grand-scale experiments that are systematically distributed across scores of classrooms to test how different experimental implementations moderate the effects of instructional practices (Fyfe et al., 2019). We need teachers and researchers to test the effects of recommended instructional practices with replication studies, so that we can improve the credibility of education research and identify the boundary conditions of current theory (Makel et al., 2019). We need teachers to be empowered to test their latent knowledge of what works in their classrooms, through experimental research that is more inclusive of practitioners (Schneider & Garg, 2020). To achieve these ends, we don’t simply need to make existing educational tools compatible with experimental research, we need to make an experimental research tool that is compatible with education settings.
Fyfe, E., de Leeuw, J., Carvalho, P., Goldstone, R., Sherman, J., & Motz, B. (2019). ManyClasses 1: Assessing the generalizable effect of immediate versus delayed feedback across many college classes. PsyArXiv. https://psyarxiv.com/4mvyh/
Henderson, C., Beach, A., & Finkelstein, N. (2011). Facilitating change in undergraduate STEM instructional practices: An analytic review of the literature. Journal of Research in Science Teaching, 48, 952–984. doi:10.1002/tea.20439
Hsieh, P., Acee, T., Chung, W.-H., Hsieh, Y.-P., Kim, H., Thomas, G.D., You, J., Levin, J.R., & Robinson, D.H. (2005). Is educational intervention research on the decline? Journal of Educational Psychology, 97, 523–529. doi:10.1037/0022-0663.97.4.523
Levin, J.R. (2005). Randomized classroom trials on trial. In G.D. Phye, D.H. Robinson, & J.R. Levin (Eds.), Empirical Methods for Evaluating Educational Interventions (pp. 3–27). Burlington: Academic Press. doi:10.1016/B978-012554257-9/50002-4
Makel, M.C., & Plucker, J.A. (2014). Facts are more important than novelty: Replication in the education sciences. Educational Researcher, 43(6), 304–316. doi:10.3102/0013189X14545513
Motz, B.A., Carvalho, P.F., de Leeuw, J.R., & Goldstone, R.L. (2018). Embedding experiments: Staking causal inference in authentic educational contexts. Journal of Learning Analytics, 5(2), 47–59. doi:10.18608/jla.2018.52.4
Motz, B.A., & Carvalho, P.F. (2019). Not whether, but where: Scaling-up how we think about effects and relationships in natural educational contexts. In Companion Proceedings of the 9th International Conference on Learning Analytics & Knowledge (LAK’19). doi:10.13140/RG.2.2.30825.34407
Robinson, D.H., Levin, J.R., Thomas, G.D., Pituch, K.A., & Vaughn, S. (2007). The incidence of “causal” statements in teaching-and-learning research journals. American Educational Research Journal, 44, 400–413. doi:10.3102/0002831207302174