
University of Michigan School of Information


411: Transforming education through AI with Dora Demszky

Information Changes Everything: The Podcast. Episode 411: Transforming education through AI. Dora Demszky. Assistant professor, Stanford University. umsi.info/podcast. News and research from the world of information science.


Information Changes Everything

News and research from the world of information science
Presented by the University of Michigan School of Information (UMSI)

Episode

411

Released

July 30, 2024

Recorded

2023

Guests

Dora Demszky, assistant professor in education data science at the Stanford Graduate School of Education and in Computer Science (by courtesy)

Summary 

In this episode of Information Changes Everything, Dora Demszky, an assistant professor at Stanford University, discusses the unprecedented potential of large language models (LLMs) in scaling various aspects of education. Demszky synthesizes findings from three papers that evaluate the ability of LLMs to provide feedback to educators on their discourse. 

Resources and links mentioned

Reach out to us at [email protected]

Timestamps

Intro (0:00)

Information news from UMSI (1:16) 

Hear excerpts from Dora Demszky’s 2023 seminar, “Large Language Models for Teacher Feedback,” at UMSI (3:04) 

Next time: Essential lessons for entrepreneurs with Jason Blessing (24:56)

Outro (25:35)

Subscribe

Subscribe to “Information Changes Everything” on your favorite podcast platform for more intriguing discussions and expert insights.

About us

The “Information Changes Everything” podcast is a service of the University of Michigan School of Information, leaders and best in research and education for applied data science, information analysis, user experience, data analytics, digital curation, libraries, health informatics and the full field of information science. Visit us at si.umich.edu.

Questions or comments

If you have questions, comments, or topics you'd like us to cover, please reach out to us at [email protected].

Dora Demszky (00:00):

We found that 82% of the time, ChatGPT just repeated what the teacher already does. So it wasn't really insightful or novel in that way. That just kind of showed us that ChatGPT isn't very good at channeling expert, insightful feedback for teachers.

Kate Atkins, host (00:18):

That was Dora Demszky, assistant professor in education data science and computer science at Stanford University, during a 2023 talk sponsored by UMSI. And this is Information Changes Everything, where we put the spotlight on news and research from the world of information science. You're going to hear from experts, students, researchers, and other people making a real difference. As always, we're presented by the University of Michigan School of Information, UMSI for short. Learn more about us at si.umich.edu. I'm your host, Kate Atkins. Today we'll hear more from Demszky during a data science and computational social science seminar series sponsored by UMSI. She explores how large language models have unprecedented potential to scale many aspects of education, including the ability to facilitate high-quality instruction.

Before we jump in, a few other people and projects that you should know about. First, AI is changing how we care for, monitor, and spend time with our pets. New technologies include AI-powered pet cameras, smart collars with disease detection, and translators that can allegedly turn a cat's meow into human language. The Washington Post recently spoke with UMSI professor Lionel Robert, an expert on robots and AI, about his thoughts on the relationship between humans, AI, and pet ownership.

(01:52):

Next: RoosRoast is an Ann Arbor coffee shop with deep roots in the community, but during the pandemic, online sales increased dramatically and their old website just couldn't handle it. UMSI students to the rescue: they redesigned the RoosRoast website to better capture its unique, quirky vibe and help boost online sales. The students incorporated bold art and playful elements and enhanced the customer experience, while still preserving what makes the coffee shop special.

Finally, Women Who Code started as a community of women in tech in San Francisco back in 2011. City by city, the organization grew to a membership spanning over 145 countries. Along the way, it gave out millions in scholarships, held conferences, and supported and coached women into leadership roles. So its announcement in April that it was shutting down due to a lack of funding came as a shock. Its farewell messages expressed sadness and a hope that its work will carry on.

For more on all of these stories, check out si.umich.edu or click the link in our show notes. Now back to Dora Demszky.

Dora Demszky (03:06):

So as many of you, I guess, know, ChatGPT has shaken up many things in the world, but I think the main area where we hear a lot of discussion about how it is turning things upside down is education, for many reasons. One, it has enabled instruction that was previously not really possible at scale, using automated tutoring; it took that to the next level. Whether you agree or disagree with the application of AI in education, it is definitely happening. OpenAI, for example, has recently created an entire page about how you might leverage its technologies to support teachers. And a lot of research and surveys have shown that teachers are already using it for many tasks, the prime task being lesson planning: generating exercises or brainstorming activities to do in class.

(04:05):

Just three months after ChatGPT was released, a survey by the Walton Family Foundation showed that 40% of teachers were already using it on a weekly basis, and Black and Hispanic teachers were using it much more, about 60% of them. So a lot of teachers are using it. The question is whether the technology is there yet, and especially whether it's there to support underserved populations. Many teachers are also really worried. One area where you've probably heard a lot of discussion is plagiarism, and teachers are rightfully worried about how it is going to make things worse. Cheating is not new: a research team at Stanford has for many years been collecting data on cheating practices across the world and across different levels of education, like K-12 and higher ed, and they found the rate is about 70%, pretty consistently everywhere, and that was even prior to ChatGPT.

(05:09):

So we know this has been an issue; it's just become harder to detect. Other worries that teachers voice are about it making the rich richer and exacerbating inequities in education. Tech fatigue has also been an issue, and ChatGPT and the pressure to catch up are just adding to that. So many people are asking whether the arrival of ChatGPT and LLMs has finally brought us to the point of revolution that many people have been yearning for. These ideas are, again, not new. Education research has been pointing out that we shouldn't really be assessing students by learning outcomes, exam scores, or summative assessments, which are not very helpful, but should focus on evaluating the learning process. And there is another view, which I don't think every researcher or every person would agree with, but it is still a dominant view: that maybe we should shift the focus of education from being mainly about knowledge transfer to being about empowering students, giving them agency, and building positive relationships, because especially with ChatGPT, people can now find knowledge on their own.

(06:31):

So yeah, it's about focusing more on empowerment and giving students agency, which is becoming more important now that knowledge can be accessed so much more easily than before. So it makes sense that instead we want to give them the skills, the critical thinking, and the agency around choosing what information to trust or not, and also the relationship building, which we know is critical to learning. But why are these changes really difficult to make? What has been the main source of resistance in education? Well, one is that it's just much harder to measure gains in, for example, the learning process or empowerment. There might be measures that people have developed, especially in psychology, but different camps in my world would disagree about whether those are really reliable, and policymakers, who are really, really focused on standardized test scores, wouldn't think those psychological measures are that useful. Another thing is that it's just really impossible for one teacher to deeply understand and adapt their instruction to each individual student. Let's say they're teaching 150 students; measuring their learning gains and learning process in situ in the classroom is just really, really challenging.

(07:58):

And then thirdly, even if teachers had the time, and even if we could measure those gains, it is not trivial to train teachers on what they should say to, for example, empower students or build positive relationships. There is surprisingly little research on good practices in these spaces, primarily because we just lack the data. You would have to have very rigorous data that can help you disentangle the causal relationships between what the teacher says and how the student feels, what they know, or what they learned. It is not trivial. So my question, or proposal, is whether we could try leveraging AI, or LLMs more specifically, to address each of these barriers. Specifically, I have this vision: what if we could build some type of teacher aide that could support teachers in developing lessons, measuring students' learning progress, and giving them feedback on the types of practices they use in the classroom?

(09:16):

In this talk, I'm going to focus especially on this third area: knowing what to say to students to empower them. Conventionally, the way teachers are trained, or given professional learning, is that an instructional coach sits in on their classroom and then gives them feedback. This is actually, alongside tutoring, the most effective evidence-backed practice in education. We know that these two things, tutoring and teacher coaching, drive student learning by a large amount. So it's a great practice. However, it doesn't really scale. It requires expertise, it requires consistency. Coaches are at the moment not even incentivized to scale: they get paid the same amount whether they have coached two teachers or six, and obviously they have limited bandwidth, so they're not going to do six. And oftentimes it's not really data-driven or adaptive to that specific teacher's instruction.

(10:24):

They might use a specific rubric that is hard to tailor. On the other hand, AI-powered coaching could create scalable, low-cost, and consistent solutions. It can also be personalized, data-driven, adaptive, and more. An additional benefit is that it could be private to the teacher: some teachers are reluctant to have someone else sit in and judge their lessons, and AI can remove that human judgment aspect. At the same time, it also lacks the human connection element. Just as students are motivated by human teachers, which is something AI is not going to replace, teachers are also motivated by human coaches. So ideally, human coaches and AI could work in tandem and complement each other to give teachers feedback. A lot of my work in the past few years has been focused on providing teachers with automated feedback powered by NLP.

(11:29):

These are also LLMs, but they're not generative models; I didn't use them in a generative way, but in a more measurement-focused way. I've now run three RCTs focused on giving teachers feedback on specific instructional practices, like their uptake of student ideas or their use of focusing questions that probe students' thinking. They have shown that this type of feedback, which is completely just descriptive statistics, non-judgmental and non-evaluative, can actually improve both instruction and student outcomes. As a result of this type of automated feedback, we found that teachers take up student ideas about 16% more across these different contexts. They ask more questions, and they also enjoy the teaching experience more. And students talk more as a result of their teachers getting the feedback. These are intent-to-treat analyses, meaning that we include people who didn't even check the feedback but were just offered it.
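The intent-to-treat idea she mentions can be sketched as a simple difference in means by assignment rather than by uptake. Everything below, the data, the outcome scale, and the `itt_effect` helper, is a hypothetical illustration, not the study's actual analysis:

```python
# Hypothetical sketch of an intent-to-treat (ITT) comparison:
# every teacher *offered* the feedback counts as treated,
# whether or not they actually opened it.
from statistics import mean

def itt_effect(records):
    """records: list of (assigned_to_feedback: bool, outcome: float)."""
    treated = [y for assigned, y in records if assigned]
    control = [y for assigned, y in records if not assigned]
    return mean(treated) - mean(control)

# Toy data: (offered feedback?, uptake-of-student-ideas score).
# Some "True" teachers may never have checked the feedback; ITT keeps them in.
data = [(True, 0.58), (True, 0.52), (True, 0.47),
        (False, 0.45), (False, 0.50), (False, 0.43)]
print(itt_effect(data))  # positive means the offered group scored higher
```

Because assignment (not uptake) defines the groups, the estimate stays unbiased under randomization even when some teachers ignore the feedback; it answers "what does offering the feedback do?" rather than "what does reading it do?".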

(12:31):

And then students also gave higher course ratings and felt more optimistic about their academic future. So that's really promising. Now we wanted to understand how we could use these newer technologies, like LLMs, to give feedback to teachers and go beyond the descriptive statistics, because that's what teachers were often saying: it's cool to see these statistics, and maybe it's helpful, especially for novice teachers, to reflect on their instruction or to raise awareness that taking up certain ideas is important, but it's not enough. They want actionable suggestions. They want something that goes beyond "here's the number of times you took up student ideas, here are the examples." LLMs lend themselves to trying that out. However, my student Rose found that ChatGPT isn't a very good teacher coach when you just use it off the shelf, even when you prompt it with instructions derived from the educational observation instruments that people use to score practice.

(13:37):

Now, this was actually pre-GPT-4, so this was 3.5, and we totally acknowledge that a lot might have improved, although, based on our preliminary explorations, we don't see that it has qualitatively improved; it has become slightly better from 3.5 to 4 in this area. What we did was ask ChatGPT-3.5 to provide suggestions for helping elementary teachers improve. We extracted certain definitions of mathematical reasoning and prompted the model to suggest to the teacher how to elicit more mathematical reasoning. We provided a segment from an existing elementary math classroom transcript dataset that an expert observer had rated as low on eliciting student mathematical explanations, so we knew there was a lot of room for improvement.

(14:45):

Then we took the insights generated by the model and showed them to two expert math teachers, both with decades of experience (one is an instructional coach), to evaluate how novel, relevant, useful, and faithful the model's suggestions were. We found that 82% of the time, ChatGPT just repeated what the teacher already does. So it wasn't really insightful or novel in that way. Most of the time, though, it wasn't generating harmful or bad advice; it just wasn't insightful advice. That showed us that ChatGPT isn't very good at channeling expert, insightful feedback for teachers. And that makes sense, because why would it be? You can't find that as training data on the internet; instructional coaches and teachers usually talk in person, not online.

(15:47):

There's no data that the model could have leveraged to learn those types of insights. So we're trying different approaches to adapt these models to give better, pedagogically grounded feedback to teachers. One approach is measurement. My worry with generative feedback is, one, that it is slightly trickier to evaluate; and two, sometimes I do think the best approach might still be to just keep the numbers and statistics and have a human coach provide the suggestions. So I'm not sure how much we should be relying on LLMs for generating suggestions like a coach; maybe down the line, when the models are much better, that might work. So one approach we've been taking for the past few years is fine-tuning them for measurement, mostly following the traditional paradigm of annotating data with expert teachers, fine-tuning models, and then running inference to understand correlations with outcomes, et cetera.
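The annotate, fine-tune, and infer paradigm she outlines can be shown in miniature. A real system would fine-tune an LLM on expert annotations; this toy word-overlap classifier, with invented utterances and labels, only illustrates the shape of the pipeline: labeled data in, an automated measure out at inference time.

```python
# Stand-in sketch of the annotate -> train -> infer measurement paradigm.
# (Not the actual method: a real pipeline fine-tunes an LLM, not this.)

def tokenize(text):
    """Lowercase and keep only letters/spaces, then split into a word set."""
    cleaned = "".join(c for c in text.lower() if c.isalpha() or c.isspace())
    return set(cleaned.split())

# 1) Expert-annotated utterances (labels invented for illustration).
annotated = [
    ("Please sit down and stop talking.", "classroom_management"),
    ("Eyes on me, everyone, right now.", "classroom_management"),
    ("Why do you think that fraction is bigger?", "instruction"),
    ("Can you explain how you got that answer?", "instruction"),
]

# 2) "Train": collect a vocabulary for each label.
vocab = {}
for text, label in annotated:
    vocab.setdefault(label, set()).update(tokenize(text))

# 3) "Inference": label new utterances by word overlap with each vocabulary.
def classify(text):
    words = tokenize(text)
    return max(vocab, key=lambda label: len(words & vocab[label]))

print(classify("Quiet down now, please."))
```

Once every utterance in a transcript carries a predicted label, the rate of each practice per class period becomes a number you can correlate with observation scores, surveys, or test outcomes, which is the measurement use she describes.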

(16:56):

If you're not coming from an educational background, you may wonder why classroom management is so important. It's actually a key predictor of many things. One, it's a predictor of student learning: if classroom management is really bad, learning outcomes are really poor. Second, we know from a lot of research that teachers really struggle with classroom management, because it is the hardest thing to take from their abstract training into the real classroom, where there are lots of students and things can get crazy. It's something teachers often learn over time, but many of them never quite get there. Another very important reason is that there's a lot of research on how, in the US at least, and probably in other countries, there's still a predominant use of exclusionary disciplining practices, and there are racial disparities around discipline.

(18:01):

Those mainly pertain to students literally being suspended, kicked out of the classroom, et cetera, which, I don't have to tell you, affects their learning and their belonging in the classroom. Classroom management is a precursor to those things, so improving those practices can prevent these types of escalations. Those are some of the reasons we wanted to focus on classroom management. What we found was that the dataset we used came with expert observation labels for positive classroom climate, student engagement, productivity, and whether the classroom work is connected to mathematics. These were not things we labeled; they came with the data. And we found that the rate and amount of classroom management strongly predict observation scores on these dimensions. The data also came with survey responses from teachers and students that signal their perceptions of the classroom climate and behavior.

(19:17):

For example, whether the teacher thinks they're commending students often, whether they're losing time, or whether they feel disrespected. And the students also reported whether they think behavior in the class is good or a problem. We found, and these are kind of external validations of our measure, that the rate of classroom management and its punitiveness correlate in the way you would expect with these external measures. Finally, we also found, as previous research had suggested but not shown in a quantitative way, that the more teachers use classroom management, the worse students' exam scores get. And this is controlling for their prior test scores, so it's kind of a value-added measure. Interestingly, we didn't find a relationship between punitiveness and exam scores. We also found significant racial disparities. This data comes with aggregate classroom demographic information and teacher demographic information.
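The value-added style check she describes, relating exam scores to the rate of classroom management while controlling for prior test scores, can be sketched with ordinary least squares. All data below is simulated for illustration; the variable names and effect sizes are made up, not the study's:

```python
# Hypothetical sketch of a value-added regression: exam scores on
# classroom-management rate, controlling for prior test scores.
import numpy as np

rng = np.random.default_rng(0)
n = 200
prior = rng.normal(0, 1, n)   # students' prior test scores (control)
mgmt = rng.normal(0, 1, n)    # rate of classroom management (predictor)
# Toy outcome: prior scores help, classroom management hurts, plus noise.
exam = 0.8 * prior - 0.3 * mgmt + rng.normal(0, 0.5, n)

# Design matrix: intercept, control, predictor.
X = np.column_stack([np.ones(n), prior, mgmt])
beta, *_ = np.linalg.lstsq(X, exam, rcond=None)

# beta[2] is the association between classroom management and exam
# scores *net of* prior scores, the "value-added" style coefficient.
print(beta[2])
```

Controlling for prior scores matters because classrooms differ in incoming achievement; without the control, the coefficient on classroom management would partly reflect which students were in the room rather than what happened in it.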

(20:33):

We found, and I don't want to go through all the details, that teachers' ethnic and racial identity did not correlate significantly with their use of classroom management or its punitiveness, but the students' demographics did. What we found is that the more African American students there are in the classroom, the more classroom management the teacher uses and the more punitive it gets. In other words, African American students in this dataset experience both more classroom management and more punitive classroom management, while Hispanic students don't experience more classroom management but do experience more punitive classroom management. This is pretty interesting. One thing we did find on the teacher side was that male teachers actually use less classroom management than female teachers.

(21:35):

This is a pretty striking finding that may not surprise many people, but it quantifies and corroborates previous, more qualitative work on classroom management and racial disparities in that space, and it corroborates quantitative work on racial disparities in disciplining practices. We also used these measures to look at how the rate and punitiveness of classroom management increase during the class period, to better understand where you could intervene to mitigate the use of these practices. We found that both the rate of classroom management and its punitiveness go up over time during the class, and that there's a consistent racial disparity: classrooms in the top quartile of African American students start out with more classroom management and more punitive classroom management, end with more, and experience more consistently throughout. It suggests that one area of intervention is the beginning of class.

(22:50):

If you are able to start in a good place and manage your classroom better, maybe you can actually prevent these types of escalations over time. So to sum up, this case study was a way to illustrate how LLM-empowered automated measures can help scalably quantify relationships between pedagogical moves and external factors. They can help identify potential areas of intervention, both for teachers and for coaches and other stakeholders in education. And finally, they can be used down the line to provide adaptive feedback on these pedagogical moves. My current and future work builds on what I just showed you: improving this feedback, working closely with educators, and integrating the feedback into existing professional learning frameworks. So not just saying, here's one more thing you should be doing or focusing on, but really integrating it, both physically into an existing platform teachers are using, and with the professional learning frameworks or curricula they're using, the practices they're told in other contexts to use more. And finally, facilitating equitable and safe access. One of my main worries with LLMs is that, yes, sure, we could say that they're going to finally give access to educational opportunities to people who didn't have it, but I see little work actually being done on that. So I want to make it a priority to start out working with populations that otherwise wouldn't have access to these types of tools.

Kate Atkins, host (24:44):

You can watch the full talk by clicking the link in our show notes. To learn more about upcoming events like this, visit us at umsi.info/events. And tune in next time to hear from Jason Blessing during a 2015 talk at UMSI focused on entrepreneurial success. Blessing, CEO of Plex Systems, shares his tips for success and what he learned building a startup from the ground up.

Jason Blessing (25:10):

Unless you are, again, Mark Zuckerberg and found the next Facebook, your career is going to be an ultramarathon. It's not a sprint, it's not a marathon. It is an ultramarathon. And in my own experience as a CEO now, and just knowing a lot of CEOs and a lot of software executives, the most successful ones have a balanced life.

Kate Atkins, host (25:32):

That's in our next episode. Before we go, remember that UMSI offers a fully online master's degree in applied data science. Join the leaders and the best by earning a University of Michigan master's from anywhere in the world. To find out more, click the link in our show notes. The University of Michigan School of Information creates and shares knowledge so that people like you will use information with technology to build a better world. Don't forget to subscribe to Information Changes Everything on your favorite podcast platform, and if you've got questions, comments, or episode ideas, send us an email at [email protected]. From all of us at the University of Michigan School of Information, thanks for listening.
