Digging deep: A look at data mining on campus

Whether it's called "big data" or "data mining," the idea is basically the same: to collect as much information as possible and then find patterns, probabilities or problems. In the Information Age, this is easier than ever before because more and more people are carrying around GPS-enabled devices, logging onto the Internet and generally doing things that are easily trackable. The data is used to tailor ads to be more effective based on your Google searches, suggest products on Amazon or movies on Netflix you might be interested in because of your purchasing/viewing history, and even rearrange items in the grocery store based on who buys certain products together.


Now, data mining has moved to higher education, unearthing insights that schools are using to streamline everything from how students get across campus to what classes they should take next. But how is this deep data digging going to change the educational landscape?

Tapping the potential

More than "big data," the phrase "data mining" seems apropos for what's going on. Before electronics permeated every little part of our society, market research had to be done at arm's length using surveys and focus groups. Think of this data as gold painstakingly sluiced from a river -- it's easy enough to collect, but you're just sure there are mountains more around you. Big data is the mother lode. There's so much data that those collecting it hardly know what to do with it all. What they do know, though, is that it's worth a lot. In marketing, this insight means making sales. In higher education, it could mean greater efficiency and efficacy.

So, just how is all of this data collected on campuses, both brick-and-mortar and virtual?

Online: Where you clicked, what you clicked on, how long you spent on the page, where you went afterwards, where you came from, what you hovered over, what you searched for, whether or not that search was successful (did you search again?), what other pages you visited, what kind of device/computer you used to access the site, where you accessed it from, how often you come back -- all of these things and much more can be tracked and recorded using cookies and code built into the pages themselves.

On-campus: Where students swipe ID cards, what classes they take, what grades they get, whether or not they dropped the class and when that happened, how they scored on various tests (e.g., SAT, ACT and proficiency tests), what they're majoring/minoring in, attendance rates -- colleges have access to a lot of information about students.

Data from classrooms, both physical and online, is also collected and put to use, which is one of the huge benefits of massive open online courses (MOOCs). As Daphne Koller, a co-founder of MOOC giant Coursera, says in her TED Talk "What we're learning from online education," "If two students in a class of one hundred gave the same wrong answer, you would never notice -- but if two thousand students give the same wrong answer, it's kind of hard to miss." This kind of massive data mining allows teachers to see where there might be a common misconception that can be corrected easily with a tailored error message or a clarifying lesson.
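
To make the idea concrete, here is a minimal sketch in Python of how a course platform might spot a shared wrong answer. It is an illustration only, not Coursera's actual method: the function name, the 10 percent threshold and the sample answers are all assumptions made up for this example.

```python
from collections import Counter


def common_wrong_answers(submissions, correct_answer, min_share=0.1):
    """Return (answer, count) pairs for wrong answers given by at least
    min_share of all students, most frequent first.

    The 10 percent default threshold is an arbitrary assumption.
    """
    total = len(submissions)
    if total == 0:
        return []
    # Tally every answer that does not match the correct one.
    wrong_counts = Counter(a for a in submissions if a != correct_answer)
    return [
        (answer, count)
        for answer, count in wrong_counts.most_common()
        if count / total >= min_share
    ]


# Hypothetical quiz responses from 100 students; the correct answer is "12".
answers = ["12"] * 70 + ["21"] * 20 + ["7"] * 6 + ["3"] * 4
print(common_wrong_answers(answers, correct_answer="12"))
# [('21', 20)]
```

In this made-up class, one in five students gave the same wrong answer, which is the kind of signal that might prompt a tailored error message or a clarifying lesson.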

Putting all of that big data to work

Identifying less-than-clear points in lessons is an excellent use for data, but that barely scratches the surface of how colleges are putting this newfound information to work.

  • Brown University has been collecting undergraduate course enrollment data, and what they found surprised them. Today's undergraduates take a huge variety of classes across many different departments and subjects. By itself, this isn't earth-shattering news. Undergrads are notorious for having trouble making up their minds. What was surprising about the data is that it showed that this doesn't stop after two years, when Brown students are forced to pick a concentration. It lasts for students' entire undergraduate careers. Brown is using this data to rethink how the campus is set up physically, hoping to eliminate unnecessary treks from one side of campus to the other and back, and to reassess majors that have a narrow focus on one discipline.
  • Arizona State University is huge. At 72,000+ students, it's the largest public university in the nation, which makes it a great place to collect and utilize a lot of data. Data-driven experiments include what they called the eAdvisor, which they based on a University of Florida system. The eAdvisor kept a record of what classes a student was taking and would send out an email alerting students that they were "off track" and would need to either get back to working on their chosen major or switch to a new field of study. This was meant to help students figure out whether or not they were in the right major early on, rather than three years in when they finally got around to statistics and found they couldn't hack it. The system would also recommend classes based on academic records, predicting how well a student might do in a given course before the student set foot in the classroom. During the 2008-09 academic year, eAdvisor's first year, retention rates went up from 77 percent to 84 percent -- coincidence?
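
An "off track" check like the one described for eAdvisor could, in principle, be as simple as comparing a student's record against a term-by-term plan for the major. The Python sketch below is a guess at the general shape of such a check, not a description of Arizona State's system: the function, the course codes and the student records are all hypothetical.

```python
def off_track_students(required_by_term, completed, current_term):
    """Return {student: sorted list of missing courses} for every student
    who has not finished the courses the plan expects before current_term.
    """
    # Collect the courses the plan expects to be done before this term.
    expected = set()
    for term, courses in required_by_term.items():
        if term < current_term:
            expected.update(courses)

    flagged = {}
    for student, done in completed.items():
        missing = expected - set(done)
        if missing:
            flagged[student] = sorted(missing)
    return flagged


# Hypothetical major plan (term number -> required courses) and records.
plan = {1: ["MAT 101"], 2: ["STA 201"], 3: ["STA 301"]}
records = {
    "student_a": ["MAT 101", "STA 201"],
    "student_b": ["MAT 101", "ART 110"],
}
print(off_track_students(plan, records, current_term=3))
# {'student_b': ['STA 201']}
```

Here the second student has skipped the statistics course the plan expected by the third term, so that student is the one who would get the email.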

Systems like these are being used all over the globe in hopes of solving problems like entry-level classes that empty out after the first midterm, leaving professors with a room of vacant seats wondering what went wrong. Behavior models are being developed that can put up red flags for professors when a student displays signs that put her or him in the "likely to drop out or fail" category, allowing the professor to step in and catch these students before they slip through the cracks. Online courses are being tailored to enrolled students' weak spots mid-class, shifting lessons to work on areas that need it, rather than blindly plunging ahead or covering things that students have a good handle on. But as Marc Parry of The New York Times asks in his article "Big Data on Campus," "What role does a professor play when an algorithm recommends the next lesson?"

Data strip mining and the homogenization of academia

People are asking big questions about big data. With all of this prediction, suggestion and tracking going on, will students still explore new subjects? Will they challenge themselves, and challenge an algorithm that predicts a failing grade in a class that's a stretch for them? What happens when errors are made in these choice-influencing algorithms and students are sent off course? Today's students were raised in a society that praises and nurtures individuality -- and who doesn't feel the urge to rebel when pegged as a certain type of person? Time will tell about the future of data mining in higher education and whether or not our highly individualistic students settle for being just another bar code on campus.
