Tackling Bad L&D Data: Strategies for Effective xAPI Governance

xAPI governance protects your learning record store (LRS) from bad data. But what can you do about bad data that’s already in your LRS? Whether it’s duplicated completion verbs, activity ID reuse, or another issue, let’s look at some options to help you tidy up when things go wrong.

Fix the leak before you mop up the puddle.

Whatever approach you choose, the first step is always to fix the source of the bad data and to improve your processes (and how consistently they are applied) so the problem doesn't happen again.

If you don’t take this step, you’ll be continually fixing bad data that comes into your LRS.

Edit the database.

Depending on your learning record store, it may be technically possible to go into the database and directly correct issues in the xAPI statement data that is already stored in your LRS. For example, you could replace all instances of a certain verb ID with a new one.


As Dr. Ian Malcolm explained in Jurassic Park, “Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should.”

In other words, just because you can edit the database doesn’t mean you should. Doing so could result in even more problems, such as:

1) Mismatched Statements

If your LRS is integrated with an ecosystem that includes another LRS, fixing the data in one LRS won’t fix it in another. This could lead to some serious confusion down the line when the statements in the two LRSs don’t match.

2) Database Integrity Problems

The way in which the data is stored may be complex enough to make database changes a lot more involved than a simple find and replace. Making changes could risk the integrity of the data and lead to unexpected problems.

For these reasons, making database edits to statements is not something we support or recommend.

Void and replace.

Fortunately, xAPI defines a way to deal with bad data. Voiding means sending an additional xAPI statement that marks an existing statement as “void.” The original statement is not deleted, but it no longer counts as valid data. You can then send a new, corrected statement to replace the voided statement. The voiding and replacement statements can then be forwarded to every LRS in the ecosystem, ensuring the data is consistent between LRSs.
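To make the idea concrete, here is a minimal sketch of what a voiding statement looks like. In the xAPI specification, a voiding statement uses the verb ID http://adlnet.gov/expapi/verbs/voided and an object of type StatementRef that points at the ID of the statement being voided. The helper name build_voiding_statement, the cleanup account, and the statement ID below are made up for illustration, and actually sending the statement to your LRS is left out.

```python
import json
import uuid

# Verb ID reserved by the xAPI specification for voiding.
VOIDED_VERB_ID = "http://adlnet.gov/expapi/verbs/voided"


def build_voiding_statement(actor, statement_id):
    """Return a statement that voids the statement with the given ID."""
    return {
        "id": str(uuid.uuid4()),  # the voiding statement gets its own ID
        "actor": actor,
        "verb": {"id": VOIDED_VERB_ID, "display": {"en-US": "voided"}},
        # A StatementRef object points at the statement being voided.
        "object": {"objectType": "StatementRef", "id": statement_id},
    }


# Hypothetical cleanup account and statement ID, for illustration only.
admin = {
    "objectType": "Agent",
    "account": {"homePage": "https://example.com", "name": "data-cleanup"},
}
voiding = build_voiding_statement(admin, "9e13cefd-53d3-4eac-b5ed-2cf6693903bb")
print(json.dumps(voiding, indent=2))
```

The corrected replacement is then sent as an ordinary new statement with its own ID.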

(NOTE: Writing a script to void and replace xAPI data is a very technical task, so we won’t go into details in this blog post. Just know that it’s an option and a much safer option than a direct database edit.)

Of course, the void and replace approach is only possible if a script can identify the problem statements and work out what the corrected version should be. For example, if an application has mistakenly added an extra “/” character to the end of a verb ID, a script could find every statement that uses the affected verb ID, void it, and send a replacement with the extra slash removed.
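Without attempting a production-ready script, the core of the trailing-slash example might be sketched like this. It assumes the affected statements have already been fetched from the LRS as Python dictionaries. The function name plan_trailing_slash_fix and the sample data are hypothetical, and dropping the "stored" and "authority" fields from the replacement is an assumption on our part, since those values are normally set by the LRS.

```python
import copy
import uuid

VOIDED_VERB_ID = "http://adlnet.gov/expapi/verbs/voided"


def plan_trailing_slash_fix(statements, cleanup_actor):
    """Return (voiding, replacement) pairs for statements whose verb ID ends in '/'."""
    pairs = []
    for stmt in statements:
        verb_id = stmt["verb"]["id"]
        if verb_id == VOIDED_VERB_ID or not verb_id.endswith("/"):
            continue  # nothing to fix
        voiding = {
            "id": str(uuid.uuid4()),
            "actor": cleanup_actor,
            "verb": {"id": VOIDED_VERB_ID, "display": {"en-US": "voided"}},
            "object": {"objectType": "StatementRef", "id": stmt["id"]},
        }
        replacement = copy.deepcopy(stmt)  # leave the original untouched
        replacement["id"] = str(uuid.uuid4())  # a replacement needs a new ID
        replacement["verb"]["id"] = verb_id.rstrip("/")
        for field in ("stored", "authority"):  # assumed to be set by the LRS
            replacement.pop(field, None)
        pairs.append((voiding, replacement))
    return pairs


# Hypothetical sample data: one bad statement, one good one.
learner = {"objectType": "Agent", "mbox": "mailto:learner@example.com"}
course = {"objectType": "Activity", "id": "https://example.com/courses/safety-101"}
bad = {
    "id": "11111111-1111-4111-8111-111111111111",
    "actor": learner,
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed/"},
    "object": course,
    "timestamp": "2023-03-01T09:00:00Z",
    "stored": "2023-03-01T09:00:05Z",
}
good = dict(bad, id="22222222-2222-4222-8222-222222222222",
            verb={"id": "http://adlnet.gov/expapi/verbs/completed"})
cleanup = {"objectType": "Agent", "account": {"homePage": "https://example.com", "name": "data-cleanup"}}

pairs = plan_trailing_slash_fix([bad, good], cleanup)
```

Keeping the original timestamp on the replacement means the corrected statement still reports when the learning actually happened. Each pair can then be sent to every LRS in the ecosystem.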

Nuke from orbit.

In some circumstances, the problems with the data are more severe, making it impossible to determine the data's intended meaning. For example, using the same activity ID for every e-learning course might result in not knowing which courses were actually completed.


In these scenarios, the “only way to be sure” is to remove all of the affected data. In some cases, you may be able to re-import the data as a CSV file from another system and convert it into correct xAPI statements. In other cases, perhaps the data is lost.
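As a rough illustration of the re-import route, the sketch below turns rows from a CSV export into "completed" statements, giving each course its own activity ID. The column names (email, course_id, completed_at), the activity ID base URL, and the function name rows_to_statements are all assumptions for the example. Your source system's export will look different, and the timestamps are assumed to already be in ISO 8601 format.

```python
import csv
import io
import uuid

COMPLETED_VERB = {
    "id": "http://adlnet.gov/expapi/verbs/completed",
    "display": {"en-US": "completed"},
}


def rows_to_statements(csv_file, activity_base):
    """Convert CSV rows (email, course_id, completed_at) into xAPI statements."""
    statements = []
    for row in csv.DictReader(csv_file):
        statements.append({
            "id": str(uuid.uuid4()),
            "actor": {"objectType": "Agent", "mbox": "mailto:" + row["email"].strip()},
            "verb": COMPLETED_VERB,
            # One distinct activity ID per course, built from the course ID.
            "object": {
                "objectType": "Activity",
                "id": activity_base + row["course_id"].strip(),
            },
            "timestamp": row["completed_at"].strip(),  # assumed ISO 8601
        })
    return statements


# Hypothetical export from another system.
sample_csv = io.StringIO(
    "email,course_id,completed_at\n"
    "ana@example.com,safety-101,2023-03-01T09:00:00Z\n"
    "ben@example.com,privacy-201,2023-03-02T14:30:00Z\n"
)
imported = rows_to_statements(sample_csv, "https://example.com/courses/")
```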

Either way, not having any xAPI data is better than having misleading data. That's why removing bad data may be the least worst option.

If you decide to take the nuclear option, back up the data first in case there is something useful in there after all. Even in the scenario of every course using the same activity ID, there may be some value in knowing how many completions there were across all the e-learning courses for a given period of time.
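For instance, a backed-up set of statements could still answer the "how many completions per period" question even when the activity IDs are useless. The sketch below assumes the backup has been loaded as a list of Python dictionaries and groups completions by the first seven characters of the timestamp (year and month), which ignores time zone differences. The function name and sample data are hypothetical.

```python
from collections import Counter

COMPLETED_VERB_ID = "http://adlnet.gov/expapi/verbs/completed"


def completions_per_month(statements):
    """Count 'completed' statements per calendar month ("YYYY-MM")."""
    counts = Counter()
    for stmt in statements:
        if stmt.get("verb", {}).get("id") != COMPLETED_VERB_ID:
            continue
        month = stmt.get("timestamp", "")[:7]  # e.g. "2023-03"
        if month:
            counts[month] += 1
    return dict(counts)


# Hypothetical backup where every course shares one activity ID.
shared = {"objectType": "Activity", "id": "https://example.com/course"}
backup = [
    {"verb": {"id": COMPLETED_VERB_ID}, "object": shared, "timestamp": "2023-03-01T09:00:00Z"},
    {"verb": {"id": COMPLETED_VERB_ID}, "object": shared, "timestamp": "2023-03-15T11:20:00Z"},
    {"verb": {"id": COMPLETED_VERB_ID}, "object": shared, "timestamp": "2023-04-02T08:45:00Z"},
    {"verb": {"id": "http://adlnet.gov/expapi/verbs/attempted"}, "object": shared,
     "timestamp": "2023-04-03T10:00:00Z"},
]
monthly = completions_per_month(backup)
```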

Up Next: Use a flexible Learning Analytics Platform (Part 6)

In some cases, the data might be too difficult to fix, but isn’t bad enough to warrant deleting. In these instances, you may be able to work with the data—and your Learning Analytics Platform may have features to help accomplish this. In our next post, we explore these features, using Watershed as an example.

NOTE: Jurassic Park and Aliens are used here only to illustrate the examples in this blog post. Watershed is not associated with, sponsored by, or affiliated with Universal Pictures, Amblin Entertainment, or Twentieth Century Fox.


