Figure 1: Part of the knowledge map in Junyi Academy. Each node is an interactive exercise problem, and the links between nodes represent the prerequisite structure in the knowledge tree.

Figure 2: The proposed workflow. (a) A screenshot of the local knowledge tree, (b) examples of usage logs, (c) an example of exercise pairs, (d) the sparse similarity matrices labeled by workers, (e) the dense similarity matrices predicted by a regression model, and (f) the color code of (d) and (e). Note that a higher value for element *(a,b)* in the prerequisite similarity matrix indicates that exercise *b* is more likely to be a prerequisite of exercise *a*, and that the exercise IDs in the similarity matrices are sorted by agglomerative hierarchical clustering.

In this work, our goal is to identify relationships that reflect as many students' viewpoints as possible. To solve the problem generally, efficiently, and accurately, we advocate a hybrid method that integrates the strengths of crowdsourcing and machine learning, as [24] did for finding prerequisite relationships among documents. As illustrated in Figure 2, we first quantify the similarity, difficulty, and prerequisite relationships of exercise pairs using crowd wisdom. That is, we recruit workers to compare samples of mathematical exercise pairs and use their average opinions as our labels. Then, we characterize each exercise pair by various types of features extracted from exercise titles and user practice logs. Given the labels and features, the relationships of every exercise pair can be predicted by a regression algorithm.
Performing quantitative evaluation is generally challenging for data-driven approaches that focus on usage logs, such as [21,3]. In our case, the collected labels can be used to quantitatively evaluate the machine predictions. Interestingly, our experiments show that predictions generated by regression models are much closer to the crowd consensus than those of most individual workers. At the end of this work, we apply the relationship information among 370 mathematical exercises on Junyi Academy to recommend prerequisite exercises, to discover knowledge components, and to choose topics for new exercises.
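The label-then-predict loop described above can be sketched as follows. This is a minimal illustration with synthetic data: the feature choices named in the comments are assumptions for illustration, not the exact feature set used in this work.

```python
# Minimal sketch of the hybrid pipeline: a regressor trained on
# crowd-averaged labels for a sample of exercise pairs predicts the
# relationship score of every remaining pair. All data is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Features of 200 labeled pairs (e.g. title overlap, accuracy gap,
# co-practice counts -- illustrative choices only) and their
# crowd-averaged similarity scores on a 1-9 scale.
X_labeled = rng.random((200, 6))
y_crowd = 1 + 8 * X_labeled[:, 0]   # toy labels driven by one feature

model = GradientBoostingRegressor(random_state=0)
model.fit(X_labeled, y_crowd)

# Densify: score every unlabeled pair with the trained model.
X_unlabeled = rng.random((1000, 6))
dense_scores = model.predict(X_unlabeled)
print(dense_scores.shape)
```

The predicted scores for all pairs form the dense similarity matrices of Figure 2(e), complementing the sparse crowd-labeled entries of Figure 2(d).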
Our contributions in this paper are three-fold:
- We propose a unified approach that efficiently quantifies pairwise relationships between exercises.
- For the similarity, difficulty, and prerequisite relationships of exercises, our method identifies which kinds of features are important for each type of relationship.
- Relationships predicted by machines are evaluated quantitatively and qualitatively on Junyi Academy.

- How similar is the knowledge required for answering these two exercises?
- How difficult is exercise B compared with exercise A? (A higher score means B is more difficult than A, and a score of 5 indicates that they are equally difficult.)
- After students learn how to answer exercise B, how appropriate is it for instructors to use exercise A to improve the students' understanding of the topic step by step?
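Reducing the individual ratings on these questions to labels can be sketched as below, assuming (as stated earlier) that the average opinion per pair and question is used; the pair identifiers and scores are made up.

```python
# Average each (pair, question) group of worker ratings into one label.
import pandas as pd

ratings = pd.DataFrame({
    "pair": ["(e1,e2)"] * 3 + ["(e3,e4)"] * 2,
    "question": ["similarity"] * 5,
    "score": [7, 8, 6, 2, 3],   # one worker's rating per row
})

labels = ratings.groupby(["pair", "question"])["score"].mean()
print(labels[("(e1,e2)", "similarity")])   # 7.0
print(labels[("(e3,e4)", "similarity")])   # 2.5
```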

Figure 3: The scatter plot of workers' responses for the relationships of mathematical exercises on Junyi Academy. Each dot is a score rated by a worker.

It is worth mentioning that the relationships between exercises might not be independent of each other. For example, intuition suggests that two exercises are very likely to have a prerequisite relationship if one is similar to, and slightly easier than, the other. As visualized in Figure 3, prerequisite and similarity relationships are highly correlated (correlation coefficient 0.87), and the more difficult exercise in a pair is less likely to be the prerequisite of the other. If we perform principal component analysis (PCA) on the 3-dimensional relationship space, the last PCA component explains less than 3% of the data variance. These analyses verify the intuition that 2 degrees of freedom in the distribution can explain most of the workers' ratings.
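The PCA observation can be reproduced in miniature with synthetic scores that mimic the reported correlation structure (prerequisite tracking similarity minus a difficulty penalty); the coefficients below are illustrative, not fitted to the real data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 500
similarity = rng.normal(5, 2, n)
difficulty = rng.normal(5, 1, n)
# Prerequisite tied to similarity and (negatively) to difficulty,
# plus a little noise, so the point cloud is nearly planar.
prerequisite = 0.9 * similarity - 0.3 * difficulty + rng.normal(0, 0.4, n)

scores = np.column_stack([similarity, difficulty, prerequisite])
pca = PCA(n_components=3).fit(scores)
# The last component explains only a small fraction of the variance.
print(pca.explained_variance_ratio_)
```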
Figure 4: Feature importance identified by the random forest regressor [2] for our relationship prediction task on the training dataset of Junyi Academy. The red bar of each category shows the sum of all feature importances in the category, and the blue error bar shows the sum of the standard deviations of each feature importance across the forest. The symbol # represents "the number of".

To automatically predict the relationships, we extract the usage logs from Oct. 2012 to July 2014 on Junyi Academy, which contain over 10 million answering records from over 100 thousand users. Based on the practice history of each student, we model the student by predicting his/her accuracy on every exercise the student has not yet attempted. Using students' previous records on the other exercises as features, we can formulate the student modeling task as a regression/classification problem, as [28] did. We expect that similar or prerequisite exercises play important roles in such prediction, so the feature importance in the random forest regressor [2] is extracted both to help us choose exercise pairs when collecting labels and to help us predict the relationships between exercise pairs.
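The student-modeling step can be sketched as follows: predict a student's accuracy on a held-out exercise from accuracies on the other exercises, then read off the random forest's feature importances. The data here is synthetic; the real features come from the Junyi Academy logs described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_students, n_exercises = 300, 10
# accuracies[i, j]: student i's accuracy on exercise j (synthetic)
accuracies = rng.random((n_students, n_exercises))

target = 9   # exercise whose accuracy we predict
# Make the target depend mostly on exercise 0, as if 0 were related.
accuracies[:, target] = 0.8 * accuracies[:, 0] + 0.2 * rng.random(n_students)

X = np.delete(accuracies, target, axis=1)   # other exercises as features
y = accuracies[:, target]

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
# A high importance hints at a similarity/prerequisite link to the target.
print(forest.feature_importances_.argmax())   # exercise 0
```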
When describing the relationship between exercise A and exercise B, we extract the following features from the usage logs and cluster them into 6 categories:

Figure 5: The distribution of worker performances on 3 metrics (on different columns) and 3 relationships (on different rows) in the training set of Junyi Academy. A smaller value is better for the relative squared error, and a larger value is better for the rank coefficients. Regular workers are labeled from W1 to W51. Teachers are denoted as E1 and E2, and their performances are highlighted in red.

Table 1: Performance comparisons between different methods in the training set of Junyi Academy. The regressors and features are evaluated using cross validation, and the best performances among the different regressors are highlighted in bold font. Note that RSE, PCR, nu-SVR, and GBR stand for relative squared error, principal component regression, ν-support vector regression, and gradient boosting regressor, respectively. In addition, KT, KM, ET, SM, AA, UN, PT, w/, and w/o represent the knowledge tree, the knowledge map, exercise titles, student modeling, answering accuracy, user numbers, numbers of problems taken, with, and without, respectively.

| Methods | RSE (Sim.) | Spearman's ρ (Sim.) | Kendall's τ (Sim.) | RSE (Diff.) | Spearman's ρ (Diff.) | Kendall's τ (Diff.) | RSE (Prereq.) | Spearman's ρ (Prereq.) | Kendall's τ (Prereq.) |
|---|---|---|---|---|---|---|---|---|---|
| Workers (min ∼ max) | 0.193 ∼ 1.124 | 0.188 ∼ 0.854 | 0.208 ∼ 0.750 | 0.492 ∼ 3.235 | 0.063 ∼ 0.820 | 0.050 ∼ 0.747 | 0.316 ∼ 2.381 | 0.000 ∼ 0.813 | -0.007 ∼ 0.725 |
| Workers (mean) | 0.574 | 0.598 | 0.524 | 1.096 | 0.516 | 0.439 | 0.986 | 0.458 | 0.387 |
| Teachers (min ∼ max) | 0.493 ∼ 0.543 | 0.648 ∼ 0.718 | 0.560 ∼ 0.625 | 0.619 ∼ 0.741 | 0.625 ∼ 0.634 | 0.539 ∼ 0.540 | 0.858 ∼ 1.054 | 0.571 ∼ 0.684 | 0.504 ∼ 0.594 |
| Teachers (mean) | 0.518 | 0.683 | 0.593 | 0.680 | 0.630 | 0.539 | 0.956 | 0.638 | 0.549 |
| Linear Regression | 0.370 | 0.658 | 0.567 | 0.470 | 0.593 | 0.504 | 0.424 | 0.624 | 0.541 |
| PCR (2 components) | 0.370 | 0.656 | 0.567 | 0.474 | 0.595 | 0.503 | 0.427 | 0.608 | 0.526 |
| nu-SVR | 0.349 | 0.683 | 0.594 | 0.483 | 0.611 | 0.526 | 0.402 | 0.611 | 0.520 |
| Random Forest | 0.320 | 0.662 | 0.575 | 0.493 | 0.576 | 0.493 | 0.376 | 0.608 | 0.516 |
| GBR | 0.288 | 0.680 | 0.590 | 0.453 | 0.610 | 0.521 | 0.346 | 0.600 | 0.514 |
| GBR (w/o Rounding) | **0.273** | **0.690** | **0.604** | **0.445** | **0.656** | **0.564** | **0.327** | **0.649** | **0.558** |
| w/o KT and KM | 0.311 | 0.681 | 0.589 | 0.474 | 0.626 | 0.532 | 0.378 | 0.607 | 0.515 |
| w/o KT, KM, and ET | 0.433 | 0.607 | 0.521 | 0.472 | 0.642 | 0.546 | 0.472 | 0.567 | 0.478 |
| w/ SM, AA, UN, and PT | 0.548 | 0.516 | 0.438 | 0.502 | 0.610 | 0.516 | 0.595 | 0.463 | 0.377 |
| w/ SM and AA | 0.598 | 0.524 | 0.446 | 0.632 | 0.486 | 0.400 | 0.666 | 0.417 | 0.346 |
| w/ KT | 0.674 | 0.463 | 0.418 | 0.869 | 0.448 | 0.382 | 0.717 | 0.360 | 0.318 |
| w/ ET and UN | 0.447 | 0.598 | 0.512 | 0.615 | 0.580 | 0.514 | 0.495 | 0.563 | 0.483 |
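The three metrics reported in the tables can be computed as below, assuming RSE is the squared error normalized by that of the mean-label baseline (so a value of 1.0 means no better than always predicting the mean, and smaller is better); `y_true` and `y_pred` here are toy values.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

def relative_squared_error(y_true, y_pred):
    """Squared error relative to predicting the mean of y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return ((y_true - y_pred) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()

y_true = [2.0, 4.5, 6.0, 7.5, 3.0]   # crowd labels
y_pred = [2.5, 4.0, 6.5, 7.0, 3.5]   # model predictions

rho, _ = spearmanr(y_true, y_pred)
tau, _ = kendalltau(y_true, y_pred)
print(relative_squared_error(y_true, y_pred))   # small: good fit
print(rho, tau)   # both 1.0: the rankings agree exactly
```

Because ρ and τ depend only on ranks, they reward a model that orders exercise pairs correctly even if its absolute scores are biased, while RSE penalizes the bias itself.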

Table 2: Performance comparisons of different methods in the testing set of Junyi Academy. All abbreviations have the same meaning as in Table 1.

| Methods | RSE (Sim.) | Spearman's ρ (Sim.) | Kendall's τ (Sim.) | RSE (Diff.) | Spearman's ρ (Diff.) | Kendall's τ (Diff.) | RSE (Prereq.) | Spearman's ρ (Prereq.) | Kendall's τ (Prereq.) |
|---|---|---|---|---|---|---|---|---|---|
| Workers (min ∼ max) | 0.220 ∼ 0.342 | 0.798 ∼ 0.871 | 0.707 ∼ 0.798 | 0.362 ∼ 0.665 | 0.773 ∼ 0.797 | 0.679 ∼ 0.701 | 0.271 ∼ 0.714 | 0.727 ∼ 0.810 | 0.625 ∼ 0.734 |
| Workers (mean) | 0.269 | 0.845 | 0.762 | 0.484 | 0.785 | 0.688 | 0.421 | 0.758 | 0.666 |
| Linear Regression | 0.327 | 0.767 | 0.664 | 0.602 | 0.606 | 0.500 | 0.353 | 0.791 | 0.683 |
| PCR (2 components) | 0.316 | 0.769 | 0.667 | 0.599 | 0.597 | 0.487 | 0.367 | 0.786 | 0.675 |
| nu-SVR | 0.294 | 0.778 | 0.674 | 0.577 | 0.591 | 0.486 | 0.338 | **0.798** | **0.693** |
| Random Forest | 0.262 | **0.781** | **0.680** | 0.608 | 0.576 | 0.474 | 0.364 | 0.764 | 0.648 |
| GBR | 0.264 | 0.780 | 0.677 | 0.577 | 0.592 | 0.487 | 0.329 | 0.781 | 0.672 |
| GBR (w/o Rounding) | **0.258** | 0.768 | 0.653 | **0.563** | **0.610** | **0.506** | **0.323** | 0.783 | 0.666 |

Figure 6: An example of recommendation for prerequisite exercises. (a) A screenshot of the local knowledge tree; (b)-(d) screenshots of the exercises titled "angle_types", "angle_addition_postulate", and "triangle_types", respectively. Note that blue words indicate the relationships predicted by our model, and red words are English translations of the adjacent Chinese characters.

As discussed in Sec. 1, many important applications, such as adaptive testing, can be built on relationships between exercises. In the following subsections, several simple applications of the relationships are demonstrated, mainly to quantitatively evaluate our approach and the quality of our labels in the training set of Junyi Academy.
Figure 7: Visualization of our similarity and difficulty relationships among 370 exercises in Junyi Academy. (a) The difficulty of each exercise, (b) the title of every exercise, and (c) the dendrogram of average-linkage clustering (a type of agglomerative hierarchical clustering). A more difficult exercise has a longer red bar in (a), and more similar exercises are merged earlier in (c).
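The exercise ordering used in the similarity matrices and in the dendrogram of Figure 7(c) can be sketched with SciPy's average-linkage clustering. The 4×4 similarity matrix below is synthetic, and converting similarity to dissimilarity by subtracting from the maximum is one plausible choice, not necessarily the one used here.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Synthetic symmetric similarity matrix for 4 exercises (1-9 scale).
similarity = np.array([
    [9.0, 7.5, 2.0, 1.5],
    [7.5, 9.0, 2.5, 1.0],
    [2.0, 2.5, 9.0, 6.5],
    [1.5, 1.0, 6.5, 9.0],
])

dissimilarity = similarity.max() - similarity
np.fill_diagonal(dissimilarity, 0.0)

# Average-linkage hierarchical clustering over the condensed distances;
# the dendrogram's leaf order places similar exercises next to each other.
Z = linkage(squareform(dissimilarity), method="average")
order = leaves_list(Z)
print(order)   # exercises 0 and 1 end up adjacent, as do 2 and 3
```

Sorting the rows and columns of a similarity matrix by this leaf order is what produces the block structure visible in Figure 2(d)-(e).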

[1]
J. Bayer, H. Bydzovská, and J. Géryk.
Towards course prerequisites refinement.
*IMEA*, 2012.
[2]
L. Breiman.
Random forests.
*Machine learning*, 2001.
[3]
E. Brunskill.
Estimating prerequisite structure from noisy data.
In *EDM*, 2011.
[4]
E. Brunskill and S. J. Russell.
RAPID: A reachable anytime planner for imprecisely-sensed domains.
In *UAI*, 2010.
[5]
C. Carmona, E. Millán, J.-L. P. de-la Cruz, M. Trella, and R. Conejo.
Introducing prerequisite relations in a multi-layered bayesian
student model.
In *User Modeling*, 2005.
[6]
C.-C. Chang and C.-J. Lin.
LIBSVM: A library for support vector machines.
*ACM Transactions on Intelligent Systems and Technology*, 2011.
[7]
C.-M. Chen, C.-Y. Liu, and M.-H. Chang.
Personalized curriculum sequencing utilizing modified item response
theory for web-based instruction.
*Expert Systems with applications*, 2006.
[8]
M. Desmarais, P. Meshkinfam, and M. Gagnon.
Learned student models with item to item knowledge structures.
*User Model. User-Adapt. Interact.*, 2006.
[9]
M. C. Desmarais.
Performance comparison of item-to-item skills models with the IRT
single latent trait model.
In *UMAP*, 2011.
[10]
E. C. Fieller, H. O. Hartley, and E. S. Pearson.
Tests for rank correlation coefficients. I.
*Biometrika*, 1957.
[11]
J. H. Friedman.
Greedy function approximation: a gradient boosting machine.
*Annals of Statistics*, 2001.
[12]
T.-C. Hsieh and T.-I. Wang.
A mining-based approach on discovering courses pattern for
constructing suitable learning path.
*Expert Syst. Appl.*, 2010.
[13]
G.-J. Hwang.
A conceptual map model for developing intelligent tutoring systems.
*Computers & Education*, 2003.
[14]
I. T. Jolliffe.
A note on the use of principal components in regression.
*Applied Statistics*, 1982.
[15]
C. Koutsojannis, G. Beligiannis, I. Hatzilygeroudis, and C. Papavlasopoulos.
Using a hybrid AI approach for exercise difficulty level adaptation.
*International Journal of Continuing Engineering Education and
Life Long Learning*, 2007.
[16]
B. Lakshminarayanan and Y. W. Teh.
Inferring ground truth from multi-annotator ordinal data: a
probabilistic approach.
*CoRR*, 2013.
[17]
A. S. Lan, C. Studer, A. E. Waters, and R. G. Baraniuk.
Tag-aware ordinal sparse factor analysis for learning and content
analytics.
In *EDM*, 2013.
[18]
D. Lynch and C. P. Howlin.
Real world usage of an adaptive testing algorithm to uncover latent
knowledge.
In *ICERI*, 2014.
[19]
M. L. Nguyen, S. C. Hui, and A. C. M. Fong.
Content-based collaborative filtering for question difficulty
calibration.
In *PRICAI*, 2012.
[20]
Z. A. Pardos and N. T. Heffernan.
Determining the significance of item order in randomized problem
sets.
In *EDM*, 2009.
[21]
P. I. Pavlik, H. Cen, L. Wu, and K. R. Koedinger.
Using item-type performance covariance to improve the skill model of
an existing tutor.
In *EDM*, 2008.
[22]
M. H. Pinson and S. Wolf.
Comparing subjective video quality testing methodologies.
In *Visual Communications and Image Processing*, 2003.
[23]
R. Scheines, E. Silver, and I. Goldin.
Discovering prerequisite relationships among knowledge components.
In *EDM*, 2014.
[24]
P. P. Talukdar and W. W. Cohen.
Crowdsourced comprehension: predicting prerequisite structure in
wikipedia.
In *Proceedings of the Seventh Workshop on Building Educational
Applications Using NLP*, 2012.
[25]
W. J. van der Linden and R. K. Hambleton.
*Handbook of Modern Item Response Theory*.
New York, 1997.
[26]
A. Vuong, T. Nixon, and B. Towle.
A method for finding prerequisites within a curriculum.
In *EDM*, 2011.
[27]
K. Wauters, P. Desmet, and W. van den Noortgate.
Acquiring item difficulty estimates: a collaborative effort of data
and judgment.
In *EDM*, 2011.
[28]
H.-F. Yu et al.
Feature engineering and classifier ensemble for KDD Cup 2010.
In *JMLR Workshop and Conference Proceedings*, 2011.
### Footnotes:

1. http://www.junyiacademy.org/
2. https://www.khanacademy.org/

Sheng-Wei Chen (also known as Kuan-Ta Chen)

http://www.iis.sinica.edu.tw/~swc

Last Update September 19, 2017
