Stage implement · Topics connectivity

Optimize Learning Path. Stage 7/8

Topics connectivity

Report a typo

Description

Great work! You're almost done. In this stage, you will complete the topological sorting algorithm.

Let's compare the current graph with the previous one:

Finally, no more 2. Introduction to OOP topic at the start of the graph. It moved to 105. Introduction to OOP right before the topic 106. Declaring classes in Kotlin. It's actually the best place for it, the various abstract topics need to be learned right before their "in-practice" counterpart topics, and Declaring classes in Kotlin seems to be the first topic that uses OOP. Another example is the 970. Files abstract topic and the 971. Reading files in Kotlin topic that uses files in practice: they are placed one after another.
There are 1242 topics that were chosen without alphabetic comparison, and 396 that were chosen alphabetically. Compare this result to the previous algorithm in Stage 5, where there were 842 and 796 of such topics, respectively. And in Stage 4, there were 190 and 1448 of such topics, respectively, so the improvements are very much visible!

Actually, the algorithm sorts topics quite well now, but you can fine-tune it a bit more. Let's prioritize topics that have higher connectivity of dependencies and higher connectivity of topics that depend on them. Here's an example:

complex graph with entangled dependencies

Higher connectivity means that topics that are more interconnected should be more important than topics that are less interconnected. In the example above, Topic 5 is visibly more interconnected than Topic 6.

The final rule is, the higher the interconnection of a set of topics is (meaning a topic's dependencies and topics that depend on it), the more important the topic is.

Calculating such importance value is a bit different from calculating the previous importance value. You just need to replace the max function with the sum function. Resulting value will grow rapidly if the topics are more interconnected, and grow slower (but also quite fast) if topics are less interconnected. It's easy for such importance value to go above million, since it grows exponentially. That's why you need to normalize it. After calculating 2 importance values (one for the topic's dependencies and the other for topics that depend on it), you should apply the following formula to every value: $ln(v + 1)$ , where $ln$ is a natural logarithm function. After that, you can sum up these 2 importance values.

An example of calculating this value (before applying the logarithm) is presented in the following picture. The first value (before |) is the importance value calculated among the topic's dependencies, and the second value (after |) is the one calculated among the topics that depend on it. Remember that the former value is calculated from roots to leafs (from top to bottom) and the latter is calculated from leafs to roots (from bottom to top).

calculation of importance value in the complex graph

Examples for the first importance value (before the | symbol):

For Topic F the values for its dependencies B and C are 13 and 11, respectively. Thus, the value for Topic F would be sum(13, 11) + 1 = 25.
For Topics 0 and 1 such value is equal to 1, since they have no dependencies and therefore the sum is 0, but you should still add 1 according to the formula.

Examples for the second importance value (after the | symbol):

For Topic 3 the value of topics that directly depend on it – Topics 5 and 9 – are 17 and 4, respectively. Thus, the value for Topic 3 would be sum(17, 4) + 1 = 22.
For topics D, G, F, and H such value is equal to 1, since they don't have topics that depend on them, and therefore the maximum is 0, but you should still add 1 according to the formula.

In the end, the importance value for Topic 5 would be $ln(5 + 1) + ln(17 + 1) = 4.68...$ , and for Topic 6 the value would be $ln(2 + 1) + ln(8 + 1) = 3.29...$ Indeed, Topic 5 is more important than Topic 6, according to this rule.

The algorithm of prioritizing topics is the following:

In case there are multiple candidates to be enumerated in the topological sorting algorithm, you should find the ones with the highest "mainline/sideline" importance value. It's the same as you did in the previous stage.
In case there are still multiple candidates, compare them according to the following formula:
$v =c / c_{max} + d/d_{max}$ , where
1. $c$ is the connectivity importance value that you get to know in this stage;
2. $c_{max}$ is the maximum connectivity importance value among all topics in the graph;
3. $d$ is the "sum of dependencies and dependents" importance value;
4. $d_{max}$ is the maximum "sum of dependencies and dependents" importance value among all topics in the graph.
In case there are still multiple candidates, use alphabetic order.

Objectives

In this stage, you need to apply a topological sorting algorithm using three importance values of topic as sorting factor, as described in the section above. Here are the steps you should perform:

Download the dataset of topics and their dependencies. It's the same dataset as in the previous stages. The dataset contains dependencies only, and they are presented in the format Topic 1 -> Topic 2, which means Topic 2 depends on Topic 1. In every line, two topics are separated by the -> sign, please ignore all spaces before and after this separator. Assume that topics with the same name are essentially the same topic. Assume that there are no topics that have no dependencies at all.
Apply a topological sorting algorithm.
Paste the resulting list of topics, each on a separate line, into the solution window. Please put a topic's number, followed by a dot and a space, before the topic's name, for example, 15. Topic 9, which means this topic should be 15th in the learning order. Enumerate topics starting from 1 and print them in the ascending order.

Examples

Notice that the fine-grained improvements described in this stage are visible only on large-scale graphs (like the dataset you are processing), not small ones. For this reason, the examples below have the same output as in the previous stage.

Example 1

For the following file:

Topic 1 -> Topic 3
Topic 1 -> Topic 4
Topic 1 -> Topic 6
Topic 2 -> Topic 4
Topic 2 -> Topic 5
Topic 4 -> Topic 3
Topic 4 -> Topic 7

Which corresponds to the following graph:

simple graph with basic dependencies

You should get the following result:

1. Topic 1
2. Topic 2
3. Topic 4
4. Topic 3
5. Topic 7
6. Topic 5
7. Topic 6

Example 2

For the following file:

Topic 0 -> Topic 1
Topic 0 -> Topic 5
Topic 1 -> Topic 2
Topic 1 -> Topic 7
Topic 2 -> Topic 3
Topic 3 -> Topic 4
Topic 5 -> Topic 2
Topic 5 -> Topic 6
Topic 6 -> Topic 7
Topic 7 -> Topic 4
Topic 8 -> Topic 2
Topic 9 -> Topic 7

Which corresponds to the following graph:

graph with 10 topics and many dependencies

You should get the following result:

1. Topic 0
2. Topic 5
3. Topic 1
4. Topic 6
5. Topic 8
6. Topic 2
7. Topic 3
8. Topic 9
9. Topic 7
10. Topic 4

Example 3

For the following file:

Topic 0 -> Topic 2
Topic 0 -> Topic 3
Topic 0 -> Topic 4
Topic 1 -> Topic 4
Topic 1 -> Topic 6
Topic 2 -> Topic 5
Topic 3 -> Topic 5
Topic 3 -> Topic 9
Topic 4 -> Topic 9
Topic 5 -> Topic 7
Topic 5 -> Topic 8
Topic 6 -> Topic 9
Topic 6 -> Topic C
Topic 7 -> Topic A
Topic 7 -> Topic B
Topic 8 -> Topic A
Topic 8 -> Topic B
Topic 9 -> Topic C
Topic A -> Topic D
Topic A -> Topic G
Topic B -> Topic E
Topic B -> Topic F
Topic C -> Topic F
Topic C -> Topic H
Topic E -> Topic G

Which corresponds to the graph in the description of this stage, you should get the following result:

1. Topic 0
2. Topic 3
3. Topic 2
4. Topic 5
5. Topic 7
6. Topic 8
7. Topic B
8. Topic E
9. Topic A
10. Topic G
11. Topic D
12. Topic 1
13. Topic 4
14. Topic 6
15. Topic 9
16. Topic C
17. Topic F
18. Topic H

Enter a short text

___

Create a free account to access the full topic