Make creation of extraction opportunities faster. #96

aravij · 2020-11-12T10:48:30Z

Changed the algorithm that creates extraction opportunities to speed it up.

The algorithm now works as follows:

For each statement, calculate the location of the next similar statement. We store a list of steps (int): adding a step to a statement's index gives the index of the next similar statement. Not every statement has a next similar statement.
Create the initial statement ranges. Split the sequence of statements into a sorted sequence of non-overlapping statement ranges with no gaps between them. An initial range is a range in which each statement, except the first one, is similar to the previous one. We split all statements this way and add the resulting ranges to the extraction opportunities. They correspond to the opportunities created during step one.
Collect all similarity gaps: statements whose next similar statement does not follow them immediately.
For each such gap:
1. Identify the ranges to which the first and the second statement of the gap belong.
2. Merge those two ranges and everything between them into a single range.
3. If the previous opportunity, created while handling a gap of the same size, starts from the same statement as the newly created one, overwrite that opportunity with the new one. Otherwise, append the new one to those already created. This step is needed because some gaps may overlap, i.e. the range of the second statement of the first gap equals the range of the first statement of the second gap. If that happens, both gaps should belong to the same opportunity, because the previous version of the algorithm would pass through them at once, as they are of the same size. We detect overlapping gaps by the fact that the second opportunity is larger than the first one but starts from the same statement. So if the newly created opportunity starts from the same statement as the one created while handling a gap of the same size, we simply overwrite that opportunity with the new one, as it contains both gaps.
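
The steps above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR. It assumes that each statement is represented by a set of names, that two statements are similar when their sets intersect, that gaps are handled in order of increasing size, and that merged ranges stay merged for later gaps. All function names here are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Set, Tuple

Range = Tuple[int, int]  # inclusive (first, last) statement indices


def next_similar_steps(semantics: List[Set[str]]) -> List[Optional[int]]:
    """For each statement, the offset to the next similar one, or None."""
    steps: List[Optional[int]] = []
    for i, current in enumerate(semantics):
        # Assumption: "similar" means the two name sets share an element.
        step = next(
            (j - i for j in range(i + 1, len(semantics)) if current & semantics[j]),
            None,
        )
        steps.append(step)
    return steps


def initial_ranges(steps: List[Optional[int]]) -> List[Range]:
    """Split statements into runs where each one is similar to the previous."""
    ranges: List[Range] = []
    start = 0
    for i, step in enumerate(steps):
        if step != 1:  # statement i + 1 is not similar to i: close the run
            ranges.append((start, i))
            start = i + 1
    return ranges


def create_opportunities(semantics: List[Set[str]]) -> List[Range]:
    steps = next_similar_steps(semantics)
    ranges = initial_ranges(steps)
    opportunities = list(ranges)

    # Similarity gaps as (size, index of first statement).
    # Assumption: gaps are processed by increasing size, then by position.
    gaps = sorted(
        (step, i) for i, step in enumerate(steps) if step is not None and step > 1
    )

    last_size: Optional[int] = None
    for size, first in gaps:
        starts = [r[0] for r in ranges]
        lo = bisect_right(starts, first) - 1         # range of the first statement
        hi = bisect_right(starts, first + size) - 1  # range of the second statement
        merged = (ranges[lo][0], ranges[hi][1])
        ranges[lo:hi + 1] = [merged]                 # merge both and all between

        if last_size == size and opportunities[-1][0] == merged[0]:
            opportunities[-1] = merged               # overlapping gap: overwrite
        else:
            opportunities.append(merged)
        last_size = size
    return opportunities


if __name__ == "__main__":
    # Two overlapping gaps of size 2 (0 -> 2 and 2 -> 4) end up in one range.
    print(create_opportunities([{"a"}, {"x"}, {"a", "b"}, {"y"}, {"b"}]))
```

The sketch follows the description literally, so it does not deduplicate an opportunity when both statements of a gap already lie in the same merged range.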

Applying the new version of the algorithm gives the following gain:

For the file InternalMetaDataParser with 1721 methods, the average speed-up of the create_extraction_opportunities step was 88.6%, or 0.0086 seconds. The total time saved on that step is 14.8 seconds. The total processing of this file with the SEMI algorithm takes 2.5 minutes.
For the file TomlParser with 87 methods, the average speed-up of the create_extraction_opportunities step was 68.3%, or 0.0052 seconds. The total time saved on that step is 0.45 seconds. The total processing of this file with the SEMI algorithm takes 7 seconds.

The relative speed-up is quite good, while in absolute numbers it is quite irrelevant.

Further speed-ups of the algorithm might come from speeding up the other steps and, maybe, the AST framework.
Here is a comparison of the time taken by create_extraction_opportunities with the other steps.

| Step name | InternalMetaDataParser, old version | InternalMetaDataParser, new version | TomlParser, old version | TomlParser, new version |
| --- | --- | --- | --- | --- |
| Extract semantic | 3.4 ms | 3.4 ms | 4.2 ms | 4 ms |
| Create opportunities | 9.4 ms | 0.8 ms | 5.9 ms | 0.7 ms |
| Filter opportunities | 13 ms | 14 ms | 18.7 ms | 18 ms |
| Rank opportunities | 51 ms | 52 ms | 47.8 ms | 47 ms |
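
To put the table in perspective, the share of the create-opportunities step in the listed per-method time can be computed as in the sketch below. It assumes, for illustration only, that the four listed steps make up the whole per-method cost. The names `TIMINGS_MS` and `step_share` are hypothetical.

```python
# Per-method step timings in milliseconds, copied from the table above.
TIMINGS_MS = {
    "InternalMetaDataParser old": {
        "Extract semantic": 3.4, "Create opportunities": 9.4,
        "Filter opportunities": 13, "Rank opportunities": 51,
    },
    "InternalMetaDataParser new": {
        "Extract semantic": 3.4, "Create opportunities": 0.8,
        "Filter opportunities": 14, "Rank opportunities": 52,
    },
    "TomlParser old": {
        "Extract semantic": 4.2, "Create opportunities": 5.9,
        "Filter opportunities": 18.7, "Rank opportunities": 47.8,
    },
    "TomlParser new": {
        "Extract semantic": 4, "Create opportunities": 0.7,
        "Filter opportunities": 18, "Rank opportunities": 47,
    },
}


def step_share(timings: dict, step: str) -> float:
    """Fraction of the summed step time that is spent in the given step."""
    return timings[step] / sum(timings.values())


if __name__ == "__main__":
    for version, timings in TIMINGS_MS.items():
        share = step_share(timings, "Create opportunities")
        print(f"{version}: {share:.1%} of the listed step time")
```

Under that assumption the step drops from roughly 12% and 8% of the listed time to about 1% in both files, so ranking and filtering dominate what remains.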

lyriccoder · 2020-11-13T13:52:47Z

@aravij Let's discuss it on Monday. It is necessary to test it on a large number of files.

lyriccoder · 2020-11-20T13:14:10Z

With increased speed:
Elapsed: 7889 secs
I will soon measure it without the increased speed.

lyriccoder · 2020-11-23T10:59:56Z

Without increased speed:
Elapsed: 8415 secs

lyriccoder

seems it has become faster

aravij added 2 commits November 12, 2020 13:48

[62] Make creation of extraction opportunities faster.

1f22df4

Merge branch 'master' into issue-62

7dc05e2

aravij self-assigned this Nov 12, 2020

aravij linked an issue Nov 12, 2020 that may be closed by this pull request

SEMI Baseline. Finding opportunities takes too much time #62

Open

aravij requested review from acheshkov, KatGarmash and lyriccoder November 12, 2020 13:40

lyriccoder approved these changes Nov 23, 2020
