SubTerra


Adds a subterranean layers and special entities to transfer items, players, and power between levels..

Overhaul
5 years ago
0.16 - 0.17
32

g Adding to established game + lua perf tips

6 years ago
(updated 6 years ago)

Wife and I have a local lan game we've been working on for about a month. After taking subterra for a spin in a sandbox I tried adding it to my local copy of the game. It appears to get stuck in the initial migration. Maybe because of the amount of revealed map it has to populate?

I copied it on the Linux VM on my i7 NUC that hosts the lan game, and had the same problem.

A quick watch of top tells me that without subterra, factorio is using 737MB of ram. But if it triggers the subterra migration, it rapidly consumes all 4G of ram the machine has.

Some lua feedback:

function remove_value(t, value)
    local index
    for i, v in pairs(t) do
        if value == v then
            index = i
            break
        end
    end
    if index then
        table.remove(t, index)
        return true
    end
    return false
end

is cleaner as:

function remove_value(t, value)
    for i, v in pairs(t) do
        if value == v then
            table.remove(t, i)
            return true
        end
    end
    return false
end

and

    for chunk in surface.get_chunks() do
        minx = math.min(minx, chunk.x)
        miny = math.min(miny, chunk.y)
        maxx = math.max(maxx, chunk.x)
        maxy = math.max(maxy, chunk.y)
    end

is about 120x slower (notr 1.2x) than:

    local min, max = math.min, math.max
    for chunk in surface.get_chunks() do
        minx = min(minx, chunk.x)
        miny = min(miny, chunk.y)
        maxx = max(maxx, chunk.x)
        maxy = max(maxy, chunk.y)
    end

            if counts[n] then
                counts[n] = counts[n] + 1
                events[n][counts[n]] = event.callback
            else
                counts[n] = 1
                events[n] = {}
                events[n][1] = event.callback
            end

could be

            local count = counts[n] or 0
            if count == 0 then
                count = 1
                events[n] = {}
            else
                count = count + 1
            end
            counts[n] = count
            events[n][count] = event.callback

Capture the 'script.on_nth_tick' here, and possibly counts[n]:

        for n, callbacks in pairs(events) do
            if counts[n] == 1 then
                script.on_nth_tick(n, callbacks[1])
                print("Wired nth tick event for n = " .. n)
            elseif counts[n] > 1 then
                script.on_nth_tick(n,
                function(event)
                    for _, callback in pairs(callbacks) do
                        callback(event)
                    end
                end)
            end
        end

Is the double insertion of the event callback here intentional? If so, a comment might be a good idea:

        table.insert(events[id], event.callback)
        table.insert(events[id], event.callback)
        counts[id] = counts[id] + 1

Is the "bounding box" object your own or Factorios? It's a kind of expensive representation :(

    bb = {
        left_top = { x = ..., y = ... },
        right_bottom = { x = ..., y = ... },
    }

either

    bb = { ltx, lty, rbx, rby }

or at least

    bb = { ltx = ..., lty = ..., rbx = ..., rby = ... }

would be significantly more efficient in terms of memory and cache coherency.

function QuadtreeNode:new_root (new_children)
    local left = new_children[1].extents.left_top.x
    local top = new_children[1].extents.left_top.y
    local right = new_children[4].extents.right_bottom.x
    local bottom = new_children[4].extents.right_bottom.y

OUCH! If we take the following:

qtn = { __index = qtn }

function qtn:new_root (new_children)
  local left = new_children[1].extents.left_top.x
  local top = new_children[1].extents.left_top.y
  local right = new_children[4].extents.right_bottom.x
  local bottom = new_children[4].extents.right_bottom.y

  return left + top + right + bottom  -- stop compiler being too clever
end

function qtn:alt_root (new_children)
  local slice = table.unpack
  local left, top = slice(new_children[1].extents, 1, 2)
  local right, bottom = slice(new_children[4].extents, 3, 4)

  return left + top + right + bottom
end

function qtn:alt_root2 (new_children)
  local lt, rb = new_children[1].extents, new_children[4].extents
  local left, top, right, bottom = lt.ltx, lt.lty, rb.rbx, rb.rby

  return left + top + right + bottom
end

and compile it with "luac -l -l -s -v" we can see that each of the above has different costs, with the first having the most and the most expensive:

2 params, 7 slots, 0 upvalues, 6 locals, 7 constants, 0 functions
        1       [4]     GETTABLE        2 1 -1  ; 1
        2       [4]     GETTABLE        2 2 -2  ; "extents"
        3       [4]     GETTABLE        2 2 -3  ; "left_top"
        4       [4]     GETTABLE        2 2 -4  ; "x"
        5       [5]     GETTABLE        3 1 -1  ; 1
        6       [5]     GETTABLE        3 3 -2  ; "extents"
        7       [5]     GETTABLE        3 3 -3  ; "left_top"
        8       [5]     GETTABLE        3 3 -5  ; "y"
        9       [6]     GETTABLE        4 1 -6  ; 4
        10      [6]     GETTABLE        4 4 -2  ; "extents"
        11      [6]     GETTABLE        4 4 -7  ; "right_bottom"
        12      [6]     GETTABLE        4 4 -4  ; "x"
        13      [7]     GETTABLE        5 1 -6  ; 4
        14      [7]     GETTABLE        5 5 -2  ; "extents"
        15      [7]     GETTABLE        5 5 -7  ; "right_bottom"
        16      [7]     GETTABLE        5 5 -5  ; "y"
        17      [9]     ADD             6 2 3
        18      [9]     ADD             6 6 4
        19      [9]     ADD             6 6 5
        20      [9]     RETURN          6 2
        21      [10]    RETURN          0 1

---

function <test.lua:12,18> (19 instructions at 0x7fffc6a0d6c0)
2 params, 9 slots, 1 upvalue, 7 locals, 7 constants, 0 functions
        1       [13]    GETTABUP        2 0 -1  ; _ENV "table"
        2       [13]    GETTABLE        2 2 -2  ; "unpack"
        3       [14]    MOVE            3 2
        4       [14]    GETTABLE        4 1 -3  ; 1
        5       [14]    GETTABLE        4 4 -4  ; "extents"
        6       [14]    LOADK           5 -3    ; 1
        7       [14]    LOADK           6 -5    ; 2
        8       [14]    CALL            3 4 3
        9       [15]    MOVE            5 2
        10      [15]    GETTABLE        6 1 -6  ; 4
        11      [15]    GETTABLE        6 6 -4  ; "extents"
        12      [15]    LOADK           7 -7    ; 3
        13      [15]    LOADK           8 -6    ; 4
        14      [15]    CALL            5 4 3
        15      [17]    ADD             7 3 4
        16      [17]    ADD             7 7 5
        17      [17]    ADD             7 7 6
        18      [17]    RETURN          7 2
        19      [18]    RETURN          0 1

---

function <test.lua:20,25> (13 instructions at 0x7fffc6a0e410)
2 params, 9 slots, 0 upvalues, 8 locals, 7 constants, 0 functions
        1       [21]    GETTABLE        2 1 -1  ; 1
        2       [21]    GETTABLE        2 2 -2  ; "extents"
        3       [21]    GETTABLE        3 1 -3  ; 4
        4       [21]    GETTABLE        3 3 -2  ; "extents"
        5       [22]    GETTABLE        4 2 -4  ; "ltx"
        6       [22]    GETTABLE        5 2 -5  ; "lty"
        7       [22]    GETTABLE        6 3 -6  ; "rbx"
        8       [22]    GETTABLE        7 3 -7  ; "rby"
        9       [24]    ADD             8 4 5
        10      [24]    ADD             8 8 6
        11      [24]    ADD             8 8 7
        12      [24]    RETURN          8 2
        13      [25]    RETURN          0 1

(more)

6 years ago
(updated 6 years ago)

Next add_proxy:

function QuadtreeNode:add_proxy (proxy)
    if self.children == nil then
        -- print ("self: " .. tostring(self))
        -- print ("proxies: " .. tostring(self.proxies))
        table.insert(self.proxies, proxy)
        if #self.proxies >= MAX_ITEMS_PER_QUADTREE_NODE then
            self:split()
        end
    else
        if proxy.bbox.left_top.y < self.center.y then
            if proxy.bbox.left_top.x < self.center.x then
                self.children[1]:add_proxy(proxy)
            end
            if proxy.bbox.right_bottom.x > self.center.x then
                self.children[2]:add_proxy(proxy)
            end
        end
        if proxy.bbox.right_bottom.y > self.center.y then
            if proxy.bbox.left_top.x < self.center.x then
                self.children[3]:add_proxy(proxy)
            end
            if proxy.bbox.right_bottom.x > self.center.x then
                self.children[4]:add_proxy(proxy)
            end
        end
    end
end

This compiles as:

function <test.lua:3,29> (93 instructions at 0x7fffcc873a80)
1 param, 4 slots, 1 upvalue, 1 local, 19 constants, 0 functions
        1       [4]     GETTABUP        1 0 -1  ; _ENV "self"
        2       [4]     GETTABLE        1 1 -2  ; "children"
        3       [4]     EQ              0 1 -3  ; - nil
        4       [4]     JMP             0 16    ; to 21
        5       [7]     GETTABUP        1 0 -4  ; _ENV "table"
        6       [7]     GETTABLE        1 1 -5  ; "insert"
        7       [7]     GETTABUP        2 0 -1  ; _ENV "self"
        8       [7]     GETTABLE        2 2 -6  ; "proxies"
        9       [7]     MOVE            3 0
        10      [7]     CALL            1 3 1
        11      [8]     GETTABUP        1 0 -1  ; _ENV "self"
        12      [8]     GETTABLE        1 1 -6  ; "proxies"
        13      [8]     LEN             1 1
        14      [8]     GETTABUP        2 0 -7  ; _ENV "MAX_ITEMS_PER_QUADTREE_NODE"
        15      [8]     LE              0 2 1
        16      [8]     JMP             0 76    ; to 93
        17      [9]     GETTABUP        1 0 -1  ; _ENV "self"
        18      [9]     SELF            1 1 -8  ; "split"
        19      [9]     CALL            1 2 1
        20      [10]    JMP             0 72    ; to 93
        21      [12]    GETTABLE        1 0 -9  ; "bbox"
        22      [12]    GETTABLE        1 1 -10 ; "left_top"
        23      [12]    GETTABLE        1 1 -11 ; "y"
        24      [12]    GETTABUP        2 0 -1  ; _ENV "self"
        25      [12]    GETTABLE        2 2 -12 ; "center"
        26      [12]    GETTABLE        2 2 -11 ; "y"
        27      [12]    LT              0 1 2
        28      [12]    JMP             0 28    ; to 57
        29      [13]    GETTABLE        1 0 -9  ; "bbox"
        30      [13]    GETTABLE        1 1 -10 ; "left_top"
        31      [13]    GETTABLE        1 1 -13 ; "x"
        32      [13]    GETTABUP        2 0 -1  ; _ENV "self"
        33      [13]    GETTABLE        2 2 -12 ; "center"
        34      [13]    GETTABLE        2 2 -13 ; "x"
        35      [13]    LT              0 1 2
        36      [13]    JMP             0 6     ; to 43
        37      [14]    GETTABUP        1 0 -1  ; _ENV "self"
        38      [14]    GETTABLE        1 1 -2  ; "children"
        39      [14]    GETTABLE        1 1 -14 ; 1
        40      [14]    SELF            1 1 -15 ; "add_proxy"
        41      [14]    MOVE            3 0
        42      [14]    CALL            1 3 1
        43      [16]    GETTABLE        1 0 -9  ; "bbox"
        44      [16]    GETTABLE        1 1 -16 ; "right_bottom"
        45      [16]    GETTABLE        1 1 -13 ; "x"
        46      [16]    GETTABUP        2 0 -1  ; _ENV "self"
        47      [16]    GETTABLE        2 2 -12 ; "center"
        48      [16]    GETTABLE        2 2 -13 ; "x"
        49      [16]    LT              0 2 1
        50      [16]    JMP             0 6     ; to 57
        51      [17]    GETTABUP        1 0 -1  ; _ENV "self"
        52      [17]    GETTABLE        1 1 -2  ; "children"
        53      [17]    GETTABLE        1 1 -17 ; 2
        54      [17]    SELF            1 1 -15 ; "add_proxy"
        55      [17]    MOVE            3 0
        56      [17]    CALL            1 3 1
        57      [20]    GETTABLE        1 0 -9  ; "bbox"
        58      [20]    GETTABLE        1 1 -16 ; "right_bottom"
        59      [20]    GETTABLE        1 1 -11 ; "y"
        60      [20]    GETTABUP        2 0 -1  ; _ENV "self"
        61      [20]    GETTABLE        2 2 -12 ; "center"
        62      [20]    GETTABLE        2 2 -11 ; "y"
        63      [20]    LT              0 2 1
        64      [20]    JMP             0 28    ; to 93
        65      [21]    GETTABLE        1 0 -9  ; "bbox"
        66      [21]    GETTABLE        1 1 -10 ; "left_top"
        67      [21]    GETTABLE        1 1 -13 ; "x"
        68      [21]    GETTABUP        2 0 -1  ; _ENV "self"
        69      [21]    GETTABLE        2 2 -12 ; "center"
        70      [21]    GETTABLE        2 2 -13 ; "x"
        71      [21]    LT              0 1 2
        72      [21]    JMP             0 6     ; to 79
        73      [22]    GETTABUP        1 0 -1  ; _ENV "self"
        74      [22]    GETTABLE        1 1 -2  ; "children"
        75      [22]    GETTABLE        1 1 -18 ; 3
        76      [22]    SELF            1 1 -15 ; "add_proxy"
        77      [22]    MOVE            3 0
        78      [22]    CALL            1 3 1
        79      [24]    GETTABLE        1 0 -9  ; "bbox"
        80      [24]    GETTABLE        1 1 -16 ; "right_bottom"
        81      [24]    GETTABLE        1 1 -13 ; "x"
        82      [24]    GETTABUP        2 0 -1  ; _ENV "self"
        83      [24]    GETTABLE        2 2 -12 ; "center"
        84      [24]    GETTABLE        2 2 -13 ; "x"
        85      [24]    LT              0 2 1
        86      [24]    JMP             0 6     ; to 93
        87      [25]    GETTABUP        1 0 -1  ; _ENV "self"
        88      [25]    GETTABLE        1 1 -2  ; "children"
        89      [25]    GETTABLE        1 1 -19 ; 4
        90      [25]    SELF            1 1 -15 ; "add_proxy"
        91      [25]    MOVE            3 0
        92      [25]    CALL            1 3 1
        93      [29]    RETURN          0 1

(more)

6 years ago
(updated 6 years ago)

which is pretty damn expensive, compared to

function qtn:add_proxy2 (proxy)
    local children = self.children
    if not children then
        -- print ("self: " .. tostring(self))
        -- print ("proxies: " .. tostring(self.proxies))
        table.insert(self.proxies, proxy)
        if #self.proxies >= MAX_ITEMS_PER_QUADTREE_NODE then
            self:split()
        end
    else
        bbox, center = proxy.bbox, cself.center
        local cx, cy = center[1], center[2]
        if bbox[2] < cy then
            if bbox[1] < cx then
                children[1]:add_proxy(proxy)
            end
            if bbox[3] > cx then
                children[2]:add_proxy(proxy)
            end
        end
        if bbox[4] > cy[2] then
            if bbox[1] < cx then
                children[3]:add_proxy(proxy)
            end
            if bbox[3] > cx then
                children[4]:add_proxy(proxy)
            end
        end
    end
end

which compiles down to 67 much cheaper instructions:

function <test.lua:31,60> (67 instructions at 0x7fffcc875380)
2 params, 8 slots, 1 upvalue, 5 locals, 14 constants, 0 functions
        1       [32]    GETTABLE        2 0 -1  ; "children"
        2       [33]    TEST            2 1
        3       [33]    JMP             0 13    ; to 17
        4       [36]    GETTABUP        3 0 -2  ; _ENV "table"
        5       [36]    GETTABLE        3 3 -3  ; "insert"
        6       [36]    GETTABLE        4 0 -4  ; "proxies"
        7       [36]    MOVE            5 1
        8       [36]    CALL            3 3 1
        9       [37]    GETTABLE        3 0 -4  ; "proxies"
        10      [37]    LEN             3 3
        11      [37]    GETTABUP        4 0 -5  ; _ENV "MAX_ITEMS_PER_QUADTREE_NODE"
        12      [37]    LE              0 4 3
        13      [37]    JMP             0 53    ; to 67
        14      [38]    SELF            3 0 -6  ; "split"
        15      [38]    CALL            3 2 1
        16      [39]    JMP             0 50    ; to 67
        17      [41]    GETTABLE        3 1 -7  ; "bbox"
        18      [41]    GETTABUP        4 0 -9  ; _ENV "cself"
        19      [41]    GETTABLE        4 4 -8  ; "center"
        20      [41]    SETTABUP        0 -8 4  ; _ENV "center"
        21      [41]    SETTABUP        0 -7 3  ; _ENV "bbox"
        22      [42]    GETTABUP        3 0 -8  ; _ENV "center"
        23      [42]    GETTABLE        3 3 -10 ; 1
        24      [42]    GETTABUP        4 0 -8  ; _ENV "center"
        25      [42]    GETTABLE        4 4 -11 ; 2
        26      [43]    GETTABUP        5 0 -7  ; _ENV "bbox"
        27      [43]    GETTABLE        5 5 -11 ; 2
        28      [43]    LT              0 5 4
        29      [43]    JMP             0 16    ; to 46
        30      [44]    GETTABUP        5 0 -7  ; _ENV "bbox"
        31      [44]    GETTABLE        5 5 -10 ; 1
        32      [44]    LT              0 5 3
        33      [44]    JMP             0 4     ; to 38
        34      [45]    GETTABLE        5 2 -10 ; 1
        35      [45]    SELF            5 5 -12 ; "add_proxy"
        36      [45]    MOVE            7 1
        37      [45]    CALL            5 3 1
        38      [47]    GETTABUP        5 0 -7  ; _ENV "bbox"
        39      [47]    GETTABLE        5 5 -13 ; 3
        40      [47]    LT              0 3 5
        41      [47]    JMP             0 4     ; to 46
        42      [48]    GETTABLE        5 2 -11 ; 2
        43      [48]    SELF            5 5 -12 ; "add_proxy"
        44      [48]    MOVE            7 1
        45      [48]    CALL            5 3 1
        46      [51]    GETTABUP        5 0 -7  ; _ENV "bbox"
        47      [51]    GETTABLE        5 5 -14 ; 4
        48      [51]    GETTABLE        6 4 -11 ; 2
        49      [51]    LT              0 6 5
        50      [51]    JMP             0 16    ; to 67
        51      [52]    GETTABUP        5 0 -7  ; _ENV "bbox"
        52      [52]    GETTABLE        5 5 -10 ; 1
        53      [52]    LT              0 5 3
        54      [52]    JMP             0 4     ; to 59
        55      [53]    GETTABLE        5 2 -13 ; 3
        56      [53]    SELF            5 5 -12 ; "add_proxy"
        57      [53]    MOVE            7 1
        58      [53]    CALL            5 3 1
        59      [55]    GETTABUP        5 0 -7  ; _ENV "bbox"
        60      [55]    GETTABLE        5 5 -13 ; 3
        61      [55]    LT              0 3 5
        62      [55]    JMP             0 4     ; to 67
        63      [56]    GETTABLE        5 2 -14 ; 4
        64      [56]    SELF            5 5 -12 ; "add_proxy"
        65      [56]    MOVE            7 1
        66      [56]    CALL            5 3 1
        67      [60]    RETURN          0 1

(more)

6 years ago
(updated 6 years ago)

rebuild_metatables and rebuild_index could both benefit from captures.

split would gain hugely from the simpler bbox representation and some captures,

you can get a 700% speed increase in check_proxy_collision real easily.

And all the above logic applies to quadtree expand, but you might also want to consider:

    local new_children = {}   -- generates:         1       [4]     NEWTABLE        0 0 0
    new_children[1] = { 1, 2, 3, 4 }
    new_children[2] = { 3, 4, 5, 6 }
    new_children[3] = { 5, 6, 7, 8 }
    new_children[4] = { 7, 8, 9, 10 }

is significantly more expensive than

    local new_children = { nil, nil, nil, nil }
    new_children[1] = { 1, 2, 3, 4 }
    new_children[2] = { 3, 4, 5, 6 }
    new_children[3] = { 5, 6, 7, 8 }
    new_children[4] = { 7, 8, 9, 10 }

the second costs 2 more instructions, which pre-allocate the new_children array. That allocation is going to happen anyway, but it's going to happy four times in your version, instead of once in the new one.

Note that 'self' is a special case, lua knows that it refers to the current object so you don't need to do the pythonism of

    local x = self.x

-Oliver

New response