Large total conversion mods.
Wife and I have a local lan game we've been working on for about a month. After taking subterra for a spin in a sandbox I tried adding it to my local copy of the game. It appears to get stuck in the initial migration. Maybe because of the amount of revealed map it has to populate?
I copied it on the Linux VM on my i7 NUC that hosts the lan game, and had the same problem.
A quick watch of top tells me that without subterra, factorio is using 737MB of ram. But if it triggers the subterra migration, it rapidly consumes all 4G of ram the machine has.
Some lua feedback:
function remove_value(t, value)
local index
for i, v in pairs(t) do
if value == v then
index = i
break
end
end
if index then
table.remove(t, index)
return true
end
return false
end
is cleaner as:
function remove_value(t, value)
for i, v in pairs(t) do
if value == v then
table.remove(t, i)
return true
end
end
return false
end
and
for chunk in surface.get_chunks() do
minx = math.min(minx, chunk.x)
miny = math.min(miny, chunk.y)
maxx = math.max(maxx, chunk.x)
maxy = math.max(maxy, chunk.y)
end
is about 120x slower (notr 1.2x) than:
local min, max = math.min, math.max
for chunk in surface.get_chunks() do
minx = min(minx, chunk.x)
miny = min(miny, chunk.y)
maxx = max(maxx, chunk.x)
maxy = max(maxy, chunk.y)
end
if counts[n] then
counts[n] = counts[n] + 1
events[n][counts[n]] = event.callback
else
counts[n] = 1
events[n] = {}
events[n][1] = event.callback
end
could be
local count = counts[n] or 0
if count == 0 then
count = 1
events[n] = {}
else
count = count + 1
end
counts[n] = count
events[n][count] = event.callback
Capture the 'script.on_nth_tick' here, and possibly counts[n]:
for n, callbacks in pairs(events) do
if counts[n] == 1 then
script.on_nth_tick(n, callbacks[1])
print("Wired nth tick event for n = " .. n)
elseif counts[n] > 1 then
script.on_nth_tick(n,
function(event)
for _, callback in pairs(callbacks) do
callback(event)
end
end)
end
end
Is the double insertion of the event callback here intentional? If so, a comment might be a good idea:
table.insert(events[id], event.callback)
table.insert(events[id], event.callback)
counts[id] = counts[id] + 1
Is the "bounding box" object your own or Factorios? It's a kind of expensive representation :(
bb = {
left_top = { x = ..., y = ... },
right_bottom = { x = ..., y = ... },
}
either
bb = { ltx, lty, rbx, rby }
or at least
bb = { ltx = ..., lty = ..., rbx = ..., rby = ... }
would be significantly more efficient in terms of memory and cache coherency.
function QuadtreeNode:new_root (new_children)
local left = new_children[1].extents.left_top.x
local top = new_children[1].extents.left_top.y
local right = new_children[4].extents.right_bottom.x
local bottom = new_children[4].extents.right_bottom.y
OUCH! If we take the following:
qtn = { __index = qtn }
function qtn:new_root (new_children)
local left = new_children[1].extents.left_top.x
local top = new_children[1].extents.left_top.y
local right = new_children[4].extents.right_bottom.x
local bottom = new_children[4].extents.right_bottom.y
return left + top + right + bottom -- stop compiler being too clever
end
function qtn:alt_root (new_children)
local slice = table.unpack
local left, top = slice(new_children[1].extents, 1, 2)
local right, bottom = slice(new_children[4].extents, 3, 4)
return left + top + right + bottom
end
function qtn:alt_root2 (new_children)
local lt, rb = new_children[1].extents, new_children[4].extents
local left, top, right, bottom = lt.ltx, lt.lty, rb.rbx, rb.rby
return left + top + right + bottom
end
and compile it with "luac -l -l -s -v" we can see that each of the above has different costs, with the first having the most and the most expensive:
2 params, 7 slots, 0 upvalues, 6 locals, 7 constants, 0 functions
1 [4] GETTABLE 2 1 -1 ; 1
2 [4] GETTABLE 2 2 -2 ; "extents"
3 [4] GETTABLE 2 2 -3 ; "left_top"
4 [4] GETTABLE 2 2 -4 ; "x"
5 [5] GETTABLE 3 1 -1 ; 1
6 [5] GETTABLE 3 3 -2 ; "extents"
7 [5] GETTABLE 3 3 -3 ; "left_top"
8 [5] GETTABLE 3 3 -5 ; "y"
9 [6] GETTABLE 4 1 -6 ; 4
10 [6] GETTABLE 4 4 -2 ; "extents"
11 [6] GETTABLE 4 4 -7 ; "right_bottom"
12 [6] GETTABLE 4 4 -4 ; "x"
13 [7] GETTABLE 5 1 -6 ; 4
14 [7] GETTABLE 5 5 -2 ; "extents"
15 [7] GETTABLE 5 5 -7 ; "right_bottom"
16 [7] GETTABLE 5 5 -5 ; "y"
17 [9] ADD 6 2 3
18 [9] ADD 6 6 4
19 [9] ADD 6 6 5
20 [9] RETURN 6 2
21 [10] RETURN 0 1
---
function <test.lua:12,18> (19 instructions at 0x7fffc6a0d6c0)
2 params, 9 slots, 1 upvalue, 7 locals, 7 constants, 0 functions
1 [13] GETTABUP 2 0 -1 ; _ENV "table"
2 [13] GETTABLE 2 2 -2 ; "unpack"
3 [14] MOVE 3 2
4 [14] GETTABLE 4 1 -3 ; 1
5 [14] GETTABLE 4 4 -4 ; "extents"
6 [14] LOADK 5 -3 ; 1
7 [14] LOADK 6 -5 ; 2
8 [14] CALL 3 4 3
9 [15] MOVE 5 2
10 [15] GETTABLE 6 1 -6 ; 4
11 [15] GETTABLE 6 6 -4 ; "extents"
12 [15] LOADK 7 -7 ; 3
13 [15] LOADK 8 -6 ; 4
14 [15] CALL 5 4 3
15 [17] ADD 7 3 4
16 [17] ADD 7 7 5
17 [17] ADD 7 7 6
18 [17] RETURN 7 2
19 [18] RETURN 0 1
---
function <test.lua:20,25> (13 instructions at 0x7fffc6a0e410)
2 params, 9 slots, 0 upvalues, 8 locals, 7 constants, 0 functions
1 [21] GETTABLE 2 1 -1 ; 1
2 [21] GETTABLE 2 2 -2 ; "extents"
3 [21] GETTABLE 3 1 -3 ; 4
4 [21] GETTABLE 3 3 -2 ; "extents"
5 [22] GETTABLE 4 2 -4 ; "ltx"
6 [22] GETTABLE 5 2 -5 ; "lty"
7 [22] GETTABLE 6 3 -6 ; "rbx"
8 [22] GETTABLE 7 3 -7 ; "rby"
9 [24] ADD 8 4 5
10 [24] ADD 8 8 6
11 [24] ADD 8 8 7
12 [24] RETURN 8 2
13 [25] RETURN 0 1
(more)
Next add_proxy:
function QuadtreeNode:add_proxy (proxy)
if self.children == nil then
-- print ("self: " .. tostring(self))
-- print ("proxies: " .. tostring(self.proxies))
table.insert(self.proxies, proxy)
if #self.proxies >= MAX_ITEMS_PER_QUADTREE_NODE then
self:split()
end
else
if proxy.bbox.left_top.y < self.center.y then
if proxy.bbox.left_top.x < self.center.x then
self.children[1]:add_proxy(proxy)
end
if proxy.bbox.right_bottom.x > self.center.x then
self.children[2]:add_proxy(proxy)
end
end
if proxy.bbox.right_bottom.y > self.center.y then
if proxy.bbox.left_top.x < self.center.x then
self.children[3]:add_proxy(proxy)
end
if proxy.bbox.right_bottom.x > self.center.x then
self.children[4]:add_proxy(proxy)
end
end
end
end
This compiles as:
function <test.lua:3,29> (93 instructions at 0x7fffcc873a80)
1 param, 4 slots, 1 upvalue, 1 local, 19 constants, 0 functions
1 [4] GETTABUP 1 0 -1 ; _ENV "self"
2 [4] GETTABLE 1 1 -2 ; "children"
3 [4] EQ 0 1 -3 ; - nil
4 [4] JMP 0 16 ; to 21
5 [7] GETTABUP 1 0 -4 ; _ENV "table"
6 [7] GETTABLE 1 1 -5 ; "insert"
7 [7] GETTABUP 2 0 -1 ; _ENV "self"
8 [7] GETTABLE 2 2 -6 ; "proxies"
9 [7] MOVE 3 0
10 [7] CALL 1 3 1
11 [8] GETTABUP 1 0 -1 ; _ENV "self"
12 [8] GETTABLE 1 1 -6 ; "proxies"
13 [8] LEN 1 1
14 [8] GETTABUP 2 0 -7 ; _ENV "MAX_ITEMS_PER_QUADTREE_NODE"
15 [8] LE 0 2 1
16 [8] JMP 0 76 ; to 93
17 [9] GETTABUP 1 0 -1 ; _ENV "self"
18 [9] SELF 1 1 -8 ; "split"
19 [9] CALL 1 2 1
20 [10] JMP 0 72 ; to 93
21 [12] GETTABLE 1 0 -9 ; "bbox"
22 [12] GETTABLE 1 1 -10 ; "left_top"
23 [12] GETTABLE 1 1 -11 ; "y"
24 [12] GETTABUP 2 0 -1 ; _ENV "self"
25 [12] GETTABLE 2 2 -12 ; "center"
26 [12] GETTABLE 2 2 -11 ; "y"
27 [12] LT 0 1 2
28 [12] JMP 0 28 ; to 57
29 [13] GETTABLE 1 0 -9 ; "bbox"
30 [13] GETTABLE 1 1 -10 ; "left_top"
31 [13] GETTABLE 1 1 -13 ; "x"
32 [13] GETTABUP 2 0 -1 ; _ENV "self"
33 [13] GETTABLE 2 2 -12 ; "center"
34 [13] GETTABLE 2 2 -13 ; "x"
35 [13] LT 0 1 2
36 [13] JMP 0 6 ; to 43
37 [14] GETTABUP 1 0 -1 ; _ENV "self"
38 [14] GETTABLE 1 1 -2 ; "children"
39 [14] GETTABLE 1 1 -14 ; 1
40 [14] SELF 1 1 -15 ; "add_proxy"
41 [14] MOVE 3 0
42 [14] CALL 1 3 1
43 [16] GETTABLE 1 0 -9 ; "bbox"
44 [16] GETTABLE 1 1 -16 ; "right_bottom"
45 [16] GETTABLE 1 1 -13 ; "x"
46 [16] GETTABUP 2 0 -1 ; _ENV "self"
47 [16] GETTABLE 2 2 -12 ; "center"
48 [16] GETTABLE 2 2 -13 ; "x"
49 [16] LT 0 2 1
50 [16] JMP 0 6 ; to 57
51 [17] GETTABUP 1 0 -1 ; _ENV "self"
52 [17] GETTABLE 1 1 -2 ; "children"
53 [17] GETTABLE 1 1 -17 ; 2
54 [17] SELF 1 1 -15 ; "add_proxy"
55 [17] MOVE 3 0
56 [17] CALL 1 3 1
57 [20] GETTABLE 1 0 -9 ; "bbox"
58 [20] GETTABLE 1 1 -16 ; "right_bottom"
59 [20] GETTABLE 1 1 -11 ; "y"
60 [20] GETTABUP 2 0 -1 ; _ENV "self"
61 [20] GETTABLE 2 2 -12 ; "center"
62 [20] GETTABLE 2 2 -11 ; "y"
63 [20] LT 0 2 1
64 [20] JMP 0 28 ; to 93
65 [21] GETTABLE 1 0 -9 ; "bbox"
66 [21] GETTABLE 1 1 -10 ; "left_top"
67 [21] GETTABLE 1 1 -13 ; "x"
68 [21] GETTABUP 2 0 -1 ; _ENV "self"
69 [21] GETTABLE 2 2 -12 ; "center"
70 [21] GETTABLE 2 2 -13 ; "x"
71 [21] LT 0 1 2
72 [21] JMP 0 6 ; to 79
73 [22] GETTABUP 1 0 -1 ; _ENV "self"
74 [22] GETTABLE 1 1 -2 ; "children"
75 [22] GETTABLE 1 1 -18 ; 3
76 [22] SELF 1 1 -15 ; "add_proxy"
77 [22] MOVE 3 0
78 [22] CALL 1 3 1
79 [24] GETTABLE 1 0 -9 ; "bbox"
80 [24] GETTABLE 1 1 -16 ; "right_bottom"
81 [24] GETTABLE 1 1 -13 ; "x"
82 [24] GETTABUP 2 0 -1 ; _ENV "self"
83 [24] GETTABLE 2 2 -12 ; "center"
84 [24] GETTABLE 2 2 -13 ; "x"
85 [24] LT 0 2 1
86 [24] JMP 0 6 ; to 93
87 [25] GETTABUP 1 0 -1 ; _ENV "self"
88 [25] GETTABLE 1 1 -2 ; "children"
89 [25] GETTABLE 1 1 -19 ; 4
90 [25] SELF 1 1 -15 ; "add_proxy"
91 [25] MOVE 3 0
92 [25] CALL 1 3 1
93 [29] RETURN 0 1
(more)
which is pretty damn expensive, compared to
function qtn:add_proxy2 (proxy)
local children = self.children
if not children then
-- print ("self: " .. tostring(self))
-- print ("proxies: " .. tostring(self.proxies))
table.insert(self.proxies, proxy)
if #self.proxies >= MAX_ITEMS_PER_QUADTREE_NODE then
self:split()
end
else
bbox, center = proxy.bbox, cself.center
local cx, cy = center[1], center[2]
if bbox[2] < cy then
if bbox[1] < cx then
children[1]:add_proxy(proxy)
end
if bbox[3] > cx then
children[2]:add_proxy(proxy)
end
end
if bbox[4] > cy[2] then
if bbox[1] < cx then
children[3]:add_proxy(proxy)
end
if bbox[3] > cx then
children[4]:add_proxy(proxy)
end
end
end
end
which compiles down to 67 much cheaper instructions:
function <test.lua:31,60> (67 instructions at 0x7fffcc875380)
2 params, 8 slots, 1 upvalue, 5 locals, 14 constants, 0 functions
1 [32] GETTABLE 2 0 -1 ; "children"
2 [33] TEST 2 1
3 [33] JMP 0 13 ; to 17
4 [36] GETTABUP 3 0 -2 ; _ENV "table"
5 [36] GETTABLE 3 3 -3 ; "insert"
6 [36] GETTABLE 4 0 -4 ; "proxies"
7 [36] MOVE 5 1
8 [36] CALL 3 3 1
9 [37] GETTABLE 3 0 -4 ; "proxies"
10 [37] LEN 3 3
11 [37] GETTABUP 4 0 -5 ; _ENV "MAX_ITEMS_PER_QUADTREE_NODE"
12 [37] LE 0 4 3
13 [37] JMP 0 53 ; to 67
14 [38] SELF 3 0 -6 ; "split"
15 [38] CALL 3 2 1
16 [39] JMP 0 50 ; to 67
17 [41] GETTABLE 3 1 -7 ; "bbox"
18 [41] GETTABUP 4 0 -9 ; _ENV "cself"
19 [41] GETTABLE 4 4 -8 ; "center"
20 [41] SETTABUP 0 -8 4 ; _ENV "center"
21 [41] SETTABUP 0 -7 3 ; _ENV "bbox"
22 [42] GETTABUP 3 0 -8 ; _ENV "center"
23 [42] GETTABLE 3 3 -10 ; 1
24 [42] GETTABUP 4 0 -8 ; _ENV "center"
25 [42] GETTABLE 4 4 -11 ; 2
26 [43] GETTABUP 5 0 -7 ; _ENV "bbox"
27 [43] GETTABLE 5 5 -11 ; 2
28 [43] LT 0 5 4
29 [43] JMP 0 16 ; to 46
30 [44] GETTABUP 5 0 -7 ; _ENV "bbox"
31 [44] GETTABLE 5 5 -10 ; 1
32 [44] LT 0 5 3
33 [44] JMP 0 4 ; to 38
34 [45] GETTABLE 5 2 -10 ; 1
35 [45] SELF 5 5 -12 ; "add_proxy"
36 [45] MOVE 7 1
37 [45] CALL 5 3 1
38 [47] GETTABUP 5 0 -7 ; _ENV "bbox"
39 [47] GETTABLE 5 5 -13 ; 3
40 [47] LT 0 3 5
41 [47] JMP 0 4 ; to 46
42 [48] GETTABLE 5 2 -11 ; 2
43 [48] SELF 5 5 -12 ; "add_proxy"
44 [48] MOVE 7 1
45 [48] CALL 5 3 1
46 [51] GETTABUP 5 0 -7 ; _ENV "bbox"
47 [51] GETTABLE 5 5 -14 ; 4
48 [51] GETTABLE 6 4 -11 ; 2
49 [51] LT 0 6 5
50 [51] JMP 0 16 ; to 67
51 [52] GETTABUP 5 0 -7 ; _ENV "bbox"
52 [52] GETTABLE 5 5 -10 ; 1
53 [52] LT 0 5 3
54 [52] JMP 0 4 ; to 59
55 [53] GETTABLE 5 2 -13 ; 3
56 [53] SELF 5 5 -12 ; "add_proxy"
57 [53] MOVE 7 1
58 [53] CALL 5 3 1
59 [55] GETTABUP 5 0 -7 ; _ENV "bbox"
60 [55] GETTABLE 5 5 -13 ; 3
61 [55] LT 0 3 5
62 [55] JMP 0 4 ; to 67
63 [56] GETTABLE 5 2 -14 ; 4
64 [56] SELF 5 5 -12 ; "add_proxy"
65 [56] MOVE 7 1
66 [56] CALL 5 3 1
67 [60] RETURN 0 1
(more)
rebuild_metatables and rebuild_index could both benefit from captures.
split
would gain hugely from the simpler bbox representation and some captures,
you can get a 700% speed increase in check_proxy_collision
real easily.
And all the above logic applies to quadtree expand, but you might also want to consider:
local new_children = {} -- generates: 1 [4] NEWTABLE 0 0 0
new_children[1] = { 1, 2, 3, 4 }
new_children[2] = { 3, 4, 5, 6 }
new_children[3] = { 5, 6, 7, 8 }
new_children[4] = { 7, 8, 9, 10 }
is significantly more expensive than
local new_children = { nil, nil, nil, nil }
new_children[1] = { 1, 2, 3, 4 }
new_children[2] = { 3, 4, 5, 6 }
new_children[3] = { 5, 6, 7, 8 }
new_children[4] = { 7, 8, 9, 10 }
the second costs 2 more instructions, which pre-allocate the new_children array. That allocation is going to happen anyway, but it's going to happy four times in your version, instead of once in the new one.
Note that 'self' is a special case, lua knows that it refers to the current object so you don't need to do the pythonism of
local x = self.x
-Oliver