Ghost Scanner

4 years ago

(updated 4 years ago)

Hi,
I just had to write a similar script for my new "Market - AutoBuy" mod and taking a look at your mod was very helpful.
But i was able to shave off 50% performance impact with a few tweaks:
- I merged most of the find_entities_filtered calls into one that searches for all of the types
- I created a local database to avoid items_to_place_this calls (made a huge difference)
I also iterated over construction robots in the cell to subtract their contents from the requests
Regards, Ownly :)

  local items_to_place_this = {}

  function get_items_to_place_this(prototype)
    items_to_place_this[prototype.name] = prototype.items_to_place_this
    return items_to_place_this[prototype.name]
  end

  function get_cell_requests(cell, force)
    local result_limit = 500
    local pos = cell.owner.position
    local r = cell.construction_radius
    local requested_items = {}

    if r > 0 then
      local bounds = { { pos.x - r, pos.y - r, }, { pos.x + r, pos.y + r } }

      local entities = cell.owner.surface.find_entities_filtered{area=bounds, limit=result_limit, to_be_upgraded=true, force=force}
      for _, e in pairs(entities) do
        local upgrade_prototype = e.get_upgrade_target()
        if upgrade_prototype then
          for _, item_stack in pairs(items_to_place_this[upgrade_prototype.name] or get_items_to_place_this(upgrade_prototype)) do
            local name = item_stack.name
            requested_items[name] = (requested_items[name] or 0) + item_stack.count
          end
        end
      end
      local entities = cell.owner.surface.find_entities_filtered{area=bounds, limit=result_limit, type={"entity-ghost","item-request-proxy","tile-ghost","construction-robot"}, force=force}
      for _, e in pairs(entities) do
        local type = e.type
        if type == "entity-ghost" then
          for _, item_stack in pairs(items_to_place_this[e.ghost_name] or get_items_to_place_this(e.ghost_prototype)) do
            local name = item_stack.name
            requested_items[name] = (requested_items[name] or 0) + item_stack.count
          end
          for name, count in pairs(e.item_requests) do
            requested_items[name] = (requested_items[name] or 0) + count
          end
        elseif type == "item-request-proxy" then
          for request_item, count in pairs(e.item_requests) do
            requested_items[name] = (requested_items[name] or 0) + count
          end
        elseif type == "tile-ghost" then
          for _, item_stack in pairs(items_to_place_this[e.ghost_name] or get_items_to_place_this(e.ghost_prototype)) do
            local name = item_stack.name
            requested_items[name] = (requested_items[name] or 0) + item_stack.count
          end
        elseif type == "construction-robot" then
          for name, count in pairs(e.get_inventory(defines.inventory.robot_cargo).get_contents()) do
            requested_items[name] = (requested_items[name] or 0) - count
          end
        end
      end
    end
    for a,b in pairs(requested_items) do
      if b <= 0 then
        requested_items[a] = nil
      end
    end
    return requested_items
  end

Optera

4 years ago

(updated 4 years ago)

How big a difference are we talking here for each change?
When I ran the original benchmarks, calling find_entities_filtered for each type was 0.008ms faster.
items_to_place_this should be a fairly fast lookup, but caching probably wouldn't hurt.

OwnlyMe

4 years ago

(updated 4 years ago)

merging the types gave me up to (at best) 10% better performance, but not accessing the prototypes every time was almost another 40%, so instead of 0.140+ ups i was running at 0.080-0.090 (including the other script load of my mod)
i think the game handles most of it's data like metatables, so as soon as you access something from the game it needs to generate that information for you, that's why i store as much as possible in local variables, even if it's just accessed twice

Optera

4 years ago

(updated 4 years ago)

I'd be interested in how you measured those improvements.
On a test map with ~3k ghosts average time/tick is reduced from 0.145 to 0.139 (evaluation in function) or 0.134 (inline evaluation as in your snippet). In percent that's a 4.5% - 7.6% improvement. Still good but nowhere near 40%.

Sidenote:
You should run into desyncs with a local lookup like that.
New clients with an empty lookup table would run a different code than clients with filled tables.

Sidenote 2:
When finding all entities at once you are unable to control exactly which types are processed first.
Either you have to search without limit and limit in lua which will be slower, or search with limit and might get a request list containing modules but not the entity to put them in.

OwnlyMe

4 years ago

i'm not sure, but i thought it only desynced when the game states didn't match anymore, which they shouldn't - only that new clients need longer to process... but i might test it later.
So you were not getting a big improvement from caching the prototypes? i'm interested to see how your new code looks like, because that seems weird

Optera

4 years ago

(updated 4 years ago)

Caching items_to_place_this is already merged to master on github.
I ended up using a get and set function, marginally slower than inline or evaluation but a lot more readable.

PS: Would be nice if you mentioned me when taking inspiration from/copying my code.

OwnlyMe

4 years ago

(updated 4 years ago)

sure, i added you and your mod to the credits
i'll take a look at your git page, but i tested it once more with a copy of your code vs my code and it's almost halved with a destroyed mining site and 6 calls per second (0.340 vs 0.190) -0.017 from the rest of the code
(this is the slow version of my code:)

function get_cell_requests(cell, force)
    local result_limit = 500
    local pos = cell.owner.position
    local r = cell.construction_radius
    local requested_items = {}
    local found_entities = {}
    function add_signal(name, count)
        requested_items[name] = (requested_items[name] or 0) +count
    end

    if r > 0 then
      local bounds = { { pos.x - r, pos.y - r, }, { pos.x + r, pos.y + r } }
      -- finding each entity by itself is slightly (0.008ms) faster than finding all and selecting later

      -- upgrade requests (requires 0.17.69)
      local entities = cell.owner.surface.find_entities_filtered{area=bounds, limit=result_limit, to_be_upgraded=true, force=force}
      local count_unique_entities = 0
      for _, e in pairs(entities) do
        local uid = e.unit_number
        local upgrade_prototype = e.get_upgrade_target()
        if not found_entities[uid] and upgrade_prototype then
          found_entities[uid] = true

          for _, item_stack in pairs(upgrade_prototype.items_to_place_this) do
            add_signal(item_stack.name, item_stack.count)
            count_unique_entities = count_unique_entities + item_stack.count
          end
        end
      end
      -- log("found "..tostring(count_unique_entities).."/"..tostring(result_limit).." upgrade requests." )
      if MaxResults then
        result_limit = result_limit - count_unique_entities
        if result_limit <= 0 then return requested_items end
      end

      -- entity-ghost knows items_to_place_this and item_requests (modules)
      local entities = cell.owner.surface.find_entities_filtered{area=bounds, limit=result_limit, type="entity-ghost", force=force}
      local count_unique_entities = 0
      for _, e in pairs(entities) do
        local uid = e.unit_number
        if not found_entities[uid] then
          found_entities[uid] = true

          for _, item_stack in pairs(e.ghost_prototype.items_to_place_this) do
            add_signal(item_stack.name, item_stack.count)
            count_unique_entities = count_unique_entities + item_stack.count
          end

          for request_item, count in pairs(e.item_requests) do
            add_signal(request_item, count)
            count_unique_entities = count_unique_entities + count
          end
        end
      end
      -- log("found "..tostring(count_unique_entities).."/"..tostring(result_limit).." ghosts." )
      if MaxResults then
        result_limit = result_limit - count_unique_entities
        if result_limit <= 0 then return requested_items end
      end

      -- item-request-proxy holds item_requests (modules) for built entities
      local entities = cell.owner.surface.find_entities_filtered{area=bounds, limit=result_limit, type="item-request-proxy", force=force}
      local count_unique_entities = 0
      for _, e in pairs(entities) do
        local uid = e.proxy_target.unit_number
        if not found_entities[uid] then
          found_entities[uid] = true
          for request_item, count in pairs(e.item_requests) do
            add_signal(request_item, count)
            count_unique_entities = count_unique_entities + count
          end
        end
      end
      -- log("found "..tostring(count_unique_entities).."/"..tostring(result_limit).." request proxies." )
      if MaxResults then
        result_limit = result_limit - count_unique_entities
        if result_limit <= 0 then return requested_items end
      end

      -- tile-ghost knows only items_to_place_this
      local entities = cell.owner.surface.find_entities_filtered{area=bounds, limit=result_limit, type="tile-ghost", force=force}
      local count_unique_entities = 0
      for _, e in pairs(entities) do
        local uid = e.unit_number
        if not found_entities[uid] then
          found_entities[uid] = true

          -- add_signal(next(e.ghost_prototype.items_to_place_this), 1)
          for _, item_stack in pairs(e.ghost_prototype.items_to_place_this) do
            add_signal(item_stack.name, item_stack.count)
            count_unique_entities = count_unique_entities + item_stack.count
          end
        end
      end
      -- log("found "..tostring(count_unique_entities).."/"..tostring(result_limit).." tile-ghosts." )
      if MaxResults then
        result_limit = result_limit - count_unique_entities
        if result_limit <= 0 then return requested_items end
      end

    end
    return requested_items
end

Optera

4 years ago

I still don't get your test procedure.
Mine consists of a save with ghosts and upgrades ran with -benchmark option against different versions of Ghost Scanner.
Here's the results in docs: https://docs.google.com/spreadsheets/d/1nDwTh1nlD6fsOlhRz2xXU8oUOOA0LcRQPMRPCIORJUc/edit?usp=sharing

OwnlyMe

4 years ago

(updated 4 years ago)

you're still acessing the prototype every time, even when you're just checking for it's name.
if you change

get_items_to_place(e.ghost_prototype)

to

global.Lookup_items_to_place_this[e.ghost_name] or get_items_to_place(e.ghost_prototype)

you'll already see a huge improvement

Optera

4 years ago

(updated 4 years ago)

Like i said while inline or is faster it's not maintainable.
I write my code so even in several years I can instantly get what it does without thinking in depth on how I might have thought and optimized back then.

OwnlyMe

4 years ago

the difference is that i query with the ghost_name instead of ghost_prototype.name

Optera

4 years ago

(updated 4 years ago)

The difference is marginal.
If i recall correctly from chatting with rseding, accessing name and type properties don't need to do a full object creation.
My benchmarks seem to confirm that as well.

OwnlyMe

4 years ago

try it yourself, i'm getting a difference of 0.180 vs 0.240 from just changing it for entity-ghosts and tile-ghosts

Optera

4 years ago

I did, compare inline vs get/set function.

OwnlyMe

4 years ago

even when i'm putting it in a function i still get a big difference

local function get_items_to_place(prototype_name) -- slower than inline check, but much more readable
  return global.Lookup_items_to_place_this[prototype_name] or add_items_to_place(game.entity_prototypes[prototype_name])
end

and passing e.ghost_name for tile-ghosts and entity-ghosts and upgrade_prototype.name for the to_be_upgraded part

Optera

4 years ago

(updated 4 years ago)

Ok now I understand.
Seems like I used e.ghost_prototype.name in my benchmarks.
Switching to e.ghost_name the difference is indeed huge 0.109 with inline evaluation and 0.112 with get/set functions.

OwnlyMe

4 years ago

(updated 4 years ago)

yep.
the rest of my performance gains are hard to isolate, probably a combination of everything.. but there's still quite some performance to gain (even with global variables) (0.240 vs 0.400 with a huge lot of ghosts)
with the code i first sent you
but i'll try to list the changes and each performance increase

Optera

4 years ago

(updated 4 years ago)

Like I said before, gains from running find_entities_filtered only once will prove detrimental if you have to find 100k entities first and then filter down to 1k in lua compared to simply skipping following find_entities after reaching result_limit.

Tweaks to use ghost_name are in master.

OwnlyMe

4 years ago

(updated 4 years ago)

skipping the whole unique part yielded me 0.080 ups (from 450)
making the item_to_place this lookup inline yielded at best 0.005 ups
counting up the items inline yielded at best another 0.005 ups
not counting the items: 0.010 ups
2 find_entities_filtered calls instead of 4: 0.060 ups

but i'm still not sure how necessary that unique part is, when i destroyed my miners that were all equipped with modules it didn't find a single dublicate.. maybe because i'm just using 1 cell
if it's purpose is just for counting the results you could just use "#entities"

Optera

4 years ago

(updated 4 years ago)

By unique part you mean found_entities[uid] ?
How could you correctly count requests without making sure everything is only counted once?
When finding ghosts in each logistics cell, aka roboport coverage area, there are bound to be overlaps. Counting 20 nuclear reactors because 20 roboports see that one ghost would be quite silly.

OwnlyMe

4 years ago

i see, then i guess there's not much more to optimize

Optera

4 years ago

Then I'll release that and call it a day. thanks for the help.

OwnlyMe

4 years ago

you're welcome and thanks for the inspiration :)

g Optimizations

New response