Contents
- 1 The quiet cost of arrays in PHP
- 2 Why PHP arrays feel so friendly (and why that’s dangerous)
- 3 Pitfall 1: Using arrays for everything
- 4 Pitfall 2: Huge arrays and the illusion of “just memory”
- 5 Pitfall 3: Copy-on-write and the subtle cost of “just passing arrays”
- 6 Pitfall 4: Nested loops and accidental O(n²) nightmares
- 7 Pitfall 5: Sorting, reshaping, and other hidden transforms
- 8 Pitfall 6: “Small for now” arrays in long-lived processes
- 9 Pitfall 7: JSON as a transport for arrays, everywhere
- 10 Reading array-heavy code with a performance eye
- 11 Practical patterns: making peace with arrays
- 12 Pattern 1: Normalize your data early
- 13 Pattern 2: Introduce simple value objects in hot paths
- 14 Pattern 3: Build indexes once, reuse everywhere
- 15 Pattern 4: Prefer generators in pipelines
- 16 Pattern 5: Be honest about array size in logs and metrics
- 17 Pattern 6: Think about arrays when you design APIs
- 18 Teams, careers, and the people behind the arrays
- 19 A quiet checklist for your next piece of array-heavy code
The quiet cost of arrays in PHP
There is a particular kind of silence that only developers know.
It’s 23:47.
The office is empty, or your kitchen is pretending to be one.
The laptop fan breathes a little heavier than usual.
You run a script that should be fast. It isn’t.
You stare at the screen, watch the progress bar crawl, and think:
“But it’s just arrays. How bad can it be?”
Friends, this article is about that moment.
PHP arrays are one of the language’s greatest gifts and biggest traps. They let us prototype quickly, move fast, and model data without ceremony. But used carelessly, they quietly burn CPU, eat memory, and turn “should be fine” code into time bombs in production.
If you’re hiring PHP developers, looking for a serious PHP job, or simply trying to level up your craft, understanding array performance is not an academic exercise. It’s how you write code that still holds up when your traffic chart stops being cute and starts being scary.
Let’s talk about the subtle, human side of PHP array performance: where we cut corners, what it costs us, and how to do better without turning our code into unreadable wizardry.
Why PHP arrays feel so friendly (and why that’s dangerous)
One of the reasons many of us stay loyal to PHP is how forgiving it is.
You can do this:
$user = [
    'id' => 123,
    'name' => 'Anna',
    'tags' => ['admin', 'editor'],
];
and then five minutes later do this:
$user['last_login'] = time();
$user[] = 'unexpected';
PHP shrugs and lets it slide. Indexed array, associative array, mixed types — everything goes into this one flexible structure called an array.
But under the hood, a PHP array is not a simple C-style array. It’s a hash table with a lot of machinery:
- it stores keys and values,
- it keeps insertion order,
- it handles collisions,
- it maintains reference counts,
- it juggles zvals (the internal value type).
That flexibility has a price. It is not a lightweight structure.
And we forget that, because the syntax feels so… innocent.
So the first performance pitfall is psychological: we treat PHP arrays like cheap containers, when in reality they are heavy-duty Swiss army knives.
The day traffic spikes, we discover the bill.
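To make that flexibility concrete, here is a small sketch of what the two snippets above actually leave you with. It only uses the `$user` array from the text; nothing else is assumed.

```php
<?php
// Same $user array as in the text: string keys first, then an append.
$user = [
    'id' => 123,
    'name' => 'Anna',
    'tags' => ['admin', 'editor'],
];
$user['last_login'] = time();
$user[] = 'unexpected'; // no integer keys exist yet, so PHP assigns key 0

// Insertion order is preserved, and string and integer keys live side by side.
$keys = array_keys($user);
print_r($keys);
```

One structure is now acting as a record and as a list at the same time, and PHP needs the hash table machinery to keep both views consistent.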
Pitfall 1: Using arrays for everything
Imagine a mid-sized PHP codebase built over 5–7 years. Different developers, different moods, different deadlines.
You’ll see patterns like:
- $user as an array instead of a value object
- $config as a giant nested array spread across files
- $data arrays passed through 5–6 function layers
- magic indexes like $row or $item that no one remembers
When arrays are used for everything:
- performance degrades subtly (copy-on-write, hash lookups, memory fragmentation)
- maintainability suffers (no type safety, no clear contracts)
- bugs hide in the gaps (wrong keys, missing indexes, silent assumptions)
Performance and readability are not enemies here.
Most of the time, the fix is not “more clever code”, but more honest modeling.
When an array is the wrong tool
If you notice one of these smells, you’re probably misusing arrays:
- You pass the same associative array shape through 10 different functions.
- You need to remember that ['id'], ['user_id'] and ['uid'] are all “the same thing”.
- You keep adding flags like ['is_active'], ['is_verified'], ['status'] to the same structure.
- IDE autocomplete looks like a guessing game.
In these cases, a simple class often outperforms a hash table plus guesswork:
class UserDto
{
    public function __construct(
        public int $id,
        public string $name,
        /** @var string[] */
        public array $tags = [],
    ) {}
}
Besides being clearer and safer, this may reduce array allocations and key hashing, especially in hot paths.
Is it micro-optimization? Sometimes. But once your code runs for thousands of users per second, little decisions accumulate into real numbers.
Pitfall 2: Huge arrays and the illusion of “just memory”
You know that moment in a standup when someone says:
“It’s just 100k items in memory, that should be fine.”
And nobody wants to be the person asking:
“Are we sure about that?”
PHP’s arrays are memory-heavy. Roughly speaking, each element carries overhead for:
- the key (string or integer),
- the value (zval),
- the hash bucket,
- internal pointers.
So when you create a “just 100k items” array, you’re not storing 100k tiny values. You’re storing a 100k-element hash table with everything that implies.
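If you want numbers instead of a gut feeling, a rough measurement is easy to set up. The sketch below is only an approximation: `measureArrayBytes` is a hypothetical helper, the key names are invented, and the exact figures depend on your PHP version and build, so no expected output is shown.

```php
<?php
/**
 * Returns roughly how many bytes the array built by $build occupies,
 * measured as the change in memory_get_usage() while it is alive.
 */
function measureArrayBytes(callable $build): int
{
    $before = memory_get_usage();
    $data = $build();            // keep the array alive while we measure
    $after = memory_get_usage();
    unset($data);
    return $after - $before;
}

$count = 100000;

// A plain list of integers: the cheapest case PHP has.
$listBytes = measureArrayBytes(static fn (): array => range(1, $count));

// The same integers under string keys, closer to "real" associative data.
$assocBytes = measureArrayBytes(static function () use ($count): array {
    $data = [];
    for ($i = 0; $i < $count; $i++) {
        $data['key_' . $i] = $i;
    }
    return $data;
});

printf("list:  %.1f bytes per element\n", $listBytes / $count);
printf("assoc: %.1f bytes per element\n", $assocBytes / $count);
```

Run it once on the PHP version you deploy. The per-element cost is usually a multiple of the 8 bytes an integer “should” take, and string keys push it higher still.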
The classic “read everything into memory” trap
Consider code like this:
$rows = $pdo->query('SELECT * FROM big_table')->fetchAll(PDO::FETCH_ASSOC);
foreach ($rows as $row) {
    processRow($row);
}
It’s clean. It’s readable.
And it may silently crash your script with an out-of-memory error once the table grows.
We forget that fetchAll() builds one huge array in which every row is its own hash table, holding a key and a value (usually a string) for each column.
In contrast, using a cursor/streaming approach:
$stmt = $pdo->query('SELECT * FROM big_table');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    processRow($row);
}
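keeps PHP-side memory flat and predictable, at the cost of slightly more verbose code. One caveat: some drivers buffer the full result set on the client by default, pdo_mysql among them, so there you also need to turn off PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to stream for real. Over millions of rows, this difference isn’t theoretical. It’s whether your job finishes or dies at 82%.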
In interview settings on platforms like Find PHP, this kind of sensitivity to memory patterns is what separates “writes working code” from “writes code that survives production traffic”.
Pitfall 3: Copy-on-write and the subtle cost of “just passing arrays”
PHP uses copy-on-write. If you assign an array to another variable, it doesn’t immediately copy all elements; it just points both variables to the same internal structure with an increased reference count.
But the moment you modify one of them, PHP creates a separate copy.
$data = getBigArray(); // big array
$copy = $data; // no real copy yet
$copy['new'] = 'value'; // triggers a copy of the whole array
On small arrays, that’s fine. On large arrays, that’s a performance cliff you only see in production metrics.
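You can watch the cliff happen. The following sketch replays the three lines above inside a hypothetical `observeCopyOnWrite` function, with `range()` standing in for `getBigArray()`, and reports how much memory each step adds. Exact byte counts vary between PHP versions, so treat the output as indicative only.

```php
<?php
/**
 * Reports memory growth at each step of the assign-then-modify sequence.
 * Returns [bytes added by assignment, bytes added by the first write].
 */
function observeCopyOnWrite(int $size): array
{
    $data = range(1, $size);          // stand-in for getBigArray()
    $afterBuild = memory_get_usage();

    $copy = $data;                    // shares the same storage, refcount goes up
    $afterAssign = memory_get_usage();

    $copy[] = 0;                      // first write: PHP separates the two arrays
    $afterWrite = memory_get_usage();

    return [$afterAssign - $afterBuild, $afterWrite - $afterAssign];
}

[$assignCost, $writeCost] = observeCopyOnWrite(200000);
printf("assignment: %d bytes, first write: %d bytes\n", $assignCost, $writeCost);
```

The assignment is essentially free. The single append is what pays for a second 200,000-element array.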
Hidden copy-on-write traps
- Passing large arrays into functions that modify them.
- Modifying arrays after logging or caching them.
- Using $array and $arrayCopy while assuming “this probably doesn’t matter”.
Better patterns:
- Return new arrays instead of modifying arguments.
- Use objects or iterators when you know the structure is large.
- Be deliberate about where mutation happens.
One very simple guardrail: when a function is meant to modify input, name it honestly:
function enrichUserData(array &$user): void
{
    // explicitly mutates
}
Suddenly, both your future self and your teammates know to tread carefully. Performance awareness starts with naming.
Pitfall 4: Nested loops and accidental O(n²) nightmares
We’ve all done this at some point:
foreach ($users as $user) {
    foreach ($orders as $order) {
        if ($order['user_id'] === $user['id']) {
            // match user with order
        }
    }
}
On test data with 10 users and 30 orders, this is instant.
In production with 50k users and 200k orders, this is where time goes to die.
The pattern is simple and deadly:
- Two arrays.
- Nested loops.
- Linear scan in the inner loop.
Suddenly you’re at O(n × m), which behaves like O(n²): 50k users times 200k orders is 10 billion comparisons.
The hash table is already there — use it intentionally
PHP arrays are hash tables. That means lookups by key are cheap. Instead of scanning:
$ordersByUser = [];
foreach ($orders as $order) {
    $ordersByUser[$order['user_id']][] = $order;
}
foreach ($users as $user) {
    $userOrders = $ordersByUser[$user['id']] ?? [];
    // process
}
Same data, different structure.
One pass that indexes the data often turns a “this is slow under load” ticket into “this is boringly fast”.
This is the kind of refactoring that doesn’t show up in fancy architecture diagrams, but hiring managers and team leads notice it in code reviews when they’re looking for senior PHP developers.
Pitfall 5: Sorting, reshaping, and other hidden transforms
Another common trap: repeated sorting and reshaping of arrays.
usort($items, static function ($a, $b) {
    return $a['created_at'] <=> $b['created_at'];
});
Sorting itself is not evil. But if you:
- sort the same data multiple times,
- convert between formats repeatedly (indexed → associative → grouped),
- or build “temporary” arrays in tight loops,
you’re paying extra costs every time: allocations, comparisons, hash operations.
Recognizing “array gymnastics”
When you see code like:
$ids = array_column($items, 'id');
$itemsById = array_combine($ids, $items);
$sortedIds = $ids;
sort($sortedIds);
foreach ($sortedIds as $id) {
    $item = $itemsById[$id];
    // ...
}
ask yourself:
- Do we need all these transformations?
- Can we precompute once and reuse?
- Is the sort key stable enough to sort only at the source (SQL ORDER BY, for example)?
Those questions are not just about micro-optimizations. They’re about respect for the runtime that your code lives in.
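As one possible answer to the first question, the “array gymnastics” example can often be reduced to a single indexing call and a single sort. This is a sketch with invented sample rows, and it assumes the ids are unique (duplicates would overwrite each other, just as they would with array_combine()).

```php
<?php
// Sample rows; in real code these might come from a query or an API.
$items = [
    ['id' => 30, 'title' => 'third'],
    ['id' => 10, 'title' => 'first'],
    ['id' => 20, 'title' => 'second'],
];

// One pass: index rows by their 'id' (null keeps the whole row as the value).
$itemsById = array_column($items, null, 'id');

// One sort, in place, on the keys we already have.
ksort($itemsById);

foreach ($itemsById as $id => $item) {
    echo $id, ': ', $item['title'], "\n";
}
```

Two temporary arrays ($ids and $sortedIds) disappear, and the loop no longer needs a lookup per iteration.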
Pitfall 6: “Small for now” arrays in long-lived processes
In traditional PHP-FPM setups, each request is short-lived. Memory gets wiped after the response. That hides some sins.
But once you move to:
- long-running CLI scripts,
- queue workers,
- daemons (ReactPHP, Swoole, RoadRunner, etc.),
array behavior starts to matter more. Leaks accumulate.
Imagine a worker that:
- reads messages from a queue,
- accumulates them in an array “for batching”,
- forgets to clear or reinitialize that array properly.
After a few hours, that “small” array is a bloated timeline of forgotten work. And you sit there wondering why the worker that was fine in staging slowly starts lagging behind in production.
In job descriptions on platforms like Find PHP, when you see “experience with high-load systems” or “long-running PHP workers”, a big part of that is simply not making these quiet mistakes.
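A small guardrail against the forgotten batching array is to give the buffer one owner that always resets it. The `BatchBuffer` class below is a hypothetical sketch, not a library API, and the batch size and handler are placeholders.

```php
<?php
/**
 * Collects messages and hands them off in fixed-size batches.
 * The buffer is reset on every flush so it cannot grow without bound.
 */
final class BatchBuffer
{
    /** @var array<int, mixed> */
    private array $items = [];

    public function __construct(
        private int $batchSize,
        private Closure $onFlush,
    ) {}

    public function add(mixed $message): void
    {
        $this->items[] = $message;
        if (count($this->items) >= $this->batchSize) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->items === []) {
            return;
        }
        $batch = $this->items;
        $this->items = [];          // reset first, so a failing handler cannot leave old data behind
        ($this->onFlush)($batch);
    }

    public function pending(): int
    {
        return count($this->items);
    }
}

$flushed = [];
$buffer = new BatchBuffer(2, function (array $batch) use (&$flushed): void {
    $flushed[] = $batch;            // stand-in for "write the batch somewhere"
});

foreach ([1, 2, 3, 4, 5] as $message) {
    $buffer->add($message);
}
```

In a real worker you would also call flush() on shutdown or on a timer, so the last partial batch is not left waiting.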
Pitfall 7: JSON as a transport for arrays, everywhere
We love this pattern:
$json = json_encode($data);
$redis->set('cache:key', $json);
Or we ship JSON from PHP to JavaScript, to another PHP service, to a log file, and back.
Each time:
- the array is traversed,
- values and keys are converted,
- strings are allocated.
Encoding big nested arrays can be surprisingly expensive. Decoding them on every request just to grab two fields is a tax you don’t notice until your p95 latency graph gives you a side-eye.
Sometimes the solution is architectural (store data in a dedicated system). Sometimes it’s small and local:
- Cache smaller derived structures instead of the entire array.
- Store only the IDs, recompute details.
- Avoid storing the same massive array under multiple keys “just in case”.
The more we treat arrays as raw “data blobs”, the more we forget they are complex structures with nontrivial serialization costs.
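Here is what “cache smaller derived structures” can look like in practice. The `$profile` shape is invented for illustration, and the actual cache write is left out; the point is only the difference in payload size.

```php
<?php
// A stand-in for a large nested structure; the shape is invented for illustration.
$profile = [
    'id' => 42,
    'name' => 'Anna',
    'email' => 'anna@example.com',
    'history' => array_fill(0, 1000, ['action' => 'login', 'ip' => '203.0.113.7']),
];

// Option A: cache the whole thing.
$fullJson = json_encode($profile, JSON_THROW_ON_ERROR);

// Option B: cache only what the hot path actually reads.
$summary = ['id' => $profile['id'], 'name' => $profile['name']];
$summaryJson = json_encode($summary, JSON_THROW_ON_ERROR);

printf("full: %d bytes, summary: %d bytes\n", strlen($fullJson), strlen($summaryJson));

// Reading it back decodes two fields instead of a thousand nested rows.
$decoded = json_decode($summaryJson, true, 512, JSON_THROW_ON_ERROR);
```

Every request that only needs the name now encodes, stores, transfers, and decodes a few dozen bytes instead of the full history.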
Reading array-heavy code with a performance eye
Here’s a simple exercise you can do tomorrow at work.
Open a file where you know performance matters:
an import script, a report generator, a queue worker, a controller that gets a lot of traffic.
Then scan through it and pay attention only to arrays:
- Where are they created?
- Where are they copied?
- Where are they iterated?
- Where are they transformed?
Ask yourself:
- Is this array bigger than I think?
- Does this loop run more times than I think?
- Does this function silently create new arrays each time?
You’ll start noticing:
- array_map that allocates new arrays that could be streamed.
- array_merge inside loops.
- multiple array_column operations on the same input.
- foreach over data that is never used fully.
The act of reading code this way changes how you write it. You start seeing arrays not as simple boxes of values, but as structures with costs and tradeoffs.
That mindset is exactly what hiring teams look for when they say they want “engineers who understand performance, not just frameworks”.
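Of the patterns in that list, array_merge inside loops is probably the easiest to fix on the spot. A minimal sketch with invented sample data:

```php
<?php
// Invented sample data: pages of results that need to end up in one list.
$pages = [
    [1, 2, 3],
    [4, 5],
    [6],
];

// Pattern to watch for: array_merge() inside the loop builds a brand-new
// array on every iteration, copying everything collected so far.
$slow = [];
foreach ($pages as $page) {
    $slow = array_merge($slow, $page);
}

// Alternative: merge once, after the loop, using argument unpacking.
$fast = array_merge(...$pages);
```

Both variables end up with the same list. The difference is that the first version re-copies the growing result on each iteration, which adds up quickly when there are thousands of pages. Note that calling array_merge() with an empty argument list requires PHP 7.4 or newer.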
Practical patterns: making peace with arrays
PHP arrays are not villains. They’re just powerful tools that demand respect. The goal is not to avoid them, but to use them on purpose.
Let’s talk about some pragmatic habits that help.
Pattern 1: Normalize your data early
When data comes in, it’s often messy:
- inconsistent keys from external APIs,
- mixed types from forms,
- different structures for similar concepts.
Instead of passing this mess around, normalize once:
function normalizeUser(array $raw): array
{
    return [
        'id' => (int)($raw['id'] ?? 0),
        'name' => trim((string)($raw['full_name'] ?? $raw['name'] ?? '')),
        'email' => strtolower((string)($raw['email'] ?? '')),
    ];
}
Then the rest of the code deals with a predictable shape:
- fewer checks,
- fewer transformations,
- fewer array rebuilds.
Normalized data means fewer chances for accidental large copies and fewer passes over the same array to “clean it up just in case”.
Pattern 2: Introduce simple value objects in hot paths
This is the one that often gets pushback:
“Why create a class if an array works?”
Because that class can:
- carry behavior with the data,
- be passed around by handle, so handing it to a function never copies the data,
- avoid key lookups everywhere.
class Money
{
    public function __construct(
        public int $cents,
        public string $currency,
    ) {}

    public function add(self $other): self
    {
        if ($other->currency !== $this->currency) {
            throw new RuntimeException('Currency mismatch');
        }
        return new self($this->cents + $other->cents, $this->currency);
    }
}
Instead of arrays like ['amount' => 1000, 'currency' => 'USD'] thrown around everywhere, you now have something that’s:
- clearer,
- harder to misuse,
- often cheaper than a flexible hash table for every operation.
You don’t have to turn the whole project into a DDD museum. Start with the hot paths: pricing, billing, critical workflows. The payoff is both performance and mental load reduction.
Pattern 3: Build indexes once, reuse everywhere
This is one of those tricks that feels almost embarrassingly simple.
Instead of doing this kind of thing in multiple places:
foreach ($orders as $order) {
    if ($order['user_id'] === $userId) {
        $result[] = $order;
    }
}
do the work once:
function indexOrdersByUserId(array $orders): array
{
    $byUser = [];
    foreach ($orders as $order) {
        $byUser[$order['user_id']][] = $order;
    }
    return $byUser;
}
Then reuse that index wherever you need it.
The benefits:
- lower algorithmic complexity,
- fewer repeated scans,
- clearer intention when you read the code three months later.
If you’re leading a team or reviewing code, this is a habit worth spreading.
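For completeness, here is the index in use. The function is repeated from above so the sketch runs on its own; the sample orders are invented.

```php
<?php
function indexOrdersByUserId(array $orders): array
{
    $byUser = [];
    foreach ($orders as $order) {
        $byUser[$order['user_id']][] = $order;
    }
    return $byUser;
}

// Invented sample data.
$orders = [
    ['id' => 1, 'user_id' => 7, 'total' => 1200],
    ['id' => 2, 'user_id' => 9, 'total' => 450],
    ['id' => 3, 'user_id' => 7, 'total' => 300],
];

$ordersByUser = indexOrdersByUserId($orders);  // one pass over $orders

// Every later lookup is a key access, not a scan.
$ordersForUser7 = $ordersByUser[7] ?? [];
$ordersForUser8 = $ordersByUser[8] ?? [];      // unknown user: empty list
```

The `?? []` fallback keeps callers from having to check whether a user has any orders before iterating.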
Pattern 4: Prefer generators in pipelines
Some codebases have “data pipelines” that look like this:
$items = fetchItems();
$items = filterItems($items);
$items = transformItems($items);
$items = enrichItems($items);
foreach ($items as $item) {
    saveItem($item);
}
Each step creates a new array. For large datasets, it’s death by a thousand cuts.
With generators, you can keep the same logical flow but avoid building intermediate arrays:
function fetchItems(): iterable { /* yield ... */ }
function filterItems(iterable $items): iterable { /* yield ... */ }
function transformItems(iterable $items): iterable { /* yield ... */ }
function enrichItems(iterable $items): iterable { /* yield ... */ }
foreach (enrichItems(transformItems(filterItems(fetchItems()))) as $item) {
    saveItem($item);
}
The code is still readable (once the team gets used to generators), and the memory profile becomes much flatter.
In environments with heavy imports, ETL scripts, or data migrations — common tasks in many PHP jobs — this change is not cosmetic. It’s whether your script can run for hours without slowly dying.
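To show what the stage bodies might look like, here is a runnable sketch of the same pipeline. The function names come from the example above; the bodies, the item shape, and the 20% markup are assumptions made up for illustration, and collecting into `$saved` stands in for saveItem().

```php
<?php
// Hypothetical bodies for the pipeline stages named in the text.
function fetchItems(): iterable
{
    for ($id = 1; $id <= 6; $id++) {
        yield ['id' => $id, 'price' => $id * 100];   // one row at a time
    }
}

function filterItems(iterable $items): iterable
{
    foreach ($items as $item) {
        if ($item['id'] % 2 === 0) {                  // keep even ids only
            yield $item;
        }
    }
}

function transformItems(iterable $items): iterable
{
    foreach ($items as $item) {
        // integer math to avoid float rounding surprises
        $item['price_with_tax'] = intdiv($item['price'] * 120, 100);
        yield $item;
    }
}

function enrichItems(iterable $items): iterable
{
    foreach ($items as $item) {
        yield $item + ['label' => 'item-' . $item['id']];
    }
}

$saved = [];
foreach (enrichItems(transformItems(filterItems(fetchItems()))) as $item) {
    $saved[] = $item['label'] . ':' . $item['price_with_tax'];  // stand-in for saveItem()
}
```

At any moment only one item is travelling through the stages, so peak memory depends on the size of an item, not on the size of the dataset.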
Pattern 5: Be honest about array size in logs and metrics
Performance stories rarely start with “we knew exactly where the problem was”.
They start with:
- “The queue started lagging behind.”
- “Memory usage looked weird.”
- “The cron job sometimes finished in 2 minutes, sometimes in 20.”
If you’re working on systems where arrays can get large, it’s worth adding lightweight logging around them:
$items = fetchItems();
if (count($items) > 10000) {
    error_log('Large items array: ' . count($items));
}
Or use metrics (Prometheus, StatsD, etc.) to track typical sizes. Over time, you learn what “normal” looks like and can spot outliers.
For leaders hiring senior PHP engineers, seeing this kind of defensive awareness in code is a green flag. It shows someone who’s not just fighting fires but setting up the environment to prevent them.
Pattern 6: Think about arrays when you design APIs
Both internal functions and external APIs benefit from clear contracts.
Vague function:
function search(array $filters): array
{
    // ...
}
Better:
/**
 * @param array{
 *     query: string,
 *     page?: int,
 *     per_page?: int,
 *     sort?: 'asc'|'desc'
 * } $filters
 *
 * @return array{
 *     items: array<int, array>,
 *     total: int
 * }
 */
function search(array $filters): array
{
    // ...
}
Even better in some contexts: dedicated request/response DTOs.
When you know exactly what goes in and out, you can avoid:
- stuffing “just one more” field into existing arrays,
- passing giant blobs of data “to be safe”,
- overfetching or oversharing data between layers.
Clear contracts reduce unnecessary array juggling and force you to think about shape and size.
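A request DTO for the search example might look roughly like this. `SearchRequest`, its defaults (page 1, 20 per page), and the cap of 100 per page are all assumptions made for the sketch, not something the text above prescribes.

```php
<?php
/**
 * Hypothetical request DTO mirroring the array shape documented above.
 */
final class SearchRequest
{
    public function __construct(
        public string $query,
        public int $page = 1,
        public int $perPage = 20,
        public string $sort = 'asc',
    ) {
        if (!in_array($sort, ['asc', 'desc'], true)) {
            throw new InvalidArgumentException('sort must be "asc" or "desc"');
        }
    }

    /** Builds the DTO from loosely typed input, for example $_GET. */
    public static function fromArray(array $filters): self
    {
        return new self(
            query: trim((string) ($filters['query'] ?? '')),
            page: max(1, (int) ($filters['page'] ?? 1)),
            perPage: min(100, max(1, (int) ($filters['per_page'] ?? 20))),
            sort: (string) ($filters['sort'] ?? 'asc'),
        );
    }
}

$request = SearchRequest::fromArray(['query' => ' php arrays ', 'page' => '3']);
```

The array is touched exactly once, at the boundary. Everything behind it works with four typed properties, and there is no place left to stuff “just one more” field.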
Teams, careers, and the people behind the arrays
Underneath all of this talk about hash tables and memory, the real subject is us. The people writing the code.
We reach for arrays when:
- we’re tired,
- the deadline is close,
- the task is unclear,
- we don’t want to argue about design.
Arrays let us defer decisions. They are the “we’ll clean it up later” of data modeling.
Sometimes that’s the right tradeoff. Shipping matters. Businesses move.
But as your projects grow — and as you move from junior roles to mid, senior, lead — the cost of those deferrals becomes visible.
On platforms like Find PHP, companies looking for PHP developers are not just checking if you know syntax. They’re looking at how you think:
- Do you understand what your code does to memory and CPU?
- Can you anticipate the behavior of your arrays under load?
- Do you model data in a way that supports change, not fights it?
Your relationship with arrays is a surprisingly honest mirror of your relationship with complexity.
A quiet checklist for your next piece of array-heavy code
Next time you’re about to write or review PHP code that leans heavily on arrays, pause for a moment and ask:
- Is this really an array, or should it be an object?
- Will this array stay small, or can it grow beyond what I expect?
- Are we scanning this data more than once?
- Can we index instead of nesting loops?
- Do we really need to load all of this into memory at the same time?
- Are we clear about the shape of this array, or are we guessing?
You won’t get perfect answers every time.
Sometimes you’ll still choose the quick, dirty array.
Sometimes you’ll refactor too early.
That’s fine.
What matters is that you see the tradeoff. That you no longer treat arrays as harmless background noise, but as first-class participants in the performance story.
Late at night, when the office is quiet and the monitors glow a little too brightly, there’s a small kind of satisfaction in knowing that the code you’re writing respects both the machine and the people who will live with it.
Somewhere in that balance — between readability and performance, between moving fast and thinking deeply — is where good PHP work quietly turns into great work.