A lot of it is this.
People always make these lazy comparisons without context. It's like comparing a muscle car from 1970 to an EV and wondering why the EV needs more engineers, testing, software, etc.
Back in 2001 you were developing for ONE platform (usually PC or ONE console) - now you have PC + 2-3 consoles.
You usually had fixed resolution targets (remember when you could FINALLY upgrade to 800x600?) - now you have multiple performance targets (30/60/90/120/144 FPS) on top of ultrawide, HDR, VRR, etc.
Minimal post-launch support in most cases - now you have live services, seasons, hotfixes, etc.
Limited to no accessibility features - now you have colorblind modes, mouse-free modes, blind modes, controller support, etc.
Localization - now you have to translate into 10-20 different languages.
Then there's online infrastructure, telemetry, analytics, anti-cheat, etc.
That doesn't even touch on how much assets have changed. A typical model in 2001 was 500-2000 polygons with one texture and baked lighting. Now models are 50k-150k polygons with multiple textures, facial rigs, motion capture, cloth/hair physics interactions, etc. Art production is closer to VFX now.
It's a fucking apples-to-mangosteens comparison.