I'm not versed in coding at all, but could that be simplified by assembling the picture during character TFs (changing one part of the picture at a time) and only bringing up a copy of the premade whole when required?
Could it be done? Possibly. The issue, like you said, is that it's a metric fuck-ton of work. Some artist would have to make the modular images. There'd be human faces, elven faces, cat faces, horsey faces, wolf faces, demonic faces, foxy faces... then probably human ears, pointy ears, and sail-like ears, then eye shapes and colors, hair lengths, styles, and colors, claws, wings, digitigrade legs (depending on where the bust image ends), skin, scales, fur, and all the color options for each of those. Then pregnancy stages 1, 2, and 3 for the belly and breasts, not to mention breast sizes and body type (bodybuilder, fluffy, fit, etc.). Then wings or no wings, and what kind. Then maybe even tails, and how many if they're a kitsune.
Making all those options, making sure they all blend and mesh right for characters with chimera features (fin-ears, pink mohawk, blue scales, golden eyes, tapered canid cock, horse legs, and bat wings), and then coding the Appearance tab to compile all those parts into one complete image (that doesn't look like garbage) is a really big ask. And someone would have to pay the artist (and the coders) for all that work. I'd rather the artists focus on making more sexy busts for existing NPCs (like Liulfr) or making some sexy CG scene images.
Ultimately, I think it's better for players who really need to see something to use the option of having their own avatar added to the token. That way they get an image they like, at least of the form they most commonly have, or one they don't mind seeing even if it doesn't match perfectly.