After some testing, it seems that unit experience is by far the most important factor. I would dare say it is too powerful and needs to be nerfed. To show what I'm talking about, I ran a simple test: I threw two different legions into battle and recorded the outcomes. Here are the results:
TEST 1: POP vs OPT legions in a meeting engagement
Figure 1: Basic test setting

The basic setting for this test consists of two legions from opposing factions marching the same distance (2 days) into the same province (clear terrain, fair weather) in attack posture (see figure 1). They have no commanders (15% command penalty) and exactly the same organizational structure, with the same number of men and horses. Populares (POP) have just a one-point advantage in NM over Optimates (OPT): 90 vs 89.
Figure 2: Stat difference between two legions

The difference comes primarily from origin (ITA for the POP faction) and experience. The POP legion has 10 cohorts with one-star experience. The OPT legion has 7 cohorts with one-star experience and 3 cohorts with two-star experience. This is reflected in slightly higher discipline, assault and cohesion values for the OPT cohort elements (figure 2). Judging by the numbers, these differences are very small, which is why the POP legion has 153 PWR (Combat Efficiency of the Force) while the OPT legion has 166 PWR. Again, nothing dramatic.
However, the difference in battle between these two legions is HUGE.
Figure 3: Performance difference between two legions (meeting engagement)

As you can see from figure 3, only a slight stat difference leads to a major performance difference in battle. And please note, the OPT legion in this test is relatively inexperienced compared to Sulla's other legions.
TEST 2: POP legion is defending while OPT legion is attacking
I wondered what would happen if I put the POP legion on defense. The result of this test is shown in figure 4:
Figure 4: Performance difference between two legions (POP legion defending)

It's clear that a defensive posture is not enough to compensate for the disadvantages. Populares lost every single battle and retreated.
TEST 3: POP legion has a one-star commander (3-2-2) and is on defense while OPT legion is attacking
Finally, to boost the POP legion even more, I appointed a (3-2-2) commander, which eliminated the 15% command penalty and raised the legion's PWR to 233 (compared to 166 PWR for OPT). Now everything is in favor of the POP legion: it has higher PWR, it has a commander and it is defending. Where would you bet your money this time? On the POP legion? Wrong!!!
Figure 5: Performance difference between two legions (POP legion defending, has commander and more PWR)

Although losses are more balanced now, Populares again lost every battle and retreated (except on one occasion, where the OPT AI decided to retreat before battle and no battle occurred).
---
From the tests above, it's clear that we have two potential problems:
1.) Experience seems to be too powerful. Since experience influences discipline, assault and cohesion, even a slight difference leads to a huge difference in casualties in a battle. I'm not sure how realistic this is.
2.) The game has a problem informing the player about these effects. According to the manual, PWR is a numerical representation of the relative power of the Force. These tests show why it is a misleading element in the decision-making process. That's why you will continue to see posts where players ask about weird battle results.
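To put numbers on how little PWR predicted these outcomes, here is a minimal Python sketch that turns the PWR values quoted above into a percentage edge for POP. The function name pwr_edge and the dictionary layout are my own illustration, not anything from the game or the manual; only the PWR figures come from the tests.

```python
# PWR values as reported in the tests above (test 1: no commanders,
# test 3: POP has a 3-2-2 commander). Names here are illustrative only.
pwr = {
    "test 1": {"POP": 153, "OPT": 166},
    "test 3": {"POP": 233, "OPT": 166},
}

def pwr_edge(pop, opt):
    """Return POP's PWR advantage over OPT in percent (negative = OPT ahead)."""
    return (pop - opt) / opt * 100

# Round to one decimal place for readability.
edges = {name: round(pwr_edge(v["POP"], v["OPT"]), 1) for name, v in pwr.items()}

for name, edge in edges.items():
    print(f"{name}: POP PWR edge over OPT = {edge:+.1f}%")
```

This prints -7.8% for test 1 and +40.4% for test 3. So POP goes from being slightly behind on PWR to being well ahead, yet it lost and retreated in both cases.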
Any other thoughts?