Spaces: HPAI-BSC/TuRTLe-Leaderboard
Miquel Albertí committed on May 15

Commit 2c02057 (1 parent: 9f92a32)

Update READEM

Files changed (1)

metrics.md CHANGED

@@ -11,7 +11,7 @@ For those designs that pass SYN, PSQ is measured with the PPA report, which is c
 Since STX, FNC and SYN are binary evaluations, pass or fail, their score is computed using Pass@1. Remember that failures in previous stages are reported as automatic fails in the next ones.
-On the other hand, PPA is represented as numerical values that must be evaluated against a golden solution (crafted by humans in our case). Therefore, we need to introduce a new formulation that takes this into account.
+On the other hand, PPA is represented as real values that must be evaluated against a golden solution (crafted by humans in our case). Therefore, we need to introduce a new formulation that takes this into account.
 ### PPA-Score
@@ -31,9 +31,9 @@ $$
 $$
 which has the following interpretation:
-- $\hat{p}_{i,j} = 0$ : designs that have twice or more the area, the power, or the delay (performance), when compared to the human reference.
-- $\hat{p}_{i,j} = 1$ : designs with an area, power or delay equal to that of the human reference.
-- $\hat{p}_{i,j} = 2$ : can only be obtained by chips which occupy no space, execute in no time, and consume no energy (perfect but impossible).
+- $\hat{p}_{i,j} = 0$ : means that the generation has not passed the previous stages, or that it requires twice or more the area, the power, or the delay (performance), when compared to the human reference.
+- $\hat{p}_{i,j} = 1$ : are designs with an area, power, or performance equal to that of the human reference.
+- $\hat{p}_{i,j} = 2$ : can only be obtained by chips which occupy no space, execute in no time, or consume no energy (perfect, but impossible to achieve).
 The final formula considering all generations of an LLM for a given benchmark is just the average of these scores:
 $$
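
Reviewer note: the interpretation list in this hunk fixes three anchor points for $\hat{p}_{i,j}$ (0 at twice the reference or worse, 1 at the reference, 2 at zero cost). The defining formula is not visible in this diff, so the sketch below only assumes a clipped linear mapping that reproduces those anchors. The names `normalized_ppa_score` and `average_score` are hypothetical and not taken from the repository.

```python
from typing import Sequence


def normalized_ppa_score(value: float, reference: float, passed: bool = True) -> float:
    """Assumed clipped linear score for one metric (area, power, or delay).

    Returns 2 when value is 0, 1 when value equals the human reference,
    and 0 when value is at least twice the reference or when the
    generation failed an earlier stage.
    """
    if not passed:
        # Failures in previous stages count as automatic fails here.
        return 0.0
    if reference <= 0:
        raise ValueError("reference must be positive")
    return max(0.0, 2.0 - value / reference)


def average_score(scores: Sequence[float]) -> float:
    """Plain mean over all generations, as the diff text describes."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)
```

For instance, under this assumption a design using half the reference area would score 1.5 on that metric, and any design at three times the reference would be clipped to 0.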