You’ve made all your test data, ran all your tests and now all you have to do is say what solver or what set of parameters are the best (or at the least the best in what circumstances. Faced with a slab of data, albeit nicely put into a spreadsheet, its not so easy to draw those conclusions. Humans, well at least this human, are fairly visual creatures. We don’t like looking at long lists of numbers but given a nice graph or diagram we can grasp the concept quite quickly.

Performance Profiles are one way to turn these slab of data into a pretty picture. On the horizontal profile you have some sort of performance metric, like time taken to run or the value of the best solution found by a solver. On the vertical axis you have the percentage of tests. You read the performance profile as saying “This solver found a solution in x time y% of the time” or “This solver found a solution with this objective gap, this amount of the time”. The best solution would look like a vertical line on the far left hand side of the graph, the worse will be a vertical on the far right hand side of the graph. For the most part, your best solution would be on the line that is in the top left hand corner the most.

The paper I linked to earlier says something slightly different to what I just did: on the horizontal axis it has the ratio to the best solution in each test case. Either one works, both have their problems. With what I described, if one test has a performance metric value around one, while another test has a value around 100, the performance profile is going to look somewhat stretched out. This is more of a matter of how do you interpret it. Just be careful. With comparing ratios, you will run into issues if your best value happens to be zero (cause you can’t divide by zero). To get around this you can add an epsilon to each value, but its a bit dodgy and as if you are comparing values 0 and 100 in one test, well you are going to get a crazy big value.

**How to make performance profiles in a spreadsheet**

Currently I’m booted up in Windows (Linux transition doesn’t happen when I haven’t worked out how to make latex tables out of Libre Office Calc yet), so this will be written for Microsoft Excel, but I imagine it would translate easily to other spreadsheets.

1) Get your data. Nothing special, just choose one performance profile you want to compare over. Each different solver should have their own column, and each test case should be in the same row

2) (optional, for ratios) Do you have any zeroes in your data? This is problematic as we are going to divide by the smallest or largest value in each row, and well, dividing by zero is a bad idea. If you have zeroes, replace them with a similarly small number, how small will depend on what the rest of your data looks like. I use the formula =IF(A2=0, 0.00001, A2) and a new set of columns to make life easier for me.

2) (optional, for ratios) We are now going to find the ratios with the best solution. For each row (i.e. each test case), divide each value by the best value on that line. The best value could be the smallest value or the largest. As I’m looking for the solver that produces the smallest objective values, I want the best value is the smallest. To find the ratios, I use the formula =F2/MIN($F2:$I2) (change the cell references as needed) and create a new set of columns with the ratios.

3) We now want a list of each ratio/value that appears.To make this happen, I just copy and paste the values (not formulas) into one big long column, sort it and delete any duplicates I see. I like to them sort the data so that I can check for any obvious mistakes later. Normally I delete the ones as there are heaps of them. Duplicates don’t really matter in this case as they won’t affect our final result.

4) Now we want to find what percentage of tests have achieved a value of at least each ratio. I simple countif statement (to get the cumulative frequency) divided by the total number of tests (to get the percentage) to do the work. I use =COUNTIF(K:K, “<=”&$P2)/38. If you sorted the column in the previous step, you should see that each column is monotonically non-decreasing (i.e. getting bigger of staying the same)

5) We can now graph! Simply highlight your data columns and make a line graph. Now you have a performance profile from which you can easily make some observations about each of your solvers. From this one I can easily rank the solvers from solver 1 being the best, to solver 4 being the worse. Yes the graph I have is not a brilliant one: it doesn’t have a title or any of the axis labelled, but it is good enough to get results. If I was publishing it I would tidy it up (especially that x-axis)

6) (optional) you may notice in the performance profile that when each line steps up it does so at an angle. Really we want it to go straight up. To do this, we add in some dummy points. So for each x value there should be two y values: one for the percentage of cases that reach a value of x (what you already have below) and the other percentage being the previous percentage. Admittedly I could get excel to play nicely with me here.

**Performance Profiles in Gnuplot**

Yesterday when I started to write this, I initially tried to do it all in excel. In the end I couldn’t get the x axis to be on a sensible scale and couldn’t get step 6 working (the performance profile would look a bit like a heart rate monitor). So I turned to gnuplot. Really, so much easier.

1) Organise your data. Start with steps 1-4 in the spreadsheet instructions. When you have all your percentages organised copy and paste into a text file (although conventionally called a .dat file). Each column should be the percentage reached by the different solvers, with the first column being the x value reach. Well really it doesn’t matter where each of the columns are, but it is nice to keep to convention.

2) Make your .plt file. This is the file that contains all your plot commands. Mine looks like:

#Always put these two at the beginning as otherwise gnuplot might remember something without you realising clear reset #set terminal png set terminal pdfcairo font "Gill Sans,9" linewidth 4 set output 'timePP.pdf' set key bot right #This puts the legend on the bottom right hand corner of the plot #set logscale x 10 #This gives the x axis an log base 2 scale set key autotitle columnheader #title columnhead take the name of the graph from the top of the y-axis column set termoption dashed #I had to include this to get the differnet line styles sorted set ylabel "%of instances" #x-axis label set xlabel "Time" #y axis label #This block sets different line styles, so later when I plot, I can just say "ls 1" (using linestyle 1) instead of writting all the style criteria #set style line x == define line style x, which I'll later call using ls x # lt y == set the line type as y. 1 is a solid line, 2 a dashed line, 3 a dotted line, 4 a dash dot line #lc rgb "black" == line colour is black. You can put in other line colours (like red, orange, yellow etc.), I'm not sure what the limits are #lw set the line width set style line 1 lc rgb "gray" lw 1 set style line 2 lc rgb "black" lw 1 set style line 3 lc rgb "black" lw 1 set style line 4 lc rgb "black" lw 1 #Plot them. Each line is a sepearte line on the one plot. #DPL.dat is the data file we are getting the data from #using a:b means data from column a is the x-axis, data from column b is the y axis #With steps means we want vertical/horizontal steps, no diagonal lines between points #ls x : use linestyle x plot "time.dat" using 1:2 with steps ls 1,\ "time.dat" using 1:3 with steps ls 2,\ "time.dat" using 1:4 with steps ls 3,\ "time.dat" using 1:5 with steps ls 4 set output

And that gives me:

Of course it is easy enough to add in colours, but as I’m publishing this, no colours allowed.

—-

So that is all it takes. Performance Profiles are convenient way to compare many cases to see which one is the best. Of course profiles don’t let you see why different solvers perform the way they do and in which circumstances, but they are a good first step.

## Leave a Reply