PcPerf.fr
PcPerf bot

More info about the GPU1 to GPU2 transition


There have been several questions regarding the GPU1 client and why we decided to shut it down. I hope I can shed some light here on why we're doing what we're doing, so that even if people disagree with our decisions, they can at least see where we're coming from.

 

Some people have asked, "Why shut down the client if it's working?" The bottom line is that the GPU1 results are no longer scientifically useful. It's pretty clear now that DirectX (DX) is not sufficiently reliable for scientific calculations. This was not known before (and some people wouldn't believe it until we proved it). With the GPU1 results, we can now show pretty decisively what the limitations are.

 

GPU1 also helped us a lot in developing its successor and in learning what's needed to run GPUs in a distributed computing fashion. The good news is that GPU2 is behaving very well on both ATI and NVIDIA hardware, and this is a direct result of what we've learned with GPU1 WUs. In the end, however, GPU1 will not be able to help us understand protein misfolding, Alzheimer's Disease, etc., due to these unresolvable limitations. We could keep GPU1 live, just crunching away in its current form, but that would be wasting people's electricity at this point, as we've learned everything we can about what those cards can do.

 

In the past, we had a somewhat similar shutdown situation, when the QMD core projects stopped. In that case, we stopped needing the core and so we stopped the calculations without warning, not realizing the impact that would have, and donors were left hanging. We did try (perhaps unsuccessfully) to handle the GPU1 situation better than QMD. With GPU1, we gave several months' warning (indeed, note that GPU1 is still actively running, so all of this is information in advance of shutting down GPU1). We tried to avoid the QMD situation by giving advance warning, but it looks like donors would like even more. However, there are limits to how far in advance we know the situation ourselves.

 

Indeed, the knowledge that it made sense to end GPU1 came to us reasonably recently. We have been working on CAL for a while and it seemed that CAL might be a solution, but we couldn't know until we got some testing "in the wild." DirectX (DX, which GPU1 is based on) works much better in the lab than in the wild, and it was possible that CAL would behave that way too. After seeing that CAL behaved well in the wild, it became clear that the GPU1 path was obsolete. However, this is a relatively recent finding, and we made the announcement about the situation shortly thereafter.

 

It was a tough decision. Some suggested we just leave GPU1 running, even though people's electricity really would be going to waste, other than generating points. I didn't think that was a good idea. We did know it would be a tough PR hit, but when people talk about the history of FAH, I want to make it clear that we're here to address AD and other diseases, not just to run calculations for the sake of points and nothing more (which has been the critique of some other distributed computing projects).

 

So, what's the right thing to do? I guess it comes down to this: would GPU1 donors be happier if we just kept the GPU1 servers running, doing work with no scientific value, purely for points? We could do that, at the cost of taking personnel away from improving existing clients, keeping existing servers going, etc., for the sake of keeping GPU1 running. However, that's not what FAH is for, and I think it's important that FAH not devolve into a big points game, losing sight of why we're doing what we're doing.

 

PS Some further discussion can be found here

 

View the full article
