Unable to get Rosetta work on one Linux server

entity ID: 7097150 Posts: 29
08 Aug 2022 08:59 PM

I haven't received any Rosetta work on one Linux server since shortly after joining CE. I believe the problem stems from the fact that I had been running BOINC prior to joining CE and had a local preferences file in use that overrode the global preferences set up by CE. This resulted in a large number of errors due to the /var/lib/boinc filesystem filling up. Because of all the errors my host returned to Rosetta, it was blacklisted by Rosetta. Normally I could fix this by logging into the Rosetta site with my userid, going to the hosts list, and "unblacklisting" the host. Since I can't log into the CE6931730 id, I can't fix the problem myself. This will have to be fixed by the CE admins. Can someone help?

Tristan Olive ID: 22 Posts: 391
10 Aug 2022 02:30 AM

I'm not familiar with the "unblacklisting" process. Do you do this by deleting the host? If so, can you tell me the host ID of the server in question?

entity ID: 7097150 Posts: 29
10 Aug 2022 12:49 PM

I have never seen the blacklisting indication myself but have heard about it from other users. The host in question is hostname=golf.

EX:

Depending on how the server "feels" it can be as little as one failed task that gets you knocked off or it can be up to 3-4 before it knocks you off. I've had this happen to me a few times.

Just go into your profile, click the nice button and update the project in BOINC and everything is fine again.

 

Tristan Olive ID: 22 Posts: 391
10 Aug 2022 04:13 PM

If I recall, that is the host that was giving you trouble due to a lot of large VirtualBox jobs being downloaded, so you disabled VirtualBox jobs in its local settings to optimize it. Currently, the only jobs Rosetta has waiting to be sent are VirtualBox jobs (their "rosetta python projects" application). 

Now, it seems questionable that this has been the issue for a couple of months, especially since I think VirtualBox jobs only make up 10% or less of the total tasks in progress at Rosetta (going by their server status page), but if your compute preferences are set on that host such that it can handle VirtualBox jobs, we could try enabling them to see if that works.

That is the only difference standing out between your host that is getting work and your host that isn't. If it is really blacklisted by Rosetta, there is a max_results_day field in the project database that could, in theory, be set to prevent your host from getting any work from the scheduler, but that is not visible anywhere, as far as I can tell (and I'm not sure the scheduler even uses it anymore; looks like it was deprecated in 2010!). 

https://boinc.berkeley.edu/trac/wiki/BlackList

Another option could be to purge BOINC from your host, remove all configurations, and start over fresh, so that it looks like a new host to Rosetta. This would affect other projects, too, so you could also try removing only the Rosetta files from your BOINC directory so that it can reattach and recreate them.

entity ID: 7097150 Posts: 29
10 Aug 2022 05:18 PM

I disabled VirtualBox for a short period of time to get a handle on why the filesystem was filling up when it is almost a TB in size. I found out that Rosetta wants to copy the 8 to 10GB vdi file to every slot directory, in addition to everything that was downloaded to the various projects directories. On this server there are 128 slot directories times 10GB. Once I found this and implemented an app_config to limit the data movement, I re-enabled VirtualBox on this server. It has been running fine on all the LHC work but hasn't gotten any Rosetta work since the initial batch. There may be a lot of non-VBox Rosetta work outstanding, but it is available very sporadically. My other host receives both versions of the work and has only received a non-VBox batch twice in 2 months. Almost ALL work from Rosetta is VBox work.
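
As a rough check of the numbers above, here is a minimal Python sketch of the worst-case disk footprint when every slot directory holds its own copy of the VM image. The 128 slots and the 8 to 10GB image size come from the post; the function name and the 1000GB figure standing in for "almost a TB" are illustrative assumptions.

```python
def slot_copy_footprint_gb(slots: int, image_gb: float) -> float:
    """Worst-case space used if every slot directory holds its own image copy."""
    return slots * image_gb


SLOTS = 128            # slot directories on the server (from the post)
FILESYSTEM_GB = 1000   # assumption: stand-in for "almost a TB"

for image_gb in (8, 10):
    needed = slot_copy_footprint_gb(SLOTS, image_gb)
    verdict = "exceeds" if needed > FILESYSTEM_GB else "fits within"
    print(f"{image_gb}GB image x {SLOTS} slots = {needed:.0f}GB, "
          f"which {verdict} {FILESYSTEM_GB}GB")
```

Even at the low end of 8GB per image, 128 copies come to 1024GB, so the slot copies alone can fill a filesystem of about a TB before counting anything in the projects directories.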

On second thought, disregard this request, as the Rosetta work is using an old vboxwrapper (which is why it is copying data all over the filesystem) instead of using the multi-attach. The project admins don't seem interested in updating their processes, and people are moving off to other, more responsive projects. I kind of see it as a dying project.

Additionally, I'll probably be disconnecting from CE as it is very difficult to manage work from IDs that I can't log on to. I can't set project preferences. I can't diagnose VBox job failures as the stderr.txt is uploaded with the job and deleted from my system. I can't select other projects when one stops sending me work (Rosetta). This might be good for Windows users who have the native app and get CE work in addition to BOINC work. On Linux, I only get BOINC work, and I'm limited to 3 projects when they all work, but in my case only 2. Unfortunately, this doesn't seem to be a good fit for me. Another observation: when is the drawing supposed to happen? We hit the 100% mark way back in July sometime. Even that seems to be broken.

Mark McA ID: 179 Posts: 228
16 Aug 2022 09:17 PM

Hi,

Latest List of Entries going live later today. We always double-check everything with the prize draw, which sometimes takes a little while. The gauge never stops counting in the background regardless.

Cheers,

Mark

Matt ID: 44 Posts: 302
16 Aug 2022 09:44 PM

Re: your technical concerns:

Our system is engineered to be a simple experience for volunteers running on their home PCs (e.g., fewest clicks possible to contribute). You're running a high-performance Linux server, and wanting (and in some cases needing) to highly customize your setup. Also, I think your primary goal is scientific computing, and while we commit substantial resources to those causes, it's not the primary reason volunteers join Charity Engine.

(*I can't speak for Rosetta, and how they run their project: but I do think you are correct that they are using older VBox code - which is more consequential for a server like yours than for a home PC. )

Bottom line: Sorry if we don't appear to be a fit for your hardware and your goals; if we launch a program for higher-performance systems, we'll let you know.

Graham Jenkins ID: 1626 Posts: 164
15 Feb 2023 04:12 AM

Rosetta seems to be stuck retrying downloads. Last time this happened, the issue turned out to be an outage on the Rosetta download server, as shown at: https://boinc.bakerlab.org/rosetta/server_status.php

But that doesn't seem to be the problem this time. Suggestions?

Matt ID: 44 Posts: 302
15 Feb 2023 08:34 PM

Hi Graham.  On that server status page (*a good place to check!) all services are online, as you note.  But the page also reports "Tasks ready to send" as "0".  So maybe give them a few days; I expect they'll have more work soon.  (*How long has this been an issue for you?)
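
For anyone who wants to check that figure without opening the page each time, here is a minimal Python sketch that pulls the "Tasks ready to send" count out of saved page text. The function name and the sample HTML row are hypothetical, and the sketch assumes the label is followed directly by the number once tags are stripped; the real page layout may differ.

```python
import re
from typing import Optional


def tasks_ready_to_send(page_text: str) -> Optional[int]:
    """Return the 'Tasks ready to send' count, or None if it is not found."""
    # Replace HTML tags with spaces so the label and value sit side by side.
    text = re.sub(r"<[^>]+>", " ", page_text)
    match = re.search(r"Tasks ready to send\s+(\d[\d,]*)", text)
    if match is None:
        return None
    # Drop thousands separators such as "1,234" before converting.
    return int(match.group(1).replace(",", ""))


# Hypothetical sample row; the real status page markup is an assumption.
sample = "<tr><td>Tasks ready to send</td><td>0</td></tr>"
print(tasks_ready_to_send(sample))
```

A result of 0 means the project has nothing queued to hand out, and None means the label was not found, in which case the parsing assumption needs revisiting.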

Graham Jenkins ID: 1626 Posts: 164
15 Feb 2023 08:48 PM

I can see six tasks, and it's been trying to download them for about six days .. retrying each after a delay of typically 30 minutes. Tasks for Number Fields and LHC have been happily downloading throughout that period.

I'm wondering whether others are seeing similar issues. Should I delete all the pending Rosetta tasks? Or do a Project Reset for Rosetta?

Graham Jenkins ID: 1626 Posts: 164
16 Feb 2023 08:08 PM

Downloading seems to have completed satisfactorily overnight, but it's unlikely that the tasks in question will now finish before their respective deadlines. Oh well ..

Matt ID: 44 Posts: 302
16 Feb 2023 08:59 PM

Ah - well, glad you're back on track -