Lone Wolf Development Forums


SteveT July 30th, 2020 06:12 PM

[COMPLETED] 5 Mins Planned downtime Thurs (07/30) @9 PM Pacific (4 AM UTC 7/31)
 
We are planning a quick hotfix deployment for Hero Lab Online on Thursday (July 30th) at 9 PM Pacific (4 AM UTC 7/31). We expect downtime to be 5 minutes or less.

EDIT: This has been completed.

flyteach July 30th, 2020 08:03 PM

HLO Down
 
No notice, and right in the middle of Gencon sessions.

rob July 30th, 2020 08:09 PM

Within the product itself, there should have appeared a notice regarding the planned outage. That notice should have appeared roughly TWO HOURS prior to the time and provided an ongoing update of when the outage would occur.

The actual outage lasted for only a few minutes. And given the two hours of advance notice within the product, it should have been practical for GMs to plan for a 5-minute "bio break" at 9pm.

We are in the middle of GenCon, so there is simply no good time to deploy anything. However, there were issues that absolutely needed to be addressed. So we waited until later in the night as a "less bad" option.

Parody July 30th, 2020 08:51 PM

As with PaizoCon Online, anyone can go look at Gen Con Online's list of events. Paizo (who is organizing the majority of Starfinder and Pathfinder 2nd Edition events) runs their Thursday/Friday/Saturday events starting at 8 AM, 2 PM, and 8 PM Eastern. Slots are 5 hours long, so shutting off the server at 9 PM Pacific (Midnight Eastern, 4 hours into the slot) meant you probably interrupted the final encounter of a bunch of events.

Sunday's events start at 9 AM Eastern, if there's an emergency.

I played in our normal non-virtual game (with paper character sheets!) tonight so I don't know what warnings went out.
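
The slot arithmetic in this post can be checked with a short sketch. It assumes fixed daylight-time offsets for July 2020 (PDT = UTC-7, EDT = UTC-4) and uses the slot start times and 5-hour length quoted above; the names `slot_hit`, `SLOT_LENGTH`, and `slot_starts` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets for July 2020, when daylight time was in effect.
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")

SLOT_LENGTH = timedelta(hours=5)

# Announced outage time: Thursday 07/30 at 9 PM Pacific.
outage = datetime(2020, 7, 30, 21, 0, tzinfo=PDT)

# Thursday slot starts quoted in the post: 8 AM, 2 PM, and 8 PM Eastern.
slot_starts = [datetime(2020, 7, 30, hour, 0, tzinfo=EDT) for hour in (8, 14, 20)]


def slot_hit(moment, starts, length=SLOT_LENGTH):
    """Return (slot_start, elapsed) for the slot containing moment, else None."""
    for start in starts:
        if start <= moment < start + length:
            return start, moment - start
    return None


hit = slot_hit(outage, slot_starts)
print("Outage in Eastern:", outage.astimezone(EDT))
print("Outage in UTC:    ", outage.astimezone(timezone.utc))
if hit is not None:
    print("Falls", hit[1], "into the slot starting", hit[0])
```

Under those assumptions the outage lands at midnight Eastern, inside the 8 PM Eastern slot, which matches the "4 hours into the slot" figure.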

rob July 30th, 2020 10:29 PM

The information we had showed numerous games going late into the night. So there was simply no "good" time to do it. We consulted with the person on staff most familiar with the convention gaming schedule (the rest of us have been working round the clock), and she didn't flag a conflict with the outage timing. So we did our best to pick a "less bad" time, and it sounds like we could have been more thorough. I apologize for that.

We have literally been working around the clock to get everything into place for GenCon, and to smooth out the rough edges over the past couple of days from things we didn't catch during our own testing. There are limits to what a tiny team like ours can achieve, and I'm proud of what we've managed to put into place this week. It hasn't been perfect, but it's been pretty darn good.

We're all exhausted on this end. I hope everybody has a great weekend gaming and that all of that hard work pays off overall.

rob July 30th, 2020 10:32 PM

Addendum: If there are things we can do to improve the outage notification mechanism, please share your suggestions. We've striven to achieve a balance that accurately conveys upcoming outages without being obtrusive. If we need to adjust that balance, or if there's a use-case we haven't covered adequately, we can make the appropriate changes.

slate July 30th, 2020 11:17 PM

Hey Rob,

Not knowing your infrastructure, is it possible that you could spin up a second front end cluster, deploy update to front end cluster, drain traffic from A to B and then remove A?

Obviously, if there are DB migrations, this might be less ideal and would require a lot more heavy lifting.

flyteach July 31st, 2020 05:25 AM

Rob,
We had 4 in our group, all using HLO. If there was a warning, none of us caught it. Maybe making it a persistent toast until we close it? I know I've seen it in the past, but didn't seem to yesterday, when it was most critical.
Also, yeah, the convention schedule, just like at Paizocon, has been out for weeks and is very public. Please at least consider going outside of the main 5 hour blocks as @Parody suggests.
And there are no release notes, so we don't even know what was fixed.

rob July 31st, 2020 02:09 PM

Quote:

Originally Posted by slate (Post 290000)
Not knowing your infrastructure, is it possible that you could spin up a second front end cluster, deploy update to front end cluster, drain traffic from A to B and then remove A?

Obviously, if there are DB migrations, this might be less ideal and would require a lot more heavy lifting.

You make it sound so easy when you say it like that! ;)

This was something I wanted in place more than a year ago. Alas, I then found out the server code had to be completely rewritten (see my comments here for more info). During the rewrite process, I've probably put about 50% of the necessary infrastructure into place to accomplish this, but there's still a meaningful chunk of work left to do. And then a TON of testing.

As you surmised, an additional factor has been that most releases (aside from these GenCon hotfixes) entail a bunch of database changes to incorporate the new capabilities we've been steadily adding. That increases the complexity greatly, and definitely wouldn't be supported at first, but we could still use the transition approach for hotfixes that are code changes only, like we've needed the past few days.

So it's definitely something I want to do - and have been working towards in pieces - but we're not there yet. My goal is to be there by the end of the year, finishing up the missing pieces interspersed with all the other new stuff that's in the queue. :)
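
For readers unfamiliar with the A-to-B transition being discussed, here is a minimal in-memory sketch of the general idea. It is not HLO's actual infrastructure: `Cluster`, `Router`, `can_cut_over`, and the release dictionary are all hypothetical names, and a real setup would do this at the load balancer rather than in one Python process. It also only models the code-only hotfix case, since database changes are the part described above as much harder.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    version: str
    sessions: set = field(default_factory=set)


class Router:
    """Sends new sessions to the active cluster; existing sessions stay put."""

    def __init__(self, active):
        self.active = active
        self.draining = []

    def connect(self, session_id):
        self.active.sessions.add(session_id)
        return self.active.name

    def switch_to(self, new_cluster):
        # The old cluster keeps serving its current sessions until they end.
        if self.active.sessions:
            self.draining.append(self.active)
        self.active = new_cluster

    def disconnect(self, session_id):
        """Drop a session and return the names of clusters that became empty."""
        for cluster in [self.active, *self.draining]:
            cluster.sessions.discard(session_id)
        retired = [c.name for c in self.draining if not c.sessions]
        self.draining = [c for c in self.draining if c.sessions]
        return retired


def can_cut_over(release):
    """Assumption: only code-only releases qualify; DB changes need downtime."""
    return not release["db_migrations"]


a = Cluster("A", "1.0")
router = Router(a)
router.connect("gm-1")
router.connect("player-2")

hotfix = {"version": "1.0.1", "db_migrations": False}
retired = []
new_home = None
if can_cut_over(hotfix):
    b = Cluster("B", hotfix["version"])
    router.switch_to(b)                       # new traffic now goes to B
    new_home = router.connect("player-3")
    retired += router.disconnect("gm-1")
    retired += router.disconnect("player-2")  # A is now empty and can be removed

print(new_home, retired)
```

The point of the sketch is that nobody on A is interrupted: sessions finish where they started, and A is only removed once it has drained.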

rob July 31st, 2020 02:12 PM

Quote:

Originally Posted by flyteach (Post 290007)
Rob,
We had 4 in our group, all using HLO. If there was a warning, none of us caught it. Maybe making it a persistent toast until we close it? I know I've seen it in the past, but didn't seem to yesterday, when it was most critical.
Also, yeah, the convention schedule, just like at Paizocon, has been out for weeks and is very public. Please at least consider going outside of the main 5 hour blocks as @Parody suggests.
And there are no release notes, so we don't even know what was fixed.

We'll be changing the behavior to make the toast persistent henceforth.

We're gonna figure out a better way to get the convention game schedule clearly known by the dev team in the future.

The release notes went out this morning. We were wiped yesterday. The release notes were properly staged in advance, but we forgot to unveil them once the hotfix was officially deployed.
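
As a rough illustration of the "persistent until closed" behavior being discussed, here is a small sketch of the notice logic. It is a guess at the shape of such a feature, not HLO's code: `OutageToast` and its fields are hypothetical, and the two-hour lead time is taken from the description earlier in the thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutageToast:
    outage_at: datetime
    lead_time: timedelta = timedelta(hours=2)  # notice window before the outage
    dismissed: bool = False

    def dismiss(self):
        self.dismissed = True

    def is_visible(self, now):
        # Persistent: visible for the whole window unless the user closes it.
        in_window = self.outage_at - self.lead_time <= now < self.outage_at
        return in_window and not self.dismissed

    def message(self, now):
        minutes = int((self.outage_at - now).total_seconds() // 60)
        return f"Planned outage in {minutes} min"


toast = OutageToast(outage_at=datetime(2020, 7, 30, 21, 0))
five_to_nine = datetime(2020, 7, 30, 20, 55)

print(toast.is_visible(datetime(2020, 7, 30, 18, 30)))  # before the window
print(toast.is_visible(five_to_nine), toast.message(five_to_nine))
toast.dismiss()
print(toast.is_visible(five_to_nine))                   # closed by the user
```

The design choice here is that visibility depends only on the time window and an explicit dismissal, with no auto-hide timer, so a GM who looks away for a few minutes still sees the notice.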


All times are GMT -8.
