[COMPLETED] 5 Mins Planned downtime Thurs (07/30) @9 PM Pacific (4 AM UTC 7/31)
We are planning a quick hotfix deployment for Hero Lab Online on Thursday (July 30th) at 9 PM Pacific (4 AM UTC on 7/31). We expect downtime to be 5 minutes or less.
EDIT: This has been completed. |
HLO Down
No notice, and right in the middle of GenCon sessions.
|
Within the product itself, there should have appeared a notice regarding the planned outage. That notice should have appeared roughly TWO HOURS prior to the time and provided an ongoing update of when the outage would occur.
The actual outage lasted for only a few minutes. And given the two hours of advance notice within the product, it should have been practical for GMs to plan for a 5-minute "bio break" at 9pm. We are in the middle of GenCon, so there is simply no good time to deploy anything. However, there were issues that absolutely needed to be addressed. So we waited until later in the night as a "less bad" option. |
As with PaizoCon Online, anyone can go look at Gen Con Online's list of events. Paizo (who is organizing the majority of Starfinder and Pathfinder 2nd Edition events) runs their Thursday/Friday/Saturday events starting at 8 AM, 2 PM, and 8 PM Eastern. Slots are 5 hours long, so shutting off the server at 9 PM Pacific (Midnight Eastern, 4 hours into the slot) meant you probably interrupted the final encounter of a bunch of events.
Sunday's events start at 9 AM Eastern, if there's an emergency. I played in our normal non-virtual game (with paper character sheets!) tonight so I don't know what warnings went out. |
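The slot arithmetic in the post above can be checked with a short sketch. This is only an illustration: the fixed UTC offsets assume daylight time is in effect (true for late July), the year 2020 is an assumption based on July 30 falling on a Thursday, and `slots_hit` is a made-up helper name, not anything from HLO or the Gen Con schedule.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for late July, when daylight time is in effect (assumption:
# no DST transition falls inside the convention weekend).
PACIFIC = timezone(timedelta(hours=-7), "PDT")
EASTERN = timezone(timedelta(hours=-4), "EDT")

SLOT_STARTS_EASTERN = [8, 14, 20]   # 8 AM, 2 PM, 8 PM Eastern, per the post above
SLOT_LENGTH = timedelta(hours=5)


def slots_hit(outage_start, outage_length=timedelta(minutes=5)):
    """Return (slot_start, time_into_slot) for every slot the outage overlaps."""
    outage_start = outage_start.astimezone(EASTERN)
    outage_end = outage_start + outage_length
    hits = []
    # Look at slots starting on the outage's Eastern calendar day and the day
    # before, because an 8 PM slot runs past midnight.
    for day_offset in (-1, 0):
        day = (outage_start + timedelta(days=day_offset)).replace(
            hour=0, minute=0, second=0, microsecond=0)
        for hour in SLOT_STARTS_EASTERN:
            slot_start = day + timedelta(hours=hour)
            slot_end = slot_start + SLOT_LENGTH
            if outage_start < slot_end and outage_end > slot_start:
                hits.append((slot_start, outage_start - slot_start))
    return hits


# Thursday 9 PM Pacific (year assumed to be 2020)
outage = datetime(2020, 7, 30, 21, 0, tzinfo=PACIFIC)
for slot_start, into in slots_hit(outage):
    print(slot_start.strftime("%a %I %p"), "slot,", into, "in")
```

Running the same check against a candidate time before deploying would flag any window that lands inside a 5-hour block.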
The information we had showed numerous games going late into the night. So there was simply no "good" time to do it. We consulted with the person on staff most familiar with the convention gaming schedule (the rest of us have been working round the clock), and she didn't flag a conflict with the outage timing. So we did our best to pick a "less bad" time, and it sounds like we could have been more thorough. I apologize for that.
We have literally been working around the clock to get everything into place for GenCon, and to address the rough edges over the past couple of days for things we didn't catch during our own testing. There are limits to what a tiny team like ours can achieve, and I'm proud of what we've managed to put into place this week. It hasn't been perfect, but it's been pretty darn good. We're all exhausted on this end. I hope everybody has a great weekend gaming and that all of that hard work pays off overall. |
Addendum: If there are things we can do to improve the outage notification mechanism, please share your suggestions. We've striven to achieve a balance that accurately conveys upcoming outages without being obtrusive. If we need to adjust that balance, or if there's a use-case we haven't covered adequately, we can make the appropriate changes.
|
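For anyone picturing the notification behavior described above, here is a rough sketch of a two-hour notice window with a running countdown. It is not HLO's actual code: `outage_notice`, the `dismissed` flag, and the banner wording are all hypothetical, and the only details taken from the thread are the roughly two-hour lead time and the 5-minute expected downtime.

```python
from datetime import datetime, timedelta, timezone

NOTICE_LEAD = timedelta(hours=2)  # "roughly TWO HOURS prior", per the post above


def outage_notice(now, outage_start, dismissed=False):
    """Return banner text to show, or None if no banner should be visible.

    The banner stays up from NOTICE_LEAD before the outage until the outage
    begins, unless the user has explicitly dismissed it.
    """
    remaining = outage_start - now
    if dismissed or remaining > NOTICE_LEAD or remaining <= timedelta(0):
        return None
    minutes = int(remaining.total_seconds() // 60)
    return f"Planned outage in {minutes} min (expected downtime: 5 minutes or less)"


# Hypothetical usage: 90 minutes before a 4 AM UTC outage
outage_start = datetime(2020, 7, 31, 4, 0, tzinfo=timezone.utc)
print(outage_notice(outage_start - timedelta(minutes=90), outage_start))
```

Whether the banner should auto-hide or persist until closed is exactly the balance being asked about; in this sketch it persists until `dismissed` is set.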
Hey Rob,
Not knowing your infrastructure, is it possible that you could spin up a second front end cluster, deploy update to front end cluster, drain traffic from A to B and then remove A? Obviously, if there are DB migrations, this might be less ideal and would require a lot more heavy lifting. |
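To make the suggestion above concrete, here is a toy simulation of that A-to-B cutover. Everything in it is an assumption for illustration: `Cluster`, `Router`, and `blue_green_hotfix` are invented names, the "drain" simply moves session IDs across, and nothing here reflects how HLO's servers are actually built.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    version: str
    sessions: set = field(default_factory=set)


class Router:
    """Toy load balancer: new sessions always go to the active cluster."""

    def __init__(self, active):
        self.active = active

    def connect(self, session_id):
        self.active.sessions.add(session_id)
        return self.active.name


def blue_green_hotfix(router, new_version, has_db_migrations):
    """Swap in a new cluster for a code-only hotfix and return the old one."""
    if has_db_migrations:
        # Schema changes need heavier lifting, as noted above; not covered here.
        raise ValueError("release includes DB migrations; use a different rollout")
    old = router.active
    new = Cluster(name=f"{old.name}-next", version=new_version)  # 1. spin up B
    router.active = new                                          # 2. new traffic goes to B
    # 3. Drain A. A real system would wait for sessions to end or hand them
    #    off gracefully; this sketch just moves them over immediately.
    while old.sessions:
        new.sessions.add(old.sessions.pop())
    return old                                                   # 4. caller tears down A


# Hypothetical usage
router = Router(Cluster("A", "1.0"))
router.connect("gm-1")
retired = blue_green_hotfix(router, "1.0.1", has_db_migrations=False)
print(retired.name, "drained;", router.active.name, "now serving", router.active.sessions)
```

The guard on `has_db_migrations` reflects the caveat in the post: the swap is simple only when both clusters can share the same database schema.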
Rob,
We had 4 in our group, all using HLO. If there was a warning, none of us caught it. Maybe making it a persistent toast until we close it? I know I've seen it in the past, but didn't seem to yesterday, when it was most critical. Also, yeah, the convention schedule, just like at PaizoCon, has been out for weeks and is very public. Please at least consider going outside of the main 5-hour blocks as @Parody suggests. And there are no release notes, so we don't even know what was fixed. |
This was something I wanted in place for more than a year. Alas, I then found out the server code had to be completely rewritten (see my comments here for more info). During the rewrite process, I've probably put about 50% of the necessary infrastructure into place to accomplish this, but there's still a meaningful chunk of work left to do. And then a TON of testing. As you surmised, an additional factor has been that most releases (aside from these GenCon hotfixes) entail a bunch of database changes to incorporate the new capabilities we've been steadily adding. That increases the complexity greatly, and definitely wouldn't be supported at first, but we could still use the transition approach for hotfixes that are code changes only, like we've needed the past few days. So it's definitely something I want to do - and have been working towards in pieces - but we're not there yet. My goal is to be there by the end of the year, finishing up the missing pieces interspersed with all the other new stuff that's in the queue. :) |
We're gonna figure out a better way to get the convention game schedule clearly known by the dev team in the future. The release notes went out this morning. We were wiped yesterday. The release notes were properly staged in advance, but we forgot to unveil them once the hotfix was officially deployed. |
Rob, thanks. Yeah, I think a persistent toast would be nice. You could also get rid of the persistent toast about multiple logins... I'd think that one only needs to be there for 5 or 10 seconds. Right now, I have to x it each time it comes up. I'll also suggest that the system stabilize for the week of a premium convention. It would certainly prevent last-minute changes, especially during the final battle. While it's nice for a few to have nice shiny things on day 1, the rest of us have to contend with the fallout of any issues, usually manifesting in several outages over the past couple of years.
|
Given that a LOT of users want the nice shiny things the day they're released, it's a no-win situation for us when the book launches in the middle of a big show (e.g. PaizoCon or GenCon). |
Rob, I guess we'll have to agree to disagree. Sure, a LOT of users want shiny the first day. But aren't there a LOT MORE users who want stability during the biggest game convention of the year and don't want a session interrupted? Are you saying that the majority of your customer base has already purchased APG?
|
The bigger factor to consider is that a large contingent of our users consider it a huge selling point of Hero Lab to always have access to the latest shiny bits the day they get released by the publisher. To them, if we didn't have the new books available the day they become available, then they would view HLO as "not usable" for the ENTIRE GenCon weekend. Which is a whole different calculus compared to a 5-minute outage once in a 24-hour period to deploy a hotfix. So this is definitely a no-win situation. And we will continue releasing the books on the publisher street dates for the reasons above. Hopefully, by next GenCon (ideally PaizoCon), we'll have the transparent server transition solution in place, and this will be a non-concern. Everyone will get the books they want on the street date, and there will be no service interruptions for anyone. <fingers crossed> |
Rob, I will certainly cross my fingers with you. I know it's been a longer, hard road than originally anticipated. OTOH, the new Starfinder book is not available, so I guess you can put me in the box of the user that didn't get new shiny on release day. And yet, HLO is perfectly usable, the same as it was before, albeit without shiny new stuff. But, it's certainly NOT unusable.
Also, FWIW, that 5-minute outage was mainly during the boss fight at a major con. I do appreciate the one last night being outside of main table games. |
As for the bad timing of the release on Thursday, it was regrettable. I apologized already, and I'm happy to do it again. :)
FWIW, I circled back on our end and discovered that we weren't given complete information by the publisher regarding event timing. That's why our person who supposedly was "in the know" didn't flag a conflict when we asked her. Could we have double-checked all that information ourselves? Yes, we could have. SHOULD we have needed to double-check it? That's a separate question that we'll be discussing with the publisher. Suffice to say that we were so focused on bug fixing that we (wrongly) assumed we were given accurate and complete information. The rest, as they say, is history. At least we got our timing corrected for the next night's hotfix. :) |