RR Down: Sending Alert Messages about Genome Browser Being Offline
Overview
This page has reminders of what to do if the RR is down for a long period. Verify the problem, contact cluster-admin (cc'ing the team), and then, if it isn't fixed in a reasonable amount of time, consider sending additional messages.
Contact cluster-admin/cc qateam
Check logs
See Checking_RR_status_through_hgTracksRandom, where you can run tail -100 /hive/users/qateam/perf/hgTracksRandom.log
to see the history of the RR over 15-minute intervals. See the Apache error log output page to learn more about figuring out what a user might be doing.
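If a script is more convenient than the shell, the same check can be sketched in Python. This is only a minimal illustration: the helper name tail_lines is hypothetical, and because the log's line format is not described here, the sketch prints the lines without trying to parse them.

```python
from collections import deque
from pathlib import Path

# Path taken from the text above; adjust it if the log has moved.
LOG_PATH = Path("/hive/users/qateam/perf/hgTracksRandom.log")


def tail_lines(path, count=100):
    """Return the last `count` lines of a text file, like `tail -<count>`."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the newest `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    if LOG_PATH.exists():
        for line in tail_lines(LOG_PATH):
            print(line)
    else:
        print(f"Log not found: {LOG_PATH}")
```

Run it on a machine that can see /hive; elsewhere it only reports that the log was not found.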
Confirm issue
Navigate to the machines to confirm there is a problem.
- One approach is to set up a secondary browser whose home page opens all of the machines as tabs.
- For example, if Chrome is your main browser and Firefox is your secondary, go to Preferences/General in Firefox, set "When Firefox starts:" to "Show my homepage", and paste the following into "Home page:":
hgw0.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38&measureTiming=1|hgw1.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38&measureTiming=1|hgw2.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38&measureTiming=1|genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&measureTiming=1|genome-asia.ucsc.edu/cgi-bin/hgTracks?db=hg38&measureTiming=1|genome-preview.ucsc.edu/cgi-bin/hgTracks?db=hg38&measureTiming=1
- These open hgw0-hgw2, genome-euro, genome-asia, and genome-preview.
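The home page string above can also be generated from a host list, which makes it easier to add or remove machines. The sketch below is an illustration only: the function names are hypothetical, and the http:// scheme and 30-second timeout in check_host are assumptions that do not come from this page.

```python
import urllib.error
import urllib.request

# Hosts listed in the text above.
HOSTS = [
    "hgw0.soe.ucsc.edu",
    "hgw1.soe.ucsc.edu",
    "hgw2.soe.ucsc.edu",
    "genome-euro.ucsc.edu",
    "genome-asia.ucsc.edu",
    "genome-preview.ucsc.edu",
]
QUERY = "/cgi-bin/hgTracks?db=hg38&measureTiming=1"


def homepage_string(hosts=HOSTS):
    """Build the pipe-separated value to paste into Firefox's "Home page:" field."""
    return "|".join(host + QUERY for host in hosts)


def check_host(host, timeout=30):
    """Return (host, status), where status is an HTTP code or an error string.

    Assumptions: plain http:// and a 30-second timeout.
    """
    url = "http://" + host + QUERY
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return host, response.status
    except urllib.error.HTTPError as err:
        return host, err.code
    except (urllib.error.URLError, OSError) as err:
        return host, f"error: {err}"


if __name__ == "__main__":
    # Only prints the string; no network requests are made here.
    print(homepage_string())
```

To probe the machines from a script instead of a browser, call check_host(host) for each entry in HOSTS. A slow or failed response still needs a human look before emailing anyone.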
Send email
If things look serious, send an email to cluster-admin, qateam, and browser-dev reporting that the RR (or a specific machine, say hgw5, if that is what your checking shows) is down.
Things are bad: update twitter
If cluster-admin does not come back with a fix within half an hour, it is probably a good idea to start thinking about notifying the greater community. If the error is minor, for example only one machine is out (say hgw5), then it may not be as important to notify the community. But if it is bad, for example if mailing list questions start coming in, it might be time to update twitter.
Be sure to say genome-asia and genome-euro are available (if they are).
See this note about our twitter account. Here are some example twitter updates:
- The Genome Browser is unexpectedly down. Please rest assured we are working on having it back up ASAP!
- Our mirrors in Europe and Asia (http://genome-euro.ucsc.edu http://genome-asia.ucsc.edu) are up and available while we work on returning our main site.
- We have now resolved the problem on our main site. We apologize for any inconvenience and thank you for your understanding.
Things are really bad (an hour or more offline): Ask cluster-admin to display the maintenance page
This RM has some history about this page. There is a file maintenance.html at /usr/local/apache/htdocs/ that gets turned on when admin touches another file (maintenance.enable?). Possible example email (be sure to CC the QAteam and other relevant parties):
- Dear cluster-admin,
- With the current issue on the RR, can we update the site to display the /usr/local/apache/htdocs/maintenance.html page using the maintenance.enable mechanism, since it looks like the issue is not going to be resolved soon?
- Thanks!
P.S. If you ever need to edit maintenance.html, request the push with a message like:
Dear Pushers,
Please push:
/usr/local/apache/htdocs/maintenance.html
to
/usr/local/apache/htdocs/maintenance.html
Reason: Update to the maintenance.html page.
Also, if we are putting up the maintenance.html we should send an email to genome-announce as there is a line on that page that suggests our "forum may contain details about this outage."
Example email to genome announce:
Browser Maintenance Today, Dec 3rd @ 4 pm

We will be performing some hardware maintenance this afternoon, the 3rd of December from 4 - 5 pm (UTC-8) Pacific Standard Time during our scheduled Thursday maintenance window. Due to recent power outages, we need to restart replication setups which may be experienced as a 30-minute service interruption.

Thank you in advance for your understanding.

Regards,
Check whether Pacific time is currently PST or PDT, and get the UTC/GMT offset, here: https://www.timeanddate.com/time/zone/usa/santa-cruz
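The abbreviation and offset can also be looked up locally. The sketch below assumes Python 3.9+ with time zone data installed and uses the standard zoneinfo module; the function name pacific_label is hypothetical, and the year is a placeholder because the example email does not state one.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Santa Cruz, California observes US Pacific time.
PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_label(year, month, day, hour):
    """Return a label such as 'PST (UTC-8)' for a local Pacific wall-clock time."""
    local = datetime(year, month, day, hour, tzinfo=PACIFIC)
    offset_hours = local.utcoffset().total_seconds() / 3600
    return f"{local.tzname()} (UTC{offset_hours:+g})"


if __name__ == "__main__":
    # Placeholder year: the example announcement gives only "Dec 3rd".
    print(pacific_label(2020, 12, 3, 16))  # winter date, standard time
    print(pacific_label(2020, 7, 2, 16))   # summer date, daylight time
```

Paste the resulting label into the announcement so readers outside California can convert the maintenance window to their own time.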