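sinya posted on 2013-5-16 01:35 AM
It's precisely because there was no backup that the emergency repairs have dragged on this long, 4~5 days now. It may already be a lost cause.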
May 8 @ 1:37 PM EST
The server has unscheduled downtime for an unknown reason.
State is now: ERROR
State before: OK (for about 1 month)
May 8 @ 7:01 PM EST
Brian notices that one of the HDDs in the RAID array has failed again and sends Maki an email.
Brian cannot confirm via SSH because of work, so he cannot send a ticket until the following morning.
May 9 @ 1:24 AM EST
Maki learns that the server is down.
Maki also cannot confirm, because the server passwords were recently changed and his copy is outdated.
May 9 @ 1:52 AM EST
Maki emails the host, notifying them that the server is down because of an HDD failure.
May 9 @ 1:58 AM EST
LARA (remote console) is connected to the server.
May 9 @ 1:43 PM EST
Hetzner responds with a request to check/replace the HDD in 30 minutes.
May 9 @ 2:05 PM EST
Brian shuts down the server at the host's request.
May 9 @ 2:40 PM EST
Hetzner finishes checking both drives and reports that both are faulty and must be replaced.
-----------------%<-----------------
HDDTEST-W1F0ME41 ERROR Finished (Selftest, Device: sda); [09.05. 20:13 - 20:26h] Log 09.05.2013/20:13h
HDDTEST-W1F0DP03 ERROR Finished (Selftest, Device: sdb); [09.05. 20:13 - 20:22h] Log 09.05.2013/20:13h
-----------------%<-----------------
May 9 @ 3:08 PM EST
The first HDD (W1F0DP03) is replaced, and the server is booted with the installed OS.
Rebuilding the RAID onto the new HDD commences; ETA 24 hours (normally ~6, but we turned on rAthena).
rAthena is back online for a bit.
May 10 @ 7:29 PM EST
Rebuilding/resync of the RAID fails.
The host recommends rebuilding the RAID while the server stays in rescue mode; ETA 6 hours.
Rebuilding the RAID commences for the second time.
May 11 @ 1:05 PM EST
Rebuilding/resync of the RAID fails again.
Hetzner responds with a request to check the drives again.
May 11 @ 3:10 PM EST
Hetzner finishes checking the drives and reports that the rebuild failed because of the second HDD, which was the drive we were trying to rebuild from.
-----------------%<-----------------
HDDTEST-Z1F1Q26P OK Finished (Mode: short, Device: sdb); [11.05. 19:53 - 20:57h] Log 11.05.2013/19:53h
HDDTEST-W1F0ME41 ERROR Finished (Selftest, Device: sda); [11.05. 19:53 - 20:47h] Log 11.05.2013/19:53h
-----------------%<-----------------
May 13 @ 10:01 AM EST
Hetzner replaces the broken drive (W1F0ME41) and swaps the ports of the SATA cables.
The server does not boot at all except in rescue mode.
May 14 @ 11 AM EST
Hetzner temporarily reconnects one of the old drives.
May 14 @ 3 PM EST (latest news)
The HDD is not reallocating bad sectors to their reserved spare space.
Brian begins to force-reallocate those bad sectors.