it. Instead, first learn facts. Solutions come later.
Currently you don't even know if this is a data problem or a
hardware problem. Until you know that, you don't even know
where to begin with a solution. Those adjacent UPSes or power
strip protectors do nothing useful and may even contribute to
the problem. Don't - not for one minute - assume a surge
protector is the same as surge protection. They are two
different components of a protection 'system'.
Facts are necessary first. For example: exactly when does the
problem occur, and what else is happening at that moment? That
means you need a tester that will see the problem and record
when the failure happens.
The simplest diagnostic tool is ping, which comes with every
OS. Ping can be set up to ping repeatedly. Then one can
observe when problems happen. Some programs can do repeat
pings and record each failure with a time code.
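The repeat-ping-with-time-code idea can be sketched in a few
lines of Python. This is only a minimal sketch, not any
particular program: the names ping_once and watch are made up
for illustration. It assumes the OS ping command is on the
path and exits nonzero on failure, which Windows ping does not
always do (treat that as an assumption to verify).

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2):
    """Send one echo request with the OS ping command; True if it succeeded."""
    # Windows ping uses -n for the count; Linux and macOS use -c.
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Assumption: a nonzero exit code means no reply was received.
    return result.returncode == 0


def watch(host, attempts, interval_s=1.0, probe=ping_once,
          now=datetime.now, sleep=time.sleep, log=print):
    """Probe host repeatedly and record every UP/DOWN change with a time code."""
    events = []
    was_up = True  # assume the link is up when the test starts
    for _ in range(attempts):
        is_up = probe(host)
        if is_up != was_up:
            event = (now(), "UP" if is_up else "DOWN")
            events.append(event)
            log(event[0].isoformat(), event[1], host)
            was_up = is_up
        sleep(interval_s)
    return events
```

For example, watch("192.168.1.1", 3600) would probe that
(hypothetical) address once a second for about an hour and
print a time code each time replies stop or resume. Those time
codes can then be compared against what else was happening.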
Another test involves stressing the system. All
(responsible) ethernet manufacturers provide comprehensive
diagnostics. Set up two (or more) NICs with the diagnostic
from the same manufacturer. One will output continuous, worst
case data patterns that the other NIC(s) will echo back. Does
the network stay stable with this worst case testing ongoing?
You have currently provided only one useful fact. The hub
appears to lock up and recovers only when power cycled.
Apparently a different hub suffers the same failure. OK. So
either the problem is incoming on network wires or it is an AC
power problem. Numerous types of power problems exist. A UPS
would address only two - brownouts and blackouts. A UPS does
not address noise, surges, or harmonics.
This problem need not be created on AC power wires either.
The problem could be in the safety ground wire. But again,
don't even try to fix anything. First, what else is on that
circuit? Using a multimeter, what are the voltages between
each pair of the three AC prongs on that wall receptacle?
Consider later an expensive series mode filter as a temporary
solution - a test - to determine if AC power is even related.
When failure happens, what are all indicators on the hub
front panel? What do the indicator lights on each computer's
ethernet NIC report? How do these lights change as each
computer is disconnected and reconnected to the network -
while problem is ongoing? Again, solve things both faster and
the first time by recording all such details. Then make only
one minimal change to see how each change affects the
problem. Solutions come later. Don't fall for those mythical
UPS and surge protector solutions. Collect facts so that
problem (and not its symptoms) is clearly identified.
Solutions come later.
First thing to perform: have computers do massive data
exchanges using the NIC manufacturer's diagnostic program -
all this while others are still using the network. This worst
case data test runs without the complications of an operating
system - strictly a hardware test - AND has been observed to
find hardware problems immediately in a network that otherwise
was working. This test is only to make the problem hard and
repeatable. Fixing comes later.
BC wrote:
Thanks for the response Paul. Interesting idea about data "spikes".
I'm not aware of any high network utilising apps that are coinciding
with the network outages, however, it is a distinct possibility. We
have a document management app that IS used infrequently, and if used
incorrectly, can fire enormous amounts of data across the wire. I
think this is worth investigating.
I don't think there's a problem with the computer closet electrics.
I've recently had these tested in lieu of a new backup generator and
nothing has shown up. The actual hardware is protected by UPS with
surge protectors. Although again power spikes could be an issue. I'll
ask our sparkies to monitor over a longer period.
Thanks for your valid input!
