Network Failure - No Idea How to Troubleshoot

w_tom
Posted: Thu Dec 08, 2005 4:36 am    Post subject: Re: Network Failure - No Idea How to Troubleshoot

You are currently trying to fix a problem before you
understand it. Instead, first learn facts. Solutions come later.
Currently you don't even know if this is a data problem or a
hardware problem. Until you know that, you don't even
know where to begin with a solution. Those adjacent UPSes or
power strip protectors do nothing useful and may even
contribute to the problem. Don't - not for one minute -
assume a surge protector is the same as surge protection. They are
two different components of a protection 'system'.

First necessary are facts: for example, exactly when the
problem occurs and what is happening simultaneously. That means
you need a tester that will see the problem and record when
the failure happens.

The simplest diagnostic tool is ping, which comes with every
OS. Ping can be set up to ping repeatedly. Then one can
observe when problems happen. Some programs can do repeat
pings and record each failure with a timestamp.
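
For anyone without such a program, here is a minimal sketch of the repeat-ping idea in Python. It is only an illustration, not a specific tool: the function names `ping_once` and `watch` are made up here, the address in the usage comment is a placeholder, and it assumes the OS `ping` command accepts `-c 1` (Unix-like) or `-n 1` (Windows) for a single echo request. It records a timestamp each time the target goes from answering to not answering, or back.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=3):
    """Send one echo request with the OS ping command; True if it was answered."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        done = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # give up if ping itself hangs
        )
    except subprocess.TimeoutExpired:
        return False
    # Exit code 0 normally means a reply arrived; some ping versions also
    # return 0 for "unreachable" replies, so treat this as a rough check.
    return done.returncode == 0


def watch(probe, checks, interval_s, sleep=time.sleep, now=datetime.now):
    """Call probe() `checks` times and record a timestamp at every up/down change."""
    events = []
    was_up = True  # assumption: the target is reachable when the test starts
    for _ in range(checks):
        up = probe()
        if up != was_up:
            stamp = now().isoformat(timespec="seconds")
            events.append((stamp, "UP" if up else "DOWN"))
            print(stamp, events[-1][1])
            was_up = up
        sleep(interval_s)
    return events


# Example use (placeholder address, one check every 5 seconds for a day):
# watch(lambda: ping_once("192.168.1.1"), checks=17280, interval_s=5)
```

Run it from a machine on the far side of the suspect hub. The DOWN/UP timestamps can then be compared against whatever else was happening in the building at those moments.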

Another test involves stressing the system. All
(responsible) Ethernet manufacturers provide comprehensive
diagnostics. Set up two (or more) NICs with the diagnostic from the
same manufacturer. One will output continuous, worst-case
data patterns that the other NIC(s) will echo back. Does the
network stay stable with this worst-case testing ongoing?

You currently have provided only one useful fact. The hub
appears to lock up and is reset by power cycling. Apparently a
different hub suffers the same failure. OK. So either the
problem is incoming on network wires or it is an AC power
problem. Numerous types of power problems exist. A UPS would
only address two - brownouts and blackouts. A UPS does not
address noise, surges, or harmonics.

This problem need not be created on AC power wires either.
The problem could be in the safety ground wire. But again, don't even
try to fix anything. First, what else is on that circuit?
Using a multimeter, what are the voltages between each pair of the
three AC prongs on that wall receptacle? Later, consider an
expensive series mode filter as a temporary solution - a test
- to determine if AC power is even related.

When failure happens, what are all indicators on the hub
front panel? What do the indicator lights on each computer's
ethernet NIC report? How do these lights change as each
computer is disconnected and reconnected to the network -
while problem is ongoing? Again, solve things both faster and
the first time by recording all such details. Then make only
one minimal change to see how each change affects the
problem. Solutions come later. Don't fall for those mythical
UPS and surge protector solutions. Collect facts so that
problem (and not its symptoms) is clearly identified.
Solutions come later.
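
One way to keep such details in a consistent form is a small log with one row per observation and one change per row. The sketch below is only a suggestion; the field names (`hub_leds`, `nic_leds`, `change_made`, `result`) are invented for illustration and should be adapted to whatever the hardware actually shows.

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class Observation:
    when: str         # time of the observation
    hub_leds: str     # what the hub front panel shows
    nic_leds: str     # what the NIC link/activity lights show
    change_made: str  # the single change made before this observation
    result: str       # what happened afterwards


def append_observation(out, obs, write_header=False):
    """Write one observation as a CSV row to an open text file or buffer."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Observation)])
    if write_header:
        writer.writeheader()
    writer.writerow(asdict(obs))


# Example use (file name is a placeholder):
# with open("failure_log.csv", "a", newline="") as f:
#     append_observation(f, Observation("...", "...", "...", "...", "..."))
```

Keeping one change per row makes it obvious later which change, if any, affected the problem.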

First thing to do: have computers perform massive data
exchanges using the NIC manufacturer's diagnostic program -
all this while others are still using the network. This worst-case
data test runs without the complications of an operating
system - strictly a hardware test - AND has been observed to
find hardware problems immediately in a network that otherwise
was working. This test is only to make the problem hard and
repeatable. Fixing comes later.

BC wrote:
Quote:
Thanks for the response Paul. Interesting idea about data "spikes".
I'm not aware of any high network utilising apps that are coinciding
with the network outages, however, it is a distinct possibility. We
have a document management app that IS used infrequently, and if used
incorrectly, can fire enormous amounts of data across the wire. I
think this is worth investigating.

I don't think there's a problem with the computer closet electrics.
I've recently had these tested in lieu of a new backup generator and
nothing has shown up. The actual hardware is protected by UPS with
surge protectors. Although again power spikes could be an issue. I'll
ask our sparkies to monitor over a longer period.

Thanks for your valid input!

Al Dykes
Posted: Thu Dec 08, 2005 8:15 am    Post subject: Re: Network Failure - No Idea How to Troubleshoot

In article <M%ilf.31996$tV6.6007@newssvr27.news.prodigy.net>,
Robert Redelmeier <redelm@ev1.net.invalid> wrote:
Quote:
BC <bcharlton@pchenderson.com> wrote:
Hmmm.....replaced the suspect 48 port switch with a spare
24 port switch (turns out only 23 ports were in use on the
48 port). Everything fine for 25 days, then bang, network
failure again yesterday. Reset of the switch restores
connectivity. The only thing I haven't changed on this level
is the server gigaswitch to which the 48porter uplinks.
I intend to test this out with a spare at the weekend.
If this fails I'm at a total loss. I'm pretty sure there are
no physical loops in the network. The fact that the
switches are unmanaged makes it difficult to troubleshoot.


As you say, difficult. One reset in 25 days isn't that horrible,
but may not be acceptable in a commercial environment.

There are two general causes of switches needing resets:
hardware and software. The hardware side would be things like
static electricity, lightning, poor grounding/interbuilding
These can also be permanent failures.

The software side of things is more likely to be caused by
unpredicted behaviour from high loads, buffer overflows,
evil packets. Make sure jumbo packets are turned off.




Check the manufacturer's web site for tech notes?

--
a d y k e s @ p a n i x . c o m

Don't blame me. I voted for Gore.

Robert Redelmeier
Posted: Thu Dec 08, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

BC <bcharlton@pchenderson.com> wrote:
Quote:
Thanks for the advice guys. It really is appreciated. The problem
that I have is that when the failure occurs it is expected of me to
restore network connectivity immediately. Therefore, difficult to
troubleshoot what is actually happening at the time of failure. I've
taken onboard the advice about NIC tests and am currently setting up
stress testing sender/responder tests to see if I can replicate the
problem. I've also taken onboard advice about potential power issues
and am asking our electrical experts to investigate. If only I could
isolate the problem and make the failure repeatable .....

Further to Tom's advice, you should try to make this problem
repeat. If it repeats, that would make it software (including firmware).
If it won't repeat, it is some electrical/hardware transient.

Try some stress testing in off-peak hours (evenings, weekends?).
`ttcp` can easily saturate a network, and I would run it
on at least four stations going through the suspect switch.
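
If `ttcp` is not at hand, the same idea can be sketched in a few lines of Python: one station listens and counts bytes, another sends filler data as fast as TCP allows. This is not `ttcp` itself, just a rough stand-in. The names `receive_all` and `send_bytes`, the chunk size, and the filler byte are all arbitrary choices made for this example.

```python
import socket
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read means the sender closed the connection
                break
            total += len(data)
    result["bytes"] = total


def send_bytes(host, port, total_bytes):
    """Send total_bytes of filler to host:port; return the rate in Mbit/s."""
    payload = b"\xa5" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(payload[:n])
            sent += n
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return sent * 8 / elapsed / 1e6
```

On the receiving station, bind and listen on a socket and pass it to `receive_all`. On each sending station, call `send_bytes` with the receiver's address, making sure the two are plugged in so the traffic crosses the suspect switch.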

-- Robert



BC
Posted: Thu Dec 08, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

Thanks for the advice guys. It really is appreciated. The problem
that I have is that when the failure occurs it is expected of me to
restore network connectivity immediately. Therefore, difficult to
troubleshoot what is actually happening at the time of failure. I've
taken onboard the advice about NIC tests and am currently setting up
stress testing sender/responder tests to see if I can replicate the
problem. I've also taken onboard advice about potential power issues
and am asking our electrical experts to investigate. If only I could
isolate the problem and make the failure repeatable .....

w_tom
Posted: Fri Dec 09, 2005 9:02 am    Post subject: Re: Network Failure - No Idea How to Troubleshoot

BC wrote:
Quote:
... If only I could isolate the problem and make the failure
repeatable .....

And that is your #1 task. Unfortunately, others don't have
sufficient technical experience to appreciate how things are
solved. And so we have this other problem - teaching others
about reality. Good luck with your testing. And don't forget
to report back. It's a two-way street. This is how we all
learn.