Network Failure - No Idea How to Troubleshoot
DComTalk.com Forum Index
Discussion of VoIP, VPN, video conferencing, DSL and other data communications.
 
BC
Guest
Posted: Thu Nov 10, 2005 5:20 pm    Post subject: Network Failure - No Idea How to Troubleshoot

Hi there

Hope some kind soul out there can help or point me in the right
direction.

For the last month or so our users have experienced a network failure
about once per week. Rebooting the main 48 port unmanaged switch
(netgear) resolves the problem.

I would like to inspect what's going on at the switch to try and get to
the bottom of the failure. I've downloaded and installed ethereal,
however, I have no idea what I should be looking for in the log files.
Can anyone help?

At present we have our W2K servers running on a copper gigabit switch,
this switch is then connected to the 48 port switch. Various other
switches are downstream of this switch.

I plan on hanging a hub between a downstream switch and its uplink to
the 48 port switch and then using ethereal to analyse what's happening.

However, I've no idea what I should be looking for! Can anyone help
please??

Thanks

BC

Robert Redelmeier
Guest
Posted: Thu Nov 10, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

BC <bcharlton@pchenderson.com> wrote:
Quote:
For the last month or so our users have experienced a
network failure about once per week. Rebooting the main
48 port unmanaged switch (netgear) resolves the problem.

This is a very tough problem to troubleshoot. First, what
has changed? Any new hardware, software or usage pattern?

There are some simple hardware things to check: have you
tried swapping out the switch? Is its power feed [UPS] good?
Are any of the ports dead? Or likely to die, 'cuz they're
on long/outdoor runs?

Quote:
I plan on hanging a hub between a downstream switch and
its uplink to the 48 port switch and then using ethereal
to analyse what's happening.

Unfortunately, this will only sniff traffic on that branch
of the network. And may not catch malformed packets. This
is why people buy managed switches.

When do the hangs occur? During heavy usage, idle times?

-- Robert

Alexey G. Khramkov
Guest
Posted: Thu Nov 10, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

"BC" <bcharlton@pchenderson.com> writes:

Quote:
For the last month or so our users have experienced a network failure
about once per week. Rebooting the main 48 port unmanaged switch
(netgear) resolves the problem.

First of all, recall all new h/w, especially from different (non-Netgear)
vendors. Sometimes new NICs can kill the switch.

Quote:
I would like to inspect what's going on at the switch to try and get to
the bottom of the failure. I've downloaded and installed ethereal,
however, I have no idea what I should be looking for in the log files.
Can anyone help?

Any nonstandard stuff. Our programmers added some IP and TCP
options which killed the switch. Usually those options are not
documented by the firmware vendor.

Quote:
At present we have our W2K servers running on a copper gigabit switch,
this switch is then connected to the 48 port switch. Various other
switches are downstream of this switch.

Check the MAC table capacity. If it is overburdened, the switch starts to
work like a hub and floods frames out of every port. The collisions and
extra traffic can then kill the whole collision domain.
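
A rough way to test the MAC table idea from a capture, offered as a sketch only: count the distinct source MAC addresses seen and compare that with the switch's address table size. The function names and the capacity value are assumptions for illustration; the real table size has to come from the switch's datasheet, and a capture on one branch will only see part of the network.

```python
def source_mac(frame: bytes) -> str:
    """Return the source MAC of a raw Ethernet frame as aa:bb:cc:dd:ee:ff."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5 hold the destination MAC, bytes 6-11 the source MAC.
    return ":".join(f"{b:02x}" for b in frame[6:12])


def mac_table_pressure(frames, capacity):
    """Compare the number of distinct source MACs with an assumed table capacity."""
    seen = {source_mac(frame) for frame in frames}
    return {
        "distinct": len(seen),
        "capacity": capacity,  # assumed value, taken from the datasheet
        "over_capacity": len(seen) > capacity,
    }
```

If the distinct count is nowhere near the capacity, table exhaustion is probably not the cause and the search can move on.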

Quote:
I plan on hanging a hub between a downstream switch and its uplink to
the 48 port switch and then using ethereal to analyse what's happening.

Bad Thing (TM). I only have an old 10Mb hub, so it would certainly be a
bottleneck. Are you lucky enough to have at least a 100Mb hub?

HTH,
agkhram
--
status = rule("RFC1855"); /* avoid flame wars */

BC
Guest
Posted: Thu Nov 10, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

Thanks for the response guys. To the best of my knowledge nothing has
changed on the network. No new hubs / switches, no new cabling, not
even new workstations. The hangs seem to occur about once per week,
during working hours, but not at periods of peak activity. Reboot of
the 48 port switch temporarily solves the problem. I'm going to try
swapping out the switch this evening and see how things go....

Thanks again.

BC

James Knott
Guest
Posted: Fri Nov 11, 2005 7:37 am    Post subject: Re: Network Failure - No Idea How to Troubleshoot

BC wrote:

Quote:
I would like to inspect what's going on at the switch to try and get to
the bottom of the failure. I've downloaded and installed ethereal,
however, I have no idea what I should be looking for in the log files.
Can anyone help?

I assume you mean no traffic is getting through? With ethereal, you can
monitor on one computer, while pinging it from others. Does it get
through? When you ping, can you see lights flashing on the NICs and
switch? If nothing's getting through the switch and only rebooting clears
the problem, I'd say the switch is NFG.

As always with troubleshooting, take things one step at a time and verify
the simple stuff first.
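
Since the failure only shows up every week or so, the ping test could be left running and logged. The sketch below is one possible way to do that, not a tested tool: it shells out to the system ping command (the flags shown assume Windows or Linux ping and may differ elsewhere) and prints a timestamped line per probe. All function names are made up for illustration.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one echo request with the system ping; True if the command succeeded."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]  # Linux-style flags
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def poll(hosts, rounds, interval_s=10.0):
    """Ping every host `rounds` times, printing one timestamped line per probe."""
    for _ in range(rounds):
        for host in hosts:
            status = "ok" if ping_once(host) else "FAIL"
            print(f"{datetime.now().isoformat(timespec='seconds')} {host} {status}")
        time.sleep(interval_s)


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (first_bad, last_bad) spans."""
    windows, start, last_bad = [], None, None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_bad = ts
        elif start is not None:
            windows.append((start, last_bad))
            start = None
    if start is not None:
        windows.append((start, last_bad))
    return windows
```

Running `poll` from machines on different downstream switches would show whether everything behind the 48 port switch drops at the same moment or whether one branch goes first.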

bkbigpond
Guest
Posted: Tue Nov 15, 2005 4:35 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

"BC" <bcharlton@pchenderson.com> wrote in message
news:1131627636.314975.279840@g43g2000cwa.googlegroups.com...
Quote:
Hi there

Hope some kind soul out there can help or point me in the right
direction.

For the last month or so our users have experienced a network failure
about once per week. Rebooting the main 48 port unmanaged switch
(netgear) resolves the problem.

I would like to inspect what's going on at the switch to try and get to
the bottom of the failure. I've downloaded and installed ethereal,
however, I have no idea what I should be looking for in the log files.
Can anyone help?

At present we have our W2K servers running on a copper gigabit switch,
this switch is then connected to the 48 port switch. Various other
switches are downstream of this switch.

I plan on hanging a hub between a downstream switch and its uplink to
the 48 port switch and then using ethereal to analyse what's happening.

However, I've no idea what I should be looking for! Can anyone help
please??

Thanks

BC


Can you have the switch send syslog messages? Are there any spanning-tree
issues, such as loops forming? I'm currently troubleshooting a Cisco
environment (6509, 6513 core switches). For no apparent reason (no changes
being made) the network has ground to a halt twice. Very hard to determine
what's going on if there's no logging available. External logging is ok but
can only happen while the network/switch is operational. Local debugging /
logging is better but if you have to shutdown a switch in a hurry (to
restore network services) you lose what logging was there.
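
For the external logging case, a bare-bones collector might look like the sketch below. It assumes the switch can send plain BSD-style syslog over UDP (an unmanaged switch cannot); the port number 5514, the log file name and all function names are placeholders, not anything from a real product.

```python
import socketserver
from datetime import datetime

LOG_PATH = "switch-syslog.log"  # hypothetical file name
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_pri(message: str):
    """Split a leading <PRI> field into (facility, severity name, remaining text)."""
    if not message.startswith("<") or ">" not in message[:6]:
        return None, None, message
    end = message.index(">")
    digits = message[1:end]
    if not digits.isdigit():
        return None, None, message
    pri = int(digits)
    # PRI encodes facility * 8 + severity.
    return pri // 8, SEVERITIES[pri % 8], message[end + 1:]


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data, _sock = self.request  # for UDP servers, request is (bytes, socket)
        text = data.decode("utf-8", errors="replace").strip()
        facility, severity, rest = parse_pri(text)
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(f"{stamp} {self.client_address[0]} fac={facility} sev={severity} {rest}\n")


def serve(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on a machine that stays up means the last messages before a hang survive the switch reboot, which is the part local logging loses.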

BernieM

Robert Redelmeier
Guest
Posted: Tue Dec 06, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

BC <bcharlton@pchenderson.com> wrote:
Quote:
Hmmm.....replaced the suspect 48 port switch with a spare
24 port switch (turns out only 23 ports were in use on the
48 port). Everything fine for 25 days, then bang, network
failure again yesterday. Reset of the switch restores
connectivity. The only thing I haven't changed on this level
is the server gigaswitch to which the 48porter uplinks.
I intend to test this out with a spare at the weekend.
If this fails I'm at a total loss. I'm pretty sure there are
no physical loops in the network. The fact that the
switches are unmanaged makes it difficult to troubleshoot.


As you say, difficult. One reset in 25 days isn't that horrible,
but may not be acceptable in a commercial environment.

There are two general causes of switches needing resets:
hardware and software. The hardware side would be things like
static electricity, lightning, poor grounding/interbuilding.
These can also be permanent failures.

The software side of things is more likely to be caused by
unpredicted behaviour from high loads, buffer overflows,
evil packets. Make sure jumbo packets are turned off.

-- Robert

BC
Guest
Posted: Tue Dec 06, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

Fans seem to be spinning round nicely / clear of crap. Environmental
conditions are static. Thanks for the response tho!

BC
Guest
Posted: Tue Dec 06, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

Hmmm.....replaced the suspect 48 port switch with a spare 24 port
switch (turns out only 23 ports were in use on the 48 port).
Everything fine for 25 days, then bang, network failure again
yesterday. Reset of the switch restores connectivity. The only thing
I haven't changed on this level is the server gigaswitch to which the
48porter uplinks. I intend to test this out with a spare at the
weekend. If this fails I'm at a total loss. I'm pretty sure there are
no physical loops in the network. The fact that the switches are
unmanaged makes it difficult to troubleshoot. Any further
troubleshooting advice would be appreciated!

William P.N. Smith
Guest
Posted: Tue Dec 06, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

"BC" <bcharlton@pchenderson.com> wrote:
Quote:
troubleshooting advice would be appreciated!

Well, since I'm a hardware guy, is there any chance that the cooling
fan(s) are {clogged, stopped, blocked} or that the closet the switch
is in is warmer than before? Just a random thought...

BC
Guest
Posted: Wed Dec 07, 2005 4:17 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

Thanks for the reply Robert.

Switches are unmanaged, therefore there is no way of connecting to the
switch in or out of band to view / manage, so I can't see if ARP tables
have overflowed etc. I'm suspecting either screwy internal ARP / MAC
tables or a broadcast storm (or a DoS attack?!).

I'm currently hanging ethereal off of a hub connected to the switch to
see if there's an issue with broadcast traffic.

As the problem is relatively infrequent I think a weekly switch reboot
procedure is probably the most pragmatic thing I can do. I'm hoping to
replace the backbone with a L2 managed switch early next year, which
might help me gain further insight into the problem.

Paul Vacquier
Guest
Posted: Wed Dec 07, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

Robert Redelmeier wrote:
Quote:
BC <bcharlton@pchenderson.com> wrote:
Hmmm.....replaced the suspect 48 port switch with a spare
24 port switch (turns out only 23 ports were in use on the
48 port). Everything fine for 25 days, then bang, network
failure again yesterday. Reset of the switch restores
connectivity. The only thing I haven't changed on this level
is the server gigaswitch to which the 48porter uplinks.
I intend to test this out with a spare at the weekend.
If this fails I'm at a total loss. I'm pretty sure there are
no physical loops in the network. The fact that the
switches are unmanaged makes it difficult to troubleshoot.


As you say, difficult. One reset in 25 days isn't that horrible,
but may not be acceptable in a commercial environment.

There are two general causes of switches needing resets:
hardware and software. The hardware side would be things like
static electricity, lightning, poor grounding/interbuilding.
These can also be permanent failures.

The software side of things is more likely to be caused by
unpredicted behaviour from high loads, buffer overflows,
evil packets. Make sure jumbo packets are turned off.

-- Robert

There's not a chance that they are running a job locally that could
introduce a 'spike', is there? Thinking of, say, payroll running up a
'burster' or the welding shop firing up that old MIG welder that only
gets fired up once a month or so for the 'special jobs'? Could you run
a mains supply tester and look for brown-outs, or spikes that could be
picked up by the equipment?

Maybe they are running a job that's chucking a massive load of data onto
the network (just a straw - really wouldn't expect that).
--
regards
PPV.
--
Paul Vacquier | Tel: +44 1245 242229
BAE SYSTEMS Advanced Technology Centres | Fax: +44 1245 475244
Great Baddow, Chelmsford, CM2 8HN UK | paul.vacquier@b1a2e3systems.com
-----------------------------------------------------------------------
"This advice is offered in good faith but shall not be contractually
binding on and does not necessarily reflect the view of
CSC(Computer Sciences Corporation)."

BC
Guest
Posted: Wed Dec 07, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

Thanks for the response Paul. Interesting idea about data "spikes".
I'm not aware of any high network utilising apps that are coinciding
with the network outages, however, it is a distinct possibility. We
have a document management app that IS used infrequently, and if used
incorrectly, can fire enormous amounts of data across the wire. I
think this is worth investigating.
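
One way to check the heavy-app theory from a capture taken around an outage, as a sketch under assumptions: total up the bytes sent per source address and see who tops the list. The `(source, frame_length)` record format and the function name are invented here; how the records get pulled out of the capture is left open.

```python
from collections import defaultdict


def top_talkers(records, n=5):
    """Sum bytes per source address and return the n largest as (source, bytes)."""
    totals = defaultdict(int)
    for source, length in records:
        totals[source] += length
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

If the document management server, or a workstation using it, dominates the list just before each hang, that would support the data spike idea.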

I don't think there's a problem with the computer closet electrics.
I've recently had these tested in lieu of a new backup generator and
nothing has shown up. The actual hardware is protected by UPS with
surge protectors. Although again power spikes could be an issue. I'll
ask our sparkies to monitor over a longer period.

Thanks for your valued input!

Robert Redelmeier
Guest
Posted: Wed Dec 07, 2005 5:20 pm    Post subject: Re: Network Failure - No Idea How to Troubleshoot

BC <bcharlton@pchenderson.com> wrote:
Quote:
if ARP tables are overflowed etc. I'm suspecting either
screwy internal ARP / MAC tables or broadcast storm. (or
DoS attack?!)

I doubt something as common as ARP table overflow would
cause this. However, you could easily get a broadcast storm
if older computers running NetBEUI protocols get connected
and start trying to access network resources.

Quote:
I'm currently hanging ethereal off of a hub connected to
the switch to see if there's an issue with broadcast traffic.

Broadcast traffic should be received on all ports.
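
Because broadcasts reach every port, a capture from any port can be used to measure them. A minimal sketch of that measurement follows; it assumes the capture has already been reduced to `(timestamp_in_seconds, destination_mac)` pairs, and the threshold is an arbitrary number to be tuned against a normal day's traffic, not a known limit for any switch.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def broadcast_rate(frames):
    """Count broadcast frames per whole second from (timestamp_s, dst_mac) pairs."""
    per_second = Counter()
    for ts, dst in frames:
        if dst.lower() == BROADCAST:
            per_second[int(ts)] += 1
    return per_second


def storm_seconds(frames, threshold):
    """Return the sorted seconds in which the broadcast count exceeded threshold."""
    return sorted(sec for sec, n in broadcast_rate(frames).items() if n > threshold)
```

A baseline taken while the network is healthy gives something to compare against when the next hang happens.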

Quote:
As the problem is relatively infrequent I think a weekly
switch reboot procedure is probably the most pragmatic
thing I can do.

It is pragmatic, but I'm not sure it will help much. Something
happens on your network to cause the hang. Up until it
happens, everything is likely fine and the reboots do nothing.
This isn't a case of slow deterioration.

Quote:
replace the backbone with a L2 managed switch early next
year, which might help me gain further insight into the
problem.

That will help. Please remember that cheap unmanaged switches
are just that. They're not meant for much cascading. I would
also try to stick to the same manufacturer.

-- Robert