[sorta OT] LAN reliability ?
DComTalk.com Forum Index DComTalk.com
Discussion of VoIP, VPN, Video Conferencing, DSL and other data communications.
 
Walter Roberson
Posted: Mon Apr 18, 2005 1:38 pm    Post subject: [sorta OT] LAN reliability ? Reply with quote

Recently, I had it put to me that LANs (and firewalls) should be 100%
reliable (barring major equipment failure) -- that networks & security
should be about as reliable as the electrical mains (i.e., something
that can be taken for granted nearly all the time, and repairs should take
only a few minutes.)

I was informed that "millions of businesses every day" have that
kind of LAN reliability.

Is that level of reliability the norm in real SMBs, with 500-ish
hosts, multiple subnets, and a mandatory deny-by-default firewall
policy?

Which is the truer picture in a growing organization with fluid network
access requirements: that the network & security person has barely
anything to do because they set up the equipment "right" the
first time? Or that keeping up with the network & security
changes and failures and planning is more than a full-time job
that can involve many a late night (or marathon repair session)?

How much truth is there, in real organizations, to those old
cartoons of a skeleton with cobwebs in front of a computer terminal,
with the caption "The network's down again." ?


It seems to me that more than once I've been in a major bank and been
told "The network's down", and no-one, staff or customer, seemed
surprised. I also seem to recall hearing a number of casual
conversations along the lines of "Oh yeah, the network
went down again at work today"... and I don't recall
hearing anyone reply "Our network never goes down"... not
for anything short of a Service Provider.


Lastly: has anyone observed a network "freak out", with a series
of normally reliable devices getting confused and staying confused
all through hours of standard problem isolation procedures, with no
discernable reason for the multiple failures -- and for the devices
to eventually settle down, and start working properly with
configurations that didn't work before?
--
"This was a Golden Age, a time of high adventure, rich living and
hard dying... but nobody thought so." -- Alfred Bester, TSMD

Vincent C Jones
Posted: Mon Apr 18, 2005 1:41 pm    Post subject: Re: [sorta OT] LAN reliability ? Reply with quote

Too many topics to address in a bottom posted response. Some comments
in-line...

In article <d3vrn2$6or$1@canopus.cc.umanitoba.ca>,
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
Quote:
Recently, I had it put to me that LANs (and firewalls) should be 100%
reliable (barring major equipment failure) -- that networks & security
should be about as reliable as the electrical mains (i.e., something
that can be taken for granted nearly all the time, and repairs should take
only a few minutes.)

The physical network infrastructure _SHOULD_ be extremely
reliable. However, what it should be and what it is are two different
things. The most common failure is a broken patch cable at the user
end, but regardless of cause, if manual troubleshooting is required,
MTTR can be abysmally long. FWIW, I've had more trouble with power
than with networking. Here in the North East US, power cannot be
taken for granted...

Quote:
I was informed that "millions of businesses every day" have that
kind of LAN reliability.

Is that level of reliability the norm in real SMBs, with 500-ish
hosts, multiple subnets, and a mandatory deny-by-default firewall
policy?

Which is the truer picture in a growing organization with fluid network
access requirements: that the network & security person has barely
anything to do because they set up the equipment "right" the
first time? Or that keeping up with the network & security
changes and failures and planning is more than a full-time job
that can involve many a late night (or marathon repair session)?

How much truth is there, in real organizations, to those old
cartoons of a skeleton with cobwebs in front of a computer terminal,
with the caption "The network's down again." ?

Aha, here is where you are being misled... When a user says "the network
is down" 99.9% of the time, the network is still up and it is the
application or server that they are using which has died. Think of how
many "network" failures are cured by rebooting the PC, then tell me how
that action can impact the cabling in the wall, hubs, routers, etc.

Quote:
It seems to me that more than once I've been in a major bank and been
told "The network's down", and no-one, staff or customer, seemed
surprised. I also seem to recall hearing a number of casual
conversations along the lines of "Oh yeah, the network
went down again at work today"... and I don't recall
hearing anyone reply "Our network never goes down"... not
for anything short of a Service Provider.

WAN links have significant failure rates, but that is why redundancy and
backup links are used. In the case you cite, it is far more likely to be
a software problem at the application/database level than a network
infrastructure problem.

Quote:
Lastly: has anyone observed a network "freak out", with a series
of normally reliable devices getting confused and staying confused
all through hours of standard problem isolation procedures, with no
discernable reason for the multiple failures -- and for the devices
to eventually settle down, and start working properly with
configurations that didn't work before?

Yes, but there is almost always an explanation if you dig deep enough
into the problem. On the other hand, determining root cause can be time
and resource consuming, and most businesses are more interested in
ending the current problem than they are with preventing it from
happening again.

Warning: I have been called a "Network Management Bigot" for
requesting all sorts of monitoring. However, my experience has been
that if you look closely enough at how the network is ACTUALLY
running, you will often spot problems before they are manifested
as service outages. Examples range from marginal links which are
reporting only brief intermittent hiccups on their way to total
failure, to routing tables which indicate that the routes in use
are not the routes you designed with the high probability that
when something fails, the network will roll over and die rather
than select an alternate route.
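
One way to picture the routing-table check described above is a simple comparison of the routes you designed against the routes actually in use. This is only a sketch: the function name, the dictionary layout (prefix mapped to next hop) and the addresses are assumptions made for illustration, and how you collect the observed table (SNMP, CLI scrape, etc.) is left out.

```python
def route_drift(designed: dict, observed: dict) -> dict:
    """Return prefixes whose observed next hop differs from the design.

    Each result maps prefix -> (designed_next_hop, observed_next_hop);
    observed_next_hop is None when the prefix is missing entirely.
    """
    return {
        prefix: (next_hop, observed.get(prefix))
        for prefix, next_hop in designed.items()
        if observed.get(prefix) != next_hop
    }


# Hypothetical tables, for illustration only.
designed = {"10.1.0.0/16": "10.0.0.1", "10.2.0.0/16": "10.0.0.2"}
observed = {"10.1.0.0/16": "10.0.0.1", "10.2.0.0/16": "10.0.0.9"}
print(route_drift(designed, observed))
```

An empty result means the network is routing the way it was designed; anything else is worth a look before a failure forces the issue.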

Been there, done that, been burnt :-) But it's been years since
I've had to worry about a network problem that couldn't wait until
morning to get fixed.

--
Vincent C Jones, Consultant Expert advice and a helping hand
Networking Unlimited, Inc. for those who want to manage and
Tenafly, NJ Phone: 201 568-7810 control their networking destiny
http://www.networkingunlimited.com

J. Clarke
Posted: Mon Apr 18, 2005 4:19 pm    Post subject: Re: [sorta OT] LAN reliability ? Reply with quote

Walter Roberson wrote:

Quote:
Recently, I had it put to me that LANs (and firewalls) should be 100%
reliable (barring major equipment failure) -- that networks & security
should be about as reliable as the electrical mains (i.e., something
that can be taken for granted nearly all the time, and repairs should take
only a few minutes.)

I was informed that "millions of businesses every day" have that
kind of LAN reliability.

Is that level of reliability the norm in real SMBs, with 500-ish
hosts, multiple subnets, and a mandatory deny-by-default firewall
policy?

Which is the truer picture in a growing organization with fluid network
access requirements: that the network & security person has barely
anything to do because they set up the equipment "right" the
first time? Or that keeping up with the network & security
changes and failures and planning is more than a full-time job
that can involve many a late night (or marathon repair session)?

How much truth is there, in real organizations, to those old
cartoons of a skeleton with cobwebs in front of a computer terminal,
with the caption "The network's down again." ?


It seems to me that more than once I've been in a major bank and been
told "The network's down", and no-one, staff or customer, seemed
surprised. I also seem to recall hearing a number of casual
conversations along the lines of "Oh yeah, the network
went down again at work today"... and I don't recall
hearing anyone reply "Our network never goes down"... not
for anything short of a Service Provider.


Lastly: has anyone observed a network "freak out", with a series
of normally reliable devices getting confused and staying confused
all through hours of standard problem isolation procedures, with no
discernable reason for the multiple failures -- and for the devices
to eventually settle down, and start working properly with
configurations that didn't work before?

Don't confuse the reliability of the _network_ with the reliability of the
_system_.

When someone says "the network went down" they usually mean that the
_system_ went down, which may be a failure at Layer 4 or below but is much
more likely to be a server or application problem.

--
--John
to email, dial "usenet" and validate
(was jclarke at eye bee em dot net)

Al Dykes
Posted: Mon Apr 18, 2005 4:20 pm    Post subject: Re: [sorta OT] LAN reliability ? Reply with quote

In article <d40a6s02ll@news2.newsguy.com>,
J. Clarke <jclarke.usenet@snet.net.invalid> wrote:
Quote:
Walter Roberson wrote:

Recently, I had it put to me that LANs (and firewalls) should be 100%
reliable (barring major equipment failure) -- that networks & security
should be about as reliable as the electrical mains (i.e., something
that can be taken for granted nearly all the time, and repairs should take
only a few minutes.)

I was informed that "millions of businesses every day" have that
kind of LAN reliability.

Is that level of reliability the norm in real SMBs, with 500-ish
hosts, multiple subnets, and a mandatory deny-by-default firewall
policy?

Which is the truer picture in a growing organization with fluid network
access requirements: that the network & security person has barely
anything to do because they set up the equipment "right" the
first time? Or that keeping up with the network & security
changes and failures and planning is more than a full-time job
that can involve many a late night (or marathon repair session)?

How much truth is there, in real organizations, to those old
cartoons of a skeleton with cobwebs in front of a computer terminal,
with the caption "The network's down again." ?


It seems to me that more than once I've been in a major bank and been
told "The network's down", and no-one, staff or customer, seemed
surprised. I also seem to recall hearing a number of casual
conversations along the lines of "Oh yeah, the network
went down again at work today"... and I don't recall
hearing anyone reply "Our network never goes down"... not
for anything short of a Service Provider.


Lastly: has anyone observed a network "freak out", with a series
of normally reliable devices getting confused and staying confused
all through hours of standard problem isolation procedures, with no
discernable reason for the multiple failures -- and for the devices
to eventually settle down, and start working properly with
configurations that didn't work before?

Don't confuse the reliability of the _network_ with the reliability of the
_system_.

When someone says "the network went down" they usually mean that the
_system_ went down, which may be a failure at Layer 4 or below but is much
more likely to be a server or application problem.

--
--John
to email, dial "usenet" and validate
(was jclarke at eye bee em dot net)


Agreed. Or, for a business branch site it means that the leased line
to the corporate network is down. A competent company will have
contingency plans for this. They may be non-technical, like manual
procedures.




--
a d y k e s @ p a n i x . c o m

Don't blame me. I voted for Gore.

J. Clarke
Posted: Mon Apr 18, 2005 4:20 pm    Post subject: Re: [sorta OT] LAN reliability ? Reply with quote

Al Dykes wrote:

Quote:
In article <d3vrn2$6or$1@canopus.cc.umanitoba.ca>,
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
Recently, I had it put to me that LANs (and firewalls) should be 100%
reliable (barring major equipment failure) -- that networks & security
should be about as reliable as the electrical mains (i.e., something
that can be taken for granted nearly all the time, and repairs should take
only a few minutes.)

I was informed that "millions of businesses every day" have that
kind of LAN reliability.

Is that level of reliability the norm in real SMBs, with 500-ish
hosts, multiple subnets, and a mandatory deny-by-default firewall
policy?

Which is the truer picture in a growing organization with fluid network
access requirements: that the network & security person has barely
anything to do because they set up the equipment "right" the
first time? Or that keeping up with the network & security
changes and failures and planning is more than a full-time job
that can involve many a late night (or marathon repair session)?

How much truth is there, in real organizations, to those old
cartoons of a skeleton with cobwebs in front of a computer terminal,
with the caption "The network's down again." ?


The answer to this question is specific to each company and can only
be determined as the result of a Business Risk Analysis and guidance
from the company's senior management. In some cases government
regulations about downtime apply. The analysis is frequently stated in
lost revenue and probabilities. It doesn't make sense to spend money
on technical fixes to some risks. In some cases a "loss of business"
insurance contract will be an appropriate way of addressing a risk.

The result of the risk analysis will tell the technical people what
the critical issues are for the company operations and customers. It
should also result in funding to meet the requirements.

There are many aspects to "non-stop/can't fail" operation and the
definition of non-stop is refined to the nature of your business. My
experience is with a Very Big Bank with retail ops in the New York
area with about 400 branches and a few thousand ATM machines. We had
several rules:

1. A bank ATM transaction, once acknowledged, can't be lost.
2. If one ATM is down, another one nearby should be operational.
3. A worst-case disaster in the main data center should not lose any data
and should result in no more than 4 hours' outage for lines of business other
than the branch banking system, which is covered by rules #1 and #2.

This is a drastic simplification of banking in the '80s. Rules 1 and 2
were addressed by using Tandem NonStop (tm) minicomputers to control
clusters of ATM machines. (PC weenies today don't know how rock solid
Tandem and VAX/VMS computers are, and were as far back as 1980.)

Rule 2 was further addressed by having 5 regional datacenters, each
with two mainframes (one a backup) which controlled geographical areas
of ATMs. If one of the data centers burned up the public would be
directed to drive a few miles to an operational area. This was deemed
by management as an acceptable risk/cost tradeoff.

Rule 3 was addressed by a duplicate of the main data center in another
state; if the main DC burned down, the whole branch system and ATM
machines would still operate unassisted for about a day. We had 4
hours (per banking regs) to get the backup data center up and
running. We did that on a regular basis. Data loss was prevented by
not giving the customer his acknowledgement until the databases at the
main and backup datacenters had acknowledged the update and were in
sync. Nobody said high reliability was cheap.

If the network in a branch was dead it was equivalent to the Utility
Power company having a bad day on that street. A "branch closed" sign
would be hung on the door with directions to the nearest operating
branch. Scope of failure is a big part of risk analysis and technical
failures are just one reason of many for a point outage and things
have to be kept in perspective.

9/11 and the recent Northeast Blackout have made contingency planning
experts update their guidelines for critical business operations. I
have a copy of it somewhere. It was summarized in Sysadmin Magazine
Nov 2004 issue. http://www.samag.com/articles/2004/0411/

It seems to me that more than once I've been in a major bank and
been told "The network's down", and no-one, staff or customer, seemed
surprised. I also seem to recall hearing a number of casual
conversations along the lines of "Oh yeah, the network went down
again at work today"... and I don't recall hearing anyone reply "Our
network never goes down"... not for anything short of a Service
Provider.

It (a) wasn't a business-wide outage, (b) they had manual procedures
for essential tasks and (c) people could go to the next branch.


Lastly: has anyone observed a network "freak out", with a series
of normally reliable devices getting confused and staying confused
all through hours of standard problem isolation procedures, with no
discernable reason for the multiple failures -- and for the devices
to eventually settle down, and start working properly with
configurations that didn't work before?
--

What's a "network"?

This paragraph is too vague to address. For starters it depends on how
big and complex your network is and on what tools you have to measure
and analyse your network. If you have no management tools then there
could certainly be scenarios as bad as you describe. IME a rogue DHCP
server on a laptop can bring down a network and be very hard to find
without tools. I've seen an intermittent trojan on one PC spewing data
to the Public Internet bring down a company in the damnedest way
because it saturated the uplink bandwidth and appeared to be a flaky
ISP link until we understood what was going on. That site had NO
managed hubs (against my recommendation). These would have allowed me
to identify and fix the problem in minutes instead of a _very_ long
weekend.

Ethernet infrastructure (cable and patch panels) is designed to be
very reliable, and once a CAT5 drop is shown to be working it is the last
thing to assume has failed when head scratching is going on.
Modern CAT5 wiring also fits the risk control principle in that there
is no network-wide failure mode. One cable _might_ go bad, but there
is no way, short of the cat pissing on a punchdown block, of multiple
drops going bad at once. If your CAT5 infrastructure is unreliable
it's because it was done by an incompetent installer and you should
budget to get a pro in to make a recommendation.

And it's distressing the number of "pros" who aren't. One outfit that
contracted to do some work for one of my clients turned out not to consider
a post-installation certification scan to be part of the process. I
finally gave up arguing with them and scanned it myself. They also made a
big deal about being affiliated with Lucent. When asked to deliver the
paperwork for the Lucent warranty it turned out that Lucent had never heard
of them. The distressing thing about this bunch was that they were
teaching network installation all over the state at the state technical
colleges. Unfortunately my client was not litigious.

Quote:
Google for "business contingency planning" and "risk analysis"
and you'll get lots of hits.


--
--John
to email, dial "usenet" and validate
(was jclarke at eye bee em dot net)

Robert Redelmeier
Posted: Mon Apr 18, 2005 4:20 pm    Post subject: Re: [sorta OT] LAN reliability ? Reply with quote

Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
Quote:
Recently, I had it put to me that LANs (and firewalls) should be
100% reliable (barring major equipment failure) -- that networks
& security should be about as reliable as the electrical mains
(i.e., something that can be taken for granted nearly all the time,
and repairs should take only a few minutes.)

You've received some good answers, and I'll add my 3 cents (CDN).

As many have said -- two different questions here. Networks are
generally very reliable. The occasional fried port or hung router.

Security is a _whole_ 'nother thing. Ideally, the network should be
fully open, and the devices (computers/servers) secure. No network
security required. With Microsoft products so insecure, the network
is called to help provide security by closing down. This is a PITA,
and doesn't stop trojans which the network is falsely blamed for.

Quote:
I was informed that "millions of businesses every day"
have that kind of LAN reliability.

They do. Mostly small businesses with a few printers
and a file server. The problem is that MS scales horribly.

Quote:
Is that level of reliability the norm in real SMBs,
with 500-ish hosts, multiple subnets, and a mandatory
deny-by-default firewall policy?

No. If single server uptime is 99.0% from random causes, and you
have 10 servers, only 90% of the time do you have all 10 servers.
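
The arithmetic behind that figure can be checked in a couple of lines. This sketch assumes the servers fail independently of one another, which the "random causes" wording suggests but does not state; the function name is made up for the example.

```python
def all_up_probability(per_server_uptime: float, servers: int) -> float:
    """Probability that every server is up at the same moment,
    assuming each one fails independently of the others."""
    return per_server_uptime ** servers


p = all_up_probability(0.99, 10)
print(f"{p:.1%}")  # prints 90.4%
```

The same formula shows why the effect gets worse with scale: with 50 such servers the chance of having all of them up at once drops to roughly 60%.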

Quote:
a late night (or marathon repair session)?

This is the norm. People are loaded until they break.

Quote:
and I don't recall hearing anyone reply "Our network never
goes down"... not for anything short of a Service Provider.

The network hardware at my home & work almost never goes down.
I can almost always access Unix & other Linux-like hosts.
However, fairly frequently MS Windows desktops are out of action.

-- Robert

Al Dykes
Posted: Mon Apr 18, 2005 4:20 pm    Post subject: Re: [sorta OT] LAN reliability ? Reply with quote

In article <rNP8e.264$yd7.11@newssvr11.news.prodigy.com>,
Robert Redelmeier <redelm@ev1.net.invalid> wrote:
Quote:
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
Recently, I had it put to me that LANs (and firewalls) should be
100% reliable (barring major equipment failure) -- that networks
& security should be about as reliable as the electrical mains
(i.e., something that can be taken for granted nearly all the time,
and repairs should take only a few minutes.)

You've received some good answers, and I'll add my 3 cents (CDN).

As many have said -- two different questions here. Networks are
generally very reliable. The occasional fried port or hung router.

Security is a _whole_ 'nother thing. Ideally, the network should be
fully open, and the devices (computers/servers) secure. No network
security required. With Microsoft products so insecure, the network
is called to help provide security by closing down. This is a PITA,
and doesn't stop trojans which the network is falsely blamed for.

I was informed that "millions of businesses every day"
have that kind of LAN reliability.

They do. Mostly small businesses with a few printers
and a file server. The problem is that MS scales horribly.

Is that level of reliability the norm in real SMBs,
with 500-ish hosts, multiple subnets, and a mandatory
deny-by-default firewall policy?

No. If single server uptime is 99.0% from random causes, and you
have 10 servers, only 90% of the time do you have all 10 servers.

a late night (or marathon repair session)?

This is the norm. People are loaded until they break.

and I don't recall hearing anyone reply "Our network never
goes down"... not for anything short of a Service Provider.

There are lots of parts in a "network" (your word).

There are Data Center clusters that have been providing literally
uninterrupted service for years and there may be a Tandem system that
has been running for a decade with no downtime. These systems can fix
hardware and software on the fly. The limiting factors to uptime can
be company mergers and relocations and fuel for the generators.

Today the technology is highly distributed web servers based on BEA
WebLogic Server and IBM WebSphere running on many servers at multiple
locations.

The current phrase is "carrier grade" (Telephone industry terminology)
for computer systems that deliver "5 nines" uptime (99.999%) and the
ability to swap hardware and do software upgrades without service
disruption. That's still 5 minutes/year.
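
For anyone who wants to see where the "5 minutes/year" comes from, here is a small sketch that converts an availability figure into a yearly downtime budget. It assumes a 365-day year and nothing else; the function and constant names are just for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime per year at a given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for availability in (0.99, 0.999, 0.9999, 0.99999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability:.3%} uptime -> {minutes:.1f} minutes/year")
```

Five nines works out to a little over 5 minutes a year, while "two nines" (99%) allows more than three and a half days.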


Quote:

The network hardware at my home & work almost never goes down.
I can almost always access Unix & other Linux-like hosts.
However, fairly frequently MS Windows desktops are out of action.




--
a d y k e s @ p a n i x . c o m

Don't blame me. I voted for Gore.

Al Dykes
Posted: Mon Apr 18, 2005 4:20 pm    Post subject: Re: [sorta OT] LAN reliability ? Reply with quote

In article <d3vrn2$6or$1@canopus.cc.umanitoba.ca>,
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
Quote:
Recently, I had it put to me that LANs (and firewalls) should be 100%
reliable (barring major equipment failure) -- that networks & security
should be about as reliable as the electrical mains (i.e., something
that can be taken for granted nearly all the time, and repairs should take
only a few minutes.)

I was informed that "millions of businesses every day" have that
kind of LAN reliability.

Is that level of reliability the norm in real SMBs, with 500-ish
hosts, multiple subnets, and a mandatory deny-by-default firewall
policy?

Which is the truer picture in a growing organization with fluid network
access requirements: that the network & security person has barely
anything to do because they set up the equipment "right" the
first time? Or that keeping up with the network & security
changes and failures and planning is more than a full-time job
that can involve many a late night (or marathon repair session)?

How much truth is there, in real organizations, to those old
cartoons of a skeleton with cobwebs in front of a computer terminal,
with the caption "The network's down again." ?


The answer to this question is specific to each company and can only
be determined as the result of a Business Risk Analysis and guidance
from the company's senior management. In some cases government
regulations about downtime apply. The analysis is frequently stated in
lost revenue and probabilities. It doesn't make sense to spend money
on technical fixes to some risks. In some cases a "loss of business"
insurance contract will be an appropriate way of addressing a risk.
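
A rough way to put "lost revenue and probabilities" into numbers is an expected-loss calculation. The sketch below is only an illustration of that idea, not a description of any particular firm's risk analysis; the function names and every figure in it are hypothetical.

```python
def expected_annual_loss(outages_per_year: float, hours_per_outage: float,
                         revenue_lost_per_hour: float) -> float:
    """Expected yearly loss from one risk: frequency x duration x hourly loss."""
    return outages_per_year * hours_per_outage * revenue_lost_per_hour


def fix_pays_for_itself(loss_before: float, loss_after: float,
                        annual_cost_of_fix: float) -> bool:
    """True if the reduction in expected loss exceeds the yearly cost of the fix."""
    return (loss_before - loss_after) > annual_cost_of_fix


# Hypothetical figures, for illustration only.
before = expected_annual_loss(outages_per_year=2, hours_per_outage=4,
                              revenue_lost_per_hour=1_000)
after = expected_annual_loss(outages_per_year=2, hours_per_outage=0.5,
                             revenue_lost_per_hour=1_000)
print(before, after)                               # prints 8000 1000.0
print(fix_pays_for_itself(before, after, 20_000))  # prints False
```

With these made-up numbers a 20,000-a-year technical fix saves only 7,000 in expected loss, which is the kind of case where insurance or a manual fallback may be the better answer.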

The result of the risk analysis will tell the technical people what
the critical issues are for the company operations and customers. It
should also result in funding to meet the requirements.

There are many aspects to "non-stop/can't fail" operation and the
definition of non-stop is refined to the nature of your business. My
experience is with a Very Big Bank with retail ops in the New York
area with about 400 branches and a few thousand ATM machines. We had
several rules:

1. A bank ATM transaction, once acknowledged, can't be lost.
2. If one ATM is down, another one nearby should be operational.
3. A worst-case disaster in the main data center should not lose any data
and should result in no more than 4 hours' outage for lines of business other
than the branch banking system, which is covered by rules #1 and #2.

This is a drastic simplification of banking in the '80s. Rules 1 and 2
were addressed by using Tandem NonStop (tm) minicomputers to control
clusters of ATM machines. (PC weenies today don't know how rock solid
Tandem and VAX/VMS computers are, and were as far back as 1980.)

Rule 2 was further addressed by having 5 regional datacenters, each
with two mainframes (one a backup) which controlled geographical areas
of ATMs. If one of the data centers burned up the public would be
directed to drive a few miles to an operational area. This was deemed
by management as an acceptable risk/cost tradeoff.

Rule 3 was addressed by a duplicate of the main data center in another
state; if the main DC burned down, the whole branch system and ATM
machines would still operate unassisted for about a day. We had 4
hours (per banking regs) to get the backup data center up and
running. We did that on a regular basis. Data loss was prevented by
not giving the customer his acknowledgement until the databases at the
main and backup datacenters had acknowledged the update and were in
sync. Nobody said high reliability was cheap.
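
The "no acknowledgement until both sites have the update" rule can be sketched in a few lines. This is a toy model, not the bank's actual system: the `Datacenter` class, the `commit` function and the undo step are all assumptions made for the example, and a real implementation would also need timeouts, retries and durable logs.

```python
class Datacenter:
    """Hypothetical stand-in for a database site."""

    def __init__(self, name: str, reachable: bool = True):
        self.name = name
        self.reachable = reachable
        self.log = []

    def apply(self, txn: str) -> bool:
        """Record the transaction; return False if the site cannot be reached."""
        if not self.reachable:
            return False
        self.log.append(txn)
        return True


def commit(txn: str, main: Datacenter, backup: Datacenter) -> bool:
    """Return True (safe to acknowledge the customer) only if BOTH sites
    recorded the update; otherwise undo any partial write and return False."""
    applied = [dc for dc in (main, backup) if dc.apply(txn)]
    if len(applied) == 2:
        return True
    for dc in applied:
        dc.log.remove(txn)  # keep the two sites in sync
    return False


main, backup = Datacenter("main"), Datacenter("backup")
print(commit("withdraw-001", main, backup))   # prints True
backup.reachable = False
print(commit("withdraw-002", main, backup))   # prints False
```

The cost is visible even in the toy version: every transaction waits on the slower of the two sites, and an unreachable backup blocks acknowledgements altogether.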

If the network in a branch was dead it was equivalent to the Utility
Power company having a bad day on that street. A "branch closed" sign
would be hung on the door with directions to the nearest operating
branch. Scope of failure is a big part of risk analysis and technical
failures are just one reason of many for a point outage and things
have to be kept in perspective.

9/11 and the recent Northeast Blackout have made contingency planning
experts update their guidelines for critical business operations. I
have a copy of it somewhere. It was summarized in Sysadmin Magazine
Nov 2004 issue. http://www.samag.com/articles/2004/0411/

Quote:
It seems to me that more than once I've been in a major bank and
been told "The network's down", and no-one, staff or customer, seemed
surprised. I also seem to recall hearing a number of casual
conversations along the lines of "Oh yeah, the network went down
again at work today"... and I don't recall hearing anyone reply "Our
network never goes down"... not for anything short of a Service
Provider.

It (a) wasn't a business-wide outage, (b) they had manual procedures
for essential tasks and (c) people could go to the next branch.

Quote:

Lastly: has anyone observed a network "freak out", with a series
of normally reliable devices getting confused and staying confused
all through hours of standard problem isolation procedures, with no
discernable reason for the multiple failures -- and for the devices
to eventually settle down, and start working properly with
configurations that didn't work before?
--

What's a "network"?

This paragraph is too vague to address. For starters it depends on how
big and complex your network is and on what tools you have to measure
and analyse your network. If you have no management tools then there
could certainly be scenarios as bad as you describe. IME a rogue DHCP
server on a laptop can bring down a network and be very hard to find
without tools. I've seen an intermittent trojan on one PC spewing data
to the Public Internet bring down a company in the damnedest way
because it saturated the uplink bandwidth and appeared to be a flaky
ISP link until we understood what was going on. That site had NO
managed hubs (against my recommendation). These would have allowed me
to identify and fix the problem in minutes instead of a _very_ long
weekend.
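
To show why per-port counters make that kind of problem quick to find, here is a sketch that ranks ports by traffic between two counter snapshots. It assumes you can already read a byte counter per port from the managed device (how you fetch it is not shown), the port names and numbers are invented, and counter wrap-around is ignored.

```python
def top_talkers(before: dict, after: dict, interval_s: float, n: int = 3) -> list:
    """Rank ports by average bits per second between two byte-counter snapshots.

    before/after map port name -> cumulative byte count; ports missing from
    either snapshot are skipped. Counter wrap-around is not handled.
    """
    rates = {
        port: (after[port] - before[port]) * 8 / interval_s
        for port in before
        if port in after
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical snapshots taken 60 seconds apart.
before = {"port1": 1_000_000, "port2": 5_000_000, "port3": 2_000_000}
after = {"port1": 1_600_000, "port2": 80_000_000, "port3": 2_300_000}
print(top_talkers(before, after, interval_s=60, n=1))  # prints [('port2', 10000000.0)]
```

One port carrying 10 Mbit/s while its neighbours carry a few tens of kbit/s points straight at the offending PC, which is the minutes-versus-weekend difference.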

Ethernet infrastructure (cable and patch panels) is designed to be
very reliable, and once a CAT5 drop is shown to be working it is the last
thing to assume has failed when head scratching is going on.
Modern CAT5 wiring also fits the risk control principle in that there
is no network-wide failure mode. One cable _might_ go bad, but there
is no way, short of the cat pissing on a punchdown block, of multiple
drops going bad at once. If your CAT5 infrastructure is unreliable
it's because it was done by an incompetent installer and you should
budget to get a pro in to make a recommendation.

Google for "business contingency planning" and "risk analysis"
and you'll get lots of hits.

--
a d y k e s @ p a n i x . c o m

Don't blame me. I voted for Gore.

Walter Roberson
Posted: Mon Apr 18, 2005 4:20 pm    Post subject: Re: [sorta OT] LAN reliability ? Reply with quote

Thanks for the reply, Al.

In article <d40ae9$khj$1@panix5.panix.com>, Al Dykes <adykes@panix.com> wrote:

Quote:
Rule 3 was addressed by a duplicate of the main data center in another
state; if the main DC burned down, the whole branch system and ATM
machines would still operate unassisted for about a day. We had 4
hours (per banking regs) to get the backup data center up and
running. We did that on a regular basis.

If you have a moment, I'd appreciate an expansion on that "regular
basis". I'm not quite sure whether you are saying that:

1) it was not uncommon to need to fall back to the backup data center
in response to some trouble issue; or

2) you regularly tested the fallback procedures ("fire drills"); or

3) because of issues like scheduled maintenance, backups, and the like,
that it was not uncommon to activate the duplicate center as a routine
business continuity mechanism; or

4) on the relatively few occasions when it was necessary to fall back,
that you were repeatably able to do so comfortably within the
four-hour window ?

Or to put things another way, are you saying that even with all
the reliability planning that the backup data centre had to be kicked up
in response to a problem, or are you saying that failovers
were no big thing on the occasions they were needed?
--
"This was a Golden Age, a time of high adventure, rich living and
hard dying... but nobody thought so." -- Alfred Bester, TSMD
J. Clarke
Guest
Posted: Mon Apr 18, 2005 7:57 pm    Post subject: Re: [sorta OT] LAN reliability ?

Robert Redelmeier wrote:

Quote:
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
Recently, I had it put to me that LANs (and firewalls) should be
100% reliable (barring major equipment failure) -- that networks
& security should be about as reliable as the electrical mains
(i.e., something that can be taken for granted nearly all the time,
and repairs should take only a few minutes.)

You've received some good answers, and I'll add my 3 cents (CDN).

As many have said -- two different questions here. Networks are
generally very reliable. The occasional fried port or hung router.

Security is a _whole_ 'nother thing. Ideally, the network should be
fully open, and the devices (computers/servers) secure. No network
security required. With Microsoft products so insecure, the network
is called to help provide security by closing down. This is a PITA,
and doesn't stop trojans which the network is falsely blamed for.

Whoa. The right approach is defense in depth. Horror story about a "fully
open network". Traffic was grinding to a halt on a particular network
operated by a university. Seems that there were several connections to
different Internet service providers. The network, including microwave
links etc, spanned most of the distance between New York and Boston, and
the ISP connections were T1 or faster (this was back when a 10 Mb/sec
network was still hot stuff), so guess what most of the traffic on the
"fully open" network was.

Quote:
I was informed that "millions of businesses every day"
have that kind of LAN reliability.

They do. Mostly small businesses with a few printers
and a file server. The problem is that MS scales horribly.

Is that level of reliability the norm in real SMBs,
with 500-ish hosts, multiple subnets, and a mandatory
deny-by-default firewall policy?

No. If single server uptime is 99.0% from random causes, and you
have 10 servers, only 90% of the time do you have all 10 servers.

99% uptime is piss poor. That's one minute of outage every hour and a half
or so. Any server on which that happens is broken.
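
For reference, a rough sketch of the arithmetic behind both figures above. It assumes server failures are independent and that all ten servers must be up at the same time; neither assumption is spelled out in the thread, and the function names are made up for illustration.

```python
def all_up_probability(per_server_availability: float, servers: int) -> float:
    """Chance that every server is up at once, assuming independent failures."""
    return per_server_availability ** servers


def downtime_minutes(availability: float, window_minutes: float) -> float:
    """Expected downtime inside a window, if outages were spread evenly over time."""
    return (1.0 - availability) * window_minutes


if __name__ == "__main__":
    # Ten servers at 99% each: all ten are up together only about 90% of the time.
    print(f"all ten up: {all_up_probability(0.99, 10):.3f}")
    # 99% uptime spread evenly: just under a minute lost per hour and a half.
    print(f"per 90 minutes: {downtime_minutes(0.99, 90):.1f} min")
    # The same 99% over a 30-day month, expressed in hours.
    print(f"per 30 days: {downtime_minutes(0.99, 30 * 24 * 60) / 60:.1f} h")
```

Under those assumptions 99% works out to roughly 0.9 minutes per 90 minutes, or about 7.2 hours in a 30-day month, depending on how the outages are bunched.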

Further, having a server out may not affect system reliability at all. With
that many servers I would hope that you have some redundancy implemented.
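
To put a number on the redundancy point, here is a minimal sketch assuming replicas fail independently and that failover is instantaneous (both are idealisations, and the function name is hypothetical):

```python
def redundant_availability(per_server_availability: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumes independent failures and instant failover, which real systems
    only approximate.
    """
    all_replicas_down = (1.0 - per_server_availability) ** replicas
    return 1.0 - all_replicas_down


if __name__ == "__main__":
    for replicas in (1, 2, 3):
        print(replicas, f"{redundant_availability(0.99, replicas):.6f}")
```

With two 99% servers backing each other up the idealised figure is 99.99%, which is why one server being out need not show up as a system outage at all.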

Quote:
a late night (or marathon repair session)?

This is the norm. People are loaded until they break.

Maybe the norm where you are. Perhaps you need to look at why your system
is so unreliable. And if you're focussed on "Windows" and think that
eliminating Windows would solve the problem then you're not really looking
at the problem.

Quote:
and I don't recall hearing anyone reply "Our network never
goes down"... not for anything short of a Service Provider.

The network hardware at my home & work almost never goes down.
I can almost always access Unix & other Linux-like hosts.
However, fairly frequently MS Windows desktops are out of action.

If you're running XP and your desktop machines in a place of business are
"out of action" "fairly frequently" you need to find out why and fix it.

Quote:
-- Robert

--
--John
to email, dial "usenet" and validate
(was jclarke at eye bee em dot net)
Walter Roberson
Guest
Posted: Mon Apr 18, 2005 9:33 pm    Post subject: Re: [sorta OT] LAN reliability ?

In article <d40ae9$khj$1@panix5.panix.com>, Al Dykes <adykes@panix.com> wrote:
Quote:
In article <d3vrn2$6or$1@canopus.cc.umanitoba.ca>,

Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:

It seems to me that more than once I've been in a major bank and
been told "The network's down", and no-one, staff or customer, seemed
surprised.

It (a) wasn't a business-wide outage, (b) they had manual procedures
for essential tasks and (c) people could go to the next branch.

I think you are pointing out here that the observed failures fit
within the parameters of a well-planned business risk model.

My mention of banks was only partially contextual. I would have
predicted that for banks (and other major businesses) that most
customers would expect and tolerate near-zero failure. But that's
not what I actually observe in practice: instead I observe that
people sort of sigh a bit, but don't start raving about
"Why can't you people keep your computers up?!?" If the lineups
move noticeably more slowly than the customers are accustomed to,
some of them get frustrated at the extra time -- but I don't hear
them getting frustrated at the "incompetence" of the bank's systems.

Thus, what I seem to observe is that most people appear to be
"socialized" to think systems/network problems are a fact of life, an
inconvenience but something to be expected, like the way a traffic
accident can slow down a highway. I have heard the occasional
complaint ("I tried to pay my bills but I couldn't because the
bank computers were down") -- but I hear more people complain
(and more bitterly) about the busses being late or about traffic jams --
or about the power having failed and they have to go around and
reset all their VCR clocks.

And if people have become socialized to systems/network problems then
that suggests that network/server problems are "normal" in many businesses --
as opposed to the mental model that networks/systems are rarely a problem
most places and any operation which falls short of that has probably
been designed or managed incorrectly.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
Al Dykes
Guest
Posted: Mon Apr 18, 2005 9:54 pm    Post subject: Re: [sorta OT] LAN reliability ?

In article <d40luv$7a9$1@canopus.cc.umanitoba.ca>,
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
Quote:
Thanks for the reply, Al.

In article <d40ae9$khj$1@panix5.panix.com>, Al Dykes <adykes@panix.com> wrote:

Rule 3 was addressed by a duplicate of the main data center in another
state, and if the main DC burned down the whole branch system and ATM
machines would still operate unassisted for about a day. We had 4
hours (per banking regs) to get the backup data center up and
running. We did that on a regular basis.

If you have a moment, I'd appreciate an expansion on that "regular
basis". I'm not quite sure whether you are saying that:

1) it was not uncommon to need to fall back to the backup data center
in response to some trouble issue; or

2) you regularly tested the fallback procedures ("fire drills"); or

3) because of issues like scheduled maintenance, backups, and the like,
that it was not uncommon to activate the duplicate center as a routine
business continuity mechanism; or

4) on the relatively few occasions when it was necessary to fall back,
that you were repeatably able to do so comfortably within the
four-hour window ?

Or to put things another way, are you saying that even with all
the reliability planning that the backup data centre had to be kicked up
in response to a problem, or are you saying that failovers
were no big thing on the occasions they were needed?
--
"This was a Golden Age, a time of high adventure, rich living and
hard dying... but nobody thought so." -- Alfred Bester, TSMD


Policy said that we did a genuine drill every 6 months unless events
in the Real World caused us to use the backup. In the late 80's in
Manhattan there were enough little disasters that we switched data
centers on a regular basis and rarely had to do full fire drills.
After every event there was a post-mortem analysis to see what didn't
work and what we could have done better. In an operation this complex
some (hopefully) little thing doesn't go as expected. We switched to
the backup site whenever it made sense. It was a straightforward
operation. We had huge ringbinders with contingency plans for
different scenarios.

I'm out of this now but I understand that this newfangled thing called
the Internet and the experience of 9/11 show that the hot/standby
pair strategy is weak and both sites need to be working in production
capacity in parallel to be able to say to your Chairman that you're as
ready as you can be for the next disaster.

My working scenario when I had to explain disaster scenario planning
was that the Vogons would lift our main operations building (or our
backup site) off the planet, with data and staff, instantly with no
notice and we needed to continue to meet business obligations when
that happened. Once you've planned for this every other scenario is
covered and if you try to enumerate all the possible little disasters
and plan for them individually you're going to miss something and get
bit by reality someday.

Business Contingency Planning is a recognized job description.

--
a d y k e s @ p a n i x . c o m

Don't blame me. I voted for Gore.
Al Dykes
Guest
Posted: Mon Apr 18, 2005 10:00 pm    Post subject: Re: [sorta OT] LAN reliability ?

In article <d40ng7$8v8$1@canopus.cc.umanitoba.ca>,
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
Quote:
In article <d40ae9$khj$1@panix5.panix.com>, Al Dykes <adykes@panix.com> wrote:
In article <d3vrn2$6or$1@canopus.cc.umanitoba.ca>,

Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:

It seems to me that more than once I've been in a major bank and
been told "The network's down", and no-one, staff or customer, seemed
surprised.

It (a) wasn't a business-wide outage, (b) they had manual procedures
for essential tasks and (c) people could go to the next branch.

I think you are pointing out here that the observed failures fit
within the parameters of a well-planned business risk model.

My mention of banks was only partially contextual. I would have
predicted that for banks (and other major businesses) that most
customers would expect and tolerate near-zero failure. But that's
not what I actually observe in practice: instead I observe that
people sort of sigh a bit, but don't start raving about
"Why can't you people keep your computers up?!?" If the lineups
move noticeably more slowly than the customers are accustomed to,
some of them get frustrated at the extra time -- but I don't hear
them getting frustrated at the "incompetence" of the bank's systems.

Thus, what I seem to observe is that most people appear to be
"socialized" to think systems/network problems are a fact of life, an
inconvenience but something to be expected, like the way a traffic
accident can slow down a highway. I have heard the occasional
complaint ("I tried to pay my bills but I couldn't because the
bank computers were down") -- but I hear more people complain
(and more bitterly) about the busses being late or about traffic jams --
or about the power having failed and they have to go around and
reset all their VCR clocks.

And if people have become socialized to systems/network problems then
that suggests that network/server problems are "normal" in many businesses --
as opposed to the mental model that networks/systems are rarely a problem
most places and any operation which falls short of that has probably
been designed or managed incorrectly.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers


Some good points here, and banks may be different from, say, booking and
taking an airline flight, in that (a) people's expectation of customer
service at a bank is already grim and (b) online banking and ATMs
have meant that there are fewer "gotta get to the bank by 3PM" events.

If you book a flight, show up and find they don't have you in the
computer or that the flight has been overbooked, you're going to be
_much_ madder than in the bank scenario. Stuck in traffic is similar.

Windows 95 taught people to be tolerant of computer problems at work.


--
a d y k e s @ p a n i x . c o m

Don't blame me. I voted for Gore.
Walter Roberson
Guest
Posted: Mon Apr 18, 2005 10:29 pm    Post subject: Re: [sorta OT] LAN reliability ?

In article <d40mgf04kp@news3.newsguy.com>,
J. Clarke <jclarke.usenet@snet.net.invalid> wrote:

:If you're running XP and your desktop machines in a place of business are
:"out of action" "fairly frequently" you need to find out why and fix it.

I've been isolated for some years [this city is blooming nicely
in biotechnology, but the nearest "high tech city" is ~900 miles away].

Perhaps I don't get around as much as I should... but as best I recall,
I don't think I've ever met anyone who was actually skilled in
configuring and debugging and repairing MS Windows. I've met a number
of good unix/linux hackers, who could repair just about any software
problem -- but with MS Windows, having a good clue about the Registry
has been about the upper limit, after which the standard problem
resolution stream seems to be "Reinstall the application. Reinstall
Windows. Re-Ghost from a known-good system."

I'm certainly not trying to provoke a Unix vs Windows war here:
I'm asking more: Has my sample been biased? Is there a good
representation in IT of people who can -fix- MS Windows problems
beyond "Search the Knowledgebase and check out the registry, and if you
don't find the answer, then re-install?" And I certainly don't mean
to cast stones at MS Windows specialists with this question: I'm
asking seriously whether MS Windows gurus are uncommon or if I've
just not noticed them.
--
Entropy is the logarithm of probability -- Boltzmann
Robert Redelmeier
Guest
Posted: Mon Apr 18, 2005 10:57 pm    Post subject: Re: [sorta OT] LAN reliability ?

J. Clarke <jclarke.usenet@snet.net.invalid> wrote:
Quote:
Whoa. The right approach is defense in depth.

When you need to defend. What to defend is a decision.

Quote:
so guess what most of the traffic on the "fully open"
network was.

Through traffic, of course. The Internet was designed that way.
The boundary routers could easily have been configured to drop
packets with neither source nor destination on the local network,
and it would have stopped. A better solution would have been to
negotiate with the various ISPs for peering and/or cost to carry
traffic. But that may have been beyond the administration's skills.
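
As a sketch of that filtering rule only (not real router configuration): forward a packet when either end is on the local network, drop it when it is pure through traffic. The prefixes below are documentation ranges standing in for the university's address space, and the function names are hypothetical.

```python
import ipaddress

# Assumed local prefixes; these are reserved documentation ranges,
# not addresses taken from the thread.
LOCAL_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_local(address: str) -> bool:
    """True if the address falls inside one of the local prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip in prefix for prefix in LOCAL_PREFIXES)


def should_forward(src: str, dst: str) -> bool:
    """Forward traffic that starts or ends locally; drop pure transit traffic."""
    return is_local(src) or is_local(dst)


if __name__ == "__main__":
    print(should_forward("192.0.2.10", "203.0.113.5"))   # local source
    print(should_forward("203.0.113.5", "203.0.113.99")) # neither end local
```

On real routers this would be an access list or routing policy on each ISP-facing interface; the Python only shows the decision being made.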

Quote:
99% uptime is piss poor. That's one minute of outage every hour
and a half or so. Any server on which that happens is broken.

That isn't the usual granularity. It's more like 2 hours
every month or two, counting only core time. That includes
diagnosis time and is in addition to non-core hours server
routine maintenance and reboots to recover leaked memory.
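
A back-of-envelope check of how "2 hours every month or two" compares with 99%. The core-hours figure is an assumption for illustration (8-hour day, about 21 working days a month); it is not given in the thread.

```python
# Assumed core time: 8 hours a day, about 21 working days a month.
CORE_HOURS_PER_MONTH = 8 * 21


def core_time_availability(outage_hours: float, months_between_outages: float) -> float:
    """Availability over core hours for one outage of the given length per interval."""
    core_hours = CORE_HOURS_PER_MONTH * months_between_outages
    return 1.0 - outage_hours / core_hours


if __name__ == "__main__":
    print(f"2 h every month:      {core_time_availability(2, 1):.4f}")
    print(f"2 h every two months: {core_time_availability(2, 2):.4f}")
```

Under that assumption the two cases come out near 98.8% and 99.4%, so a couple of hours lost every month or two does land in the neighbourhood of 99% of core time.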

Quote:
why your system is so unreliable. And if you're focussed on
"Windows" and think that eliminating Windows would solve
the problem then you're not really looking at the problem.

Well, yes. It's a long series of dependent chains. Any link
can break. I have no idea if Unix could be configured as
dependently as MS-ActiveDirectory becomes when large.

Quote:
If you're running XP and your desktop machines in a place
of business are "out of action" "fairly frequently" you
need to find out why and fix it.

First, we do not use MS-WinXP. MS-Win2kPro is bad enough.
Second, failures are not global. People can usually work
at their desktops. But they lose access to some resources
like shared drives or email. Bizarrely, others are unaffected.

-- Robert
All times are GMT