[Novalug] Software-based load balancing
Peter Larsen
plarsen at famlarsen.homelinux.com
Wed Mar 7 23:38:35 EST 2012
On Wed, 2012-03-07 at 22:29 -0500, Dave Greene wrote:
> I have to disagree with a bunch of items here and will provide a real world example.
:) And as real-world examples go, we have to realize that different
requirements result in different architectures. That said - I don't
really think we're that far apart.
>
> > client -----> [firewall] ---->presentation (HTTP/JSP) ----->
> > [firewall] ----->business (Java EE/Spring) ------>database
>
> This is a fairly standard way of doing a lot of presentation/secure zone setups.
I would agree - the only difference between us is that I list databases
and the business tier on the same level. They're both part of the 3rd
tier and the app server (middle) calls out to each of the back-end
servers as needed. You even have solutions where you proxy/duplicate
the whole stack in different security zones. However, bottom line is you
only have one tier that manages your session and all the components of
the screen. I would never place "spring", JSF, CSS and all of this stuff
behind an additional firewall. It really doesn't bring you anything as
those components are readable almost as straight source on the client
side.
When it comes to firewalls, I didn't draw them on my little "diagram".
They are there between every layer.
> The presentation zone can be responsible for managing static and some non-confidential content as well as authentication. While business logic would be in the secure zone behind the second firewall. This example however does not have the load balancer in play.
> Not all content needs to come from the EE layer at all and that can be more specifically for the dynamic and access to confidential content, or just a plain data store.
I'm not in disagreement - it's just how we draw things. Several context
paths may not even be secured on your server - the static contexts would
be in one group, and the dynamic/spring would be in another. There's no
data on the presentation layer. There's no business rules stored there
either. In both cases, that is retrieved from the back-end servers.
Servers that can be databases, governance, business rules, business
process management, etc. Putting features that require sessions on
separate parts of the tiers makes no sense.
>
> Load balancing is also not a basic requirement unless you only need basic L4 load balancing. For example you can offload authentication to load balancers as well as static content and do content caching, compression or anything else at the edge instead of spinning cpu cycles to do the same thing.
Depending on your setup, you may have separate groups for SSO/authentication,
yup. No disagreement there. Architecturally it's no different from separating
static content out. Of course, if your organization is large enough, your
SSO/authentication will be shared among many apps, and may therefore
reside in its own zone somewhere.
This is what I really hate about answering stuff with general responses.
Everything always ends up with "it depends", but in general we've still
got 3 tiers. I think you are talking about 4 tiers because you differentiate
between a database server and a bpm server or a SOA/ESB server.
Conceptually I don't. They all are protected by the same backend zone.
> Here is a real world example:
>
> Client --> Internet --> firewall --> L4 LB VIP --> L7 LB VIP w/SSL FIPS --> Presentation zone servers --> L4/L7 LB VIP --> Secure Zone Servers --> DB Cluster
Not sure why you would put L4s in front of the L7, but I guess you can
do that. If by security you mean SSO/authentication/bpm/rules etc., we
agree again. I just would add the db under them, not behind them. Mainly
because the DB can actually initiate requests too or be a core part of
your process management, soa metadata, etc. So yes, LDAPs, BPMs, all end
up behind the line of the app tier. They're "just" connections and they
represent the actual content of the application you're serving.
> SSL is decrypted and re-encrypted on the L7 VIP. There is some cross data center items involved for persistency since this also spans 3 different data centers.
That definitely would be a special case for your app. WAN with
active/active clustering is not trivial.
> One data center for one application ( this particular one I'm looking at is dedicated ) is running about 1Gb/s of traffic and about 200k concurrent connections at the current time.
> The presentation zone handles presentation zone stuff. Javascript, cross data items where data has to be fetched from another application that is behind another load balancer and VIP where it has to do with display data. The secure zone side handles important stuff that you wouldn't want being pushed out to the world and has access to the "secure" database that has the important data.
> We also do a lot of URI mapping to determine where things go.
> For example http://www.example.com/ will go to the presentation zone servers for display data and authentication and are in pool_1, but http://www.example.com/something will go to pool_2 that contains VIPs for another application located somewhere else in that or another data center. While http://www.example.com/else will go to pool_3 which contains entirely different presentation zone servers for some other application.
Right - that's context mapping. I think of it as groups of clustered
servers that all sit behind the same LB frontend, with each group behind
it serving specific requests.
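A minimal sketch of that kind of context mapping, assuming longest-prefix matching on the URI path. The prefixes mirror the quoted example, but the pool names are hypothetical (the third mapping is given its own pool, pool_3, here), and a real F5, NetScaler or HAProxy setup would express this in the balancer's configuration rather than in Python.

```python
# Hypothetical URI-prefix -> pool mapping; "/" is the catch-all default.
ROUTES = {
    "/": "pool_1",
    "/something": "pool_2",
    "/else": "pool_3",
}


def select_pool(path: str) -> str:
    """Return the pool for the longest prefix that matches path.

    Assumes the query string has already been stripped from path.
    A prefix only matches on a path-segment boundary, so "/elsewhere"
    does not fall into the "/else" rule.
    """
    best = "/"
    for prefix in ROUTES:
        if prefix == "/":
            continue
        on_boundary = path == prefix or path.startswith(prefix + "/")
        if on_boundary and len(prefix) > len(best):
            best = prefix
    return ROUTES[best]


if __name__ == "__main__":
    for p in ("/", "/login", "/something/api", "/else", "/elsewhere"):
        print(p, "->", select_pool(p))
```

Matching on a segment boundary matters: a naive startswith() check would send /elsewhere to the /else pool.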
>
> Not every application needs to be "clusterable" and that term has a different meaning to different people.
> A pure web services application does not need to be "clusterable" ( for example a REST interface ) to gain benefits of load balancing. You can distribute the load and use a "key" for each request or part of the URI and there is no real session state.
Again, it depends on your level of paranoia. If you can live with some
requests going bad - let's say 1 out of 1 million calls - well, yes, you
can definitely take a Web Service and say "no need for clustering here".
I'm not disagreeing that we have application components that we may
choose not to add the clustering overhead to - it's a decision you need
to make based on your requirements.
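For the stateless case (a REST interface where a per-request key or part of the URI picks the server), here is a sketch using rendezvous (highest-random-weight) hashing. That technique is my choice for illustration; neither post names a specific algorithm, and the key and server names are hypothetical. Its useful property is that removing one server only remaps the keys that server owned.

```python
import hashlib
from typing import List


def _score(server: str, key: str) -> int:
    # Use a real digest: Python's built-in hash() of str is randomized per
    # process, and every balancer instance must agree on the result.
    digest = hashlib.sha256(f"{server}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server(key: str, servers: List[str]) -> str:
    """Rendezvous hashing: the server with the highest score for key wins."""
    if not servers:
        raise ValueError("no servers available")
    return max(servers, key=lambda s: _score(s, key))


if __name__ == "__main__":
    pool = ["ws1", "ws2", "ws3"]
    for k in ("customer-42", "order-1001", "order-1002"):
        print(k, "->", pick_server(k, pool))
```

This is distribution, not clustering: a request in flight on a server that dies still fails and the caller has to retry, which is exactly the "how many bad requests can you live with" trade-off.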
> In the example above there is no requirement for persistency in the presentation zone, however requests going to the secure zone do have a persistency requirement and the data in the headers is preserved from the secure zone to the presentation zone through the load balancers back to the client. If a presentation zone server goes down, no big deal. If a secure zone server goes down, also not a big deal since there is a single high speed session store per data center.
No argument here. Your backend services, like DB or rules, can be "load
balanced" (I call it clustered, but that shall not keep us apart). For
the really paranoid, we want no single point of failure anywhere - at any
link. This also means our firewalls are duplicated (Spanning Tree
involved) and a lot more. From the system level and up we look at
providing HA. Although I've argued in the past that if we can afford
losing a server and we secure it with redundant servers, why do we need
to provide redundant systems on each server too? Back to your paranoia
levels (and how deep your pockets are).
Where I do see a difference is how the back-end systems are called. You
can definitely pass the session token from the middle tier into the
request, which is required for bpm, but there are other operations where
that's not needed (like a traditional DB call). So I don't see the
communication to the backend services as a simple matter of header
manipulation.
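The persistence scheme in the quoted text (sticky routing to the secure zone, backed by one shared session store per data center) can be sketched as follows. The class names, the assumption that the sticky server comes from a persistence cookie, and the in-memory dict standing in for the session store are all hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SessionStore:
    """In-memory stand-in for the shared per-data-center session store."""
    data: Dict[str, dict] = field(default_factory=dict)


@dataclass
class SecureZone:
    servers: List[str]
    healthy: Set[str]
    store: SessionStore

    def route(self, sticky_server: Optional[str]) -> str:
        # Honour persistence while the server named in the cookie is healthy.
        if sticky_server is not None and sticky_server in self.healthy:
            return sticky_server
        # Otherwise fail over to any healthy server in the pool.
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy secure-zone servers")
        return random.choice(candidates)

    def handle(self, session_id: str, sticky_server: Optional[str]) -> Tuple[str, dict]:
        server = self.route(sticky_server)
        # Session state comes from the shared store, not from the server,
        # so whichever server we land on sees the same session.
        session = self.store.data.setdefault(session_id, {})
        return server, session
```

Because the session lives in the store rather than in server memory, losing the sticky server costs a re-route, not the user's session.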
>
> One item of note is that we do re-encrypt data before sending to the presentation zone servers and that's more of a legal liability reason as well as it just makes good sense due to the type of traffic going across.
If your servers are in a public data center, I can see how that may be a
requirement yes. I hope that means you've offloaded the encryption to
the L4 or L7 modules since you're doing a lot of back and forth
"massaging".
> All in all different load balancers do very different things and are good at different things. Viewing them for the most part as an L4 device just limits the ability to make actual use of them.
I don't see the need for L4 at all from an application perspective. I
can see L4s playing a role on the network infrastructure side - but to me
that's a bit off the raised subject here. As I understood the question,
this was focused on providing LB for application servers - L7 is needed
there as you need access to the HTTP headers.
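To make the L4/L7 distinction concrete: an L4 balancer picks a server from connection data only, while an L7 balancer sees the (decrypted) HTTP request and can honour a persistence cookie. A minimal sketch under those assumptions; the SERVERID cookie name and the server names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from http.cookies import SimpleCookie
from typing import Dict, List


@dataclass(frozen=True)
class Connection:
    client_ip: str
    client_port: int
    vip_port: int


def l4_pick(conn: Connection, servers: List[str]) -> str:
    # L4 only sees addresses and ports, so the best it can do for
    # affinity is hash the source address.
    h = hashlib.sha256(conn.client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(h[:4], "big") % len(servers)]


def l7_pick(headers: Dict[str, str], servers: List[str], default: str) -> str:
    # L7 has the HTTP request, so it can read a persistence cookie.
    # Assumes header names are already normalized (real HTTP header
    # names are case-insensitive).
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if "SERVERID" in cookie and cookie["SERVERID"].value in servers:
        return cookie["SERVERID"].value
    return default
```

Source-IP hashing at L4 sends every user behind one corporate NAT to the same server, which is one reason application-level persistence generally needs L7.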
>
> I've been doing this just as long going from system admin, application development ( desktop, mobile device, "internet" ), to architect and now combination of architect and managing something around a couple hundred load balancers from F5 to Netscalers, similar amount of firewalls and proxies. Blah blah blah.
>
> The one thing I can say for sure is that Blue Coats are EVIL!! :)
Well, I guess we must bleach them away then :)
>
> I have used HAProxy and that is not a bad load balancing solution either, at least for software based solution.
I don't think any amount of past experience means that this stuff doesn't
get complicated, nor is there "one size fits all". I think I gave the
impression early on that that was the case - I'm happy to say here that
it was not what I meant. Hopefully enough additional confusion was
spread above to show that :)
--
Best Regards
Peter Larsen
Wise words of the day:
Footnotes are for things you believe don't really belong in LDP manuals,
but want to include anyway.
-- Joel N. Weber II discussing the 'make' chapter of LPG