[Novalug] Software-based load balancing

Brander Snaxe brandon20va at yahoo.com
Fri Mar 9 06:59:37 EST 2012


Also, I'd use keepalived on Linux or CARP on FreeBSD and have failover on the LB's themselves.



----- Original Message -----
From: Brander Snaxe <brandon20va at yahoo.com>
To: Dave Greene <omniplex at omniplex.net>; 'Peter Larsen' <plarsen at famlarsen.homelinux.com>; "novalug at calypso.tux.org" <novalug at calypso.tux.org>
Cc: 
Sent: Friday, March 9, 2012 6:52 AM
Subject: Re: [Novalug] Software-based load balancing

Thanks Peter and Dave.

I can see that I may encounter both 'real-world' scenarios. Our application can be deployed across tiers or totally deployed in the single application tier. When deployed across tiers, we use HTTP web services (Spring HTTPInvoker actually) to move data. When deployed in only the back-end tier, I would probably use mod_jk, mod_proxy, weblogic module, etc. from an Apache server.

The decision to deploy using one method or the other is driven by client requirements. For whatever reason, our current client did not want to use any type of mod_jk, mod_proxy, etc. to move data from a web server to the app server and back. Regardless, I need to be able to deploy either way, and I think both of you agree these are both real-world examples.


So now my main consideration is how to move forward with KISS and accommodate some different configs.

Obviously I should not look at load balancing as just 'L4' or else I may miss out on other things such as authentication, caching, etc. Got it and this makes sense for me to consider.

The main choice for me at this point is to consider the security implications for L4 vs L7 in regards to SSL offload. It seems to me that if I use L7 and terminate SSL there, then in theory it would be possible for somebody else who hacked into our network to sniff communications between LB and presentation server. Is this a true statement? The same holds true if the communication from presentation to business tier could be sniffed. Our development/demo/proof of concept environment is shared and thus I do not have separate servers or vlans for each separate application. I am actually looking at this from a 'hosting/vps' type of model. Production implementations will get all the goods and separation one would want.

Depending on the deployment type, if I want full, end-to-end SSL across all tiers I would need to use L4 and do termination on each node. If the concern is not as great, I can use SSL termination on the L7s.

I believe I could build a single solution that does both L4 and L7 using Linux or use FreeBSD (starting with pfSense probably).

Here are the scenarios I see for my environment. Assume the LB can do L4 or L7 with software.

Full SSL in shared environment:

client/SSL-->[L4]-->SSL/presentation-->[L4]-->SSL/business-->DB

If I trust against sniffing in presentation or if the data is not as sensitive:

client/SSL-->[L7 SSL term]-->presentation-->[L4]-->business-->
OR
client/SSL-->[L7 SSL term]-->presentation-->[L7]-->business-->

These two options also give me the ability to put other software such as caching proxy, auth, etc. on the LB (L7 mode?) as well.

Any thoughts on this approach?

--Brandon






________________________________
From: Dave Greene <omniplex at omniplex.net>
To: 'Peter Larsen' <plarsen at famlarsen.homelinux.com>; novalug at calypso.tux.org 
Sent: Wednesday, March 7, 2012 10:29 PM
Subject: Re: [Novalug] Software-based load balancing

I have to disagree with a bunch of items here and will provide a real world example.

> client -----> [firewall] ---->presentation (HTTP/JSP) -----> [firewall] ----->business (Java EE/Spring) ------>database

This is a fairly standard way of doing a lot of presentation/secure zone setups.
The presentation zone can be responsible for managing static and some non-confidential content as well as authentication, while business logic would be in the secure zone behind the second firewall. This example, however, does not have the load balancer in play.
Not all content needs to come from the EE layer; that layer can be reserved for dynamic content and access to confidential content, or used as just a plain data store.

Load balancing is also not a basic requirement unless you only need basic L4 load balancing. For example you can offload authentication to load balancers as well as static content and do content caching, compression or anything else at the edge instead of spinning cpu cycles to do the same thing.

Here is a real world example:

Client --> Internet --> firewall --> L4 LB VIP --> L7 LB VIP w/SSL FIPS --> Presentation zone servers --> L4/L7 LB VIP --> Secure Zone Servers --> DB Cluster

SSL is decrypted and re-encrypted on the L7 VIP. There are some cross-data-center items involved for persistency since this also spans 3 different data centers.
One data center for one application ( this particular one I'm looking at is dedicated ) is running about 1Gb/s of traffic and about 200k concurrent connections at the current time.
The presentation zone handles presentation zone stuff. Javascript, cross data items where data has to be fetched from another application that is behind another load balancer and VIP where it has to do with display data. The secure zone side handles important stuff that you wouldn't want being pushed out to the world and has access to the "secure" database that has the important data.
We also do a lot of URI mapping to determine where things go.
For example, http://www.example.com/ will go to the presentation zone servers for display data and authentication, which are in pool_1, but http://www.example.com/something will go to pool_2, which contains VIPs for another application located somewhere else in that or another data center, while http://www.example.com/else will go to pool_3, which contains entirely different presentation zone servers for some other application.
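
A rough Python sketch of that kind of URI mapping, only to make the idea concrete. The pool names and the table layout are placeholders I made up, and a real load balancer would express this in its own rule language rather than in code like this:

```python
# Hypothetical routing table: URI path prefix -> pool name.
# The pool names are placeholders, not taken from any real config.
ROUTES = {
    "/": "presentation",
    "/something": "remote_app_vips",
    "/else": "other_presentation",
}


def choose_pool(path, routes=ROUTES):
    """Return the pool whose prefix is the longest match for the path.

    Assumes the table always contains a "/" entry to act as the default.
    """
    best = "/"
    for prefix in routes:
        # Match on whole path segments so "/else" does not capture "/elsewhere".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if len(prefix) > len(best):
                best = prefix
    return routes[best]


print(choose_pool("/something/report"))
```

The longest matching prefix wins, so anything not claimed by a more specific entry falls through to the default pool.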

Not every application needs to be "clusterable"  and that term has a different meaning to different people.
A pure web services application does not need to be "clusterable" ( for example a REST interface ) to gain benefits of load balancing. You can distribute the load and use a "key" for each request or part of the URI and there is no real session state.
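
As an illustration of distributing on a key, here is a small Python sketch. The function name and the backend list are hypothetical, and this uses a plain hash-modulo scheme, which is an assumption on my part and not a description of what any particular load balancer does:

```python
import hashlib


def pick_backend(request_key, backends):
    """Map a request key (or part of the URI) to one backend.

    hashlib is used rather than the built-in hash() because hash() of a
    string is randomized per Python process, so it would not be stable
    across restarts.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical backend names.
backends = ["ws1", "ws2", "ws3"]
print(pick_backend("customer-42", backends))
```

The same key always lands on the same backend without any session state being kept. The trade-off with plain modulo is that adding or removing a backend remaps most keys.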

In the example above there is no requirement for persistency in the presentation zone, however requests going to the secure zone do have a persistency requirement and the data in the headers is preserved from the secure zone to the presentation zone through the load balancers back to the client. If a presentation zone server goes down, no big deal. If a secure zone server goes down, also not a big deal since there is a single high speed session store per data center.

One item of note is that we do re-encrypt data before sending to the presentation zone servers and that's more of a legal liability reason as well as it just makes good sense due to the type of traffic going across.

All in all different load balancers do very different things and are good at different things. Viewing them for the most part as an L4 device just limits the ability to make actual use of them.

I've been doing this just as long going from system admin, application development ( desktop, mobile device, "internet" ), to architect and now combination of architect and managing something around a couple hundred load balancers from F5 to Netscalers, similar amount of firewalls and proxies. Blah blah blah.

The one thing I can say for sure is that Blue Coats are EVIL!! :)

I have used HAProxy and that is not a bad load balancing solution either, at least for a software-based solution.




-----Original Message-----
From: novalug-bounces at calypso.tux.org [mailto:novalug-bounces at calypso.tux.org] On Behalf Of Peter Larsen
Sent: Wednesday, March 07, 2012 20:44 PM
To: novalug at calypso.tux.org
Subject: Re: [Novalug] Software-based load balancing

On Wed, 2012-03-07 at 15:20 -0800, Brander Snaxe wrote: 
> I am in charge of designing some of the infrastructure pieces for my software development team. We have an n-tier solution accessed via the browser. There is a presentation tier, a business logic tier, and the database tier. The presentation tier is comprised of HTTP servers for static content and Java EE servers for JSP/servlets. The business logic is comprised of Java EE servers running web services, EJBs, and Spring.
> 
> The presentation tier is accessible to the internet. The business logic tier is not. The database tier and the business tier are in the same network zone.
> 
> It's like this:
> 
> client -----> [firewall] ---->presentation (HTTP/JSP) -----> [firewall] ----->business (Java EE/Spring) ------>database

That's a bit of an odd way to do things. Usually your proxy is not involved AT ALL in creating content. And a proxy is what your load balancer is. In other words, your backend would serve up both static and dynamic pages and the proxy's role is simply choosing the right backend server.

> 
> Now, I plan on using Java EE clustering in the business tier no problem.

Which is where all your content should be placed. The proxy can sometimes act as a caching server - but that doesn't change that content is initially generated and managed by the backend system.

The old mod_jk and mod_proxy with Apache is the way this has been done for more than a decade. Mod_jk provides for static load balancing. This just means the load balancer is "blind" and uses an algorithm separate from the load of the backend servers to determine which server is "next in line" for a new request. You can use mod_jk with most JEE servers, such as JBoss, Tomcat etc.

It works like this:

User -> Apache mod_jk workers -> node1
                              -> node2
                              -> node3
etc.

If a node goes down it will be skipped by mod_jk. Using mod_proxy and other modules, you can turn your proxy server into a caching server which will serve up data without having to pass the request back to the back-end nodes.
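
To make "blind" concrete, here is a minimal Python sketch of a rotation that ignores backend load and skips nodes marked down. This is not mod_jk's actual code or algorithm; the class and method names are made up for illustration:

```python
class RoundRobin:
    """Hand out nodes in a fixed rotation, skipping any marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._next = 0

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def pick(self):
        # Try each node at most once per call; the rotation position
        # advances whether or not the node turns out to be usable.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node not in self.down:
                return node
        raise RuntimeError("no backend nodes available")


lb = RoundRobin(["node1", "node2", "node3"])
lb.mark_down("node2")
print([lb.pick() for _ in range(4)])
```

Nothing here looks at how busy a node is, which is the point: the choice depends only on position in the list and on whether the node is considered alive.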

> 
> However, now I need to implement a load balancing strategy for both tiers. I am assuming these load balancers would go here:
> client -----> [firewall] ---->(LB/VIP)--> presentation (HTTP/JSP) -----> [firewall] -----> (LB/VIP) --> business (Java EE/Spring) ------>database

Again, you don't have "two presentation tiers". Just one. The backend database is usually behind some kind of HA provider too, but that's very database dependent in regards to how that is provided.

> 
> Let me say that
> 1. I come from a developer background and not infrastructure/deployment
> 2. I DO know basic system engineering (Linux+ and Network+, worked with virtualization)
> 3. I have never worked with load balancing
> 4. I have done lots of reading on the internet on it for the past few days

I've been doing this stuff for 15+ years both as a developer, system admin and architect. Load balancing is a basic requirement. However, you need to consider your backend application and its needs too. If it's not clusterable, then you're not gaining much from your LB - you can still lose sessions and data. If you want to implement session clustering, you need to consider whether you want to use sticky sessions or have active replication between your business tier nodes so that any node can handle any call. As you see, this puts pressure on the backend nodes to share everything, which means a lot of memory and CPU goes to maintaining this redundancy.

With EE frameworks like JBoss (full disclosure - I'm a JBoss solutions architect) you can control the clustering/replication that goes on to minimize this overhead. But in the end, the application you run MUST support clustering or whatever sessions you have open when a node fails will be lost.
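
For the sticky session side, a small Python sketch may help. It assumes the Tomcat-style convention where the session ID carries the node's route name as a suffix after a dot (for example "ABC123.node2"); the function name and the fallback callback are hypothetical:

```python
def route_for_session(session_id, live_nodes, fallback):
    """Return the node named in the session ID suffix if it is alive.

    Otherwise call fallback() to pick a node, e.g. the next one in a
    rotation. Without session replication, a request that falls back
    to another node starts with no session data.
    """
    if session_id and "." in session_id:
        route = session_id.rsplit(".", 1)[1]
        if route in live_nodes:
            return route
    return fallback()


live = {"node1", "node3"}
print(route_for_session("ABC123.node3", live, lambda: "node1"))
print(route_for_session("DEF456.node2", live, lambda: "node1"))
```

The second call shows the failure case: node2 is gone, the request is sent elsewhere, and unless the application replicates its sessions, that user's session is lost.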

> 
> My goal is to do something that mimics real world deployment. I want to keep it as simple as possible. This is NOT production, so performance is not as important as whether or not all of our software works decently in the configuration.

Real world does not have multiple sessions/load balancers. That makes no sense. All EE servers can provide static and dynamic content, so create a simple WAR file with your images, css and other static content and it will be available like your jsp and similar dynamic pages.

> 
> User base is probably 100 concurrent max.

A single server should be able to handle that these days. With web apps, we need to be careful in how we define concurrent users. Are we getting 100 concurrent incoming requests, or do we keep 100 open sessions at a time? Remember that most users don't log out - they just close the browser; you get an artificially large number of open sessions that way.
So you need to talk about session sizes and lifetime. That should help you determine whether a single server can handle it or whether spreading it over multiple servers makes sense from a performance perspective. For HA, you of course need at least 2 (actually the recommended setup is 3 nodes minimum for the distribution algorithm to work optimally).
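
A back-of-the-envelope Python sketch of why abandoned sessions inflate the count. All numbers are invented for illustration, and it assumes a steady arrival rate and that nobody logs out, so every session lingers until the idle timeout expires:

```python
def open_sessions(logins_per_minute, active_minutes, idle_timeout_minutes):
    """Estimate sessions held in memory at steady state.

    Each session lives for its active time plus the idle timeout, and
    the number held at once is arrival rate times that lifetime.
    """
    lifetime = active_minutes + idle_timeout_minutes
    return logins_per_minute * lifetime


# Hypothetical figures: 5 logins/minute, 10 minutes of real use,
# 30 minute idle timeout.
print(open_sessions(5, 10, 30))  # sessions held by the server
print(open_sessions(5, 10, 0))   # sessions actually in use
```

With these made-up figures the server holds 200 sessions while only 50 users are really active, so multiply by your per-session size to see what memory that costs.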

> 
> Since this is DEV and not PROD, my costs need to be limited. I cannot acquire any sort of hardware load balancing solution for this (I have read about Netscalers and F5's). So I am forced to come up with a software-based approach.

Apache mod_jk is quite open source and free. And works great with Tomcat as well as JBoss.

> All of this will be done using virtual machines in vSphere.

vSphere is definitely not cheap -- wonder if you shouldn't consider alternatives? But regardless, Java doesn't care. Nor does the Apache-based load balancer.

> I need SSL from the client side minimum. I have concerns about encrypted traffic from tier-to-tier as well but need input.

We usually implement that on the proxy. It's very common to only use SSL for certain parts of the site, so using URL mappings when you enforce SSL in your setup is recommended. The connections to the backend systems are all internal, and should not need to be encrypted. Be VERY careful how the headers are treated when you do this, as the load balancer must be configured to massage the headers so the back-end systems do not get confused.
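
One common form of that header massaging is telling the backend what the original client connection looked like. A minimal Python sketch, assuming the widely used X-Forwarded-For and X-Forwarded-Proto headers and assuming this proxy is the first hop; the function name is made up, and whether the backend honors these headers depends on how it is configured:

```python
def forward_headers(client_headers, client_ip, scheme):
    """Build the header set to send to the backend after SSL termination."""
    # Drop any forwarding headers sent by the client so they cannot be spoofed.
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in ("x-forwarded-for", "x-forwarded-proto")
    }
    # Record the real client address and the scheme the client used,
    # since the backend itself only sees a plain HTTP hop from the proxy.
    headers["X-Forwarded-For"] = client_ip
    headers["X-Forwarded-Proto"] = scheme
    return headers


incoming = {"Host": "www.example.com", "x-forwarded-proto": "http"}
print(forward_headers(incoming, "203.0.113.7", "https"))
```

Without something like this, the backend believes every request arrived over plain HTTP from the proxy's address, which is where redirect loops and wrong absolute URLs tend to come from.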

> 
> I have researched the following:
> 1. Linux Virtual Servers (IPVS)
> 2. Pound
> 3. Pen
> 4. HaProxy
> 5. Nginx
> 6. CARP (freebsd/pfSense)
> 7. VRRP
> 8. keepalived
> 9. stunnel
> 10. stud
> 11. probably more I'm forgetting but these are the prominent items on 
> my mind

You need to pick something that is session aware and can work with the backend servers. mod_jk has a proven track record, and I don't think we can claim Apache as being something that you're taking your chances with. Using iptables to do this will not work. You have to manipulate headers for this to work, and that's what Apache/proxy servers are all about.

> 
> I have learned that load balancers come in two forms:
> 1. Layer 4
> 2. Layer 7

Absolutely - and what you want is a Layer 7 proxy, which is what mod_jk is. In other words, it needs to be protocol specific for http/https.

> 
> This website was very helpful (but trust me this is not my only source of research):
> http://loadbalancing.org/

http://docs.jboss.org/mod_cluster/1.1.0/html/demo.html
https://community.jboss.org/wiki/UsingModjk12WithJBoss
http://tomcat.apache.org/connectors-doc/

> Now I'm struggling how to proceed.
> 
> What would you or do you use in your client environment?

Some hardware load balancers can do this too - such as F5. But they are very pricey. The advantage is that you can offload the SSL processing and centralize your LB. Another thing the hardware load balancers are designed to do is provide no single point of failure. In other words, what happens if your load balancing node fails? You actually need two and cross them. The simple way to do that is a cold-standby and have a heartbeat move the IP over if lb1 goes down.
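
The decision the standby makes can be sketched in a few lines of Python. The interval and missed-beat limit are invented defaults, and actually moving the IP is left out because that part is specific to whatever heartbeat tool you use:

```python
def should_take_over(last_heartbeat, now, interval=1.0, missed_limit=3):
    """Return True when lb1 has been silent for more than missed_limit beats.

    last_heartbeat and now are timestamps in seconds. interval and
    missed_limit are assumed tuning values, not defaults of any tool.
    """
    return (now - last_heartbeat) > interval * missed_limit


print(should_take_over(last_heartbeat=10.0, now=12.0))
print(should_take_over(last_heartbeat=10.0, now=13.5))
```

Waiting for several missed beats rather than one keeps a single dropped packet from triggering a failover.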

Depending on your app, you may or may not need fully clustered solutions - BOTH on the LB and on the application tier side. It depends on your level of paranoia and what downtime costs you.

> 
> I have thought of using Nginx as a layer 7 load balancer and also doing SSL termination on it. This has the advantage of being simple and I can have central management of SSL certs/installs. However, this means traffic inside the network from the LB to the actual load balanced web server could be sniffed. Is this a real concern? I really like the idea of central SSL termination, but this seems like a security risk.

As long as it's a true layer 7 and it's able to do sticky sessions if you are designing to that, then it should be fine too. Some common things that are done, is to map context paths to specific groups of hosts. So instead of having all your apps available on all backend servers, you may have GIS type apps on group A, shopping in Group B, and static stuff on group C - and on the LB you map out the context paths to these groups.

A plug for JBoss here - we have ways to do this dynamically, so as the nodes present their cluster memberships and available context paths, the routing is set up automatically - NO CONFIGURATION NEEDED.

> 
> The same holds true for the presentation servers to the business logic servers. If a presentation server uses a non-encrypted web service from the business tier, then this traffic could be sniffed as well. In anybody's experience, do you use SSL between tiers?

You're basically doing double session setup - it will not work the way you think. Your app server should do what it's supposed to do and own the session etc. - the rest is a matter of caching proxies and routing.
Of course with caching you have to be careful too that you don't serve up stale content.

> Linux Virtual Servers (IPVS) seems cool, but if I use it at any tier, then I have to do SSL after the load balancer on the servers themselves as it is Layer 4. This means no central management of SSL certs.

SSL needs to be done as close to the outside line as possible. Once you've got the communication in-house, you no longer need encryption. 

> 
> I'm sure I could talk about more configurations on my mind, but I wanted to start a conversation just to get initial thoughts from those who do this sort of thing. I realize there are always trade-offs with complexity, security, performance, etc. but I just can't seem to decide on what 'real' people are doing (and not just what google searches tell me people are doing).

I've added a bit above for you to consider. These are general considerations and do not depend on the type of appserver you use.

If I were to do this on a tight budget, I would start with CentOS, add KVM and virt manager plus cluster suite. Create clustered/stable storage for your virtual machines, create a LB host based on Apache, and 3 backend hosts based on Tomcat (if your app is just JSPs, Tomcat is a great choice). Set up mod_jk, and implement SSL on Apache. Once you have your mod_jk workers defined, you can start testing.

Btw. I would definitely start by testing your applications without using the load balancer. That way you know if it's an LB configuration issue or an application issue.

If your app needs are a bit more than just JSPs, and you have a true need for a full JEE stack, that's where JBoss comes in - again jboss.org provides you with the bits to play with. mod_jk will not have to be changed to use JBoss - nor does it matter to mod_jk what type of content is being served up to the clients.

--
Best Regards
  Peter Larsen

Wise words of the day:
What if everything is an illusion and nothing exists?  In that case, I definitely overpaid for my carpet.
        -- Woody Allen, "Without Feathers"

_______________________________________________
Novalug mailing list
Novalug at calypso.tux.org
http://calypso.tux.org/mailman/listinfo/novalug