As the author of the WWW FAQ, I regularly answer questions about the workings of the Web. If a question is frequently asked, I simply add an article to the FAQ. But sometimes a question is more detailed, more in-depth— not really a FAQ, but still of interest to others. You'll find those questions, with my answers, here in Innards along with commentary on other web-technology-related topics.
2007-08-06

Q. I am an old COBOL programmer and know little of how the web actually works. I want to develop an application whereby the software resides on the host and is not copied in whole to the local PC; only the modules necessary for the actual computations are sent to the local PC. Can this be done on the web?
A. Yes, this is the way things normally work on the web. In addition to static content like images and text, web pages can also contain interactive form fields such as text fields, pull-down menus and list boxes. All web browsers have had this capability since NCSA Mosaic 2.0 appeared in 1994, a very short time after the initial appearance of the web.
The user can spend as much time as they like deciding what data to enter in these fields before submitting a response to the web server, which does not have to worry about every keystroke the user makes, only the final result. This is similar to the way the time-sharing systems you remember worked in the old days.
The critical difference is that while a dumb terminal was traditionally connected to a single "big iron" computer, or a minicomputer such as the PDP-11, web browsers are able to connect to millions of different web servers via the Internet.
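To make the form mechanism concrete, here is a minimal sketch of an HTML form of the kind browsers have displayed since the Mosaic era. The address /cgi-bin/report and the field names are invented for this illustration; a real application would substitute its own:

    <form action="/cgi-bin/report" method="post">
      Account number: <input type="text" name="account">
      Report type:
      <select name="period">
        <option>Monthly</option>
        <option>Quarterly</option>
      </select>
      <input type="submit" value="Send to server">
    </form>

Nothing travels to the server until the user clicks the submit button; the browser then sends all of the field names and values in a single request.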
Q. [When] early PCs came along, we used to write programs that looped on each keystroke. So, for the first time, there was some local logic that was able to, for example, know whether a field was supposed to be numerical. And, since the local PC was interrogating the keystrokes in context, if an alpha character was entered into the field at any point, the local CPU would know it and slap the wrist of the person doing the data entry. Then, when "enter" was hit, some gross processing was further done on the host side.
A. So in essence, the "dumb" terminal became a bit smarter, taking some of the workload away from the server.
A similar situation exists with modern web sites, which use JavaScript (not Java! They are completely different languages) to address this issue. JavaScript code can respond to each keystroke, or simply validate form fields when the user clicks submit, before allowing them to be sent to the server. Either way, there are two main benefits (a brief sketch of this kind of validation follows the list below):
1. Some of the load is taken off the server. This is nice, but relatively minor compared to the second benefit.
2. The user does not have to wait for the server to respond with bad news. While the Internet is amazingly fast by any objective standard, in the real world we all get impatient waiting for web sites to respond. And when JavaScript code saves us from the "world wide wait" by pointing out errors before the server is contacted, we're all happier.
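Here is a minimal sketch of the validation just described. The form name order, the field name quantity, and the numbers-only rule are all invented for illustration:

    <form name="order" onsubmit="return checkOrder()"
        action="/cgi-bin/order" method="post">
      Quantity: <input type="text" name="quantity">
      <input type="submit" value="Order">
    </form>
    <script type="text/javascript">
    function checkOrder() {
      // Interrogate the field locally, just as the PC-era code did
      var value = document.order.quantity.value;
      if (!/^[0-9]+$/.test(value)) {
        // Slap the wrist locally; the server is never contacted
        alert("Please enter a whole number.");
        return false; // cancels the submission
      }
      return true; // the form travels to the server as usual
    }
    </script>

Returning false from the onsubmit handler cancels the submission, so the server only ever sees input that has already passed the local check.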
Q. Here's another case. [Let's assume that] a student's historical data resides on his PC. Now the student wants to search for all the instances of Napoleon in his data. The search field is read and the data and the program name to process it are sent to the web. The web reacts by sending a piece of code to the PC that searches for the data (which always resides on the user's hard drive). This is different from the host "searching" the user's data files, because the host is then intimately involved with the PC. Instead, I want the work to be done in the PC's memory, as if the work were being done by the host's memory. In other words, the host is thereby able to service many other PCs, because all it is doing is sending code to the PC as needed. The host contains code only. The PC contains data only.
Can this be done routinely? Can HTML do this?
A. The big difference between this question and your previous questions is that you wish to store the data on the user's computer, yet still allow the web site to access the data.
As it happens, web browsers do provide a way of storing data on the user's computer on a per-web-site basis. A web site can choose to store a "cookie", a piece of data on the user's PC that will always be supplied back to that web site when the user visits again.
However, this method is extremely limited, primarily for security reasons. Many browsers are set up to store cookies only during the current session, or only for a short period of time, or only if the user explicitly allows it for that web site. And since the cookie is re-supplied to the web server with every page access, it doesn't make sense to store large amounts of data this way.
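For illustration, here is roughly how a page reads and writes a small cookie from JavaScript. The cookie name lastSearch and its one-year lifetime are invented for this sketch:

    // Remember one small value for a year
    var expires = new Date();
    expires.setFullYear(expires.getFullYear() + 1);
    document.cookie = "lastSearch=Napoleon; expires=" +
      expires.toUTCString() + "; path=/";

    // Reading it back: document.cookie returns every cookie for this
    // site as one "name=value; name2=value2" string, which real code
    // must split apart to find a single value
    alert(document.cookie);

The browser also silently attaches this string to every request it makes to the site, which is exactly why cookies are kept small.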
Another problem with this method is that by requiring the user to always use the same computer, we give up one of the main benefits of the web: It doesn't matter where you are, you can still access your data! Servers are good things. They free us from the tyranny of always working at the same desk.
Instead, most modern web applications, such as Gmail or Google Apps, store the user's data on the server side. However, they then use cookies to keep track of the fact that a particular user is logged in.
Web sites like Gmail can also use AJAX (Asynchronous JavaScript And XML) techniques to fetch new information from the server without loading a new page. In this scenario, quite a bit of intelligence is downloaded to the web browser in the form of JavaScript code. Here the goal is not to spare the server, although AJAX "polls" can be much more lightweight than full page fetches. The primary goal, instead, is to give the user a smooth, interactive experience in which there is never a perceptible wait, and in which they can continue to type the next email message (in the case of Gmail) while new messages pop up smoothly "in the background."
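A bare-bones sketch of the technique, assuming the server offers an address such as /newmessages (invented here) that returns a fragment of text, and that the page contains an element with the invented id inbox:

    // Ask the server for news in the background, without a page load
    function checkForNews() {
      var req = new XMLHttpRequest(); // older IE uses an ActiveXObject
      req.open("GET", "/newmessages", true); // true = asynchronous
      req.onreadystatechange = function() {
        if (req.readyState == 4 && req.status == 200) {
          // Update just one corner of the page in place
          document.getElementById("inbox").innerHTML = req.responseText;
        }
      };
      req.send(null);
    }
    // Poll every 30 seconds while the user keeps typing
    setInterval(checkForNews, 30000);

Because the request is asynchronous, the user's typing is never interrupted while the browser waits for the server's answer.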
This does not mean that it is completely impossible to store a larger amount of data on the user's computer. Web browsers do in fact have that capability. Firefox supports "signed scripts," JavaScript code which has been given the privilege of accessing the user's hard drive. And Internet Explorer has a "signed ActiveX control" feature that accomplishes similar things, although in a proprietary fashion.
However, in the real world, this capability is very rarely used because of the extreme security risk involved in allowing an unknown web site to access your hard drive without restrictions. Even if the company creating the web site has good intentions, that does not mean that an unscrupulous cracker won't find a way to infiltrate that web site and then carry out malicious acts on millions of home PCs. Most users will simply click "no" when invited to approve the installation of signed scripts and controls. And I would add that they are right to click "no." The benefits are not worth the risks.
Due to security concerns, the benefit of being able to log in and work from anywhere, and the fact that storage space is generally cheaper in large "server farms" than on individual PCs, web applications nearly always store data on the server side. Browser-side data storage (cookies) is used only as a convenience to remember who is logged in and, if the user agrees to store a cookie for a longer time, to make future logins unnecessary.
In conclusion, there is quite a bit of similarity between the World Wide Web and an old-fashioned time-sharing system. In fact, nearly all of your suggested scenarios are a normal part of the modern web's operation. The sole exception is that web applications rarely store data on the browser or "client" side— because it is usually advantageous to store it on the server side, and because of the security issues that arise when a web site is allowed to act beyond the safe boundaries of the web browser window.
