Performance
General performance and usability guidelines
The Requea Platform is designed for very fast response times. The goal is a sleek and highly interactive interface.
Remember that ergonomics research states that:
- Under 100 milliseconds. This is the "Sleek" range. The user will perceive a fluid application.
- Under 1 second. This is the "Processing" range. The user will perceive a responsive application but will notice processing times.
- Over 1 second and under 10 seconds. This is the "Busy" range. The user will be annoyed by processing times and their productivity will decrease, but the application is still usable (sometimes barely).
- Over 10 seconds. This is the "Unacceptable" range. The user will start doing something else, so you will lose them.
In addition, I believe that application scalability is achieved by being efficient on the server side. When you spend fewer CPU cycles on each user action, the same total amount of CPU time can serve a larger number of users.
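The four response-time ranges above can be expressed as a small classifier, for example to label timings in your own logs. This is only a sketch: the names ResponseRange and classify are hypothetical and not part of the Requea API, and the handling of the exact 1 second and 10 second boundaries is an assumption, since the list above does not say which range they belong to.

```java
/** Hypothetical helper, not part of the Requea API. */
enum ResponseRange {
    SLEEK, PROCESSING, BUSY, UNACCEPTABLE;

    /**
     * Maps an elapsed time in milliseconds to one of the four ranges.
     * Assumption: exactly 1000 ms counts as BUSY and exactly 10000 ms
     * counts as UNACCEPTABLE.
     */
    static ResponseRange classify(long elapsedMillis) {
        if (elapsedMillis < 100) return SLEEK;
        if (elapsedMillis < 1_000) return PROCESSING;
        if (elapsedMillis < 10_000) return BUSY;
        return UNACCEPTABLE;
    }
}

class ResponseRangeDemo {
    public static void main(String[] args) {
        long start = System.nanoTime();
        // ... the user action being measured would run here ...
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(elapsedMillis + " ms: " + ResponseRange.classify(elapsedMillis));
    }
}
```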
Elements of performance
In essence, this is a race against time and every millisecond counts. To achieve that performance, a number of mechanisms are built into the platform to serve data to the user within the 100 ms range.
For information, here is a non-exhaustive list of them:
- Use of taglibs and JSP. For rendering, the platform uses servlets and JavaServer Pages. These are compiled Java objects that are optimized for rendering. In addition, the Java JIT (just-in-time) compiler does a great job of compiling the Java bytecode into native machine code at run time for optimum CPU performance.
- A Model / View / Controller architecture for rendering and for processing user actions.
- Ajax-based interactivity. Where there is a need to interact with the server (such as processing an on-change event triggered by a user click), the minimum amount of data is posted to the server and the update is received through an Ajax request. This reduces the amount of data on the network and the amount of processing on the server to the bare minimum.
- CSS and JS based design. The HTML is CSS based, to separate the look-and-feel information (CSS, loaded once) from the actual data (HTML, rendered each time). The same approach is used for JavaScript.
- Client caching of non-alterable data. JS, CSS, images and file attachment content cannot change. Therefore, the platform declares those objects as cacheable with an expiration date far in the future. The client browser may use this information for optimum caching.
- Server side caching of user data. Security tokens are cached upon login, as well as user information such as locale.
- Caching of LOV (List of Values). Common lists of values are cached in the platform and are not retrieved each time the rendering engine needs them.
- Pre-compilation of server-based JavaScript code. All JavaScript is loaded and pre-compiled into Java classes. The execution time is therefore on par with pure Java code, and it is JIT optimized as well.
- First-level cache of entity objects. A cache is used to avoid loading the same object twice within the same database transaction.
- Optimistic locking of database objects. Even though this does not speed up individual transactions, it dramatically reduces contention and makes the job of the database server much easier and much more scalable.
- Lazy loading of collections and referenced objects. Collections are not loaded immediately but on demand (lazy load). This reduces the number of SQL statements sent to the database server.
- Batch loading of collection items. Items in collections are not loaded one by one but in batches (10 at a time, even if there are fewer).
- Pre-parsing of SQL statements. The SQL statements that load and insert data are pre-calculated, and pre-parsed parametrized statements are used. In addition to improving security by removing the risk of SQL injection attacks, this helps the database server optimize execution plans and the caching of database data, thus dramatically improving overall performance.
- Limited loading of data. When rendering lists, only the first items are retrieved, and a page cache mechanism is used to retrieve the minimum amount of data. The SQL statements include database-specific hints to tell the database server that we want only the first 10 items. In most cases, the database server will do a good job of returning those items as fast as possible (avoiding full table scans, for example).
- Pre-calculation of images. Pictures and images (platform and application based) are resized and re-compressed with a JPEG compressor prior to rendering. This ensures that the right resolution is sent to the client and avoids sending large amounts of data over the network.
- No use of inefficient server-side architectures. For example, EJBs are not used. Even though they are scalable in theory, their fine-grained distributed nature makes them inefficient, and they are therefore not used on the Requea Platform.
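To make two of the mechanisms above more concrete, here is a minimal sketch of a first-level cache combined with batch loading. It is not the platform's implementation: the class TransactionCache, its methods and the loader function are hypothetical names chosen for illustration, and the loader stands in for whatever SQL call would fetch a group of rows. Unlike the behaviour described in the list, this sketch simply sends a shorter final batch when fewer than the batch size remain.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Illustrative first-level cache scoped to one transaction (hypothetical, not a Requea class). */
class TransactionCache<K, V> {
    private final Map<K, V> loaded = new HashMap<>();
    private final Function<List<K>, Map<K, V>> batchLoader; // stands in for one SQL round trip
    private final int batchSize;
    private int loaderCalls = 0;

    TransactionCache(Function<List<K>, Map<K, V>> batchLoader, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be at least 1");
        this.batchLoader = batchLoader;
        this.batchSize = batchSize;
    }

    /** Returns the objects for the given ids, loading only those not cached yet. */
    Map<K, V> getAll(Collection<K> ids) {
        // Collect each uncached id once, preserving the request order.
        Set<K> missingIds = new LinkedHashSet<>();
        for (K id : ids) {
            if (!loaded.containsKey(id)) missingIds.add(id);
        }
        // Fetch the missing ids in groups: one loader call per group.
        List<K> missing = new ArrayList<>(missingIds);
        for (int from = 0; from < missing.size(); from += batchSize) {
            int to = Math.min(from + batchSize, missing.size());
            loaded.putAll(batchLoader.apply(missing.subList(from, to)));
            loaderCalls++;
        }
        Map<K, V> result = new LinkedHashMap<>();
        for (K id : ids) {
            V value = loaded.get(id);
            if (value != null) result.put(id, value);
        }
        return result;
    }

    /** Single-object lookup, served from the cache when possible. */
    V get(K id) {
        return getAll(Collections.singletonList(id)).get(id);
    }

    /** Number of loader calls so far, useful to check the cache is working. */
    int loaderCalls() {
        return loaderCalls;
    }
}
```

With a batch size of 10, requesting 25 uncached ids results in three loader calls, and requesting the same ids again within the same cache instance results in none.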
Possible bottlenecks
Unfortunately, there are a few places where those optimization techniques will be defeated, and you may experience delays and overall performance issues. These issues are fairly uncommon, but you should check your applications to make sure that you do not run into one of them.
Here is a list of a few that we have found. These are real examples of real problems found on production systems.
- Not enough memory.
Symptoms: Slowdowns, Java out-of-memory exceptions in the log files.
Solution: This is basic Requea installation configuration. Adjust the memory settings to the load of your application, and make sure your database server has enough memory as well.
- First actions slow
Symptoms: The first user action is slow, especially after a platform upgrade. Subsequent calls are fast.
Solution: None. Loading and caching take time, as do JSP compilation and the initial creation of search indexes. If you restart your application or reboot your server every night, you will just defeat all the caching mechanisms in place, not only the Requea platform caches but also the database server caches. If you have to reboot your server, just accept the slowness of this first action. If you do not have to reboot your server, just don't. We have platforms with uptime over 100 days running just fine, and most restarts are due to operating system security updates, not the platform itself.
- Large database table, non-optimized SQL execution plan
Symptoms: If you do complex querying on large tables, the database server may experience delays.
Solution: Make sure that you have the right indices on your database. You can turn on SQL tracing by creating a hibernate.properties file in your WEB-INF/classes path and setting hibernate.show_sql to true. This kind of problem starts to appear on tables with over 100k records. If your tables are smaller than that, don't bother with this one.
- References or Components with one-to-many cardinality and very large collections. We have found a case where an entity had a property referencing another entity with one-to-many cardinality. The target entity was a large-volume table (over 2M records) and each parent entity had tens of thousands of linked records.
Symptoms: Slow processing of data. Slow JDBC warnings in the logs even though the retrieval is done on an indexed foreign key.
Solution: Do not declare this property in the parent object. Handling this kind of collection (Reference or Component) is highly questionable. In many cases the platform will have to load the collection (prior to saving, for example) and lazy loading will not help. Use ad hoc filtering and custom user actions instead.
- Calculated options (List of Values) returning a very large amount of data.
Symptoms: Slow interface, very large generated HTML, slow rendering metrics in the monitoring.
Solution: A List of Values should contain a small number of items. Do not use an option list when there are more than one hundred possible values (we have seen a case with a rogue script generating 10,000 values). Use a reference to a category-type entity instead.
Solution 2: Calculated lists of values may be cached. If the computation is not based on instance data, you may cache the result to avoid the processing. (They should still not return a large number of items.)
- Overuse of calculated fields, especially when those calculated fields are based on components or references.
Symptoms: Too many SQL queries for a single click on a check box. (Check the number of JDBC operations in the performance monitoring, on the #postback sub action.)
Solution: If you have many calculated fields and many properties with the "Recalculate All" rule set to true, the server will spend a considerable amount of time recalculating those options. Reduce the number of calculated properties. Scale them down with filtering. Use separate user actions if you need to render the lists.
- Slow operations.
Symptoms: You do a lot of processing (exporting, importing files...)
Solution: When you do a lot of processing, a slow operation cannot be avoided. Make sure that the "slow operation" check box is checked, and set the progress.total and progress.current values in your script. The platform will render a progress bar and give the user the opportunity to cancel the operation if necessary.
- Consider upgrading.
Symptoms: You are running an old 2.1 version.
Solution: We continue to monitor some of our customers' implementations and improve the platform every day. A recent version of the platform will therefore be faster and consume fewer resources than an old one. This is especially true when comparing a 2.1 version with a 2.2 version.
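As an illustration of the caching idea in "Solution 2" for calculated lists of values, the sketch below computes a list once and reuses the result on later requests. It only shows the general technique in plain Java: ListOfValuesCache, get and invalidate are hypothetical names, not the platform's caching API, and the approach is only appropriate when the calculation does not depend on instance data.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical cache for calculated lists of values (not a Requea class). */
class ListOfValuesCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    /**
     * Returns the cached list for the given name, running the calculation
     * only the first time that name is requested.
     */
    List<String> get(String listName, Supplier<List<String>> calculation) {
        return cache.computeIfAbsent(listName, name -> calculation.get());
    }

    /** Drops a cached list, for example after its source data has changed. */
    void invalidate(String listName) {
        cache.remove(listName);
    }
}
```

Keying the cache by list name keeps the calculation out of the rendering path after the first request. The result should still be a short list, since caching does nothing to reduce the size of the generated HTML.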