Archive for the ‘Terracotta developer’ Category.

Caching the Hot Stuff with Terracotta

As I’ve been blogging about recently, we have been developing an exam-taking web application at Terracotta to demonstrate the Session Clustering capabilities of Terracotta. Since one of the requirements of this web app is that we support 40,000 concurrent users, we thought we’d better cache the hottest exams (using Ehcache) rather than fetch them from Hibernate each time. Since modifying an exam should occur far, far less frequently than taking an Exam, and since Terracotta already supports Ehcache, this was a no-brainer.

There is an ExamService service, configured as a Spring bean, with DaoExamService being the default implementation:

The findById method is expected to be the most frequently-invoked method. The other two methods shown are administrative functionality, not expected to be used frequently. We want to cache the results of all three methods in a single, clustered Exam cache.

My approach was to write a new service, CachingWrapperExamService, a proxy which owned the cache and which delegated to another ExamService instance:
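A minimal sketch of such a proxy might look like the following. Only findById is named in this post, so saveExam and deleteExam are assumed stand-ins for the two administrative methods, the Exam class is a hypothetical placeholder for the real entity, and a ConcurrentHashMap stands in for the clustered Ehcache cache so that the sketch runs with nothing but the JDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical placeholder; the real Exam is a Hibernate-persisted entity.
class Exam {
    private final long id;
    private final String title;

    Exam(long id, String title) {
        this.id = id;
        this.title = title;
    }

    long getId() { return id; }
    String getTitle() { return title; }
}

// Only findById is named in the post; the other two signatures are assumptions.
interface ExamService {
    Exam findById(long id);
    Exam saveExam(Exam exam);
    void deleteExam(long id);
}

// Proxy that owns the cache and delegates to another ExamService.
class CachingWrapperExamService implements ExamService {
    private final ExamService delegate;
    // Stand-in for the clustered Ehcache cache used in the real application.
    private final Map<Long, Exam> cache = new ConcurrentHashMap<>();

    CachingWrapperExamService(ExamService delegate) {
        this.delegate = delegate;
    }

    @Override
    public Exam findById(long id) {
        Exam cached = cache.get(id);
        if (cached != null) {
            return cached;                 // cache hit: the delegate is not called
        }
        Exam loaded = delegate.findById(id);
        if (loaded != null) {
            cache.put(id, loaded);         // cache miss: remember the result
        }
        return loaded;
    }

    @Override
    public Exam saveExam(Exam exam) {
        Exam saved = delegate.saveExam(exam);
        cache.put(saved.getId(), saved);   // keep the cache in step with the write
        return saved;
    }

    @Override
    public void deleteExam(long id) {
        delegate.deleteExam(id);
        cache.remove(id);                  // never serve a deleted exam
    }
}
```

Because the wrapper implements the same interface as its delegate, callers cannot tell which one they were given, which is what makes swapping it in through configuration possible.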

The following were straightforward changes:

  • Once again, our use of interfaces has paid dividends: introducing this new caching ExamService was as easy as tweaking the Spring XML file – the change was completely transparent to all users of the ExamService bean. It also made unit testing easy, as I could create a mock ExamService to test the caching against.
  • The Maven pom.xml had to be changed to note that Ehcache is now a compile-time dependency, not just a runtime dependency.
  • The Terracotta config file tc-config.xml did not have to be modified, as we were already using the Ehcache TIM, and so our CacheManager was automatically clustered.

And just like that, we have a clustered cache of the hottest exams being used.

ORM can lead to inflexibility; Terracotta can help

Okay, granted, I’m biased, I work for Terracotta. Be that as it may, I’d like to share some experiences my teammates and I have had using Hibernate recently while developing a web app.

First, some brief background. We are developing a “reference” web app at Terracotta to promote and explore the Sessions Clustering Use Case, which we are working to nail. The app is an online exam-taking application, with the goal of supporting 40,000 concurrent users. I’ve blogged about this before, and you can read about the technology stack we settled on. Development has been done primarily by me and my teammates Geert Bevin and Abhishek Sanoujam, along with our supervisor Alex Miller.

Hibernate is wonderful, and it is an integral part of our web app. It feels to me like we got moving pretty quickly using Hibernate for persistence of our domain objects. For ORM, it’s unbeatable.

But the thing I noticed is, there’s just no avoiding the fact that whatever your domain POJOs are that need to be persisted, chances are good that the use of ORM will impose some constraints on how you must write those POJOs. I have two examples of this to share.

Example One – Generics

First, we have an exam Section class which, conceptually, is a container for either multiple subsections or Questions, but not both. The ideal solution would be to define Section like this (JPA annotations omitted):

where TestContent is an interface implemented by both Section and Question. Thus, an instance of Section could be declared as having a type of either Section<Section> or Section<Question>, which satisfies our constraint.
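A minimal sketch of that ideal definition, reconstructed from the description above, could look like this. The field and method names are assumptions, and the JPA annotations are left out as in the text. It compiles as plain Java; the trouble described next only shows up once Hibernate tries to map it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Common interface for anything that can live inside a Section.
interface TestContent {
    String getName();
}

// Hypothetical minimal Question; the real class carries choices and more.
class Question implements TestContent {
    private final String name;

    Question(String name) { this.name = name; }

    @Override
    public String getName() { return name; }
}

// A Section holds children of exactly one kind, fixed by its type parameter.
class Section<T extends TestContent> implements TestContent {
    private final String name;
    private final List<T> children = new ArrayList<>();

    Section(String name) { this.name = name; }

    @Override
    public String getName() { return name; }

    void add(T child) { children.add(child); }

    List<T> getChildren() { return Collections.unmodifiableList(children); }
}
```

With this shape the compiler rejects any attempt to add a Question to a section of sections, so the "one kind or the other" rule needs no extra checking.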

However, at runtime (when starting Tomcat), Hibernate (the JPA provider) threw an exception pointing out that Section had an unbounded type (or something like that). After a little digging around on the internet, I found a forum where someone explained that an Entity cannot have a generic type, because it’s not known until instantiation time what the linked Entity will be.

Therefore I had to compromise. I modified Section, removing its generic type and adding two explicit collections, one for Questions, one for Sections.

This is less than ideal because the Section API itself doesn’t naturally prevent a single Section instance from having both sub-sections and questions, even though we don’t want to allow this.
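The compromise can be sketched roughly as follows. The names are assumptions, and hasValidContent is a hypothetical helper added only to show that the rule now has to be checked by hand rather than by the compiler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal Question; the real class is a JPA entity.
class Question {
    private final String text;

    Question(String text) { this.text = text; }

    String getText() { return text; }
}

// The compromise: no generic type, two explicit collections.
class Section {
    private final String name;
    private final List<Section> subSections = new ArrayList<>();
    private final List<Question> questions = new ArrayList<>();

    Section(String name) { this.name = name; }

    String getName() { return name; }

    // Nothing in these two methods stops a caller from using both.
    void addSubSection(Section section) { subSections.add(section); }

    void addQuestion(Question question) { questions.add(question); }

    // Assumed helper: true while at most one of the two collections is in use.
    boolean hasValidContent() {
        return subSections.isEmpty() || questions.isEmpty();
    }
}
```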

Example Two – complex object tree

Similarly, for my other example, one of the constraints is that a Question must have exactly one correct choice (from among its two or more choices). So our first inclination was to structure the Question class thus:
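A minimal sketch of that first design, with assumed names and without the JPA annotations, might be:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal Choice.
class Choice {
    private final String text;

    Choice(String text) { this.text = text; }

    String getText() { return text; }
}

// A single correctChoice reference makes "exactly one correct choice" structural.
class Question {
    private final String text;
    private final List<Choice> choices;
    private final Choice correctChoice;

    Question(String text, List<Choice> choices, Choice correctChoice) {
        if (choices.size() < 2) {
            throw new IllegalArgumentException("a question needs at least two choices");
        }
        if (!choices.contains(correctChoice)) {
            throw new IllegalArgumentException("the correct choice must be one of the choices");
        }
        this.text = text;
        this.choices = Collections.unmodifiableList(new ArrayList<>(choices));
        this.correctChoice = correctChoice;
    }

    String getText() { return text; }

    List<Choice> getChoices() { return choices; }

    Choice getCorrectChoice() { return correctChoice; }
}
```

Here the object model itself cannot represent a question with zero or several correct answers.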

But this caused problems when saving an edited Exam which had had a Question added to it. I no longer have the stack trace handy, but the gist of the Hibernate exception was that a transient (unsaved) object was detected in the object graph being merged (updated).

Alex and I dug in and finally examined the generated database schema. What we saw was that the QUESTION table had a CORRECT_CHOICE column which was a foreign key into another table, QUESTION_CHOICE I think it was. Alex and I theorized that there was a possible ordering problem in updating an Exam with a new Question and Choices – what if Hibernate attempted to set the CORRECT_CHOICE foreign key before inserting the new choices for the question?

I’m not 100% positive that’s the correct explanation, but in any case Alex made the executive decision to simplify our domain model and not spend any more time debugging. We added a boolean “isCorrect” property to Choice, and removed the “correctChoice” reference from Question:

Problem solved – we no longer got the Hibernate exception. But, as Abhishek pointed out, our domain objects no longer enforced the constraint that a question could have only one correct choice. With the updated classes, nothing would prevent instantiating a question with multiple choices marked as correct. This put the burden on additional validation code to enforce this constraint, and overall is just less than ideal.
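The simplified model, and the kind of extra validation it now calls for, can be sketched like this. QuestionValidator and its method are hypothetical names used for illustration; they are not taken from the application.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Each choice now carries its own "correct" flag.
class Choice {
    private final String text;
    private final boolean correct;

    Choice(String text, boolean correct) {
        this.text = text;
        this.correct = correct;
    }

    String getText() { return text; }

    boolean isCorrect() { return correct; }
}

// Question no longer holds a correctChoice reference.
class Question {
    private final String text;
    private final List<Choice> choices = new ArrayList<>();

    Question(String text) { this.text = text; }

    String getText() { return text; }

    void addChoice(Choice choice) { choices.add(choice); }

    List<Choice> getChoices() { return Collections.unmodifiableList(choices); }
}

// Assumed helper: the rule the domain classes used to enforce by construction.
class QuestionValidator {
    static boolean hasExactlyOneCorrectChoice(Question question) {
        int correct = 0;
        for (Choice choice : question.getChoices()) {
            if (choice.isCorrect()) {
                correct++;
            }
        }
        return question.getChoices().size() >= 2 && correct == 1;
    }
}
```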

How Terracotta Can Help

The point I am agonizingly slowly building to is, I think it’s acceptable to have these constraints on our persistent domain objects, but only on the ones that should be persisted. An anti-pattern that we at Terracotta have seen again and again is the misuse of the database and ORM to persist state that really does not belong in the System of Record, but rather is transient state that must be persisted only to scale applications by keeping the applications stateless. One of the Terracotta co-founders, Orion, coined the term “State Monster” to describe this abuse of the db, and recently Wille Faler wrote a very good blog describing this.

Terracotta can help by providing an alternative to making apps stateless for scalability purposes. With Terracotta, go ahead and write your application in the most natural way, including shared state that is only transient. Consider this helpful graph about data lifetimes when deciding what state belongs in the SOR and what state is merely transient or pending. Then, use Terracotta to both cluster and persist the POJOs that don’t belong in the SOR. The advantage is that Terracotta does not impose any constraints on the API of the sorts I have written about here – generics are fine, arbitrarily complex object graphs are no problem. Terracotta clustered objects don’t even have to implement Serializable.

Weekly Summary – web app

It was a better week. As I said last week I was looking to cut down on distractions and improve my focus. I stuck to a morning routine this week: every day I got up around six, took a 30-minute walk on the beach while listening to Stack Overflow podcasts, then got showered and online by 7 to 7:15. I usually got over an hour of work done before anybody else even woke up. I kept twittering, e-mail and such to a minimum.

I’ve definitely enjoyed working at our condo here in Destin, FL. It was quite a novelty to work on a balcony overlooking the Gulf of Mexico, and I made sure to rub my teammates’ faces in it. But honestly by the end of the week I couldn’t stand to work out there with all the noise and glare. I spent most of my remaining time on the north balcony which overlooks a golf course and lake, and is much quieter.

This week I got started on the reference web app that we are working on as part of Terracotta’s strategy to really nail the clustered “user session” use case. Alex has blogged in detail about that strategy, and more recently on the technology stack we settled on for the web app. Geert Bevin, who is himself the author of the Rife web framework, had already spent a week or two learning Spring MVC and laying down some architectural groundwork. By the time I got started on Monday he was up and running and had stubbed out some basic pages.

My week was divided into two parts. Part one was two and a half painful days of just setting up my workspace. This involved upgrading to Eclipse 3.4 Ganymede, which includes the WTP that Geert was already using to launch the app in Tomcat. Then upgrading all of the plugins I needed. Then endlessly debugging because I got exception after exception when I tried to do seemingly anything.

When I was interviewing for Terracotta, one phrase I heard was “just in time learning”, and that was certainly true this week. Here is a probably-not-complete list of technologies that I’ve had to learn at least some of (or some more of) this week, quickly: Maven and its Eclipse plugin, Subversion and the Subclipse plugin, WTP, Spring MVC, JPA annotations with Hibernate as ORM provider, SOJO (for JSON parsing in Java), Crosscheck (being evaluated for JavaScript unit testing in the Java VM), jQuery, MySQL, and probably some other stuff I’m forgetting.

During the second half of the week, everything started clicking and I was finally committing changes. I began working on the exam-creation page. Geert recommended that we approach the page by using JavaScript to allow the user to build up an entire exam and submit it as a single request, passing JSON to the Controller. I’m really excited about this because I’ve grown to like coding JavaScript. I’m almost disappointed that I’ll be on vacation this week. By week’s end I had the controller basically working, accepting and parsing JSON. But I did not get any JavaScript in place yet, although I spent a good bit of time reading up on jQuery and Crosscheck.

My Office This Week

I’m working remotely this week in Destin, FL. It’s rough, but that’s life working for Terracotta.

Weekly Summary

I struggled last week. It was just one of those weeks where nothing seemed to be easy. The week started off nicely with me adding some more functionality to our distributed cache testing framework; that’s a peripheral and relatively new code base that I’m pretty familiar and comfortable with by now. But after that things started queuing up and I couldn’t close anything out.

First, I was asked to take a look at a customer issue involving Serialization of Terracotta-instrumented ConcurrentHashMap instances. The instrumenting we do on that class breaks Serialization of CHM instances between non-Terracotta processes and Terracotta processes. I’d like to blog about this separately, and actually I think someone smarter than me could do a PhD dissertation on how Terracotta instruments CHM (there’s a lot of it) and why it’s necessary. So in two days I was really only able to evaluate the bug, get it reproduced in a test case, and decide on an approach, but was not able to actually make a fix. That was disappointing.

Next, I was asked to see about removing the JSR 107 dependency from our ehcache TIM. By this time it was later in the week and I was packing for our trip to Florida, and I only finally found what I think is the problem while en route to Florida without an internet connection.

Thirdly, a teammate in India and I have been trying to do some follow-up testing on the initial cache testing I last blogged about, but it has been going slower than we’d like due to various roadblocks too boring to mention. Just a little while ago I committed some more functionality to our distributed testing framework which will allow us to vary the number of Segments used in CHM instances when we test, and that should’ve been done last week.

This week I’m going to make a concerted effort to eliminate potential distractions, and to better manage all the various feeds that bombard me. IM has to be on pretty much all the time, so my coworkers can reach me, but otherwise this week I’m going to turn off Twitter, RSS feeds and all e-mail for long stretches at a time and get…things…done! I’m also going to go to bed early (soon) and try to get up very early, take a walk to wake myself up, and get to work early. I’d like to do that all week and see if it helps.

My family and I are in Destin, FL for two weeks. I’m going to work this week and take next week as vacation. I’m excited to be down here and have a change of scenery, but at the same time I’m surrounded by many more potential distractions.

Weekly Summary – Clustered Performance Testing

I haven’t been writing here as much as I’d like to, and truthfully there is a ton of stuff I want to write about! But it’s hard to make the time. Especially when, given extra time, I’d rather just keep working on what I’m doing :)

Over the last couple of weeks I’ve been running a set of clustered performance tests using our homemade distributed testing framework, nicknamed “Droid”. I went into some detail about this in my last post. The testing I did was to measure the cluster-wide throughput (transactions per second) given one, two, four and eight nodes in a cluster (not counting the Terracotta server as a node). We repeated these tests using both ConcurrentHashMap and Ehcache as our distributed cache, and we repeated all of the tests with a newer (2.6.2) and an older (2.5.4) version of Terracotta.

I had a one-on-one with my boss Steve, also one of the co-founders and originally an engineer himself (now head of engineering). We had an interesting discussion about the testing results, and concurrent testing in general. He reminded me to always be aware of unexpected bottlenecks when testing, and always make sure you’re measuring what you think you’re measuring. For example, we designed the test so that none of the machines would be memory or CPU bound – but did I verify that that was in fact the case? Not really. I just set the jvm memory high enough and hoped for the best. We were really trying to get a feel for how Terracotta distributed lock contention would bog down linear scalability as more L1 nodes were added, so Steve’s point was that we don’t want other unexpected resource constraints to mar the measurements.

Late last week, continuing into this week, I started taking a first swipe at collecting cluster-wide statistics in Droid. We already have single-node statistics, but it would save us (primarily Alex) some time if the framework did the number crunching for us. Of course this means we have to use Terracotta and create another distributed object for doing such collecting and processing.
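To give a feel for the kind of number crunching involved, here is a minimal sketch that rolls per-node results up into a cluster-wide transactions-per-second figure. NodeStats and ClusterStats are hypothetical names, not Droid’s actual classes, and the sketch assumes every node was measured over the same interval so that summing the per-node rates is meaningful.

```java
import java.util.List;

// Hypothetical per-node result: transactions completed during a measured interval.
class NodeStats {
    final String nodeName;
    final long transactions;
    final long elapsedMillis;

    NodeStats(String nodeName, long transactions, long elapsedMillis) {
        this.nodeName = nodeName;
        this.transactions = transactions;
        this.elapsedMillis = elapsedMillis;
    }

    // Throughput of this node in transactions per second.
    double tps() {
        return transactions * 1000.0 / elapsedMillis;
    }
}

class ClusterStats {
    // Cluster-wide throughput as the sum of the per-node rates.
    static double clusterTps(List<NodeStats> nodes) {
        double total = 0.0;
        for (NodeStats node : nodes) {
            total += node.tps();
        }
        return total;
    }
}
```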

I also spent some time with a new engineer in India, Himadri, trying to get him started on Droid. His development machine runs Windows and I have a MBP, so there’s been some pain there. In particular, we ran into what turned out to be a known (but not by me) issue in our build process that occurs only in Windows.

In other news, so far I absolutely love working out of my home, but on Thursday two weeks ago I experienced the downside. My internet connection went out. Grrr. So I packed up and drove to my parents’ house, but it was out there, too. Stupid of me – my parents and I both have Charter cable, and it turned out that Charter had an area-wide outage that day. At the time I was very irritated – it’s so easy to hate Charter. I finally decided to go to McAlister’s Deli, which has free wifi. I started my working day at 11 o’clock that day. But it ended up being a great experience: the wifi worked fine, and McAlister’s has sweet tea, which to me is like crack cocaine. My boss Alex and I even ended up meeting there last week for a working lunch. Incidentally, Alex has DSL at his house, so chances are good that we won’t both have an outage at the same time.

This Friday I’m leaving with my family for Florida for two weeks. I’m going to work down there the first week.

Weekly Summary

It was a good week. I finally got all of the automated TC Spring tests to pass for Spring 2.5.4, so I was able to mark that issue done. Terracotta now clusters Spring 2.0.x through 2.5.x. That code base is due for a refactoring, though. Our code for clustering Spring uses AspectWerkz to define join points all over the Spring source code, not just the public API. What this means, as I’ve ranted about before, is that even minor changes to Spring’s source code (as occur even between minor releases such as 2.0.5 and 2.0.8) have broken our clustering code. What I’d like to do, when time permits, is see if we can rewrite our aspects to only use methods of the public Spring API as join points. That should give us a whole lot more stability.

My boss Alex is prepping me to help him do some more performance testing. He recently wrote some great blog entries about that here and here. We met with the product management team this week to brainstorm what sort of testing we want to do, what sort of data they might want to have from a marketing/sales perspective, etc. As Alex pointed out, it’s a tricky thing – this sort of testing always leads to finding bugs, which leads to bug fixes, which invalidates any prior testing and so you have to start over. Luckily, we already have a very capable distributed testing framework, developed in-house by Alex, in which we can pretty easily script tests with Groovy. We can have agents on multiple machines (i.e. L1 nodes, talking to a TC L2 server) and have the agents start workers to run tests. The agents can do things like kill and restart workers, to test having to repartition a distributed cache. Sounds like the first thing we’re going to measure is the load time and then the TPS (transactions per second) for a couple different kinds of distributed caches: ConcurrentHashMap and Ehcache.

We found out this week our next big company-wide gathering in San Francisco will be the week of Oct. 13-18. I’ve already booked my flight and hotel room. I’m excited – these trips have so far been a lot of fun.

I did a phone interview for a candidate to join my team. Probably shouldn’t elaborate on that yet, but I will say that Terracotta is very thorough with candidates. When I interviewed back in January, I did five phone interviews, four of them with other engineers, before being invited to come out in person. When I did fly out, I was interviewed by another five people, including the CEO and CTO! Honestly, although it was exhausting, I had a great time! I loved being challenged by, and having conversations with, some very smart and talented people who have produced some amazing software.

New software this week: OmniGraffle, which I’ve heard from everyone is the only graphics editing software you need on a Mac. I’ve got a copy now which I will hopefully be using in the not-too-distant future to write some more technical blog entries about Terracotta. Also, Alex encouraged us to try out FindBugs, including its Eclipse plugin here (update site). I’ve added both of these to my list of essential Mac software for the Terracotta developer.

Bash and TC Build Hacks I Learned in the Last Two Hours

There’s very good documentation about Terracotta’s in-house TC Build system already. But I’ve been doing some intense debugging with Hung, and have learned some things that I want to write down before I forget.

run without ivy: tcbuild blah blah --no-ivy – I’m assuming this runs faster because it skips using Ivy to check that all dependencies are in place.

run without compiling: tcbuild --no-compile blah... – useful when just shuffling some runtime dependency or something.

put environment stuff in .bashrc

check trunk/buildsystem to find things like jruby

For our automated container tests, individual jar files are placed in one huge WAR file. This is not true for ordinary unit tests.

Doing something like ./tcbuild check_one CustomScopedBeanTest --no-ivy > log.txt 2>&1 puts the output in a file; the trailing 2>&1 redirects the error stream to the output stream, so both end up in log.txt.

Important shared stuff at /shares/terra/jdk/ such as Java, ant, etc

Grep trick 1: ps -ef | grep java to see details about Java processes running

Grep trick 2: env | grep JAVA to see environment variables I should have set up to run tcbuild

Grep trick 3: find <path> -name <filenamepattern> | xargs grep <searchstring> – finds all files matching filenamepattern that also contain the search string.

find trick: rm -rf `find . -type d -name .svn` – removes all .svn directories recursively.

~/.tc/appserver is where tomcat is stored during automated tests – may want to remove as sanity check sometimes.

~/.ivy* is where ivy stuff is stored – may want to remove prior to doing total clean rebuild.

Weekly Summary – TC Spring again

This weekly summary actually encompasses the last three weeks. Sigh.

Lots of activity throughout dev is centered around the Terracotta 2.6 and 2.6.1 releases, as well as the upcoming 2.6.2 release.

Primarily I’ve been working on updating Terracotta’s Spring support to 2.5.x. Currently we only support up to 2.0.5. I thought I had gotten it working up through Spring 2.0.8, but late last week we fixed a bug in our build process which then revealed three failing automated TC Spring tests which were previously (incorrectly) passing. So Spring 2.0.8 is not quite there…but close. Meanwhile, my compadre Nitin had made some changes that got TC working with Spring 2.5, but those changes are not backwards compatible with Spring 2.0.x, so I’m investigating whether they can be merged together somehow. Since we are dependent on the Spring source code in order to instrument their code (by using AspectWerkz), we are subject to the whims of whatever source code changes occur between even minor releases (such as differences between Spring 2.0.5 and 2.0.8).

The other thing of note that I got accomplished was to respond to this post on our forums about a deadlock occurring in Terracotta L1. The poster had nicely laid it all out for us, with a stack trace excerpt clearly showing the deadlock. My teammates and I reviewed the pertinent class, and I cleaned up a number of synchronization bugs or missing synchronization. The deadlock itself was cleaned up by moving to a CopyOnWriteArrayList for a collection, which previously was being locked while iterating through it (read-only) and doing expensive stuff. The fix will be in 2.6.2 release.
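The general shape of that kind of fix can be sketched as below. EventNotifier is a hypothetical class, not the Terracotta class that was actually changed. The idea is that a CopyOnWriteArrayList iterator walks a snapshot of the list, so the expensive work runs without holding any lock on the collection, and the list can even be modified while it is being iterated.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical class illustrating lock-free iteration over a listener list.
class EventNotifier {
    // Writes copy the backing array; reads and iteration never block.
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    void removeListener(Runnable listener) {
        listeners.remove(listener);
    }

    // No synchronized block needed: the loop sees the list as it was
    // when iteration started, even if listeners are added meanwhile.
    void fire() {
        for (Runnable listener : listeners) {
            listener.run();   // potentially expensive work, done without a lock
        }
    }
}
```

The trade-off is that every write copies the whole array, so this suits collections that are read and iterated far more often than they are modified.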

I was without internet connection at my house a couple weeks ago for a few days. I had to do bloody battle with Charter to get that fixed. Ultimately a technician came and found that the line to my house had been put on a splitter at some undetermined point in the past, and so my signal strength was no longer strong enough. Meanwhile, luckily, I was able to go to my parents’ house and get some work done there. Have I mentioned that I love my MacBook Pro, and wireless internet?

Weekly Summary

Last week was shortened by jury duty on Monday. Fortunately, I was never selected from the pool, and on Tuesday I was back to work.

There are (still) a number of monkey failures (such as this one) that I need to get working on.

However, I discovered I could procrastinate tackling those by checking the forums. I decided to try to answer this post (which has since been addressed by a couple of my teammates). Almost two days later, I conceded that it’s really really hard to try to cluster the underlying javax.swing.text.AbstractDocument of a JTextField. I still haven’t got it. (See gkeim’s response for a clever workaround.)

I finished up the week by working on droid. One of my peers was having trouble running a test in which he wanted one of the spawned workers to have a different tc-config than all the others. There are some weird subtleties in passing vm arguments through the agent which are intended for the worker, but as it turned out, I believe the functionality is already there and didn’t require any changes on my part, just an explanation of how to do it.