Wednesday, August 10, 2011

97 Things Every Programmer Should Know

I just finished reading O'Reilly's 97 Things Every Programmer Should Know. It is a very interesting compilation of essays from expert programmers. The book is very well done. They kept it short and to the point by limiting every essay to no more than two pages. Here are some of my favorites:

Apply Functional Programming Principles - Edward Garson:
This caught my attention mostly because I am interested in learning Scala this year. Also, I'm currently working with stock trading, and most of the principles resonate with what we do on an everyday basis at work. Additionally, some of my Miami JUG colleagues, Luis Espinal and Jorge Fiallega, have emphasized using these types of programming languages to leverage multicore hardware.

Garson also mentioned that mastery of the functional programming paradigm can greatly improve the quality of the code you write in other contexts. If you deeply understand and apply the functional paradigm, your designs will exhibit a much higher degree of referential transparency.

Referential transparency is a very desirable property: it implies that functions consistently yield the same results given the same input, irrespective of where and when they are invoked. That is, function evaluation depends less - ideally, not at all - on the side effects of mutable state.

The net result [of using functional principles] is a design that typically has better responsibility allocation with more numerous, smaller functions that act on arguments passed into them, rather than referencing mutable member variables. There will be fewer defects, and furthermore they will often be simpler to debug, because it is easier to locate where a rogue value is introduced in these designs than to otherwise deduce the particular context that results in an erroneous assignment. This adds up to a much higher degree of referential transparency, and positively nothing will get these ideas as deeply into your bones as learning a functional programming language.
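To make the idea concrete, here is a tiny sketch of my own (not from the book), in Groovy, contrasting a closure that leans on mutable state with a referentially transparent one:

// depends on mutable external state: the result changes as total changes
def total = 0
def addToTotal = { amount -> total += amount }

// referentially transparent: same inputs always give the same result, no side effects
def add = { subtotal, amount -> subtotal + amount }

assert add(10, 5) == 15
assert add(10, 5) == 15    // holds no matter where or when it is called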

The Boy Scout Rule - by Robert C. Martin (Uncle Bob):
The Boy Scouts have a rule: "Always leave the campground cleaner than you found it". If you find a mess on the ground, you clean it up regardless of who might have made it. I use a similar example: I have always compared bad code to a dirty parking lot. If you see a dirty parking lot, you are tempted to dirty it even more, in contrast to a clean parking lot. Uncle Bob poses the question, "What if we all followed a similar rule in our code: Always check a module in cleaner than when you checked it out?". According to him, we would see the end of the relentless deterioration of our software systems. Instead, our systems would gradually get better and better as they evolved. We would see teams caring for the systems as a whole, rather than just individuals caring for their own small part.

Do Lots of Deliberate Practice - by Jon Jagger:
Deliberate practice is not simply performing a task. If you ask yourself, "Why am I performing this task?" and your answer is, "To complete the task," then you're not doing deliberate practice.

You do deliberate practice to improve your ability to perform a task. It's about skill and technique. Deliberate practice means repetition. It means performing the task with the aim of increasing your mastery of one or more aspects of the task.

Continuous Learning - by Clint Shank
I worked with a project lead who once told me, "I'm tired of constantly learning new languages, techniques, and tools". I thought to myself, "I think you are in the wrong line of work". The nature of our industry is that we need to keep learning. Although this might be obvious, the moment we stay stagnant we become stale. That is the reason why we have meet-ups and why we should always learn a new language. Here are some of the things that Shank mentions to keep us from becoming stagnant:
  • Read books, magazines, blogs, Twitter feeds, and websites. If you want to go deeper into a subject, consider joining a mailing list or newsgroup.
  • If you really want to get immersed in a technology, get hands-on - write some code.
  • Always try to work with a mentor, as being the top guy can hinder your education. Although you can learn from anybody, you can learn a whole lot more from someone smarter or more experienced than you. If you can't find a mentor, consider moving on.
  • Use virtual mentors. Find authors and developers on the web who you really like and read everything they write. Subscribe to their blogs.
  • Get to know the frameworks and libraries you use. Knowing how something works helps you use it better. If they're open source, you're really in luck. Use the debugger to step through the code to see what's going on under the hood. You'll get to see code written and reviewed by some really smart people.
  • Whenever you make a mistake, fix a bug, or run into a problem, try to really understand what happened. It's likely that someone else ran into the same problem and posted on the web. Google is really useful here.
  • A good way to learn something is to teach or speak about it. When people are going to listen to you and ask you questions, you'll get highly motivated to learn. Try a lunch-n-learn at work, a user group, or a local conference.
  • Join or start up a study group or a local user group for a language, technology, or discipline you are interested in.
  • Go to conferences. And if you can't go, many conferences put their talks online for free.
  • Long commute? Listen to podcasts.
  • Ever run a static analysis tool over the codebase or look at the warnings of your IDE? Understand what they're reporting and why.
  • Follow the advice of the Pragmatic Programmers and learn a new language every year. At least learn a new technology or tool. Branching out gives you new ideas you can use in your current technology stack.
  • Not everything you learn has to be about technology. Learn the domain you're working in so you can better understand the requirements and help solve the business problems. Learning how to be more productive - how to work better - is another good point.
  • Go back to school.


Deploy Early and Often - by Steve Berczuk
I really believe in Continuous Integration (CI), but I think I could do better. I have heard that other big shops (Facebook, Google, and Netflix) have daily deployments. Here are Berczuk's thoughts:
Debugging the deployment and installation process is often put off until close to the end of a project. The installation/deployment process is the first step to having a reliable (or, at least, easy to debug) production environment. The deployment software is what the customer will use. By not ensuring that the deployment sets up the application correctly, you'll raise questions with your customers before they get to use your software thoroughly.

Starting your project with an installation process will give you time to evolve the process as you move through the product development cycle, and the chance to make changes to the application code to make the installation easier. Running and testing the installation process on a clean environment periodically also provides a check that you have not made assumptions in the code that rely on the development or test environments.

Don't be Afraid to Break Things - by Mike Lewis
This is something that I have encountered in many shops: code that is so bad that no one wants to touch it. Lewis suggests not being afraid of your code.
Who cares if something gets temporarily broken while you move things around? A paralyzing fear of change is what got your project into this state to begin with. Investing the time to refactor will pay for itself several times over the life cycle of your project. An added benefit is that your team's experience dealing with the sick system makes you all experts in knowing how it should work. Apply this knowledge rather than resent it. Working on a system you hate is not how anybody should have to spend his time.

Don't Repeat Yourself - by Steve Smith
Of all the principles of programming, Don't Repeat Yourself (DRY) is perhaps one of the most fundamental. The principle was formulated by Andy Hunt and Dave Thomas in The Pragmatic Programmer, and underlies many other well-known software development best practices and design patterns. The developer who learns to recognize duplication, and understands how to eliminate it through appropriate practice and proper abstraction, can produce much cleaner code than one who continuously infects the application with unnecessary repetition.

Repetition in Logic Calls for Abstraction:
Repetition in logic can take many forms. Copy-and-paste if-then or switch case logic is among the easiest to detect and correct. Many design patterns have the explicit goals of reducing or eliminating duplication in logic within an application. If an object typically requires several things to happen before it can be used, this can be accomplished with an Abstract Factory or a Factory Method pattern. If an object has many possible variations in its behavior, these behaviors can be injected using the Strategy pattern rather than large if-then structures. In fact, the formulation of design patterns themselves is an attempt to reduce the duplication of effort required to solve common problems and discuss such solutions. In addition, DRY can be applied to structures, such as database schema, resulting in normalization.
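As a small illustration of the Strategy point (my own sketch, not from the essay), the varying behavior is injected instead of branched on with if/then:

// the varying behaviour sits behind one abstraction...
interface PricingStrategy {
    BigDecimal priceFor(BigDecimal basePrice)
}

def regular  = { BigDecimal base -> base }       as PricingStrategy
def discount = { BigDecimal base -> base * 0.9 } as PricingStrategy

// ...so the calling code is written once, with no duplicated branching
BigDecimal quote(BigDecimal base, PricingStrategy strategy) {
    strategy.priceFor(base)
}

assert quote(100.0, regular)  == 100.0
assert quote(100.0, discount) == 90.0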

A Matter of Principle:
Other software principles are also related to DRY. The Once and Only Once principle, which applies only to the functional behavior of code, can be thought of as a subset of DRY. The Open/Closed Principle, which states that "software entities should be open for extension, but closed for modification," only works in practice when DRY is followed. Also, the well-known Single Responsibility Principle, which requires that a class have "only one reason to change," relies on DRY.

How to Use a Bug Tracker - Matt Doar
I have found many shops that do not have a good bug tracker. I really like Jira, and we are currently using it at work. But the tool alone is not enough; you also need to write a good bug report. A lack of information can cause confusion or undermine the credibility of the bug. For example, I have seen many bugs closed with "I can't replicate the bug" when, in fact, the bug was simply not well understood. That is what this essay is about.

A good bug report needs to convey three things:
  1. How to reproduce the bug, as precisely as possible, and how often this will make the bug appear.
  2. What should have happened, at least in your opinion
  3. What actually happened, or at least as much information as you have recorded

Improve Code by Removing It - Pete Goodliffe
There is a direct correlation between lines of code and bugs. Here, the practice of "less is more" is indeed useful. Pete writes about an anecdote regarding this exact problem.

I observed that the product was taking too long to execute certain tasks - simple tasks that should have been near instantaneous. This was because they were overimplemented - festooned with extra bells and whistles that were not required, but at the time had seemed like a good idea.

So I've simplified the code, improved the product performance, and reduced the level of global code entropy simply by removing the offending features from the codebase. Helpfully, my unit tests tell me that I haven't broken anything else during the operation.

Keep the Build Clean - Johannes Brodwall
I started a project for a client once, and as soon as I started the server in production there were a bunch of warning messages. When I asked the team leader about the warnings he replied: "those are OK warnings". We should keep the code as clean as possible. Like Brodwall says,
When I start a new project from scratch, there are no warnings, no clutter, no problems. But as the codebase grows, if I don't pay attention, the clutter, the cruft, the warnings, and the problems can start piling up. When there's a lot of noise, it's much harder to find the warning that I really want to read among the hundreds of warnings I don't care about. To make warnings useful again, I try to have a zero-tolerance policy for warnings from the build. Even if the warning isn't important, I deal with it. If it is not critical but still relevant, I fix it.
Again, the book is very well done and a quick read.

Thanks for reading the post.

Regards,
Marcelo

Monday, July 18, 2011

Testing and Groovin'

I've come to the conclusion that writing unit tests in Java is a pain, especially when there are other languages that are better suited for writing tests. Lately, I've been writing all my tests using Groovy, both for the unit tests themselves and for creating mock objects. I use EasyMock for most of my programming, but I recently found out that Groovy has its own mock object framework. Here are some examples based on the Groovy site; I think they are pretty compelling.


Domain Layer:




Service Layer:







This is the class that we are going to test. As you can see, it depends on the ExchangeRateService class.
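The embedded gists are no longer visible, so here is a hedged sketch of what the collaborator and the class under test might look like (the names, the getRate method, and the conversion logic are assumptions):

// assumed collaborator (service layer)
interface ExchangeRateService {
    BigDecimal getRate(String fromCurrency, String toCurrency)
}

// assumed class under test: converts an amount using the injected service
class CurrencyConverter {
    ExchangeRateService exchangeRateService

    BigDecimal convert(BigDecimal amount, String from, String to) {
        amount * exchangeRateService.getRate(from, to)
    }
}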


Tests:

Here we will need to mock the ExchangeRateService. We are going to use a few approaches that Groovy provides:
- You can often get away with simple maps or closures to build your custom mocks
- By using maps or expandos, we can incorporate desired behaviour of a collaborator very easily as shown here:
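For instance, assuming the ExchangeRateService sketched above, a plain map can stand in for the real service:

// map-based mock: the getRate entry supplies the canned behaviour
def mockService = [getRate: { String from, String to -> 1.5 }] as ExchangeRateService

def converter = new CurrencyConverter(exchangeRateService: mockService)
assert converter.convert(10.0, 'USD', 'EUR') == 15.0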



This is an example of using a closure:
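Since the assumed ExchangeRateService has a single method, a closure can be coerced directly into the interface:

// closure-based mock: the closure becomes the implementation of getRate
def stubbedService = { String from, String to -> 1.5 } as ExchangeRateService

def converter = new CurrencyConverter(exchangeRateService: stubbedService)
assert converter.convert(200.0, 'USD', 'EUR') == 300.0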



If we need the full power of a dynamic mocking framework, Groovy has a built-in framework which makes use of meta-programming to define the behaviour of the service. An example is shown here:

As I mentioned, just like any other mocking framework, Groovy has its own built-in framework, MockFor, which defines the behavior of the collaborator.
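A hedged sketch against the same assumed interface; demand records the expected call and verify checks that it actually happened:

import groovy.mock.interceptor.MockFor

def context = new MockFor(ExchangeRateService)
context.demand.getRate { String from, String to -> 2.0 }    // expect exactly one call

def proxy = context.proxyInstance()
def converter = new CurrencyConverter(exchangeRateService: proxy)

assert converter.convert(10.0, 'USD', 'EUR') == 20.0
context.verify(proxy)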


Let me know how it goes and thanks for reading!

Wednesday, July 6, 2011

How to sell your idea to your boss - HBR article

I had the fortune of working with some great CIOs and CTOs. Some of them were very technically savvy, while others were very business-oriented. The best technical managers, in my opinion, are those that can align business with technology. Unfortunately, I have also worked for some really bad managers. Working for someone like that is very challenging. The worst is when such a manager runs a technical team (programmers, sysadmins, etc.) and claims to have technical expertise when he clearly does not. This type of manager is very dangerous when he starts weighing in on architectural decisions, programming preferences, and so on. I worked for just this type of manager once and I learned a lot. The biggest challenge for me was when I had to pitch an idea - a technical methodology (Scrum, Kanban, Agile) or having to learn a different language (Ruby on Rails, Groovy, or JRuby). HBR wrote about a similar scenario: How to sell your idea to your boss. Here is what the article talks about:

Main point:
  • You are a middle manager
  • You see the big picture but your boss does not. He is just focused on the here and now
  • What do you do if your boss wants you to stay in your place?

First and foremost, do your job: make certain that you do everything you are asked to do. Once you have established yourself as a credible performer, there are three things you can do to give your big idea a better chance to succeed:

1 - Align your initiatives with the corporate objectives:
  • Whatever you propose must complement your company's strategic direction.
  • Build a business case for your idea by showing how your idea does not conflict with current priorities, but in fact supports them by planning for the future

2 - Work Through Your Boss:
  • Do not go around your boss
  • Walk through your plan with him and get his feedback
  • Incorporate his ideas if they are viable
  • Find ways for your boss to get some credit

3 - Build coalitions:
Things get done in organizations because people pull together to get the work done. The same goes for driving initiatives. Enlist the support of peers to help you get your idea off the ground. Leverage your customers; these are the people who will benefit from your idea. Frame your idea around serving their needs more comprehensively.

While I have seen these steps work, I have to admit that some bosses cannot be led. These are bosses who are typically very insecure in their positions and feel that creativity from below is a threat to their power. Managers who step on others do not deserve to be in positions of authority, but as long as they wield authority as a weapon, their direct reports are best advised to keep their heads down and do as they are told - that is, until they can find another outlet or opportunity for their talents and skills.

Management, when it works, is a reciprocal process. Bosses set direction, set objectives, and follow up to ensure that work is completed on time and on budget. Employees must ensure that they do what their boss asks them to do. But managers and employees work best when they understand that good ideas can come from anywhere in the organization, and the companies that can capitalize on them are the ones that will succeed. Success depends upon those who not only think creatively but also have the skills to put their ideas into action.

Another key element in pitching new ideas is how they are framed. Nobel Prize-winning psychologist Daniel Kahneman and his colleague Amos Tversky demonstrated that when choices are framed as gains versus no gains, decision makers tend to be risk averse. When choices are framed as losses versus no losses, however, decision makers tend to be risk seeking. So if you pitch an idea by emphasizing what the company can gain (new customers, more profits, higher market share, etc.) as a result of implementing the new idea, you may actually be making it less likely that the idea will be accepted. If you pitch it by emphasizing what the company will avoid losing (ground to the competition, lower market share, etc.) by implementing the new idea, however, you may find your boss more receptive to the idea.

Friday, July 1, 2011

Automation, monitor with Nagios, and passive checks


The first thing that we did at my previous company (a start-up) was to automate everything. Granted, there is some stuff that you cannot automate for whatever reason, but you should try to automate as much as possible for the sake of scalability. Once an automated process is set up, the only thing left is to make sure it is running fine, whether there are any warnings, whether it stopped running, and so on. Nagios is a great tool for monitoring these types of things. There is an almost infinite amount of things that you can do with it. It used to be a pain to install, but lately the installation is very straightforward.

In Nagios, there are two types of checks: active checks and passive checks. An active check is a constant check - very much like a heartbeat in high-availability architectures. The example that I always give managers is a constant "are you OK?" conversation between the Nagios server and its clients. This is great if you need a consistent check on a particular process. For example, uptime, drive space, CPU load, and memory usage should be monitored every n minutes (say, every 5 minutes). For those processes that are asynchronous and should be triggered by a particular event, we use passive checks. In my case I had the following requirements:
  • Client(s) can provide me a file at anytime between 9 am - 6:30 pm
  • The file should contain a specific format
  • If the format is not valid, we need to contact the client
  • If the format is correct, persist it into the DB
  • Once it is in the DB, launch another process and perform some statistical calculations
Passive checks can be configured for a host or a particular service. In this example, I will cover the steps to configure a service.
  1. Install Nagios
  2. Configure a particular service for this component
  3. Create an external application that checks the state of the application (in my case I used Groovy and shell script - groovysh)
  4. Write to an external command file

This is the picture of the process for Nagios:


There are a few settings that need to be enabled for passive checks to work in the Nagios configuration (/usr/local/nagios/etc/nagios.cfg). Make sure that the following are set to "1" (enabled):
- accept_passive_service_checks=1
- check_external_commands=1

There should also be a "command_file" with some type of path. For example: command_file=/usr/local/nagios/var/rw/nagios.cmd

Then, configure the service check and set passive_checks_enabled. This is done in the host configuration file (localhost in my case):
vi /usr/local/nagios/etc/objects/localhost.cfg
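The original service definition is not shown anymore; a minimal sketch of what it could look like (the local-service template comes from the sample configuration and is an assumption, as is the check_dummy command):

define service {
        use                       local-service          ; assumed template from the sample config
        host_name                 localhost
        service_description       asynch_client_files
        active_checks_enabled     0                      ; results arrive passively only
        passive_checks_enabled    1
        check_command             check_dummy!0          ; assumed fallback; only used if an active check is ever forced
}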



Restart Nagios:
sudo -i service nagios restart

You should be able to see the "asynch_client_files" service on localhost. The next step is to check that the passive check is working via the command file. The way that Nagios learns about any passive events is by writing into a file (nagios.cmd). The following parameters are needed:
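The external command format being described here is the standard Nagios passive service check submission:

[<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host_name>;<svc_description>;<return_code>;<plugin_output>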

where...

timestamp is the time in time_t format (seconds since the UNIX epoch) that the service check was performed (or submitted). Please note the single space after the right bracket.
host_name is the short name of the host associated with the service in the service definition
svc_description is the description of the service as specified in the service definition
return_code is the return code of the check (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)
plugin_output is the text output of the service check (i.e. the plugin output)

Temporarily, you can get the timestamp by running the following command in Linux: date +%s

To execute the test, we run the following in a terminal.
In this case, I had to log in as root:
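The exact command is not shown in the original post, but it is something along these lines (the service name and message are the ones used later in this post):

echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;localhost;asynch_client_files;0;The security master is up to date" >> /usr/local/nagios/var/rw/nagios.cmd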

If you go back to the Nagios site (http://localhost/nagios), you should be able to see that the change has taken effect and the service now shows a status of "OK" with the comment "The security master is up to date".

The only thing missing is an application that sets the status of the check. You can use the language of your choice and use crontab or something else to execute the application.
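As a rough sketch of step 4 (writing to the external command file) in Groovy, with the file location, service name, and message taken from this post and the actual status check left out:

// minimal sketch: submit a passive service check result to Nagios
def commandFile = new File('/usr/local/nagios/var/rw/nagios.cmd')
def timestamp   = System.currentTimeMillis().intdiv(1000)
def result      = "[${timestamp}] PROCESS_SERVICE_CHECK_RESULT;localhost;asynch_client_files;0;The security master is up to date"
commandFile << "${result}\n"     // append the result to the external command file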

Thursday, June 23, 2011

Getting your Groovy on! - Groovy 1.8

I started working with the new version of Groovy, 1.8. Here is a brief description of how you can start using Groovy and of the new features in Groovy 1.8. Although this post has some basic information about Groovy, it is not a "getting started" guide. If you are new to Groovy, I highly recommend the following sites; they cover the language in great detail and provide some good documentation:

- Bozhidar Batsov has a great post on Groovy
- Groovy has a great Getting Started Guideline

Installation and Getting Started

- Download the binary distribution and unpack it into some folder (e.g., /groovy-1.8.0)
- (Optional) Create a symbolic link to the Groovy directory; I name it "/groovy".
- Set your GROOVY_HOME environment variable

If you do an "ls -l /groovy" you will get the following:

In my application I do the following (MacBook Pro using TextMate):

To make sure that you have the latest version installed, run groovy -version:


Most people I know work with Eclipse. Groovy has a plugin for Eclipse and a tutorial to upgrade the Groovy version inside your IDE.

Using Arrays/List:


SQL and Groovy:
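The original snippet is no longer embedded; here is a minimal sketch of what it might have looked like (the connection settings and the friends table are assumptions):

import groovy.sql.Sql

// assumed connection details; adjust to your environment
def sql = Sql.newInstance('jdbc:hsqldb:mem:sampleDb', 'sa', '', 'org.hsqldb.jdbcDriver')
sql.eachRow('select first_name from friends') { row ->
    println "-- ${row.first_name} --"
}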


Which will return the following:
-- Marcelo --
-- Antonio --
-- Jorge Luis --
-- Nestor --

In the new Groovy 1.8, there is a feature for pagination:
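The code is also missing here; a sketch using the new rows(sql, offset, maxRows) variant (the table and columns are assumptions, and it reuses the sql instance from the sketch above):

// start at the second row and fetch at most two records
def trades = sql.rows('select ticker, trade_date from quotes', 2, 2)
trades.each { println "${it.ticker} (${it.trade_date})" }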


Where the first 2 states the starting point (offset) and the second 2 states the maximum number of records to fetch.
Output:
XTO (20040809)
DE (20040809)

Tests are a perfect fit for Groovy. You can use the latest JUnit to run all the unit test cases just as you do in Java. Below, some of the examples use JUnit tests:

Closures and Arrays/List:
Different people use Groovy in different ways, but everyone agrees that the collection API for Groovy is very powerful. Below are more examples of using List or Arrays along with closures:
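The embedded example is gone; a minimal stand-in that matches the output below:

def numbers = [3, 4, 5, 6, 7, 11]
numbers.each { println it }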

As you can see, each takes a closure in Groovy and iterates through the list. The output is the following:
3
4
5
6
7
11

Closure composition
Closure composition is the ability to combine closures with one another. In some cases you want to combine two types of functions. For example:
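The original snippet is missing; a small sketch of the new << and >> composition operators:

def plus2  = { it + 2 }
def times3 = { it * 3 }

def times3plus2 = plus2 << times3    // apply times3 first, then plus2
assert times3plus2(3) == 11          // 3 * 3 + 2
assert (times3 >> plus2)(4) == 14    // >> composes in the other direction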


Trampoline
Recursive closures in Groovy used to blow the call stack on deep recursion. Groovy 1.8 addresses this with trampoline(). Here is an example using the factorial function.
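The embedded snippet is gone; the factorial example from the Groovy 1.8 release notes looks roughly like this:

def factorial
factorial = { int n, def accu = 1G ->
    if (n < 2) return accu
    factorial.trampoline(n - 1, n * accu)
}
factorial = factorial.trampoline()

assert factorial(1) == 1
assert factorial(3) == 6
assert factorial(1000)    // deep recursion without blowing the stack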


Memoize Cache
Memoize is a very interesting tool in Groovy. From the Groovy documentation:
The return values for a given set of Closure parameter values are kept in a cache, for those memoized Closures. That way, if you have an expensive computation to make that takes seconds, you can put the return value in cache, so that the next execution with the same parameter will return the same result – again, we assume results of an invocation are the same given the same set of parameter values.
There are four forms of memoize functions:
- the standard memoize(), which caches all the invocations
- memoizeAtMost(max), which caches a maximum number of invocations
- memoizeAtLeast(min), which keeps at least a certain number of invocation results
- and memoizeBetween(min, max), which keeps a range of results (between a minimum and a maximum)

Here is an example:
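The original example is no longer embedded; a sketch along the same lines, where an artificially slow closure is memoized so that repeated calls with the same argument come straight from the cache:

def now = { String key ->
    sleep 1000            // simulate an expensive computation
    new Date()
}.memoize()

println now('a')
println now('b')
println now('b')          // cached: same timestamp as the previous line
println now('c')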

output is:
Fri Jun 24 13:29:15 EDT 2011
Fri Jun 24 13:29:16 EDT 2011
Fri Jun 24 13:29:16 EDT 2011
Fri Jun 24 13:29:17 EDT 2011

Maps and Default Value:
Maps can now have a default value (similar to Ruby). In the case below, if we need to count the frequency of words in a sentence, we can start with a map and store the words and their counters. By configuring a default value, every word starts with a count of 0:
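The snippet is missing; a minimal sketch using withDefault:

def words  = 'the quick brown fox jumps over the lazy dog the end the'.tokenize()
def counts = [:].withDefault { 0 }     // unseen words start at 0

words.each { word -> counts[word] += 1 }
println counts['the']      // 4
println counts['zebra']    // 0 - no null check needed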


Groove on brother!

Wednesday, June 15, 2011

Error svn: Can't create directory in Subversion SVN

I created a repository in SVN by doing the following:
svnadmin create clientconnector

But when I tried to connect and create the infrastructure (branches, tags, trunk), I got the following error:


I knew that I needed to check the permissions of my connection. I'm using Apache to handle all my repository permissions.

This can be found in the following section:
/etc/httpd/conf.d/subversion.conf



Where all my user's information is stored in the /etc/svn-auth-conf file.

When I checked the directory for my repositories: /data/svn I noticed the following:


The problem is the group. By doing the following, I was able to solve the permission problem.

chown -R apache.apache clientconnector

You shouldn't have to restart your Apache server, but in case this doesn't work, restart Apache with: service httpd restart

Friday, June 10, 2011

When to refactor your code

Like many of my peers, I've worked in different shops and in different domains. First of all, I try NOT to be "that guy" who wants to change everything overnight. The number one rule for any developer, after finding a good mentor, is to become a domain expert. Then you can start writing value-added code. However, there have been times when I needed to refactor a piece of core code but the manager or team leader completely denied the need for the task. Usually, it happens when the managers are either not programmers or the project has a very tight deadline. I usually point out that it needs to be done, since the code just keeps getting "smellier," and follow up with the benefits of refactoring (code that is easier to read and test, DRY, scalable, etc.). However, I always leave it up to them to make the decision. This type of attitude is extremely frustrating, especially if you are a full-time employee and the majority of the other developers want to refactor the project.

The other day I was listening to Stanford's Entrepreneurial Thought Leaders. If you haven't listened to it, I highly recommend it. One of my favorite interviews was with David Heinemeier Hansson of 37signals, the creator of Ruby on Rails. In his talk, he basically said that working for a bad boss was a good thing, because you learn how NOT to do things.

In one of my early projects, I worked with about six other developers. A senior manager demanded that the project be done in six months. This was before Scrum or any Agile methodology, or even DDD or TDD, was in place. We took a divide-and-conquer approach. We split into teams of three to work on the back end, the front end, third-party applications, etc. After six months we tried to integrate all our code. As you can imagine, the project was a mess and a complete failure. It had duplicate code everywhere, especially in "utility" classes. It was bloated, slow, and it was not what the customer wanted.

I'm sure that many of us can relate. We have all been on a project or a contract where the head of the department or the team lead decides not to refactor because it will jeopardize the deadline of the project. This is a constant tug of war that I have with some individuals. We need to take a stand and explain to these leaders that we cannot keep kicking the can down the road. The buzzword for this is "technical debt". Sometimes we can carry some technical debt, but as I said before, there are other times when the code is just screaming for a refactor.

That's why I like the comparison of software development to gardening. You can plant new plants in your garden, but then you have to clean it up once in a while to remove all the weeds. That is the refactoring part.

Here is a list I have made of signs that a piece of code should be refactored:
  • Not able to test
  • Very hard to read
  • Does too many things
  • Too big/bloated classes or methods
  • It is mutable when it shouldn't be
  • When the code is not DRY
  • Method has over 5 parameters
  • Too many nested loops or if statements
There is a great quote that I found on Wikipedia:
By continuously improving the design of code, we make it easier and easier to work with. This is in sharp contrast to what typically happens: little refactoring and a great deal of attention paid to expediently adding new features. If you get into the hygienic habit of refactoring continuously, you'll find that it is easier to extend and maintain code. -- Joshua Kerievsky, Refactoring to Patterns

Friday, June 3, 2011

Getting your equals right

I am still amazed that some programmers either don't implement equals properly or don't implement it at all. Here are some examples of when to use it and the rules of thumb according to Joshua Bloch in Effective Java (my favorite Java book).




The class is very simple: the user passes a total amount of shares and the shares that are allocated to that total. Pretty straightforward. Here is the test case.

By following these rules, here is the result:


This does work, but we can do better. We can compare the objects themselves and see if they are indeed equal. This can also help us in future projects. As you can see, the class is immutable, which means that we can easily share this object without compromising its state. We can do this by overriding the "equals" method in the Share class. But first, we need to remember that the equals method must implement an equivalence relation:
  • Reflexive: an object must be equal to itself
  • Symmetric: any two objects must agree on whether they are equal
  • Transitive: if one object is equal to a second and the second object is equal to a third, then the first object must be equal to the third.
  • Consistency: if two objects are equal, they must remain equal for all time, unless one (or both) of them is modified
  • Non-nullity: o.equals(null) must always return false
Here is the class with the equals override:
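The original Java listing is no longer embedded. As a stand-in, here is a rough Groovy rendition of such an immutable Share class (the field names are assumptions); the same structure applies in Java:

class Share {
    final int total
    final int allocated

    Share(int total, int allocated) {
        this.total = total
        this.allocated = allocated
    }

    @Override
    boolean equals(Object other) {
        if (this.is(other)) return true                // reflexive
        if (!(other instanceof Share)) return false    // also covers null
        Share that = (Share) other
        that.total == total && that.allocated == allocated
    }

    @Override
    int hashCode() {
        31 * total + allocated    // always overridden together with equals
    }
}

assert new Share(100, 25) == new Share(100, 25)        // == delegates to equals in Groovy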





Now we can compare the actual objects:


Much better, but according to Item 9 in Effective Java, we should always override hashCode when we override equals. Failure to do so will result in a violation of the general contract of Object.hashCode, which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable. The good thing is that Eclipse can be a great tool here. By just clicking Source > Generate hashCode() and equals() you get this:

Thursday, May 19, 2011

Start with Jenkins CI #win!

I have wanted to start using Jenkins for quite some time. I have used Continuous Integration on other projects, and it is absolutely helpful! Getting started with Jenkins was very easy thanks to their great website. These are the steps that I used to install Jenkins on my CentOS release 5.6 box. Jenkins gets installed on port 8080 by default, but I have something running there and I wanted to keep a record of where all my services are running, so I changed that to 8089.

1 - Installation: add the Jenkins yum repository from the Jenkins site, import its GPG key, and run "yum install jenkins"

2 - Change the configuration by going into /etc/sysconfig/jenkins and editing the default port: JENKINS_PORT="8089"

3 - Restart or start Jenkins with "service jenkins restart" and monitor the log (/var/log/jenkins/jenkins.log) in case there are any errors

4 - Go to your server: http://localhost:8089

Friday, March 4, 2011

This is not your grandma's iBatis...it's MyBatis!

I am currently starting a new project with one of my clients. The previous team created their own ORM (ugh). I noticed that they were not closing any SQL connections, which was causing performance issues. Clearly, I needed to refactor the ORM layer. I had several options:
For big greenfield projects, I usually prefer Hibernate. Even though it is heavy and resource intensive, once you set it up, it is easy to use and maintain. It is also a very mature ORM. I usually choose this tool for major refactorings and for projects with a lot of domain objects and non-complex SQL queries. For small projects, I usually prefer MyBatis (previously named iBatis). It is light, and if the company has complex SQL you can just map the SQL to objects via "mappers".

This particular project does not have many domain objects and does have some complex queries, so I decided to go with MyBatis. However, when I went to the site, I noticed that they had moved the code, and not only had they changed the name (it is now MyBatis instead of iBatis) but the API had completely changed. I also learned that the connection template for Spring changed. Nevertheless, I still thought that Hibernate would be overkill, so I decided to give MyBatis a shot. Here is some of the stuff I learned about the new API and its integration with Spring. The following is my software stack for this project:
  • Java 1.6
  • Spring 3.0
  • mybatis 3.0
  • mybatis-spring 1.
I will show the following on this post:
  1. Configuration and Mappers
  2. Connection via Spring
  3. Testing using Spring
  4. Log4J configuration
Configuration and Mappers
Not much has changed here, with the exception of the XML headers. The aliases and the mappers behave the same. As you can see, I do not include any of the connection parameters; that will be handled by Spring.



The domain that I will be using for the mapping is the following Destination class.


The mappers have some changes compared to the previous iBatis. Here I used the constructor mapping, since my Destination class is immutable. As always, the select points to the result map so that when it fetches the record(s) it knows how to map the POJO.


My repository layer is very simple. As you can see, I'm using the @Repository and @Qualifier annotations to be able to autowire it.

The database configuration is basically the same, with the exception of the sqlSessionFactory. As you can see, it has two properties: dataSource and configLocation. The configLocation points to where my config.xml file is located.

There is nothing that I need to do with the DAO/Repository context since it is using the component-scan:

My DAO test is the following:

Wednesday, February 23, 2011

My rant about legacy code

"Ya ni llorar es bueno", this is a Mexican saying that translates to "it's not even worth crying anymore". I thought about this saying after a talked with few companies and friends regarding legacy application, specially legacy code that directly hit the ROI of the company. To add some context around this post, let me explain what does "legacy code" means to me:
  • It is using a technology relatively "old" (+5 years)
  • It is in production
  • The application plays an important role in the company

I have worked on a few of these applications and I hated most of them! Not because they were hard, but because of the politics. It has been my perception that managers are "gun shy" when it comes to handling the rewrite of legacy code that is clearly out of control. I am not sure why this is, but here are some reasons based on my conversations with tech leaders and developers:

  1. Time: ah, the Pandora's box of managers. Let's face it, it is hard to estimate software projects, and even harder to estimate the refactoring of major legacy code. The stakes are high if the refactoring goes wrong. Managers have mentioned that these refactorings are a double-edged sword: they can turn out really well, or they can end up damaging their reputation and the morale of the team.
  2. Resources: many managers fight with other departments over resources/programmers. Many companies are trying to innovate and create new applications to latch on to new business models. These projects take priority over the legacy systems already in place (if it's not broken, don't fix it).
  3. Domain experts: many of the large dev shops (20+ developers) have two types of development departments: engineering and support. The engineers are senior developers with at least 5 years of experience. They develop the application based on the specs of the project manager and product owners. The support team is a mixture of junior and mid-level developers. They take over the code when the engineers deploy the application to production. The problem with this model is the lack of mentoring/coaching of the junior team. The support team does not have the know-how or the domain expertise that the engineers have. The consequence is the introduction of bad code or bad practices when they need to add a feature or fix a bug.

Some of the dark sides of legacy code are the following:
  • Bloated controllers and DAOs
  • Business logic is ALL over the code (DAO, stored procedures, controllers, DTO, the list goes on)
  • Silo effect - just a few developers know about the application
  • Fragile code - tightly coupled code
  • Bug identification and turnaround time is long
  • Application is slow

I doubt that I can give the right answer to these problems, but here is my advice to current and aspiring managers on this matter:
  1. Bite the bullet: if the legacy application is a major part of your business, don't wait any longer. Get a grip on the application before it gets any worse and start refactoring in small iterations (no longer than two weeks).
  2. Candid conversations: ask the tough questions about the application, like "does the architecture need to be changed?" Have all the developers review the current app and see what should be changed. Identify the large parts of the code to change (architecture) and the smaller (low-hanging fruit) projects that can give more momentum to the effort.
  3. Go agile! Agile methodologies like XP, Scrum, and Kanban add great value to projects, especially practices like pair programming, test-driven development (TDD), domain-driven design (DDD), and continuous integration (CI). The greatest thing about Agile is its rapid time to market. Small iterations mean that your customers can see what you did in a couple of weeks and give you their feedback. This also provides early risk reduction, since you can find bugs relatively quickly. Pair programming helps mentor junior developers and avoids the silo effect. TDD helps ensure that any code added to the application is tested before shipping it to production and gives better quality to your customers. DDD adds "depth" to your code and isolates all the business logic in one area of the code. This way, any developer knows that if there is a change in the business logic, he/she needs to look in the domain packages. Finally, CI (build pipelines) is primarily focused on asserting that the code compiles successfully and passes the suite of unit and acceptance tests (including performance and scalability tests).
Legacy code is a dreaded phrase for developers. The fact is that no developer wants to develop in Java 1.4 or Struts 1.x. If you want your company to attract talented individuals, get a handle on your legacy code.

Again, I'm sure that I have missed or don't understand ALL the reasons why so many companies have large amounts of legacy code in their core systems, so please...I welcome your thoughts on this matter.

Thursday, February 17, 2011

My take on Ruby and Java

Following the advice of Debasish Ghosh, I put together my list of new languages to learn, and one of them was Ruby.

Why Ruby?
  • Many web developers have told me great things about the language and about Rails.
  • Puppet is a systems management tool that I want to learn, and it uses Ruby.
  • Some of my pending books use Ruby for its examples - Programming Amazon Web Services
  • I have a few projects in my pipeline that require a lot of code with basic functionality (CRUD) and a nice GUI
  • I want to avoid the "golden hammer" antipattern

Questions that I had up-front:
  • Is the syntax that nice? EVERY SINGLE Ruby guy swears by the syntax. What is so sexy about the syntax?
  • How slow is it? I have been hearing a lot of rumors regarding the horrible GC of Ruby. Java programmers have blogged that JRuby is faster than Ruby.
  • Multithreading - green threads? I have heard a lot from colleagues that Ruby is not well equipped for multithreading.
  • How long would it take me to learn it? Try to measure the learning curve.
  • Where would I use this in real life? What are the best places/projects for Ruby and, most importantly, which ones should I avoid (Twitter)?
  • How does it compare to Java? I'm sure that there are some pros and cons regarding the syntax, but what is the difference between using Groovy, JRuby, or Scala and using Ruby?

Analysis
Syntax
Indeed, the syntax is very nice. It is very similar to Groovy. It does have a few features that I haven't seen in other languages. Here are some examples:

Virtual Attributes
I created a BookInStock class along with a few required parameters, then I wanted to get the price in cents.
class BookInStock
  attr_reader :isbn, :price
  attr_writer :price

  def initialize(isbn, price)
    @isbn  = isbn
    @price = Float(price)
  end

  def price_in_cents
    Integer(@price*100 + 0.5)
  end

  def price_in_cents=(cents)
    @price = cents / 100.0
  end
end

book = BookInStock.new("isbn1", 33.80)
puts "Price = #{book.price}"
puts "Price in cents = #{book.price_in_cents}"
book.price_in_cents = 1234
puts "Price = #{book.price}"
puts "Price in cents = #{book.price_in_cents}"

Result:
Price = 33.8
Price in cents = 3380
Price = 12.34
Price in cents = 1234

Here we've used attribute methods to create a virtual instance variable. To the outside world, price_in_cents seems to be an attribute like any other. Internally, though, it has no corresponding instance variable.

Default Hash
Let's say that you need to count the number of words in a file. Each line is treated like an array of words: create a hash, check if the hash has the word; if it doesn't, add it with a count of 1; otherwise, increment it. In Java, that looks something like this:

if (counts.containsKey(word)) {
    counts.put(word, counts.get(word) + 1);
} else {
    counts.put(word, 1);
}

In Ruby you can create a Hash.new(0), the parameter (0 in this case) will be used as the hash's default value - it will be the value returned if you look up a key that isn't yet there.

def count_frequency(word_list)
  counts = Hash.new(0)
  for word in word_list
    counts[word] += 1
  end
  counts
end

Regular Expression
You can give names to parts of the pattern and retrieve the values of the matched groups:
pattern = /(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)/
string = "It is 12:34:56 precisely"
if match = pattern.match(string)
  puts "Hour = #{match[:hour]}"
  puts "Min = #{match[:min]}"
  puts "Sec = #{match[:sec]}"
end

The result is the following:
Hour = 12
Min = 34
Sec = 56

Lambda
A lambda creates a callable object that can be invoked like a method.
For example:
say_hi = lambda {|a| "Hello #{a}"}
say_hi.("quintin")
This will return "Hello quintin".

Another example: let's say that you have to transform a series of consecutive numbers, but the calculation is based on a parameter passed by the user (multiplication or addition).
if operator == :multiplication && number != nil
  calc = lambda {|n| n * number}
else
  calc = lambda {|n| n + number}
end

puts ((1...10).collect(&calc).join(","))

Here, the last line iterates over each number in the range, applies the appropriate calculation to it, and joins the results with commas.

Slow?
The Ruby team has done a lot of work on its virtual machine but, coming from a statically typed language perspective, it still hogs a lot of resources.

Multithreaded
In the previous version of Ruby (1.8), threads were handled inside the Ruby VM itself; these are called green threads. In Ruby 1.9, threading is performed by the operating system. The advantage is that native threads can be scheduled across multiple processors. However, the interpreter still executes only a single thread at a time.

Learning Curve
If you are using Groovy, Python, or a similar dynamic language, the learning curve is almost nil. The only problem is the lack of good editor support. NetBeans used to be a very nice editor, but they have decided to discontinue support for Ruby. I've been a Mac guy for quite some time now, so I've been using TextMate, and it has worked great.

Where should I use it?
I try to "pigeonhole" applications carefully before my team or I decide which type of language to use. For example, if the application needs to be highly efficient, with a large number of transactions and immediate performance, I do NOT trust Ruby (sorry). I would turn to a Java, Spring, Hibernate/iBatis, EHCache stack. However, if the application is a set of quick and simple CRUD pages with low usage, like a series of admin pages, then Ruby would be my choice.

Comparing Ruby to Java
It is worth keeping Ruby in your toolbox in case you need to write quick scripts or sites. I did end up learning different things, like Ruby's nice collection methods. For example, if you would like to get the top five words in the previous word-count example, you can just do the following:
sorted    = counts.sort_by {|word, count| count}
top_five = sorted.last(5)
The sort_by and inject methods are two handy tools. Here are some other examples:
[ 1, 2, 3, 4, 5 ].inject(:+) # => 15

( 'a'..'m').inject(:+) # => "abcdefghijklm"

Also, being a TDD guy, I really enjoyed Behavior-Driven Development, or BDD. Based on the books and forums that I've read, this is the Ruby community's choice for testing. It encourages people to write tests in terms of their expectations of the program's behavior in a given set of circumstances. In many ways, this is like testing according to the content of user stories, a common requirements-gathering technique in agile methodologies. Some of the frameworks are RSpec and Shoulda. With these testing frameworks, the focus is not on assertions. Instead, you write expectations.

The class that I created keeps score for a tennis match:
class TennisScorer
  OPPOSITE_SIDE_OF_NET = { :server => :receiver, :receiver => :server }

  def initialize
    @score = { :server => 0, :receiver => 0 }
  end

  def score
    "#{@score[:server]*15}-#{@score[:receiver]*15}"
  end

  def give_point_to(player)
    other = OPPOSITE_SIDE_OF_NET[player]
    fail "Unknown player #{player}" unless other
    @score[player] += 1
  end
end

The test is the following:

require 'simplecov'
SimpleCov.start

require_relative "tennis_scorer"

describe TennisScorer, "basic scoring" do
  let(:ts) { TennisScorer.new }

  it "should start with a score of 0-0" do
    ts.score.should == "0-0"
  end

  it "should be 15-0 if the server wins a point" do
    ts.give_point_to(:server)
    ts.score.should == "15-0"
  end

  it "should be 0-15 if the receiver wins a point" do
    ts.give_point_to(:receiver)
    ts.score.should == "0-15"
  end

  it "should be 15-15 after they both win a point" do
    ts.give_point_to(:receiver)
    ts.give_point_to(:server)
    ts.score.should == "15-15"
  end
end

As you can see, you add context to your tests and check whether the behavior is as expected. Executing the tests echoes the expectations back and validates each one.

How slow is it?
Dave Thomas, author of Programming Ruby 1.9 (third edition), wrote the following in his post about Twitter's move from Ruby to Scala:

At the kinds of volumes that Twitter handles (and with what I assume is a somewhat scary growth curve), Twitter needs to improve concurrency—it needs an environment/language with low memory overhead, incredible performance, and super-efficient threading. I don't know if Scala fits that particular bill, but I know that current Ruby implementations don't. It isn't what Ruby's intended to be. So the move away is just sound thinking. (I suspect it also took some courage.) I applaud Alex and the team for this.

Instead of defending Ruby when it's clearly not an appropriate solution, let's think about things the other way around.

The good folks at Twitter started off with Ruby because they wanted to get something running quickly, and they wanted to experiment. And Ruby gave them that. And, what's more, Ruby saw them through at least two rounds of phenomenal growth. Could they have done it in another language? Sure. But I suspect Ruby, despite the occasional headache, helped them get where they are now.


Conclusion
Finally, I would recommend learning Ruby, and I will definitely keep doing more stuff with it. Although it shares some characteristics with Groovy, Python, and other dynamic languages, it does have some nice distinct features. Also, the Ruby community is very vibrant and active. There are thousands of programmers building packages/APIs, or gems.

The way to install a gem is very similar to the "yum" command in Linux. Let's say you need to do the following:
  1. Connect to a GMail account
  2. Check for e-mails that have attachments
  3. Do some type of business logic
  4. Send an e-mail with your results
I could create everything from scratch, but instead I was able to find a nice little gem named "gmail". I just ran "gem install gmail" and voila, I got the API! I also wanted to use MongoDB for a Ruby on Rails (RoR) project, and a podcast explained to me how to do it.

Again, this is just the beginning, but it was a really nice experience and reminded me of why I got into programming. The challenge and the unknown are what drive most of us to find better solutions.

Wednesday, February 2, 2011

Persisting Tuning

Persistence tuning was drilled into my head early in my career, and I understand the maxim "earlier is cheaper". At the beginning of my career I worked for a major bank on a data warehouse. I worked as a junior developer alongside some pretty solid data architects, and I learned a lot about databases - especially that they are notorious for being the bottleneck in applications. Later in my career, I worked on web development projects and fell in love with the Spring framework and ORMs (Hibernate and iBatis). Here are my thoughts on a presentation by Thomas Risberg, a senior consultant from SpringSource. He stated the following:
There is no silver bullet when it comes to persistence tuning. You have to keep an open mind and approach the problem systematically from a variety of angles.
The presentation did not touch on "big data" and the NoSQL movement. However, there is still a lot of good stuff in it, especially if you are using Java, Spring, Hibernate, and JDBC.

Currently, I have been working on an application that needs to support up to 200 messages per second. Below are a few strategies and processes that I implemented to address three major goals:
  1. Performance: response time needs to be in millisecond
  2. Scalability: able to handle 200 messages per second
  3. Availability: able to scale horizontally (not buying a bigger box, but getting a similar box)
DBA - Developer Relationship
When working with a database, there are two DBA roles:
Operational DBA:
  • ongoing monitoring and maintenance of database system
  • keep the data safe and available
Development DBA:
  • plan and design new application database usage
  • tune queries
The operational DBA is concerned with the following:
  • Data volumes, row sizes
  • Growth estimates, update frequencies
  • Availability requirements
  • Testing/QA requirements
The development DBA is concerned with the following:
  • Table design
  • Query tuning
  • Maintenance policies for purging/archiving

Database Design:

Database design can play a critical role in application performance

Guidelines:
  • Pick appropriate data types
  • Limit number of columns per table
  • Split large, infrequently used columns into a separate one-to-one table
  • Choose your indexes carefully - they are expensive to maintain but improve query performance
  • Normalize your data
  • Partition your data

Application Tuning

Balance performance and scalability concerns. For example, a full table lock could improve performance but hurt scalability.

Improve concurrency
  • Keep your transactions short
  • Do bulk processing off-hours
Understand your database system's locking strategy
  • some lock only on writes
  • some locks on reads
  • some escalate row locks to table locks

Performance improvements
Limit the amount of data you pull into the application layer
  • Limit the number of rows
  • Select only the columns you need

Tune your ORM and SQL

  • consider search criteria carefully
  • avoid NULLs in the where clause - NULLs aren't indexed
  • avoid LIKE patterns beginning with % since the index might not be used

Spring Data Access Configuration:

Pick a transaction management strategy that fits your application's needs
  • Favor a local resource DataSourceTransactionManager
  • Using a container-managed DataSource can help during Application Server support calls
  • JtaTransactionManager using XA transactions is more expensive. Warning: XA transactions are sometimes needed when JMS and database access are used together
Why are XA transactions more expensive? Because of their setup and their overhead.
Setup includes:
  • run a transaction coordinator
  • XA JDBC driver
  • XA Message Broker
  • put the XA recovery log on reliable log storage
XA has a significant run time overhead
  • two phase commit
  • state tracking
  • recovery
  • on restart: complete pending commits/rollbacks --> read the "reliable recovery log"

General transaction points:
  • Keep your transactions short to avoid contention
  • Always specify the read-only flag where appropriate, in particular for ORM transactions
  • Avoid SERIALIZABLE unless you absolutely need it

JDBC Tuning
Creating connections to the database is slow - use a third-party connection pool like DBCP or C3P0, or a native one like oracle.jdbc.pool.OracleDataSource. Never use DriverManagerDataSource.

Improve availability by configuring the DataSource to survive a database restart. Specify a strategy to test that connections are alive:
  • JDBC 4.0 isValid()
  • simple query
  • metadata lookup
Consider a clustered high-availability solution like Oracle RAC


Use Prepared Statements
  • Use prepared statements with placeholders for parameters - like select id, name from customers where age > ? and state = ?
  • Prepared statements allow the database to reuse access plans
  • Prepared statements can be cached and reused by the connection pool to improve performance
  • The JdbcTemplate can be configured using setFetchSize
  • A larger fetchSize lets you lower the number of network roundtrips necessary when retrieving large results
Favor query() methods with custom RowMapper vs. queryForList().
queryForList() uses a ColumnMapRowMapper which generates a lot of expensive HashMaps

private JdbcTemplate jdbcTemplate;

public void setDataSource(DataSource dataSource) {
    this.jdbcTemplate = new JdbcTemplate(dataSource);
}

public List<Map<String, Object>> getList() {
    return this.jdbcTemplate.queryForList("select * from mytable");
}

Using the column index instead of the column label does give a slight speed advantage, but it makes code harder to read and maintain.

public Actor findActor(String specialty, int age) {

    String sql = "select id, first_name, last_name from T_ACTOR" +
            " where specialty = ? and age = ?";

    RowMapper mapper = new RowMapper() {
        public Actor mapRow(ResultSet rs, int rowNum) throws SQLException {
            Actor actor = new Actor();
            actor.setId(rs.getLong("id"));
            actor.setFirstName(rs.getString("first_name"));
            actor.setLastName(rs.getString("last_name"));
            return actor;
        }
    };

    // notice the wrapping of the arguments in an array
    return (Actor) jdbcTemplate.queryForObject(sql, new Object[] { specialty, age }, mapper);
}


ORM Tuning
  • Don't load more data than needed - use lazy loading
  • Can however result in many roundtrips to database - n+1 problem
  • Avoid excessive number of database roundtrips by selective use of eager loading
  • Consider caching frequently used data that's rarely updated
  • Learn specific fetching and caching options for your ORM product
Determine a good fetch strategy (Hibernate)
  • use "select" fetch mode for relationships needed sometimes - result in 1 or 2 queries
  • use "join" fetch mode for relationships needed all the time - limit it to a single collection per entity, since the results of multiple joins could be huge
  • Fetch mode can be specified statically in the ORM mapping or dynamically for each query
  • Capture generated SQL to determine if your strategy generates expected queries
  • Use batch fetching (batch-size) to prefetch a number of proxies and/or collections
  • Caching allows you to avoid database access for already loaded entities

Enable shared (second-level) cache for entities that are read-only or are modified infrequently
  • Include data where it is not critical that it be kept up to date
  • Include data that is not shared with other applications
  • Choose an appropriate cache policy including expiration policy and concurrency strategy (read-write, read-only ...)

Consider enabling the query cache for queries that are repeated frequently and where the referenced tables aren't updated very often.

More on Hibernate: "Working with Hibernate with Spring 2.5" with Rossen Stoyanchev


Bulk Operations
  • Perform bulk operations in database layer if possible
  • SQL statements - update, delete
  • Stored procedures
  • Native data load tools
  • From the application layer - use batch operations
  • JdbcTemplate - batchUpdate
  • SimpleJdbcInsert - executeBatch
  • SpringBatch - BatchSqlUpdateItemWriter
  • Hibernate - set hibernate.jdbc.batch_size and flush and clear the session after each batch

SQL Tuning
SQL is the biggest bottleneck when it comes to performance problems
  • Capture the problem SQL
  • Run EXPLAIN
  • Make adjustments
  • ANALYZE
  • Tweak optimizer
  • Add index
  • Repeat until adequate performance

Capture SQL Statements
  • Use JPA - add to LocalContainerEntityManagerFactoryBean
  • Using Hibernate - add to LocalSessionFactoryBean
  • Alternative using Hibernate with Log4J

Database Specific Tools:
- MySQL has a Slow Query Log

--log-slow-queries and --log-queries-not-using-indexes
Analyze Tables and Indexes
  • Use ANALYZE to provide statistics for the optimizer (Oracle)
  • Other database use similar commands
  • Learn how the optimizer works for specific database
Summary
  • Capture your SQL and run Explain/Autotrace on the slowest statements
  • Review your DataSource and Transaction configurations
  • Work with your DBAs to tune applications and database

Back at it!

I have not been able to update my blog. Towards the end of last year, my daughter, Juliana Olivas, was born, so I'm now trying to change the world one diaper at a time. I'm also trying to stay awake and keep up with my current job at Up-Mobile. I am hoping to start updating my blog again this week.