Wednesday, February 29, 2012

My two cents on failure

Dan North once said: "Fear leads to risk, risk leads to process, process leads to hate...and suffering and Gantt charts." I've worked for many companies (banks, financial institutions, and government) and most of them have little tolerance for failure. As a matter of fact, they encourage a "blame game" culture - people point the finger at whoever made the mistake. In these companies I was told that if you are the person who gives the boss more bad news than good news, you will be labeled as incompetent. The first thing out of my boss's mouth when he noticed a problem/failure was, "Who did it? Who's responsible for that? Whose fault is it?" In these companies, if something was wrong, people would fix it quietly and not tell anyone. This created a nightmare when there were bugs in the system because no one would admit to the problem or explain how it happened. Over the last five years, my career took a different turn. I joined a startup and eventually ended up in management, where I found a different paradigm: "failure is part of the business."

According to Harvard Business Review, there are different types of failure:
  • Preventable: failures that could have been avoided. For example, testing in production, deploying code before testing it, or running "rm -rf /" while logged in as root.
  • Complexity related: a large number of organizational failures are due to the inherent uncertainty of work: a particular combination of needs, people, and problems may have never occurred before. Although serious failures can be averted, small process failures are inevitable.
  • Intelligent: failures that happen when experimentation is necessary. An example is a "spike" in Agile.

Intelligent failures are normal in startups because you are trying something that hasn't been tried before. The company IDEO has the slogan:
"Fail often in order to succeed sooner."
This is something that not many companies embrace. Entrepreneurs, on the other hand, accept failure and uncertainty. In startups, there is just too much uncertainty and as Steve Blank puts it,
"In a startup no business plan ever survives first contact with customers."
So what happens if the original business plan fails? An entrepreneur adapts! He/she learns what went wrong, makes adjustments, and then monitors the new business plan. This means that failure is a natural part of doing business.

I am sure that all my previous employers are aware of this issue. Obviously, they want to learn from their mistakes and avoid them in the future. But more often than not, bosses get irritated when a person presents them with a problem. This is the difference between startups and established companies. As per HBR, companies can learn through three different activities:
  1. Detection: the goal should be to surface failures early.
  2. Analysis: find the root cause of the problem, or in agile terms, perform a root-cause analysis (RCA).
  3. Experimentation: try to generate intelligent failures for the express purpose of learning and innovation.
It is rare for a business to be an overnight success. There are just a few Facebooks and YouTubes out there. It is going to be hard, and for many, failure is the beginning of the journey. As per North, "uncertainty isn't just about expecting change, it's assuming that some unexpected bad thing will happen during the project and we can't even know about them when we start."

Friday, February 17, 2012

Implementing the "No Asshole Rule" at work

I'm usually the person that does the interviews when we have a programming position. There are three criteria that I look for when I interview potential candidates:
  1. Technically savvy - can the candidate do the job? Does he/she have a good foundation in computer science (data structures, algorithms, etc.)?
  2. Quick learner or persistent when it comes to finding a solution.
  3. Is he/she an asshole?
As I mentioned before, my interviews are very technical. For the first criterion, I start with a phone interview and present problems that deal with algorithms and computer science fundamentals. If the candidate does well, I schedule an in-person interview and give them a small project to complete and bring to the company. In person, we review the project's code, first within the team and then together with the candidate. This is simply "constructive criticism" and a way to find out more about the candidate's programming skills.

The second criterion is hard to gauge. I usually ask a few questions and, if I see the candidate struggle, I watch how they handle it. I also ask questions like "what is the latest thing (programming language, framework, tool) that you learned?", "what do you do when you get stuck on a problem?", etc.

The third criterion is extremely important to the team and the hardest to catch. Everyone has worked for or with an asshole. Just so we are clear, I really like the definition from Robert Sutton's "The No Asshole Rule":
THE DIRTY DOZEN
Common Everyday Actions That Assholes Use
1. Personal insults
2. Invading one’s 'personal territory'
3. Uninvited physical contact
4. Threats and intimidation, both verbal and nonverbal
5. 'Sarcastic jokes' and 'teasing' used as insult delivery systems
6. Withering e-mail flames
7. Status slaps intended to humiliate their victims
8. Public shaming or 'status degradation' rituals
9. Rude interruptions
10. Two-faced attacks
11. Dirty looks
12. Treating people as if they are invisible
I usually call the candidate's references and ask some "what if" scenarios. Despite all of this, it is hard to identify an asshole.

It is very hard to build a cohesive team, but when you have one...it's amazing. The team works like a well-oiled machine. Introducing an asshole is almost like throwing a monkey wrench into the machine. As Jim Collins wrote in his famous book "Good to Great":
If you have the wrong people, it doesn’t matter whether you discover the right
direction—you still won’t have a great company. Great vision without great people is
irrelevant
Managers, if you have an asshole, you need to handle them right away! You need to let them know that this type of behavior is simply unacceptable and that if it continues, they will be terminated. It is simply not fair to the team or to the company. Simply put, assholes will bring down the productivity and cohesiveness of the team. Worse, you can potentially lose some great people.

Questions:
How do you find people that are quick learners?
How do you handle or filter assholes?

Tuesday, February 14, 2012

IT and productivity - what does that mean?

Today I was reading Wall Street & Technology and I stumbled upon an interesting article by Howard Rubin, "How to Assess IT Productivity". It looked very interesting, but when I finished reading it, I had more questions than answers. The author makes a good point:
"Defining an accurate (and universally agreed-to) measure of information technology productivity is perhaps the 'holy grail' of IT measurement."
That's absolutely correct. If you're a manager, or have been a manager, then you know that is true. He composed a list of "6 Core Assessment Areas for Measuring IT Productivity":
  1. Computer Technology Measures of Processor Economic Efficiency
  2. Supply-Side Measures of Reduce Economic Efficiency and Productivity
  3. Demand-Side Measures of Economic Efficiency And Productivity
  4. IT Portfolio Measures of "Run the Business" vs. "Change the Business"
  5. IT Budget Agility Measures of Fixed vs. Variable Costs
  6. Operational Leverage Measures
This is where I started doubting the article. The truth is that there is really no single right answer for "IT productivity." It is one of those subjective questions where you get many answers and probably all are correct. The answers will vary depending on the company and its departments. In other words, it's all about context and domain. But what if we go back to basics? How about the following measurements:
  • Release cycles: how often we make deployments of new features or fixes
  • How many bugs were introduced on the release/deployment
  • Turn-around time on bug fixes: how fast do we find and fix them?
  • Performance and scalability of the application: how fast is our application? How much can it handle?
  • High availability measurements. For example, how long does a failover take (going from primary to secondary)?
These might not be "productivity" per se, but they add value for our customers, and they should be monitored. You might even consider them quality rather than productivity. But one might argue that the two go hand in hand. Quality has a tremendous impact on productivity. In my industry (financial trading) we always go back to the 2010 Flash Crash, in which the Dow Jones Industrial Average plunged by about 1,000 points within minutes, an event widely linked to high-frequency trading.

I really like what I read in Net Objectives:
"Creating software is about delivering business value. Without some measure of business value, it's hard to determine whether the software has any."
Here are some examples of business value:
  • Increased revenue (sales, royalties, fees)
  • Decreased expenses
  • Using fewer resources
  • More efficient use of resources
  • Customer satisfaction
  • Product promoters / satisfiers / detractors
  • Staying in business
  • Avoiding risk
  • Innovate!
I like what Jonathan Rasmusson said in The Agile Samurai: instead of wondering whether you are doing things the "agile way", ask yourself two questions:
  • Are you delivering something of value every week?
  • Are you striving to continuously improve?
At the end of the day, IT productivity means whatever you and your company think it means.

Thursday, February 9, 2012

Using logwarn to manage logs along with Nagios

I needed to check on a reject file and notify Nagios right away. The reject file rotates every day and is named with the following format: "C"yyyymmdd.rej (for example, C20120209.rej). The file looks like this:


The objectives were the following:
- Check the file every minute from 9:30 am to 4:00 pm
- Ignore the first line, which contains "ACMEFixLogs" and is not a reject payload
- Report the number of rejects and the name of the file
- If there are no new rejects after a notification, do not notify again

The simplest way is to leverage a log checker that reads the content and saves the position of the last line it read in a temporary file. This way, the next check starts from the saved position and the content is not reported twice. I found logwarn and I really like it, mostly because it has a Nagios plugin and Nagios is my notification engine. Also, logwarn has a regex filter that I could use for my file.

Here is how I solved the problem:
- Create a shell script as a Nagios plugin (exit 0 if everything is "OK", exit 2 if there are any errors)
- Leverage the capabilities of logwarn to handle the reading of the file
- Use the regex so that I disregard the first line "ACMEFixLogs" using the logwarn "!"
- Use Linux's word count (wc) to find out how many rejected lines there are
- If there are any reject payloads notify (exit 2)
- If there are none, notify that everything is fine (exit 0)

Here is the script:
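
A minimal sketch of a plugin along these lines, not the exact script. It assumes logwarn is on the PATH and that it accepts a log file followed by patterns, with "!" marking a pattern to exclude; the directory in REJECT_DIR, the function names, and the --run switch are all hypothetical. The header line is also filtered out with grep in case the pattern order assumed here does not match your logwarn version.

```shell
#!/bin/sh
# Sketch of a Nagios plugin that reports new reject lines found by logwarn.
# Assumptions (not from the original script): REJECT_DIR, the function names,
# the "--run" switch, and the pattern order passed to logwarn.

REJECT_DIR="${REJECT_DIR:-/var/log/rejects}"   # hypothetical directory

# Build today's file name, e.g. /var/log/rejects/C20120209.rej
reject_file() {
    printf '%s/C%s.rej\n' "$REJECT_DIR" "$(date +%Y%m%d)"
}

# Count reject payloads on stdin, skipping the "ACMEFixLogs" header line.
# grep exits 1 when nothing is left; that is not an error here.
count_rejects() {
    { grep -v 'ACMEFixLogs' || true; } | wc -l | tr -d ' '
}

# Print a Nagios status line and return the plugin exit code
# (0 = OK, 2 = CRITICAL). $1 = number of rejects, $2 = file name.
report() {
    if [ "$1" -gt 0 ]; then
        echo "CRITICAL - $1 reject(s) in $2"
        return 2
    fi
    echo "OK - no new rejects in $2"
    return 0
}

# logwarn remembers how far it has read, so only new lines are counted.
check_rejects() {
    if ! command -v logwarn >/dev/null 2>&1; then
        echo "UNKNOWN - logwarn not found"
        return 3
    fi
    file=$(reject_file)
    count=$(logwarn "$file" '!ACMEFixLogs' '.' | count_rejects)
    report "$count" "$file"
}

# Run the check only when called with --run, so the functions can be sourced.
if [ "${1:-}" = "--run" ]; then
    check_rejects
    exit $?
fi
```

A Nagios command definition could then call the script with --run. Restricting the check to 9:30 am - 4:00 pm would be done on the Nagios side with a time period rather than inside the script.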

Monday, February 6, 2012

Executing a command line via Groovy to Nagios

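The other day I got stuck trying to execute a Linux command via Groovy. The command needed to pass several parameters to Nagios (my monitoring system). When I tried to pass all the parameters as a single String, it did not work, most likely because String.execute() splits the command on whitespace and ignores quoting, so arguments containing spaces get broken apart. I resolved it by using a string array, where each element is passed as one argument. Here is an example: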

Friday, February 3, 2012

Find information of Stocks using Yahoo's YQL

I needed to find information about specific ticker symbols and get their stock market (NASDAQ, NYSE, etc). I found my answer with Yahoo's Query Language (YQL) - very nice API! Here is a quick Groovy application that fetches information from the site and retrieves the ticker symbols and their corresponding exchange/stock market.


Here are the unit test cases:

Thursday, February 2, 2012

Mount drives into Linux via CIFS

I thought about this after reading a blog post from Nestor Urquiza. If you are like me, then your Linux servers are like an island in a sea of Windows servers. Here is how you can mount drives without the use of Samba.

First, create a directory: sudo mkdir /mnt/ultralord-logs/

Now, let's try to make the mount:


Here, I am mounting the //ultralord-main/logs with the username "gandalf" and the domain "ultralord". If the username is only a local user (not in the domain), then the domain will be the name of the computer "domain=ultralord-main".
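
As a rough sketch of what such a mount command can look like, the helper below only prints the command so it can be reviewed before running it. The function name is hypothetical; the share, mount point, username, and domain are the ones described above, and "username" and "domain" are standard mount.cifs options.

```shell
# Print (rather than run) the mount command so it can be reviewed first.
# $1 = share, $2 = mount point, $3 = username, $4 = domain
cifs_mount_cmd() {
    printf 'sudo mount -t cifs -o username=%s,domain=%s %s %s\n' "$3" "$4" "$1" "$2"
}

cifs_mount_cmd //ultralord-main/logs /mnt/ultralord-logs gandalf ultralord
```

Running the printed command should prompt for the password, since none is given on the command line.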

Now, check if the mount was valid by doing: ls -l /mnt/ultralord-logs

I always recommend setting up a credentials file so that the username and password stay off the command line and out of /etc/fstab.


If you are able to see logs in the directory, then you are ready to edit the /etc/fstab so that the mount happens automatically.
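
A minimal sketch of both pieces, assuming a credentials file in the username/password/domain format that mount.cifs reads. The function names and the /root/.smbcredentials path are hypothetical, and the _netdev option (wait for the network before mounting) is my addition rather than part of the original setup.

```shell
# Write a CIFS credentials file that only its owner can read.
# $1 = file, $2 = username, $3 = password, $4 = domain
write_cifs_credentials() {
    (
        umask 077 &&
        printf 'username=%s\npassword=%s\ndomain=%s\n' "$2" "$3" "$4" > "$1" &&
        chmod 600 "$1"
    )
}

# Print the /etc/fstab line for a share mounted with that credentials file.
# $1 = share, $2 = mount point, $3 = credentials file
cifs_fstab_entry() {
    printf '%s %s cifs credentials=%s,_netdev 0 0\n' "$1" "$2" "$3"
}

cifs_fstab_entry //ultralord-main/logs /mnt/ultralord-logs /root/.smbcredentials
```

The printed line can be appended to /etc/fstab once the credentials file exists, and "sudo mount -a" is a quick way to check the entry without rebooting.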




If you get the following when trying to mount a drive:


The solution is to install smbfs (on newer distributions the package is called cifs-utils).