
Managing Software Projects - Dealing With Change

March 31st, 2010  |  Published in Software Engineering, programming

Managing Software Projects - A Software Engineer’s Perspective.
Contrary to popular belief, many aspects of a software project can be proactively controlled and managed by an ordinary software engineer to make his/her life easier. Instead of whining about frequent requirement changes, looming deadlines, etc., there are many things you can do to help yourself retire with enough hair left. The “Managing Software Projects” series is an attempt at exploring these aspects.

Be In A Position Of Strength When Changes Occur (By Maintaining Code In A Ready-To-Ship State).
Rarely does anything in life work according to plan, and software projects are no exception. Simply put, changes are bound to happen, and how you deal with them will determine how successful the project is. Most of these changes will be involuntary and unexpected, and the programmers and testers are usually on the receiving end. With deadlines whizzing past you, the pressure mounts and the vicious cycle continues. This is where a solid foundation is useful, as explained in the previous post in this series (“The Way We Set It Up Is How It Ends Up”). If you have a solid foundation, it becomes less of a burden to maintain code in a ready-to-ship state at all times. Without that, you may not be able to respond to changes in a timely manner. Let’s look at a few scenarios.

  • Sudden requests for demos or alpha/beta builds by influential customers. If you have a continuous build system and you keep your builds reasonably green, you can literally hand over the last successful build. If not, you will need to drop everything you are doing and figure out a way to get a stable build. If you have many test failures, you will probably need to fix them first as well.
  • A team member resigns or falls ill. Unless the project culture has insisted on tested, reviewed and documented code right from the get-go, the person who has to take over will need to spend a significant amount of time getting up to speed.
  • And our favourite: the customer and/or marketing folks change their mind for the umpteenth time. Again, if you have ready-to-ship code backed by an armada of tests (unit, integration, performance, interoperability, etc.), you are in a much better position to make the required changes with confidence. The tests will give you instant feedback about the impact. The most important thing is to be able to give an accurate picture of the impact of such a change. If you have evidence to back up your claims, then managers, marketing folks and the customer are more likely to listen to you.
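As a rough illustration of the first scenario, here is a minimal Python sketch of picking the last successful build out of a build history. The Build record, its fields and the artifact paths are hypothetical; a real continuous build system exposes this information in its own way.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Build:
    number: int         # increasing build number
    tests_passed: bool  # True when the whole test run was green
    artifact: str       # hypothetical path to the packaged build


def last_green_build(history: List[Build]) -> Optional[Build]:
    """Return the most recent build whose tests all passed, or None."""
    green = [b for b in history if b.tests_passed]
    return max(green, key=lambda b: b.number, default=None)


history = [
    Build(101, True, "dist/app-101.zip"),
    Build(102, False, "dist/app-102.zip"),
    Build(103, True, "dist/app-103.zip"),
    Build(104, False, "dist/app-104.zip"),
]

demo = last_green_build(history)
print(demo.artifact if demo else "no shippable build")
```

If the history contains no green build at all, the function returns None, which is exactly the "drop everything and find a stable build" situation described above.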

Have A Well Defined Process To Manage Change (No, I am not talking about the process that your organization already has)
Most organizations have a formal process for handling RFEs, bug fixes, etc. (There are also organizations that are completely at sea when it comes to this.) Irrespective of whether such a process exists at the organizational level, the engineering team should have its own. Some points to consider from an engineering perspective (as alluded to in my previous post):

  • If your organization doesn’t have a proper process in place, or does not follow the one it has, then the first thing to do is to work on getting that fixed. Sadly, I have seen organizations that had no such process, and others that had one but frequently ignored it for short-term convenience. The consequences were disastrous, especially for the poor engineering team, not to mention that the ability to respond effectively decreases dramatically over time. What such a process should look like and what’s involved is a separate discussion that I hope to tackle later on.
  • Branching policies for handling releases and hotfixes (these are notorious for making your life dreadful). In my humble experience, having this discussion right at the beginning (and on an ongoing basis) is a whole lot more productive than having it later in the project cycle, especially when everybody is under the gun.
  • Handling RFEs/critical bug fixes for existing releases. Same as above: do it when your head is relatively free.
  • Figuring out how to evaluate change requests and who is responsible for making that call in each area. While the team leader/manager is ultimately responsible for making a call, they are not always in a position to make a good decision without input from others. Therefore it won’t hurt to have folks informally appointed (and known within and outside the team) as subject experts in certain areas. Radical changes to the code base, or taking on new functionality that directly or indirectly affects a particular area, should not be decided without input from these area experts. Although Apache Qpid is an open source project, most committers will discuss a change with the area expert (on the dev list/JIRA) if the commit falls outside the committer’s core area of expertise. Not only is this courteous and reduces friction and disagreements, it also ensures things are done correctly the first time, which minimizes the impact on schedule, etc.

Deal With RFEs/Requirement Changes Methodically (instead of being emotional and kicking up a storm).
No matter how swamped your team is and/or how ridiculous the request/requirement is, emotional outbursts or rejecting it with “not enough time”, etc. is not going to help.
It’s in your best interest (and the interest of the customer) to listen, understand, analyse the impact and then compile your response backed with evidence.

  • If you make an effort to listen to and understand the change request,
    • You may find that the real underlying requirement is something else, which may be easier to implement after all.
    • You may be able to provide an acceptable alternative.
    • You may be able to provide an acceptable workaround until you fix it properly in the next release.
  • If you provide a proper response backed by evidence,
    • People are more likely to listen to you, as you are presenting facts backed by evidence and not mere conjecture.
    • It helps people understand the impact and the consequences (e.g. half the functionality will not work and it would need n extra days).
    • Even if your opinion is ignored and the change is forced upon you, at least you know, for your own and the team’s benefit, how much of an impact you have to deal with.

Your Credibility Is Important.
Credibility goes a long way towards ensuring your opinion is heard, which in turn helps you negotiate changes successfully!

  • If you have provided sound judgement in the past, the managers are likely to listen to your opinion.
  • If you have built credibility with the customers, then they are likely to listen to your opinion and accept your workaround or alternative solution.

Ability To Improvise
Sometimes it takes a bit of improvisation to deal with the changes and still beat the clock. Instead of complaining and whining about the changes, use them as an opportunity to show your worth.
Here are some stories.

  • Once we had to deal with a customer who asked for frequent GUI changes on an application that ran on an embedded device. Suffice it to say it was a real pain. At the time one of my colleagues was toying with the idea of expressing the screens in XML. We promptly adopted it and taught the customer how to customize the screens to their heart’s content. My colleague eventually went on to lead the team.
  • We needed an automated build system to run our test suites on different OSes and aggregate the results. As the project progressed, the permutations grew. We looked around and didn’t find anything that fit our needs. Then a colleague of ours used our own product to create a distributed automated build system.
  • While working on an application that ran on an embedded device (the same one as in the first story), the QA team had a tough time testing due to the restricted nature of the device. The long test cycle ate into our dev cycle and impacted our turnaround time. I finally wrote a simulation program in my free time that simplified a lot of the testing. I scored some brownie points there, which came in handy during the next performance review.

In summary, a software engineering team can do a lot to mitigate the risks posed by changes that happen throughout the project. How well you negotiate those changes will have a huge impact on the final outcome.

Managing Software Projects - The Way We Set It Up Is How It Ends Up

March 29th, 2010  |  Published in Software Engineering, programming

Managing Software Projects - A Software Engineer’s Perspective.
Contrary to popular belief, many aspects of a software project can be proactively controlled and managed by an ordinary software engineer to make his/her life easier. Instead of whining about frequent requirement changes, looming deadlines, etc., there are many things you can do to help yourself retire with enough hair left. The “Managing Software Projects” series is an attempt at exploring these aspects.

It amazes me to see how little attention is paid to this simple truth when a software project is undertaken. There is a reason why it’s said “you reap what you sow”. More often than not, the success of a software project (and, for that matter, of many things in life) depends on how well you have laid the foundation. But it seems far too many think that software projects are immune to the simple realities of life. You don’t need elaborate plans or lengthy kick-off meetings, but reasonable time and effort need to be spent on getting the basics right. While none of us has the power to foresee every obstacle that might crop up during the project, there are many aspects that are within our control. And more often than not, these aspects are not given their due recognition and are taken for granted!

The Way We Set It Up Is How It Ends Up.
What can be managed should be managed right from the start by following the proper process. We all know we need to test, use coding standards, provide documentation, do code reviews, follow process, etc., but how often do we think about these aspects right at the beginning of the project? These are things that we can control to a reasonable extent, but if they are not paid enough attention they can spiral out of control and eat up enough cycles to cause major disruptions in your project later on.

  • How many projects use a continuous build system right from the start? In the first couple of days or so you may not even have enough tests to run, but ensuring that at any given moment the code is compilable and tests are passing will set the tone for the project right at the beginning.
  • We all think we could write documentation when we have enough functionality, but we soon find out that we have quite a bit to write with not enough time. Ditto for code review & coding standards.
  • Same goes for writing tests too! By the time we decide to write the tests in earnest, we have done quite a bit of coding. All that time saved by not writing tests and a lot more is now wasted doing damage control.
  • Using a bug tracking system right from the get-go is better than using sticky notes :)
  • How often do we slap in a quick Ant file, makefile, etc. to get going soon, thinking we can improve it later as the project gets more complex? Anybody who has gone through the pain of tinkering with the build system when the project is in full swing knows what I am talking about. Spending some time upfront designing a more robust build system will save you loads of time and frustration down the line.
  • During the initial setup period, how often do teams discuss the internal release process, branching policies, how to handle hotfixes, bug fixes for existing releases, etc.? If these points are discussed in earnest right at the start, then perhaps you can put in place a structure that accommodates them more easily, instead of having to bend over backwards at a later stage of the project when you really need it.

Simply put, following proper process right at the start will determine how successfully your project ends.

This Applies Even When You Are Doing A Proof-Of-Concept (And here’s why)
I have seen many proof-of-concepts being done without using any version control, adhering to coding standards, using a bug tracking system or writing any sort of tests. The justification is that it’s only a proof-of-concept, and we can do it properly when we get to the real thing. But how many times have we seen the proof-of-concept being used almost as-is as the initial code when the real project starts?

Now all of a sudden you have a lump of code that is not reviewed, documented, tested or version controlled properly. Since no bug tracking was used, all the time spent testing the POC is now wasted. While nobody cares about the POC being stable, and everyone readily forgives you when the system crashes several times during the demo, the same allowance is not given when you deliver the real system. But, scarily enough, the same unstable code is now being used as the base for the real system. Now is the time you lament not having followed proper procedures during the POC development.

Ideally a POC should be what it is - a POC and nothing more. The real system should be started from scratch. But the problem is, we don’t live in an ideal world!
So if your organization has a history of turning POCs into real systems in double time, then follow process right from the start to avoid a stroke or a heart attack.

The Only Thing That Does Not Change, Is Change Itself. So Plan For It At The Beginning!
Changes are going to happen! Either the customer or the marketing dept or both will change their mind. These changes will happen for a variety of reasons, and more often than not they will need to be accommodated. We cannot have an architecture that will withstand every one of the numerous RFEs. We cannot plan for everything. But we can still have a plan for dealing with these changes! Having one right at the beginning will save you a load of trouble. You may need this kind of planning at different levels.

  • Most organizations have a process for dealing with changes and RFEs. But you may need project-specific criteria as well. What works for one project may not work for another, so the corporate guidelines may be inadequate in covering all scenarios. Working that out right at the beginning and getting approval from all concerned parties may spare you a load of pain at the end.
  • While you cannot architect for every conceivable change, you can still try to identify areas in the system that are likely to change. Not only is this useful in scheduling, it is also important at the architecture stage, as you can try to isolate change as much as possible. Even though you have no control over or visibility into how a particular feature might eventually pan out, you may still be able to isolate it enough to minimize the impact the changes could have on your system. Identifying areas that could change at the beginning, and setting up your architecture accordingly, will determine how well you deal with the inevitable changes.
  • Similarly, testing strategies should also be planned with changes in mind. Having this discussion right at the beginning rather than later will no doubt contribute positively towards the success of the project.
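One common way to isolate an area that is expected to change is to put it behind a narrow interface, so the rest of the system depends only on that seam. The Python sketch below is only an illustration under that assumption; the pricing example and all the names in it (PricingRule, FlatDiscount, NoDiscount, invoice_total) are hypothetical.

```python
from typing import List, Protocol


class PricingRule(Protocol):
    """The narrow seam the rest of the system depends on."""

    def price(self, base: float) -> float: ...


class NoDiscount:
    def price(self, base: float) -> float:
        return base


class FlatDiscount:
    """A rule marketing may change often; swapping it touches nothing else."""

    def __init__(self, percent: float) -> None:
        self.percent = percent

    def price(self, base: float) -> float:
        return round(base * (1 - self.percent / 100), 2)


def invoice_total(items: List[float], rule: PricingRule) -> float:
    # This code knows nothing about how prices are adjusted.
    return round(sum(rule.price(item) for item in items), 2)


print(invoice_total([10.0, 20.0], NoDiscount()))
print(invoice_total([10.0, 20.0], FlatDiscount(10)))
```

When the discount policy changes yet again, only a new rule class is added; invoice_total and its tests stay as they are.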

Related to this topic are discussions about resourcing, requirements planning, etc. These aspects are critical in properly setting up a project. However, the above piece is written more from a software engineer’s perspective. In some ways the software engineer has very little influence in determining how a project is set up (e.g. resourcing, requirements planning), and in other ways quite a bit is within their sphere of control (e.g. the aspects mentioned above). The idea is to maximize what you can manage/control and plan for what you cannot.

5 reasons why Distributed Systems are hard to program

July 23rd, 2008  |  Published in Architecture, Distributed Systems, programming

Here are 5 reasons why I have found distributed systems hard to program. This is not some sort of thorough analysis, but merely my observations from dealing with such systems. For completeness, here is the definition of “Distributed System” I used.
A distributed system consists of more than one process running as a single system. These processes can be on the same computer, or on multiple computers that are on a local area network or geographically distributed over a wide area network.

Without further ado, here are the reasons in no particular order.

1. Difficulty in identifying and dealing with failures.
When communicating between processes, failures can happen at many levels. Dealing with them is not trivial. Of course you can rely on frameworks based on technologies like RMI, CORBA, COM, SOAP, AMQP, REST (which is an architectural style, not a standard), etc. to handle these. But the fact remains that you still need to think clearly about these cases and handle them properly.

For example if we consider a simple interaction between two processes on different computers, the following failures can happen.

  • Failures that occur within the process that initiates the communication (sending the message or invoking the RPC call).
  • Failures between the time the process hands over the request to the OS and the OS writing it to the network.
  • Network failures while the packets are in transit from one computer to the other.
  • Failures between the time the OS on the receiving end receives the packets and the time it hands them over to the recipient process.
  • Failures that occur when the recipient process tries to process the request/message.

Sometimes the framework you use is unable to report (or simply does not report) all these error cases. Sometimes, when an error is reported, it may not contain enough information to figure out at which level the error occurred.
Did the request reach the remote computer? If so, how far up the stack did it go? If the receiving process got the request or message, did the error occur before or after it was processed?
In some cases, where idempotency is built into the receiving application or the framework/protocol (e.g. a message client that detects duplicate messages, or an HTTP GET), a simple retry may be OK. In other cases idempotency and retrying may be expensive or difficult to implement. In such cases careful thought needs to be given to how these different errors are identified and handled.
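To make the duplicate-detection case concrete, here is a minimal Python sketch of a receiver that is idempotent by message id, plus a sender that retries when it cannot tell whether its request was processed. Everything here is hypothetical (the Receiver class, FlakyLink, the lose_first_ack switch that simulates a lost reply); it illustrates the idea and is not how any particular framework implements it.

```python
import uuid


class FlakyLink(Exception):
    """Raised when the sender cannot tell whether the request arrived."""


class Receiver:
    """Hypothetical receiver that makes deposits idempotent by message id."""

    def __init__(self):
        self.balance = 0
        self._seen = set()  # ids of messages already applied

    def deposit(self, message_id, amount):
        if message_id in self._seen:
            return  # duplicate: already applied, safe to ignore
        self._seen.add(message_id)
        self.balance += amount


def send_with_retry(receiver, amount, lose_first_ack=True, attempts=3):
    """Send one deposit, retrying with the SAME message id on failure."""
    message_id = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            receiver.deposit(message_id, amount)
            if lose_first_ack and attempt == 0:
                # Simulate: the request was processed but the reply was lost.
                raise FlakyLink("ack lost")
            return attempt + 1  # number of sends it took
        except FlakyLink:
            continue
    raise FlakyLink("gave up after %d attempts" % attempts)


receiver = Receiver()
sends = send_with_retry(receiver, 100)
print(sends, receiver.balance)
```

The sender ends up sending twice, yet the balance is credited only once, because the retry reuses the message id. Without the duplicate check the same retry would double the deposit, which is why a blind retry is only safe when idempotency is in place.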

2. Achieving consistency in data across processes.
One of the hardest problems in programming distributed systems is achieving a consistent view of data across the processes. When one process updates some data, you need to replicate the update across the other processes, so that if any other process decides to operate on the same set of data, it is doing so on the most current copy.
Let’s look at two examples.

Assume a global banking application for ABC bank. A customer goes to a branch in New York, US and deposits money into an account. A few moments later his relative in London, UK makes a withdrawal from that account. Due to latency there is obviously a time lag before the process in London sees the updated amount in the account.

In an online trading system, a user in NY places an item for sale. The transaction is recorded in the closest data center, which is in Boston. A few moments later another user in LA searches for the exact same item and is served from a data center in Phoenix. The user in LA may or may not see the item, due to the latency involved in replicating the data across data centers.

Example 1 requires strong consistency, while in example 2 you could get away with weak consistency, for example by setting an SLA that says data is valid within a 5 minute time window.
This is not an easy problem to solve, and the area is a subject in its own right. Werner Vogels wrote a nice piece on this called Eventually Consistent, which is worth reading.
Of course there are specialized frameworks/libraries that can handle this for you. But there is still no escape: you pretty much need to understand the pros and cons of the various approaches, their failure modes, etc.
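The difference between the two examples can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a replication protocol: the ReplicaValue record and the read function are hypothetical, timestamps are plain seconds, and the 5 minute window is the one mentioned above.

```python
from dataclasses import dataclass

MAX_STALENESS_SECONDS = 5 * 60  # the 5 minute window from the text


@dataclass
class ReplicaValue:
    value: int
    replicated_at: float  # seconds; time of the last sync from the primary


def read(replica: ReplicaValue, primary_value: int, now: float, strong: bool) -> int:
    """Strong reads always pay for a trip to the primary.

    Weak reads accept the local replica as long as it is fresh enough.
    """
    if strong:
        return primary_value
    if now - replica.replicated_at <= MAX_STALENESS_SECONDS:
        return replica.value
    return primary_value  # replica too stale: fall back to the primary


# The replica last synced at t=1000 and saw 100; the primary has since moved to 150.
replica = ReplicaValue(value=100, replicated_at=1000.0)
print(read(replica, 150, now=1060.0, strong=False))  # item listing: stale value is acceptable
print(read(replica, 150, now=1060.0, strong=True))   # bank withdrawal: must see the latest
```

The weak read is cheap but may return 100 while the true value is 150; the strong read is always current but has to reach the primary, with all the latency and failure cases that implies.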

3. Heterogeneous nature of the components involved in the system.
A distributed system may contain components written in a variety of languages, deployed across machines with different architectures and operating systems. Needless to say, this poses certain challenges (especially integration and interoperability issues) when implementing the system. A whole range of standards/technologies has been put forward to solve these issues, including but not limited to CORBA, SOAP, AMQP, REST (which is an architectural style, not a standard) and RPC-based frameworks like ICE, Thrift, Etch, etc. Anyone who has worked with these technologies knows that none of them is trivial to use, nor does any of them provide a complete solution in every situation.

Anybody who has read the recent posts by Steve Vinoski and the discussions around them will realize the issues/challenges surrounding RPC. The following paper discusses the impedance mismatch problems when working with IDL-based systems. The issues with type systems and data formats are not limited to RPC. When using a message-oriented approach like SOAP (doc/lit style) or AMQP, you will end up tunneling data that’s not supported by the protocol as a string or a sequence of bytes. When using REST, you need to represent your resource in a format the requesting application understands/supports, which may be quite different from the native format.
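The tunneling problem shows up even with a format as simple as JSON, which has no decimal or timestamp type. The Python sketch below tags such values and carries them as strings; the "__type__" tagging convention is an assumption made up for this example, and both ends have to agree on it, which is exactly the kind of out-of-band contract that makes heterogeneous systems harder.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


def encode(payload: dict) -> str:
    """Tunnel types JSON lacks (Decimal, datetime) as tagged strings."""

    def default(obj):
        if isinstance(obj, Decimal):
            return {"__type__": "decimal", "value": str(obj)}
        if isinstance(obj, datetime):
            return {"__type__": "datetime", "value": obj.isoformat()}
        raise TypeError("cannot encode %s" % type(obj).__name__)

    return json.dumps(payload, default=default)


def decode(text: str) -> dict:
    """Rebuild the native types from the tagged strings."""

    def hook(d):
        if d.get("__type__") == "decimal":
            return Decimal(d["value"])
        if d.get("__type__") == "datetime":
            return datetime.fromisoformat(d["value"])
        return d

    return json.loads(text, object_hook=hook)


original = {
    "amount": Decimal("10.10"),
    "at": datetime(2008, 7, 23, 12, 0, tzinfo=timezone.utc),
}
wire = encode(original)
print(wire)
print(decode(wire) == original)
```

A receiver written in another language sees only nested objects with string values; unless it implements the same convention, the amount and the timestamp arrive as plain text.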

Again not an easy issue to deal with no matter what technology or framework is used. As an architect/developer you need to understand these issues and deal with them accordingly.

4. Testing a distributed system is quite difficult.
This is arguably one of the hardest aspects of developing a distributed system. Verification of the behavior and impact of your code in the system is not easy.
There are many aspects that need to be tested, and doing so before every checkin is no fun at all. Running some of these tests before every checkin is simply not practical, but it’s a good idea to run them nightly, and some of them over the weekend. Here are some of the areas that need to be tested (I plan to write another blog entry elaborating on the testing aspects).

  • Functionality testing (can be covered with well written unit testing)
  • Integration testing - you need to test the distributed system as a whole with all the components involved
  • Interoperability testing - this is crucial when heterogeneous components (different languages, OS) are involved, and is quite different from integration testing
  • TCK compliance - If your system is based on standards/specifications, then you need to ensure that you haven’t broken anything w.r.t compliance
  • Performance testing - to ensure that your changes haven’t accidentally caused a degradation in performance
  • Stress testing - to ensure that your checkin hasn’t accidentally caused any stability issues, e.g. an increased chance of deadlocks when the load increases
  • Soak testing - to ensure that your checkin hasn’t caused any longevity issues, e.g. a memory leak that only manifests after a couple of hours or days

More often than not, developers cut corners in their testing, as running these tests is tedious and time-consuming. These tests also need to be run regularly to catch issues in a timely manner, and the best way to tackle this is to automate as much of the testing as possible. There are many options, from continuous build systems like CruiseControl to a plain old cron job.
Functionality testing, TCK compliance, and certain types of integration and interoperability tests can be run periodically.
In most organizations test machines are just lying around doing nothing during the night (unless around-the-clock testing is done with development centers in different time zones). Instead of wasting computing cycles, you could automate test suites to run during the night. The more time-consuming integration and interoperability tests, along with performance, stress and soak testing, can be done nightly, while longer-duration soak testing can be scheduled to run over the weekend.
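The tiering described above can be written down as a small selection function that a cron job or CI scheduler would call. This Python sketch is only an illustration: the suite names, the 01:00 nightly slot and the choice of Saturday for the weekend run are all assumptions.

```python
from datetime import datetime
from typing import List

# Hypothetical suite names, grouped the way the text describes.
PERIODIC = ["functional", "tck", "quick-integration", "quick-interop"]
NIGHTLY = ["full-integration", "full-interop", "performance", "stress", "short-soak"]
WEEKEND = ["long-soak"]


def suites_for(now: datetime, nightly_hour: int = 1) -> List[str]:
    """Pick the suites a scheduler would start at `now`."""
    suites = list(PERIODIC)        # cheap suites run on every trigger
    if now.hour == nightly_hour:   # idle machines at night: run the heavy ones
        suites += NIGHTLY
        if now.weekday() == 5:     # Saturday: also start the long soak test
            suites += WEEKEND
    return suites


print(suites_for(datetime(2008, 7, 23, 14, 0)))  # a weekday afternoon
print(suites_for(datetime(2008, 7, 26, 1, 0)))   # Saturday, 01:00
```

Keeping the policy in one place makes it easy to review and to change as suites grow; the scheduler itself only needs to invoke it at regular intervals.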

While testing is a tough issue for any type of system, distributed systems have a lot more failure points, which adds to the complexity.
Getting these tests right so that they cover these failure points, and executing them, takes a lot of careful thought and planning.

5. The technologies involved in distributed systems are not easy to understand.
Distributed systems are not easy to understand. Neither are the myriad technologies used in developing them.
Most folks find it difficult to grasp the concepts behind these technologies. If you look into the discussions and misconceptions surrounding REST, you can understand what I am getting at. CORBA was not an easy spec to understand, and neither is WS-* or AMQP. While it is true that you don’t need to understand everything to develop using them, you still need at least a reasonable understanding to figure out how to tackle some of the issues mentioned above. Frameworks based on these technologies are touted as the cure for these problems. Sure, they can help, but they do not shift the burden away from you.
To compound the issue, all sorts of vendors keep touting their technology/framework as the next silver bullet. No matter what vendor you use, at the end of the day you are still responsible for getting it right. And it is not an easy task. You need to face the reality that distributed systems are hard and that you cannot hide every complexity behind some framework.

Restructuring Code

June 9th, 2008  |  Published in Architecture, programming

Most programmers need to restructure the code they work on for a variety of reasons. While most of the time it is driven by demand, sometimes it is also done for personal reasons. Here are some of the reasons I have had, or seen within the teams I have worked with over the years.

  1. The current code has reached a stage where it is impossible to make any more modifications without breaking something else
  2. The requirements have changed so much that the current architecture/design cannot handle it without a redesign
  3. The current application doesn’t (or will not) scale or perform well enough, as it wasn’t designed to handle the current load/anticipated growth
  4. The folks who worked on the code are no longer there and nobody knows what the code really does (or what it was supposed to do)
  5. You don’t really like the current way it is implemented and think that there is a better way to do it using framework X or library Y

While restructuring can sometimes be done easily, most of the time it is not a trivial task. When the frustration gets to you, you may even have entertained the idea of rewriting an application/module/section from scratch. Is this really a good idea? Sometimes this may be the only option, but most of the time it ends up being a bad idea for a variety of reasons.

  • One of the biggest mistakes is to throw away the old code without any due consideration simply based on the assumption that the old code is bad and we are going to write much better code.

    Throwing away the old code (especially if it was in production) means you are throwing away months (or years) of tested, battle-hardened code that may contain fixes for bugs you aren’t even aware of. If you don’t take this into account, the new code you write may end up showing the same bugs that were already fixed in the old code. This wastes a lot of the time, effort and knowledge gained over the years.

  • This method or class is ugly, let’s throw it away and write it again.

    That odd-looking method or that badly written class may contain a fix for a race condition, or an optimization that one of your customers depends upon. Discarding that code as crap without really understanding what’s going on may have dire consequences for your team.

  • Not paying enough attention to the existing unit tests/ test frameworks when building the new system.

    These tests were added for a reason, possibly in response to a bug or some sort of intermittent failure that was reported on a production system. These failures/errors may have been reported long before you joined the company. Sometimes none of the current team members are aware of all the issues. So discarding the test code means you are throwing away years of hard work and knowledge.

  • The current code is way behind all the cool technology we have today. The new framework X can do things a lot more elegantly, so let’s rewrite.

    This is by far the worst mistake. If something is not broken, then why try to fix it? “The code is ugly” or “the technology used is not cool” are not good reasons. The style or structure of the code not matching your personal preference is no reason at all for unwanted restructuring.

I have been guilty of most of the above, and through hard lessons I have realized that,

  • The best approach to restructuring starts with taking stock of the existing code base and the tests written against that code. This will help you understand the strengths and weaknesses of the existing code, so you can preserve the strong points while avoiding the mistakes.
  • It is best to reuse as much code as possible, because no matter how ugly the code is, it has been tested, reviewed, etc.
  • Incremental changes are better than one massive code change. Incremental changes allow you to gauge the impact on the system more easily through feedback from tests, etc. It is not fun to see 80 test failures after you make a change, and it can lead to frustration and pressure that will in turn result in more bad decisions. A couple of test failures are easy to deal with, which makes for a more manageable approach.
  • After each iteration it is important to ensure that the existing tests pass. Analyze why any tests are failing, and modify them or add new tests where the existing ones are not sufficient to cover the changes you made. Failure to do so can result in a lot of pain down the road.
  • Avoid the temptation to rewrite everything. Personal preferences and ego shouldn’t get in the way. If something works, then don’t change it.
  • Remember that humans make mistakes, and restructuring does not always guarantee that the result will be at least as good as the previous attempt. I have seen, and been part of, several failed restructuring attempts.
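One way to apply the "small steps, keep the tests passing" advice is to pin down what the old code does on a set of recorded inputs before touching it, and compare after each step. The Python sketch below illustrates this with a made-up example; legacy_shipping_cost, its odd-looking branch and the sample inputs are all hypothetical.

```python
def legacy_shipping_cost(weight_kg):
    # Hypothetical "ugly" legacy function. The odd branch below may be a fix
    # for a bug nobody on the current team remembers, so it must be preserved.
    if weight_kg <= 0:
        return 0
    cost = 5 + weight_kg * 2
    if weight_kg > 20:
        cost = cost - 3
    return cost


def refactored_shipping_cost(weight_kg):
    # One small restructuring step; behaviour must stay identical.
    if weight_kg <= 0:
        return 0
    bulk_discount = 3 if weight_kg > 20 else 0
    return 5 + weight_kg * 2 - bulk_discount


def find_mismatches(old, new, samples):
    """Return the sample inputs on which the old and new code disagree."""
    return [s for s in samples if old(s) != new(s)]


# Inputs chosen around the boundaries visible in the old code.
SAMPLES = [-1, 0, 1, 20, 21, 100]
print(find_mismatches(legacy_shipping_cost, refactored_shipping_cost, SAMPLES))
```

An empty list means the step preserved behaviour on the recorded inputs and the next small change can begin. A non-empty list points at exactly the inputs to investigate, which is far easier to deal with than 80 failures after one massive change.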

Having said all of that, sometimes you have no option but to rewrite from scratch. But IMHO that should be your last resort.