So far we have seen how to perform several tasks, but only one at a time. There are commands to copy files, list them, connect to another machine, control processes, and so on. To automate procedures, however, we often need to combine these simple tasks.
In bioinformatics, many tasks have to be repeated constantly. One example is updating local copies of sequence databases. Such tasks generally involve running more than one program. Continuing the example, after downloading the updated sequence databases one may need to run BLAST on locally generated sequences to check whether there are any new matches. Linux offers a very easy way to create little ``programs'': shell scripts, which perform a series of shell commands. They are similar to the old ``.bat'' files of Windows. To create a shell script, you only need to create a file in which each line holds a Linux command to be executed. When you finish editing the file, save it and add ``execution'' permission to it. Now your file can be run like any normal Linux program. When you run it, all the commands are performed in the order you specified, so you do not need to wait for one process to terminate before issuing the next command.
This kind of execution is called batch processing, since the commands are submitted together as a batch, and it is very easy to set up. All you have to do is write a ``script'' in a text file and make it executable. We will see how to do that shortly.
The script is a list of command lines to be executed, one after another, just as if you were typing them on the terminal. There are also some special commands which allow you to control the execution of these commands, for example by testing results or iterating. We will see some of these control commands soon, but for now we will learn how to execute a script.
Before we can actually execute a script, we first need to write one. Open your favorite text editor and write the following commands:
cp -r Test1_ Aux
cd Aux
diff ../Fasta_sample Fasta_last >delta
grep -i cgtta delta
An interesting feature is that you can add comments to your scripts: anything written after a pound sign (#) until the end of the line is ignored (you may want to try it in the terminal, just for fun).
# make a backup copy
cp -r Test1_ Aux
#move into work subdirectory
cd Aux
#compare new data with the standard
diff ../Fasta_sample Fasta_last >delta
#and locate the species of interest
grep -i '>Slime' delta
Save this script in the file myscript. To run it, just call bash and use the file as the argument.
bash myscript
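To try the script without touching real data, one can first build a small sandbox. The sketch below is only an illustration. It assumes a directory layout inferred from the commands in the script (a directory Test1_ containing Fasta_last, next to a file Fasta_sample), and the two sequences and their names are invented. It is meant to be saved in a file of its own and run with bash.

```shell
# Build a throwaway sandbox; all file contents here are invented for illustration.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

# Reference data: one made-up sequence.
printf '>Yeast_1\nACGTACGT\n' > Fasta_sample

# "New" data inside Test1_: the same sequence plus one extra entry.
mkdir Test1_
printf '>Yeast_1\nACGTACGT\n>Slime_mold_1\nCGTTAGCA\n' > Test1_/Fasta_last

# The script from the text, written to the file myscript with a here-document.
cat > myscript <<'EOF'
# make a backup copy
cp -r Test1_ Aux
# move into work subdirectory
cd Aux
# compare new data with the standard
diff ../Fasta_sample Fasta_last >delta
# and locate the species of interest
grep -i '>Slime' delta
EOF

# Run it with bash; the cd inside the script does not change
# the directory of the shell that called it.
result=$(bash myscript)
echo "$result"
```

With these invented files, the only line printed should be the diff line for the added header, `> >Slime_mold_1`. The sandbox directory can be removed afterwards with rm -r.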
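This is already nice, but we can make it better. If we change the
permissions of the file myscript to allow execution, using
chmod, we do not need to write bash every time. Because the current
directory is usually not in the command search path, we call the
script with the prefix ./ in front of its name:
chmod uo+x myscript
./myscript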
Now myscript works like any regular program you have in the
system. The reason is that any text file which has execution permission
is interpreted by the standard shell, which is bash in our case.
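The effect of chmod can be checked directly. The following sketch uses a hypothetical one-line script named hello, created in a temporary directory only for this demonstration, and assumes that programs are allowed to run from that directory.

```shell
# Hypothetical one-line script, created only for this demonstration.
demo=$(mktemp -d)
echo 'echo "hello from a script"' > "$demo/hello"

# Before chmod: check whether the file is executable.
if [ -x "$demo/hello" ]; then before=yes; else before=no; fi

# Give the owner of the file permission to execute it.
chmod u+x "$demo/hello"

# After chmod: the file can be called directly by its path.
if [ -x "$demo/hello" ]; then after=yes; else after=no; fi
output=$("$demo/hello")

echo "executable before: $before, after: $after"
echo "$output"
ls -l "$demo/hello"
```

The listing printed by ls -l shows an x in the owner's permissions, which is what lets the file be called by name.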
We can also state explicitly which interpreter to use, in a special
comment on the first line of the script. This line must start with the
characters #!, followed by the path of the interpreter.
Always use the full path, to avoid security breaches. This
feature, called shebang (or sh-bang), is very useful when you want another
program to interpret your script, as we shall see in chapter ???.
Our complete script, with this special line, is the following:
#!/bin/bash
# make a backup copy
cp -r Test1_ Aux
#move into work subdirectory
cd Aux
#compare new data with the standard
diff ../Fasta_sample Fasta_last >delta
#and locate the species of interest
grep -i '>Slime' delta
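To see that the first line really selects the interpreter, one can name a program that is not a shell at all. The sketch below is only an illustration: the file name showme is hypothetical, and it assumes that cat is an ordinary program on your system, whose full path is looked up with command -v rather than guessed.

```shell
# Find the full path of the interpreter instead of guessing it.
interp=$(command -v cat)

demo=$(mktemp -d)
# First line: "#!" followed by the interpreter's full path.
printf '#!%s\n' "$interp" > "$demo/showme"
echo 'this line is handled by cat, not by bash' >> "$demo/showme"
chmod u+x "$demo/showme"

# The system starts cat with the script's path as its argument,
# so the "script" simply prints its own contents.
output=$("$demo/showme")
echo "$output"
```

Both lines of the file are printed, including the #! line itself, because cat does not treat the file as a list of commands. A script that starts with #!/bin/bash is handed to bash in the same way.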
In the following we will see how to use some special features of bash to do more complex tasks within a script.