The tr command in Linux translates one set of characters into another. It can replace a character, or a set of characters, with another character or set of characters. tr reads its input from standard input and writes its output to standard output. It does not take file names as arguments, so input is supplied either through a pipe (for example from the echo command) or by redirecting a file.
tr is short for translate.
The general syntax of the tr command is:
$ tr [option] [set1] [set2]
By default, tr replaces each character in set1 with the character at the same position in set2. Options change this behavior, for example to delete or squeeze characters instead.
To replace characters, list the characters to be replaced in the first set and their replacements in the second set.
$ tr 'a' '1'
This command waits for input from STDIN. Each line you type is printed back with every instance of ‘a’ replaced by ‘1’.
1. Using echo with tr command
The example above reads input typed at the terminal. The echo command can supply the input instead. Use the pipe (|) operator to send the output of echo to tr.
$ echo "apples and bananas" | tr 'a' '1'
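As a quick check of what this pipeline prints, the sketch below captures the output in a variable and echoes it. The variable name result is only for illustration.

```shell
# Pipe the sample text into tr and capture what it prints.
result=$(echo "apples and bananas" | tr 'a' '1')

# Every 'a' has become '1'; all other characters pass through unchanged.
echo "$result"   # prints: 1pples 1nd b1n1n1s
```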
2. Taking input from a file
tr can also take its input from a file. This is useful when the translation has to be applied to a large body of text. The input redirection operator (<) feeds the file to tr.
$ tr 'a' '1' < input.txt
input.txt contains the same text as the example above.
To save the translated text, use the output redirection operator (>) to write the result to a file.
$ tr 'a' '1' < input.txt > output.txt
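The file-based workflow can be tried end to end with a throwaway directory. This is a minimal sketch: the directory comes from mktemp, and the variable names workdir, saved and original are assumptions made for the example.

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# Create the input file with the sample text.
echo "apples and bananas" > "$workdir/input.txt"

# Read from input.txt and write the translated text to output.txt.
tr 'a' '1' < "$workdir/input.txt" > "$workdir/output.txt"

# Capture both files: the output is translated, the input is unchanged.
saved=$(cat "$workdir/output.txt")
original=$(cat "$workdir/input.txt")
echo "$saved"      # prints: 1pples 1nd b1n1n1s
echo "$original"   # prints: apples and bananas

# Clean up the temporary directory.
rm -rf "$workdir"
```

Use a different file for the output than for the input. Redirecting to the same file truncates it before tr has read anything.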
Changing the case of text with tr command
One of the most common uses of tr command is in translating text from lowercase to uppercase or vice-versa.
As tr works on sets of characters, we can explicitly mention the set of lowercase characters as set 1 and set of uppercase characters as set 2 to make the switch.
$ echo "apples and bananas" | tr a-z A-Z
Set a-z represents the set of lower case letters and the set A-Z represents the set of uppercase letters.
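Another way of doing the same is:
$ echo "apples and bananas" | tr '[:lower:]' '[:upper:]'
Here, [:lower:] represents the set of lowercase letters and [:upper:] represents the set of uppercase letters. Quoting the character classes keeps the shell from treating the square brackets as a filename pattern.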
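Both forms of case conversion can be compared side by side. In this sketch the variable names are illustrative, and the character classes are quoted so the shell passes them to tr untouched.

```shell
text="apples and bananas"

# Explicit ranges: a-z maps position by position onto A-Z.
upper_range=$(echo "$text" | tr 'a-z' 'A-Z')

# POSIX character classes give the same result for this input.
upper_class=$(echo "$text" | tr '[:lower:]' '[:upper:]')

# Swapping the two sets converts back to lowercase.
lower_again=$(echo "$upper_class" | tr '[:upper:]' '[:lower:]')

echo "$upper_range"   # prints: APPLES AND BANANAS
echo "$upper_class"   # prints: APPLES AND BANANAS
echo "$lower_again"   # prints: apples and bananas
```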
Deleting characters with tr
tr can delete a set of characters from the text. This is done with the -d option.
$ echo "apples and bananas" | tr -d 'n'
This command will eliminate all occurrences of ‘n’ in the text.
To remove several different characters, list them all in one quoted set.
$ echo "apples and bananas" | tr -d 'na'
This command removes every ‘n’ and every ‘a’.
Since tr works at the character level, each individual ‘n’ and ‘a’ is removed. It is easy to assume that the command removes only the sequence ‘na’, but that is not the case.
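The difference between deleting one character and deleting a set is easiest to see in the output. A minimal sketch, with illustrative variable names:

```shell
# Delete a single character.
no_n=$(echo "apples and bananas" | tr -d 'n')

# Delete every character that appears in the set 'na'.
# The set is not a sequence: each 'n' and each 'a' is removed on its own.
no_na=$(echo "apples and bananas" | tr -d 'na')

echo "$no_n"    # prints: apples ad baaas
echo "$no_na"   # prints: pples d bs
```

Note that the leading ‘a’ of “apples” disappears in the second command even though it is not followed by ‘n’.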
Squeeze multiple occurrences into one
Squeezing repeated occurrences into one can be useful to compress text. It is often used to collapse runs of spaces between words.
The -s option tells tr to squeeze each run of a repeated character from the set into a single occurrence.
$ echo "apples and bananas" | tr -s 'p'
The two consecutive ‘p’ characters in “apples” are reduced to a single ‘p’.
$ echo "apples and bananas" | tr -s 'na' '1'
When -s is given two sets, tr first translates and then squeezes. The output of this command is therefore equivalent to first replacing every ‘n’ and ‘a’ with ‘1’ by plain substitution (tr 'na' '1'), and then squeezing the repeated 1’s in that result (tr -s '1').
Both routes produce the same text.
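To collapse consecutive spaces in text, use:
$ echo "apples    and    bananas" | tr -s " "
Alternatively, [:space:] can be used in place of " ". It matches every whitespace character, including tabs and newlines, so it also squeezes runs of blank lines.
$ echo "apples    and    bananas" | tr -s '[:space:]'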
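Squeezing only shows an effect when the input actually contains runs of whitespace, so this sketch uses sample strings with extra spaces and blank lines. The inputs and variable names are made up for the example.

```shell
spaced="apples    and     bananas"

# Squeeze runs of the space character only.
by_space=$(echo "$spaced" | tr -s ' ')

# Squeeze runs of any whitespace character (spaces, tabs, newlines).
by_class=$(echo "$spaced" | tr -s '[:space:]')

# Because newline is whitespace too, blank lines are squeezed away.
no_blank_lines=$(printf 'one\n\n\ntwo\n' | tr -s '[:space:]')

echo "$by_space"         # prints: apples and bananas
echo "$by_class"         # prints: apples and bananas
echo "$no_blank_lines"   # prints "one" and "two" on consecutive lines
```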
Extracting digits out of text
When only a particular set of characters needs to be preserved, it is best to use the -c option, which complements the set.
The complement of a set means everything other than what is in that set. Combined with -d, it deletes every character that is not in the set.
$ echo " Home : 011 1234 4321" | tr -cd '[:digit:]\n'
Including ‘\n’ (newline) in the set is important, as otherwise the output has no trailing newline and runs into the next prompt in the terminal. It also matters when the input has digits on several lines: if the newline characters are deleted, all the numbers run together with nothing separating them. The class and ‘\n’ are written back to back in one quoted string because every character in the set is kept, so a comma placed between them would be preserved as well.
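The effect of keeping or dropping the newline can be seen on a two-line input. In this sketch the sample text and the variable names are assumptions for illustration. The set is passed as one quoted string containing only the digit class and the newline escape.

```shell
# Keep digits and newlines, delete everything else.
digits=$(echo " Home : 011 1234 4321" | tr -cd '[:digit:]\n')
echo "$digits"     # prints: 01112344321

# With newlines kept, each input line still yields its own line of digits.
per_line=$(printf 'Home : 011 1234\nWork : 022 5678\n' | tr -cd '[:digit:]\n')
echo "$per_line"   # prints 0111234 and 0225678 on separate lines

# With newlines deleted too, the numbers from both lines run together.
joined=$(printf 'Home : 011 1234\nWork : 022 5678\n' | tr -cd '[:digit:]')
echo "$joined"     # prints: 01112340225678
```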
Extracting words out of text
This process is the exact opposite of the one performed above. Here we will ignore the digits and focus only on words made up of letters.
$ echo " Home : 011 1234 4321" | tr -d '[:digit:]'
In this example we have simply deleted all the digits from our text. The spaces and the colon are left in place.
A more controlled way to do the same is through the complement, which keeps only letters and newlines.
$ echo " Home : 011 1234 4321" | tr -cd '[:alpha:]\n'
[:alpha:] represents the set of letters. Think of it as the combination of the two sets, lower and upper.
[:alpha:] = [:lower:] + [:upper:]
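The two approaches give different results on the same input, which the following sketch makes visible. Variable names are illustrative. The last command assumes the C locale, where [:alpha:] is exactly the lowercase plus the uppercase letters.

```shell
text=" Home : 011 1234 4321"

# Deleting digits leaves spaces and punctuation behind.
no_digits=$(echo "$text" | tr -d '[:digit:]')

# Keeping only letters and newlines removes everything else.
letters_only=$(echo "$text" | tr -cd '[:alpha:]\n')

# The same result written as lower + upper (C locale assumed).
lower_upper=$(echo "$text" | LC_ALL=C tr -cd '[:lower:][:upper:]\n')

echo "[$no_digits]"      # prints: [ Home :   ]
echo "[$letters_only]"   # prints: [Home]
echo "[$lower_upper]"    # prints: [Home]
```

The square brackets in the echo lines are only there to make the leftover spaces visible.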
Counting number of occurrences of words
Counting how many times a word appears in a text can be useful to build histograms. It is also very useful in building probabilistic models for email spam detection.
First, let’s create a file with some recurring words.
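As a first step, it is useful to display each word of the text on a new line. With -c the set [:alpha:] is complemented, so every character that is not a letter is translated to a newline, and -s squeezes each run of newlines into one.
$ tr -cs "[:alpha:]" "\n" < input.txt
To get the number of occurrences of each word, use: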
$ tr -cs "[:alpha:]" "\n" < input.txt | sort | uniq -c
sort orders the list lexicographically, which places identical words on adjacent lines. This matters because uniq only compares neighboring lines. uniq -c then collapses each run of identical lines and prefixes it with the number of times it occurred.
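The whole pipeline can be tried on a small sample. This is a sketch under stated assumptions: the sample sentence, the temporary file from mktemp, and the variable names are all made up for the example, and LC_ALL=C is set only to make the sort order predictable. The awk step trims the padding that uniq -c puts before each count.

```shell
# Create a temporary file with some recurring words.
sample=$(mktemp)
printf 'the cat and the hat\nand the bat\n' > "$sample"

# One word per line, sorted, then counted. awk normalizes the spacing.
counts=$(tr -cs '[:alpha:]' '\n' < "$sample" | LC_ALL=C sort | uniq -c \
  | awk '{print $1, $2}')
echo "$counts"
# prints:
# 2 and
# 1 bat
# 1 cat
# 1 hat
# 3 the

# A second numeric sort in reverse puts the most frequent word first.
top=$(tr -cs '[:alpha:]' '\n' < "$sample" | LC_ALL=C sort | uniq -c \
  | sort -rn | head -n 1 | awk '{print $2 "=" $1}')
echo "$top"   # prints: the=3

# Remove the temporary file.
rm -f "$sample"
```

Since the sets are case-sensitive, “The” and “the” would be counted separately. Converting the text to lowercase with tr before counting avoids that.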
The tr command is useful for character-based translations. Combined with other commands such as sort and uniq, it becomes considerably more powerful. Read more about tr on its man page. For transformations that work on whole lines or patterns rather than on single characters, the sed command can be used.