Using the tr command in Linux to play with characters

The tr command in Linux translates one set of characters into another. It can replace a character or a set of characters with another character or set of characters. tr reads from standard input and writes to standard output. It does not accept file names as arguments, so input is supplied through a pipe (for example from the echo command) or through redirection from a file.

tr is short for translate.

The general syntax of the tr command is:

$ tr [option] [char_set 1] [char_set 2]

Without options, tr replaces each character in set 1 with the corresponding character in set 2. The options change this behaviour, for example to delete or squeeze the characters in set 1 instead.

Replacing characters

To replace characters with tr, list the characters to be replaced in the first set and the characters that should take their place in the second set.

$ tr 'a' '1'

This command waits for input from STDIN. Each line you type is printed back with every instance of ‘a’ replaced by ‘1’. Press Ctrl+D to end the input.
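
When the sets hold more than one character, tr pairs them by position: the first character of set 1 becomes the first character of set 2, and so on. A minimal sketch of that behaviour, with the text fed through a pipe so it runs without waiting for keyboard input (the sample text and the variable name mapped are illustrative choices):

```shell
# Illustrative only: 'a' maps to '1' and 'n' maps to 'N', paired by position.
mapped=$(printf 'apples and bananas\n' | tr 'an' '1N')
printf '%s\n' "$mapped"   # prints: 1pples 1Nd b1N1N1s
```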

1. Using echo with tr command

The example above reads its input from the keyboard. The echo command can supply the input instead. Use the pipe (|) operator to connect the output of echo to the input of tr.

$ echo "apples and bananas" | tr 'a' '1'
1pples 1nd b1n1n1s

2. Taking input from a file

tr can also take its input from a file. This is useful when the translation has to be applied to a large body of text. The redirection (<) operator feeds the contents of a file to tr.

$ tr 'a' '1' < input.txt
1pples 1nd b1n1n1s

input.txt contains the same text as the example above.

To save the result, use the redirection (>) operator to send the output to a file.

$ tr 'a' '1' < input.txt > output.txt
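
The round trip from one file to another can be sketched as follows. The scratch directory and the variable names are assumptions made for the illustration. Note that redirecting the output to the same file that is being read would truncate it before tr sees any data, so the sketch writes to a separate file.

```shell
# Hypothetical scratch directory so the sketch does not touch existing files.
workdir=$(mktemp -d)
printf 'apples and bananas\n' > "$workdir/input.txt"

# Read from one file, write the translated text to another.
tr 'a' '1' < "$workdir/input.txt" > "$workdir/output.txt"

file_result=$(cat "$workdir/output.txt")
printf '%s\n' "$file_result"   # prints: 1pples 1nd b1n1n1s

rm -r "$workdir"
```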

Changing the case of text with tr command

One of the most common uses of the tr command is converting text from lowercase to uppercase or vice versa.

As tr works on sets of characters, we can explicitly mention the set of lowercase characters as set 1 and set of uppercase characters as set 2 to make the switch.

$ echo "apples and bananas" | tr a-z A-Z

Set a-z represents the set of lower case letters and the set A-Z represents the set of uppercase letters.

Another way of doing the same is:

$ echo "apples and bananas" | tr '[:lower:]' '[:upper:]'

Here, [:lower:] represents the set of lowercase letters and [:upper:] represents the set of uppercase letters. Quoting the classes keeps the shell from treating the square brackets as a filename pattern.
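
Because tr is a filter, the case conversion is easy to wrap in small shell functions. The sketch below uses the hypothetical helper names to_upper and to_lower; they are not standard commands.

```shell
# Hypothetical helpers: each reads standard input and writes standard output.
to_upper() { tr '[:lower:]' '[:upper:]'; }
to_lower() { tr '[:upper:]' '[:lower:]'; }

shout=$(printf 'apples and bananas\n' | to_upper)
quiet=$(printf 'Apples And Bananas\n' | to_lower)
printf '%s\n%s\n' "$shout" "$quiet"
```

The first line printed is APPLES AND BANANAS and the second is apples and bananas.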

Deleting characters with tr

tr can delete a set of characters from the text. This is done with the -d option.

$ echo "apples and bananas" | tr -d 'n'
apples ad baaas

This command removes every occurrence of ‘n’ from the text.

To remove occurrences of multiple characters, list all of them inside the single quotes.

$ echo "apples and bananas" | tr -d 'na'
pples d bs

This command removes every occurrence of ‘n’ and ‘a’.

Since tr works at the character level, every individual occurrence of ‘n’ and of ‘a’ is removed. It is easy to assume that the command only removes ‘na’ where the two letters occur in that sequence, but that is not the case.
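
The difference between deleting characters and deleting a sequence can be seen side by side. This sketch brings in sed for the sequence case purely for contrast; it is a different tool and not part of tr.

```shell
# tr deletes every 'n' and every 'a', wherever they appear.
by_char=$(printf 'apples and bananas\n' | tr -d 'na')

# sed deletes only the two-character sequence "na".
by_sequence=$(printf 'apples and bananas\n' | sed 's/na//g')

printf '%s\n%s\n' "$by_char" "$by_sequence"
```

The first line printed is pples d bs, while the second is apples and bas.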

Squeeze multiple occurrences into one

Squeezing reduces each run of a repeated character to a single occurrence, which is useful for compacting text. It is often used to collapse runs of multiple spaces between words.

The -s option is used with tr to squeeze.

$ echo "apples and bananas" | tr -s 'p'
aples and bananas

The two consecutive occurrences of ‘p’ in “apples” have been reduced to a single occurrence.

When a second set is given, -s translates first and then squeezes the characters of the second set:

$ echo "apples and bananas" | tr -s 'na' '1'
1pples 1d b1s

The output is equivalent to first replacing every ‘n’ and ‘a’ with ‘1’ and then performing a squeeze. (The second set is shorter than the first here; GNU tr pads it by repeating its last character.) For comparison, plain character substitution on its own gives:

$ echo "apples and bananas" | tr 'na' '1'
1pples 11d b11111s

Squeezing all the 1’s in that output produces the same result as the first command:

$ echo "1pples 11d b11111s" | tr -s '1'
1pples 1d b1s

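To collapse runs of consecutive spaces in text, use:

$ echo "apples    and    bananas" | tr -s " "
apples and bananas

Alternatively, [:space:] can be used in place of " ". It covers all whitespace characters, including tabs and newlines, so repeated blank lines are squeezed as well.

$ echo "apples    and    bananas" | tr -s '[:space:]'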
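
A small sketch of how the whitespace class behaves on mixed input. The sample text is made up for the illustration: it contains a run of spaces, a run of tabs, and two blank lines. Each run of the same whitespace character is reduced to one.

```shell
# Illustrative input: four spaces, two tabs, and three consecutive newlines.
squeezed=$(printf 'apples    and\t\tbananas\n\n\nkiwis\n' | tr -s '[:space:]')
printf '%s\n' "$squeezed"
```

The result has a single space after apples, a single tab after and, and no blank lines before kiwis.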

Extracting digits out of text

When only a particular set of characters needs to be preserved, it’s best to use the -c option, which complements the set.

The complement of a set is everything that is not in that set. Combined with -d, it deletes every character except the ones listed.

$ echo " Home : 011 1234 4321" | tr -cd '[:digit:]\n'
01112344321

Including ‘\n’ (newline) in the set is important; otherwise the output has no trailing newline and runs into the next prompt in the terminal. Another reason to keep newlines when deleting characters is that the input may contain numbers on several lines. If the newline character is deleted, all the numbers run together with nothing separating them.
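
The effect of keeping or dropping the newline shows up clearly on input with more than one line. The two-line sample below is hypothetical.

```shell
# Hypothetical two-line input; keeping '\n' in the set preserves the line breaks.
digits=$(printf 'Home : 011 1234 4321\nWork : 022 5678 8765\n' | tr -cd '[:digit:]\n')
printf '%s\n' "$digits"

# Without '\n' in the set, both numbers run together on one line.
joined=$(printf 'Home : 011 1234 4321\nWork : 022 5678 8765\n' | tr -cd '[:digit:]')
printf '%s\n' "$joined"
```

The first command prints 01112344321 and 02256788765 on separate lines; the second prints 0111234432102256788765.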

Extracting words out of text

This process is the reverse of the one performed above. Here we drop the digits and keep only the words made up of letters.

$ echo " Home : 011 1234 4321" | tr -d '[:digit:]'

In this example we have simply deleted all the digits from our text. The spaces and the colon are left in place.

A stricter way to do this is through the complement, which keeps only letters and newlines and therefore also drops the spaces and the colon.

$ echo " Home : 011 1234 4321" | tr -cd '[:alpha:]\n'
Home

[:alpha:] represents the set of letters. Think of it as the union of the two sets, lower and upper.

[:alpha:] = [:lower:] + [:upper:]
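
This relationship can be checked by deleting with each form and comparing the results. The sketch sets LC_ALL=C so that the classes cover exactly the ASCII letters; that setting is an assumption of the example, and other locales may define the classes more broadly.

```shell
# LC_ALL=C pins the classes to the ASCII letters for this comparison.
with_alpha=$(printf 'Home : 011 1234 4321\n' | LC_ALL=C tr -d '[:alpha:]')
with_both=$(printf 'Home : 011 1234 4321\n' | LC_ALL=C tr -d '[:lower:][:upper:]')

# Brackets make the leading space in each result visible.
printf '[%s]\n[%s]\n' "$with_alpha" "$with_both"
```

Both commands remove the word Home and leave the rest of the line untouched.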

Counting number of occurrences of words

Counting how many times a word appears in a text can be useful to build histograms. It is also very useful in building probabilistic models for email spam detection.

First, create a file named input.txt with some recurring words.

Sometimes it can be useful to display each word of the text on a new line. Here -c selects every character that is not a letter, each of which is translated to a newline, and -s squeezes each run of newlines into one.

$ tr -cs "[:alpha:]" "\n" < input.txt

To get the number of occurrences of each word, use:

$ tr -cs "[:alpha:]" "\n" < input.txt | sort | uniq -c

sort orders the list lexicographically, which places identical words on adjacent lines. uniq -c then collapses each run of identical lines and prefixes it with its count, so the result is a list of words with the number of times each one occurs.
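
The pipeline can be extended to ignore case and to list the most frequent words first. The sketch below is one possible variation, not the article's own command: the sample sentence is hypothetical, and awk is added only to normalise the padding that uniq -c puts in front of each count.

```shell
# Hypothetical sample sentence; the contents of input.txt are not reproduced here.
sample='the cat and the hat. The cat sat!'

# Steps, in order:
#   1. fold uppercase to lowercase so "The" and "the" count as one word
#   2. put each word on its own line
#   3. sort and count each distinct word
#   4. strip the leading padding from the counts
#   5. order by count (highest first), then alphabetically
counts=$(printf '%s\n' "$sample" |
  tr '[:upper:]' '[:lower:]' |
  tr -cs '[:alpha:]' '\n' |
  sort | uniq -c |
  awk '{print $1, $2}' |
  sort -k1,1nr -k2,2)

printf '%s\n' "$counts"
```

For this sample the list starts with 3 the, followed by 2 cat, and then the three words that occur once.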

Conclusion

The tr command is useful for performing character-based translations. Combined with other commands such as sort and uniq, it becomes very powerful. Read more about tr on its man page. For transformations that work on whole lines or patterns rather than single characters, the sed command can be used.
