Google Search from Java Program Example

Filed Under: Java

Some time back I was looking for a way to search Google from a Java program. I was surprised to find that Google once had a web search API, but it was deprecated long ago, and there is now no standard way to achieve this.

Basically, a Google search is just an HTTP GET request with the query passed as a URL parameter, and we have already seen different options such as Java HttpURLConnection or Apache HttpClient for making such a request. The harder part is parsing the HTML response and extracting the useful information from it. That's why I chose jsoup, an open source HTML parser that can also fetch the HTML from a given URL.
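
For comparison, here is a minimal sketch that only fetches the raw search HTML with plain HttpURLConnection and does no parsing at all; the hard-coded query and the Mozilla/5.0 User-Agent value are illustrative and simply mirror the jsoup example below.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class RawGoogleFetch {

	public static void main(String[] args) throws Exception {
		// build the search URL; the query string must be URL-encoded
		String query = URLEncoder.encode("journaldev", "UTF-8");
		URL url = new URL("https://www.google.com/search?q=" + query + "&num=10");

		HttpURLConnection conn = (HttpURLConnection) url.openConnection();
		// without a browser-like User-Agent, Google responds with a 403 error
		conn.setRequestProperty("User-Agent", "Mozilla/5.0");

		// print the raw HTML; this is what jsoup will parse for us below
		try (BufferedReader reader = new BufferedReader(
				new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
			String line;
			while ((line = reader.readLine()) != null) {
				System.out.println(line);
			}
		}
	}
}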

So below is a simple program that fetches Google search results from a Java program and then parses the response to extract the search results. You will need the jsoup library (org.jsoup:jsoup) on your classpath to compile and run it.


package com.journaldev.jsoup;

import java.io.IOException;
import java.net.URLEncoder;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GoogleSearchJava {

	public static final String GOOGLE_SEARCH_URL = "https://www.google.com/search";
	public static void main(String[] args) throws IOException {
		//Taking search term input from console
		Scanner scanner = new Scanner(System.in);
		System.out.println("Please enter the search term.");
		String searchTerm = scanner.nextLine();
		System.out.println("Please enter the number of results. Example: 5 10 20");
		int num = scanner.nextInt();
		scanner.close();
		
		//URL-encode the search term so that spaces and special characters don't break the URL
		String searchURL = GOOGLE_SEARCH_URL + "?q=" + URLEncoder.encode(searchTerm, "UTF-8") + "&num=" + num;
		//without proper User-Agent, we will get 403 error
		Document doc = Jsoup.connect(searchURL).userAgent("Mozilla/5.0").get();
		
		//below will print HTML data, save it to a file and open in browser to compare
		//System.out.println(doc.html());
		
		//If Google changes the results HTML, e.g. <h3 class="r"> to <h3 class="r1">,
		//the selector below needs to be updated accordingly
		Elements results = doc.select("h3.r > a");

		for (Element result : results) {
			String linkHref = result.attr("href");
			String linkText = result.text();
			//the href looks like /url?q=<target-url>&sa=..., so strip the leading "/url?q="
			System.out.println("Text::" + linkText + ", URL::" + linkHref.substring(7, linkHref.indexOf("&")));
		}
	}

}

Below is a sample output from the above program. I also saved the HTML response to a file and opened it in a browser to confirm that the parsed results match what the search page actually shows.



Please enter the search term.
journaldev
Please enter the number of results. Example: 5 10 20
20
Text::JournalDev, URL::https://www.journaldev.com/
Text::Java Interview Questions, URL::https://www.journaldev.com/java-interview-questions
Text::Java design patterns, URL::https://www.journaldev.com/tag/java-design-patterns
Text::Tutorials, URL::https://www.journaldev.com/tutorials
Text::Java servlet, URL::https://www.journaldev.com/tag/java-servlet
Text::Spring Framework Tutorial ..., URL::https://www.journaldev.com/2888/spring-tutorial-spring-core-tutorial
Text::Java Design Patterns PDF ..., URL::https://www.journaldev.com/6308/java-design-patterns-pdf-ebook-free-download-130-pages
Text::Pankaj Kumar (@JournalDev) | Twitter, URL::https://twitter.com/journaldev
Text::JournalDev | Facebook, URL::https://www.facebook.com/JournalDev
Text::JournalDev - Chrome Web Store - Google, URL::https://chrome.google.com/webstore/detail/journaldev/ckdhakodkbphniaehlpackbmhbgfmekf
Text::Debian -- Details of package libsystemd-journal-dev in wheezy, URL::https://packages.debian.org/wheezy/libsystemd-journal-dev
Text::Debian -- Details of package libsystemd-journal-dev in wheezy ..., URL::https://packages.debian.org/wheezy-backports/libsystemd-journal-dev
Text::Debian -- Details of package libsystemd-journal-dev in sid, URL::https://packages.debian.org/sid/libsystemd-journal-dev
Text::Debian -- Details of package libsystemd-journal-dev in jessie, URL::https://packages.debian.org/jessie/libsystemd-journal-dev
Text::Ubuntu – Details of package libsystemd-journal-dev in trusty, URL::http://packages.ubuntu.com/trusty/libsystemd-journal-dev
Text::libsystemd-journal-dev : Utopic (14.10) : Ubuntu - Launchpad, URL::https://launchpad.net/ubuntu/utopic/%2Bpackage/libsystemd-journal-dev
Text::Debian -- Details of package libghc-libsystemd-journal-dev in jessie, URL::https://packages.debian.org/jessie/libghc-libsystemd-journal-dev
Text::Advertise on JournalDev | BuySellAds, URL::https://buysellads.com/buy/detail/231824
Text::JournalDev | LinkedIn, URL::https://www.linkedin.com/groups/JournalDev-6748558
Text::How to install libsystemd-journal-dev package in Ubuntu Trusty, URL::http://www.howtoinstall.co/en/ubuntu/trusty/main/libsystemd-journal-dev/
Text::[global] auth supported = cephx ms bind ipv6 = true [mon] mon data ..., URL::http://zooi.widodh.nl/ceph/ceph.conf
Text::UbuntuUpdates - Package "libsystemd-journal-dev" (trusty 14.04), URL::http://www.ubuntuupdates.org/libsystemd-journal-dev
Text::[Journal]Dev'err - Cursus Honorum - Enjin, URL::http://cursushonorum.enjin.com/holonet/m/23958869/viewthread/13220130-journaldeverr/post/last

That's all for Google search from a Java program. Use it cautiously: if Google sees unusual traffic from your computer, chances are it will start blocking your requests.
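
If you want to run several queries in one go (a question that also comes up in the comments below), a minimal sketch is shown here; it simply loops over the terms and pauses between requests so the traffic looks less unusual. The term list and the one-second delay are illustrative assumptions, not part of the original program.

import java.io.IOException;
import java.net.URLEncoder;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class GoogleMultiSearch {

	public static void main(String[] args) throws IOException, InterruptedException {
		String[] searchTerms = { "journaldev", "jsoup tutorial" }; //illustrative search terms
		for (String searchTerm : searchTerms) {
			String searchURL = "https://www.google.com/search?q="
					+ URLEncoder.encode(searchTerm, "UTF-8") + "&num=10";
			//same fetch and parse logic as the main program above
			Document doc = Jsoup.connect(searchURL).userAgent("Mozilla/5.0").get();
			for (Element result : doc.select("h3.r > a")) {
				System.out.println(searchTerm + " -> " + result.text());
			}
			//pause between requests to reduce the chance of being blocked
			Thread.sleep(1000);
		}
	}
}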

Comments

  1. Sachin says:

    getting error org.jsoup package does not exist.

  2. Zohidbek says:

    How can I treat a leap year program in java? Please, share your ideas!

  3. Aidan says:

    How can I get this program to only return image URLs? When I change the GOOGLE_SEARCH_URL to https://www.google.com/images , nothing prints out at the end of the program. My knowledge of Java is very basic, so if anyone can help me out, thank you very much!

  4. Manish Mandal says:

    We are not getting the full result HTML that we normally see in the browser. None of the sponsored links appear in the result.

  5. anuja tatpuje says:

    I am getting this exception
    Exception in thread "main" java.io.IOException: 400 error loading URL https://www.google.com/search?q=relaxed thoughts&num=2

  6. Star Apple says:

    Hi,
    Thanks for the tutorial. I have a followup question.
    What if I want to search multiple keywords in one go in the same way?
    Please help !

  7. André Vilela says:

    It's easier if you just open the URL directly in the default browser

    if (Desktop.isDesktopSupported()) {
        try {
            Desktop.getDesktop().browse(new URI(searchURL));
        } catch (IOException e1) {
            e1.printStackTrace();
        } catch (URISyntaxException e1) {
            e1.printStackTrace();
        }
    }

    and you don’t need the number of results, just the searchTerm

    String searchURL = GOOGLE_SEARCH_URL + "?q=" + searchTerm;

  8. Alex says:

    Thanks amigo!

  9. sasmi samantaray says:

    sir, please tell me how I can write code to insert a video into a project using JSP and a Java program…

  10. De Saha says:

    I am unable to compile since the computer says “package org.jsoup does not exist”
    what do I do?

  11. BASANT KUMAR says:

    Hi,

    How do I get the content for a particular search? In other words, I need the URL, the title and the body of the content as well;
    in the above example I can only get the title and the URL for a particular search.

    please reply as soon as possible.

    Thanks

    1. Rajinder says:

      To get the body of the content you can use the following code:

      Elements results = doc.select("span.st");

      for (Element result : results) {
          String linkText = result.text();
          System.out.println("Text::" + linkText);
      }

      1. Shama says:

        hello Rajinder Sir

        Using the above snippet I am only able to retrieve one line of the page. How can I retrieve the full text of the page whose URL is being generated? Reply asap.

        Thankuuu

        1. Rajinder says:

          Hi Shama,

          By using the following snippet you will be able to retrieve the text of the document as HTML.

          System.out.println(doc.body());
          or System.out.println(doc.toString());

          Thanks

          1. Santosh says:

            Thanks a lot Rajinder Kaur Mam.. your comment helped me to extract html content 🙂

  12. Ps17 says:

    I am getting an Exception in thread "main" java.net.ConnectException: Connection refused: connect

    here Document doc = Jsoup.connect(searchURL).userAgent("Mozilla/5.0").get();

    1. Rajinder says:

      Check your installed Mozilla version, or you can try the following statement:

      Document doc = Jsoup.connect(searchURL).userAgent("Mozilla").get();

  13. Vijay says:

    Thanks Pankaj

  14. ozan says:

    How could I get snippets under links as well?

  15. shama says:

    thankuu its really helpful for me.

  16. Shashank Makkar says:

    Hi Pankaj,

    Thanks for sharing this. But I am in a dilemma whether to use their interface like "/search" or not, as according to Google it's considered illegal.
    I have also checked their robots.txt file: www.google.com/robots.txt
    The /search interface is not allowed there.

    So if I hit this interface 10 million times from my Java program, it will definitely create network congestion for Google (particularly on this exposed interface) and a problem for me, won't it?

    But before that, please assist me with a screen scraping activity I am doing.
    I am trying to fetch the data Google shows up front, for example word meanings, using jsoup:

    https://www.google.in/?gws_rd=ssl$#q=pretend+meaning

    Thanks in Anticipation
