Web Crawling and Data Mining with Apache Nutch: Perform web crawling and apply data mining in your application, by Zakir Laliwala

Synopsis

Apache Nutch is a robust and scalable web crawler, and it can be integrated with the scripting language Python. You can use it whenever your application handles very large volumes of data that you want to crawl.

This book introduces Apache Nutch and its installation, and guides you through crawling, parsing, and creating plugins with Apache Nutch. It starts with the basics of installing Apache Nutch and gradually moves on to crawling a website and creating your own plugin.

Contents

  1. Introduction to Apache Nutch
  2. Installing and configuring Apache Nutch
  3. Crawling your website using the crawl script 
  4. Crawling the Web, the CrawlDb, and URL filters
  5. Parsing and parse filters
  6. The Apache Nutch plugin
  7. Understanding the Nutch Plugin architecture
  8. Deployment, Sharding, and AJAX Solr with Apache Nutch
  9. Deployment of Apache Solr
  10. Sharding using Apache Solr
  11. Working with AJAX Solr
  12. Integration of Apache Nutch with Apache Hadoop and Eclipse
  13. Integrating Apache Nutch with Apache Hadoop
  14. Configuring Apache Nutch with Eclipse
  15. Apache Nutch with Gora, Accumulo, and MySQL
  16. Introduction to Apache Accumulo
  17. Introduction to Apache Gora
  18. Use of Apache Gora
  19. Integration of Apache Nutch with Apache Accumulo
  20. Integration of Apache Nutch with MySQL


