Using a Hadoop to MongoDB pipeline

An alternative to using the MongoDB Connector for Hadoop is to use the programming language of our choice to export data from Hadoop, and then write into MongoDB using the low-level driver or an ODM, as described in previous chapters.

For example, in Ruby, there are a few options:

  • The webhdfs gem (available on GitHub), which uses the WebHDFS or the HttpFS Hadoop API to fetch data from HDFS
  • System calls, using the Hadoop command-line tool and Ruby's system() call

Whereas in Python, we can use the following:

  • HdfsCLI, which uses the WebHDFS or the HttpFS Hadoop API
  • libhdfs, a JNI-based native C library that wraps the HDFS Java client

All of these options require an intermediate server between our Hadoop infrastructure and our MongoDB server. In return, they give us more flexibility in the ETL process of exporting and importing data, because we can transform the data in our own code before writing it to MongoDB.
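As a minimal sketch of such a pipeline in Python, the following script would run on the intermediate server. It reads a file from HDFS through HdfsCLI and writes its contents into MongoDB in batches using PyMongo. It assumes that the file holds one JSON document per line. The function names, the batch size, and every URL, user, path, database, and collection name are hypothetical placeholders, not values taken from a real deployment.

```python
import json
from typing import Iterable, Iterator, List


def parse_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Turn newline-delimited JSON text into dicts, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def export_to_mongodb(hdfs_url: str, hdfs_path: str, mongo_uri: str,
                      db_name: str, coll_name: str,
                      batch_size: int = 1000) -> int:
    """Copy one newline-delimited JSON file from HDFS into a collection."""
    # Imported here so that the helpers above work without these packages.
    from hdfs import InsecureClient   # HdfsCLI (pip install hdfs)
    from pymongo import MongoClient   # PyMongo (pip install pymongo)

    # Assumption: an unsecured cluster and an HDFS user named "hadoop".
    hdfs_client = InsecureClient(hdfs_url, user="hadoop")
    collection = MongoClient(mongo_uri)[db_name][coll_name]

    inserted = 0
    # With an encoding and a delimiter, read() yields the file line by line.
    with hdfs_client.read(hdfs_path, encoding="utf-8", delimiter="\n") as reader:
        for batch in batched(parse_json_lines(reader), batch_size):
            result = collection.insert_many(batch, ordered=False)
            inserted += len(result.inserted_ids)
    return inserted


if __name__ == "__main__":
    # Hypothetical endpoints; replace them with the real ones.
    count = export_to_mongodb(
        hdfs_url="http://namenode.example.com:9870",
        hdfs_path="/data/exports/orders.json",
        mongo_uri="mongodb://localhost:27017",
        db_name="mongo_book",
        coll_name="orders",
    )
    print(f"Inserted {count} documents")
```

Keeping the parsing and batching steps as separate functions is where the extra ETL flexibility comes in: any filtering or reshaping of the documents can be added between reading from HDFS and calling insert_many, without touching either end of the pipeline.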
