Appendix

Useful Reading

No matter how hard the authors tried, it is virtually impossible to cover the entire Hadoop ecosystem in a single book. This appendix provides additional reading recommendations that you might find useful, organized by the main topics covered in the book.

STORING AND ACCESSING HADOOP DATA

“Apache HBase Book.” http://hbase.apache.org/book.html.

“Bloom Filter.” http://en.wikipedia.org/wiki/Bloom_filter.

“BloomMapFile — Fail-Fast Version of MapFile for Sparsely Populated Key Space.” https://issues.apache.org/jira/browse/HADOOP-3063.

Borthakur, Dhruba. “Hadoop AvatarNode High Availability.” http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html.

Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; and Gruber, Robert E. “BigTable: A Distributed Storage System for Structured Data.” http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf.

Chen, Yanpei; Ganapathi, Archana Sulochana; and Katz, Randy H. “To Compress or not to Compress — Compute vs. I/O Tradeoffs for MapReduce Energy Efficiency.” http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-36.pdf.

Dikant, Peter. “Storing Log Messages in Hadoop.” http://blog.mgm-tp.com/2010/04/hadoop-log-management-part2/.

Dimiduk, Nick, and Khurana, Amandeep. HBase in Action (Shelter Island, NY: Manning Publications, 2012). http://www.amazon.com/HBase-Action-Nick-Dimiduk/dp/1617290521/.

George, Lars. HBase: The Definitive Guide (Sebastopol, CA: O’Reilly Media, 2011). http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100.

Ghemawat, Sanjay; Gobioff, Howard; and Leung, Shun-Tak. “The Google File System.” http://www.cs.brown.edu/courses/cs295-11/2006/gfs.pdf.

“HDFS Architecture Guide.” http://hadoop.apache.org/docs/stable/hdfs_design.html.

“HDFS High Availability with NFS.” http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html.

“HDFS High Availability Using the Quorum Journal Manager.” http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html.

Radia, Sanjay. “HA Namenode for HDFS with Hadoop 1.0.” http://hortonworks.com/blog/ha-namenode-for-hdfs-with-hadoop-1-0-part-1/.

“Simple Example to Read and Write Files from Hadoop DFS.” http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample.

Srinivas, Suresh. “An Introduction to HDFS Federation.” http://hortonworks.com/blog/an-introduction-to-hdfs-federation/.

“The Hadoop Distributed File System.” http://developer.yahoo.com/hadoop/tutorial/module2.html.

White, Tom. Hadoop: The Definitive Guide (Sebastopol, CA: O’Reilly Media, 2012). http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520/.

White, Tom. “HDFS Reliability.” http://www.cloudera.com/wp-content/uploads/2010/03/HDFS_Reliability.pdf.

Zuanich, Jon. “Hadoop I/O: Sequence, Map, Set, Array, BloomMap Files.” http://www.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/.

MAPREDUCE

Adjiman, Philippe. “Hadoop Tutorial Series, Issue #4: To Use or not to Use a Combiner.” http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.

“Apache Hadoop NextGen MapReduce (YARN).” http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.

Blomo, Jim. “Exploring Hadoop OutputFormat.” http://www.infoq.com/articles/HadoopOutputFormat.

Brumitt, Barry. “MapReduce Design Patterns.” http://www.cs.washington.edu/education/courses/cse490h/11wi/CSE490H_files/mapr-design.pdf.

“C++ Word Count.” http://wiki.apache.org/hadoop/C%2B%2BWordCount.

Cohen, Jonathan. “Graph Twiddling in a MapReduce World.” http://www.adjoint-functors.net/su/web/354/references/graph-processing-w-mapreduce.pdf.

“Configuring Eclipse for Hadoop Development (a Screencast).” http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/.

Dean, Jeffrey, and Ghemawat, Sanjay. “MapReduce: Simplified Data Processing on Large Clusters.” http://www.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf.

Ghosh, Pranab. “Map Reduce Secondary Sort Does it All.” http://pkghosh.wordpress.com/2011/04/13/map-reduce-secondary-sort-does-it-all/.

Grigorik, Ilya. “Easy Map-Reduce with Hadoop Streaming.” http://www.igvita.com/2009/06/01/easy-map-reduce-with-hadoop-streaming/.

“Hadoop MapReduce Next Generation — Writing YARN Applications.” http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html.

“Hadoop Tutorial.” http://archive.cloudera.com/cdh/3/hadoop/mapred_tutorial.html#Partitioner.

“How to Include Third-Party Libraries in Your Map-Reduce Job.” http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/.

Katsov, Ilya. “MapReduce Patterns, Algorithms, and Use Cases.” http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/.

Lin, Jimmy, and Dyer, Chris. Data-Intensive Text Processing with MapReduce (San Francisco: Morgan & Claypool, 2010). http://www.amazon.com/Data-Intensive-Processing-MapReduce-Synthesis-Technologies/dp/1608453421.

Mamtani, Vinod. “Design Patterns in Map-Reduce.” http://nimbledais.com/?p=66.

MapReduce website. http://www.mapreduce.org/.

Mathew, Ashwin J. “Design Patterns in the Wild.” http://courses.ischool.berkeley.edu/i290-1/s08/presentations/Day6.pdf.

Murthy, Arun C. “Apache Hadoop: Best Practices and Anti-Patterns.” http://developer.yahoo.com/blogs/hadoop/posts/2010/08/apache_hadoop_best_practices_a/.

Murthy, Arun C.; Douglas, Chris; Konar, Mahadev; O’Malley, Owen; Radia, Sanjay; Agarwal, Sharad; and Vinod K V. “Architecture of Next Generation Apache Hadoop MapReduce Framework.” https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf.

Noll, Michael G. “Writing an Hadoop MapReduce Program in Python.” http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/.

Owen, Sean; Anil, Robin; Dunning, Ted; and Friedman, Ellen. Mahout in Action (Shelter Island, NY: Manning Publications, 2011). http://www.amazon.com/Mahout-Action-Sean-Owen/dp/1935182684/ref=sr_1_1?s=books&ie=UTF8&qid=1327246973&sr=1-1.

Rehman, Shuja. “XML Processing in Hadoop.” http://xmlandhadoop.blogspot.com/.

Riccomini, Chris. “Tutorial: Sort Reducer Input Values in Hadoop.” http://riccomini.name/posts/hadoop/2009-11-13-sort-reducer-input-value-hadoop/.

Shewchuk, Jonathan Richard. “An Introduction to the Conjugate Gradient Method Without the Agonizing Pain.” http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf.

“Splunk App for HadoopOps.” http://www.splunk.com/web_assets/pdfs/secure/Splunk_for_HadoopOps.pdf.

Thiebaut, Dominique. “Hadoop Tutorial 2.2 — Running C++ Programs on Hadoop.” http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop.

“When to Use a Combiner.” http://lucene.472066.n3.nabble.com/When-to-use-a-combiner-td3685452.html.

Winkels, Maarten. “Thinking MapReduce with Hadoop.” http://blog.xebia.com/2009/07/02/thinking-mapreduce-with-hadoop/.

“Working with Hadoop under Eclipse.” http://wiki.apache.org/hadoop/EclipseEnvironment.

“Hadoop Streaming with Ruby and Wukong.” http://labs.paradigmatecnologico.com/2011/04/29/howto-hadoop-streaming-with-ruby-and-wukong/.

“Yahoo! Hadoop Tutorial.” http://developer.yahoo.com/hadoop/tutorial/.

Zaharia, Matei; Borthakur, Dhruba; Sarma, Joydeep Sen; Elmeleegy, Khaled; Shenker, Scott; and Stoica, Ion. “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling.” http://www.cs.berkeley.edu/~matei/papers/2010/eurosys_delay_scheduling.pdf.

OOZIE

“Oozie Bundle Specification.” http://oozie.apache.org/docs/3.1.3-incubating/BundleFunctionalSpec.html.

“Oozie Client javadocs.” http://archive.cloudera.com/cdh/3/oozie/client/apidocs/index.html.

“Oozie Command Line Utility.” http://rvs.github.io/oozie/releases/1.6.0/DG_CommandLineTool.html.

“Oozie Coordinator Specification.” http://archive.cloudera.com/cdh/3/oozie/CoordinatorFunctionalSpec.html.

“Oozie Custom Action Nodes.” http://oozie.apache.org/docs/3.3.0/DG_CustomActionExecutor.html.

“Oozie Source Code.” https://github.com/apache/oozie.

“Oozie Specification, a Hadoop Workflow System.” http://oozie.apache.org/.

“Oozie Web Services APIs.” http://archive.cloudera.com/cdh4/cdh/4/oozie/WebServicesAPI.html.

“xjc Binding Compiler.” http://docs.oracle.com/javase/6/docs/technotes/tools/share/xjc.html.

REAL-TIME HADOOP

“Actors Model.” http://c2.com/cgi/wiki?ActorsModel.

“Add Search to HBASE.” https://issues.apache.org/jira/browse/HBASE-3529.

“Apache Solr.” http://lucene.apache.org/solr/.

Bienvenido, David, III. “Twitter Storm: Open Source Real-Time Hadoop.” http://www.infoq.com/news/2011/09/twitter-storm-real-time-hadoop.

Borthakur, Dhruba; Muthukkaruppan, Kannan; Ranganathan, Karthik; Rash, Samuel; Sarma, Joydeep Sen; Spiegelberg, Nicolas; Molkov, Dmytro; Schmidt, Rodrigo; Gray, Jonathan; Kuang, Hairong; Menon, Aravind; and Aiyer, Amitanand. “Apache Hadoop Goes Realtime at Facebook.” http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf.

“Cassandra.” http://cassandra.apache.org/.

Haller, Mike. “Spatial Search with Lucene.” http://www.mhaller.de/archives/156-Spatial-search-with-Lucene.html.

“HBase Avro Server.” http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/avro/AvroServer.HBaseImpl.html.

“HBasene.” https://github.com/akkumar/hbasene.

“HBasePS.” https://github.com/sentric/HBasePS.

“HStreaming.” http://www.hstreaming.com/.

Ingersoll, Grant. “Location-Aware Search with Apache Lucene and Solr.” http://www.ibm.com/developerworks/opensource/library/j-spatial/.

Kumar, Animesh. “Apache Lucene and Cassandra.” http://anismiles.wordpress.com/2010/05/19/apache-lucene-and-cassandra/.

Kumar, Animesh. “Lucandra — An Inside Story!” http://anismiles.wordpress.com/2010/05/27/lucandra-an-inside-story/.

Lawson, Loraine. “Exploring Hadoop’s Real-Time Potential.” http://www.itbusinessedge.com/cm/blogs/lawson/exploring-hadoops-real-time-potential/?cs=49692.

“Local Lucene Geographical Search.” http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html.

“Lucandra.” https://github.com/tjake/Lucandra.

Marz, Nathan. “A Storm Is Coming: More Details and Plans for Release.” http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html.

Marz, Nathan. “Preview of Storm: The Hadoop of Realtime Processing.” https://www.memonic.com/user/pneff/folder/queue/id/1qSgf.

McCandless, Michael; Hatcher, Erik; and Gospodnetic, Otis. Lucene in Action, Second Edition (Shelter Island, NY: Manning Publications, 2010). http://www.amazon.com/Lucene-Action-Second-Covers-Apache/dp/1933988177/ref=sr_1_1?ie=UTF8&qid=1292717735&sr=8-1.

“OpenTSDB.” http://opentsdb.net/.

“Powered by Lucene.” http://wiki.apache.org/lucene-java/PoweredBy.

“Stargate.” http://wiki.apache.org/hadoop/Hbase/Stargate.

“Thrift APIs.” http://wiki.apache.org/hadoop/Hbase/ThriftApi.

AWS

“Amazon CloudWatch.” http://aws.amazon.com/cloudwatch/.

“Amazon Elastic MapReduce.” http://aws.amazon.com/elasticmapreduce/.

“Amazon Simple Storage Service.” http://aws.amazon.com/s3/.

“Amazon Simple Workflow Service.” http://aws.amazon.com/swf/.

“Apache Whirr.” http://whirr.apache.org/.

“AWS Data Pipeline.” http://aws.amazon.com/datapipeline/.

“How-to: Set Up an Apache Hadoop/Apache HBase Cluster on EC2.” http://blog.cloudera.com/blog/2012/10/set-up-a-hadoophbase-cluster-on-ec2-in-about-an-hour/.

Linton, Rob. Amazon Web Services: Migrating Your .NET Enterprise Application (Olton, Birmingham, United Kingdom: Packt Publishing, 2011). http://www.amazon.com/Amazon-Web-Services-Enterprise-Application/dp/1849681945.

“What Are the Advantages of Amazon EMR, Vs. Your Own EC2 Instances, Vs. Running Hadoop Locally?” http://www.quora.com/What-are-the-advantages-of-Amazon-EMR-vs-your-own-EC2-instances-vs-running-Hadoop-locally (Quora account required).

HADOOP DSLS

“Apache Hama.” http://hama.apache.org/.

Capriolo, Edward; Wampler, Dean; and Rutherglen, Jason. Programming Hive (Sebastopol, CA: O’Reilly Media, 2012). http://www.amazon.com/Programming-Hive-Edward-Capriolo/dp/1449319335/ref=sr_1_1?s=books&ie=UTF8&qid=1368408335&sr=1-1&keywords=hive.

“Cascading/CoPA.” https://github.com/Cascading/CoPA.

“Cascading Lingual.” http://www.cascading.org/lingual/.

“Cascading Pattern.” http://www.cascading.org/pattern/.

Cascading website. http://www.cascading.org/.

Cascalog website. https://github.com/nathanmarz/cascalog.

Crunch website. https://github.com/cloudera/crunch/tree/master/scrunch.

Czajkowski, Grzegorz. “Large-Scale Graph Computing at Google.” http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html.

“Domain Specific Language.” http://c2.com/cgi/wiki?DomainSpecificLanguage.

Gates, Alan. Programming Pig (Sebastopol, CA: O’Reilly Media, 2011). http://www.amazon.com/Programming-Pig-Alan-Gates/dp/1449302645/ref=sr_1_1?ie=UTF8&qid=1375109835&sr=8-1&keywords=Gates%2C+Alan.+Programming+Pig.

“Introduction to Apache Crunch.” http://crunch.apache.org/intro.html.

Fowler, Martin. Domain-Specific Languages (Boston: Addison-Wesley, 2010). http://www.amazon.com/Domain-Specific-Languages-Addison-Wesley-Signature-Fowler/dp/0321712943.

Scalding website. https://github.com/twitter/scalding.

“Welcome to Apache Giraph!” http://giraph.apache.org/.

“What Are the Differences between Crunch and Cascading?” http://www.quora.com/Apache-Hadoop/What-are-the-differences-between-Crunch-and-Cascading.

Wills, Josh. “Apache Crunch: A Java Library for Easier MapReduce Programming.” http://www.infoq.com/articles/ApacheCrunch.

HADOOP AND BIG DATA SECURITY

“Accumulo User Manual — Security.” http://accumulo.apache.org/1.4/user_manual/Security.html.

“Apache Accumulo.” http://accumulo.apache.org/.

“Authentication for Hadoop Web-Based Consoles.” http://hadoop.apache.org/docs/stable/HttpAuthentication.html.

Becherer, Andrew. “Hadoop Security Design – Just Add Kerberos? Really?” https://media.blackhat.com/bh-us-10/whitepapers/Becherer/BlackHat-USA-2010-Becherer-Andrew-Hadoop-Security-wp.pdf.

Dwork, Cynthia. “Differential Privacy”, from 33rd International Colloquium on Automata, Languages, and Programming, Part II (ICALP 2006) (Springer Verlag, 2007), available at http://research.microsoft.com/apps/pubs/default.aspx?id=64346.

“Hadoop Service Level Authorization Guide.” http://hadoop.apache.org/docs/stable/service_level_auth.html.

“HDFS Permissions Guide.” http://hadoop.apache.org/docs/stable/hdfs_permissions_guide.html.

IETF. “Simple Authentication and Security Layer (SASL).” http://www.ietf.org/rfc/rfc2222.txt.

IETF. “The Kerberos Version 5 Generic Service Application Program Interface (GSS-API) Mechanism: Version 2.” http://tools.ietf.org/html/rfc4121.

IETF. “The Simple and Protected GSS-API Negotiation (SPNEGO) Mechanism.” http://tools.ietf.org/html/rfc4178.

“Kerberos: The Network Authentication Protocol.” http://web.mit.edu/kerberos/.

Narayanan, Arvind, and Shmatikov, Vitaly. “Robust De-Anonymization of Large Sparse Datasets.” http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf.

O’Malley, Owen; Zhang, Kan; Radia, Sanjay; Marti, Ram; and Harrell, Christopher. “Hadoop Security Design”, October 2009, available at https://issues.apache.org/jira/secure/attachment/12428537/security-design.pdf.

“Project Rhino.” https://github.com/intel-hadoop/project-rhino/.

“Security Features for Hadoop”, JIRA HADOOP-4487, https://issues.apache.org/jira/browse/HADOOP-4487.

Williams, Alex. “Intel Releases Hadoop Distribution and Project Rhino — An Effort to Bring Better Security to Big Data.” http://techcrunch.com/2013/02/26/intel-launches-hadoop-distribution-and-project-rhino-an-effort-to-bring-better-security-to-big-data/.
