Tuesday, July 17, 2012

Running your Scalding jobs in Eclipse

In a nutshell, Scalding is a Cascading DSL in Scala. If you know what that sentence means, skip to the meat below; otherwise, read the next section for a bit of background. Note: If you are reading this again, I have updated the sections below to rely on Maven instead of SBT, included a link to my sample project to help get you started, and fixed some serious omissions I found when I revisited my own blog post to set up a new Scalding project.

Introduction

The Hadoop ecosystem is full of wonderful libraries, frameworks, and tools that make it easier to query your big data sets without having to write raw Map/Reduce jobs. Cascading (http://www.cascading.org) is one such framework that, simply put, provides APIs to define workflows that process your large datasets. Cascading lets you think about data in terms of pipes through which tuples flow, and those tuples can be filtered and processed with operations. Couple this with the fact that pipes can be joined together or split apart to produce new pipes, and that Flows (which connect data sources and data sinks) can be tied together, and you can create some pretty powerful data workflows.
Cascading is a wonderful API that makes all these great things possible, but because it's Java and the language is verbose, it's always a bit hard to get started with Cascading from scratch. I've been using it for years, and I find that each new project requires me to go back to a previous one and copy/paste some boilerplate code. I think others had similar problems and hence came out with numerous DSLs (Domain Specific Languages) written in Ruby, Clojure (Cascalog), Scala (Scalding), etc. that wrap Cascading to make it easier to write these flows.

I won't pretend to be a Scalding expert, so I advise you to visit their site (https://github.com/twitter/scalding/), but what I do know is that it's a Scala DSL around Cascading with some slight tweaks to make it easy to build big data processing flows in Scala. The API is designed to look like the collections API, so the same style of code that works on a small list of data can also work on a stream of billions of tuples. I wanted to play with Scalding, so I read the wiki page, downloaded it and copied the tutorial, but then I wondered: how can I run this in Eclipse? Eclipse lets me write, debug and run my jobs (mainly locally) without having to drop to the command line for some tool. At the end of the day, it's a JVM language, so it must be able to run in Eclipse, right?
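
To make the "looks like the collections API" point a bit more concrete, here is a minimal sketch of a word count written against plain Scala collections. It does not use Scalding at all: the names CollectionsWordCount, tokenize and wordCount are made up for this illustration, and the real job in the Scalding tutorial is written against Scalding's own API rather than this code. The shape (split lines into words, group, count) is the part that is meant to carry over.

```scala
// Plain Scala collections only; no Scalding or Cascading classes are used.
// All names here are hypothetical and exist only for this illustration.
object CollectionsWordCount {
  // Split a line into lower-cased words, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Count how often each word occurs across all lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)   // one element per word
      .groupBy(identity)   // word -> all of its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello world", "hello scalding")
    // Print one "word<TAB>count" line per distinct word, sorted by word.
    wordCount(lines).toSeq.sortBy(_._1).foreach {
      case (word, n) => println(word + "\t" + n)
    }
  }
}
```

On a small in-memory list this runs instantly; the appeal of Scalding, as far as I understand it, is that a job written in much the same style can be pointed at data far too large for one machine.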

Maven + Scalding + Eclipse, Oh My!

I don't have much of an opinion about SBT and can't say much good or bad about it, but I do know Maven is popular, and I tend to like it for managing my project dependencies, assembly descriptors, etc. It also reduces the amount of stuff to install when setting up a new laptop or bringing new team members up to speed on this technology. I wanted to get this working with as few moving parts as possible.

Pre-Requisites:

  1. Eclipse
  2. Maven
  3. Scala Plugin for Eclipse

Running Scalding in Eclipse

Perhaps the simplest way to get started is to clone my sample project from git and modify it as necessary. Once cloned, simply run
mvn eclipse:eclipse
to generate the Eclipse project files, and everything should build as expected. The sample job is the word count job from the Scalding tutorial.
Once you have a working Eclipse project, run the Scalding job as follows:
  1. Create a new runtime configuration (in Eclipse, open the Run menu and choose Run Configurations):
    Main class: com.twitter.scalding.Tool
    Program Args: <Fully Qualified Name of Job Class> <Other CLI Args>
    Example: org.hokiesuns.scaldingtest.WordCountJob --local --input ./input/input.txt --output ./output.txt
    VM Args: -Xmx3072m
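
The split between main class and program arguments in this configuration is easier to follow with a small sketch. The assumption here is that a Tool-style launcher is the only class with a main method, and that it creates the job class named in its first argument by reflection. DemoJob, ToolLikeLauncher, launch and hasMain are hypothetical names for this illustration, and none of this is Scalding's actual source.

```scala
// Hypothetical stand-in for a job class: it has a constructor but no main method.
class DemoJob(args: Array[String]) {
  def run(): String = "ran DemoJob with " + args.mkString(" ")
}

// Hypothetical stand-in for a Tool-style launcher: the only class with a main method.
object ToolLikeLauncher {
  // The first argument names the job class; the remaining ones are handed to the job.
  def launch(args: Array[String]): String = {
    val jobClass = Class.forName(args.head)
    val job = jobClass
      .getConstructor(classOf[Array[String]])
      .newInstance(args.tail)
      .asInstanceOf[DemoJob]
    job.run()
  }

  // True if the named class exposes a public main(String[]) method,
  // which is what a JVM-style launcher needs from its main class.
  def hasMain(className: String): Boolean =
    try {
      Class.forName(className).getMethod("main", classOf[Array[String]])
      true
    } catch {
      case _: NoSuchMethodException => false
    }

  def main(args: Array[String]): Unit = println(launch(args))
}
```

Under that assumption, naming the job class where a main class is expected cannot work, because the job class has no main(String[]) method. The launcher class has to come first, and the job class follows as its first program argument.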
    
To create a job jar that can be submitted to your Hadoop cluster, simply run
mvn package
which will generate a fat jar with all the dependencies. This job jar can be submitted to your cluster by executing
 hadoop jar scaldingsample-0.0.1-SNAPSHOT.jar com.twitter.scalding.Tool org.hokiesuns.scaldingtest.WordCountJob --hdfs --input <some path> --output <some path>
I just started using Scalding and got this working in Eclipse. If there are any problems or inaccuracies, please post a comment and I'll update my steps. Happy scalding-ing!

139 comments:

  1. Was there anything special you needed to do on your Scalding job to get it to run under Hadoop (pseudo-distributed)? I am trying to do the same thing, but it looks like --hdfs has no effect (regardless of --hdfs or --local, the logs say flow started: local) and it cannot find the input file in HDFS (I have tried specifying hdfs://, absolute and relative paths). I did copy the following jars over to my hadoop/lib directory from my application classpath (scala-library.jar, scalding_2.9.2.jar, cascading-core-2.0.2.jar, cascading-hadoop-2.0.2.jar, maple-0.2.2.jar, cascading-local-2.0.2.jar, jgrapht-jdk1.6-0.8.1.jar and guava-10.0.1.jar). I made my Scalding class extend Job(args) and then just put in a linear sequence of commands. I've also tried a companion object with a main method in it. I am able to call it with the hadoop jar command like you showed, but I cannot get it to see the file. Thanks in advance for any help you can provide.

  2. Hey Sujit,

    Unfortunately, I haven't tried to run this in any mode other than local. I'm surprised that this isn't working, and I will have to try running it myself to see if there is anything I can glean, but honestly my Scalding knowledge is not very good. I was hoping to do more with it but didn't have the time :-(

    Thanks for reading through my post and posting this.. maybe someone else who stumbles across this will have some comment but I'll certainly give it a try myself.

    Cheers
    Amit

  3. Thank you. I will also ask on the mailing list.

  4. Can you help me with this error:

    hadoop jar scaldingsample-0.0.1-SNAPSHOT.jar org.hokiesuns.scaldingtest.WordCountJob --hdfs --input /NOTICE --output /out


    Warning: $HADOOP_HOME is deprecated.

    Exception in thread "main" java.lang.NoSuchMethodException: org.hokiesuns.scaldingtest.WordCountJob.main([Ljava.lang.String;)
    at java.lang.Class.getMethod(Class.java:1624)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:150)

    Replies
    1. Oh no! I have an error in my blog. I will fix that; my apologies. I haven't re-tested yet (I'll have to check it out and try again), but I think it's something like
      hadoop jar scaldingsample-0.0.1-SNAPSHOT.jar com.twitter.scalding.Tool org.hokiesuns.scaldingtest.WordCountJob --hdfs --input --output

    2. This is what worked for me (btw, thanks for the blog, it helps!):

      hadoop fs -mkdir /input

      # you get this file when you git clone
      # copy it onto cluster before firing the job

      hadoop fs -copyFromLocal input/input.txt /input/

      # mvn package will create this jar under target
      hadoop jar target/scaldingsample-0.0.1-SNAPSHOT.jar com.twitter.scalding.Tool org.hokiesuns.scaldingtest.WordCountJob --hdfs --input /input/input.txt --output output/output.txt

      # time to see some output (gratification, ie)

      hadoop fs -cat output/output.txt/part-00000

  5. Hello,
    What do you mean by "Create a new runtime configuration"? Is that a Maven or an Eclipse setting?

    Replies
    1. It's an Eclipse thing. Open the Run menu and choose Run Configurations.

    2. Please explain this run configuration step in detail, if possible with screenshots as well.

  7. Where do I run the command mvn eclipse:eclipse? I am using Windows and have downloaded the zip file of your project to my PC. When I open cmd in the unzipped copy of your project and run mvn eclipse:eclipse, I get an error that mvn is not recognized...

    Replies
    1. Never mind, I have installed Maven and added it to my environment variables.
