RabbitMQ Deux: SUCCESS!

I spoke with catlee today to see if he could send over a copy of the scripts that he used to set up buildapi as a user on RabbitMQ, and he did. Coop warned that there may be some finicky issues that are environment-specific to my Mac (i.e. paths, etc.). Indeed, when I attempted to run the script with the RabbitMQ server off, I got the error "Error: unable to connect to node rabbit@localhost: nodedown". Then, when I turned the server on, I got the error "Error: {noproc,{gen_server2,call,[worker_pool,next_free,infinity]}}". Obviously something was not quite right, so I did some more looking around. I found that RabbitMQ comes with a set of plugins that are disabled by default. Once I enabled the management plugin, I could go into the web app, add buildapi as a user, and then change some config options in buildapi, and BAM! It magically began accepting entries into the db.

Here is the step-by-step I used to get RabbitMQ up and running and working with buildapi on Mac OS X.

  1. If MacPorts is not already installed, then go here.
  2. Once you've ensured that MacPorts is installed you can install RabbitMQ: sudo port install rabbitmq-server

    • The instructions for this can be found here
  3. Once RabbitMQ is installed, you need to add buildapi as a user. Enable the rabbitmq_management plugin: rabbitmq-plugins enable rabbitmq_management

    • The instructions for this can be found here
  4. Then restart RabbitMQ: sudo /opt/local/etc/LaunchDaemons/org.macports.rabbitmq-server/rabbitmq-server.wrapper restart
  5. Now go to http://localhost:15672/ and use the username/password combo of guest/guest
  6. Once in, go to 'Admin'
  7. Select the 'Add a user' option and enter the following

    • Username: buildapi
    • Password: buildapi
    • Tags: administrator
  8. Now submit the new user by selecting 'Add user'
  9. Once you have added 'buildapi' as a new user, you will see it listed under the 'All users' section above
  10. Select 'buildapi' and a window for permissions will come up
  11. Make sure that the permissions are set to the following

    • Virtual Host: /
    • Configure regexp: .*
    • Write regexp: .*
    • Read regexp: .*
  12. Now submit these permissions by selecting 'Set Permission'
  13. Once you have done this, the only thing left is to adjust the config.ini file at the root of buildapi to include the following lines

    • carrot.hostname = localhost
    • carrot.userid = buildapi
    • carrot.password = buildapi
    • carrot.exchange = buildapi.control
    • carrot.consumer.queue = buildapi-web
  14. Once you have made sure that the previous lines were added to your config.ini file in buildapi, then start up buildapi
  15. Go to http://localhost:15672/#/connections and a connection with the username 'buildapi' should be listed and the state should be 'running'
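
For anyone who would rather script steps 5 through 12 than click through the web app, the same user, tag and permissions can be set with rabbitmqctl. This is only a rough sketch: it assumes rabbitmqctl is on your PATH (with MacPorts you may need sudo), the helper names are made up for illustration, and it is not the script catlee sent me.

```python
import subprocess


def buildapi_user_commands(user="buildapi", password="buildapi", vhost="/"):
    """Return the rabbitmqctl invocations that mirror steps 7 to 12 above."""
    return [
        # Steps 7-8: create the user
        ["rabbitmqctl", "add_user", user, password],
        # Step 7 (Tags): mark the user as an administrator
        ["rabbitmqctl", "set_user_tags", user, "administrator"],
        # Steps 11-12: configure / write / read regexps on the virtual host
        ["rabbitmqctl", "set_permissions", "-p", vhost, user, ".*", ".*", ".*"],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.check_call(argv)


# Usage (prints the commands without running them):
#   run_all(buildapi_user_commands())
```

With dry_run left at True this only prints the three commands, so you can eyeball them before running anything against the broker.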

And that's that! I attempted to click 'rebuild' again from a branch page like try and it worked! The database entry was successful!
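
The check from step 15 can also be done without the browser, since the management plugin exposes an HTTP API. A rough sketch in Python 3, assuming the plugin's /api/connections endpoint and the default guest/guest login; the function names are mine, and I am assuming each connection entry carries 'user' and 'state' fields.

```python
import base64
import json
import urllib.request


def fetch_connections(base_url="http://localhost:15672",
                      user="guest", password="guest"):
    """Ask the management plugin for the list of open connections."""
    request = urllib.request.Request(base_url + "/api/connections")
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def running_connections_for(connections, username="buildapi"):
    """Keep only the connections owned by username that are 'running'."""
    return [
        c for c in connections
        if c.get("user") == username and c.get("state") == "running"
    ]


# Usage, with RabbitMQ and buildapi both started:
#   print(running_connections_for(fetch_connections()))
```

An empty list here would mean buildapi has not connected (or is connected under a different user).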

Now that I have been able to get this mq issue figured out with the help of catlee and coop (thanks guys!), I will move on to the following:

  • Update the wiki doc on Setting up a Local Virtualenv for BuildAPI with the new found instructions on getting RabbitMQ installed on Mac.
  • Begin writing up unit tests to test for proper entry of new buildrequests into the schedulerdb
  • Write up the needed logic to enter a single buildrequest
  • Review the logic
  • Lather, Rinse, Repeat
 

RabbitMQ

I received an email back from coop today and it sounds like he was getting similar exceptions to what I was when trying to submit a build. He said that catlee helped him to install and integrate RabbitMQ with buildapi and he was then able to submit builds. Based on that, I am installing and integrating RabbitMQ into buildapi. I have hit a little snag in integrating on a Mac, since the original script from catlee is for Linux, but I should be able to get more info on that in the morning when the EST folks are back online. Additionally, coop expanded the buildapi setup docs on the wiki with info on RabbitMQ and setting up the databases, so this should prove useful for me as well!

 

Rock < Me < Hardplace

Bug 793989: It's been a few days since my last update, but here is the gist. I am still chasing the issue I mentioned before. It doesn't look like I am able to run any controller function that ends up calling g.mq.* (where g is app_globals), because g.mq is returning NoneType. It appears as though buildapi.lib.mq is never actually added to app_globals, or if it is, I cannot seem to find it… How is this set up in the production version of buildapi? For instance, I am assuming that when an 'authorized' user enters a valid revision into the box at the bottom of https://secure.pub.build.mozilla.org/buildapi/self-serve/try where it says "Create new dep builds on try revision", it'll successfully kick off that functionality. In my instance, this simply fails with "AttributeError: 'NoneType' object has no attribute 'newBuildAtRevision'". I have played with pdb a bit to try and unearth something, but it seems to me that there is simply a configuration of some sort missing in my local instance that is present in the production environment. I am throwing out these questions to coop to see if he has run into this issue before.

Bug 931580: So, in the meantime, I am back to working on bug 931580.

Add-On Idea: Additionally, I threw an idea around to some devs about making an add-on for Firefox that takes your hg-related email (the one you always use to make checkins on hg) and looks for, tracks/logs and alerts you when a checkin you have made has completed all builds/tests, and whether it passed or failed (i.e. anything other than all greens). This add-on would make use of the buildapi extension that I already built this summer, which returns JSON to tell whether a checkin has finished all builds/tests and whether it has passed them all or failed (again, something other than all greens)… that extension relates to bug 900318

 

Taking a swing at database entry with Pylons

Catlee answered the secondary questions I had concerning bug 793989 and clarified a bunch of things. For phase 1, I am going to implement the functionality at /self-serve/{branch}/builders/{buildername} that simply allows a user to construct their own POST message complete with JSON arguments for changes and properties. A properly structured call to this URL should correctly enter a new buildrequest into the schedulerdb, which buildbot could then grab to kick off the new build. In order to test this, I am going to write up a unit test that checks the schedulerdb for a proper entry… this test should already succeed upon submission of a retrigger. Note: The tables that I need to enter data into are (via catlee):

buildrequests
buildsets
sourcestamps
sourcestamp_changes
changes
change_files
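
To make the phase 1 idea a bit more concrete, here is a rough Python 3 sketch of what a client-side POST to that endpoint might look like. Everything specific in it is an assumption for illustration: the form field names 'changes' and 'properties', the shape of the JSON, the example builder name, and the helper name build_request. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, branch, buildername, changes, properties):
    """Construct (but do not send) a POST for a single new buildrequest."""
    path = "/self-serve/{}/builders/{}".format(
        urllib.parse.quote(branch, safe=""),
        urllib.parse.quote(buildername, safe=""),
    )
    # Assumed field names: each carries a JSON-encoded value.
    form = urllib.parse.urlencode({
        "changes": json.dumps(changes),
        "properties": json.dumps(properties),
    }).encode("ascii")
    return urllib.request.Request(base_url + path, data=form, method="POST")


# Usage against a local instance (hypothetical builder name and payload):
#   req = build_request("http://127.0.0.1:5000", "try", "Linux try build",
#                       [{"revision": "abcdef123456", "files": []}],
#                       {"locale": "de"})
#   urllib.request.urlopen(req)
```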

I started poking around with the rebuild/retrigger functionality that already exists in self-serve, and I have hit an issue. When I attempt to hit the 'rebuild' button from a branch page such as /self-serve/try, on an existing build/test, I am getting a server error 500 and the traceback below (btw, I disabled the who = self._require_auth() line by instead making who = "Me!" for the time being):

Error - : 'NoneType' object has no attribute 'rebuildBuild'
URL: http://127.0.0.1:5000/self-serve/try/build
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/weberror/errormiddleware.py', line 162 in __call__
  app_iter = self.application(environ, sr_checker)
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/beaker/middleware.py', line 152 in __call__
  return self.wrap_app(environ, session_start_response)
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/routes/middleware.py', line 131 in __call__
  response = self.app(environ, start_response)
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/pylons/wsgiapp.py', line 107 in __call__
  response = self.dispatch(controller, environ, start_response)
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/pylons/wsgiapp.py', line 312 in dispatch
  return controller(environ, start_response)
File '/Users/jzeller/buildapi-test/buildapi/buildapi/lib/base.py', line 20 in __call__
  return WSGIController.__call__(self, environ, start_response)
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/pylons/controllers/core.py', line 211 in __call__
  response = self._dispatch_call()
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/pylons/controllers/core.py', line 162 in _dispatch_call
  response = self._inspect_call(func)
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/pylons/controllers/core.py', line 105 in _inspect_call
  result = self._perform_call(func, args)
File '/Users/jzeller/buildapi-test/lib/python2.7/site-packages/pylons/controllers/core.py', line 57 in _perform_call
  return func(**args)
File '/Users/jzeller/buildapi-test/buildapi/buildapi/controllers/selfserve.py', line 414 in rebuild_build
  retval = g.mq.rebuildBuild(who, build_id, priority)
AttributeError: 'NoneType' object has no attribute 'rebuildBuild'

Upon further investigation of this error, it turns out that g references app_globals, which is imported from pylons. Within app_globals should be mq (located at /buildapi/lib/mq.py), and as you can see the function rebuildBuild does exist. It looks like g, i.e. app_globals, is of type <class 'paste.registry.StackedObjectProxy'>, which should be right, but g.mq is indeed of type 'NoneType'… I do not know what gives here. This function is one that is included in the source code already and I am under the impression that it works fine in production, so why is it not working here? More digging is necessary to solve this.
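
The symptom itself is easy to reproduce in isolation. The sketch below is not buildapi's code and makes no claim about why g.mq ends up as None; FakeGlobals and require_mq are hypothetical names. It only shows that any globals object whose mq attribute was never populated fails in exactly this way, and how a small guard could turn that into a more readable error.

```python
class FakeGlobals(object):
    """Hypothetical stand-in for app_globals with mq never populated."""

    def __init__(self):
        self.mq = None


def require_mq(g):
    """Return g.mq, or fail with a message that points at configuration."""
    if g.mq is None:
        raise RuntimeError("g.mq is None: the message queue was never set up")
    return g.mq


g = FakeGlobals()
try:
    # Same call shape as line 414 of selfserve.py in the traceback above
    g.mq.rebuildBuild("Me!", 12345, 0)
except AttributeError as error:
    message = str(error)  # 'NoneType' object has no attribute 'rebuildBuild'
```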

My next steps here are to discover what the cause of this NoneType error is, resolve it and then run a rebuild/retrigger and manually check that the database entries for that build now exist. Once I have done that, I am going to build a unit test that will run a similar check on the database to check that a new build has indeed been added to the schedulerdb, into the proper tables, with valid info. Once this is complete, I am going to then complete the functions necessary to insert a new single buildrequest into the schedulerdb, and use the unittests to verify that the buildrequest entry is properly constructed for buildbot to be able to grab it and start up builds/tests.
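
As a rough idea of the kind of check I have in mind, here is a Python sketch that counts rows per table before and after an action and reports which tables did not grow. It runs against an in-memory SQLite database with stand-in tables that have only an id column; the real schedulerdb schema is different, and the helper names are made up. Which tables a given action should touch is passed in by the caller rather than assumed.

```python
import sqlite3

SCHEDULERDB_TABLES = [
    "buildrequests", "buildsets", "sourcestamps",
    "sourcestamp_changes", "changes", "change_files",
]


def make_stand_in_db():
    """Create empty stand-in tables (id column only) in memory."""
    conn = sqlite3.connect(":memory:")
    for table in SCHEDULERDB_TABLES:
        conn.execute("CREATE TABLE {} (id INTEGER PRIMARY KEY)".format(table))
    return conn


def count_rows(conn, tables=SCHEDULERDB_TABLES):
    """Map each table name to its current row count."""
    return {
        table: conn.execute("SELECT COUNT(*) FROM {}".format(table)).fetchone()[0]
        for table in tables
    }


def tables_without_new_rows(before, after):
    """List the tables whose row count did not increase."""
    return [table for table in before if after[table] <= before[table]]
```

A unit test would take a count, trigger the rebuild (or the new buildrequest), take a second count, and assert that tables_without_new_rows comes back empty for the tables it expects to change.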

 

Waiting for answers, breaking into Bug 931580

I am still waiting for some answers from catlee pertaining to how to properly construct a buildrequest and what tables to fill in and where.

In the meantime, I have started breaking into bug 931580. I am looking through the schedulerdb and statusdb schemas, as well as some existing models in buildapi, to determine how to query for, build and respond with a JSON list of all slaves organized by master/build_id given a branch and revision.
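
One possible shape for that response, sketched in Python. The input here is a plain list of (master, build_id, slave) tuples, which is an assumption standing in for whatever the real query against the schemas ends up returning; the nesting (master, then build_id, then slave names) and the function names are also just my guess at this point.

```python
import json
from collections import defaultdict


def group_slaves(rows):
    """Nest slave names under master, then build_id (as a string key)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for master, build_id, slave in rows:
        grouped[master][str(build_id)].append(slave)
    # Convert to plain dicts so the result serializes and compares cleanly
    return {master: dict(builds) for master, builds in grouped.items()}


def to_json(rows):
    """Serialize the grouped structure with stable key ordering."""
    return json.dumps(group_slaves(rows), sort_keys=True)
```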

 

Lots of new information pertaining to Bug 793989

So catlee was able to get back to me yesterday… super fast!

The information that I was able to gain from this pertains to the partial patch that catlee already wrote for the changes needed to buildapi for bug 793989.

  1. Selection of a builder_name for the new end point /self-serve/{branch}/builders/{builder_name} is going to be up to the UI. Catlee imagines a simple text string in a form and some JS to take that to alter the POST url rather than including as form data. Most likely the builders we're interested in would have run at some point, perhaps just not per push. Catlee thinks the builder names could come from TBPL or some auto-bisect tool that's yet to be written.
  2. The JSON for properties and changes in the request POST is not coming from statusdb.properties and statusdb.changes, it's all submitted by the client.
  3. It is not yet clear where the JSON that gets added to the POST request will be generated.

    • There are a few things we need to create the buildrequest properly:

      • branch, revision: these go into sourcestamps and changes
      • files that have changed: these go into change_files. We're currently using change files to record the URLs of build and tests for the test jobs.
      • properties: used by builders like l10n to determine which locale to repack
    • All of those end up in different tables. We could look for them in different POST parameters.
  4. A very very simple HTML UI would be made for this, and most often this API would be used from other tools like TBPL or the auto-bisect tool.
  5. do_new_build_for_builder in buildapi/scripts/selfserve-agent.py handles inserting the new build request into the schedulerdb, at which point the buildbot masters will find the buildrequest and start it on a slave.
  6. In relation to the comments in selfserve.py

    • Line 508: # TODO: Make sure that the 'fake' branches for sourcestamps are obeyed?

      • Catlee still doesn't remember what 'fake' refers to.
    • Line 527: # TODO: What do we do with change branches? If they're set to something real, then this will trigger schedulers. Can we use a fake value?

      • This piece ties into how buildbot scheduling works. Because we need to have files associated with the changes so we have a place to record the build and test URLs [1], we need to insert new changes into the DB. Changes need a branch.
    • Line 538: # TODO: What value do we choose for branch? If we choose a real value, like "mozilla-central", or "mozilla-central-opt-linux", then we risk triggering the regular schedulers and having a full suite of test runs scheduled instead of just the builder we're interested in.

      • I was thinking that probably ${branch}-selfserve would work.
    • Line 538: # TODO: invalidate cache for branch

      • There are similar comments on the other methods here. buildapi maintains a cache of pending/running/finished jobs per branch and revision. The comment says that we should explicitly invalidate that cache so that subsequent requests will see the new pending job. As it stands, you need to wait for the cache to expire (which is like 60 seconds I think). Not a major issue, and certainly doesn't need to be dealt with as part of your work here.
  7. In relation to the comments in selfserve-agent.py

    • Line 3.32 : # TODO: Attach files sourcestamps -> sourcestamp_changes -> changes -> change_files but new changes may trigger work by other schedulers…

      • This ties into your questions above about the data coming into the API, and also the choice of branch to use for the changes we're inserting.
    • Line 3.35 : # TODO: accept change objects here instead of contructing from files

      • I had started writing this code by passing in just a list of files that would get associated with a new change object. I wondered if it would make more sense to have the client fully specify the change object rather than restricting the API to only deal with change files.
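
To keep the pieces from items 3 and 6 straight in my head, here is a small Python sketch of how client-submitted data might be split up by destination. It is not catlee's patch: plan_rows is a hypothetical helper and the column names are placeholders. It applies the suggested ${branch}-selfserve name to the change branch only; whether sourcestamps should use it too is still an open question, and the notes do not say which table properties land in.

```python
def plan_rows(branch, revision, files, properties):
    """Split submitted data into per-destination row dicts (placeholder columns)."""
    # Suggested fake branch, so regular schedulers are not triggered
    change_branch = "{}-selfserve".format(branch)
    return {
        "sourcestamps": [{"branch": branch, "revision": revision}],
        "changes": [{"branch": change_branch, "revision": revision}],
        # Change files currently record the build and test URLs
        "change_files": [{"filename": name} for name in files],
        # Destination table for properties is not named in the notes above
        "properties": dict(properties),
    }
```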

Additionally, I was able to confirm/clarify these assumptions:

  1. At the bottom of a typical revision page on BuildAPI and at the bottom of a branch page, there are 3 boxes which allow you to retrigger a set of builds for dep, PGO and nightly. Ideally, we want to be able to (re)trigger a job on any builder without launching the entire suite of builds/tests that normally happen. The UI could use some cleanup here so that if you're on the per-revision page, there's just a button for new dep/PGO/nightly build with no revision field.
  2. Pulling a list of builds from TryChooser isn't a good idea, we need to find a better way.
  3. Taking in the computed syntax from TryChooser Syntax Builder for retriggering a build/test is not a good idea. Some things that aren't immediately available are the URLs for the build and test packages.
  4. For the first pass we should focus on making the API able to trigger jobs, and leave some of the complicated UI elements until later (e.g. which buildername, which change files, properties, etc.). We could also change our logic for finding the build/test URLs.
  5. Pylons is a pretty badass MVC framework

More to come!

 

Questions have been sent to catlee

Took a good look over catlee's partial patch after applying it to my buildapi instance so I could play with it a bit. I came up with a good list of questions for him and sent them off. I expect to hear back no sooner than Wednesday since he is traveling currently.

It is looking like the patch is designed with the notion that the UI will be a simple form that allows the user to make a POST request detailing the build to (re)trigger and a priority value for that build, etc. The patch looks mostly done, but there are some obvious parts that need completing. So hopefully after catlee answers back this week I can start hammering away at a solution to the missing pieces of this patch!

Until catlee responds to my questions, I am going to start looking into bug 931580 and answering comments that have been added to bug 900318.

 

Firing up BuildAPI, among other things

I was finally able to get BuildAPI up and running, and everything works great! It didn't take long: I had moved it to a new directory, which had broken many paths.

Was able to gain some context for Bug 793989 and I have a much better idea of what is happening now. At the bottom of a typical revision page on BuildAPI, there are 3 boxes which allow you to retrigger a set of builds for dep, PGO and nightly. Ideally, we'd like to be able to retrigger a given build individually regardless of whether or not it was run the first time, and without launching the entire suite of builds that it belongs to. It is possible that the build types that are necessary in this list of manual retriggers are listed on the TryChooser Syntax Builder page. I am thinking that simply adding a box on the bottom of a typical revision page on BuildAPI that takes in a computed syntax like the one that is available from the TryChooser Syntax Builder would work well. However, that just gives me an idea of the front end user interface. Now I am going to look through the patch that catlee already began to write and determine what his angle of attack is on this. It may turn out that it'd be easier to first implement this as a REST API and then once that is complete, I could just allow the user interface to take advantage of the new REST API functionality. I am also going to be compiling a list of assumptions and questions to send over to catlee to clarify this further before taking a true stab at it.

In other news, I also got my LDAP credentials back today! So now I shouldn't be inhibited by anything that requires my LDAP :)

 

First day back with Release Engineering as a “Student Worker”

Today is my first day back on the Release Engineering team and it feels great! Now down to da business.

Right off the bat I am handling 2 bugs, and I’ve read into both to get the gist of what is happening. I am also compiling some questions.

  • Bug 793989 of primary concern
    • Self-serve should be able to manually request trigger of any of the standard test suite jobs/additional build jobs that were not run in the original push (not just retriggers or complete sets of dep/PGO/Nightly builds).
    • Useful Files
  • Bug 931580
    • Bug 923213 refers to adding a USD $ amount to each try report to improve the discussion surrounding build inefficiencies. In order to generate some data for this, we need buildapi to be capable of returning a list of buildslaves after a job is complete.

Next, I am getting my local buildapi instance running again. I have made some changes to my environment and I suspect they are behind some weirdness that now exists within the virtualenv for buildapi, so I reran pip install -r requirements.txt to reinstall all of the proper dependencies. Then I danced around with some errors, and I am saving the rest for tomorrow, but it seems to be almost ready to go.