At Mozilla we run thousands of jobs every day to build and test Firefox. Under high load we also schedule jobs more selectively to save resources, which means we need to investigate further when an issue arises. Even when we have the data, intermittent failures and performance data often need additional runs before a pattern can be detected.
The Mozilla CI Tools project is designed to schedule arbitrary jobs for given revisions and job types, based on different scenarios. This is a difficult problem because we must communicate with many systems to gather the accurate information needed to send the right parameters when triggering a specific job. In addition, we have a set of specific higher-level scenarios to solve for when we get a failure or an intermittent failure. As the toolchain matures, these scenarios will be integrated into existing tools and dashboards.
Some of what this project can potentially accomplish is:
- Trigger any job (builds, tests, nightlies, L10n, etc.)
- Query any information related to our version control systems
- Determine completeness of jobs run on a revision
- Find hidden jobs that are permanently wasting resources
- Help us bisect intermittent oranges
- Help us backfill any missing jobs
- Help us find any files/artifacts generated by any job in our CI
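Two of the scenarios above, determining completeness and backfilling missing jobs, can be illustrated with a minimal sketch. This is not the project's API: the function names, the job names, and the in-memory data structures are all hypothetical, and a real implementation would get this data from the CI systems rather than from literals.

```python
from typing import Dict, Iterable, List, Set


def missing_jobs(expected: Iterable[str], completed: Iterable[str]) -> List[str]:
    """Return the expected job names that have no completed run, sorted."""
    return sorted(set(expected) - set(completed))


def is_complete(expected: Iterable[str], completed: Iterable[str]) -> bool:
    """A revision is complete when every expected job has a completed run."""
    return not missing_jobs(expected, completed)


def backfill_plan(revisions: List[str], runs: Dict[str, Set[str]], job: str) -> List[str]:
    """Return the revisions, in the given order, on which `job` has not run.

    `runs` maps a revision to the set of job names that ran on it
    (a hypothetical structure chosen for this example).
    """
    return [rev for rev in revisions if job not in runs.get(rev, set())]


# Hypothetical job names and revisions, for illustration only.
expected = ["build-linux64", "test-mochitest-1", "test-xpcshell"]
completed = ["build-linux64", "test-xpcshell"]
to_trigger = missing_jobs(expected, completed)

runs = {"rev1": {"test-xpcshell"}, "rev3": {"test-xpcshell"}}
to_backfill = backfill_plan(["rev1", "rev2", "rev3"], runs, "test-xpcshell")
```

Here `to_trigger` lists the jobs that still need to run on one revision, while `to_backfill` lists the revisions on which a given job was skipped.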
This year’s goal is to address some of these needs on top of Release Engineering’s current Buildbot CI. In the near future, we should also be able to do the same for the TaskCluster CI.
In order to accomplish this we need to add the following basic features:
- Determine accurately the current state of jobs
- Determine the full set of jobs that can be run for a given revision
- Log triggered jobs in a consumable format
- Allow a user to monitor triggered jobs
- Create a test framework to test the various CI data sources or mock them
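As a rough sketch of what some of these features could look like, the example below logs triggered jobs as one JSON object per line and uses a fake data source in place of a real CI system. Every name here (`TriggeredJob`, `log_triggered_job`, `read_triggered_jobs`, `FakeJobSource`, the state strings) is an assumption made for illustration, not part of the project.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, TextIO, Tuple


@dataclass
class TriggeredJob:
    """Hypothetical record of one triggered job."""
    revision: str
    job_name: str
    requested_by: str


def log_triggered_job(job: TriggeredJob, stream: TextIO) -> None:
    """Append the job as a single JSON line so other tools can consume it."""
    stream.write(json.dumps(asdict(job), sort_keys=True) + "\n")


def read_triggered_jobs(stream: TextIO) -> List[TriggeredJob]:
    """Parse the log back into records, skipping blank lines."""
    return [TriggeredJob(**json.loads(line)) for line in stream if line.strip()]


class FakeJobSource:
    """Stand-in for a real CI data source, intended for tests only."""

    def __init__(self, states: Dict[Tuple[str, str], str]) -> None:
        self._states = dict(states)

    def job_state(self, revision: str, job_name: str) -> str:
        # Jobs the fake knows nothing about are reported as "unknown".
        return self._states.get((revision, job_name), "unknown")


# Log one job to an in-memory buffer, then read it back.
log = io.StringIO()
log_triggered_job(TriggeredJob("abc123", "test-xpcshell", "alice"), log)
log.seek(0)
logged = read_triggered_jobs(log)

# Monitor the job through the fake source instead of a live system.
source = FakeJobSource({("abc123", "test-xpcshell"): "running"})
state = source.job_state("abc123", "test-xpcshell")
```

Keeping the data source behind a small interface like this is one way to let the same monitoring code run against either a live system or a mock.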
The remainder of this document will describe our roadmap and potential use cases.