Developing batch changes
Getting started
Welcome, new batch change developer! This section will give you a rough overview of what Batch Changes is and how it works.
NOTE: Never hesitate to ask in #batch-changes-internal for help!
What are batch changes?
Before diving into the technical part of batch changes, make sure to read up on what batch changes are, what they're not and what we want them to be:
- Look at the batch changes product page.
- Watch the 2-minute demo video.
Next: create your first batch change!
Creating a batch change locally
NOTE: Make sure your local development environment is set up by going through the "Getting started" guide.
- Since Batch Changes is an enterprise-only feature, make sure to start your local environment with sg start (which defaults to sg start enterprise).
- Go through the Quickstart for Batch Changes to create a batch change in your local environment. See "Testing repositories" for a list of repositories in which you can safely publish changesets.
- Now combine what you just did with some background information by reading the following:
Code walkthrough
To give you a rough overview of where each part of the code lives, let's take a look at which code gets executed when you:
- run src batch preview -f your-batch-spec.yaml
- click on the preview link
- click Apply to publish changesets on the code hosts
It starts in src-cli:
- src batch preview starts the "preview" command in src-cli.
- That executes your batch spec, which means it parses it, validates it, resolves the namespace, prepares the Docker images, and checks which workspaces are required.
- Then, for each repository (or workspace in each repository), it runs the steps in the batch spec by downloading a repository archive, creating a workspace in which to execute the steps, and then starting the Docker containers.
- If changes were produced in a repository, these changes are turned into a ChangesetSpec (a specification of what a changeset should look like on the code host: title, body, commit, etc.) and uploaded to the Sourcegraph instance.
- src batch preview's last step is then to create a BatchSpec on the Sourcegraph instance, which is a collection of the ChangesetSpecs that you can then preview or apply.
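The flow above can be sketched in a few lines of Go. This is only an illustration, not the src-cli implementation: the Workspace, StepRunner, and Preview names are hypothetical, and the real command additionally handles parsing, validation, namespace resolution, Docker images, and uploading.

```go
package main

import "fmt"

// Workspace is a hypothetical stand-in for one repository (or one directory
// inside a repository) in which the steps are executed.
type Workspace struct {
	Repo string
	Path string
}

// ChangesetSpec describes what a changeset should look like on the code host.
// The field set is simplified for this sketch.
type ChangesetSpec struct {
	Repo  string
	Title string
	Body  string
	Diff  string
}

// BatchSpec is the collection of changeset specs produced by one run.
type BatchSpec struct {
	Name           string
	ChangesetSpecs []ChangesetSpec
}

// StepRunner stands in for "download the archive, create the workspace, run
// the containers". It returns the resulting diff, or "" if nothing changed.
type StepRunner func(ws Workspace) (diff string, err error)

// Preview runs the steps in every workspace and turns each non-empty diff
// into a ChangesetSpec. The specs are collected in a BatchSpec.
func Preview(name, title, body string, workspaces []Workspace, run StepRunner) (BatchSpec, error) {
	spec := BatchSpec{Name: name}
	for _, ws := range workspaces {
		diff, err := run(ws)
		if err != nil {
			return BatchSpec{}, fmt.Errorf("executing steps in %s: %w", ws.Repo, err)
		}
		if diff == "" {
			continue // no changes produced, so no changeset spec
		}
		spec.ChangesetSpecs = append(spec.ChangesetSpecs, ChangesetSpec{
			Repo:  ws.Repo,
			Title: title,
			Body:  body,
			Diff:  diff,
		})
	}
	return spec, nil
}

func main() {
	// A fake runner: only repo-a produces a diff.
	run := func(ws Workspace) (string, error) {
		if ws.Repo == "repo-a" {
			return "--- a/README\n+++ b/README\n", nil
		}
		return "", nil
	}
	spec, err := Preview("hello-world", "Hello", "Adds a greeting",
		[]Workspace{{Repo: "repo-a"}, {Repo: "repo-b"}}, run)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(spec.ChangesetSpecs), "changeset spec(s) in batch spec", spec.Name)
}
```

Note that a repository without changes contributes nothing to the batch spec, which matches the "if changes were produced" condition above.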
When you then click the "Preview the batch change" link that src-cli printed, you'll land on the preview page in the web frontend:
- The BatchChangePreviewPage component then sends a GraphQL request to the backend to query the BatchSpecByID.
- Once you hit the Apply button, the component uses the applyBatchChange mutation to apply the batch spec and create a batch change.
- You're then redirected to the BatchChangeDetailsPage component that shows you your newly created batch change.
In the backend, all Batch Changes related GraphQL queries and mutations start in the Resolver package:
- The CreateChangesetSpec and CreateBatchSpec mutations that src-cli called to create the changeset and batch specs are defined here.
- When you clicked Apply, the ApplyBatchChange resolver was executed to create the batch change.
- Most of that doesn't happen in the resolver layer, but in the service layer: here is the (*Service).ApplyBatchChange method that talks to the database to create an entry in the batch_changes table.
- The most important thing that happens in (*Service).ApplyBatchChange is that it calls the rewirer to wire the entries in the changesets table to the correct changeset_specs.
- Once that is done, the changesets are created or updated to point to the new changeset_specs that you created with src-cli.
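To make the rewiring step more concrete, here is a minimal sketch. It assumes, purely for illustration, that a changeset spec matches an existing changeset when repository and branch are equal. The types, fields, and Rewire function are hypothetical and much simpler than the real rewirer, and changesets that no longer match any spec are left out.

```go
package main

import "fmt"

// ChangesetSpec is a simplified, hypothetical changeset spec.
type ChangesetSpec struct {
	ID     int
	Repo   string
	Branch string
}

// Changeset is a simplified, hypothetical row of the changesets table.
// ID 0 means "not stored yet".
type Changeset struct {
	ID             int
	Repo           string
	Branch         string
	CurrentSpecID  int
	PreviousSpecID int
}

// key is the (assumed) identity used to match specs to changesets.
type key struct{ repo, branch string }

// Rewire points every matching changeset at its new spec and creates a new
// changeset for every spec without a match.
func Rewire(existing []Changeset, specs []ChangesetSpec) []Changeset {
	byKey := make(map[key]Changeset, len(existing))
	for _, c := range existing {
		byKey[key{c.Repo, c.Branch}] = c
	}

	out := make([]Changeset, 0, len(specs))
	for _, s := range specs {
		c, ok := byKey[key{s.Repo, s.Branch}]
		if !ok {
			c = Changeset{Repo: s.Repo, Branch: s.Branch} // new changeset
		}
		// Remember the old spec so a later step can compare old and new.
		c.PreviousSpecID = c.CurrentSpecID
		c.CurrentSpecID = s.ID
		out = append(out, c)
	}
	return out
}

func main() {
	existing := []Changeset{{ID: 1, Repo: "repo-a", Branch: "fix", CurrentSpecID: 10}}
	specs := []ChangesetSpec{
		{ID: 20, Repo: "repo-a", Branch: "fix"},
		{ID: 21, Repo: "repo-b", Branch: "fix"},
	}
	for _, c := range Rewire(existing, specs) {
		fmt.Printf("%+v\n", c)
	}
}
```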
After that you can look at your new batch change in the UI while the rest happens asynchronously in the background:
- In a background process (which is started in [enterprise/cmd/repo-updater](https://github.com/sourcegraph/sourcegraph/blob/e7f26c0d7bc965892669a5fc9835ec65211943aa/enterprise/cmd/repo-updater/main.go#L58)) a worker is running that monitors the changesets table.
- Once a changeset has been rewired to a new changeset_spec and reset, this worker, called the Reconciler, fetches the changeset from the database and "reconciles" its current state (not published yet) with its desired state ("published on code host X, with this diff, that title and this body").
- To do that, the Reconciler looks at the changeset's current and previous ChangesetSpec to determine a plan for what it should do ("publish", "push a commit", "update title", etc.).
- Once it has the plan, it hands over to the Executor which executes the plan.
- To push a commit to the code host, the Executor sends a request to the gitserver service.
- To create or update a pull request or merge request on the code host it builds a ChangesetSource which is a wrapper around the GitHub, Bitbucket Server, Bitbucket Data Center and GitLab HTTP clients.
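The "determine a plan" step can be sketched as a pure function that compares the previous and the current spec. The Operation names and the DeterminePlan function below are hypothetical and cover only three operations. The real Reconciler considers many more cases.

```go
package main

import "fmt"

// Operation is one step of a reconciler plan (hypothetical names).
type Operation string

const (
	OpPush    Operation = "push a commit"
	OpPublish Operation = "publish"
	OpUpdate  Operation = "update title/body"
)

// ChangesetSpec is a simplified desired state of a changeset.
type ChangesetSpec struct {
	Title string
	Body  string
	Diff  string
}

// DeterminePlan compares the previous and current spec of a changeset and
// returns the operations needed to reach the desired state.
func DeterminePlan(previous, current *ChangesetSpec, published bool) []Operation {
	if current == nil {
		return nil // no desired state, nothing to do in this sketch
	}
	if !published {
		// Not on the code host yet: push the commit, then open the changeset.
		return []Operation{OpPush, OpPublish}
	}
	if previous == nil {
		return nil // nothing to compare against
	}

	var plan []Operation
	if previous.Diff != current.Diff {
		plan = append(plan, OpPush)
	}
	if previous.Title != current.Title || previous.Body != current.Body {
		plan = append(plan, OpUpdate)
	}
	return plan
}

func main() {
	prev := &ChangesetSpec{Title: "Old title", Body: "Body", Diff: "diff"}
	curr := &ChangesetSpec{Title: "New title", Body: "Body", Diff: "diff"}

	fmt.Println(DeterminePlan(nil, curr, false)) // unpublished changeset
	fmt.Println(DeterminePlan(prev, curr, true)) // only the title changed
}
```

Keeping plan determination free of side effects makes it easy to test on its own, while a separate executor performs the actual requests.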
While that is going on in the background, the BatchChangeDetailsPage component is polling the GraphQL API to get the current state of the batch change and its changesets.
Once all instances of the Reconciler worker are done determining plans and executing them, you'll see that your changesets have been published on the code hosts.
Glossary
Batch changes introduce a lot of new names, GraphQL queries & mutations, and database tables. This section tries to explain the most common names and provide a mapping between the GraphQL types and their internal counterpart in the Go backend.
| GraphQL type | Go type | Database table | Description |
|---|---|---|---|
| Changeset | batches.Changeset | changesets | A changeset is a generic abstraction for pull requests and merge requests. |
| BatchChange | batches.BatchChange | batch_changes | A batch change is a collection of changesets. The central entity. |
| BatchSpec | batches.BatchSpec | batch_specs | A batch spec describes the desired state of a single batch change. |
| ChangesetSpec | batches.ChangesetSpec | changeset_specs | A changeset spec describes the desired state of a single changeset. |
| ExternalChangeset | batches.Changeset | changesets | Changeset is the unified name for pull requests/merge requests/etc. on code hosts. |
| ChangesetEvent | batches.ChangesetEvent | changeset_events | A changeset event is an event on a code host, e.g. a comment or a review on a pull request on GitHub. They are created by syncing the changesets from the code host on a regular basis and by accepting webhook events and turning them into changeset events. |
Structure of the Go backend code
The following is a list of Go packages in the sourcegraph/sourcegraph repository and short explanations of what each package does:
- enterprise/internal/batches/types: Type definitions of common batches types, such as BatchChange, BatchSpec, Changeset, etc. A few helper functions and methods, but no real business logic.
- enterprise/internal/batches: The hook InitBackgroundJobs injects Batch Changes code into enterprise/repo-updater. This is the "glue" in "glue code".
- enterprise/internal/batches/background: Another bit of glue code that starts background goroutines: the changeset reconciler, the stuck-reconciler resetter, the old-changeset-spec expirer.
- enterprise/internal/batches/rewirer: The ChangesetRewirer maps existing/new changesets to the matching ChangesetSpecs when a user applies a batch spec.
- enterprise/internal/batches/state: All the logic concerned with calculating a changeset's state at a given point in time, taking into account its current state, past events synced from regular code host APIs, and events received via webhooks.
- enterprise/internal/batches/search: Parsing text-field input for changeset searches and turning them into database-queryable structures.
- enterprise/internal/batches/search/syntax: The old Sourcegraph-search-query parser we inherited from the search team a week or two back (the plan is not to keep it, but switch to the new one when we have time).
- cmd/frontend/internal/batches/resolvers: The GraphQL resolvers that are injected into the enterprise/frontend in cmd/frontend/internal/batches/init.go. They mostly concern themselves with input/argument parsing/validation, (bulk-)reading (and paginating) from the database via the batches/store, but delegate most business logic to batches/service.
- cmd/frontend/internal/batches/resolvers/apitest: A package that helps with testing the resolvers by defining types that match the GraphQL schema.
- enterprise/internal/batches/testing: Common testing helpers we use across enterprise/internal/batches/* to create test data in the database, verify test output, etc.
- enterprise/internal/batches/reconciler: The reconciler is what gets kicked off by the workerutil.Worker initialised in batches/background when a changeset is enqueued. It's the heart of the declarative model of batches: it compares changeset specs, creates execution plans, and executes those.
- enterprise/internal/batches/syncer: This contains everything related to "sync changeset data from the code host to Sourcegraph". The Syncer is started in the background, keeps state in memory (rate limit per external service), and syncs changesets either periodically (according to heuristics) or when directly enqueued from the resolvers.
- enterprise/internal/batches/service: This is what's often called the "service layer" in web architectures and contains a lot of the business logic: creating a batch change and validating whether the user can create one, applying new batch specs, calling the rewirer, deleting batch changes, closing batch changes, etc.
- cmd/frontend/internal/batches/webhooks: These webhooks endpoints are injected by InitFrontend into the frontend and implement the cmd/frontend/webhooks interfaces.
- enterprise/internal/batches/store: This is the batch changes Store that takes enterprise/internal/batches/types types and writes/reads them to/from the database. This contains everything related to SQL and database persistence, even some complex business logic queries.
- enterprise/internal/batches/sources: This package contains the abstraction layer of code host APIs that live in internal/extsvc/*. It provides a generalized interface ChangesetSource and implementations for each of our supported code hosts.
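To illustrate the idea behind the sources package, the following sketch shows a generalized interface with one implementation selected per code host. The interface here is a deliberately reduced, hypothetical version: the method set, the fakeSource type, and NewSource are assumptions made for this example and do not mirror the real ChangesetSource.

```go
package main

import (
	"errors"
	"fmt"
)

// Changeset is a simplified changeset. ExternalID is empty until published.
type Changeset struct {
	Title      string
	Body       string
	HeadRef    string
	ExternalID string
}

// ChangesetSource is a reduced, hypothetical version of the generalized
// interface that hides the differences between code host APIs.
type ChangesetSource interface {
	CreateChangeset(c *Changeset) error
	UpdateChangeset(c *Changeset) error
}

// fakeSource is an in-memory implementation used instead of real HTTP clients.
type fakeSource struct {
	kind    string
	created int
}

func (s *fakeSource) CreateChangeset(c *Changeset) error {
	if c.ExternalID != "" {
		return errors.New("changeset already published")
	}
	s.created++
	c.ExternalID = fmt.Sprintf("%s-%d", s.kind, s.created)
	return nil
}

func (s *fakeSource) UpdateChangeset(c *Changeset) error {
	if c.ExternalID == "" {
		return errors.New("changeset not published yet")
	}
	return nil
}

// NewSource picks an implementation based on the kind of code host.
func NewSource(kind string) (ChangesetSource, error) {
	switch kind {
	case "github", "gitlab", "bitbucketserver":
		return &fakeSource{kind: kind}, nil
	default:
		return nil, fmt.Errorf("unsupported code host kind %q", kind)
	}
}

func main() {
	src, err := NewSource("gitlab")
	if err != nil {
		panic(err)
	}
	c := &Changeset{Title: "Fix typo", HeadRef: "fix-typo"}
	if err := src.CreateChangeset(c); err != nil {
		panic(err)
	}
	fmt.Println("published as", c.ExternalID)
}
```

Callers such as an executor only depend on the interface, so supporting another code host means adding one more implementation.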
Diving into the code as a backend developer
- Read through ./cmd/frontend/graphqlbackend/batches.go to get an overview of the batch changes GraphQL API.
- Read through ./enterprise/internal/batches/types/*.go to see all batch changes related type definitions.
- Compare that with the GraphQL definitions in ./cmd/frontend/graphqlbackend/batches.graphql.
- Start reading through ./enterprise/internal/batches/resolvers/resolver.go to see how the main mutations are implemented (look at CreateBatchChange and ApplyBatchChange to see how the two main operations are implemented).
- Then start from the other end, enterprise/cmd/repo-updater/main.go. enterpriseInit() creates two sets of batch change goroutines:
  - batches.NewSyncRegistry creates a pool of syncers to pull changes from code hosts.
  - batches.RunWorkers creates a set of reconciler workers to push changes to code hosts as batch changes are applied.
Testing repositories
Batch changes create changesets (PRs) on code hosts. For testing Batch Changes locally we recommend using the following repositories:
- The sourcegraph-testing GitHub organization contains testing repositories in which you can open pull requests.
- We have an automation-testing repository that exists on GitHub, Bitbucket Server, and GitLab.
- The GitHub user sd9 was specifically created to be used for testing Batch Changes. See "GitHub testing account" for details.
If you're lacking permissions to publish changesets in one of these repositories, feel free to reach out to a team member.
GitHub testing account
To use the sd9 GitHub testing account:
- Find the GitHub sd9 user in 1Password.
- Copy the Campaigns Testing Token.
- Change your dev-private/enterprise/dev/external-services-config.json to only contain a GitHub config with the token, like this:
{
  "GITHUB": [
    {
      "authorization": {},
      "url": "https://github.com",
      "token": "<TOKEN>",
      "repositoryQuery": ["affiliated"]
    }
  ]
}
Batch Spec examples
Take a look at the following links to see some examples of batch changes and the batch specs that produced them:
Server-side execution
Database tables
There are currently (Sept '21) four tables at the heart of the server-side execution of batch specs:
- batch_specs. These are the batch_specs we already have, but in server-side mode they are created through a special mutation that also creates a batch_spec_resolution_job, see below.
- batch_spec_resolution_jobs. These are worker jobs that are created through the GraphQL API when a user wants to kick off a server-side execution. Once a batch_spec_resolution_job is created, a worker will pick it up, load the corresponding batch_spec and resolve its "on" part into RepoWorkspaces: a combination of repository, commit, path, steps, branch, etc. For each RepoWorkspace it creates a batch_spec_workspace in the database.
- batch_spec_workspaces. Each batch_spec_workspace represents a unit of work for a src batch exec invocation inside the executor. Once src batch exec has successfully executed, these batch_spec_workspaces will contain references to changeset_specs and those in turn will be updated to point to the batch_spec that kicked all of this off.
- batch_spec_workspace_execution_jobs. These are the worker jobs that get picked up by the executor and lead to src batch exec being called. Each batch_spec_workspace_execution_job points to one batch_spec_workspace. This extra table lets us separate the workspace data from the execution of src batch exec. Separation of these two tables is the result of us running into tricky concurrency problems where workers were modifying table rows that the GraphQL layer was reading (or even modifying).
Here's a diagram of their relationship:
