Extracting a JS library from a snippets repository
Context
A couple of years ago, I created a repository for all those JavaScript snippets and files that were too small to turn into their own project, and could just be copy-pasted wherever they were needed, or imported directly from GitHub via services like jsDelivr.
As some of these files grew in size and scope, this approach started to be less and less convenient, and eventually it became clear that to continue to use and maintain these bigger libraries, it would be better to just migrate them into their own repository.
In principle, it would have been good enough to simply copy the files into a new repo, check them in with an "Initial Commit 🎉" and refer to the collection repository where their original development history resides.
But that would be boring, and I like to make things look neat, so I decided I wanted to migrate the commit history of these individual files into their new repositories.
How did I do it?
The first file on my migration list is skooma.js, a JS port of a Lua library of the same name that I had gotten so used to that I wanted to use it in my front-end code as well.
This file is a good case study because it started out as render.js, but got renamed to skooma.js after only a few commits. This isn't much of a problem; it just means that I had to keep all commits touching the original render.js file as well as skooma.js and its corresponding skooma.md file explaining what it does.
As a starting point, it was clear that the best tool for the job would be git rebase in some way; maybe there are other git commands to get this done with an even higher degree of automation, but for a one-off task all I needed was to automate about 90% of the work so I wouldn't have to cherry-pick commit by commit.
I started by just running git rebase --root --interactive and having a look at the output and the available commands (I have used interactive rebase many times, but seeing the to-do file in front of me helped me think about how best to handle it). I wasn't sure whether it would be possible to automate the entire process, so I went for the simpler, only partly automated option:
- Start an interactive rebase
- Run a script over the to-do file
- Run the rebase and intervene manually where needed
The script
This is the part where the magic happens. When you call git rebase --interactive, you get an editor window listing everything the rebase will do, a "program" in the classic sense, which you can edit to make git do different things. The syntax looks something like this:
pick 2c5a6d3 Initial commit 🎊
pick 1e6c0de Add template module to more easily write templates
pick bbbb234 Add Better HTML element class for custom elements
pick c8fd5b1 Refactor scripts into ES6 modules
pick 79af2b7 Add skooma-like functional DOM rendering helper
pick da2f082 Improve HTML render helper
pick 24ee30f Add special case for templates to render script
pick 7a61b96 Rename render to skooma.js
pick 7e72b33 Remove uppercase node special case from skooma.js
pick f97ae06 Remove setup function from template.js
...
The first word is the command, which says what will be done. pick means it just uses the commit as-is. A complete list of commands can be found in a comment that git adds at the bottom of the to-do file.
The second word is the commit hash, which is needed because you can move lines around in this file to re-order commits (this may of course cause conflicts that will require manual intervention, but git is smart enough to pause the rebase when that happens, give you a rough description of what's confusing it, and tell you what to do and how to continue after fixing it).
Anything after that is the commit message. It only gets added to make it easier to edit the file interactively; git itself ignores it. Nevertheless, it would be nice to preserve these messages after modifying the to-do program to make it easier to check what is ultimately going to happen.
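As a rough illustration of that three-part layout, here is a minimal Lua sketch that splits a to-do line into command, hash and message. The name parse_todo_line is made up for this example, and it only handles lines of the simple "command hash message" shape shown above.

```lua
-- Hypothetical helper: split one to-do line into its three parts.
-- Returns nothing for blank lines, comments, and lines whose second
-- word is not a hexadecimal hash (such as exec lines).
local function parse_todo_line(line)
  local command, hash, message = line:match("^(%a+)%s+(%S+)%s*(.*)$")
  if command and hash:match("^%x+$") then
    return command, hash, message
  end
end

local command, hash, message =
  parse_todo_line("pick 79af2b7 Add skooma-like functional DOM rendering helper")
print(command) -- pick
print(hash)    -- 79af2b7
print(message) -- Add skooma-like functional DOM rendering helper
```

Keeping the message as a separate value is what makes it possible to print it back out later, even though git never looks at it.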
To build my script, I decided to use Lua, for no other reason than it being the language I use for most scripting I do at home, and I didn't want to bother with something exotic. I could probably have done the same with a lot less code in Ruby, but since I mostly use that at work for bigger projects, I'm used to using only the kinds of features that Lua also has, and almost never have any contact with the more Perl-like features that make Ruby good for ad-hoc scripts, so ultimately the advantage isn't all that big.
Before I break this up and explain what everything does, here is the finished script:
#!/usr/bin/env luajit

-- List the files touched by a commit, one per line of git's output.
local function files(commit)
  local handle = io.popen("git diff-tree --no-commit-id --name-only " .. commit)
  return handle:lines()
end

-- Files to keep; the loop makes the table usable as a set as well as a list.
local wanted = { "render.js", "skooma.js", "skooma.md" }
for i, file in ipairs(wanted) do
  wanted[file] = i
end

for line in io.stdin:lines() do
  -- Only "pick" lines match; blank lines and comments are skipped.
  local commit, message = line:match("pick ([0-9a-f]+) (.*)$")
  if commit then
    local changed = {}
    for file in files(commit) do
      table.insert(changed, file)
      changed[file] = #changed
    end

    -- Does the commit touch at least one wanted file?
    local want
    for i, file in ipairs(changed) do
      if wanted[file] then
        want = true
      end
    end

    if want then
      print(string.format("pick %s %s", commit, message))
      -- After the pick, delete any unwanted file that is still present.
      for i, file in ipairs(changed) do
        if not wanted[file] then
          print(string.format([[exec if [ -f "%s" ]; then git rm %s && git commit --amend --no-edit; fi]], file, file))
        end
      end
    else
      print(string.format("drop %s %s", commit, message))
    end
  end
end
Ignoring the files function for now, I start by defining a list of files that I want to keep. The for loop is only there to make the table function as a set as well as a list, which is a neat feature of Lua that I won't go into detail about here.
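For readers who don't know Lua, a short stand-alone illustration of that set-and-list idiom may help. It uses the same three file names; the lookups at the end are only there to show what each kind of access returns.

```lua
-- Start with a plain list (the array part of the table).
local wanted = { "render.js", "skooma.js", "skooma.md" }

-- Add each name as a key too, so a membership test is a single lookup.
for i, file in ipairs(wanted) do
  wanted[file] = i
end

print(#wanted)               -- 3 (the length operator only counts the list part)
print(wanted[2])             -- skooma.js
print(wanted["skooma.md"])   -- 3
print(wanted["template.js"]) -- nil
```

Adding string keys while iterating is safe here because ipairs only walks the integer indices 1, 2, 3 and never visits the new keys.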
The script loops over all of its input lines (so I can just pipe the whole file through it from vim), and parses out a commit hash and the following commit message, discarding the "pick" at the start of the line.
When the line matches (which excludes empty lines and comments), it first collects a list of changed files, using the helper function from earlier. This is done using the git diff-tree command (admittedly, I just googled this after spending a minute or two trying to find out how to do this from the git manpages). When given the --no-commit-id and --name-only flags, as well as a commit hash, it essentially just lists all the files that were changed, added, deleted, etc. in that specific commit. Exactly what I needed.
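Two details of git diff-tree matter if a repository is laid out differently from this one: without -r it does not recurse into subdirectories (a changed file in a folder shows up as the folder name), and without --root it prints nothing for a commit that has no parent. The following is a sketch of a variant helper that adds both flags, returns a plain list, and closes the pipe. The name changed_files and the injectable popen parameter are assumptions made for this example, not part of the script above.

```lua
-- Hypothetical variant of the files() helper.
-- `popen` defaults to io.popen; passing a stand-in makes it possible
-- to try the function outside of a git repository.
local function changed_files(commit, popen)
  popen = popen or io.popen
  -- -r: recurse into subdirectories; --root: also list files of a parentless commit
  local cmd = "git diff-tree --no-commit-id --name-only -r --root " .. commit
  local handle = assert(popen(cmd))
  local list = {}
  for file in handle:lines() do
    list[#list + 1] = file
  end
  handle:close()
  return list
end

-- Inside a repository this could be called as:
--   for _, file in ipairs(changed_files("79af2b7")) do print(file) end
```

With -r, files in subdirectories are reported by their full path, so a [ -f ... ] check on each name would behave the same way as for top-level files.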
After collecting the list, I loop over it again and look for each file name in the list of wanted files. If any of them appears, a flag is set to true; if none does, I can just drop this commit. This could have been merged into the first loop, but I'm not gonna win any prizes for making a script that's gonna run 10 or 20 times in total run a few milliseconds faster.
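For comparison, the same check can be written as a small function that stops at the first match. This is only a sketch of an alternative shape; the name touches_wanted and the sample file lists are invented for the example.

```lua
-- Hypothetical helper: true as soon as one changed file is in the wanted set.
local function touches_wanted(changed, wanted)
  for _, file in ipairs(changed) do
    if wanted[file] then
      return true
    end
  end
  return false
end

local wanted = { ["render.js"] = true, ["skooma.js"] = true, ["skooma.md"] = true }
print(touches_wanted({ "template.js", "skooma.js" }, wanted)) -- true
print(touches_wanted({ "template.js" }, wanted))              -- false
```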
When the commit modifies one of the files I want, I add a new pick command for it to keep the commit.
At this point, there's a bit of a problem. Commits creating new files will usually not touch any of my relevant files: Skooma is its own thing, and I usually try to keep the git history clean. But I don't bother splitting "housekeeping" commits by file, for things like running a linter over the whole project and fixing all the warnings. This creates commits that 1. modify files I want to keep and 2. modify unwanted files which, at this point, aren't part of the repository anymore, as the commits creating them have been dropped.
This is the part where git will pause the rebase, show a description of what's wrong, and ask the user to fix it. Luckily for me, and anyone who might want to do something similar, this almost always follows the same pattern: The commit says to modify a file, but the file is no longer known to git. The fix:
- Call git status to see what files are no longer there
- Call git rm on all of the files, or just git rm -r <dir> if they're all in a subdirectory
- Call git rebase --continue to tell git all is fine now
This leaves me with only one possible problem: If any file was, hypothetically, created in a commit that modifies one of my wanted files, then git would have no problem with that, and I'd end up with an extra file in my repo. So I added a little loop that, after each commit gets picked, goes over all the extra files and adds an extra command for each of them which deletes it if it still exists. Since git updates the working directory as it applies commits, I only need to check for the actual file, without any git magic to see if it exists in the last applied commit. I'm fairly sure this wasn't the case in my repo, but I just wanted to add that check while I was at it to make the script a bit more robust.
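The pick-plus-cleanup step can also be sketched as a pure function that returns the to-do lines for one commit, which makes it easy to try without a repository. The names shell_quote and todo_lines are hypothetical, and unlike the script above this version single-quotes file names so that spaces or quotes in a name cannot break the exec line. The changed-file list in the usage example is assumed for illustration.

```lua
-- Hypothetical helper: wrap a string in single quotes for a POSIX shell,
-- turning each embedded ' into '\'' so it cannot end the quoting early.
local function shell_quote(s)
  return "'" .. s:gsub("'", "'\\''") .. "'"
end

-- Hypothetical helper: build the to-do lines for a single commit.
local function todo_lines(commit, message, changed, wanted)
  local keep = false
  for _, file in ipairs(changed) do
    if wanted[file] then keep = true end
  end
  if not keep then
    return { string.format("drop %s %s", commit, message) }
  end
  local lines = { string.format("pick %s %s", commit, message) }
  for _, file in ipairs(changed) do
    if not wanted[file] then
      local quoted = shell_quote(file)
      lines[#lines + 1] = string.format(
        "exec if [ -f %s ]; then git rm %s && git commit --amend --no-edit; fi",
        quoted, quoted)
    end
  end
  return lines
end

local wanted = { ["render.js"] = true, ["skooma.js"] = true, ["skooma.md"] = true }
-- Assumed file list: one wanted file and one unwanted file.
local lines = todo_lines("60af077", "Make hyphenation in consistent with browser APIs",
  { "skooma.js", "BetterHTMLElement.js" }, wanted)
for _, line in ipairs(lines) do
  print(line)
end
```

Because git hands the rest of an exec line to a shell, the quoting only has to satisfy the shell, not git's own to-do parser.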
Here's what the resulting to-do file looks like, after feeding it through the script:
drop 2c5a6d3 Initial commit 🎊
drop 1e6c0de Add template module to more easily write templates
drop bbbb234 Add Better HTML element class for custom elements
drop c8fd5b1 Refactor scripts into ES6 modules
pick 79af2b7 Add skooma-like functional DOM rendering helper
pick da2f082 Improve HTML render helper
pick 24ee30f Add special case for templates to render script
pick 7a61b96 Rename render to skooma.js
pick 7e72b33 Remove uppercase node special case from skooma.js
drop f97ae06 Remove setup function from template.js
drop d703ae1 Refactor BetterHTMLElement with more meta-magic ✨
pick 5f27242 Fix checking for template objects in skooma.js
pick c16eb29 Extend skooma to support SVG as well as HTML
pick 6f85103 Fix skooma.js syntax for strict mode
pick 60af077 Make hyphenation in consistent with browser APIs
exec if [ -f "BetterHTMLElement.js" ]; then git rm BetterHTMLElement.js && git commit --amend --no-edit; fi
drop 137f586 Add option for customized built-in elements
pick fb6b86b Fix undefined variable in skooma.js
drop 63a85b7 Fix undeclared variable
pick 0448cd2 Update skooma.js to handle numbers
drop be31d1f Add mutation observer to BetterHTMLElement
...