Just: How I Organize Large Rust Programs

One of the things that I personally struggled with when learning Rust was how to organize large programs with multiple modules.

In this post, I'll explain how I organize the codebase of just, a command runner that I wrote.

just was the first large program I wrote in Rust, and its organization has gone through many iterations, as I discovered what worked for me and what didn't.

There are some things that could use improvement, and many of the choices I made are somewhat strange, so definitely don't consider the whole project a normative example of how to write Rust.

Overview

Users mostly interact with just by running the main binary from the command line.

However, the crate actually consists of an executable target, in src/main.rs, and a library in src/lib.rs. The main function in main.rs is a thin wrapper that calls the run function in src/run.rs.

The reason that just is split into an executable target and a library target is because there is a fuzz tester in fuzz, and a regression testing framework, janus, and both of these use testing functions exposed by the library.
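In sketch form, assuming run returns a Result (the real signature and error handling differ), main.rs is just:

    // src/main.rs — a hypothetical sketch of the thin wrapper
    fn main() {
        if let Err(error) = just::run() {
            eprintln!("error: {}", error);
            std::process::exit(1);
        }
    }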

Submodule Organization

I prefer to keep my module tree flat, so you'll notice that all my source files are directly under src. I find that this makes it easy to remember what's where, since I don't need to remember what subdirectory each source file is in.

I use a fuzzy file searcher, fzf, to switch between files, so having a ton of files in my src directory doesn't bother me. If I used a tree-based file viewer in my editor, I might prefer to group modules into directories by topic.

Common Use Statements

I prefer to group all my use statements together in a single file called src/common.rs.

Then, at the top of every other file, I include them all with use crate::common::*;.

I think this is somewhat uncommon, and most people prefer to put use statements at the top of every file, with just those things used in that particular file.

Both approaches are totally reasonable. I find that grouping use statements into a single file saves a lot of duplication at the top of every file, and makes it easy to start a new file, since you can just write use crate::common::*; and have everything you need in scope.

This does require that I pick unique names for everything that I want to put in common.rs, but I haven't found that to be particularly burdensome.
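Concretely, the pattern looks something like this (the std items shown are illustrative; the real common.rs is much longer):

    // src/common.rs — all shared use statements and re-exports in one place
    pub(crate) use std::{collections::BTreeMap, fmt, path::PathBuf};

    pub(crate) use crate::{lexer::Lexer, token::Token};

Then any other file starts with:

    use crate::common::*;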

Submodule Names and Contents

Most modules contain a single definition, either a function, trait, struct, or enum, that is used in the rest of the codebase. These modules are all named after the public definition, and that definition is exported in common.rs, so use crate::common::*; will bring that definition into scope, without needing to qualify it with the module name.

As an example, just's lexer is called Lexer, and is in src/lexer.rs.

In common.rs, it is exported with pub(crate) use crate::lexer::Lexer;.

Since modules are always named after their sole export, it's pretty easy to figure out where something is defined just from its name. Lexer, for example, is a type from just rather than from a dependency, so it's probably defined in lexer.rs.

A few modules, like src/keyword.rs, contain more than one thing. For modules like that, common.rs just exports the module itself, with pub(crate) use crate::keyword;, and the module name is used when referring to the definitions inside keyword, like keyword::EXPORT.
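Sketched out, that looks like this (EXPORT is real; the second constant is included for illustration):

    // src/keyword.rs — several related definitions in one module
    pub(crate) const EXPORT: &str = "export";
    pub(crate) const SET: &str = "set"; // hypothetical second keyword

    // callers go through the module name:
    //   if text == keyword::EXPORT { ... }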

If a name is from a dependency, then you can see where it comes from in common.rs.

Error Handling

For testing purposes, I like to use error enums, instead of Box<dyn Error> or equivalent. I find that this makes it easy to write tests that look for specific errors. Also, just has detailed error messages, and this approach lets me separate error message formatting from error value generation.

There are two main kinds of errors: CompilationError, in src/compilation_error.rs, and RuntimeError, in src/runtime_error.rs.

As you can guess, CompilationError is for problems related to compilation, e.g. lexing, parsing, and analyzing a justfile, and RuntimeError is for problems that occur when running a justfile, e.g. I/O errors and command execution errors.
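A hedged sketch of what this buys you — these variants are invented for illustration, not just's actual definitions:

    #[derive(Debug, PartialEq)]
    pub(crate) enum CompilationError {
        UnexpectedToken { line: usize, column: usize },
        DuplicateRecipe { name: String },
    }

    // Tests can assert on the exact error value instead of grepping a
    // formatted message:
    //
    //   assert!(matches!(
    //     compile(bad_source),
    //     Err(CompilationError::DuplicateRecipe { .. })
    //   ));
    //
    // Formatting lives in a separate Display impl, so constructing an
    // error is independent of rendering its message.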

I currently don't use any of the many error handling helper crates, but in other projects I use snafu, and if I were to rewrite just, I would definitely use snafu.

Clippy

I use clippy, the animated paperclip/Rust linter, to automatically check the codebase for issues.

Clippy has many lints that are either pedantic or that restrict things that are totally reasonable in many contexts. I like a lot of these, but I didn't want to go through them all and decide which to enable, so I turned on all the lints, even the annoying ones, and now disable the ones I don't like as I encounter them.

You can see this at the top of src/lib.rs.
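The shape is roughly this; the actual allow list is long and changes over time, so the two opt-outs below are only examples:

    #![deny(clippy::all, clippy::pedantic, clippy::restriction)]
    #![allow(
        clippy::missing_docs_in_private_items, // example opt-out
        clippy::implicit_return,               // example opt-out
    )]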

How does it work?

just does a lot of stuff! This makes it hard to give a concise overview of how everything works, but I'll do my best!

Run

The run function is pretty short, so definitely check it out. It's in src/run.rs. It does some setup, like initializing Windows terminal color support and logging, then parses the command line arguments.

Configuration

just calls its parsed command line arguments a Config. The command line arguments are parsed with the venerable clap, and then stored in a Config struct, which is passed around the rest of the program.

Everything related to setting up the clap parser, and parsing the command line arguments is in src/config.rs.
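Here's a rough sketch of the shape, using clap's v2-era builder API; the field and flag set is invented, and the real Config has many more of both:

    use clap::{App, Arg, ArgMatches};

    pub(crate) struct Config {
        pub(crate) justfile: Option<String>,
        pub(crate) arguments: Vec<String>,
    }

    impl Config {
        pub(crate) fn app() -> App<'static, 'static> {
            App::new("just")
                .arg(Arg::with_name("JUSTFILE").long("justfile").takes_value(true))
                .arg(Arg::with_name("ARGUMENTS").multiple(true))
        }

        pub(crate) fn from_matches(matches: &ArgMatches) -> Config {
            Config {
                justfile: matches.value_of("JUSTFILE").map(str::to_owned),
                arguments: matches
                    .values_of("ARGUMENTS")
                    .map(|values| values.map(str::to_owned).collect())
                    .unwrap_or_default(),
            }
        }
    }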

Subcommand Running

just has a few distinct modes it can run in, e.g. actually running a recipe in a justfile, listing the recipes in a justfile, or evaluating the variables in a justfile. These are called subcommands, and you can see the different subcommands in the Subcommand enum in src/subcommand.rs.

Once a config is parsed, the function run_subcommand in config.rs handles executing the correct subcommand.
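In skeletal form (the real enum carries more data per variant, and the real function takes more context):

    pub(crate) enum Subcommand {
        Run { arguments: Vec<String> }, // hypothetical payload
        List,
        Evaluate,
    }

    fn run_subcommand(subcommand: &Subcommand) {
        match subcommand {
            Subcommand::Run { .. } => { /* compile the justfile, then run recipes */ }
            Subcommand::List => { /* print the recipes in the justfile */ }
            Subcommand::Evaluate => { /* print the justfile's variable values */ }
        }
    }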

For the rest of this post, I'll cover Subcommand::Run, the subcommand responsible for actually running a justfile.

Compilation

The justfile source is read and the compiler is invoked in the run_subcommand function in config.rs. The Compiler is defined in src/compiler.rs, and has a single short method that calls the lexer, the parser, and the analyzer.

There's no particular reason for having a Compiler struct, since it doesn't have any fields, so it's really just for organization. I would be totally fine with having a module src/compile.rs, and just exporting a single compile function from that module.
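In outline, the pipeline is something like this — Lexer::lex and Parser::parse are the real entry points, but the signatures are my shorthand:

    impl Compiler {
        pub(crate) fn compile(src: &str) -> Result<Justfile, CompilationError> {
            let tokens = Lexer::lex(src)?;        // source text -> tokens
            let module = Parser::parse(&tokens)?; // tokens -> unvalidated Module
            Analyzer::analyze(module)             // Module -> validated Justfile
        }
    }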

Lexing

The first step of compilation is to split the source text into tokens, which is done by the Lexer in src/lexer.rs. The lexer looks a lot like a recursive descent parser. It has a bunch of different methods, and those methods call each other to produce the different tokens.

The entry-point to the lexer is Lexer::lex.

Lexer is relatively well-commented, so please take a look if you're interested!

Tokens

The lexer produces a Vec of Tokens. The Token type is in src/token.rs. Each Token contains a TokenKind, defined in src/token_kind.rs. A Token contains a reference to the source program, as well as information about the offset, length, line, and column of the token. A TokenKind tells you what kind of token it actually is.
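That description suggests definitions shaped like this (field order, derives, and the example variants are guesses):

    pub(crate) struct Token<'src> {
        pub(crate) offset: usize,  // byte offset of the token in the source
        pub(crate) length: usize,  // length of the token in bytes
        pub(crate) line: usize,    // line the token starts on
        pub(crate) column: usize,  // column the token starts at
        pub(crate) src: &'src str, // reference to the source program
        pub(crate) kind: TokenKind,
    }

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub(crate) enum TokenKind {
        Identifier,
        Colon,
        Eol,
        // ...and so on, one variant per kind of token
    }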

Parsing

The Tokens produced by the Lexer are passed to a Parser, defined in src/parser.rs, and the main entry point is Parser::parse.

The parser is a recursive descent parser that walks over the tokens, figuring out what kind of construct it's parsing as it goes.

Modules

The output of the parser is a Module, defined in src/module.rs. A Module represents a successful parse, but has not been fully validated. just does a lot of static analysis, like resolving names, inter-recipe dependencies, and inter-variable dependencies, so not every Module is valid.

You can think of a Module as an AST that hasn't yet been statically analyzed for correctness. Inside a Module are Items (src/item.rs), which represent the different source constructs, like Alias, Assignment, UnresolvedRecipe, and Set.
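Which suggests definitions shaped roughly like this — the payload types live in their own modules, per the naming scheme above, and the lifetimes are my guess:

    pub(crate) struct Module<'src> {
        pub(crate) items: Vec<Item<'src>>,
    }

    pub(crate) enum Item<'src> {
        Alias(Alias<'src>),
        Assignment(Assignment<'src>),
        Recipe(UnresolvedRecipe<'src>),
        Set(Set<'src>),
    }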

Analysis

The next phase of compilation is analysis, performed by the Analyzer, defined in src/analyzer.rs.

The Analyzer makes sure that all references to recipes and variables can be resolved, and that there are no circular dependencies.
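The circular dependency check is the most interesting part. This isn't just's code, but here's a small self-contained illustration of the kind of graph walk involved:

    use std::collections::{BTreeMap, BTreeSet};

    // Returns true if any recipe can reach itself through its dependencies.
    fn has_cycle<'a>(dependencies: &BTreeMap<&'a str, Vec<&'a str>>) -> bool {
        fn visit<'a>(
            node: &'a str,
            dependencies: &BTreeMap<&'a str, Vec<&'a str>>,
            path: &mut BTreeSet<&'a str>,
        ) -> bool {
            if !path.insert(node) {
                return true; // node is already on the current path: cycle
            }
            for dependency in dependencies.get(node).into_iter().flatten().copied() {
                if visit(dependency, dependencies, path) {
                    return true;
                }
            }
            path.remove(node);
            false
        }

        dependencies
            .keys()
            .copied()
            .any(|node| visit(node, dependencies, &mut BTreeSet::new()))
    }

So a justfile where a depends on b and b depends on a would be rejected:

    assert!(has_cycle(&BTreeMap::from([("a", vec!["b"]), ("b", vec!["a"])])));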

Justfile

The output of the Analyzer is a Justfile, defined in src/justfile.rs.

A Justfile represents a parsed and analyzed justfile. It contains all the recipes, variables, and expressions, all resolved and ready to run. It is the totus porcus, as it were.

Running

A justfile is run with Justfile::run, which takes a Config, a Search with information about where the justfile is and where the working directory is, variable overrides passed on the command line, and a list of arguments.

The arguments are parsed into recipes and arguments to those recipes, and finally those recipes are run with Justfile::run_recipe, which actually executes each recipe, and all dependencies.
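Per that description, the signature is shaped something like this; the parameter types are my guesses:

    impl<'src> Justfile<'src> {
        pub(crate) fn run(
            &self,
            config: &Config,
            search: &Search,
            overrides: &BTreeMap<String, String>,
            arguments: &[String],
        ) -> Result<(), RuntimeError> {
            // 1. apply the command-line variable overrides
            // 2. split arguments into recipes and per-recipe arguments
            // 3. call Justfile::run_recipe for each, dependencies first
            todo!()
        }
    }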

Testing

just takes files, parses commands out of those files, and then runs them. I am acutely aware of how this might go wrong, and would feel real bad if just somehow got confused and ran a command that nuked someone's hard drive.

Because of this, I go pretty crazy with testing. There are four kinds of tests: unit tests, integration tests, fuzz testing, and ecosystem-wide regression testing.

Unit Testing

Unit tests are spread around the codebase, in submodules named tests. Each tests submodule contains tests for whatever's in the containing module.
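The pattern, in any given module (the test shown is made up):

    // at the bottom of, say, src/lexer.rs
    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn tokenizes_a_simple_recipe() {
            // exercises whatever the containing module defines,
            // in this case Lexer
        }
    }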

I'm not strict about covering everything with unit tests, but I do cover everything with integration tests. If something doesn't seem to be tested in a unit test, it's probably tested in an integration test.

Integration Testing

Integration tests are in the tests subdirectory, roughly organized by topic. The vast majority are in tests/integration.rs; each performs a full run of the just binary, supplying standard input, arguments, and a justfile, and checking that standard output, standard error, and the exit code are correct.
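This isn't just's actual harness, but a minimal illustration of the idea, assuming the tempfile crate and a just binary on the PATH:

    use std::process::Command;

    #[test]
    fn list_shows_recipe_names() {
        let dir = tempfile::tempdir().unwrap();
        std::fs::write(dir.path().join("justfile"), "build:\n echo hi\n").unwrap();

        let output = Command::new("just")
            .arg("--list")
            .current_dir(dir.path())
            .output()
            .unwrap();

        // check the exit code and standard output
        assert!(output.status.success());
        assert!(String::from_utf8_lossy(&output.stdout).contains("build"));
    }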

Fuzz Testing

Fuzz testing was contributed to just by @RadicalZephyr, and is located in the fuzz directory. It generates random strings and feeds them to the parser. (NOT THE RUNNER, DEFINITELY NOT THE RUNNER.) If the parser succeeds or returns an error, that's a successful run. If the fuzzer is able to trigger a panic, then it's found a bug that needs to be fixed.
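A cargo-fuzz target has this shape; the compile helper below is a stand-in for whatever testing function the library actually exposes:

    #![no_main]
    use libfuzzer_sys::fuzz_target;

    fuzz_target!(|data: &[u8]| {
        if let Ok(text) = std::str::from_utf8(data) {
            // parsing may succeed or fail, but it must never panic
            let _ = just::fuzzing::compile(text); // hypothetical helper
        }
    });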

Regression Testing

Since a lot of people have written a lot of justfiles, I want to make sure I don't break them when I update just.

To do this, I wrote a tool called janus. Janus is inspired by Rust's crater.

Janus downloads all the justfiles that it can find on GitHub, and then compares how two versions of just compile those justfiles. The two versions of just are usually the latest release, and a new version with a big, scary change.

Janus compiles all justfiles with both versions, and then compares the results. Ideally, every valid justfile parses into the same Justfile with both versions, and every justfile with an error produces the same error.

Wrapping Up

That's everything I can think of! Ultimately, much of how you organize your Rust programs comes down to personal preference, so just start mashing the keyboard, see what works, and iterate on whatever doesn't.

glhf!


Lexiclean

I just published a simple crate that performs lexical path cleaning: lexiclean.

Lexical path cleaning simplifies paths by removing ., .., and doubled separators (//), without querying the filesystem. It is inspired by Go's Clean function, but differs from the Go version by not removing . if that is the only path component left.

I implemented this for a command line utility I'm working on, but split it off so others could use it.
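A quick usage sketch — this assumes the crate exposes a Lexiclean extension trait on Path, so check the crate docs for the actual API:

    use lexiclean::Lexiclean; // assumed trait name
    use std::path::Path;

    fn main() {
        let cleaned = Path::new("./foo//bar/../baz").lexiclean();
        assert_eq!(cleaned, Path::new("foo/baz"));
    }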

There are a few reasons I prefer lexical path cleaning to fs::canonicalize: it doesn't touch the filesystem, so it works on paths that don't exist; it can't fail, so there's no error case to handle; and it doesn't resolve symlinks or absolutize the path, so the result stays close to what the user wrote.

There are some reasons you might prefer fs::canonicalize: it resolves symlinks and produces an absolute path, so the result identifies the actual file on disk, and it reports an error if the path doesn't exist, which can catch mistakes early.

Are there any other reasons to prefer one over the other? I'd love to hear them!

It is very lightly tested! If you intend to use it, I encourage you to submit additional tests containing paths you might encounter, if you think the existing tests don't cover them. In particular, I haven't thought about all the exotic prefixes that Windows paths might be adorned with, so there might be bugs there.

I don't expect to modify the crate or add features to it beyond what I need for my own purposes, so if there are additional features you want, please consider opening a PR! Of course, if you find a bug, I will happily fix it.


Just Hack

Just is a general-purpose command runner written in Rust with a make-like syntax.

If you're interested in hacking on just, I'd love to help!



This is the 200th time I have Googled "CSS Box Model" and I have become exceedingly efficient at it.


Intermodal

TL;DR

Intermodal is a new command-line BitTorrent metainfo utility for Linux, Windows, and macOS. The binary is called imdl.

It can create, display, and verify .torrent files, as well as generate magnet links.
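For example (subcommand and flag names from memory, so double-check them against imdl --help):

imdl torrent create --input foo
imdl torrent show --input foo.torrent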

(demonstration animation)

It has lots of features and niceties, is easy to install and run, and is hopefully just the beginning of an ambitious project to make decentralized content sharing better.

Features include:

You can install the latest version of imdl to ~/bin with:

curl --proto '=https' --tlsv1.2 -sSf https://imdl.io/install.sh | bash

Development is hosted on GitHub, where you can find the code, the issue tracker, and more installation options.

Give it a try and let me know what you think!

I'm eager to hear what works, what doesn't, and what features you'd like to see added. I'll be working on novel functionality—more on that below—and I'd love to hear your critical feedback and ideas.

You can get in touch by opening an issue, joining the Discord server, or sending me an email.

Happy sharing!


Popcorn Time Should Become a Browser

I am not a lawyer. This is not legal advice.

Popcorn Time-style video streaming apps seem to be vulnerable to legal action by rightsholders.

For example, the popcorntime.sh domain was recently suspended, and the operator of a site which merely provided information about how to obtain and use Popcorn Time was sentenced to prison.

Given that Popcorn Time's servers do not themselves host infringing content, this may seem a bit unfair, but it is simply the reality of the world we live in.

It is interesting to note, however, that although web browsers can be used in exactly the same way as Popcorn Time, namely searching for and viewing copyrighted movies, the developers of web browsers have thus far not faced successful legal challenges.


The Stack

Computering is a party. The stack is best visualized as a bunch of Jenga blocks on the floor, and the heap as a bunch of balloons floating around bumping into each other on the ceiling. The fact that the stack usually grows downwards in memory is a travesty.


Unix Utilities in Rust for Great Success

I've often been asked for suggestions for an appropriate first project in Rust, and I think that writing a version of a unix utility is a great choice, for a bunch of reasons!


New Rustacean Resources

I only program in PL/I because I'm BASED.

Delinearization

Programs first crawled from the murky oceans as simple lists of instructions that executed in sequence. From these humble beginnings they have since evolved an astonishing number of ways of delinearizing.

In fact, most programming paradigms simply amount to different ways to transform a linear source file into a program with nonlinear behavior.

Some examples: functions, conditionals and loops, exceptions, threads, coroutines, and callbacks.


Parse structure from the languageless void.