Writing a parser is the developer’s way of truly learning a format, and G-code rewards it: the lexical layer is an afternoon, and the semantic layer teaches you exactly the modal-state lesson machinists learn at the spindle. Here is the sane core in JavaScript, plus the traps that separate toy parsers from useful ones.
Stages one and two: blocks and words
function parseBlock(line) {
  const clean = line
    .replace(/\(.*?\)/g, "") // strip ( comments )
    .replace(/;.*$/, "")     // strip ; comments
    .trim();
  // One word = letter, optional whitespace, signed number.
  // The number may be "50", "50.0", ".5", or "50." (trailing decimal point).
  const words = [...clean.matchAll(/([A-Za-z])\s*([+-]?(?:\d+\.?\d*|\.\d+))/g)]
    .map(m => ({ letter: m[1].toUpperCase(), value: parseFloat(m[2]) }));
  return words;
}
That one regex carries the lexical load: a letter, optional whitespace, and a signed number. It tokenizes both spaced (G01 X50.0) and run-together (G1X50Y20) styles, and the run-together style is common in posted and firmware-targeted files. Uppercasing at parse time settles the case question once. Comments come in two dialect flavors (parentheses and semicolons), and stripping both before tokenizing avoids the classic bug of parsing words out of a comment.
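As a quick check of those claims, the sketch below runs a tokenizer over a spaced block, a run-together block, and a block carrying both comment styles. The function tokenizeBlock is a compact restatement of the tokenizer, repeated only so the snippet runs on its own, and its number pattern is written to accept a trailing decimal point (X50.). The sample lines are invented for illustration.

```javascript
// Compact restatement of the tokenizer so this snippet runs on its own.
function tokenizeBlock(line) {
  const clean = line
    .replace(/\(.*?\)/g, "") // strip ( comments )
    .replace(/;.*$/, "")     // strip ; comments
    .trim();
  // Letter, optional whitespace, signed number ("50", "50.", "50.0", ".5").
  const wordPattern = /([A-Za-z])\s*([+-]?(?:\d+\.?\d*|\.\d+))/g;
  return [...clean.matchAll(wordPattern)]
    .map(m => ({ letter: m[1].toUpperCase(), value: parseFloat(m[2]) }));
}

// Sample blocks invented for illustration.
const samples = [
  "G01 X50.0 Y20 F300",
  "g1x50y20",
  "G0 Z5. (retract to Z5) ; clear the part",
];

for (const line of samples) {
  const words = tokenizeBlock(line).map(w => `${w.letter}${w.value}`);
  console.log(line, "->", words.join(" "));
}
```

Note that G01 and G1 both come out with the value 1, so later stages can compare numbers instead of strings.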
Stage three: the modal-state machine, where parsers become real
A block reading only X50. tokenizes fine but cannot be interpreted in isolation: it continues whatever motion mode is active. Real parsers carry state across blocks:
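// Modal state persists across blocks; create it once per program.
const state = { motion: null, units: null, mode: null, x: 0, y: 0, z: 0, feed: null };

// Run once per block, with words = parseBlock(line).
// Pass 1: modal words first, so "X10 G91" and "G91 X10" behave the same.
for (const word of words) {
  if (word.letter === "G") {
    if ([0, 1, 2, 3].includes(word.value)) state.motion = word.value;
    if (word.value === 20 || word.value === 21) state.units = word.value;
    if (word.value === 90 || word.value === 91) state.mode = word.value;
  }
  if (word.letter === "F") state.feed = word.value;
}
// Pass 2: axis words, interpreted under the now-current G90/G91 mode.
for (const word of words) {
  if ("XYZ".includes(word.letter)) updateAxis(state, word); // absolute vs incremental!
}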
updateAxis is where G90/G91 earns its reputation: absolute assigns, incremental adds. Getting this wrong silently corrupts every downstream position, which is the parser-side mirror of the shop-floor G90/G91 hazard. This stage is also the honest teacher: after writing it, you will read modal state in real programs the way the narration method trains, because you have implemented the reader.
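The article leaves updateAxis undefined, so here is one possible shape for it, offered as a sketch and not as the definitive version. It assumes state.mode holds 90, 91, or null, and it treats an undeclared mode as absolute; a validating tool might prefer to flag that case instead.

```javascript
// Hypothetical helper: applies one axis word to the running position.
// Assumes state.mode is 90 (absolute), 91 (incremental), or null.
function updateAxis(state, word) {
  const axis = word.letter.toLowerCase(); // "X" -> "x"
  if (state.mode === 91) {
    state[axis] += word.value; // incremental: offset from the current position
  } else {
    state[axis] = word.value;  // absolute (undeclared is treated as absolute here)
  }
}

// The same two X10 words end in different places depending on the mode.
const absolute = { mode: 90, x: 0, y: 0, z: 0 };
const incremental = { mode: 91, x: 0, y: 0, z: 0 };
for (const value of [10, 10]) {
  updateAxis(absolute, { letter: "X", value });
  updateAxis(incremental, { letter: "X", value });
}
console.log(absolute.x, incremental.x); // 10 20
```

Two identical words, two different end points: that gap is exactly the error a parser introduces when it ignores the mode.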
The dialect traps, named
| Trap | Symptom | Sane handling |
|---|---|---|
| Two comment styles | Words parsed from comments | Strip both ( ) and ; first |
| Run-together words | G1X50 tokenizes wrong in naive splitters | The letter-number regex above |
| Parameter/macro lines (#101=, IF, O-codes) | Parser chokes | Recognize and pass through, do not interpret |
| Dialect words (A axes, builder M-codes) | Unknown letters | Collect, do not crash: unknown is data |
| Decimal-less numbers | X50 and X50. can mean different values on some older controls | Parse both; flag if you are validating |
The pass-through rows are the architectural decision that keeps the parser sane: a browser tool’s job is to understand the motion core (the standard vocabulary) and be transparent about everything else, not to reimplement a control. The moment you find yourself implementing WHILE loops, you are writing an interpreter, a different and bigger project.
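The pass-through rows can be sketched as a small classifier that runs before tokenizing, plus a splitter that runs after it. The names classifyLine, splitWords, MACRO_PATTERNS, and KNOWN_LETTERS are illustrative assumptions, and the patterns are deliberately rough: they will not cover every control's macro syntax.

```javascript
// Hypothetical patterns for lines the parser should recognize but not interpret.
const MACRO_PATTERNS = [
  /^\s*#\d+\s*=/,                             // variable assignment, e.g. #101=5.0
  /^\s*(?:N\d+\s*)?(?:IF|WHILE|END|GOTO)/i,   // macro control flow
  /^\s*O\s*\d+/i,                             // O-codes (program and subprogram labels)
];

// Letters this sketch treats as the motion core; everything else is dialect data.
const KNOWN_LETTERS = new Set(["G", "M", "X", "Y", "Z", "F", "S", "T", "N", "I", "J", "K", "R"]);

// Returns "passthrough" for macro-style lines and "gcode" for everything else.
function classifyLine(line) {
  return MACRO_PATTERNS.some(p => p.test(line)) ? "passthrough" : "gcode";
}

// Splits tokenized words into those the motion core understands and the rest.
function splitWords(words) {
  const known = [];
  const unknown = []; // collected, never discarded: unknown is data
  for (const w of words) {
    if (KNOWN_LETTERS.has(w.letter)) known.push(w);
    else unknown.push(w);
  }
  return { known, unknown };
}

console.log(classifyLine("#101=5.0"));
console.log(classifyLine("WHILE [#1 LT 10] DO1"));
console.log(classifyLine("G1 X50 A90"));
console.log(splitWords([{ letter: "X", value: 50 }, { letter: "A", value: 90 }]));
```

A viewer built on this can draw the known words and list the pass-through lines and unknown words beside the plot, so the user sees what was not evaluated.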
What to build on top of forty lines
The natural stack, each step small: a toolpath extractor (the state machine already yields move segments: feed them to a canvas and you have built a minimal viewer of the NCViewer family), a sanity checker (flag rapids below a Z threshold, missing feeds on first G01, units never declared), and a stats pass (extents, estimated time from feeds, tool-change count) of the kind shops actually paste into quotes. Each consumes the same parsed stream; none requires more parser. And the disclaimer that belongs in your README as much as here: browser parsing is advisory tooling for humans, and nothing it approves skips the machine-side verification rituals.
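As an illustration of the sanity-checker idea, the sketch below walks already-tokenized blocks and reports the three conditions named above. The function checkProgram, its minRapidZ option, and the warning wording are hypothetical; the sketch also assumes absolute mode until G91 appears, and it warns on every feed move made before any F word, not only the first.

```javascript
// Hypothetical checker over tokenized blocks (arrays of { letter, value }).
// minRapidZ is an illustrative threshold, not a recommended value.
function checkProgram(blocks, { minRapidZ = 0 } = {}) {
  const warnings = [];
  // Assumption: absolute mode until the program says otherwise.
  const state = { motion: null, units: null, mode: 90, z: 0, feed: null };

  blocks.forEach((words, index) => {
    const line = index + 1;
    // Modal words first, so word order inside a block does not matter.
    for (const w of words) {
      if (w.letter === "F") state.feed = w.value;
      if (w.letter !== "G") continue;
      if ([0, 1, 2, 3].includes(w.value)) state.motion = w.value;
      if (w.value === 20 || w.value === 21) state.units = w.value;
      if (w.value === 90 || w.value === 91) state.mode = w.value;
    }
    const axes = words.filter(w => "XYZ".includes(w.letter));
    for (const w of axes) {
      if (w.letter === "Z") state.z = state.mode === 91 ? state.z + w.value : w.value;
    }
    if (axes.length === 0) return; // no motion in this block
    if (state.motion === 0 && state.z < minRapidZ) {
      warnings.push(`line ${line}: rapid at Z${state.z}, below ${minRapidZ}`);
    }
    if (state.motion !== null && state.motion !== 0 && state.feed === null) {
      warnings.push(`line ${line}: feed move with no feed rate set`);
    }
  });

  if (state.units === null) warnings.push("units (G20/G21) never declared");
  return warnings;
}

// Sample program invented for illustration.
const sample = [
  [{ letter: "G", value: 90 }],
  [{ letter: "G", value: 0 }, { letter: "Z", value: 5 }],
  [{ letter: "G", value: 1 }, { letter: "Z", value: -2 }], // no F word yet
  [{ letter: "G", value: 0 }, { letter: "X", value: 40 }], // rapid while Z is -2
];
console.log(checkProgram(sample));
```

The checker only returns warnings as strings; what to do with them is left to the human reading the output.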
Bottom line: small lexer, honest state, transparent edges
A JavaScript G-code parser is one comment-stripper, one letter-number regex, and a modal-state machine that takes G90/G91 seriously, with macro and dialect lines passed through transparently. Build it in an afternoon, hang a viewer or checker on it, and collect the side effect: nobody who has implemented modal state ever misreads it at a machine again. The vocabulary that makes both jobs fast lives in the same free 60-second drills on the G-code practice page, with G-Code Sprint repeating what you miss.
Frequently asked questions
How do I write a G-code parser in JavaScript?
Three stages: strip both comment styles and split lines, tokenize with one letter-number regex (handles G1X50 run-together style), and track modal state (motion mode, units, G90/G91) across blocks, since bare coordinate lines only mean something in context. Pass macro and dialect lines through rather than interpreting them. For the G-code fluency that guides the design, the free G-Code Sprint app is the top pick: 60-second drills with automatic repetition of missed codes.
What is the hardest part of parsing G-code?
Not the syntax: the modal-state semantics. Absolute-versus-incremental handling and persistent motion modes mean every block executes in inherited context, and parsers that skip state tracking produce silently wrong positions.
Should my parser execute macro programming (IF, WHILE, variables)?
Not unless you are deliberately writing an interpreter: that is a much larger project with control-specific semantics. Sane browser tools recognize macro lines and pass them through transparently, flagging that the file contains logic they do not evaluate.
Can I trust my parser enough to skip checking programs at the machine?
No: browser parsing is advisory, for viewers and sanity checks. Machine-side verification (dry runs, single block, the shop’s procedures) remains mandatory regardless of what any tool approved.
G-Code Sprint is a study and practice tool only. Always follow your instructor, employer, machine manual, and shop safety procedures.