Rust

Status: 🌱

Motivation

Study Rust to deepen systems thinking, memory reasoning, CLI design, and networking fundamentals in contexts where these concerns are central.

Starter Points

  • Practice ownership patterns on real I/O-heavy examples.
  • Model domain invariants with type-driven design.
  • Benchmark critical paths and document tradeoffs.
  • Convert reading fluency into writing fluency through short, complete tools.

Recent Learnings

Ownership

Every value has a single owner. When the owner goes out of scope, Rust runs drop automatically and releases resources deterministically.

Why this matters:

  • Predictable cleanup without garbage collection.
  • Fewer hidden lifetime/resource bugs in long-running services and CLI tools.
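A minimal sketch of scope-based cleanup and move semantics. TempFile and consume are hypothetical names invented for this note; the Drop impl only prints, standing in for real resource release.

```rust
// Hypothetical resource type, used only for illustration.
struct TempFile {
    name: String,
}

impl Drop for TempFile {
    fn drop(&mut self) {
        // Runs once, when the current owner goes out of scope.
        println!("cleaning up {}", self.name);
    }
}

// Taking `TempFile` by value moves ownership into this function.
fn consume(file: TempFile) -> usize {
    file.name.len()
    // `file` is dropped here, at the end of the function body.
}

fn main() {
    let a = TempFile { name: String::from("a.tmp") };
    {
        let _b = TempFile { name: String::from("b.tmp") };
        // `_b` is dropped at the end of this inner block.
    }
    // Ownership of `a` moves into `consume`; using `a` afterwards
    // would be a compile error.
    let n = consume(a);
    println!("name length: {n}");
}
```

The cleanup points are visible in the source: the closing brace of the inner block and the end of consume.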

Borrowing

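References (&T or &mut T) let code use values without taking ownership. At any point a value can have either any number of shared references (&T) or exactly one mutable reference (&mut T), never both.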

Why this matters:

  • Share read access safely without cloning by default.
  • Mutability is explicit and constrained, which improves API clarity.
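A small sketch of shared versus exclusive borrows. The function names total_len and push_trimmed are made up for this example; the point is that each signature states whether the callee can mutate.

```rust
// Shared borrow: reads the data without taking ownership.
fn total_len(lines: &[String]) -> usize {
    lines.iter().map(|l| l.len()).sum()
}

// Exclusive borrow: the signature announces that the vector is mutated.
fn push_trimmed(lines: &mut Vec<String>, raw: &str) {
    lines.push(raw.trim().to_string());
}

fn main() {
    let mut lines = vec![String::from("GET /")];
    push_trimmed(&mut lines, "  Host: example.org \n");
    let bytes = total_len(&lines);
    // `lines` is still owned here; nothing was cloned or moved.
    println!("{} lines, {} bytes", lines.len(), bytes);
}
```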

Lifetimes

Lifetimes encode how long references are valid. The compiler rejects any reference that could outlive its source value.

Why this matters:

  • Prevents dangling references at compile time.
  • Makes data-flow constraints explicit in function signatures.
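A sketch of a lifetime carried in a signature. first_field is a hypothetical helper; the annotation says the returned slice borrows from line and therefore cannot outlive it.

```rust
// The returned slice is tied to `line` through the lifetime `'a`.
fn first_field<'a>(line: &'a str, sep: char) -> &'a str {
    // `split` always yields at least one item, so the fallback is defensive.
    line.split(sep).next().unwrap_or("")
}

fn main() {
    let line = String::from("id=42;ttl=300");
    let field = first_field(&line, ';');
    println!("{field}");
    // Dropping or moving `line` while `field` is still in use
    // would be rejected by the borrow checker.
}
```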

Traits

Traits define behavior contracts and enable composition over inheritance.

Why this matters:

  • Build reusable abstractions with explicit capabilities.
  • Keep designs modular by composing behavior through trait bounds and implementations.
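A minimal sketch of a behavior contract plus a trait bound. Encode and encode_all are illustrative names, not from any library; the big-endian choice for u16 is an assumption made to match the protocol notes later on.

```rust
// Contract: a type that can append its wire form to a buffer.
trait Encode {
    fn encode(&self, out: &mut Vec<u8>);
}

impl Encode for u8 {
    fn encode(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

impl Encode for u16 {
    fn encode(&self, out: &mut Vec<u8>) {
        // Assumption for this sketch: multi-byte values go out big-endian.
        out.extend_from_slice(&self.to_be_bytes());
    }
}

// Works for any element type that provides the capability.
fn encode_all<T: Encode>(items: &[T]) -> Vec<u8> {
    let mut out = Vec::new();
    for item in items {
        item.encode(&mut out);
    }
    out
}

fn main() {
    println!("{:02X?}", encode_all(&[0x1234u16, 0x0001]));
    println!("{:02X?}", encode_all(&[7u8, 8, 9]));
}
```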

Bits, Bytes and Meaning

A byte is eight bits. Eight binary positions yield 2^8 = 256 distinct bit patterns, from 00000000 to 11111111.

That pattern is representation. Meaning is interpretation. The same stored byte 0xFF (11111111) can be:

  • 255 as u8
  • -1 as i8
  • text, depending on the encoding: 'ÿ' in Latin-1, invalid in UTF-8 (whereas the byte 0x41 reads as 'A' under ASCII/UTF-8 rules)
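A quick sketch of one byte under several interpretations. The helper names are invented for this note; char::from(u8) maps a byte to the code point with the same number, which matches Latin-1.

```rust
// Reinterpret the same eight bits as a signed integer.
fn as_signed(b: u8) -> i8 {
    b as i8
}

// Read the byte as the code point with the same number (Latin-1 style).
fn as_latin1(b: u8) -> char {
    char::from(b)
}

// Check whether a single byte is valid UTF-8 on its own.
fn is_utf8(b: u8) -> bool {
    std::str::from_utf8(&[b]).is_ok()
}

fn main() {
    for b in [0x41u8, 0xFF] {
        println!(
            "{b:#04X}: u8={} i8={} latin1={:?} valid_utf8={}",
            b,
            as_signed(b),
            as_latin1(b),
            is_utf8(b)
        );
    }
}
```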

ASCII is a 7-bit character set (0..=127). "Extended ASCII" is an umbrella label for multiple incompatible 8-bit code pages (0..=255), not one universal standard.

UTF-8 is a variable-length encoding:

  • ASCII characters use one byte (0xxxxxxx)
  • Other code points use two to four bytes
  • Backward compatible with ASCII at the byte level
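To see the variable-length rule in bytes, here is a small sketch. utf8_bytes is a helper name made up for this note; it relies only on char::encode_utf8 from the standard library.

```rust
// Encode one scalar value as UTF-8 and return just the bytes used.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // UTF-8 never needs more than four bytes
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    for c in ['A', 'é', '界', '🦀'] {
        let bits: Vec<String> = utf8_bytes(c)
            .iter()
            .map(|b| format!("{b:08b}"))
            .collect();
        // The leading bits of the first byte signal the sequence length;
        // continuation bytes all start with 10.
        println!("U+{:04X} -> {}", c as u32, bits.join(" "));
    }
}
```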

In Rust, String::len() (like str::len()) returns the length in bytes, not the number of Unicode scalar values or grapheme clusters:

fn main() {
    let ascii = "A";        // U+0041
    let latin = "é";        // U+00E9
    let cjk = "界";         // U+754C

    assert_eq!(ascii.len(), 1);
    assert_eq!(latin.len(), 2);
    assert_eq!(cjk.len(), 3);
}

This is consistent with Rust's model: strings are UTF-8 byte buffers with validity guarantees.
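A short sketch of what the byte-buffer model implies for counting and slicing. byte_and_scalar_len is a hypothetical helper; str::get returns None when a range does not fall on character boundaries.

```rust
// Byte length versus number of Unicode scalar values.
fn byte_and_scalar_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = "aé界"; // 1 + 2 + 3 bytes
    println!("{:?}", byte_and_scalar_len(s));

    // Byte 2 is in the middle of 'é', so this range is refused.
    assert!(s.get(1..2).is_none());
    // Bytes 1..3 cover 'é' exactly.
    assert_eq!(s.get(1..3), Some("é"));
}
```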

Signed vs Unsigned Integers

Within each width pair, the signed and unsigned types share a storage size and differ only in how the bits are interpreted:

  • u8 and i8: 8 bits each
  • u32 and i32: 32 bits each

Range examples:

  • u8: 0..=255
  • i8: -128..=127
  • u32: 0..=4_294_967_295
  • i32: -2_147_483_648..=2_147_483_647

Two's complement defines the signed interpretation on modern hardware, and Rust's signed integer types use it:

  • The highest bit contributes -2^(n-1) for an n-bit integer
  • Remaining bits contribute positive powers of two
  • Negation is bitwise invert plus one

For i8, -1 is 11111111:

  • Invert 00000001 -> 11111110
  • Add 1 -> 11111111

So the same physical byte 0xFF maps to:

  • 255 as u8
  • -1 as i8
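The rules above can be checked with a sketch like this one. weighted_value and negate_by_hand are names invented for illustration; they spell out the sign-bit weight and the invert-plus-one rule for 8-bit values.

```rust
// Interpret eight bits as two's complement: the top bit weighs -128,
// the lower seven bits contribute their usual positive value.
fn weighted_value(bits: u8) -> i32 {
    let top = i32::from(bits >> 7);
    let low = i32::from(bits & 0x7F);
    -128 * top + low
}

// Negate by inverting every bit and adding one (wrapping at the edges).
fn negate_by_hand(x: i8) -> i8 {
    (!x).wrapping_add(1)
}

fn main() {
    assert_eq!(weighted_value(0xFF), -1);
    assert_eq!(negate_by_hand(1), -1);
    // `as` between same-width integers reinterprets the bits.
    assert_eq!(0xFFu8 as i8, -1);
    assert_eq!(-1i8 as u8, 255);
    // Checked conversion refuses values outside the target range.
    assert!(u8::try_from(-1i8).is_err());
    println!("two's complement checks passed");
}
```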

Endianness

"Most significant" and "least significant" refer to positional weight, not memory location.

Decimal analogy for 4827:

  • 4 means 4 * 10^3 (most significant digit)
  • 7 means 7 * 10^0 (least significant digit)

Binary is identical in principle: leftward bits carry higher powers of two.

For 0x12345678 in memory, listing bytes from the lowest address upward:

  • Big-endian: 12 34 56 78
  • Little-endian: 78 56 34 12
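The same two layouts can be reproduced with the standard integer methods. layouts is a helper name made up for this sketch.

```rust
// Return the big-endian and little-endian byte layouts of a value.
fn layouts(v: u32) -> ([u8; 4], [u8; 4]) {
    (v.to_be_bytes(), v.to_le_bytes())
}

fn main() {
    let v: u32 = 0x12345678;
    let (be, le) = layouts(v);
    println!("big-endian:    {be:02X?}");
    println!("little-endian: {le:02X?}");

    // Native order matches one of the two, depending on the target.
    let native = v.to_ne_bytes();
    if cfg!(target_endian = "little") {
        assert_eq!(native, le);
    } else {
        assert_eq!(native, be);
    }
}
```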

x86 is little-endian, a design choice preserved across generations for backward compatibility. Network byte order is big-endian by protocol convention (historically standardized for interoperability), so protocol documents describe multi-byte fields in a single canonical order.

Little-endian simplifies some arithmetic and microarchitectural paths because low-order bytes are at the lowest addresses:

  • Multi-byte addition starts at the low-order byte, where carries originate
  • Partial-width operations align naturally: reading a value's address as a narrower integer yields its low-order bytes
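A small sketch of the partial-width point, using byte arrays rather than raw pointers. narrow_via_le_prefix is a hypothetical helper: it takes the first two bytes of the little-endian layout and shows they equal a narrowing cast.

```rust
// Read the first two bytes of the little-endian layout as a u16.
fn narrow_via_le_prefix(v: u32) -> u16 {
    let le = v.to_le_bytes();
    u16::from_le_bytes([le[0], le[1]])
}

fn main() {
    let v: u32 = 0x12345678;
    // A narrowing cast keeps the low-order bits...
    assert_eq!(v as u16, 0x5678);
    // ...which are exactly the lowest-addressed bytes in little-endian.
    assert_eq!(narrow_via_le_prefix(v), v as u16);
    // In big-endian the low-order byte sits at the highest address instead.
    assert_eq!(v.to_be_bytes()[3], v as u8);
    println!("low-order bytes line up with the start of the LE layout");
}
```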

Network Protocols and Byte Order

A DNS message starts with a fixed 12-byte header made of six 16-bit fields:

  • ID (16 bits)
  • Flags (16 bits)
  • QDCOUNT (16 bits)
  • ANCOUNT (16 bits)
  • NSCOUNT (16 bits)
  • ARCOUNT (16 bits)

All are transmitted in network byte order (big-endian). That rule ensures two hosts with opposite native endianness parse identical byte streams.

In Rust, conversion should be explicit at boundaries:

#[derive(Debug)]
struct DnsHeader {
    id: u16,
    flags: u16,
    qdcount: u16,
    ancount: u16,
    nscount: u16,
    arcount: u16,
}

// Decode the fixed header; every field is a big-endian u16 on the wire.
fn parse_dns_header(buf: [u8; 12]) -> DnsHeader {
    DnsHeader {
        id: u16::from_be_bytes([buf[0], buf[1]]),
        flags: u16::from_be_bytes([buf[2], buf[3]]),
        qdcount: u16::from_be_bytes([buf[4], buf[5]]),
        ancount: u16::from_be_bytes([buf[6], buf[7]]),
        nscount: u16::from_be_bytes([buf[8], buf[9]]),
        arcount: u16::from_be_bytes([buf[10], buf[11]]),
    }
}

// Encode a single field back into network byte order.
fn serialize_id(id: u16) -> [u8; 2] {
    id.to_be_bytes()
}

The core rule: internal representation may be native-endian, but wire and storage formats must state their byte order explicitly, with conversion done at the boundary.
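A round-trip sketch of that rule, assuming the six header fields are handled as a plain [u16; 6] in wire order. write_header and read_header are names invented here; read_header accepts a slice and returns None for short input, which is one possible way to handle truncated packets.

```rust
// Serialize six u16 fields into 12 bytes, big-endian (network order).
fn write_header(fields: [u16; 6]) -> [u8; 12] {
    let mut out = [0u8; 12];
    for (chunk, field) in out.chunks_exact_mut(2).zip(fields.iter()) {
        chunk.copy_from_slice(&field.to_be_bytes());
    }
    out
}

// Parse the first 12 bytes of a buffer; None if the buffer is too short.
fn read_header(buf: &[u8]) -> Option<[u16; 6]> {
    let head = buf.get(..12)?;
    let mut fields = [0u16; 6];
    for (field, chunk) in fields.iter_mut().zip(head.chunks_exact(2)) {
        *field = u16::from_be_bytes([chunk[0], chunk[1]]);
    }
    Some(fields)
}

fn main() {
    // Example values only: an arbitrary ID, flags 0x0100, one question.
    let fields = [0xBEEF, 0x0100, 1, 0, 0, 0];
    let bytes = write_header(fields);
    println!("{bytes:02X?}");
    assert_eq!(read_header(&bytes), Some(fields));
    assert_eq!(read_header(&bytes[..11]), None);
}
```

The byte order appears only in these two functions; the rest of a program can work with native u16 values.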

Personal Insight

The main shift was from language-level intuition to memory-level reasoning.

Types stopped being syntax and became interpretation contracts for raw bytes. That reframing made several Rust behaviors feel coherent instead of surprising:

  • byte-oriented string length
  • strict numeric conversions
  • explicit boundary handling for I/O and protocols

Learning the hard part was gratifying because it removed "magic." Once bytes, layout, and interpretation were explicit, Rust felt less like a new language to memorize and more like a precise system to reason about.