Record of Learning Rust | Part 2 of 2


Smart Pointers

Smart pointers are data structures that not only act like a pointer but also have additional metadata and capabilities.

In Rust, which uses the concept of ownership and borrowing, an additional difference between references and smart pointers is that references are pointers that only borrow data; in contrast, in many cases, smart pointers own the data they point to.

The characteristic that distinguishes a smart pointer from an ordinary struct is that smart pointers implement the Deref and Drop traits:

  • The Deref trait allows an instance of the smart pointer struct to behave like a reference, so code can work with either references or smart pointers.
  • The Drop trait allows you to customize the code that is run when an instance of the smart pointer goes out of scope.

Using Box<T> to Point to Data on the Heap

Boxes allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.

Boxes don’t have performance overhead, other than storing their data on the heap instead of on the stack. But they don’t have many extra capabilities either. They are most often used in these situations:

  • When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
  • When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
  • When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type

Use a Box<T> to store data on the heap

use a box to store an i32:

let b = Box::new(5);
println!("b = {}", b);

Enabling recursive types with boxes

At compile time, Rust needs to know how much space a type takes up. One type whose size can’t be known at compile time is a recursive type, where a value can have as part of itself another value of the same type.

Use Box<T> to get a recursive type with a known size

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}

A finite Cons list

The Box<T> type is a smart pointer because it implements the Deref trait, which allows Box<T> values to be treated like references. When a Box<T> value goes out of scope, the heap data that the box is pointing to is cleaned up as well because of the Drop trait implementation.

Treating smart pointers like regular references with Deref trait

Following the Pointer to the value with the dereference operator

fn main() {
    let x = 5;
    let y = &x;

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

Using Box<T> like a reference

fn main() {
    let x = 5;
    let y = Box::new(x);

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

Define our own smart pointer

In order to dereference it and get the inner data, we implement the deref method of the Deref trait.

use std::ops::Deref;

struct MyBox<T>(T); // tuple struct

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

fn main() {
    let x = MyBox::new(4);
    assert_eq!(*x, 4);
}

Writing *x is equivalent to the code Rust actually runs behind the scenes:

*(x.deref())

The reason the deref method returns a reference to a value, and that the plain dereference outside the parentheses in *(x.deref()) is still necessary, is the ownership system. If the deref method returned the value directly instead of a reference to the value, the value would be moved out of self. We don’t want to take ownership of the inner value inside MyBox<T> in this case or in most cases where we use the dereference operator.

Implicit Deref coercions with functions and methods

Deref coercion is a convenience that Rust performs on arguments to functions and methods. Deref coercion works only on types that implement the Deref trait. Deref coercion converts such a type into a reference to another type.

When the Deref trait is defined for the types involved, Rust will analyze the types and use Deref::deref as many times as necessary to get a reference to match the parameter’s type.
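
For example (a sketch based on the book’s hello example, reusing the MyBox<T> type defined above): a function that takes &str can be called with a reference to a MyBox<String>. Rust coerces &MyBox<String> to &String via our Deref impl, and &String to &str via the standard library’s Deref impl for String.

fn hello(name: &str) {
    println!("Hello, {}!", name);
}

fn main() {
    let m = MyBox::new(String::from("Rust"));
    hello(&m); // &MyBox<String> -> &String -> &str through deref coercion
}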

How Deref coercion interacts with mutability

Rust does deref coercion when it finds types and trait implementations in three cases:

  • From &T to &U when T: Deref<Target=U>
  • From &mut T to &mut U when T: DerefMut<Target=U>
  • From &mut T to &U when T: Deref<Target=U>

Running code on cleanup with the Drop trait

An example:

use std::ops::Deref;

struct MyBox(i32);

impl MyBox {
    fn new(x: i32) -> MyBox {
        println!("{} created.", x);
        MyBox(x)
    }
}

impl Deref for MyBox {
    type Target = i32;

    fn deref(&self) -> &i32 {
        &self.0
    }
}

impl Drop for MyBox {
    fn drop(&mut self) {
        println!("{} dropped.", self.0);
    }
}
fn main() {
    {
        let a = MyBox::new(4);
    }
    let b = MyBox::new(5);
}

and the output should look something like this (a is dropped at the end of the inner scope, before b is created):
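
4 created.
4 dropped.
5 created.
5 dropped.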

Note: Drop is included in the prelude.

Drop a value early with std::mem::drop

We’re not allowed to call the Drop trait’s drop method explicitly (the destructor of the instance), because Rust would still call drop automatically at the end of scope, leading to a double free error.

Instead, we call std::mem::drop (available in the prelude simply as drop). Using the book’s CustomSmartPointer example (a struct with a data: String field and a Drop implementation):

fn main() {
    let c = CustomSmartPointer {
        data: String::from("some data"),
    };
    println!("CustomSmartPointer created.");
    drop(c);
    println!("CustomSmartPointer dropped before the end of main.");
}

Rc<T>, the reference counted smart pointer

Note that Rc<T> is only for use in single-threaded scenarios.

We use the Rc<T> type when we want to allocate some data on the heap for multiple parts of our program to read and we can’t determine at compile time which part will finish using the data last.

Use Rc<T> to share data

For instance:

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};
use std::rc::Rc;

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    let b = Cons(3, Rc::clone(&a));
    let c = Cons(4, Rc::clone(&a));
}

One way to think of Rc<T> is that it extends the lifetime of shared data. In a Box<T>-based implementation we would have to make Cons hold references instead and add lifetime annotations to guarantee that every element lives at least as long as the list itself.

By using Rc::clone for reference counting, we can visually distinguish between the deep-copy kinds of clones and the kinds of clones that increase the reference count.

Clone an Rc<T> increases the reference count

strong_count function

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    println!("count after creating a = {}", Rc::strong_count(&a));
    let b = Cons(3, Rc::clone(&a));
    println!("count after creating b = {}", Rc::strong_count(&a));
    {
        let c = Cons(4, Rc::clone(&a));
        println!("count after creating c = {}", Rc::strong_count(&a));
    }
    println!("count after c goes out of scope = {}", Rc::strong_count(&a));
}

If Rc<T> allowed you to have multiple mutable references too, you might violate one of the borrowing rules discussed in Chapter 4: multiple mutable borrows to the same place can cause data races and inconsistencies.

RefCell<T> and interior mutability pattern

Interior mutability is a design pattern in Rust that allows you to mutate data even when there are immutable references to that data; normally, this action is disallowed by the borrowing rules.

Enforcing borrowing rules at runtime with RefCell<T>

a recap of reasons to choose Box<T>, Rc<T>, or RefCell<T>:

  • Rc<T> enables multiple owners of the same data; Box<T> and RefCell<T> have single owners.
  • Box<T> allows immutable or mutable borrows checked at compile time; Rc<T> allows only immutable borrows checked at compile time; RefCell<T> allows immutable or mutable borrows checked at runtime.
  • Because RefCell<T> allows mutable borrows checked at runtime, you can mutate the value inside the RefCell<T> even when the RefCell<T> is immutable.

Interior mutability: a mutable borrow to an immutable value

A value can mutate itself in its methods but appear immutable to other code.

A use case for interior mutability: Mock objects

test double: a general programming concept for a type used in place of another type during testing.

mock objects: specific types of test doubles that record what happens during a test so you can assert that the correct actions took place.

Example

Imagine we have a library used to keep track of a user’s quota for the number of API calls they’re allowed to make.

One of the pieces of functionality we need to implement is notifying the user by sending a message. Thus:

pub trait Messenger {
    fn send(&self, msg: &str); // the method borrows self immutably
}

Assume that we want to store the messages in a queue that is waiting to be dealt with:

struct MockMessenger {
    sent_messages: Vec<String>,
}

impl Messenger for MockMessenger {
    fn send(&self, message: &str) {
        self.sent_messages.push(String::from(message));
    }
}

And here is the problem: send takes &self, so we cannot borrow the MockMessenger as mutable, and the push call will not compile.

Solution:

store Vec<String> into RefCell<T>

use std::cell::RefCell;

struct MockMessenger {
    sent_messages: RefCell<Vec<String>>,
}

then we can implement send method like this:

impl Messenger for MockMessenger {
    fn send(&self, message: &str) {
        self.sent_messages.borrow_mut().push(String::from(message));
    }
}

Finally, we manage to get a mutable reference to the data inside an immutably borrowed object.

Keep track of borrows at runtime with RefCell<T>

RefCell<T> has borrow and borrow_mut methods:

  • borrow returns the smart pointer type Ref<T>
  • borrow_mut returns the smart pointer type RefMut<T>
impl Messenger for MockMessenger {
    fn send(&self, message: &str) {
        let mut one_borrow = self.sent_messages.borrow_mut();
        let mut two_borrow = self.sent_messages.borrow_mut();

        one_borrow.push(String::from(message));
        two_borrow.push(String::from(message));
    }
}

The code compiles without any error, but when it runs in a test it panics with the message “already borrowed: BorrowMutError”.

Having multiple owners of mutable data by combining Rc<T> and RefCell<T>

#[derive(Debug)]
enum List {
    Cons(Rc<RefCell<i32>>, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let value = Rc::new(RefCell::new(5));

    let a = Rc::new(Cons(Rc::clone(&value), Rc::new(Nil)));

    let b = Cons(Rc::new(RefCell::new(3)), Rc::clone(&a));
    let c = Cons(Rc::new(RefCell::new(4)), Rc::clone(&a));

    *value.borrow_mut() += 10;

    println!("a after = {:?}", a);
    println!("b after = {:?}", b);
    println!("c after = {:?}", c);
}

The whole thing is shared and each shared owner gets to mutate the contents. The effect of mutating the contents will be seen by all of the shared owners of the outer Rc because the inner data is shared.

See also the Rust documentation and the Stack Overflow question “What is the difference between Rc<RefCell> and RefCell<Rc>?”.

The runtime checks of the borrowing rules protect us from data races, and it’s sometimes worth trading a bit of speed for this flexibility in our data structures.

Also in standard library:

  • Cell<T>: the value is copied in and out
  • Mutex<T>: offers thread-safe interior mutability

Reference cycles can leak memory

Instance

// TODO

https://doc.rust-lang.org/book/ch15-06-reference-cycles.html

Preventing reference cycles: turning an Rc<T> into a Weak<T>

You can also create a weak reference to the value within an Rc<T> instance by calling Rc::downgrade and passing a reference to the Rc<T>.

When you call Rc::downgrade, you get a smart pointer of type Weak<T> and weak_count will be increased by 1.

The difference is that the weak_count doesn’t need to be 0 for the Rc<T> instance to be cleaned up.

Because the value that Weak<T> references might have been dropped, to do anything with the value that a Weak<T> is pointing to, you must make sure the value still exists.

Do this by calling the upgrade method on a Weak<T> instance, which will return an Option<Rc<T>>.

Create a tree data structure

use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug)]
struct Node {
    data: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}
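
A minimal sketch of building such a tree (adapted from the book’s leaf/branch example to the data field above): the child holds only a Weak reference to its parent, so parent and child do not form a strong reference cycle.

fn main() {
    let leaf = Rc::new(Node {
        data: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });

    let branch = Rc::new(Node {
        data: 5,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });

    // point leaf's parent at branch with a weak reference
    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);

    // upgrade returns Option<Rc<Node>>: Some(...) while branch is still alive
    println!("leaf parent data = {:?}", leaf.parent.borrow().upgrade().map(|p| p.data));
}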

Fearless concurrency

Using thread to run code simultaneously

Creating a new thread with spawn

The threads will probably take turns, but that depends on how the operating system schedules them.

The spawned thread may stop before it finishes its work (for example, after printing only a few values), because it is shut down when the main thread ends.
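
A minimal sketch of the kind of program being described (based on the book’s spawn example):

use std::thread;
use std::time::Duration;

fn main() {
    thread::spawn(|| {
        for i in 1..10 {
            println!("hi number {} from the spawned thread!", i);
            thread::sleep(Duration::from_millis(1));
        }
    });

    for i in 1..5 {
        println!("hi number {} from the main thread!", i);
        thread::sleep(Duration::from_millis(1));
    }
    // main ends here, so the spawned thread is usually cut off early
}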

Waiting for all threads to finish using join

let handle = thread::spawn(|| {
    // --- ignored
});
handle.join().unwrap();
// --- the code below will not be executed until the thread above has finished.

Using move closures with threads

The move closure allows you to use data from one thread in another thread.

The problem

use std::thread;

fn main() {
    let v = vec![1, 2, 3];

    let handle = thread::spawn(|| {
        println!("Here's a vector: {:?}", v);
    });

    drop(v); // oh no!
    handle.join().unwrap();
}

When the println! in the spawned thread uses the reference to v, v may already have been dropped by the main thread, which would make the reference invalid. Rust therefore refuses to compile a closure that merely borrows v here.

However, if we do this:

let handle = thread::spawn(move || {
    println!("Here's a vector: {:?}", v);
});

drop(v);
handle.join().unwrap();

then the drop(v) call in the main thread no longer compiles: the move keyword moves v into the closure, so the main thread is not allowed to use v afterwards.

Using message passing to transfer data between threads

Do not communicate by sharing memory; instead, share memory by communicating.

One major tool Rust has for accomplishing message-sending concurrency is the channel.

A channel consists of:

  • transmitter
  • receiver

create channel

use std::sync::mpsc;

let (tx, rx) = mpsc::channel();

mpsc is short for multiple producer, single consumer

create thread and send message

thread::spawn(move || {
    let val = String::from("hi");
    tx.send(val).unwrap();
    // send() returns a `Result<T, E>`
});

the receiving end of the channel

two methods:

  • recv

    will block the main thread’s execution and wait until a value is sent down the channel

  • try_recv

    will NOT block

    We could write a loop that calls try_recv every so often, handles a message if one is available, and otherwise does other work for a little while until checking again.
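
A minimal sketch of such a polling loop (assuming rx is the receiving end of an mpsc channel; do_other_work is a hypothetical function standing in for the other work):

loop {
    match rx.try_recv() {
        Ok(msg) => println!("Got: {}", msg),
        Err(std::sync::mpsc::TryRecvError::Empty) => {
            // nothing available yet; do something else and check again later
            do_other_work();
        }
        Err(std::sync::mpsc::TryRecvError::Disconnected) => break, // sender is gone
    }
}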

Channels and ownership transference

In this code (which will not compile):

use std::thread;
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let val = String::from("hi");
        tx.send(val).unwrap();
        // error: val was moved by the call to send and cannot be used here
        println!("val is {}", val);
    });

    let received = rx.recv().unwrap();
    println!("Got: {}", received);
}

The problem is that once a value has been sent, the receiving thread could modify or drop it; using it again in the sending thread could therefore observe inconsistent or nonexistent data and cause unexpected results. That’s why send takes ownership of the value.

Sending multiple values and seeing the receiver waiting

use std::thread;
use std::sync::mpsc;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let vals = vec![
            String::from("hi"),
            String::from("from"),
            String::from("the"),
            String::from("thread"),
        ];

        for val in vals {
            tx.send(val).unwrap();
            thread::sleep(Duration::from_secs(1));
        }
    });

    for received in rx {
        println!("Got: {}", received);
    }
}

In the main thread, we’re NOT calling the recv function explicitly anymore: instead, we’re treating rx as an iterator.

For each value received, we’re printing it. When the channel is closed, iteration will end.

Shared-state concurrency

A mutex (mutual exclusion) allows only one thread to access some data at any given time.

To access the data in a mutex, a thread must first signal that it wants access by asking to acquire the mutex’s lock. The lock is a data structure that is part of the mutex that keeps track of who currently has exclusive access to the data. Therefore, the mutex is described as guarding the data it holds via the locking system.

Mutex<T>

Example:

use std::sync::Mutex;

fn main() {
    let m = Mutex::new(5);

    {
        let mut num = m.lock().unwrap();
        *num = 6;
    }

    println!("m = {:?}", m);
}

  • To access the data inside the mutex, we use lock method to acquire the lock.

    This call will block the current thread.

  • The call to lock would fail if another thread holding the lock panicked.

The call to lock returns a MutexGuard smart pointer, which implements Deref and Drop (the lock is released automatically when the guard goes out of scope).

Sharing Mutex<T> between multiple threads

use std::sync::Mutex;
use std::thread;

fn main() {
    let counter = Mutex::new(0);
    let mut handles = vec![];

    for _ in 0..10 {
        let handle = thread::spawn(move || {
            let mut num = counter.lock().unwrap();

            *num += 1;
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap());
}

This causes a compile error: counter is moved into the closure in the first iteration of the loop, so Rust tells us we can’t move ownership of counter into multiple threads.

Atomic reference counting with Arc<T>

Arc<T> (atomically reference counted) is a type like Rc<T> that is safe to use in concurrent situations.

Thread safety comes with a performance penalty that you only want to pay when you really need to.

Example(TODO)

use std::sync::{Mutex, Arc};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            let mut num = counter.lock().unwrap();

            *num += 1;
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap());
}

Similarities between RefCell<T>/Rc<T> and Mutex<T>/Arc<T>

Rust can’t protect you from all kinds of logic errors when you use Mutex<T>. Using Rc<T> came with the risk of creating reference cycles. Similarly, Mutex<T> comes with the risk of creating deadlocks.

Extensible concurrency with Sync and Send traits

Two concurrency concepts are embedded in the language: the std::marker traits

  • Sync
  • Send

Allowing transference of ownership between threads with Send

The Send marker trait indicates that ownership of values of the type can be transferred between threads.

Almost every Rust type is Send but there are some exceptions.

  • Rc<T>: cannot be Send, because if you cloned an Rc<T> value and tried to transfer ownership of the clone to another thread, both threads might update the reference count at the same time.
  • raw pointers

Allowing access from multiple threads with Sync

The Sync marker trait indicates that it’s safe for the type to be referenced from multiple threads.

not Sync:

  • Rc<T>
  • RefCell<T>
  • Cell<T>

is Sync:

  • Mutex<T>
  • Arc<T>

Implementing Send and Sync manually is unsafe

As marker traits, they don’t even have any methods to implement. They’re just useful for enforcing invariants related to concurrency.

Patterns and matching

All the places patterns can be used

match arms

match VALUE {
    PATTERN => EXPRESSION,
    PATTERN => EXPRESSION,
    PATTERN => EXPRESSION,
}

_ can be used as a catchall to match any value not otherwise specified.

if let expressions

example:

fn main() {
    let favorite_color: Option<&str> = None;
    let is_tuesday = false;
    let age: Result<u8, _> = "34".parse();

    if let Some(color) = favorite_color {
        println!("Using your favorite color, {}, as the background", color);
    } else if is_tuesday {
        println!("Tuesday is green day!");
    } else if let Ok(age) = age {
        if age > 30 {
            println!("Using purple as the background color");
        } else {
            println!("Using orange as the background color");
        }
    } else {
        println!("Using blue as the background color");
    }
}

while let conditional loops

That is similar to while (condition) { ... } in C++.

fn main() {
    let mut stack = Vec::new();

    stack.push(1);
    stack.push(2);
    stack.push(3);

    while let Some(top) = stack.pop() {
        println!("{}", top);
    }
}

for loops

In a for loop, the pattern is the value that directly follows the keyword for, so in for x in y the x is the pattern.

Example:

fn main() {
    let v = vec!['a', 'b', 'c'];

    for (index, value) in v.iter().enumerate() {
        println!("{} is at index {}", value, index);
    }
}

we will get:

a is at index 0
b is at index 1
c is at index 2

The enumerate method adapts an iterator so that it produces tuples of each value and that value’s index in the iteration.

let statements

in variable assignment:

let PATTERN = EXPRESSION;
let (x, y, z) = (1, 2, 3);

_ and .. can also be used to ignore values in the tuple.

Function parameters

match tuple as argument:

fn print_coordinates(&(x, y): &(i32, i32)) {
    println!("Current location: ({}, {})", x, y);
}

fn main() {
    let point = (3, 5);
    print_coordinates(&point);
}

We can also use patterns in closure parameter lists in a similar way.

Refutability: whether a pattern might fail to match

Patterns come in two forms:

  • refutable
  • irrefutable: will match for any possible value passed

let and for loops can only accept irrefutable patterns.

if let and while let accept both refutable and irrefutable patterns.
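
For example (a small sketch): a let binding requires an irrefutable pattern, so matching against an Option has to go through if let instead.

fn main() {
    let some_value: Option<i32> = Some(5);

    // let Some(x) = some_value;       // error: refutable pattern in local binding
    if let Some(x) = some_value {
        println!("got {}", x);          // fine: `if let` accepts refutable patterns
    }
}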

Pattern syntax

Matching literals

match patterns against literals directly:

let x = 1;

match x {
    1 => println!("one"),
    2 => println!("two"),
    3 => println!("three"),
    _ => println!("anything"),
}

Matching named variables

Named variables are irrefutable patterns that match any value.

match starts a new scope, variables declared as part of a pattern inside the match expression will shadow those with the same name outside the match construct.

Example:

fn main() {
    let x = Some(5);
    let y = 10;

    match x {
        Some(50) => println!("Got 50"),
        Some(y) => println!("Matched, y = {:?}", y),
        _ => println!("Default case, x = {:?}", x),
    }

    println!("at the end: x = {:?}, y = {:?}", x, y);
}

Multiple patterns

| syntax means or.
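
For example (a small sketch):

let x = 1;

match x {
    1 | 2 => println!("one or two"),
    3 => println!("three"),
    _ => println!("anything"),
}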

Matching ranges of values with ..=

match x {
    1..=5 => println!("one through five"),
    _ => println!("something else"),
}

for example, 1..=5 is equal to 1 | 2 | 3 | 4 | 5

Ranges are only allowed with numeric values or char values

Destructuring to break apart values

We can also use patterns to destructure structs, enums, tuples, and references to use different parts of these values.

Destructuring structs

Example:

struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 0, y: 7 };

    let Point { x: a, y: b } = p;
    assert_eq!(0, a);
    assert_eq!(7, b);
}

  • The order of the fields in the pattern doesn’t have to match the order in the struct definition.
  • To destructure a struct into variables with the same names as its fields, use the field shorthand:
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 0, y: 7 };
    let Point { x, y } = p;
    assert_eq!(0, x);
    assert_eq!(7, y);
}

  • destructure with literal values as part of the struct pattern rather than creating variables for all the fields.
let p = Point { x: 0, y: 7 };

match p {
    Point { x, y: 0 } => println!("On the x axis at {}", x),
    Point { x: 0, y } => println!("On the y axis at {}", y),
    Point { x, y } => println!("On neither axis: ({}, {})", x, y),
}

Destructuring enum variants

Example:

enum Message {
    Quit,
    Move { x: i32, y: i32 },     // struct-like matching
    Write(String),               // tuple matching
    ChangeColor(i32, i32, i32),
}

fn main() {
    let msg = Message::ChangeColor(0, 160, 255);

    match msg {
        Message::Quit => {
            println!("The Quit variant has no data to destructure.")
        }
        Message::Move { x, y } => {
            println!(
                "Move in the x direction {} and in the y direction {}",
                x, y
            );
        }
        Message::Write(text) => println!("Text message: {}", text),
        Message::ChangeColor(r, g, b) => println!(
            "Change the color to red {}, green {}, and blue {}",
            r, g, b
        ),
    }
}

Destructuring nested structs and enums

Example:

enum Color {
    Rgb(i32, i32, i32),
    Hsv(i32, i32, i32),
}

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(Color),
}

fn main() {
    let msg = Message::ChangeColor(Color::Hsv(0, 160, 255));

    match msg {
        Message::ChangeColor(Color::Rgb(r, g, b)) => println!(
            "Change the color to red {}, green {}, and blue {}",
            r, g, b
        ),
        Message::ChangeColor(Color::Hsv(h, s, v)) => println!(
            "Change the color to hue {}, saturation {}, and value {}",
            h, s, v
        ),
        _ => (),
    }
}

Destructuring structs and tuples

We can mix, match and nest destructuring patterns in more complex ways.

let ((feet, inches), Point { x, y }) = ((3, 10), Point { x: 3, y: -10 });

Ignore values in a pattern

ignore an entire value with _

fn foo(_: i32, y: i32) {
    println!("This code only uses the y parameter: {}", y);
}

Ignoring a function parameter can be especially useful in some cases, for example, when implementing a trait when you need a certain type signature but the function body in your implementation doesn’t need one of the parameters.

ignore parts of the value with _

let mut setting_value = Some(5);
let new_setting_value = Some(10);

match (setting_value, new_setting_value) {
    (Some(_), Some(_)) => {
        println!("Can't overwrite an existing customized value");
    }
    _ => {
        setting_value = new_setting_value;
    }
}

ignore an unused variable by starting its name with _

fn main() {
    let _x = 5;
    let y = 10;
}

Note that there is a subtle difference between using only _ and using a name that starts with an underscore. The syntax _x still binds the value to the variable, whereas _ doesn’t bind at all.

This example will produce a compile error:

let s = Some(String::from("Hello!"));

if let Some(_s) = s {
    println!("found a string");
}

println!("{:?}", s); // s is moved to _s

An unused variable starting with an underscore still binds the value, which might take ownership of the value

ignore remaining parts of a value with ..

We can use the .. syntax to use only a few parts and ignore the rest, avoiding the need to list underscores for each ignored value.

Note that .. can only be used once per tuple pattern, to avoid ambiguity about which parts are being ignored.
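
For example (a small sketch; Point3 here is just an illustrative struct, not one defined earlier in these notes):

struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}

fn main() {
    let origin = Point3 { x: 0, y: 0, z: 0 };
    match origin {
        Point3 { x, .. } => println!("x is {}", x), // ignore every field but x
    }

    let numbers = (2, 4, 8, 16, 32);
    match numbers {
        (first, .., last) => println!("first = {}, last = {}", first, last),
    }
}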

Extra conditionals with match guards

A match guard is an additional if condition specified after the pattern in a match arm that must also match, along with the pattern matching, for that arm to be chosen.

fn main() {
    let num = Some(4);

    match num {
        Some(x) if x < 5 => println!("less than five: {}", x),
        Some(x) => println!("{}", x),
        None => (),
    }
}

@ bindings

@ lets us create a variable that holds a value at the same time we’re testing that value to see whether it matches a pattern.

Example:

fn main() {
    enum Message {
        Hello { id: i32 },
    }

    let msg = Message::Hello { id: 5 };

    match msg {
        Message::Hello {
            id: id_variable @ 3..=7,
        } => println!("Found an id in range: {}", id_variable),
        Message::Hello { id: 10..=12 } => {
            println!("Found an id in another range")
        }
        Message::Hello { id } => println!("Found some other id: {}", id),
    }
}

Advanced features

Unsafe Rust

Why does it exist?

  • Static analysis is conservative. Although the code might be okay, if the Rust compiler doesn’t have enough information to be confident, it will reject the code. In these cases, unsafe code lets you tell the compiler to trust you, at your own risk.
  • The underlying computer hardware is inherently unsafe.

Unsafe superpowers

The unsafe keyword gives the block that holds the unsafe code these abilities (the unsafe superpowers):

  • Dereference a raw pointer
  • Call an unsafe function or method
  • Access or modify a mutable static variable
  • Implement an unsafe trait
  • Access fields of union

unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks

To isolate unsafe code as much as possible, it’s best to enclose unsafe code within a safe abstraction and provide a safe API. Wrapping unsafe code in a safe abstraction prevents uses of unsafe from leaking out into all the places that you or your users might want to use the functionality implemented with unsafe code, because using a safe abstraction is safe.

Dereferencing a raw pointer

Raw pointers can be mutable or immutable:

  • *const T
  • *mut T

Different from references and smart pointers, raw pointers:

  • Are allowed to ignore the borrowing rules by having both immutable and mutable pointers or multiple mutable pointers to the same location
  • Aren’t guaranteed to point to valid memory
  • Are allowed to be null
  • Don’t implement any automatic cleanup

Create raw pointers from references

let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;

Creating a raw pointer to an arbitrary memory address

let addr = 0x012345usize;
let r = addr as *const i32;

dereferencing raw pointers within an unsafe block

unsafe {
    println!("r1 is: {}", *r1);
    println!("r2 is: {}", *r2);
}

Creating a pointer does no harm; it’s only when we try to access the value that it points at that we might end up dealing with an invalid value.

Note: we created *const i32 and *mut i32 raw pointers that both pointed to the same memory location, where num is stored. If we instead tried to create an immutable and a mutable reference to num, which potentially creates a data race, the code would not have compiled, because Rust’s ownership rules don’t allow a mutable reference at the same time as any immutable references.

Calling an unsafe function or method

By calling an unsafe function within an unsafe block, we’re saying that we’ve read this function’s documentation and take responsibility for upholding the function’s contracts.

unsafe fn dangerous() {}

unsafe {
    dangerous();
}

Bodies of unsafe functions are effectively unsafe blocks, so to perform other unsafe operations within an unsafe function, we don’t need to add another unsafe block.

Creating a safe abstraction over unsafe code

Take split_at_mut as an example; we might try to implement the function like this:

fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();

    assert!(mid <= len);

    (&mut slice[..mid], &mut slice[mid..])
}

Rust’s borrow checker can’t understand that we’re borrowing different parts of the slice; it only knows that we’re borrowing from the same slice twice. Borrowing different parts of a slice is fundamentally okay because the two slices aren’t overlapping.

use an unsafe block, a raw pointer, and some calls to unsafe functions to make the implementation of split_at_mut work.

#![allow(unused)]
fn main() {
    use std::slice;

    fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
        let len = slice.len();
        let ptr = slice.as_mut_ptr();

        assert!(mid <= len);

        unsafe {
            (
                slice::from_raw_parts_mut(ptr, mid),
                slice::from_raw_parts_mut(ptr.add(mid), len - mid),
            )
        }
    }
}

Using extern functions to call external code

Functions declared within extern blocks are always unsafe to call from Rust code.

extern, that facilitates the creation and use of a Foreign Function Interface (FFI). An FFI is a way for a programming language to define functions and enable a different (foreign) programming language to call those functions.

extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}

Within the extern "C" block, we list the names and signatures of external functions from another language we want to call. The "C" part defines which application binary interface (ABI) the external function uses: the ABI defines how to call the function at the assembly level. The "C" ABI is the most common and follows the C programming language’s ABI.

Calling Rust functions from other languages.

The #[no_mangle] annotation tells the Rust compiler not to mangle the name of this function. Mangling is when a compiler changes the name we’ve given a function to a different name that contains more information for other parts of the compilation process to consume but is less human readable. Every programming language compiler mangles names slightly differently.

#[no_mangle]
pub extern "C" fn call_from_c() {
    println!("Just called a Rust function from C!");
}

Accessing or modifying a mutable static variable

If two threads are accessing the same mutable global variable, it can cause a data race.

So it’s preferable to use the concurrency techniques and thread-safe smart pointers.
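
A minimal sketch along the lines of the book’s COUNTER example (acceptable here only because the program is single-threaded): both reading and writing a mutable static require an unsafe block.

static mut COUNTER: u32 = 0;

fn add_to_count(inc: u32) {
    unsafe {
        COUNTER += inc; // would be a data race if called from multiple threads
    }
}

fn main() {
    add_to_count(3);

    unsafe {
        println!("COUNTER: {}", COUNTER);
    }
}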

Implementing an unsafe trait

A trait is unsafe when at least one of its methods has some invariant that the compiler can’t verify.

Both adding the unsafe keyword before trait and marking the implementation of the trait as unsafe are necessary.

unsafe trait Foo {
    // methods
}

unsafe impl Foo for i32 {
    // method implementations
}

Accessing fields of a union

A union is similar to a struct, but only one declared field is used in a particular instance at one time. Accessing the fields of a union requires unsafe, because Rust can’t guarantee the type of the data currently being stored in the union instance.
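
A minimal sketch (IntOrFloat is just an illustrative union, not from the notes above): we must track which field is currently valid ourselves, and reading a field is unsafe.

union IntOrFloat {
    i: i32,
    f: f32,
}

fn main() {
    let u = IntOrFloat { i: 123 };

    unsafe {
        // reading the field that was last written is fine; reading the other
        // field would reinterpret the same bytes as an f32
        println!("u.i = {}", u.i);
    }
}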

Advanced traits

Specifying Placeholder Types in Trait Definitions with Associated Types

Associated types connect a type placeholder with a trait such that the trait method definitions can use these placeholder types in their signatures.

For example,

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

Item is a placeholder type. Associated types might seem similar to generics; the difference is that with generics you must annotate the types in each implementation.

Default generic type parameters and operator overloading

Specifying a default concrete type for the generic type: <PlaceholderType=ConcreteType>

Rust doesn’t allow you to create your own operators or overload arbitrary operators. What you can do is overload the operations (and their corresponding traits) listed in std::ops.

Take the Add trait as an example:

trait Add<Rhs = Self> {
    type Output;

    fn add(self, rhs: Rhs) -> Self::Output;
}

  • Rhs: the generic type parameter (short for “right hand side”)
  • Rhs = Self: a default type parameter

If we want to implement the Add trait where we want to customize the Rhs type rather than using the default:

use std::ops::Add;

struct Millimeters(u32);
struct Meters(u32);

impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        Millimeters(self.0 + (other.0 * 1000))
    }
}

Use cases of default type parameters:

  • extend a type without breaking existing code.
  • allow customization in specific cases most users won’t need

Fully qualified syntax for disambiguation: calling methods with the same name

Imagine there are two traits that each define a method with the same name, and a type that implements both traits also defines its own method with that name.

trait Pilot {
    fn fly(&self);
}

trait Wizard {
    fn fly(&self);
}

struct Human;

impl Pilot for Human {
    fn fly(&self) {
        println!("This is your captain speaking.");
    }
}

impl Wizard for Human {
    fn fly(&self) {
        println!("Up!");
    }
}

impl Human {
    fn fly(&self) {
        println!("*waving arms furiously*");
    }
}

  • Pilot – fly
  • Wizard – fly
  • Human – fly

To specify which trait’s fly method you want to call:

fn main() {
    let person = Human;
    Pilot::fly(&person);
    Wizard::fly(&person);
    person.fly();
}

However, associated functions that are part of traits don’t have a self parameter.

trait Animal {
    fn baby_name() -> String;
}

struct Dog;

impl Dog {
    fn baby_name() -> String {
        String::from("dog::baby_name()\n")
    }
}

impl Animal for Dog {
    fn baby_name() -> String {
        String::from("Animal::baby_name()\n")
    }
}

In this case, if you want to call the Animal implementation of baby_name for Dog, you cannot just write Animal::baby_name() (Rust has no way to tell which implementation you mean). Instead, we use fully qualified syntax:

<Dog as Animal>::baby_name();

In general, fully qualified syntax looks like this:

<Type as Trait>::function(receiver_if_method, next_arg, ...);

For associated functions, there would not be a receiver: there would only be the list of other arguments.

Use supertraits to require one trait’s functionality within another trait

Sometimes, you might need one trait to use another trait’s functionality. In this case, you need to rely on the dependent trait also being implemented. The trait you rely on is a supertrait of the trait you’re implementing.
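
A minimal sketch based on the book’s OutlinePrint example: declaring fmt::Display as a supertrait lets the trait’s method call to_string on self, and every implementor must also implement Display.

use std::fmt;

trait OutlinePrint: fmt::Display {
    fn outline_print(&self) {
        let output = self.to_string(); // available because Self: Display
        println!("* {} *", output);
    }
}

struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

impl OutlinePrint for Point {}

fn main() {
    Point { x: 1, y: 3 }.outline_print();
}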

Use the newtype pattern to implement external traits on external types

Orphan rule: we’re allowed to implement a trait on a type as long as either the trait or the type are local to our crate.

It’s possible to get around this restriction using the newtype pattern.

For example, say we want to implement Display on Vec<T>. The orphan rule prevents us from doing so directly, because the Display trait and the Vec<T> type are both defined outside our crate. We can wrap Vec<T> in a local newtype instead:

use std::fmt;

struct Wrapper(Vec<String>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let w = Wrapper(vec![String::from("hello"), String::from("world")]);
    println!("w = {}", w);
}

Advanced types

Using the newtype pattern for type safety and abstraction

Another use of the newtype pattern is to abstract away some implementation details of a type: the new type can expose a public API that is different from the API of the private inner type, restricting the functionality that is available.

The newtype pattern is a lightweight way to achieve encapsulation to hide implementation details.

Create type synonyms with type aliases

syntax:

type Kilometers = i32;

this can be used to reduce the repetition.
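
For example (a sketch using the book’s Thunk alias), a long type can be written once as an alias and then used everywhere the full type would otherwise be repeated:

type Thunk = Box<dyn Fn() + Send + 'static>;

fn takes_long_type(f: Thunk) {
    f();
}

fn main() {
    let f: Thunk = Box::new(|| println!("hi"));
    takes_long_type(f);
}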

In std::io, a type alias declaration is defined:

type Result<T> = std::result::Result<T, std::io::Error>;

This line shows that std::io::Result<T> is equivalent to Result<T,E> with E filled in as std::io::Error

The never type that never returns any value

The special type named ! is known in type theory lingo as the empty type. For example,

fn bar() -> ! {
    // --snip--
}

What is the use case of never type?

case #1

let guess: u32 = match guess.trim().parse() {
    Ok(num) => num,
    Err(_) => continue,
};

Match arms must all return the same type, and we cannot return a u32 when there is a parsing error.

Here continue has the ! value; formally, the type ! can be coerced into any other type, so the match still type-checks as u32.

case #2

In the implementation of the unwrap function on Option<T>, how does it manage to either return a value or panic?

impl<T> Option<T> {
    pub fn unwrap(self) -> T {
        match self {
            Some(val) => val,
            None => panic!("called `Option::unwrap()` on a `None` value"),
        }
    }
}

  • panic! has the type !
  • the ! value is coerced into type T, so the match as a whole has type T

Dynamically Sized Types and sized trait

  • dynamically sized types: DSTs or unsized types

  • Values of the same type in Rust must take up the same amount of space.

&str stores the starting address of the string data and its length, so a single &str value takes up two usizes of space.

That is the way in which DSTs are used in Rust: they have an extra bit of metadata that stores the size of the dynamic information. The golden rule of dynamically sized types is that we must always put values of dynamically sized types behind a pointer of some kind.

Every trait is a dynamically sized type that we can refer to by using the name of the trait.

Use traits as trait objects such as:

  • &dyn Trait
  • Box<dyn Trait>
  • Rc<dyn Trait>

The Sized trait is used to determine whether or not a type’s size is known at compile time.

In addition, Rust implicitly adds a bound on Sized to every generic function:

fn generic<T>(t: T) {
    // --snip--
}

is actually treated as though we had written this:

fn generic<T: Sized>(t: T) {
    // --snip--
}

By default, generic functions will only work on types that have a known size at compile time.

There is special syntax to express that a bound “may not” hold:

fn generic<T: ?Sized>(t: &T) {
    // --snip--
}

?Sized means T may or may not be Sized. This syntax is only available for Sized.

Note that because T may not be Sized, we must use it behind some kind of pointer (here, &T).

Advanced functions and closures

Functions can be passed as arguments using the function pointer type fn.

For example:

fn add_one(x: i32) -> i32 {
    x + 1
}

fn do_twice(f: fn(i32) -> i32, arg: i32) -> i32 {
    f(arg) + f(arg)
}

fn main() {
    let answer = do_twice(add_one, 5);

    println!("The answer is: {}", answer);
}

Note that:

  • fn pointers implement all three of the closure traits (Fn, FnMut, and FnOnce)
  • Accepting fn is useful when interfacing with external code (such as C functions) that doesn’t have closures.

An example for using either a closure or function:

let list_of_numbers = vec![1, 2, 3];

let list_of_strings: Vec<String> =
    list_of_numbers.iter().map(|i| i.to_string()).collect();

let list_of_strings: Vec<String> =
    list_of_numbers.iter().map(ToString::to_string).collect();

Here, we’re using the to_string function defined in the ToString trait, which the standard library has implemented for any type that implements Display.

Another example

We have another useful pattern that exploits an implementation detail of tuple structs and tuple-struct enum variants.

The initializers are actually implemented as functions returning an instance that’s constructed from their arguments.

So:

enum Status {
    Value(u32),
    Stop,
}

let list_of_statuses: Vec<Status> = (0u32..20).map(Status::Value).collect();

Returning closures

If we try to return a closure directly:

fn returns_closure() -> dyn Fn(i32) -> i32 {
    |x| x + 1
} // does not compile

The compile error references the Sized trait. Rust doesn’t know how much space it will need to store the closure. We can use a trait object.

fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
    Box::new(|x| x + 1)
}

“Using Trait Objects That Allow for Values of Different Types”

Macros

The term macro refers to a family of features in Rust: declarative macros with macro_rules! and three kinds of procedural macros:

  • Custom #[derive] macros that specify code added with the derive attribute used on structs and enums
  • Attribute-like macros that define custom attributes usable on any item
  • Function-like macros that look like function calls but operate on the tokens specified as their argument

The difference between macros and functions

Fundamentally, macros are a way of writing code that writes other code, which is known as metaprogramming.

The println! and vec! macros expand to produce more code than the code you write manually.

Macros have some additional powers that functions don’t.

  • Macros can take a variable number of parameters, while a function must declare a fixed number of parameters with fixed types.
  • Macros are expanded before the compiler interprets the code. So, for example, a macro can implement a trait on a given type, which has to happen at compile time.
  • The downside to implementing a macro instead of a function is that macro definitions are more complex than function definitions because you’re writing Rust code that writes Rust code.
  • You must define macros or bring them into scope before you call them in a file, as opposed to functions you can define anywhere and call anywhere.

Declarative macros with macro_rules! for general metaprogramming

Most common form of macros is declarative macros

What declarative macros do is similar to pattern matching: the macro receives Rust source code as its argument, compares the structure of that code against patterns, and replaces the macro call with the code associated with the matching pattern.

Take a simplified definition of the vec! macro as an example (the actual vec! macro in the standard library is slightly different):

#[macro_export]
macro_rules! vec {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

  • #[macro_export] means this macro should be made available whenever the crate in which the macro is defined is brought into scope.
  • The macro is defined with its name (vec) without the !.
  • Macro patterns are matched against Rust code structure rather than values.


Procedural macros for generating code from attributes

Procedural macros act more like functions.

  • custom derive
  • Attribute-like
  • Function-like

The definitions must reside in their own crate with a special crate type.

use proc_macro;

#[some_attribute]
pub fn some_name(input: TokenStream) -> TokenStream {
}

  • some_attribute is a placeholder for using a specific macro.
  • The TokenStream type is defined by the proc_macro crate that is included with Rust and represents a sequence of tokens.

Custom derive macro

use hello_macro::HelloMacro;
use hello_macro_derive::HelloMacro;

#[derive(HelloMacro)]
struct Pancakes;

fn main() {
    Pancakes::hello_macro();
}

  • The #[derive(HelloMacro)] annotation gives users a working implementation of the hello_macro function without writing it themselves.
  • We can’t provide hello_macro with a default implementation that prints the name of the type the trait is implemented on: Rust doesn’t have reflection capabilities, so it can’t look up the type’s name at runtime. That is why a macro has to generate the code at compile time.
  • If we create crate foo, a custom derive procedural macro crate is called foo_derive.
    • To use these crates, both of them must be added as dependencies and bring them both into scope
    • If we change the trait definition in foo, implementation of foo_derive must be changed too.

Steps:

  • Create a custom library

  • Modify its Cargo.toml

    • declare the crate as a procedural macro crate

    • add dependencies: syn and quote

    • Finally the whole file would look like this:

      [lib]
      proc-macro = true

      [dependencies]
      syn = "1.0"
      quote = "1.0"
  • Edit the source code

This won’t compile yet, because impl_hello_macro hasn’t been defined. (proc_macro is included with Rust, so it doesn’t need to be added as a dependency.)

extern crate proc_macro;

use proc_macro::TokenStream;
use quote::quote;
use syn;

#[proc_macro_derive(HelloMacro)]
pub fn hello_macro_derive(input: TokenStream) -> TokenStream {
    // Construct a representation of Rust code as a syntax tree
    let ast = syn::parse(input).unwrap();

    // Build the trait implementation
    impl_hello_macro(&ast)
}

code is split into two parts:

  • parsing the TokenStream
  • transforming the syntax tree: this makes writing a procedural macro more convenient.
  • syn crate parses Rust code from a string into a data structure that we can perform operations on.
  • quote crate turns syn data structures back into Rust code
  • The hello_macro_derive function will be called when a user specifies #[derive(HelloMacro)] on a type. This is possible because we’ve annotated the hello_macro_derive function here with proc_macro_derive and specified the name, HelloMacro, which matches our trait name.
  • The following example shows parsed data from struct Pancakes:
DeriveInput {
    ident: Ident { // identifier
        ident: "Pancakes",
        span: #0 bytes(95..103)
    },
    data: Struct(
        DataStruct {
            struct_token: Struct,
            fields: Unit,
            semi_token: Some(
                Semi
            )
        }
    )
}

  • An example of impl_hello_macro
fn impl_hello_macro(ast: &syn::DeriveInput) -> TokenStream {
    let name = &ast.ident;
    let gen = quote! {
        impl HelloMacro for #name {
            fn hello_macro() {
                println!("Hello, Macro! My name is {}!", stringify!(#name));
            }
        }
    };
    gen.into()
}

  • ast.ident gives the identifier (the name) of the annotated type.
  • The quote! macro lets us define the Rust code that we want to return.
  • The into method converts the result of the quote! macro into the TokenStream type.
  • #name will be replaced by quote! with the value in name; this is a kind of templating mechanism.
  • The stringify! macro turns the expression into a string literal at compile time.

Then manually specify the path of dependencies:

[dependencies]
hello_macro = { path = "../hello_macro" }
hello_macro_derive = { path = "../hello_macro/hello_macro_derive" }

Attribute-like macros

  • The derive attribute only works for structs and enums
  • Attribute-like macros can be applied to other items as well, such as functions.

An example use case of attributes

#[route(GET, "/")]
fn index() {

The signature of the macro definition function would look like this:

#[proc_macro_attribute]
pub fn route(attr: TokenStream, item: TokenStream) -> TokenStream {
  • attr is the contents of the attribute: GET,"/" in this case
  • item is the body of the item the attribute is attached to: fn index(){} and the rest of the function’s body in this case.

Function-like macros

Function-like macros define macros that look like function calls.

  • They take an unknown number of arguments (tokens)
  • They take a TokenStream parameter, and their definition manipulates that TokenStream using Rust code, as the other two types of procedural macros do

An example of function-like macro:

let sql = sql!(SELECT * FROM posts WHERE id=1);

This macro would parse the SQL statement inside it and check that it’s syntactically correct, which is much more complex processing than a macro_rules! macro can do.

The sql! macro’s signature would look like this:

#[proc_macro]
pub fn sql(input: TokenStream) -> TokenStream {