C/C++教程

Study Notes of CS:APP (Chapter 2)

本文主要是介绍Study Notes of CS:APP (Chapter 2),对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!

Study Notes of CS:APP

Part I  Program Structure and Execution

  How application programs are represented and executed.

Chapter 2 Representing and Manipulating Information

[22-01]
Two-valued Signals versus Decimal Notation

•  Computers store and process information represented as two-valued signals.

•  Lowly binary digits, or bits

•  Decimal, or base-10, representation: natural for humans.

•  Two-valued signals can readily be represented, stored, and transmitted

Encodings

•  When we group bits together and apply some interpretation that gives meaning to the different possible bit patterns, we can represent the elements of any finite set.

•  3 most important representations of numbers:

•  Unsigned encodings

•  Based on traditional binary notation

•  Represent numbers ≥ 0

•  Two's-complement encodings

•  Represent signed integers

•  Either positive or negative

•  Floating-point encodings

•  Base-2 version of scientific notation for representing real numbers

•  Arithmetic operations implemented

•  Properties of arithmetic operations

•  Properties of integer computer (vs. true integer) arithmetic

•  Difference: Use a limited number of bits, hence some operations can overflow when the results are too large to be represented.

•  Similarity: Multiplication is associative and commutative

•  Properties of floating-point (vs. integer) arithmetic

•  Different:

•  Product of a set of positive numbers will always be positive

•  Not associative due to the finite precision

•  Stem from the difference in how they handle the finiteness

•  Integer representations encode a comparatively small range of values precisely

•  Floating-point representations encode a wide range of values approximately

•  A number of computer security vulnerabilities have arisen due to some of the subtleties of computer arithmetic.

2.1  Information Storage [22-01]

Virtual Memory

•  Computers use bytes (blocks of 8 bits) as the smallest addressable unit of memory.

•  A machine-level program views memory as virtual memory (a very large array of bytes).

•  Every byte of memory is identified by its address (a unique number).

•  Virtual address space: set of all possible addresses

Storing Program Objects

•  Program objects: program data, instructions, and control information

•  Value of a pointer in C is the virtual address of the first byte of some block of storage.

•  C compiler also associate type information with each pointer.

•  Actual machine-level program simply treats each program object as a block of bytes and the program itself as a sequence of bytes.

2.1.1  Hexadecimal Notation

A Byte in Different Notations

•  Byte = 8 bits

•  Binary

•  000000002 to 111111112

•  Decimal

•  010 to 25510

•  Hexadecimal (or "hex", base-16):

•  Convenient for describing bit patterns

•  Use '0' through '9' along with 'A' through 'F'

•  0016 to FF16

•  

•  In C, starts with 0x or 0X

•  Example: write FA1D37B16 as 0xFA1D37B, as 0xfa1d37b, or even 0xFa1D37b

Converting Between Decimal, Binary, and Hexadecimal

•  Converting between binary and hexadecimal

•  Straightforward, by referring to chart

•  Simple trick: Memorize digital equivalents of A, C, F

•  Convert hexadecimal to binary

•  Example:

•  Hexadecimal: 0x173A4C

•  

•  Convert binary to hexadecimal

•  Split binary into 4-bit groups and translate each

•  Note: Make leftmost group with ≤4 bits

•  Example:

•  Binary: 1111001010110110110011

•  

•  When x = 2n, binary of x: 10…02 (n 0s).

•  For n = i + 4j, where 0 ≤ i ≤ 3, hex of x: 0x2i0…0 (j 0s).

•  Example: x = 2,048 = 211

•  n = 11= 3 + 4 · 2

•  Hexadecimal: 0x800

•  Converting between decimal and hexadecimal: multiplication or division

•  Convert decimal to hexadecimal

•  Convert decimal x to hexadecimal:

•  Let x = q · 16 + r

•  Use r as the least significant digit and generate the remaining digits by repeating the process on q

•  Example

•  Decimal: 314,156

•  Hexadecimal: 0x4CB2C

•  Convert hexadecimal to decimal

•  Multiply each of the hex digits by the appropriate power of 16

•  Example

•  Hexadecimal: 0x7AF

•  Decimal: 7 · 162 + 10 · 16 + 15 = 7 · 256 + 10 · 16 + 15 = 1,967

2.1.2  Data Sizes

The Word Size

•  Every computer has a word size (nominal size of pointer data)

•  For w-bit machine, virtual addresses range from 0 to 2w – 1

•  There's been a widespread shift from 32-bit machines to 64-bit ones

•  32-bit: virtual address space of 4 GB

•  64-bit: virtual address space of 16 EB

•  1 exabyte = 260

•  Distinction between 32-bit programs and 64-bit programs: compilation, rather than machine type

Data Formats

•  Computers and compilers support multiple data formats using different ways to encode data.

•  C supports multiple data formats for both integer and floating-point data.

•  Integer data:

•  Signed: negative, zero, and positive values

•  Unsigned: nonnegative values

•  char: single byte

•  Stores character / integer values

•  short, int, and long: provide a range of sizes.

•  Pointer (e.g., char *): full word size

•  Two different floating-point formats:

•  float: single precision, 4 bytes

•  double: double precision, 8 bytes

•  Fixed-size integer types: int32_t and int64_t

•  Used to have close control over data representations

•  Signed and unsigned values

•  Mostly signed

•  Unsigned prefixed by unsigned or uint-.

•  char: mostly signed but not guaranteed

•  signed char: 1-byte signed value

•  In many contexts, insensitive

•  C allows a variety of ways:

•  To order the keywords

•  To include or omit optional keywords

•  One aspect of portability: to make the program insensitive to the exact sizes of the different data types.

2.1.3  Addressing and Byte Ordering

•  For multi-byte objects, we must establish two conventions:

•  Addressing

•  Byte ordering

Addressing

•  Multi-byte object is stored as a contiguous sequence of bytes

•  Address: smallest address of the bytes used

•  Example: int x at 0x100

•  (Assuming 32-bit) 4 bytes would be stored in 0x100, 0x101, 0x102, and 0x103

Byte Ordering

•  Two common conventions.

•  Consider w-bit integer [xw −1, xw −2, . . . , x1, x0], where xw−1 is the most significant bit, x0 the least.

•  Assuming w is multiple of 8

•  Most significant byte: [xw−1, xw−2, …, xw−8]

•  Least significant byte: [x7, x6, …, x0]

•  Other bytes: bits from the middle

•  Little endian: least significant byte comes first

•  Big endian: most significant byte comes first

•  Example: int x at 0x100 has value of 0x01234567

•  Ordering depends on the type of machine:

•  Note: high-order byte is 0x01, while low-order byte is 0x67.

•  Machines:

•  Little-endian: most Intel-compatible machines, Android, iOS (OSs)

•  Big-endian: most machines from IBM and Oracle

•  Bi-endian: ARM microprocessors (hardware)

•  Cases when byte-ordering becomes an issue:

1. When binary data are communicated over a network between different machines.

2. When looking at the byte sequences representing integer data.

•  Occurs often when inspecting machine-level programs

•  Example: line of a disassembly an Intel x86-64 processor

4004d3: 01 05 43 0b 20 00    add    %eax,0x200b43(%rip)

•  Generated by a disassembler

•  Disassembler: tool that determines the instruction sequence represented by an executable program file

•  Bytes are in reverse order for disassembly generated for little-endian machines

3. When programs are written that circumvent the normal type system.

•  In C, using a cast or a union

•  C code that uses casting to print the byte representations of program objects.

•  size_t: preferred data type for expressing the sizes of data structures

•  sizeof(T) returns # bytes required to store an object of type T.

•  Writing portable code

•  Running code on different machines

•  Different machine/OS configurations use different conventions for storage allocation

2.1.4  Representing Strings

•  String in C

•  Encoded by an array of characters

•  Each character is most commonly represented by the ASCII character code.

•  ASCII code for decimal digit x happens to be 0x3x.

•  Terminated by terminating byte (0x00)

2.1.5  Representing Code

•  Binary code is seldom portable across different combinations of machine and OS.

•  Fundamental concept: A program, from the perspective of the machine, is simply a sequence of bytes.

2.1.6  Introduction to Boolean Algebra

•  Boolean algebra.

•  Work of George Boole

•  true, false encoded as 1, 0

Boolean Operations

•  Simplest Boolean algebra is defined over {0, 1}.

•  4 Boolean operations

Boolean operation

Corresponding logical operation

not

¬

and

or

exclusive-or

 

Boolean Operations over Bit Vectors

•  Extending the 4 Boolean operations to bit vectors (strings of 0's and 1's of fixed length w).

•  Defined according to applications to the matching elements

•  Examples

•  Application: Represent finite sets

•  Encode A ⊆ {0, 1, …, w− 1} with [aw−1, …, a1, a0], ai = 1 if and only if iA.

•  Example: a = [01101001] encodes A = {0, 3, 5, 6}, b = [01010101] encodes B = {0, 2, 4, 6}.

•  Corresponding set operations of Boolean operations

Boolean operation

Corresponding set operation 

Set union

Set intersection

Set complement

•  Continuing the example: a & b yields [01000001], AB = {0, 6}.

•  Practical applications example:

•  Signal can interrupt the execution of a program.

•  Applications can explicitly block and unblock selected signals by specifying the blocked bit-vector mask

2.1.7  Bit-Level Operations in C

•  C supports bitwise Boolean operations.

•  Symbols

C symbol 

Corresponding logical operation

not

and

or

exclusive-or

 

•  Can be applied to any "integral" data type

•  Examples: expression evaluation for char

•  Evaluating bit-level expression:

(1) Expand hexadecimal to binary

(2) Perform the operations

(3) Convert back to hexadecimal

•  Use: Implement masking operations

•  Mask: bit pattern that indicates a selected set of bits within a word.

•  Example:

•  x & 0xFF: least significant byte of x and 0's (all others).

•  Examples:

•  x = 0x89ABCDEF: 0x000000EF

•  ~0: all 1's

2.1.8  Logical Operations in C

•  Logical operators provided by C

Logical operator

Corresponding logic operation 

|| 

or

&& 

and

not

•  Argument: nonzero is true, 0 is false

•  Return either 1 (true) or 0 (false)<span style="font-variant: small-caps">

•  </span>Examples of expression evaluation

•  

Bitwise Operations vs. their Logical Counterparts

•  Matches only when arguments are restricted to 0 or 1.

•  '&&' and '||' do not evaluate 2nd argument if result of can be determined by evaluating 1st

•  Examples: a && 5/a and p && *p++ will never cause an exception

2.1.9  Shift Operations in C

•  Shift operations shift bit patterns to the left and to the right.

•  x: [xw−1, xw−2, …, x0]

•  Left shift operation: x << k

•  [xwk−1, xwk−2, …, x0, 0, …, 0]

•  x is shifted k bits to the left, dropping off k xw−1s and filling the right end with k 0's.

•  0 ≤ kw − 1

•  Associate from left to right:

•  x << j << k ⇔ (x << j) << k

•  Right shift operation: x >> k (2 forms)

•  Logical

•  Left end is filled with k 0's

•  [0, …, 0, xw−1, xw−2, …, xk]

•  Arithmetic

•  Left end is filled with k xw−1s

•  [xw−1, …, xw−1, xw−1, xw−2, …, xk]

•  Examples (italicized fill):

•  In C, no precise definition, either

•  In practice

•  For signed data, arithmetic

•  For unsigned data, logical (must)

•  Java definition

•  x >> k: arithmetic

•  x >>> k: logical

2.2  Integer Representations [earlier]

2.2.1  Integral Data Types [earlier]

Integral data types represent finite ranges of integers.

•  Size: char, short, long

•  All nonnegative or possibly negative: unsigned or the default

•  The only machine-dependent: long (8-byte with 64-bit, 4-byte with 32-bit)

•  Asymmetric

 

•  ≤ the typical

•  Symmetric except the fixed-size data types

•  int could be 2-byte (mostly for 16-bit), long can be 4-byte (typically for 32-bit)

•  The fixed-size data types

•  Ranges = those of typical numbers

•  Asymmetric

2.2.2  Unsigned Encodings [earlier]

Consider an integer data type of w bits.

•  A bit vector is written as:

•  , to denote the entire vector

•  [xw−1, xw−2, ..., x0], to denote the individual bits within the vector

•  The unsigned interpretation of : treated as binary.

•  xi = 0 or 1 (2i is part of the value).

principle: Definition of unsigned encoding

For vector = [xw−1, xw−2, ..., x0]:

B2Uw() ≐ xi2i

•  ≐: the left-hand side is defined to be equal to the right-hand side.

•  Examples:

•  UMaxw2i =2w − 1

•  A mapping: B2Uw: {0, 1}w→{0, … , UMaxw}.

principle: Uniqueness of unsigned encoding

Function B2Uw is a bijection.

•  Bijection: a function f that goes two ways: it maps a value x to a value y where y = f (x), but it can also operate in reverse, since for every y, there is a unique value x such that f (x) = y, which is given by the inverse function x = f−1(y).

•  U2Bw: the inverse of B2Uw

2.2.3  Two's-Complement Encodings [earlier]

Two's-complement form: the most common representation of signed numbers

principle: Definition of two's-complement encoding

For vector = [xw−1, xw−2, ..., x0]:

B2Tw() ≐ −xw−12w−1 +xi2i

The sign bit: xw−1

•  "Weight": −2w−1

•  1: negative; 0: nonnegative.

•  TMinw ≐ −2w−1, TMaxw2i =2w−1 − 1.

•  A mapping: B2Tw: {0, 1}w →{TMinw, …, TMaxw}.

principle: Uniqueness of two's-complement encoding

Function B2Tw is a bijection.

•  T2Bw: the inverse of B2Tw

Drop w: UMax, TMin, and TMax

Points worth highlighting:

•  Asymmetric Range: |TMin| = |TMax| + 1

•  UMax = 2TMax + 1

•  −1: same representation as UMax

•  0: a string of all 0's in both

 

  Two's-complement in languages:

•  Not required by C. <limits.h> defines a set of constants delimiting the ranges of the different integer data types for the particular machine.

•  Example: for a two's-complement machine, INT_MAX = TMaxw, INT_MIN = TMinw and UINT_MAX = UMaxw.

•  Required by Java.

Example to get a better understanding:

 

2.2.4  Conversions between Signed and Unsigned [earlier]

Casting example:

•  int x, unsigned u

•   (unsigned) x converts x to unsigned, (int) u converts u to int.

A general rule of handling conversions between signed and unsigned numbers with the same word size—the numeric values might change, but the bit patterns do not.

U2Bw and T2Bw [earlier]

•  U2Bw

•  0 ≤ x ≤ UMaxw, U2Bw(x)

•  Unique unsigned

•  T2Bw

•  TMinw ≤ x ≤ TMaxw, T2Bw(x)

•  Unique two's-complement

T2Uw and U2Tw [earlier]

•  T2Uw

•  TMinw ≤ x ≤ TMaxw, T2Uw(x) ≐ B2Uw(T2Bw(x))

•  0 ≤ T2Uw(x) ≤ UMaxw, same representation

principle: Conversion from two's complement to unsigned

For x such that TMinwxTMaxw:

T2Uw(x) =

derivation: Conversion from two's complement to unsigned

B2Uw(T2Bw(x)) = T2Uw(x) = x + xw−12w

In a two's-complement representation of x, bit xw−1 determines whether or not x is negative.

•  Examples:

 

•  The behavior of T2U:

•  < 0: converted to large positive

•  ≥ 0: unchanged

•  U2Tw(u)

•  U2Tw 0 ≤ x ≤ UMaxw, U2Tw(x) ≐ B2Tw (U2B w(x))

•  0 ≤ U2T w(x) ≤ UMaxw, same representation

principle: Unsigned to two's-complement conversion

For u such that 0 ≤ uUMaxw:

U2Tw(u) =

derivation: Unsigned to two's-complement conversion

U2Tw(u) = −uw−12w + u

In the unsigned representation of u, bit uw−1 determines whether or not u is greater than TMaxw

= 2w−1 − 1.

•  The behavior of U2T:

•  ≤ TMaxw, unchanged

•  > TMaxw, converted to negative

Summary [earlier]

The effects of converting in both directions:

•  0 ≤ xTMaxw

•  T2Uw(x) = x, U2Tw(x) = x, identical

•  Outside

•  +/− 2w

•  2 extremes:

•  T2Uw (−1) = UMaxw

•  T2Uw(TMinw) = TMaxw + 1

2.2.5  Signed versus Unsigned in C [earlier]

Numbers

•  Signed: by default

•  Example: 12345 or 0x1A2B

•  Unsigned: adding 'U' or 'u' as a suffix

•  Example: 12345U or 0x1A2Bu

Conversion between:

•  Allowed but not specified. Mostly U2Tw and T2Uw.

•  Explicit casting and implicit casting (when an expression of one type is assigned to a variable of another)

•  printf does not use type information.

•  Possibly nonintuitive behavior:

2.2.6  Expanding the Bit Representation of a Number [earlier]

One common operation: to convert between integers having different word sizes while retaining the same numeric value. Converting from a smaller to a larger data type should always be possible.

Converting Unsigned to Larger [earlier]

Zero extension: adding leading zeros to the representation.

principle: Expansion of an unsigned number by zero extension

Define bit vectors = [uw−1, uw−2, …, u0] of width w and ′ = [0, …, 0, uw−1, uw−2, …, u0] of width w, where w′>w. Then B2Uw() = B2Uw(′).

Converting Two's-complement to Larger [earlier]

Sign extension: adding copies of the most significant bit to the representation.

principle: Expansion of a two's-complement number by sign extension

Define bit vectors = [xw−1, xw−2, …, x0] of width w and ′ = [xw−1, …, xw−1, xw−1, xw−2, …, x0] of width w, where w′>w. Then B2Tw (x) = B2Tw(x).

•  The value is preserved:

derivation: Expansion of a two's-complement number by sign extension

 

•  The relative order of conversion: The program first changes the size and then the type.

2.2.7  Truncating Numbers [earlier]

Truncating = [xw−1, xw−2, …, x0] to k-bit: drop the high-order wk bits, ′ = [xk−1, xk−2, …, x0]. Truncating a number can alter its value—a form of overflow.

Truncating Unsigned [earlier]

principle: Truncation of an unsigned number

Let be the bit vector [xw−1, xw−2, …, x0], and let ′ be the result of truncating it to k bits: ′ = [xk−1, xk−2, …, x0]. Let x = B2Uw() and x′ = B2Uk(′). Then x = x′ mod 2k.

Truncating Two's-complement [earlier]

principle: Truncation of a two's-complement number

Let be the bit vector [xw−1, xw−2, …, x0], and let ′ be the result of truncating it to k bits: ′ = [xk−1, xk−2, …, x0]. Let x = B2Uw() and x′ = B2Tk(′). Then x = U2Tk(x mod 2k).

derivation: Truncation of a two's-complement number

B2Tw ([xw−1, xw−2, …, x0]) mod 2k = B2Uk ([xk−1, xk−2, …, x0])

Summary [earlier]

The effect of truncation

•  For unsigned:

B2Uk([xk−1, xk−2, …, x0]) = B2Uw([xw−1, xw−2, …, x0]) mod 2k

•  For two's-complement:

B2Tk([xk−1, xk−2, …, x0]) = U2Tk(B2Uw ([xw−1, xw−2, …, x0]) mod 2k)

2.2.8  Advice on Signed versus Unsigned [earlier]

The implicit conversion can lead to errors or vulnerabilities.

•  One way to avoid: to never use unsigned.

•  Example: Java.

  Unsigned values are useful

•  When words = collections of bits. Example:

•  Packing a word with flags describing various Boolean conditions

•  Addresses (naturally unsigned)

•  When implementing mathematical packages for modular / multiprecision arithmetic.

2.3  Integer Arithmetic [earlier]

2.3.1  Unsigned Addition [earlier]

"Word size inflation":

•  Some programming languages support arbitrary size arithmetic;
More commonly, programming languages support fixed-size arithmetic.

 

  x y for arguments x and y, where 0 ≤ x, y < 2w: the result of truncating the integer sum x + y to be w bits long, viewed as an unsigned number.

•  Characterized as a form of modular arithmetic: discarding any bits with weight > 2w−1

principle: Unsigned addition

For x and y such that 0 ≤ x, y < 2w:

x y =

•  Illustration:

Overflow: an arithmetic operation whose integer result cannot fit within the word size limits of the data type.

•  Occurs when the two operands sum to 2w or more

•  Not signaled as errors

principle: Detecting overflow of unsigned addition

For x and y in the range 0 ≤ x, yUMaxw, let s ≐ x y. Then the computation of s overflowed if and only if s < x (or equivalently, s < y).

Modular addition forms an abelian group (a mathematical structure).

•  Commutative and associative

•  The identity element: 0; every element has an additive inverse

•  Value x for every value x: x x = 0

principle: Unsigned negation

For any number x such that 0 ≤ x < 2w, its w-bit unsigned negation x is given by the following:

x =

2.3.2  Two's-Complement Addition [earlier]

x y, given integer values x and y where −2w−1 x, y ≤ 2w−1 − 1: the result of truncating the integer sum x + y to w bits, viewed as a two's-complement number.

principle: Two's-complement addition

For integer values x and y in the range −2w−1 x, y ≤ 2w−1 − 1:

•  Illustration:

•  Positive overflow: x + y exceeds TMaxw (case 4)

•  Negative overflow: x + y is less than TMinw (case 1)

•  Has the same bit-level representation as the unsigned sum

•  Examples:

principle: Detecting overflow in two's-complement addition

For x and y in the range TMinwx, yTMaxw, let sx y. Then the computation of s has had positive overflow if and only if x > 0 and y > 0 but s ≤ 0. The computation has had negative overflow if and only if x < 0 and y <0 but s ≥ 0.

•  Illustrations:

2.3.3  Two's-Complement Negation [earlier]

x: the additive inverse under

principle: Two's-complement negation

For x in the range TMinwxTMaxw, its two's-complement negation x is given by the formula

x =

2.3.4  Unsigned Multiplication [earlier]

x y for integers x and y where 0 ≤ x, y ≤ 2w − 1: the result of truncating the 2w-bit product x · y to w bits, viewed as an unsigned number.

principle: Unsigned multiplication

For x and y such that 0 ≤ x, yUMaxw:

x y = (x · y) mod 2w

2.3.5  Two's-Complement Multiplication [earlier]

x y for integers x and y where −2w-1x, y ≤ 2w-1 − 1: the result of truncating the 2w-bit product x · y to w bits, viewed as a two's-complement number.

principle: Two's-complement multiplication

For x and y such that TMinwx, yTMaxw:

x y = U2Tw((x · y) mod 2w)

principle: Bit-level equivalence of unsigned and two's-complement multiplication

Let and be bit vectors of length w. Define integers x and y as the values represented by these bits in two's-complement form: x = B2Tw(x) and y = B2Tw(y). Define nonnegative integers x′ and y′ as the values represented by these bits in unsigned form: x = B2Uw(x) and y = B2Uw(y). Then

T2Bw(x y) = U2Bw(x y′)

•  Illustrations:

2.3.6  Multiplying by Constants [earlier]

Integer multiply is slow. Optimization: to replace multiplications by constants with combinations of shift and addition operations.

principle: Multiplication by a power of 2

Let x be the unsigned integer represented by bit pattern [xw−1, xw−2, …, x0]. Then for any k ≥ 0, the w + k-bit unsigned representation of x2k is given by [xw−1, xw−2, …, x0, 0, …, 0], where k zeros have been added to the right.

principle: Unsigned multiplication by a power of 2

For C variables x and k with unsigned values x and k, such that 0 ≤ k < w, the C expression x << k yields the value x 2k.

principle: Two's-complement multiplication by a power of 2

For C variables x and k with two's-complement value x and unsigned value k, such that 0 ≤ k < w, the C expression x << k yields the value x 2k.

The task of generating code for the expression x * K, for some constant K.

•  K: [(0…0) (1…1) (0…0) . . . (1…1)].

•  Compute a run of 1's from bit position n down to bit position m (nm) using either form:

•  Form A: (x<<n) + (x<<(n − 1)) + . . . + (x<<m)

•  Form B: (x<<(n + 1)) - (x<<m)

•  Compute x * K by adding together the results for each run.

2.3.7  Dividing by Powers of 2 [earlier]

Performed using a right shift:

•  Logical: unsigned

•  Arithmetic: two's-complement

Integer division always rounds toward 0. Some notation for any real number a:

•  a⌋: the unique integer a′ such that a′ ≤ a < a′ + 1.

•  a⌉: the unique integer a′ such that a′ − 1< a′ ≤ a′.

Dividing by a Power of 2 with Unsigned Arithmetic [earlier]

principle: Unsigned division by a power of 2

For C variables x and k with unsigned values x and k, such that 0 ≤ k < w, the C expression x >> k yields the value ⌊x/2k⌋.

•  Examples:

Dividing by a Power of 2 with Two's-complement Arithmetic [earlier]

Using an arithmetic right shift:

principle: Two's-complement division by a power of 2, rounding down

Let C variables x and k have two's-complement value x and unsigned value k, respectively, such that 0 ≤ k < w. The C expression x >> k, when the shift is performed arithmetically, yields the value ⌊x/2k⌋.

•  Examples:

  Correcting for the improper rounding that occurs when a negative number is shifted right by "biasing" the value before shifting.

principle: Two's-complement division by a power of 2, rounding up

Let C variables x and k have two's-complement value x and unsigned value k, respectively, such that 0 ≤ k < w. The C expression (x + (1 << k) - 1) >> k, when the shift is performed arithmetically, yields the value ⌈x/2k⌉.

•  Demonstration:

 

  (x<0 ? x+(1<<k)-1 : x) >> k will compute x/2k.

2.3.8  Final Thoughts on Integer Arithmetic [earlier]

2.4  Floating Point [earlier]

2.4.1  Fractional Binary Numbers [earlier]

2.4.2  IEEE Floating-Point Representation [earlier]

2.4.3  Example Numbers [earlier]

2.4.4  Rounding

2.4.5  Floating-Point Operations [earlier]

2.4.6  Floating Point in C [earlier]

2.5  Summary [earlier]

Computers encode information as bits, generally organized as sequences of bytes. Different encodings are used for representing integers, real numbers, and character strings. Different models of computers use different conventions for encoding numbers and for ordering the bytes within multi-byte data.

The C language is designed to accommodate a wide range of different implementations in terms of word sizes and numeric encodings. Machines with 64-bit word sizes have become increasingly common, replacing the 32-bit machines that dominated the market for around 30 years. Because 64-bit machines can also run programs compiled for 32-bit machines, we have focused on the distinction between 32- and 64-bit programs, rather than machines. The advantage of 64-bit programs is that they can go beyond the 4 GB address limitation of 32-bit programs.

Most machines encode signed numbers using a two's-complement representation and encode floating-point numbers using IEEE Standard 754. Understanding these encodings at the bit level, as well as understanding the mathematical characteristics of the arithmetic operations, is important for writing programs that operate correctly over the full range of numeric values.

When casting between signed and unsigned integers of the same size, most C implementations follow the convention that the underlying bit pattern does not change. On a two's-complement machine, this behavior is characterized by functions T2Uw and U2Tw, for a w-bit value. The implicit casting of C gives results that many programmers do not anticipate, often leading to program bugs.

Due to the finite lengths of the encodings, computer arithmetic has properties quite different from conventional integer and real arithmetic. The finite length can cause numbers to overflow, when they exceed the range of the representation. Floating-point values can also underflow, when they are so close to 0.0 that they are changed to zero.

The finite integer arithmetic implemented by C, as well as most other programming languages, has some peculiar properties compared to true integer arithmetic. For example, the expression x*x can evaluate to a negative number due to overflow. Nonetheless, both unsigned and two's-complement arithmetic satisfy many of the other properties of integer arithmetic, including associativity, commutativity, and distributivity. This allows compilers to do many optimizations. For example, in replacing the expression 7*x by (x<<3)-x, we make use of the associative, commutative, and distributive properties, along with the relationship between shifting and multiplying by powers of 2.

We have seen several clever ways to exploit combinations of bit-level operations and arithmetic operations. For example, we saw that with two's-complement arithmetic, ~x+1 is equivalent to -x. As another example, suppose we want a bit pattern of the form [0, …, 0, 1, …, 1], consisting of wk zeros followed by k ones. Such bit patterns are useful for masking operations. This pattern can be generated by the C expression (1<<k)-1, exploiting the property that the desired bit pattern has numeric value 2k − 1. For example, the expression (1<<8)-1 will generate the bit pattern 0xFF.

Floating-point representations approximate real numbers by encoding numbers of the form x × 2y. IEEE Standard 754 provides for several different precisions, with the most common being single (32 bits) and double (64 bits). IEEE floating point also has representations for special values representing plus and minus infinity, as well as not-a-number.

Floating-point arithmetic must be used very carefully, because it has only limited range and precision, and because it does not obey common mathematical properties such as associativity.

  

这篇关于Study Notes of CS:APP (Chapter 2)的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!