Chapter 3. Arithmetic for Computers

Homework #3

Computer Arithmetic
- Integer operations
  
  Addition, subtraction, multiplication, division, and overflow handling
  
  Subtraction is done by adding the two's complement of the subtrahend; keep in mind the conditions under which overflow occurs
- Floating-point real numbers
  
  Representation, operations
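A minimal Python sketch of the integer rules above, assuming 64-bit registers are modeled as Python ints masked to 64 bits; `add64`, `sub64`, and `to_signed` are hypothetical helper names. Subtraction adds the two's complement `~b + 1`, and signed overflow is detected from the sign bits of the operands and the result.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    return x - (1 << 64) if x & SIGN64 else x


def add64(a, b):
    """Return (64-bit sum, signed-overflow flag)."""
    s = (a + b) & MASK64
    # Overflow only if both operands have the same sign and the result's sign differs.
    ov = bool(~(a ^ b) & (a ^ s) & SIGN64)
    return s, ov


def sub64(a, b):
    """Return (64-bit difference a - b, signed-overflow flag)."""
    neg_b = (~b + 1) & MASK64            # two's complement of b
    s = (a + neg_b) & MASK64
    # Overflow only if the operands have different signs and the result's sign differs from a.
    ov = bool((a ^ b) & (a ^ s) & SIGN64)
    return s, ov


print(add64(0x7FFF_FFFF_FFFF_FFFF, 1))  # largest positive + 1: overflow flag is True
print(to_signed(sub64(3, 5)[0]))         # -2, no overflow
```

`sub64` checks overflow directly instead of reusing `add64` on `~b + 1`: for b = -2^63 the two's complement of b is b itself, and the addition rule would miss the overflow.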
Multimedia Arithmetic
- Graphics and media processing operate on vectors of 8-bit and 16-bit data
  
  A 64-bit adder with a partitioned carry chain can operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
  
  SIMD (single-instruction, multiple-data)
- Saturating operations
  
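  On overflow, the result is set to the largest representable value (or the smallest, for negative overflow) instead of wrapping around; i.e., the result is clipped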
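A minimal Python sketch of both ideas (all names are hypothetical). `add_8x8` emulates a partitioned carry chain in software with a SIMD-within-a-register trick, chosen here only for illustration and not a description of how the hardware is built: the top bit of each 8-bit lane is handled separately so no carry crosses into the neighboring lane. The `sat_*` helpers clamp to the representable range instead of wrapping.

```python
MASK64 = (1 << 64) - 1
LANE_MSB = 0x8080_8080_8080_8080  # top bit of each 8-bit lane


def add_8x8(a, b):
    """Add eight packed 8-bit lanes; each lane wraps, and no carry crosses lanes."""
    # Add the low 7 bits of every lane; each lane sum is at most 0x7F + 0x7F = 0xFE,
    # so the carry stops at the lane's top bit.
    low = (a & ~LANE_MSB) + (b & ~LANE_MSB)
    # Lane top bit = a7 XOR b7 XOR carry-in; each lane's carry-out is dropped.
    return (low ^ ((a ^ b) & LANE_MSB)) & MASK64


def sat_add_u8(x, y):
    """Unsigned 8-bit saturating add: clamp to 255 instead of wrapping."""
    return min(x + y, 0xFF)


def sat_add_s16(x, y):
    """Signed 16-bit saturating add: clamp to [-32768, 32767]."""
    return max(-0x8000, min(x + y, 0x7FFF))


def sat_add_8x8_u(a, b):
    """Lane-wise unsigned saturating add on eight packed 8-bit lanes."""
    out = 0
    for i in range(8):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        out |= sat_add_u8(x, y) << (8 * i)
    return out


print(hex(add_8x8(0xFF, 0x01)))          # 0x0: lane 0 wraps, lane 1 is untouched
print(hex(sat_add_8x8_u(0xF010, 0x2020)))  # 0xff30: lane 1 saturates at 0xff
```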
Multiplication
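- The shifts of the Multiplicand/Product registers and the add in the ALU operate in parallel, so each step takes one cycle
- Using multiple adders (one per multiplier bit) gives a faster multiplier, and it can be pipelined
- RISC-V Multiplication
  
  mul: lower 64 bits of the product
  
  mulh: upper 64 bits of the product, both operands signed
  
  mulhu: upper 64 bits of the product, both operands unsigned
  
  mulhsu: upper 64 bits of the product, one operand signed (rs1) and the other unsigned (rs2)
  - The mulh result can also be used to check for 64-bit overflow: a signed product fits in 64 bits only if the upper half equals the sign extension of the lower half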
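A minimal Python sketch, assuming unbounded Python ints stand in for the 128-bit Product register; the function names are hypothetical, modeled on the instruction names. `shift_add_mul` is the sequential shift-and-add algorithm, and the other helpers reproduce which 64-bit half each RISC-V instruction returns, plus the mulh-based overflow check.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def shift_add_mul(multiplicand, multiplier, n=64):
    """Unsigned n-bit x n-bit -> 2n-bit product, one multiplier bit per step."""
    mask = (1 << n) - 1
    product = multiplier & mask          # right half of the Product register holds the multiplier
    for _ in range(n):
        if product & 1:                  # test the current multiplier bit
            product += (multiplicand & mask) << n  # add the Multiplicand into the left half
        product >>= 1                    # shift the whole Product register right
    return product


def to_signed(x):
    return x - (1 << 64) if x & SIGN64 else x


def mul(a, b):      # low 64 bits; identical for signed and unsigned operands
    return (a * b) & MASK64


def mulh(a, b):     # high 64 bits, both operands signed
    return (to_signed(a) * to_signed(b) >> 64) & MASK64


def mulhu(a, b):    # high 64 bits, both operands unsigned
    return (a * b >> 64) & MASK64


def mulhsu(a, b):   # high 64 bits, a (rs1) signed, b (rs2) unsigned
    return (to_signed(a) * b >> 64) & MASK64


def mul_overflows_signed(a, b):
    """Signed overflow iff the high half is not the sign extension of the low half."""
    expected_high = MASK64 if mul(a, b) & SIGN64 else 0
    return mulh(a, b) != expected_high


print(shift_add_mul(3, 5, n=4))          # 15
```

In hardware the add can carry out of the left half; Python's unbounded ints keep that bit, and the following right shift brings it back inside the 2n-bit product.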
Division

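dividend: the value being divided

divisor: the value to divide by

quotient: the result of the division

remainder: what is left over
1. Check whether the divisor is 0.
2. Long-division approach: if the divisor ≤ the current dividend bits, put a 1 in the quotient and subtract the divisor; otherwise put a 0 in the quotient and bring down the next dividend bit.
3. Restoring division: subtract first, and if the remainder becomes negative, add the divisor back to restore it.
4. Signed division: divide the absolute values, then adjust the signs of the quotient and remainder as required.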
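- Flowchart (from Start):
1. Subtract the Divisor register from the Remainder register and place the result in the Remainder register.
2. If Remainder ≥ 0, do (a); otherwise do (b).
  a. Shift the Quotient register left, setting the new rightmost bit to 1.
  b. Restore the original value by adding the Divisor register back to the Remainder register, placing the sum in the Remainder register.
    
    Then shift the Quotient register left as well, setting the new rightmost bit to 0.
3. Shift the Divisor register right 1 bit.
4. Repeat n + 1 times, i.e., 65 times for 64-bit operands (the divisor starts in the left half of the 128-bit Divisor register, so one extra step is needed).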
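As with multiplication, the ALU and the shifts can operate in parallel.
- Unlike the multiplier, however, division cannot use parallel (multi-adder) hardware, because each subtraction is conditional on the sign of the remainder.
- Faster dividers (e.g., SRT division) generate multiple quotient bits per step, but still require multiple steps.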
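The flowchart steps as a minimal Python sketch (the name `restoring_divide` is hypothetical), assuming unsigned operands and Python ints in place of the 2n-bit Remainder and Divisor registers. The loop runs n + 1 times because the divisor starts in the left half; with only n iterations, 7 ÷ 2 on 4-bit operands would come out wrong.

```python
def restoring_divide(dividend, divisor, n=64):
    """Unsigned restoring division of an n-bit dividend; returns (quotient, remainder)."""
    if divisor == 0:                       # check for a zero divisor first
        raise ZeroDivisionError("divisor is 0")
    remainder = dividend                   # Remainder register starts with the dividend
    div_reg = divisor << n                 # divisor starts in the left half of a 2n-bit register
    quotient = 0
    for _ in range(n + 1):                 # n + 1 repetitions
        remainder -= div_reg               # 1. subtract Divisor from Remainder
        if remainder >= 0:
            quotient = (quotient << 1) | 1     # 2a. new quotient bit = 1
        else:
            remainder += div_reg               # 2b. restore the original value
            quotient <<= 1                     #     new quotient bit = 0
        div_reg >>= 1                      # 3. shift Divisor right 1 bit
    return quotient, remainder


print(restoring_divide(7, 2, n=4))         # (3, 1)
```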
div, rem: signed divide and remainder

divu, remu: unsigned divide and remainder

Overflow and division by zero do not raise errors; the instructions just return defined results.
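A sketch of what the four instructions return, written in Python with functions named after the instructions (the functions are hypothetical models, not an API). The results for division by zero and for signed overflow (-2^63 ÷ -1) follow the table in the RISC-V M-extension specification: quotient all ones and remainder = dividend for x ÷ 0; quotient = dividend and remainder 0 for the overflow case. The signed versions divide the magnitudes and then fix the signs, since Python's `//` rounds toward negative infinity while RISC-V rounds toward zero.

```python
MASK64 = (1 << 64) - 1
INT64_MIN = -(1 << 63)


def to_signed(x):
    return x - (1 << 64) if x & (1 << 63) else x


def div(a, b):
    """Signed 64-bit divide, rounding toward zero."""
    x, y = to_signed(a), to_signed(b)
    if y == 0:
        return MASK64                  # division by zero: all ones (-1)
    if x == INT64_MIN and y == -1:
        return a                       # overflow: result is the dividend (INT64_MIN)
    q = abs(x) // abs(y)               # divide the magnitudes ...
    if (x < 0) != (y < 0):
        q = -q                         # ... then fix the sign
    return q & MASK64


def rem(a, b):
    """Signed 64-bit remainder; its sign follows the dividend."""
    x, y = to_signed(a), to_signed(b)
    if y == 0:
        return a                       # division by zero: remainder is the dividend
    if x == INT64_MIN and y == -1:
        return 0                       # overflow: remainder is 0
    r = abs(x) % abs(y)
    return (-r if x < 0 else r) & MASK64


def divu(a, b):
    return MASK64 if b == 0 else a // b


def remu(a, b):
    return a if b == 0 else a % b


print(hex(div((-7) & MASK64, 2)))      # 0xfffffffffffffffd, i.e. -3 (rounded toward zero)
```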
Floating Point
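- Representation for non-integral numbers, including very small and very large numbers
- Like scientific notation, e.g., -2.34 × 10^56 (normalized), 0.002 × 10^-4 (not normalized)
- In C: the float and double types
- Single precision (32-bit), double precision (64-bit)
  - A normalized significand satisfies 1.0 ≤ |significand| < 2.0, i.e., significand = 1 + Fraction; the value is (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
  - IEEE 754 uses Bias = 127 for single (8-bit exponent field, 127 = 2^7 - 1) and 1023 for double (11-bit exponent field, 1023 = 2^10 - 1)
    
    → The exponent field is stored as an unsigned (biased) value, and 000…0 and 111…1 are reserved. The smallest normalized value therefore has exponent field 1, i.e., 2^(1-127) = 2^-126, with significand 1.0 (≈ ±1.2 × 10^-38)
    
    → The largest has exponent field 254, i.e., 2^(254-127) = 2^127, with significand 1.111…1 ≈ 2.0, giving ≈ ±2.0 × 2^127 ≈ ±3.4 × 10^38
- Relative precision in single is about 2^-23 because the fraction has 23 bits (≈ 6 decimal digits); double has 52 fraction bits, so about 2^-52 (≈ 16 decimal digits)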
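A minimal Python sketch that checks the single-precision numbers above, using the standard `struct` module to reinterpret bits; the helper names are hypothetical. `normal_value` evaluates (-1)^S × (1 + Fraction) × 2^(Exponent - 127) directly from the fields.

```python
import struct


def f32_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(b):
    """Float value of a 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def fields(b):
    """Split a single-precision pattern into (S, Exponent, Fraction)."""
    return b >> 31, (b >> 23) & 0xFF, b & 0x7F_FFFF


def normal_value(b):
    """(-1)^S x (1 + Fraction) x 2^(Exponent - 127), for normalized numbers only."""
    s, e, f = fields(b)
    if not 1 <= e <= 254:
        raise ValueError("exponent fields 0 and 255 are reserved")
    return (-1) ** s * (1 + f / 2**23) * 2.0 ** (e - 127)


smallest = normal_value(0x0080_0000)   # exponent field 1, fraction 0
largest = normal_value(0x7F7F_FFFF)    # exponent field 254, fraction all ones
print(smallest, largest)               # about 1.18e-38 and 3.40e+38
```

The gap between 1.0 and the next single-precision value, `bits_f32(f32_bits(1.0) + 1) - 1.0`, is exactly 2^-23, which is where the relative-precision figure comes from.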
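Example: converting -0.75 to FP
- S = 1 (negative)
- 0.75 × 2 = 1.5 → first bit 1 (0.1b); 0.5 × 2 = 1.0 → second bit 1 (0.11b); so 0.75 = 0.11b
  
  Normalized: 1.1b × 2^-1, so the Fraction bits are 1000…0
- The bias is 127, so the Exponent field = -1 + 127 = 126 = 01111110b
→ 1_01111110_10000000000000000000000b (0xBF400000)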
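- Note: the Fraction is the significand with the leading 1 removed (that implicit 1 is called the hidden bit), while the significand is the full 1.Fraction value
  
  "Mantissa" is used loosely for either one; in FP8 names such as E4M3, the M count refers to the stored fraction bits, excluding the hidden bit (1 + 4 + 3 = 8 bits)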
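The same conversion as a minimal Python sketch (`encode_f32_normal` and `show` are hypothetical helpers). It assumes x is nonzero and exactly representable as a normalized float32, so no rounding is needed; `math.frexp` finds the exponent, and the result can be cross-checked against `struct`.

```python
import math
import struct


def encode_f32_normal(x):
    """Hand-encode a nonzero x that is exactly representable as a normalized float32."""
    s = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))              # abs(x) = m * 2**e with 0.5 <= m < 1
    significand, exp = m * 2, e - 1        # renormalize to 1.0 <= significand < 2.0
    exp_field = exp + 127                  # add the single-precision bias
    frac = int((significand - 1) * 2**23)  # drop the hidden 1; keep 23 fraction bits
    return (s << 31) | (exp_field << 23) | frac


def show(b):
    """Format a 32-bit pattern as sign_exponent_fraction."""
    bits = format(b, "032b")
    return f"{bits[0]}_{bits[1:9]}_{bits[9:]}"


b = encode_f32_normal(-0.75)
print(show(b), hex(b))   # 1_01111110_10000000000000000000000 0xbf400000
```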
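Why can't Exponent = 111…1 be used for ordinary values? It is reserved by convention:
- Infinity: Exponent = 111…1, Fraction = 000…0
  
  Can be used in subsequent calculations, so overflow does not have to be checked at every step
- NaN (Not-a-Number): Exponent = 111…1, Fraction ≠ 000…0
  
  Same exponent, but it indicates an illegal or undefined result (e.g., 0.0/0.0)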
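A small Python sketch that classifies single-precision bit patterns by their fields (`classify_f32` is a hypothetical helper). It also labels exponent 000…0 (zero and subnormals), which these notes do not cover, so treat that branch as an addition. Python floats are doubles, but they produce the same kinds of special values on overflow and on inf - inf.

```python
import struct


def f32_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify_f32(b):
    """Classify a single-precision bit pattern by its exponent and fraction fields."""
    exp = (b >> 23) & 0xFF
    frac = b & 0x7F_FFFF
    if exp == 0xFF:                      # exponent 111...1 is reserved
        return "inf" if frac == 0 else "nan"
    if exp == 0:                         # exponent 000...0 is also reserved (zero / subnormals)
        return "zero" if frac == 0 else "subnormal"
    return "normal"


inf = float("inf")
print(classify_f32(f32_bits(inf)))          # inf
print(classify_f32(f32_bits(inf - inf)))    # nan: an undefined result
print(1e308 * 10)                           # inf: the computation continues instead of trapping
```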
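Floating-Point Addition
1. Align the binary points: shift the significand of the number with the smaller exponent right until its exponent matches the larger one
2. Add the significands
3. Normalize the result and check for overflow/underflow
4. Round the significand, renormalizing if rounding carries out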
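A toy Python sketch of the addition steps, assuming a made-up format with only 4 significand bits (1.xxx) and an unbounded exponent, so the overflow/underflow check is left out; all names are hypothetical. Instead of shifting the smaller operand right and losing bits, it scales the larger one up, adds exactly, and then rounds to nearest even. That yields the correctly rounded result, which hardware obtains by keeping extra guard, round, and sticky bits.

```python
from fractions import Fraction

P = 4  # toy precision (assumption): 4 significand bits, i.e. 1.xxx


def value(x, p=P):
    """Exact value of a toy float (sig, exp) = sig / 2**(p-1) * 2**exp."""
    sig, exp = x
    return Fraction(sig, 2 ** (p - 1)) * Fraction(2) ** exp


def fp_add(a, b, p=P):
    """Add two toy floats; sig is a signed int with |sig| in [2**(p-1), 2**p)."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:                          # make a the operand with the larger exponent
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    # 1. Align binary points. Shifting b right would drop bits, so scale a up
    #    by the exponent difference instead (exact; the common exponent is eb).
    total = (sa << (ea - eb)) + sb       # 2. add the significands
    if total == 0:
        return (0, 0)
    sign = -1 if total < 0 else 1
    mag = abs(total)
    # 3. Normalize: k > 0 means shift right, k < 0 means shift left.
    k = mag.bit_length() - p
    if k > 0:
        q, r = divmod(mag, 1 << k)
        half = 1 << (k - 1)
        if r > half or (r == half and q & 1):   # 4. round to nearest, ties to even
            q += 1
        if q == 1 << p:                  # rounding carried out (1.111 -> 10.000): renormalize
            q >>= 1
            k += 1
        mag = q
    else:
        mag <<= -k
    # Overflow/underflow checking against an exponent range is omitted in this sketch.
    return (sign * mag, eb + k)


# 4-bit example: 0.5 + (-0.4375) = 1.000b x 2^-1 + (-1.110b x 2^-2)
print(fp_add((8, -1), (-14, -2)))        # (8, -4), i.e. 1.000b x 2^-4 = 0.0625
```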
FP Adder Hardware
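- Much more complex than an integer adder; doing it in a single clock cycle would make the cycle too long, so it takes several cycles and pipelining matters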
FP Multiplication
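1. Add the exponents (with biased exponents, subtract the bias once from the sum)
2. Multiply the significands
3. Normalize the result
4. Check for overflow/underflow and round
5. Determine the sign from the operand signs and attach it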
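A matching toy Python sketch for multiplication, using the single-precision bias of 127 but only 4 significand bits (the format and all names are assumptions for illustration). Adding two biased exponents counts the bias twice, so it is subtracted once.

```python
BIAS = 127  # single-precision bias
P = 4       # toy precision (assumption): 4 significand bits, i.e. 1.xxx


def fp_mul(a, b, p=P):
    """Multiply toy floats (sign_bit, biased_exp, sig) with sig in [2**(p-1), 2**p)."""
    (s1, e1, m1), (s2, e2, m2) = a, b
    exp = e1 + e2 - BIAS                 # 1. add exponents; the bias was counted twice
    prod = m1 * m2                       # 2. multiply significands (2*(p-1) fraction bits)
    k = p - 1                            # bits to drop to get back to p-1 fraction bits
    if prod >> (2 * p - 1):              # 3. normalize: product >= 2.0, so shift one more
        k += 1
        exp += 1
    q, r = divmod(prod, 1 << k)
    half = 1 << (k - 1)
    if r > half or (r == half and q & 1):   # 4. round to nearest, ties to even
        q += 1
    if q == 1 << p:                      # rounding carried out: renormalize
        q >>= 1
        exp += 1
    # 4. (cont.) An overflow/underflow check of exp against 1..254 would go here (omitted).
    return (s1 ^ s2, exp, q)             # 5. sign = XOR of the operand signs


# 0.5 x (-0.4375) = 1.000b x 2^-1 x (-1.110b x 2^-2)
print(fp_mul((0, 126, 8), (1, 125, 14)))   # (1, 124, 14): -1.110b x 2^-3 = -0.21875
```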
FP Arithmetic HW
FP Instruction in RISC-V
FP Example: Array Multiplication
Accurate Arithmetic
Sub-word Parallelism
gemm, using Vector