Computer Organization Project
Groups of 1 or 2 students are allowed. Groups of 3 or more students are strictly not allowed.
Copied source code and reports will receive zero points. Do not share your work.
(10 points) V1: Unoptimized (Code is provided at the end of this directive.)
(10 points) V2: AVX (Code is provided in your book.)
(10 points) V3: AVX + Unroll (Code is provided in your book.)
(10 points) V4: AVX + Unroll + Blocked (Code is provided in your book.)
(10 points) V5: GPU (CUDA or OpenCL) (Code from the midterm assignment can be used.)
Matrix Sizes: 256×256, 512×512, 768×768, 1024×1024, 1280×1280
Measure the runtime of each version at each matrix size. Time only the functions themselves; exclude the printf() calls from the measurement.
All code, including the different optimizations, is given in your reference book. The GPU code from the midterm assignment can be used for V5. Just run each version on your computer and measure the runtimes. Draw a comparison table and plot a comparison graph. Compare your results with the results shown below, which are taken from the reference book. Use seconds or milliseconds, not GFLOPS, as the unit of measurement.
Important note 1: All copy-paste work will receive zero points. Do not share your homework.
Note 2: #include <immintrin.h> is needed to use AVX intrinsics.
Academic Report Evaluation:
(10 points) Abstract and Introduction
(10 points) Literature Review and References
(20 points) Explanation of Hardware Acceleration Methods and Source Codes
(10 points) Commenting on Obtained Results
(Prepare 5 separate source code files, one for each method.)
/* Unoptimized DGEMM: C = A * B for n x n matrices in row-major order. */
void dgemm(int n, const double* A, const double* B, double* C)
{
    int i, j, k;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            C[i*n + j] = 0;
            for (k = 0; k < n; k++)
                C[i*n + j] += A[k + i * n] * B[k*n + j];
        }
    }
}
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define ARRAY_SIZE 256  /* assumed value; set to the matrix dimension under test */
int main(void)
{
    double *A = (double*)calloc(ARRAY_SIZE * ARRAY_SIZE, sizeof(double));
    double *B = (double*)calloc(ARRAY_SIZE * ARRAY_SIZE, sizeof(double));
    double *C1 = (double*)calloc(ARRAY_SIZE * ARRAY_SIZE, sizeof(double));
    double *C2 = (double*)calloc(ARRAY_SIZE * ARRAY_SIZE, sizeof(double));
    for (int i = 0; i < ARRAY_SIZE * ARRAY_SIZE; i++) {
        A[i] = rand() % 100;
        B[i] = rand() % 100;
    }
    clock_t t = clock();                 /* time only the dgemm call */
    dgemm(ARRAY_SIZE, A, B, C1);
    t = clock() - t;
    double elapsed_time = ((double)t) / CLOCKS_PER_SEC;
    printf("Unoptimized DGEMM code took %.6f seconds to execute.\n", elapsed_time);
    free(A); free(B); free(C1); free(C2);
    return 0;
}