1 矩阵向量乘法(6分)
矩阵向量乘法(gemv)如何用OpenMP或pthread对其并行化(OpenMP和pthread任选一种即可)?假设矩阵按行存储(每一行数据是连续的),处理器有32个核。如果矩阵是按列存储呢?具体实现如何修改?
可用语言详细描述或写出伪代码
按行存储
void *worker1(int row, int N, int** A, int* vec, int* result){ for(int i = row; i<N; i += 32){ result[i] = innerProduct(A[i],Vec); } } int main(){ ... for(int i=0;i<32;i++){ data_array[i].row = i; pthread_create(&threads[i], NULL, worker1, (void *)&data_array[i]) } ... }
按列存储
void *worker2(int column, int N, int M, int** A, int* vec, int* result){ for(int j = column; j<M; j += 32){ for(int i = 0; i<N; i++){ result[i] += A[j][i]*Vec[j]; } } } int main(){ ... for(int i=0;i<32;i++){ data_array[i].column = i; pthread_create(&threads[i], NULL, worker1, (void *)&data_array[i]) } ... }
2 程序分析(4分)
以下程序运行时会出现什么现象?可以如何改写来避免此现象发生?
int selected_thread;
sem_t start1, start2, stop1, stop2;
void* worker1() {
sem_wait(&start1);
sem_post(&stop1);
}
void *worker2(){
sem_wait(&start2);
sem_post(&stop2);
}
int main(int argc, char *argv[]) {
selected_thread = 2;
sem_init(&start1);
sem_init(&start2);
sem_init(&stop1);
sem_init(&stop2);
pthread_create(worker1);
pthread_create(worker2);
sem_post(&start1);
sem_wait(&stop1);
sem_wait(&stop2);
return 0;
}
答: worker2会一直等待,程序无法结束。所以在主函数里加上sem_post(&start2)。