A compiler technique for exploiting instruction-level parallelism is presented. The software pipelining algorithm targets innermost loops in which array accesses dominate the variable references. Dependence graphs labeled with either direction or distance information serve as input to the pipelining algorithm. In the first step, a loop body of minimal schedule length is generated for a machine with infinite resources. In the second step, this schedule is mapped onto a processor with finite resources. Dividing the algorithm into two steps makes it possible to reuse existing DAG scheduling algorithms for loop scheduling.
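The two-step idea can be illustrated on a single acyclic loop body. The following is only a minimal sketch and not the algorithm of this paper: it ignores loop-carried dependences and direction or distance labels, and it assumes a hypothetical machine model with a fixed number of identical issue slots per cycle. The function names (`asap_schedule`, `list_schedule`), the operation names, and the latencies are all illustrative assumptions. The first function computes earliest start cycles as if resources were unlimited; the second maps that ideal schedule onto a limited number of slots with ordinary list scheduling.

```python
from collections import defaultdict


def asap_schedule(ops, deps, latency):
    """Step 1 (sketch): earliest start cycle per op with unlimited resources.

    `ops` must be given in topological order of the dependence DAG.
    """
    preds = defaultdict(list)
    for src, dst in deps:
        preds[dst].append(src)
    start = {}
    for op in ops:
        # An op may start once every predecessor has finished.
        start[op] = max((start[p] + latency[p] for p in preds[op]), default=0)
    return start


def list_schedule(ops, deps, latency, units):
    """Step 2 (sketch): map the ideal schedule onto `units` issue slots per cycle."""
    ideal = asap_schedule(ops, deps, latency)
    preds = defaultdict(list)
    for src, dst in deps:
        preds[dst].append(src)
    start = {}
    used = defaultdict(int)  # cycle -> number of ops already issued in it
    # Visit ops by ideal start time; ties keep the original (topological) order.
    for op in sorted(ops, key=lambda o: (ideal[o], ops.index(o))):
        cycle = max((start[p] + latency[p] for p in preds[op]), default=0)
        while used[cycle] >= units:  # delay until an issue slot is free
            cycle += 1
        start[op] = cycle
        used[cycle] += 1
    return start


# Hypothetical loop body for a[i] = b[i] * c[i] + 1 (assumed latencies).
ops = ["load_b", "load_c", "mul", "add", "store"]
deps = [("load_b", "mul"), ("load_c", "mul"), ("mul", "add"), ("add", "store")]
latency = {"load_b": 2, "load_c": 2, "mul": 2, "add": 1, "store": 1}

ideal = asap_schedule(ops, deps, latency)
narrow = list_schedule(ops, deps, latency, units=1)
```

With a single issue slot the two loads can no longer start in the same cycle, so every dependent operation in `narrow` starts one cycle later than in `ideal`. With two or more slots the mapped schedule coincides with the ideal one for this example.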