P |
164pcds.pdf |
|
|
P |
1997.HPCA97.user_level_dma.ps |
Evangelos p.markatos |
user-level DMA without operating system kernel modification |
P |
21164ds.pdf |
|
|
P |
21164pb.pdf |
|
|
P |
21264pb.pdf |
|
|
P |
70100079.ps |
umesh maheshwari |
collecting cyclic distributed garbage by controlled migration |
P |
93_tr479_the_search_for_lost_cycles.ps |
mark e. crovella |
the search for lost cycles: a new approach to parallel program performance evaluation |
P |
93-547.ps |
rafael h.saavedra |
characterizing the performance space of shared memory computers using micro-benchmarks |
P |
93-spaa.ps |
richard m.karp |
optimal broadcast and summation in the LogP Model |
P |
95-61.pdf |
shahid h.bokhari |
multiphase complete exchange on paragon,SP2 & CS-2 |
P |
96_17.ps |
J.M. Nash |
scalable and portable computing using the WPRAM Model |
P |
96001.pdf |
j simon |
on accurate performance prediction for massively parallel systems and its applications |
P |
96005.pdf |
jens simon |
performance prediction of benchmark programs for massively parallel architectures |
P |
96-73.ps |
david cronk |
thread migration in the presence of pointers |
P |
96-SCIZZL.PS |
maximilian ibel |
implementing active messages and split-C for SCI clusters and some architectural implications |
P |
98003.pdf |
j.simon |
the latency-of-data-access model for analysing parallel computation |
P |
ace-intro.ps |
douglas c.schmidt |
the adaptive communication environment: an object-oriented network programming toolkit for developing communication software |
P |
ace-ipcl.ps |
douglas c.schmidt |
IPC SAP a family of object-oriented interfaces for local and remote interprocess communication |
P |
ace-jaws.ps |
james c.hu |
JAWS: a framework for high-performance web servers |
P |
aeneas.ps |
herbert w.hamber |
AENEAS a custom-built parallel supercomputer for quantum gravity |
P |
alcover_PDP96 |
r.alcover |
interconnection network design : a statistical analysis of interactions between factors |
P |
alltoall_flood.pdf |
|
|
P |
alltoall_irregular.pdf |
wenheng liu |
portable and scalable algorithms for irregular all-to-all communication |
P |
alltoall_kport.pdf |
ming-syan chen |
optimal all-to-all broadcasting schemes in distributed systems |
P |
alltoall_mesh_paragon.pdf |
shahid h.bokhari |
balancing contention and synchronization on the intel paragon |
P |
alltoall_multi_phase.pdf |
shahid h.bokhari |
multiphase complete exchange on paragon,SP2 & CS-2 |
P |
alltoall_now.pdf |
matt jacunski |
all-to-all broadcast on switch-based clusters of workstations |
P |
alltoall_review.pdf |
ming-syan chen |
on general results for all-to-all broadcast |
P |
am-spec-2_0.ps |
alan mainwaring |
active message applications programming interface and communication subsystem organization |
P |
analysis.ps |
peter druschel |
network subsystem design: a case for an integrated data path |
P |
arcs.ps |
giovanni chiola |
architectural issues and preliminary benchmarking of a low-cost network of workstations based on active messages |
P |
Avalanche Message Passing.doc |
|
|
P |
bcast-async.ps |
amotz bar-noy |
designing broadcasting algorithms in the postal model for message-passing systems |
P |
bcast-kport.ps |
amotz bar-noy |
broadcasting multiple messages in the multiport model |
P |
BDM.ps |
david a. bader |
practical parallel algorithms for dynamic data redistribution, median finding, and selection (preliminary draft) |
P |
bench_faq.txt |
|
|
P |
bench_pro8.pdf |
|
|
P |
benchm_muticast_comm.ps |
natawut nupairoj |
benchmarking of multicast communication services |
P |
benchmark.ps |
stephen j.von worley |
microbenchmarking and performance prediction for parallel computers |
P |
benchmarkxx.ps |
brian n.bershad |
using microbenchmarks to evaluate system performance |
P |
bilas-sc97.ps |
angelos bilas |
the effects of communication parameters on end performance of shared virtual memory clusters |
P |
bip-manual.ps |
loic prylli |
BIP messages user manual for BIP 0.94 |
P |
bml.ps |
rafael h.saavedra |
analysis of benchmark characteristics and benchmark performance prediction |
P |
boden_micro95.pdf |
nanette j. boden |
myrinet: a gigabit-per-second local area network |
P |
the bridging_ model_ gap.html |
bruce m.martin |
the bridging model gap: what are bridging models missing? |
P |
cachekernel_ps |
david r.cheriton |
a caching model of operating system kernel functionality |
P |
cacm.ps |
peter druschel |
operating system support for high-speed communication |
P |
CAMPaS1.ps |
y. tanaka |
COMPaS: a pentium pro PC-based SMP cluster and its experience |
P |
cappello97.ps |
peter cappello |
Javelin : internet-based parallel computing using java |
P |
ccc97.ps |
j.m. graham |
models,paradigms and parallel languages : what else do we need ? |
P |
cc-exp_ps |
massimo bernaschi |
collective communication operations: experimental results vs.theory |
P |
challenge_paper.ps |
mike galles |
performance optimizations,implementation, and verification of the SGI challenge multiprocessor |
P |
cheating.ps |
durrell anderson |
cheating the I/O bottleneck: network storage with trapeze/myrinet |
P |
chiola.ps |
giovanni chiola |
GAMMA on dec 2114x with efficient flow control |
P |
choi.ps |
sung-eun choi |
quantifying the effects of communication optimizations |
P |
cierniak97.ps |
michal cierniak |
just-in-time optimizations for high-performance java programs |
P |
ckpt97.ps |
michael litzkow |
checkpoint and migration of UNIX processes in the condor distributed processing system |
P |
clumps.ps |
steven s.lumetta |
multi-protocol active messages on a cluster of SMP's(to appear in the proceedings of SC97) |
P |
cluster.htm |
|
|
P |
cluster.ps |
stephen donaldson |
BSP clusters : high performance,reliable and very low cost |
P |
CMU-1.ps |
jose carlos brustoloni |
user-level protocol servers with kernel-level performance |
P |
CMU-2.ps |
jose carlos brustoloni |
scaling of end-to-end latency with network transmission rate |
P |
CMU-3.ps |
jose carlos brustoloni |
evaluation of data passing and scheduling avoidance |
P |
comp9812-chien.pdf |
andrew a.chien |
design challenges for high-performance network interfaces |
P |
comp9812-lee.pdf |
whay sing lee |
an efficient,protected message interface |
P |
comp9812-user.pdf |
raoul a.f. bhoedjang |
user-level network interface protocols |
P |
comp9812-VIA.pdf |
thorsten von eicken |
evolution of the virtual interface architecture |
P |
COMPaS-report.ps |
y. tanaka |
COMPaS: a pentium pro PC-based SMP cluster and its experience |
P |
complete_exchange.pdf |
shahid h.bokhari |
multiphase complete exchange : a theoretical analysis |
P |
concepts2.ps |
|
|
P |
cong_cont.pdf |
moshe sidi |
congestion control through input rate regulation |
P |
const_multicast_tree_comm_model.ps |
ju young l. park |
construction of optimal multicast trees based on the parameterized communication model |
P |
cost_model_comm_SMP.ps |
nancy M. amato |
a cost model for communication on a symmetric multiprocessor |
P |
cpceng.zip |
|
|
P |
dbs_paper.ps |
yukio murayama |
DBS: a powerful tool for TCP performance evaluations |
P |
dcomspec.txt |
|
|
P |
dcs-tr-362.ps.gz |
b r badrinath |
gathercast: an efficient multi-point to point aggregation mechanism in IP networks |
P |
Dculler.zip |
|
|
P |
disco-tocs.ps |
edouard bugnion |
disco: running commodity operating systems on scalable multiprocessors |
P |
DISI-TR-96-12.ps |
g chiola |
operating system support for fast communications in a network of workstations |
P |
dist_obj_with_CORBA.ps |
steve vinoski |
distributed object computing with CORBA |
P |
donaldhillskill_varscal.ps |
stephen r donaldson |
communication performance optimisation requires minimising variance |
P |
dp_paper.ps |
chun ming lee |
directed point : an efficient communication subsystem for cluster computing |
P |
dsm.ps |
chris holt |
the effects of latency,occupancy,and bandwidth in distributed shared memory multiprocessors |
P |
dugki_pps94.pdf |
dugki min |
a multipath contention model for analyzing job interactions in 2-D mesh multicomputers |
P |
dxbsp-spaa95.ps |
guy e blelloch |
accounting for memory bank contention and delay in high-bandwidth multiprocessors |
P |
E10000.ps |
alan charlesworth |
gigaplane-XB: extending the ultra enterprise family |
P |
eisen_APDC97.pdf |
jorn eisenbiegler |
on the optimization by redundancy using an extended LogP Model |
P |
Elsaad_CC96.pdf |
amr elsaadany |
performance evaluation of switching in local area networks |
P |
europar.pdf |
matt welsh |
low-latency communication over fast ethernet |
P |
exokernel.ps |
dawson r engler |
exokernel : an operating system architecture for application-level resource management |
P |
fast_collective_comm_lib.ps |
prasenjit mitra |
fast collective communication libraries,please |
P |
fci.ps |
franck cappello |
performance evaluation of two programming models for a cluster of PC biprocessors |
P |
firmw_reli_comm_SAN.psa |
angelos bilas |
firmware support for reliable communication and dynamic system configuration in system area networks |
P |
FIT-TR-97-07.ps |
ashley beitz |
a migration-friendly tasking environment for gardens |
P |
fm.pdf |
scott pakin |
fast messages : efficient,portable communication for workstation clusters and MPPs |
P |
FM_sc97bof.ps |
|
|
P |
FM-II_spec.doc |
|
|
P |
fm-pdt.ps |
scott pakin |
fast messages (FM): efficient, portable communication for workstation clusters and massively-parallel processors |
P |
focs94.ps |
robert d blumofe |
scheduling multithreaded computations by work stealing |
P |
foolBM.ps |
david h. bailey |
twelve ways to fool the masses when giving performance results on parallel computers |
P |
frontiers96.ps |
koen langendoen |
integrating polling,interrupts,and thread management |
P |
gathering.ps |
sandeep n. bhatt |
scattering and gathering messages in networks of processors |
P |
gdcast.ps |
amotz bar noy |
multicasting in heterogeneous networks |
P |
geist96.ps |
g a geist |
PVM and MPI : a comparison of features |
P |
gigabit_lan.pdf |
david g cunningham |
IEEE802.12 GIGABIT LAN |
P |
global_mem_manage.ps |
michael j feeley |
implementing global memory management in a workstation cluster |
P |
gms_96asplos.ps |
herve a jamrozik |
reducing network latency using subpages in a global memory environment |
P |
Golin_TCS97.pdf |
mordecai golin |
optimal point-to-point broadcast algorithms via lopsided trees |
P |
gossip-kport.ps |
a bar noy |
computing global combine operations in the multi-port postal model |
P |
Harz_Sevcik_SC93.ps |
karim harzallah |
hot spot analysis in large scale shared memory multiprocessors |
P |
HillCrumptonBurgess_europar96.ps |
jonathan m d hill |
theory,practice,and a tool for BSP performance prediction |
P |
hinet95.ps |
hong xu |
improving PVM performance using ATOMIC user-level protocol |
P |
hori-ccc97.ps |
atsushi hori |
an implementation of parallel operating system for clustered commodity computers |
P |
hoti4_submitted.pdf |
richard gillett |
experience using the first-generation memory channel for PCI network |
P |
hoti97.ps |
brent n chun |
virtual network transport protocols for myrinet |
P |
hotos95.ps |
allen b montz |
scout: a communications-oriented operating system |
P |
hp_8way.pdf |
|
eight-way multiprocessing |
P |
hpc_bsp.html |
|
high performance computing archive bulk synchronous parallel model(BSP)subject area |
P |
hpca97.pdf |
matt welsh |
ATM and fast ethernet network interfaces for user-level communication |
P |
HPCA97.ps |
y tanaka |
a comparison of data-parallel collective communication performance and its application |
P |
hpca98.ps |
remzi h. arpaci-dusseau |
the architectural costs of streaming I/O: a comparison of workstations, clusters, and SMPs |
P |
hpca98_impact.ps |
shubhendu s. mukherjee |
the impact of data transfer and buffering alternatives on network interface design |
P |
hpca98_nitrans.ps |
ioannis schoinas |
address translation mechanisms in network interfaces |
P |
hpdc7-lauria.ps |
mario lauria |
efficient layering for high speed communication : fast messages 2.X |
P |
hpdc97_final.ps |
silvia m. figueira |
predicting slowdown for networked workstations |
P |
HPVM4.ps |
mario lauria |
experiences on porting MPICH on FM and Myrinet |
P |
hpvm-siam97.ps |
andrew chien |
high performance virtual machines (HPVM): clusters with supercomputing APIs and performance |
P |
Ianne_PDS97.pdf |
giulio iannello |
efficient algorithms for the reduce-scatter operation in LogGP |
P |
ibel.ps |
maximilian ibel |
high-performance cluster computing using SCI |
P |
ibm_aix.pdf |
|
IBM AIX version 4 |
P |
icnp98.ps.gz |
|
|
P |
icpp95-collective.ps |
dhabaleswar k. panda |
issues in designing efficient and practical algorithms for collective communication on wormhole-routed systems |
P |
ics98.ps |
francis o'carroll |
the design and implementation of zero copy MPI using commodity hardware with a high performance network |
P |
ilpapplic_ps.gz |
bengt ahlgren |
the applicability of integrated layer processing |
P |
ilpmodel.ps |
bengt ahlgren |
a performance model for integrated layer processing |
P |
input_buff.pdf |
andreas kitstadter |
fairness and performance limits of contention resolution mechanisms for input buffered switches |
P |
ipps.ps |
ad pimentel |
an architecture workbench for multicomputers |
P |
IPPS97.ps |
maged m. michael |
relative performance of preemption-safe locking and non-blocking synchronization on multiprogrammed shared memory multiprocessors |
P |
ipps97ULC.ps |
stefanos n. damianakis |
reducing waiting costs in user-level communication |
P |
ISCA23.ps |
olivier maquelin |
polling watchdog: combining polling and interrupts for efficient message handling |
P |
isca92.ps |
thorsten von eicken |
active messages : a mechanism for integrated communication and computation |
P |
isca95-modelling-memory-performance.ps |
t. stricker |
optimizing memory system performance for communication in parallel computers |
P |
isca97.ps |
richard p. martin |
effects of communication latency,overhead,and bandwidth in a cluster architecture |
P |
iwcc99model.ps |
anthony tam |
realistic communication model for parallel computing on cluster |
P |
jaime.ps |
jaime bae kim |
analysis of a finite buffer queue with heterogeneous markov modulated arrival processes : a study of traffic burstiness and priority packet discarding |
P |
JaJa_PDS96.pdf |
joseph f jaja |
the block distributed memory model |
P |
JaJja_PPS94.pdf |
joseph f jaja |
the block distributed memory model for shared memory multiprocessors (extended abstract ) |
P |
Japan.ps |
daniel a reed |
performance analysis of parallel systems : approaches and open problems |
P |
jb.ps |
zheng wang |
analysis of burstiness and jitter in real-time communications |
P |
jpdc97.ps |
chi chung lam |
optimal algorithms for all-to-all personalized communication on rings and two dimensional tori |
P |
Klein_LT96.pdf |
leonard kleinrock |
the supercomputer supernet testbed : a WDM-based supercomputer interconnect |
P |
Lahch_GLOBECOM96.pdf |
abdelhakim lahchime |
ATM switch architecture modelling under uniform and bursty traffic |
P |
lam-mpi.ps |
greg burns |
LAM: an open cluster environment for MPI |
P |
LANai_prog.ps |
anthony skjellum |
a guide to writing myrinet control programs for LANai 3.x |
P |
LANai4_X_doc.txt |
|
|
P |
latbdw.ps |
patrick h. worley |
a study of application sensitivity to variation in message passing latency and bandwidth |
P |
Icpc96.ps |
pedro diniz |
lock coarsening : eliminating lock overhead in automatically parallelized object-based programs |
P |
LIMBO.ps |
|
the limbo programming language |
P |
Imbench-usenix.ps |
larry mcvoy |
lmbench : portable tools for performance analysis |
P |
LNPC.zip |
|
|
P |
locking_US-letter.ps |
mats bjorkman |
locking effects in multiprocessor implementations of protocols |
P |
logp.ps |
david culler |
LogP : towards a realistic model of parallel computation |
P |
Logp_Micro.ps |
david culler |
LogP performance assessment of Fast network interfaces |
P |
logpc.ps |
csaba andras moritz |
LoGPC: modeling network contention in message-passing programs |
P |
lopc_97ppopp.pdf |
matthew i. frank |
LoPC: modeling contention in parallel algorithms |
P |
Lopez_PDP99.pdf |
p. lopez |
optimizing network throughput : optimal versus robust design |
P |
matching.ps |
dimitrios stiliadis |
providing bandwidth guarantees in an input-buffered crossbar switch |
P |
memo-387.ps |
boon s. ang |
message passing support on StarT-Voyager |
P |
microbench.pdf |
stephen j von worley |
microbenchmarking and performance prediction for parallel computers |
P |
misleadBM.ps |
david h. bailey |
misleading performance reporting in the supercomputing field |
P |
mmbmSPEC.ps |
allen b downey |
a model for speedup of parallel programs |
P |
model-arch.ps |
mark j. clement |
architectural scaling and analytical performance prediction |
P |
modeling_comm_in_par_alg.ps |
jaswinder pal singh |
modeling communication in parallel algorithms : a fruitful interaction between theory and systems |
P |
model-PVM.ps.gz |
mark j clement |
network performance modeling for PVM clusters |
P |
model-PVM2.ps.gz |
michael r. steed |
performance prediction of PVM programs |
P |
model_of_parallelism.ps |
Todd Heywood |
Models of Parallelism |
P |
model-scaling_ps.gz |
mark j clement |
using analytical performance prediction for architectural scaling |
P |
mpcas_ps.gz |
dmitry arapov |
a parallel language for modular distributed programming |
P |
mpchc_ps.gz |
dmitry arapov |
a programming environment for heterogeneous distributed memory machines |
P |
memo-387.ps |
dmitry arapov |
a programming environment for heterogeneous distributed memory machines |
P |
mpi_guide.ps |
peter s pacheco |
a user's guide to MPI |
P |
mpi_t3d.ps |
kenneth cameron |
CRI/EPCC MPI for CRAY T3D |
P |
mpi-ap1000.ps |
david sitsky |
implementation and performance of the MPI message passing interface on the fujitsu AP1000 multicomputer |
P |
mpicharticle.ps |
william gropp |
a high-performance,portable implementation of the MPI message passing interface standard |
P |
mpidc95_pap.ps |
vasilios georgitsis |
performance of MPL and MPICH on the SP2 system |
P |
mpi-pcw94.ps |
david sitsky |
an efficient implementation of the message passing interface (MPI) on the fujitsu AP1000 |
P |
mppm98.ps |
jonathan m d hill |
portability of performance with the BSPLib communications library |
P |
msu-cps-acs-106.ps |
sherry q moore |
the effects of network contention on processor allocation strategies |
P |
myrinet-fm-sc95.ps |
scott pakin |
high performance messaging on workstations : illinois fast messages (FM) for myrinet |
P |
NAHU94_PERFORMANCE.ps |
erich m nahum |
performance issues in parallelized network protocols |
P |
NAHU97_CACHE.ps |
erich m nahum |
cache behavior of network protocols |
P |
NAS-97-005.PDF |
andrew sohn |
communication studies of DMP and SMP machines |
P |
NAS-97-017.PDF |
kevin t pedretti |
analysis of 2D torus and hub topologies of 100Mb/s ethernet for the whitney commodity computing testbed |
P |
NAS-97-023.PDF |
jeffrey c becker |
predicting cost/performance trade-offs for whitney: a commodity computing cluster |
P |
NAS-97-024.PDF |
samuel a fineberg |
a scalable software architecture booting and configuring in the whitney commodity computing testbed |
P |
NAS-97-025.ps |
samuel a fineberg |
analysis of 100Mb/s ethernet for the whitney commodity computing testbed |
P |
NAS-98-003.pdf |
jerry c yan |
performance data gathering and representation from fixed-size statistical data |
P |
NAS-98-012.pdf |
abdul waheed |
performance modeling and measurement of parallelized code for distributed shared memory multiprocessors |
P |
netperf.ps |
chris maeda |
networking performance for microkernels |
P |
Nexuslava-vg.ps |
george thiruvathukal /ian foster |
java interfaces to high performance communication systems |
P |
nfmdcs.ps |
patrick g sobalvarro |
dynamic coscheduling on workstation clusters |
P |
NI_support_SVM_cluster.ps |
angelos bilas |
network interface support for shared virtual memory on clusters |
P |
non-blocking-osdi.ps |
michael greenwald |
the synergy between non-blocking synchronization and operating system structure |
P |
numa-os.ps |
john chapin |
memory system performance of UNIX on CC-NUMA multiprocessors |
P |
oam.ps |
deborah a wallach |
optimistic active messages: a mechanism for scheduling communication with computation |
P |
optimal_bs_DistMem.ps |
ramesh subramonian |
optimal broadcast in a distributed memory model of parallel computation |
P |
optimal_bs_logp.ps |
richard m karp |
optimal broadcast and summation in the LogP Model |
P |
origin200-1.ps |
harvey wasserman |
performance evaluation of the SGI origin2000: a memory-centric characterization of LANL ASCI applications |
P |
origin200-MenMod.ps |
olaf m lubeck |
development and validation of a hierarchical memory model incorporating CPU- and memory-operation overlap |
P |
origin200-slides.ps |
federico bassetti |
performance evaluation of the SGI origin2000: a memory-centric characterization of LANL ASCI applications or single node performance: where, oh where, has it gone? |
P |
osiris.ps |
peter druschel |
experiences with a high-speed network adaptor : a software perspective |
P |
os-memorysys.ps |
j bradley chen |
the impact of operating system structure on memory system performance |
P |
OSSurvey.ps |
anand r tripathi |
trends in multiprocessor and distributed operating system designs |
P |
p2014.pdf |
steve chapin |
multiprocessor operating systems : harnessing the power |
P |
p2016.pdf |
jorg cordsen |
vote for peace : implementation and performance of a parallel operating system |
P |
p2028.pdf |
koen langendoen |
models for asynchronous message handling |
P |
p259-kay.pdf |
jonathan kay |
the importance of non-data touching processing overheads in TCP/IP |
P |
p298-bruck.pdf |
jehoshua bruck |
efficient algorithms for all-to-all communications in multi-port message-passing systems |
P |
p31.ps |
db skillicorn m danelutto |
optimising data-parallel programs using the BSP cost model |
P |
p584-benveniste.pdf |
caroline benveniste |
parallel simulation of the IBM SP2 interconnection network |
P |
p7.ps |
wf McColl |
the BSP approach to architecture independent parallel programming |
P |
p701.ps |
barry f smith |
an interface for efficient vector scatters and gathers on parallel machines |
P |
pack_loss.pdf |
israel cidon |
analysis of packet loss processes in high-speed networks |
P |
pack_sw.pdf |
ag waters |
fast packet switching : an overview |
P |
paper_ubench_sc97.ps |
cristina hristea |
measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks |
P |
PARKBENC.ps |
roger hockney |
public international benchmarks for parallel computers |
P |
pdp97.ps |
giovanni chiola |
Gamma: a low-cost network of workstations based on active messages |
P |
camera.ps |
y tanaka |
performance improvement by overlapping computation and communication on SMP clusters |
P |
PE_pro_farm.pdf |
alan s wagner |
performance models for the processor farm paradigm |
P |
PE_pro_wormN.pdf |
lionel m ni |
performance evaluation of switch-based wormhole networks |
P |
pedroso97.ps |
hernani pedroso |
web-based metacomputing with JET |
P |
perf_eval_mpi_clust.ps |
natawut nupairoj |
performance evaluation of some MPI implementations on workstation clusters |
P |
perf_eval_sw_wormh.pdf |
lionel m ni |
performance evaluation of switch-based wormhole networks |
P |
perfmodel.ps |
jurgen brehm |
performance modeling for SPMD message-passing programs |
P |
philippsen_97.ps |
michael philippsen |
javaparty-transparent remote objects in java |
P |
pipeline.ps |
randolph y wang |
modeling communication pipeline latency |
P |
planet.pdf |
inder gopal |
network transparency:the plaNET approach |
P |
pnode_cps.ps |
g iannello |
performance analysis of distributed memory computers with parallel node architecture |
P |
proc_mig_iss.ps |
alberto zubiri |
design issues on process migration |
P |
profile_myrinet.ps |
ilia gilderman |
profiling the communication layers performance of the myrinet gigabit LAN |
P |
prop.ps |
rolf riesen |
using kernel extensions to decrease the latency of user-level communication primitives |
P |
PUPA.ps |
manish verma |
pupa: a low-latency communication system for fast ethernet |
P |
pupa_draft.ps |
manish verma |
a low latency communication subsystem (in preparation) |
P |
questions&answersBSP.ps |
db skillicorn m danelutto |
questions and answers about BSP |
P |
queueing_theory.ps |
georgios y lazarou |
continuous-time markov chains and queueing theory |
P |
random_delay.ps |
vikram s adve |
the influence of random delays on parallel execution times |
P |
REAL.ps |
eric grosse |
real inferno |
P |
revguide.pdf |
|
solaris 2.6 reviewer's guide |
P |
Roda_PACT98.pdf |
roda j rodriguez c |
breaking the barriers : two models for MPI programming |
P |
Roda_PDP99.pdf |
roda jl sande f |
the collective computing model |
P |
RPMark95.pdf |
|
RPMark tm 95 |
P |
SALE95_PERFORMANCE.ps |
james d salehi |
the performance impact of scheduling for cache affinity in parallel network processing |
P |
sale96_Effectiveness-TON.ps |
james d salehi |
the effectiveness of affinity-based scheduling in multiprocessor networking ( extended version ) |
P |
SC94-paper.ps |
mike barnett |
building a high-performance collective communication library |
P |
sc97.ps |
steven s.lumetta |
multi-protocol active messages on a cluster of SMP's(to appear in the proceedings of SC97) |
P |
sc97ninf.ps |
atsuko takefusa |
multi-client LAN/WAN performance analysis of ninf : a high-performance global computing system |
P |
sc98USC.pdf |
soichiro araki |
user-space communication : a quantitative study |
P |
scaleableserversap.ppt |
|
|
P |
sccs-0544.ps |
sanjay ranka |
irregular personalized communication on distributed memory machines |
P |
sch_comm.pdf |
|
|
P |
sched_comm_smp.ps |
babak falsafi |
scheduling communication on an SMP node parallel machine |
P |
SCIENTIFIC.COMP.ARCH.FM.ps |
william e johnson |
rationale and strategy for a 21st century scientific computing architecture: the case for using commercial symmetric multiprocessors as supercomputers |
P |
sensitivity-bw-lat-97.ps |
rajeev barua |
the sensitivity of communication mechanisms to bandwidth and latency |
P |
sigcomm96.ps |
david mosberger |
analysis of techniques to improve protocol processing latency |
P |
sigmetrics.ps |
remzi h arpaci |
the interaction of parallel and sequential workloads on a network of workstations |
P |
sigmetrics97-paper.ps |
aaron b brown |
operating system benchmarking in the wake of lmbench: a case study of the performance of NetBSD on the x86 architecture |
P |
sivaram_PPS98.pdf |
rajeev sivaram |
HIPIQS: a high-performance switch architecture using input queuing |
P |
smp.ps |
|
|
P |
SMP-OSF.ps |
jeffrey m denham |
DEC OSF/1 version 3.0 symmetric multiprocessing implementation |
P |
softw_VMMC.ps |
cezary dubnicki |
software support for virtual memory-mapped communication |
P |
solarissmo.ps |
|
|
P |
sort.ps |
andrea c dusseau |
fast parallel sorting under LogP: experience with the CM-5 |
P |
Sort_Logp.ps |
andrea carol dusseau |
modeling parallel sorts with LogP on the CM-5 |
P |
sosp.pdf |
thorsten von eicken |
U-Net : a user-level network interface for parallel and distributed computing |
P |
sosp16.ps |
armando fox |
cluster-based scalable network services |
P |
SOSP95-oschar.ps |
mendel rosenblum |
the impact of architectural trends on operating system performance |
P |
sosp97.ps |
hermann hartig |
the performance of µ-kernel-based systems |
P |
spaa96.ps |
robert d blumofe |
an analysis of dag-consistent distributed shared-memory algorithms |
P |
spinlock.pdf |
anna r karlin |
empirical studies of competitive spinning for a shared-memory multiprocessor |
P |
splc96_pap.ps |
vasilios georgitsis |
message passing performance on SP systems |
P |
SRC-1997-016a.ps |
jennifer m anderson |
continuous profiling : where have all the cycles gone ? |
P |
stability.ps |
jonathan md hill |
stability of communication performance in practice : from the cray T3E to networks of workstations |
P |
stat9514.ps.gz |
r alexander |
modelling self-similar network traffic |
P |
stott_FTCS97.pdf |
david t stott |
dependability analysis of a commercial high-speed network |
P |
super95.ps |
andrew s tanenbaum |
a comparison of three microkernels |
P |
sw_lan.pdf |
wenjian Qiao |
network planning and tuning in switch-based LANs |
P |
tab.gif |
|
|
P |
tcpip.ps |
stephen r donaldson |
predictable communication on unpredictable networks : implementing BSP over TCP/IP |
P |
Texas.pdf |
david patterson |
intelligent RAM(IRAM): chips that remember and compute |
P |
tezuka-hpcn97.ps |
hiroshi tezuka |
PM: an operating system coordinated high performance communication library |
P |
tezuka-ipps98.ps |
hiroshi tezuka |
pin-down cache : a virtual memory management technique for zero-copy communication |
P |
TOMPI.ps |
erik d demaine |
a threads-only MPI implementation for the development of parallel programs |
P |
Tools94.ps |
daniel a reed |
experimental analysis of parallel systems : techniques and open problems |
P |
top500_9806.ps |
jack j dongarra |
TOP500 supercomputer sites 11th edition |
P |
tr_95-12.ps |
swamy s kocherlakota |
predicting the performance of a wormhole-routed multicomputer with non-uniform communication (technical report CPS-95-12) |
P |
TR44.ps |
matt jacunski |
all-to-all broadcast on switch-based clusters of workstations |
P |
TR93-04.ps |
michael a pagels |
cache and TLB effectiveness in the processing of network data |
P |
tr95-02.ps |
dr roberto togneri |
parallel program analysis on workstation clusters : speedup profiling and latency hiding |
P |
TR95-273.ps |
saurab nog |
a performance comparison of TCP/IP and MPI on FDDI, fast ethernet and ethernet |
P |
tr96015.ps |
hiroshi tezuka |
PM: a high-performance communication library for multi-user parallel environments |
P |
TR96-03.ps |
david mosberger |
analysis of techniques to improve protocol processing latency |
P |
TR-96-04-01.ps |
yong yan |
an effective and practical performance prediction model for parallel computing on non-dedicated heterogeneous NOW |
P |
tr97006.ps |
hiroshi tezuka |
pin-down cache : a virtual memory management technique for zero-copy communication |
P |
TR-97-1.ps |
xing du |
characterizing communication interactions of parallel and sequential jobs on networks of workstations |
P |
tr97-11.ps |
c greg plaxton |
accessing nearby copies of replicated objects in a distributed environment |
P |
TR-97-3.ps |
xing du |
coordinating parallel processes on networks of workstations |
P |
transbsp.ps |
db skillicorn |
multiprogramming BSP programs |
P |
transpose_99.ps |
christina christara |
an efficient transposition algorithm for distributed memory computers |
P |
trapeze.ps |
kenneth g yocum |
cut-through delivery in trapeze : an exercise in low-latency messaging |
P |
two-case-delivery.pdf |
kenneth mackenzie |
exploiting two-case delivery for fast protected messaging |
P |
ucb.ps |
brent n chun |
virtual network transport protocols for myrinet |
P |
udp.ps |
stephen r donaldson |
predictable communication on unpredictable networks : implementing BSP over TCP/IP and UDP/IP |
P |
unetmm.pdf |
matt welsh |
incorporating memory management into user-level network interfaces |
P |
unet-sle.ps |
david oppenheimer |
user customization of virtual network interfaces with U-Net/SLE |
P |
unixc-30.pdf |
budi rahardjo |
summary of UNIX commands |
P |
Unrau_etal_OSDI94.ps |
ronald c unrau |
experiences with locking in a NUMA multiprocessor operating system kernel |
P |
usenix96.ps |
bj murphy |
an analysis of process and memory models to support high-speed networking in a UNIX environment |
P |
usenix-w93.ps |
kevin fall |
exploiting in-kernel data paths to improve I/O throughput and CPU availability |
P |
user_level_protc |
chris maeda |
protocol service decomposition for high-performance networking |
P |
util_profile_partitioning_MP.ps |
john d evans |
using utilization profiles in allocation and partitioning for multiprocessor systems |
P |
uw-cse-93-03-01.ps |
chandramohan a thekkath |
implementing network protocols at user level |
P |
UW-CSE-94-07-04.PS |
chandramohan a thekkath |
separating data and control transfer in distributed operating systems |
P |
via.ps |
dave dunning |
the virtual interface architecture |
P |
vinoski.ps |
steve vinoski |
CORBA: integrating diverse applications within distributed heterogeneous environments |
P |
webos.pdf |
amin vahdat |
webOS: operating system services for wide area applications |
P |
Wilton_Vranesic_SPDP.ps |
steven je wilton |
architectural support for block transfers in a shared-memory multiprocessor |
P |
wks.ps |
v boudet |
algorithmic issues for ( distributed ) heterogeneous computing platforms |
P |
wp-solaris2_6.pdf |
|
sun solaris operating environment |
P |
WRL-TN-16.ps |
jeffrey c mogul |
the effect of context switches on cache performance |
P |
wucs-95-06.ps |
raman gopalakrishnan |
real-time upcalls: a mechanism to provide real-time processing guarantees |
P |
wucs-96-11.ps |
r gopalakrishnan |
efficient user space protocol implementations with QoS guarantees using real-time upcalls |
P |
Yang_ICOIN98.pdf |
muh-rong yang |
the design of a very large high performance gigabit switch with shared buffers |
P |
YATE95_NETWORKING.PS |
david j yates |
networking support for large scale multiprocessor servers |
P |
ZHAN95_CALL_ADM.PS |
zhi-li zhang |
call admission control schemes under generalized processor sharing scheduling |
P |
zounds.ps |
ron minnich |
zounds: zero overhead unified network DSM system |
P |
AroraLM94.ps.gz |
S. Arora |
On-line Algorithms for Path Selection in a Nonblocking Network |