2017 NERSC Summer Student Projects (CS Summer Student Program)

📁 Students
💼 NE-NERSC
Requisition #: 83459

NERSC 2017 Summer Student Project Descriptions

Lawrence Berkeley National Lab, Berkeley, CA

http://nersc.gov


Are you an exceptional engineer who likes working on truly challenging projects? Are you passionate about learning and open-minded about the way that networks are built? Do you have a passion for organizing and visualizing data to aid in the understanding and development of scientific solutions? Consider spending your summer with the research and development team for Berkeley Lab’s NERSC Division.


More than 5,000 scientists use NERSC to perform basic scientific research across a wide range of disciplines, including climate modeling, research into new materials, simulations of the early universe, analysis of data from high energy physics experiments, investigations of protein structure, and a host of other scientific endeavors.  


NERSC is known as one of the best-run scientific computing facilities in the world. It provides some of the largest computing and storage systems available anywhere, but what distinguishes the center is its success in creating an environment that makes these resources effective for scientific research. NERSC systems are reliable and secure, and provide a state-of-the-art scientific development environment with the tools needed by the diverse community of NERSC users. NERSC offers scientists intellectual services that empower them to be more effective researchers. 


Summer Student Projects

FILLED: Performance analysis of the NERSC burst buffer workload to accelerate data-intensive discovery

NERSC's flagship system, Cori, is presently the fifth fastest supercomputer in the world with over 700,000 CPU cores and over 1 PB of memory.  To support the data needs of a system this fast, Cori's burst buffer is also one of the world's fastest file systems and is capable of over 1.5 TB/second. Because flash is a relatively new technology in supercomputing, NERSC's users are still discovering new ways to use this burst buffer to accelerate their data-intensive science, and NERSC is still fine-tuning the configuration of the burst buffer to deliver the best performance benefit.


To this end, the student assistant will explore the performance data being collected from the burst buffer hardware and identify opportunities for optimization.  The main duties include:


  • Working closely with the NERSC burst buffer team to develop tools that interface with NERSC's ElasticSearch-based Data Collect

  • Applying machine learning and other statistical analyses to build an understanding of how NERSC's 6,000 users interact with the burst buffer today

  • Translating this understanding into recommendations on how users can modify their workflows to most effectively utilize the burst buffer

  • Providing feedback and guidance to NERSC staff on how to configure the burst buffer's default settings to best suit the needs of its users


The qualifications include:


  • Familiarity with Linux environments is strongly preferred

  • Interest in statistical analysis techniques (including machine learning) or parallel I/O is essential

  • Familiarity with (or interest in learning) Python and libraries relevant to data analytics (including scikit-learn, pandas, and matplotlib) is highly beneficial

  • Undergraduate or graduate student in computer science, mathematics, or a related field.  Ambitious high school students will also be considered.


––––––––––

FILLED: Understanding application performance on manycore processors


NERSC's flagship system, Cori, is presently the fifth fastest supercomputer in the world and uses 68-core, 272-thread Intel Knights Landing processors. Optimizing applications to effectively utilize such a large degree of parallelism has been a multi-year effort that has resulted in a suite of applications that are now running at scale on Cori. NERSC's Advanced Technologies Group (ATG) will begin using these modernized applications to define the performance targets for the next generation of supercomputers, and it is critical that we develop an understanding of what architectural features will be most important on NERSC's next system. Performance analysis of these extreme-scale applications requires extreme-scale profiling tools and insightful data analysis.


The student assistant will work with NERSC ATG to use and develop the Integrated Performance Monitoring (IPM) profiling suite to improve our understanding of application performance. The project will focus on one or more of the following areas according to the assistant's strengths and interests:


Project 1. IPM modernization and feature development. The IPM library collects performance data from many sources, including intercepting MPI, OpenMP and POSIX I/O function calls, accessing the /proc file system and using PAPI to measure hardware performance counters. For this project area, the student assistant's primary duties include:


  • Enhancing IPM to ensure it provides reliable and complete coverage of modern applications that may use new API calls (from MPI-3, OpenMP-3 and OpenMP-4), multi-threaded MPI (MPI_THREAD_MULTIPLE), and mixed-language (C+Fortran) MPI

  • Exploring the addition of new data sources to IPM, e.g. monitoring MSR registers on Intel Knights Landing to obtain power usage, or improving the usage of current data sources, e.g. supporting PAPI multiplexing to measure more performance counters in a single application run.


Project 2. Tools to analyze IPM data. The IPM library writes its performance data to an XML file, and at present, researchers produce performance plots using a Perl script and perform custom analysis using ad-hoc Bash and Python scripts. For this project area, the student assistant's primary duties include:


  • Developing a new Python-based analysis package which will make use of modern analytical packages such as Matplotlib and NumPy

  • Demonstrating this package as a tool to assist in exploratory data analysis on IPM outputs and generating performance plots and summary statistics from the data

  • Designing an extensible interface to enable custom analysis using higher-level data analytics libraries including scikit-learn and Caffe.


Project 3. Analysis of exemplar applications. The performance of the NERSC Exascale Science Applications Program (NESAP) applications will be studied to give insight about how NERSC's next system should be architected to ensure scientific productivity. For this project area, the student assistant's primary duties include:


  • Compiling and running the applications on Cori and collecting performance data with IPM

  • Comparing and contrasting aspects of performance, such as load imbalance, optimal MPI/OpenMP balance per node, memory footprint, fraction of serial work, and use of vector instructions to inform which architectural features would be most beneficial on NERSC's next system
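
As one illustration of the kind of metric involved, load imbalance is commonly summarized from per-rank timings. The sketch below is a minimal example under stated assumptions: the function names and the sample timings are hypothetical, and it does not reflect how IPM itself reports imbalance.

```python
def load_imbalance(times):
    """Ratio of the slowest rank's time to the mean time (1.0 = perfectly balanced)."""
    if not times:
        raise ValueError("need at least one per-rank time")
    mean = sum(times) / len(times)
    return max(times) / mean


def idle_fraction(times):
    """Fraction of the slowest rank's time that an average rank spends waiting."""
    if not times:
        raise ValueError("need at least one per-rank time")
    mean = sum(times) / len(times)
    return (max(times) - mean) / max(times)


if __name__ == "__main__":
    # Hypothetical per-rank wall-clock times in seconds, for illustration only.
    per_rank_seconds = [4.0, 2.0, 2.0, 2.0]
    print("imbalance factor:", load_imbalance(per_rank_seconds))
    print("idle fraction:", idle_fraction(per_rank_seconds))
```

A factor well above 1.0 suggests that a few ranks dominate the run time, which is one of the signals the comparison above would look for.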


For all project areas, the desired qualifications include:


  • Familiarity with C/Python and software engineering practices

  • Strong foundational knowledge of computer architecture

  • Experience with MPI and related profiling tools

  • Senior undergraduate or graduate student in computer science or a related field


Please indicate in your cover letter which project area(s) are of greatest interest.


––––––––––


Project Title: Compression of neurophysiology data


This would be a collaborative project with UCSF to explore the compressibility of neurophysiological data. Labs are generating on the order of 6 TB per day. For this project area, the student assistant's primary duties include:

  • Running different compression algorithms and then assessing the following measures of feature extraction quality: sampling rate (30 kHz, 20 kHz, etc.) and bit depth (16-bit, 12-bit, etc.)
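
A minimal sketch of this kind of experiment, using only the Python standard library, is shown here. Everything in it is an assumption made for illustration: the synthetic sine-plus-noise trace stands in for real recordings, zlib stands in for whichever compression algorithms the project would evaluate, the decimation is naive (no anti-alias filter), and the function names are hypothetical. It only measures compressed size; assessing feature extraction quality would need real data and a real analysis pipeline.

```python
import math
import random
import struct
import zlib


def synth_trace(n_samples, rate_hz, seed=0):
    """Synthetic signed 16-bit trace: 300 Hz sine plus Gaussian noise (not real data)."""
    rng = random.Random(seed)
    trace = []
    for i in range(n_samples):
        value = 8000 * math.sin(2 * math.pi * 300 * i / rate_hz) + rng.gauss(0, 500)
        trace.append(max(-32768, min(32767, int(value))))
    return trace


def decimate(samples, factor):
    """Naive downsampling: keep every `factor`-th sample."""
    return samples[::factor]


def reduce_bit_depth(samples, bits):
    """Emulate a lower bit depth by zeroing the low (16 - bits) bits of each sample."""
    shift = 16 - bits
    return [(s >> shift) << shift for s in samples]


def compression_ratio(samples):
    """Raw size divided by zlib-compressed size for little-endian int16 samples."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return len(raw) / len(zlib.compress(raw, 9))


if __name__ == "__main__":
    trace = synth_trace(30000, 30000)  # one second at an assumed 30 kHz
    variants = {
        "30 kHz, 16-bit": trace,
        "30 kHz, 12-bit": reduce_bit_depth(trace, 12),
        "15 kHz, 16-bit": decimate(trace, 2),
    }
    for label, data in variants.items():
        print(f"{label}: ratio {compression_ratio(data):.2f}")
```

Zeroing the low bits removes noise entropy, so the 12-bit variant should compress better than the 16-bit original; the open question for the project is how much such reductions cost in downstream feature quality.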

––––––––––


Project Title: Usage and Performance Monitoring and Plotting with Elastic


Project description: Build tools to monitor and plot usage and performance data from large databases, networks, security systems, storage systems, web sites, and other critical NERSC systems using the open-source Elastic Stack. This is a great opportunity to work directly with large systems in a supercomputing center and build practical experience in data analysis and visualization.


Desired skills/background: data parsing and processing, JSON, programming and scripting, text indexing and mining


––––––––––

Project Title: Automating Linux Installation with The Foreman

Project description: With a fully automated process for installing Linux, new servers can be unboxed and be put into service in under an hour instead of taking days or even weeks. The same process can speed recovery from accidental failures or major disasters. This project will involve learning about the existing Linux installation system (The Foreman) and adding improvements to make it truly automatic and turnkey. This is a great opportunity to learn more about server and Linux internals and build practical experience in systems administration for large-scale server deployments.


Desired skills/background: Linux installation and configuration, networking (basic concepts, DHCP, VLANs), programming and scripting, virtual machines


––––––––––

Project Title: Deploying Scalable Web Services with Mesos and Kubernetes

Project description: Docker containers are an innovative new technology to run applications within miniature isolated environments, similar to virtual machines. Mesos and Kubernetes are frameworks that allow these containers to be assembled together to create full software systems, such as a web application with a database backend. This project will involve researching Mesos and/or Kubernetes and learning how to build small test systems using these tools. This is a great opportunity to learn more about Docker and to get experience with application development with modern methods that are becoming very popular in industry.


Desired skills/background: Docker containers, networking (basic concepts), programming and scripting, web development and web servers (Apache, nginx)


––––––––––

Project Title: Building and Enhancing REST APIs to High-Performance Computing (HPC) Management Systems

Project description: Researchers who use the giant supercomputers at NERSC have to manage their use very carefully; they need to keep track of millions of compute hours and terabytes of storage to make sure these resources are not wasted. The web-based tools they use to track these details access central systems at NERSC via APIs. This project will involve coding enhancements to these APIs to fix bugs or provide better management capabilities to users. This is a great opportunity to learn more about the internal operations of a supercomputing center and to apply coding skills to practical problems.


Desired skills/background: web programming and scripting (JavaScript, Perl, PHP, Python)


––––––––––


How to Apply 

Students interested in the program must apply online. Due to the high level of interest in our program, applications will be accepted only through the online application process.

Complete an online profile, and please provide the following:
  • Your skills and relevant experience 
  • Your interest in the program 
  • Educational information (note: you must be enrolled in a full-time academic program at an accredited college or university)
  • List your references (name, contact information, relationship to you)
If selected as a finalist, you will be invited to complete a separate job submission that includes reference, citizenship, and voluntary EEO information.

You will be contacted only if you are being considered for selection for this program. We hope to hear from you soon! 

NOTE: You may choose to apply to specific projects in which you're interested. If you do not see a project you are interested in, you are invited to apply to the Computing Sciences Summer Student Program.



Equal Employment Opportunity: Berkeley Lab is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, or protected veteran status. Berkeley Lab is in compliance with the Pay Transparency Nondiscrimination Provision under 41 CFR 60-1.4.  Click here to view the poster and supplement: "Equal Employment Opportunity is the Law."


LBNL is an E-Verify employer.


The Lawrence Berkeley National Laboratory provides accommodation to otherwise qualified internal and external applicants who are disabled or become disabled and need assistance with the application process. Internal and external applicants that need such assistance may contact the Lawrence Berkeley National Laboratory to request accommodation by telephone at 510-486-7635, by email to accommodation@lbl.gov or by U.S. mail at EEO/AA Office, One Cyclotron Road, MS90R-2121, Berkeley, CA 94720. These methods of contact have been put in place ONLY to be used by those internal and external applicants requesting accommodation.