FAQ¶
For general questions about HPC2N, suggestions to improve the FAQ or the website, or any other non-support related matters, please email:
- info@hpc2n.umu.se
Do NOT use this email address for anything technical or support related. Those questions should go to support, which you can contact through the support form at SUPR, or directly by emailing:
- support@hpc2n.umu.se
User accounts and projects¶
Q: I have forgotten my user password. What should I do?
A: Go here to reset your password. For this to work you need a SUPR account, and your HPC2N user account has to be connected to it (this is automatic for newer accounts).
Remember to change your password after you have logged in again with the new, temporary password.
Q: Where can I see how much CPU time my project used?
A: The information on SUPR is the most accurate. Go to the SUPR project page, pick your project, and scroll down to see your CPU time usage.
You can also use the command `projinfo -p PROJECT-ID -v` (change PROJECT-ID to your own project id), but the information there is less accurate. For more information about `projinfo`, see the section about Project information.
Q: What are typical errors when typing a wrong password during login?
A:
- If you are trying to login with SSH from a terminal (Linux or macOS), a wrongly typed password will just result in the terminal asking for password again, without an error message. NOTE: if you type a wrong password too many times your IP will be banned for a period of time.
- If you are trying to log in with PuTTY, you will get the error message "Access denied" (the exact look may vary depending on your version of PuTTY).
- Sometimes you will get the error:
ssh: connect to host kebnekaise.hpc2n.umu.se port 22: Network is unreachable
This usually means that you have typed the wrong password or username enough times that your IP address has been blocked. You can either wait for 24 hours until it automatically unblocks or email us with your IP so we can unblock it.
- If you are logging in through ThinLinc, a pop-up window will tell you if your username or password is wrong.
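If you suspect your IP address has been blocked and want to include it in a mail to support, you can look it up with an external lookup service. This is only an illustration; `ifconfig.me` is a third-party service not affiliated with HPC2N, and any similar service works:

```shell
# Print the public IPv4 address your machine appears as from outside.
# ifconfig.me is an external third-party service (assumption: reachable
# from your network); include the printed address in your mail to support.
curl -4 ifconfig.me
```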
Q: What are other common errors at attempted login and what do they mean?
A:
ssh: Could not resolve hostname kebnekaise.hpc2n.umu.se: Name or service not known
This means the hostname could not be resolved: either there is a network problem on your end, or the hostname was misspelled. Check your network connection and the spelling of the hostname.
Q: What are common errors when resetting password through SUPR?
A:
When trying to reset your password, you may get this error message:
Your user account has been deactivated at HPC2N. Please contact support@hpc2n.umu.se to reactivate your user account. Include your username at HPC2N in the mail.
This means you are not currently a member of a project. You either need to apply for one or have your PI add you to a project. You can find more information about this here: Apply for HPC resources of a new project.
HPC2N systems¶
Q: What is the CPU Architecture of the cluster?
A: Look at the Kebnekaise hardware page.
Q: Why can’t I login with SSH Key-Based Authentication?
A: This method of authentication is explicitly disabled on HPC2N’s systems. If you want to use passwordless authentication, you can access HPC2N’s systems through GSSAPI-aware SSH clients. GSSAPI allows you to type your password once when obtaining your Kerberos ticket, and while that ticket is valid you do not have to retype your password. There is a little more information about this on our login/password page.
Q: Can I access the compute nodes with ssh?
A: No, we do not allow this, mainly since nodes can be shared by different users’ jobs.
Batch system and batch jobs¶
Q: What is the maximum time a job can run?
A: A job can run for up to the number of allocated core hours per month divided by five. However, the maximum number of (walltime) hours any job can run is 168 (or 7 days). For more information see our batch system webpage.
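As a sketch, the walltime is requested in the submit file with the `--time` directive; the project id and program name below are placeholders, not real values:

```shell
#!/bin/bash
# Hypothetical submit file requesting the maximum allowed walltime.
#SBATCH -A hpc2nXXXX-YYY       # placeholder: replace with your project id
#SBATCH -n 28                  # number of tasks (cores)
#SBATCH --time=7-00:00:00      # walltime: 7 days, i.e. the 168-hour maximum

srun ./my_program              # placeholder program name
```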
Q: Can I log in to computation nodes to see how my jobs are running?
A: No, we do not allow this, mainly since nodes can be shared by different users’ jobs.
Q: What combination of nodes and cores should I use for a multi-threaded application?
A: At HPC2N we only allow processes of one user to run on a particular node. That way we prevent a situation in which a user’s multi-threaded application (which runs as one process, and is treated as such by the batch system, but actually uses multiple cores) competes with other users’ ordinary processes. Supposing you want to run m multi-threaded processes on n cores, you need to make sure that each process is allocated to exactly one node:
- Ask the batch system for n cores (cores=n) and add the flag `--tasks-per-node=1`. That will “eat up” the node memory, leaving no more space for any other task; see the hardware page for the amount of memory available per node on the systems.
- See the Submit file design page for more information on how to allocate memory.
- For more complex configurations please contact HPC2N support: support@hpc2n.umu.se.
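A minimal sketch of such a submit file, assuming an OpenMP program and 28-core nodes; the project id and program name are placeholders:

```shell
#!/bin/bash
# Hypothetical submit file: one multi-threaded process per node.
#SBATCH -A hpc2nXXXX-YYY        # placeholder: replace with your project id
#SBATCH -n 1                    # one task (one multi-threaded process)
#SBATCH --tasks-per-node=1      # at most one task per node
                                # (newer Slurm spells this --ntasks-per-node)
#SBATCH -c 28                   # cores for that task; 28 on many nodes

# Let OpenMP use exactly the cores Slurm allocated to the task.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_threaded_program      # placeholder program name
```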
Q: When submitting a job without specifying a project account, I get an error:
sbatch: error: You must supply an account (-A ...)
sbatch: error: Batch job submission failed: Unspecified error
A: There is no default project. You must specify a valid project in your submit file (using the `#SBATCH -A` directive).
To apply for a project, please see the rules described in SUPR under the rounds. You can find more information here.
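The account can also be given on the `sbatch` command line instead of in the script; the project id and script name below are placeholders:

```shell
# Submit a job, supplying the project account on the command line.
# hpc2nXXXX-YYY and job.sh are placeholders for your own project id
# and submit file.
sbatch -A hpc2nXXXX-YYY job.sh
```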
Q: I got “Unable to allocate resources: Job violates accounting/QOS policy” when I submit a job.
A: This is most likely because the project you are trying to use has expired. You can check the status of your project with `projinfo`.
If you got a new project, update your submit file; otherwise you can apply for a new one.
Q: I submitted my job, and when I look at the status, it says “Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions”.
A: This message simply means that your job is in queue, waiting for nodes/cores to become available. The job will start when there are free nodes/cores.
Q: My job is pending and I got “Reason=AssociationResourceLimit” or “Reason=AssocMaxCpuMinutesPerJobLimit”
A: This is because your currently running jobs allocate your entire footprint allowance for your project. The job will start when enough of your running jobs have finished that you are below the limit.
Note
Another possibility is that your job is requesting more resources (more core hours) than your allocation permits.
Remember:
CORES-REQUESTED x WALLTIME = TOTAL-CORE-HOURS-REQUESTED
Note
On Kebnekaise, if you are asking for more than 28 cores, you are accounted for a whole number of nodes, rounded up (e.g. 29 cores -> 2 nodes). See the Batch system policy page for more information on this.
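The accounting rule above can be checked with a quick calculation. This is just the arithmetic, not an HPC2N tool:

```shell
# Core-hour accounting sketch for Kebnekaise: requests over 28 cores
# are accounted as whole 28-core nodes, rounded up.
cores=29
walltime_hours=24

# Round the requested cores up to whole nodes of 28 cores each.
nodes=$(( (cores + 27) / 28 ))
accounted_cores=$(( nodes * 28 ))

echo "nodes accounted: $nodes"                            # 2
echo "core-hours: $(( accounted_cores * walltime_hours ))" # 56 * 24 = 1344
```

So a 29-core, 24-hour job is charged as 2 full nodes, i.e. 1344 core-hours rather than 29 x 24 = 696.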
Q: I am used to using the PBS batch system. What are the main differences between that and Slurm (which is used at HPC2N)?
A: There are a number of differences between Slurm and more common systems like PBS. The most important ones are:
- No need to `cd $PBS_O_WORKDIR`. In Slurm your batch job starts to run in the directory from which you submitted the script. You do not have to change to that directory with `cd $PBS_O_WORKDIR` like you do in PBS.
- No need to manually export the environment. The environment variables defined in your shell at the time you submit your script will be exported to your batch job (in PBS you have to use the flag `-V` to achieve this). This also means any modules you have loaded before submission will be passed along by `srun` and `sbatch`.
- Location of output files. The output and error files are created in their final location immediately, instead of being moved there at completion as in PBS. This means you can examine the output and error files from your job while it is running, as they are being created.
Comparison of some common commands in Slurm and in PBS / Maui.
| Action | Slurm | PBS | Maui |
|---|---|---|---|
| Get information about the job | scontrol show job JOBID | qstat -f JOBID | checkjob |
| Display the queue information | squeue | qstat | showq |
| Delete a job | scancel JOBID | qdel | |
| Submit a job | srun/sbatch/salloc | qsub | |
| Display how many processors are currently free | | | showbf |
| Display the expected start time for a job | squeue --start --job JOBID | | showstart JOBID |
| Display information about available queues/partitions | sinfo/sshare | qstat -Qf | |
Q: How can I control affinity for MPI tasks and OpenMP threads?
A: You can use mpirun’s binding options or srun’s `--cpu_bind` option to control MPI task placement, or use `hwloc-bind` (from the hwloc module) or `numactl`.
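As a sketch, binding might look like the following; the program names are placeholders, and the srun flag is spelled `--cpu_bind` in older Slurm versions and `--cpu-bind` in newer ones:

```shell
# Bind each MPI task to a core with srun.
srun --cpu_bind=cores ./my_mpi_program        # placeholder program

# For hybrid MPI + OpenMP jobs, also pin the OpenMP threads
# using the standard OpenMP environment variables.
export OMP_PROC_BIND=true
export OMP_PLACES=cores
srun --cpu_bind=cores ./my_hybrid_program     # placeholder program
```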
GPU/CUDA problems¶
Q: Why can’t I access the Kebnekaise GPU partition? / I get an error like: “sbatch: error: batch job submission failed: Invalid account or account/partition combination specified”
A: The GPU partition and the CPU partition of Kebnekaise became separate resources as of 1 January 2023. You need to apply for the GPU part separately to be able to access it.
File systems¶
Q: I accidentally deleted a file. How do I restore it?
A: Your home directory ($HOME) and its subdirectories are backed up nightly. To request retrieval of files, contact support@hpc2n.umu.se. Files removed more than 30 days ago are irretrievably lost.
Files in /proj/nobackup/ are not backed up and cannot be recovered if deleted.
Compiling and compilers¶
Q: I need to use a specific compiler version with MPI. Which modules should I add?
A: Add the wanted compiler toolchain with MPI (foss, intel, etc.; see our “Installed compilers” page for more information).
For example, `ml foss/<version>` or `ml intel/<version>`, where you substitute the desired version, which you can find with `ml spider foss` or `ml spider intel`.
Read more about modules here.
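A minimal sketch of loading a toolchain and compiling an MPI program; `<version>` and the file names are placeholders, and the wrapper names (`mpicc` for GCC-based toolchains, `mpiicc` for Intel MPI) are the commonly provided ones:

```shell
# Load a compiler toolchain with MPI (placeholder version; list the
# available ones with 'ml spider foss').
ml foss/<version>

# The foss toolchain provides GCC-based MPI compiler wrappers:
mpicc -O2 -o my_prog my_prog.c     # placeholder source/output names

# With the intel toolchain, the Intel MPI wrapper is used instead:
# ml intel/<version>
# mpiicc -O2 -o my_prog my_prog.c
```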
Parallel Software¶
Q: Can I disable usage of Infiniband by OpenMPI?
A: Use the parameter `-mca btl '^openib'` with mpiexec. Keep in mind that this option is for testing purposes only: with InfiniBand disabled, your MPI communication goes over gigabit Ethernet and would interfere with other Ethernet traffic (especially file system traffic).
Q: How can I get access to restrictively licensed software?
A: We need to get a confirmation from a license holder that you can use the software along with a license number and/or complete license name.
Q: Should I use mpirun or srun?
A: Both should work interchangeably, though mpirun may not always work with standard input (`mpirun prog < file`) and Intel MPI.
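In practice that means preferring srun when the program reads from standard input; the program and file names below are placeholders:

```shell
# Standard input redirection is reliable with srun:
srun ./my_prog < input.txt

# With mpirun (notably under Intel MPI) it may not be forwarded to all
# tasks; a common workaround is to pass the file name as an argument
# and have the program open it itself:
# mpirun ./my_prog input.txt
```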