File Systems¶
This section provides information about the file systems at HPC2N. Since your home directory is quite small by default, you should keep files needed for your jobs in your project storage.
There are also instructions on how to compress and archive your data and other files.
The section ‘File transfer’ gives some information about how to transfer files to and from HPC2N.
Overview¶
| | Project storage | $HOME | /scratch |
|---|---|---|---|
| Recommended for batch jobs | Yes | No (size) | Yes |
| Backed up | No | Yes | No |
| Accessible by the batch system | Yes | Yes | Yes (node only) |
| Performance | High | High | Medium |
| Default readability | Group only | Owner | Owner |
| Permission management | chmod, chgrp, ACL | chmod, chgrp, ACL | N/A for batch jobs |
| Notes | This is the storage your group gets allocated through the storage projects | Your home directory | Per node |
$HOME¶
This is your home directory (pointed to by the $HOME variable). It has a quota limit of 25 GB by default. Your home directory is backed up regularly.
Since the home directory is quite small, it should not be used for most production jobs. These should instead be run from project storage directories.
To find the path to your home directory, either run pwd just after logging in, or do the following:
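One way to do this, shown as a generic shell sketch rather than an HPC2N-specific command, is to print the value of the $HOME variable:

```shell
# Print the path stored in the HOME environment variable
echo "$HOME"
```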
It is not generally possible to get more space in your home directory. You should generally use project storage instead. If you need more of that, the PI in your project should apply for it.
However, if you really need more space in your home directory, have your PI contact support@hpc2n.umu.se and include a good explanation of what you need the extra space for.
Project storage¶
Project storage is where a project’s members have the majority of their storage. It is applied for through SUPR, as a storage project. While storage projects need to be applied for separately, they are usually linked to a compute project.
This is where you should keep your data and run your batch jobs from. It offers high performance when accessed from the nodes, making it suitable for data that is accessed from parallel jobs, whereas your home directory (usually) has too little space.
Project storage is located below /proj/nobackup/ in the directory name selected during the creation of the proposal.
Note
The project storage is not intended for permanent storage and there is NO BACKUP of /proj/nobackup.
Quota¶
The size of the storage depends on the allocation. There are small, medium, and large storage projects, each with their own requirements. You can read about this on SUPR. The quota limits apply to the project as a whole; there are no user-level quotas on that space.
There are four quota limits for the project storage space: a soft and a hard limit for disk usage, and a soft and a hard limit for the number of files. The hard limits can never be exceeded. You can be above a soft limit for a grace period, but after the grace period the soft limit behaves as a hard limit until you have gone below the soft limit again.
Misc¶
It is recommended to use the project’s storage directory for the project’s data. The layout of that project directory is the responsibility of the project itself.
Note
- For the PI, make sure to add any user in SUPR that should be granted access to the storage space to the storage project.
- The storage project PI can link one or several compute projects to the storage project, thereby allowing users in the compute project access to the storage project without the PI having to explicitly handle access to the storage project.
/scratch¶
Our recommendation is that you use the project storage instead of /scratch when working on Compute nodes or Login nodes.
On the computers at HPC2N there is a directory called /scratch. It is a small local area shared between the users of the node, and it can be used for saving (temporary) files you create or need during your computations. Please do not leave files in /scratch that you do not need when you are not running jobs on the machine, and please make sure your job removes any temporary files it creates.
When anybody needs more space than is available on /scratch, we will remove the oldest/largest files without notice.
Note
There is NO backup of /scratch.
The size of /scratch depends on the type of nodes and that size is split between the number of cores that your job has on the node.
- Kebnekaise, standard compute nodes: ~170 GB
- Kebnekaise, GPU nodes: ~170 GB
- Kebnekaise, Largemem nodes: ~350 GB
SweStore - Nationally Accessible Storage¶
For data archiving and long-term storage we recommend that our users use the SweStore Nationally Accessible Storage. This is a robust, flexible and expandable long-term storage system aimed at storing large amounts of data produced by various Swedish research projects.
For more information, see the documentation for SweStore available at docs.swestore.se.
Archiving and compressing¶
There are a number of options for archiving and compressing directories and files at HPC2N.
Note that in the examples below, $ and b-an01 [~]$ are bash prompts from the terminal and you should not type these.
tar (more information)¶
This program saves many files together into a single archive file, and it also restores individual files from the archive. Automatic archive compression/decompression options exist, as well as special features that allow tar to be used for incremental and full backups. The command tar --help will give the format (defaults to gnu). This is generally only important for files larger than 8 GB.
Examples¶
Archive a file
It adds the file “myfile.txt” to the tar archive myfile.tar, without any compression.
Archive and compress a file
It adds the file “myfile.txt” to the tar archive myfile.tar and then does gzip compression.
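The two cases above can be sketched as follows. The sample file is created first so the commands can be tried as they stand; the compressed archive name myfile.tar.gz is an assumption, since only myfile.tar is named in the text.

```shell
# Create a small sample file to archive
echo "some example text" > myfile.txt

# -c create, -v verbose, -f archive name: plain archive, no compression
tar -cvf myfile.tar myfile.txt

# Adding -z filters the archive through gzip
tar -czvf myfile.tar.gz myfile.txt
```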
List contents of a tar archive file
Extract contents of a tar archive file
In this case there was only one file in the tar archive; if there had been more, all of them would have been extracted here.
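Listing and extracting might look like the sketch below. It builds a small gzip-compressed archive first; the file names and the extracted directory are placeholders chosen for illustration.

```shell
# Build a small archive to inspect
echo "some example text" > myfile.txt
tar -czf myfile.tar.gz myfile.txt

# -t lists the contents without extracting anything
tar -tzvf myfile.tar.gz

# -x extracts; -C places the files in another (existing) directory
mkdir -p extracted
tar -xzvf myfile.tar.gz -C extracted
```

Without -C, the files are extracted into the current directory.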
Archive and compress all files in a directory to a single tar archive file
Archive and compress all files of a certain type to a single tar archive file
In this example, the file type is .c, and only files in the current directory and below are included.
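A sketch of both directory cases, using a made-up sample tree (mydir, main.c, helper.c and notes.txt are hypothetical names). Selecting by type is done here by combining find with tar, which is one common approach and not necessarily the one used originally on this page.

```shell
# Sample tree: two C files and one text file
mkdir -p mydir/sub
echo 'int main(void) { return 0; }' > mydir/main.c
echo 'int helper(void) { return 1; }' > mydir/sub/helper.c
echo 'notes' > mydir/notes.txt

# Archive and compress the whole directory
tar -czvf mydir.tar.gz mydir

# Archive and compress only the .c files in the current directory and below;
# find emits NUL-separated names so unusual file names are handled safely
find . -type f -name '*.c' -print0 | tar -czvf cfiles.tar.gz --null -T -
```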
gzip (more information)¶
Compression utility designed as a replacement for compress, with much better compression and no patented algorithms. The standard compression system for all GNU software.
Examples¶
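A minimal sketch of compressing and decompressing a single file; myfile.txt is a placeholder name.

```shell
# Create a small sample file
echo "some example text" > myfile.txt

# Compress: replaces myfile.txt with myfile.txt.gz
gzip myfile.txt

# Show compressed and uncompressed sizes
gzip -l myfile.txt.gz

# Decompress: restores myfile.txt and removes the .gz file
gunzip myfile.txt.gz
```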
bzip2 (more information)¶
Strong, lossless data compressor based on the Burrows-Wheeler transform. Also available as a library.
Examples¶
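A minimal sketch, assuming bzip2 is installed on the machine you are working on; myfile.txt is a placeholder name.

```shell
# Create a small sample file
echo "some example text" > myfile.txt

# Compress: replaces myfile.txt with myfile.txt.bz2
bzip2 myfile.txt

# Decompress: restores myfile.txt and removes the .bz2 file
bunzip2 myfile.txt.bz2
```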
zip (more information)¶
Simple compression and file packaging utility. Note that the maximum size limit of a zip file is 4GB and if this size limit is exceeded, the file becomes prone to corruption. This further leads to failure of the extraction process and inaccessibility of your data.
Zip examples¶
Uncompressing myfile.zip
If the file already exists, unzip will ask whether you want to replace or rename it.
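A sketch of compressing and uncompressing one file, assuming zip and unzip are installed. The file names and the unpacked directory are placeholders; extracting into a separate directory sidesteps the overwrite question.

```shell
# Create a small sample file
echo "some example text" > myfile.txt

# Compress myfile.txt into myfile.zip (the original file is kept)
zip myfile.zip myfile.txt

# Uncompress into a separate directory, which unzip creates if needed
unzip myfile.zip -d unpacked
```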
Compress all files in one directory to a single archive file
Compress all files of a certain type in the current directory (and in directories under this) to a single archive file
In this example, the archive contains all .c files.
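Both directory cases can be sketched as below, again assuming zip is installed and using a made-up sample tree (mydir, main.c, helper.c and notes.txt are hypothetical names).

```shell
# Sample tree: two C files and one text file
mkdir -p mydir/sub
echo 'int main(void) { return 0; }' > mydir/main.c
echo 'int helper(void) { return 1; }' > mydir/sub/helper.c
echo 'notes' > mydir/notes.txt

# -r recurses into the directory and adds everything below it
zip -r mydir.zip mydir

# -i restricts a recursive run to names matching the pattern
zip -r cfiles.zip . -i '*.c'
```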
Archiving/compressing on Windows¶
There are a number of Windows programs using the same formats. These are a few of the more popular ones:
- 7-Zip. Free Windows software package that can handle all the above formats.
- WinZip. Commercial Windows software package that can handle all the above formats.
- WinRAR. Commercial Windows software package that can handle all the above formats.
File transfer¶
There are several possible ways to transfer files and data to and from HPC2N’s systems.
The examples below mostly cover Linux and macOS. If you are transferring from a Windows system, then we have some information in the File transfers section under the Windows connection guide.
Note that in the examples below, $ and b-an01 [~]$ are bash prompts from the terminal and you should not type these.
FTP - NOT PERMITTED! ¶
FTP (File Transfer Protocol) is a simple data transfer mechanism. FTP is the original program for data transfer, but it was not designed for secure communications. FTP exists on the systems, but HPC2N does not permit connections using FTP because of the security problems. There are several modern FTP clients which support either SFTP or SCP which are similar, secure protocols for file transfer. Use one of those methods instead of FTP.
SCP ¶
SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH (Secure SHell) protocol. You may use SCP to connect to any system where you have SSH (log-in) access. There are some graphical file transfer programs which offer SCP as a protocol, and it is also a command-line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, but will also recursively copy directory contents if given a directory name.
Command-line usage
From local system to a remote system
Example:
From a remote system to a local system
Example:
Recursive directory copy from a local system to a remote system
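The three cases above can be sketched as follows. The user name, file names, and directory names are placeholders, and the commands are wrapped in shell functions (hypothetical names) purely so that nothing connects until you call one with your own account details.

```shell
# Placeholder account; replace with your own username
REMOTE="user@kebnekaise.hpc2n.umu.se"

# Local -> remote: copy file.c into the remote directory C/
scp_upload() { scp file.c "$REMOTE:C/"; }

# Remote -> local: fetch C/file.c into the current directory
scp_download() { scp "$REMOTE:C/file.c" .; }

# -r copies a directory and everything below it to the remote home directory
scp_upload_dir() { scp -r mydir "$REMOTE:"; }
```

Calling scp_upload, for example, runs the first command and prompts for your password.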
Installation
Linux / Solaris / AIX / HP-UX / Unix
The “scp” command line program should already be installed.
Microsoft Windows
macOS / Mac OS X
The “scp” command line program should already be installed. You may start a local terminal window from “Applications->Utilities”.
SFTP ¶
SFTP (SSH File Transfer Protocol, sometimes called Secure File Transfer Protocol) is a network protocol that provides file transfer over a reliable data stream. You may use SFTP to connect to most of HPC2N’s systems. SFTP is a command-line program on most Unix, Linux, and Mac OS X systems. It is also available as a protocol choice in some graphical file transfer programs. SFTP has more features than SCP and allows for other operations on remote files, such as remote directory listing, and it is also possible to resume interrupted transfers. Note, however, that command-line SFTP cannot recursively copy directory contents. If you need to do so, you must either use SCP or a graphical SFTP client.
Command-line usage
Examples
From a local system to a remote system
enterprise-d [~]$ sftp user@kebnekaise.hpc2n.umu.se
Connecting to kebnekaise.hpc2n.umu.se...
user@kebnekaise.hpc2n.umu.se's password:
sftp> put file.c C/file.c
Uploading file.c to /home/u/user/C/file.c
file.c 100% 1 0.0KB/s 00:00
sftp> put -P irf.png pic/
Uploading irf.png to /home/u/user/pic/irf.png
irf.png 100% 2100 2.1KB/s 00:00
sftp>
From a remote system to a local system
The following two flags can be useful
- -B: optional, specify buffer size for transfer; larger may increase speed, but costs memory
- -P: optional, preserve file attributes and permissions
Regarding buffer size: to find an optimal buffer size, use the following formula:
buffer size = 2 x bandwidth x delay
or
buffer size = bandwidth x RTT
where RTT is the round trip time, which you get from ping; the delay is about half the average ping time.
Example
With an RTT of 2.653 ms and a data link capacity of 1 Gbit/s, the optimal buffer size is 2.653 ms x (1 Gbit/s / 8 bits per byte) = 331625 bytes.
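The arithmetic in the example can be reproduced with a short shell sketch. It reads the link capacity as 1 Gbit/s, which is what the division by 8 implies, and treats 2.653 ms as the measured round trip time; the variable names are illustrative.

```shell
# Values from the example; measure your own RTT with ping
rtt_ms=2.653               # average round trip time in milliseconds
bandwidth_bps=1000000000   # link capacity in bits per second (1 Gbit/s)

# bytes = RTT [s] x bandwidth [bytes/s] = rtt_ms x bandwidth_bps / 8000
buffer_bytes=$(awk -v rtt="$rtt_ms" -v bw="$bandwidth_bps" \
    'BEGIN { printf "%.0f", rtt * bw / 8000 }')
echo "$buffer_bytes"

# The result could then be passed to sftp with the -B flag, for example:
#   sftp -B "$buffer_bytes" user@kebnekaise.hpc2n.umu.se
```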
See http://fasterdata.es.net/TCP-tuning/ for more information.
Installation
Linux / Solaris / AIX / HP-UX / Unix
The “sftp” command line program should already be installed.
Microsoft Windows
Mac OS X
- The “sftp” command-line program should already be installed. You may start a local terminal window from “Applications->Utilities”.
- MacSFTP
LFTP ¶
LFTP is a command-line file-transfer program for Linux and Unix systems. It supports the FTP, HTTP, FISH, SFTP, HTTPS and FTPS protocols. LFTP has additional features not provided by SFTP, such as bandwidth throttling, transfer queues, and parallel transfers. It may be used interactively or scripted. Every operation in LFTP is reliable, that is, any non-fatal error is handled and the operation is retried automatically. So if a download breaks, it will be restarted from that point automatically.
In order to connect over SFTP to our resources, the username and hostname shall be prefixed by sftp://
LFTP has shell-like command syntax allowing you to launch several commands in parallel in the background (&). It is also possible to group commands within () and execute them in the background. All background jobs are executed in the same single process. You can bring a foreground job to the background with ^Z (c-z) and back with the command ‘wait’ (or ‘fg’, which is an alias for ‘wait’). To list running jobs, use the command ‘jobs’. With parallel transfers LFTP can be much faster than SCP or SFTP, so its use is encouraged when possible.
LFTP is simply a client, so it is not needed on the remote machine involved in a transfer (the remote system need only support SFTP).
Examples
Retrieve and compress
The first command retrieves the file from the ftp server and passes its contents to gzip which in turn stores the compressed data to file.gz. Other commands show how to start commands or command groups in the background.
LFTP has a built in mirror which can download or update a whole directory tree. There is also reverse mirror (mirror -R) which uploads or updates a directory tree on server.
More interactive examples
Transfer a directory and all contents from a remote system to a local system, using 5 connections in parallel
Transfer a directory and all contents from a local system to a remote system, using 8 connections in parallel
Batch usage
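For batch usage, one option is to put the LFTP commands in a script file and pass it to lftp with -f. The sketch below only writes such a file; the file name transfer.lftp, the user name, and the directory names are placeholders, and the two mirror lines correspond to the parallel download and upload cases described above.

```shell
# Write the LFTP commands to a script file
cat > transfer.lftp <<'EOF'
open sftp://user@kebnekaise.hpc2n.umu.se
mirror --parallel=5 remote_dir local_dir
mirror -R --parallel=8 local_dir remote_dir
EOF

# Run it non-interactively with:
#   lftp -f transfer.lftp
```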
rsync ¶
rsync is a utility for efficiently transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. It is commonly found on Unix-like operating systems and is under the GPL-3.0-or-later license.
The general form is rsync [options] source destination, where source is the source directory, either on a local system, local disk, or remote system, and destination is the destination directory on a local or remote system or disk.
rsync has many useful flags, which you can find with the man rsync command. Here we will only cover the most common:
- -r: recursive
- -a: archive. It syncs recursively and preserves symbolic links, modification times, groups/owners, and permissions. Equivalent to -rlptgoD.
- -v: verbose
- -n: Dry run. Shows what would be transferred without making any changes.
- -z: Compress before transfer
- -P: Combines the flags --progress (progress bar for the transfer) and --partial (resume an interrupted transfer of a file).
- --no-o: Do not preserve owner. Default unless you use -a. Useful if you have a different username on the remote and local system.
- --no-perms: Do not preserve permissions. Default unless you added -a.
- --no-links: Do not preserve symbolic links. Default unless you are using -a.
Note
- Rerunning rsync will continue the transfer of all files/directories that have not yet been completely transferred. If you have a large file that was partially transferred, it will restart the transfer of that file unless you had included the flag --partial.
- When preserving modification times, upon rerun rsync will only update files that are new or have been modified since the previous run.
Examples
Recursively synchronize the files from a local source directory to another local destination directory
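A minimal local sketch, assuming rsync is installed; source_dir and destination_dir are placeholder names and a small sample tree is created first.

```shell
# Sample source tree
mkdir -p source_dir/sub
echo "data" > source_dir/sub/file.txt

# -r recurses; the trailing slash copies the contents of source_dir
# into destination_dir instead of creating destination_dir/source_dir
rsync -r source_dir/ destination_dir/
```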
Recursively sync files from one remote directory to a local directory. Also preserve symbolic links and time stamps, and allow resuming partially transferred files on restart.
Recursively sync files from one local dir to another. Also preserve symbolic links, owners, permissions, and modification times
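A local sketch of archive mode, assuming rsync is installed; the directory and file names are placeholders. A dry run with -n is shown first as a way to preview the transfer.

```shell
# Sample source tree with a regular file and a symbolic link
mkdir -p source_dir
echo "data" > source_dir/file.txt
ln -s file.txt source_dir/link.txt

# -n previews the transfer without changing anything
rsync -avn source_dir/ destination_dir/

# The real run: -a preserves links, owners, permissions, and times
rsync -av source_dir/ destination_dir/
```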
Recursively sync a local directory to a remote destination directory, preserving owners, permissions, modification times, and symbolic links
Recursively sync a remote directory to a local directory, while preserving owners, permissions, modification times, and symbolic links
Recursively sync a remote directory to a local directory, preserving owners, permissions, modification times, and symbolic links. Also compress before transfer, show a progress bar, and allow a file transfer that was interrupted by a broken connection to be resumed
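The remote cases described above can be sketched as follows. The user name and directory names are placeholders, and the commands are wrapped in shell functions (hypothetical names) so that nothing connects until you call one with your own account details.

```shell
# Placeholder account; replace with your own username
REMOTE="user@kebnekaise.hpc2n.umu.se"

# Remote -> local: recursive, keep symlinks (-l) and times (-t), resume partial files
rsync_pull_resume() { rsync -rlt --partial "$REMOTE:remote_dir/" local_dir/; }

# Local -> remote in archive mode
rsync_push() { rsync -a local_dir/ "$REMOTE:remote_dir/"; }

# Remote -> local in archive mode
rsync_pull() { rsync -a "$REMOTE:remote_dir/" local_dir/; }

# Remote -> local with compression (-z), progress and resume (-P)
rsync_pull_full() { rsync -azP "$REMOTE:remote_dir/" local_dir/; }
```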