I am trying to read a netCDF file remotely.
I used the Paramiko package to read the file, like this:
    import paramiko
    from netCDF4 import Dataset

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname='hostname', username='usrname', password='mypassword')
    sftp_client = client.open_sftp()
    ncfile = sftp_client.open('mynetCDFfile')
    b_ncfile = ncfile.read()  # ****
    nc = Dataset('test.nc', memory=b_ncfile)
But ncfile.read() is very slow.
So my question is: is there an alternative way to read a netCDF file remotely, or a way to speed up paramiko.sftp_file.SFTPFile.read()?
Calling SFTPFile.prefetch should increase the read speed:

    ncfile = sftp_client.open('mynetCDFfile')
    ncfile.prefetch()
    b_ncfile = ncfile.read()
Another option is enabling read buffering, using the bufsize parameter of SFTPClient.open:

    ncfile = sftp_client.open('mynetCDFfile', bufsize=32768)
    b_ncfile = ncfile.read()

(32768 is the value of SFTPFile.MAX_REQUEST_SIZE.)
Similarly for writes/uploads:
Writing to a file on SFTP server opened using pysftp "open" method is slow.
Yet another option is to explicitly specify the amount of data to read (it makes BufferedFile.read take a more efficient code path):

    ncfile = sftp_client.open('mynetCDFfile')
    b_ncfile = ncfile.read(ncfile.stat().st_size)
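Which of these options helps most probably depends on the server and the network, so it may be worth timing them on your own connection. The helper below is only a sketch: time_read and the three read_* functions are hypothetical names, not part of Paramiko, and sftp_client is assumed to be the connected SFTPClient from the question.

```python
import time


def read_plain(f):
    # Baseline: the slow call from the question
    return f.read()


def read_prefetch(f):
    # Ask Paramiko to start fetching the file in the background before reading
    f.prefetch()
    return f.read()


def read_sized(f):
    # Pass the file size explicitly to read()
    return f.read(f.stat().st_size)


def time_read(sftp_client, path, strategy):
    """Open path, apply one read strategy, return (seconds, byte count)."""
    start = time.perf_counter()
    ncfile = sftp_client.open(path)
    try:
        data = strategy(ncfile)
    finally:
        ncfile.close()
    return time.perf_counter() - start, len(data)
```

For example, time_read(sftp_client, 'mynetCDFfile', read_prefetch) returns the elapsed seconds and the number of bytes read for the prefetch variant.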
If none of that works, you can download the whole file to memory instead:
Use pdfplumber and Paramiko to read a PDF file from an SFTP server
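A minimal sketch of that approach, assuming sftp_client is the connected SFTPClient from the question. It uses SFTPClient.getfo to copy the remote file into an in-memory buffer; the helper name download_to_memory is hypothetical.

```python
from io import BytesIO


def download_to_memory(sftp_client, remote_path):
    """Return the full contents of remote_path as a bytes object."""
    buffer = BytesIO()
    # getfo copies the remote file into any writable file-like object
    sftp_client.getfo(remote_path, buffer)
    return buffer.getvalue()
```

The result can then be passed to netCDF4 as before: nc = Dataset('test.nc', memory=download_to_memory(sftp_client, 'mynetCDFfile')).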
Obligatory warning: Do not use AutoAddPolicy this way. You are losing protection against MITM attacks by doing so. For a correct solution, see Paramiko "Unknown Server".
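As one possible illustration of connecting without AutoAddPolicy, the sketch below assumes the server's host key is already present in your local known_hosts file, and that client is a fresh paramiko.SSHClient(). The helper name open_verified_sftp is hypothetical, and the linked answer may recommend a different way of supplying the host key.

```python
def open_verified_sftp(client, hostname, username, password):
    """Connect with host key checking left on and return an SFTP client."""
    # Load the host keys from the user's known_hosts file
    client.load_system_host_keys()
    # No set_missing_host_key_policy call here: SSHClient's default policy
    # (RejectPolicy) refuses servers whose key is not already known
    client.connect(hostname=hostname, username=username, password=password)
    return client.open_sftp()
```

With this, connecting to a server whose key is unknown raises an exception instead of silently trusting it.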