SSH Packet Type



As I’m delivering my Linux Troubleshooting training soon, I am going to blog about some typical issues and techniques we’ll troubleshoot in the class too.

I’ll start from a relatively simple problem - logging in to a server via SSH always takes 10 seconds. The delay seems to be pretty constant, there don’t seem to be major network problems and the server is not overloaded. Yet, remote logins always take 10 seconds.

If you’ve been around, you probably already know a couple of likely causes for this, but I want to approach this problem systematically and show how to troubleshoot such issues without relying on lucky guesses or previous experience with the usual suspects. You never know, next time the root cause may be different - or you may have to troubleshoot a completely different application.

Anyway, let’s get started, I’ll use the time command to measure how long it takes to SSH to my server and run the hostname command there:
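A minimal sketch of that measurement, with placeholder user and host names (not the ones from my environment):

  # Time a remote login that just runs "hostname" on the server (placeholder user/host)
  time ssh demo@myserver hostname
  # the "real" time consistently comes out at a bit over 10 seconds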


The command above takes a bit over 10 seconds every time I execute it. We can’t immediately assume that this must be a server side thing (clients can also have problems), so let’s drill down deeper.

I’ll use “application instrumentation” built in to the SSH client itself, namely the -vvv “very verbose” flag. As there’s a lot of output, I’ve removed some irrelevant sections and replaced them with “…” and inline comments are in italic:
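The command producing this log is simply the earlier one with the -vvv flag added (placeholder user and host names again):

  # Same login as before, but with "very verbose" client-side debug output
  time ssh -vvv demo@myserver hostname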

So, the SSH verbose mode log is telling us that:

  1. The TCP-level connection is successfully established
  2. The SSH application-level handshake continues
  3. The SSH client sends a packet to the server and hangs for 10 seconds (send packet: type 50)
  4. The server response arrives and everything proceeds normally (receive packet: type 51)

So, we are waiting for 10 seconds for the server to get back to us with some kind of response (after successfully establishing a TCP connection). This doesn’t seem to be a network problem, but rather a server-side issue with sshd.

One might raise the question of whether to trust the SSH client logs, maybe there’s something going on under the hood that we don’t see from the client logs and it’s not a server problem. This is a valid question, but in the interest of brevity, I choose to trust the application logs for now and take a look into the server next.

So let’s log in to the server and look into the SSH daemon there:
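A simple way to get an overview of the daemon and any per-connection child processes (a sketch; PIDs and exact output will differ on your system):

  # List sshd processes; the [s] trick keeps grep from matching itself
  ps -ef | grep '[s]shd'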

The above output is the “normal” state for my server, where no-one is waiting to log in. Now let’s run the earlier SSH command from the client side again and see what happens:

A couple of new processes have appeared. Looks like sshd creates a new process for the incoming connection and helpfully renames the process “comment” to sshd: [accepted] so we can see what stage this process is currently in (PostgreSQL also does this and it makes getting some visibility inside the app from the OS level much easier). However, we still don’t know why one (or both) processes hang for these 10 seconds. Let’s drill even deeper and measure what these server side processes are doing during a login operation.

Now is the time to say that there are multiple ways to dig down deeper, multiple tools that can be used. I will start from one of the oldest Linux tracing tools available, called strace. It is a great tool for quickly tracing the system calls issued by a process and optionally its child processes too.

Note that strace is not a 100% safe tool for use on critical processes in production. It uses the Linux ptrace() system call under the hood, which allows peeking inside other processes and even modifying their memory and registers. There’s a small chance of messing up the signal handling or crashing a busy process while strace runs. Additionally, strace will greatly slow down the traced process while enabled. Nevertheless, it is often the quickest and most practical tool for troubleshooting what a process is up to, especially in non-production. It is also one of the few options you have without requiring root privileges for installing or accessing more advanced tools. I will write about other techniques in upcoming articles.

I will run the following command on the main SSH daemon process (pid 10635). I will have to run it as root as the target process is owned by root. You can just man strace to learn more about it, so here I will explain only the command line options that I’ll use below:

  • -r: Print elapsed time between system call starts (in the 2nd column in the output below)
  • -T: Print the elapsed times (durations) of syscalls (the rightmost fields, like “<0.000364>”)
  • -f: Follow and trace child processes too
  • -p: Attach to PID 10635 and trace that

Ok, let’s do some tracing!
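Putting the options above together, the tracing command itself looks like this:

  # Attach to the main SSH daemon (PID 10635) as root and follow its child processes
  sudo strace -r -T -f -p 10635
  # the syscall trace output starts scrolling as soon as the client connects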

The select() syscall above (waiting for data on file descriptors) took 3.49 seconds to complete simply because it took me over 3 seconds to run the SSH command from my other terminal window after starting tracing. Then accept() accepted a connection on an AF_INET (IPv4) socket from the IP 192.168.0.179.

The clone() syscall above essentially forks a new child process. It can also create threads within an existing process address space, but in this case there’s no CLONE_VM flag specified, so we get a whole new process with its own virtual address space. Since there’s more than one process/task to trace now, strace starts prefixing each output line with the corresponding PID (to be precise, it’s the Linux task ID, but let’s talk about that some other day).

The execve() above is where the new child process starts executing the /usr/bin/sshd binary; it then allocates some memory too (brk(), mmap() with the MAP_PRIVATE|MAP_ANONYMOUS flags). And now we are finally getting to the interesting part, given that we are troubleshooting an SSH login delay. All other system calls have executed very quickly, but if you look at the highlighted timing columns below, you’ll see that there are 2 poll() syscalls that each took around 5 seconds before timing out:

You’ll need to look at the rightmost highlighted numbers (like “<5.005176>”) to see the duration of a given system call, as the highlighted column on the left attributes the elapsed time to the next event after the previous system call. So in both cases, the poll([{fd=4, events=POLLIN}], 1, 5000) syscalls were waiting for 5 seconds each. The POLLIN event means that they were trying to read something from somewhere when they ended up with a timeout. Luckily the first argument, fd=4, tells us where we tried to read from - file descriptor number 4.

All we need to do now is walk backward in the trace to find what that file descriptor #4 actually pointed to in that process at the time. You’d just scroll upwards in the file and eyeball & search for syscalls like open(), socket(), dup() that return file descriptors. Or with bigger trace files, just search for = 4 backward in the file to find the immediately preceding syscall that created and returned file descriptor 4 as a result. You’d need to make sure that you find the closest preceding syscall that returns 4 as a file descriptor number (not some other random syscall that happens to return a number 4 with a different meaning). man syscalls and man <syscall_name> provide all the info you need.
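If the trace has been written to a file (see the appendix below; /tmp/x.trc is the file name used there), one way to do that backward search is with grep and line numbers:

  # List the trace lines where a syscall returned file descriptor 4, with line numbers
  grep -nE ' = 4( |$)' /tmp/x.trc
  # ...then pick the match closest above (before) the slow poll() line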

I’m pasting the top 2 lines from the previous output here again for reference (formatted to fit on screen):

When I searched for = 4 backward, starting from the slow poll() syscall in the trace, I found the socket() syscall highlighted above. It has created an “internet” socket (AF_INET) and it’s not a TCP socket, but a UDP one (due to the SOCK_DGRAM type). The next connect() system call associates this socket’s connection with IP address 1.2.3.4 & port 53. Because of the stateless nature of UDP “connections”, just running connect() for UDP sockets will not actually check if the target host exists and is responding. That’s why the connect() syscall “succeeded” in just 19 microseconds.

So I could just ping the host 1.2.3.4, but that wouldn’t be fully systematic, as our SSH daemon is not timing out on pings - it is timing out on whatever it sends to that UDP port. Additionally, some servers (or firewalls) may have ICMP ping traffic blocked, so it may look like the whole server is down when it’s just the ping packets that don’t go through.

We could just google what the port 53 typically means, but there’s plenty of information stored within Linux itself:
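For example, the /etc/services file that ships with Linux maps well-known port numbers to service names:

  # What service is registered for port 53 over UDP?
  grep -w '53/udp' /etc/services
  # typically prints something like:  domain  53/udp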

It’s the DNS service port! It’s a DNS lookup! Apparently, the SSH Daemon is trying to talk to a DNS server (on IP 1.2.3.4) whenever my test user is trying to log in. Many of you probably knew from the start that a usual suspect causing such problems is a reverse DNS lookup where the daemon wants to find the fully qualified domain name of whatever client IP address is connected to it. This is useful for logging purposes (but less useful for security & whitelisting as DNS could be hijacked/spoofed).

Let’s use nslookup to do a DNS lookup using IP 1.2.3.4:
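A sketch of the test, querying the nameserver at 1.2.3.4 directly and using the client IP we saw earlier in the trace as the lookup target:

  # Reverse-lookup the SSH client's IP, explicitly via the nameserver 1.2.3.4
  time nslookup 192.168.0.179 1.2.3.4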

Indeed, the request hung for 10 seconds and then timed out. So this server is down (or there’s some network or configuration issue going on). Let’s check what is the default nameserver for my server where the SSH daemon runs:
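On most Linux distributions the resolver configuration lives in /etc/resolv.conf:

  # Show the configured nameserver(s) for this host
  cat /etc/resolv.conf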

Ok, “someone” has configured this server’s default nameserver to IP 1.2.3.4. And since it’s not responding whenever SSH Daemon attempts its reverse lookup, we have this problem. So all I need to do is to configure my server to use a functioning nameserver. In my case I changed the nameserver value to 192.168.0.1 and problem solved:
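A sketch of the change and the re-test, with the same placeholder user and host names as before (this assumes /etc/resolv.conf is not managed by a tool like NetworkManager or systemd-resolved; if it is, you would change the setting there instead):

  # Point the resolver at a working nameserver (192.168.0.1 in my case)
  sudo sed -i 's/^nameserver 1.2.3.4/nameserver 192.168.0.1/' /etc/resolv.conf
  # Re-run the original test - the login should now complete without the 10 second wait
  time ssh demo@myserver hostname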

You may have seen such problems in your environment in the past; typically it happens when the enterprise networking folks change some firewall rules and a server in one region can’t access a DNS server (in another region) anymore. Or when your computer’s VPN/DHCP client messes around with your nameserver settings on connect (but doesn’t change them back when disconnecting from the VPN).

Appendix: Analyzing Large Trace Files

If you experiment with strace, you’ll quickly see that it writes a lot of output. You can ask strace to trace only specific system calls of interest, however in our scenario, we didn’t know which syscall type took time in advance. So, full tracing with some postprocessing may be needed in order to find the longest-taking system calls out of a big tracefile. Here are a couple of examples of writing the output trace to a file (-o /tmp/x.trc) and how to extract the top syscall durations from the tracefile:
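For example, the same tracing session as before, but with the output written to a file:

  # Trace the SSH daemon as before, but save the output for later analysis
  sudo strace -r -T -f -o /tmp/x.trc -p 10635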

Now we have the trace output written to a file. The 2nd column shows the time between when the trace lines were written (between syscall starts):

So we could just use sort -k 2 to sort this text based on column 2, also treat the values as numeric when sorting (-n):
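A sketch of that, showing the lines with the largest column-2 values last:

  # Numeric sort on the 2nd column (time since the previous syscall started)
  sort -n -k 2 /tmp/x.trc | tail -10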

The above output shows you the trace lines that had the biggest time gap since the start of the syscall before them (thanks to the strace -r option). So the close(4) syscall itself didn’t take 5 seconds; rather, the syscall before it (plus whatever userland processing, scheduling delays, etc.) took 5 seconds. You would need to open the .trc file with an editor and search for the time taken (or use grep -B 1) to see what came just before these lines in the trace.

Another approach that uses the actual syscall duration (the rightmost field in trace output written thanks to the strace -T option) would require a bit of text processing with AWK:
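One way to do it (a sketch: strip the angle brackets from the last field, sort by it, and show the longest calls last):

  # Sort trace lines by the <duration> field added by strace -T
  awk '$NF ~ /^<.*>$/ { d = $NF; gsub(/[<>]/, "", d); print d, $0 }' /tmp/x.trc | sort -n | tail -10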

The command above sorts the trace by the last field of each line, the individual system call duration. Again, you could use the precise duration value (like 5.005108) to find the location of interest in the full tracefile for detailed analysis. Or make AWK or grep print out the tracefile line number somewhere too. Note that the 10+ second syscalls in the top of the output are due to me taking 10 seconds before issuing the demo SSH command in another terminal window (sshd had been just idle, waiting for anyone to connect and talk to it when I started tracing 10 seconds before).

Summary

This concludes the systematic troubleshooting example in this article. strace is just one of the potential tools you could use, I will cover some other options in the future. Often it is “cheaper” to start from checking for a couple of usual suspects (from experience), before starting to trace stuff. However, not all problems fall into the “usual suspects” category, so you need a systematic troubleshooting approach to be ready to deal with any issue.

If you want to get notified of new troubleshooting articles, you can follow me on Twitter - or sign up for weekly updates by email on the Connect page.

Thu 08 March 2018 | article

Why read this?

As part of meeting the Accounting component of the AAA (Authentication, Authorization and Accounting) framework, each event and action on the server and/or the client side is recorded by SFTPPlus. These events have an associated Event ID which is also publicly searchable, both on our website and in the internal documentation included in the software package that you have downloaded.

System and network administrators working with logs - whether in the most verbose format or not - may find this breakdown of such logs helpful.

For this example, we will be touching on SFTPPlus SFTP transfers only, from both the client side and the server side. Please do not hesitate to get in touch with us if you are interested in learning more about other file transfer protocols.

This article was written as of SFTPPlus version 3.31.0.

SFTPPlus SFTP Server-side Perspective

Initial configuration notes

If you are currently evaluating SFTPPlus, please follow our documentation to learn more about how you can configure your database and event handlers to suit your specifications.

Read more about configuring databases with SFTPPlus.

Read more about configuring event handlers. These provide further ways to configure SFTPPlus to create logging actions based on the events recorded.

Even if you are an existing customer, you can follow our documentation links above in order to refresh your knowledge on configuring SFTPPlus version 3. For those on legacy versions, please consult the documentation relevant to your version.

Example logs from SFTPPlus

The following are snippets when logging in for the first time from a GUI client to an SFTPPlus 3.30.0 SFTP server.

A new connection has been made to the service sftp-1. Knowing the service name is useful in case there are multiple other SFTP services running:

The following are the authentication methods associated with the server, and confirmation of which methods are not active. There may be more methods, depending on how many of these are set up and enabled. To simplify the login process, please make sure to disable all unused authentication methods:

The following logs list out a successful authentication of a user using an SSH key:

The following log message confirms the type of permissions allowed for the account and an active transfer that is already running:

The following confirms that the user has logged in and now has access to the folder configured as the root ('/') folder:

SFTPPlus SFTP Client-side Perspective

Initial configuration notes

If you are currently evaluating SFTPPlus, please follow our client side documentation.

The SFTPPlus Client software utilizes the command-line client-shell to access remote file servers using the interactive shell.

Even if you are an existing customer, you can follow our documentation links above in order to refresh your knowledge on configuring SFTPPlus version 3. For those on legacy versions, please consult the documentation relevant to your version.

Example logs from SFTPPlus

Let's connect with SFTPPlus Client using the SFTP protocol on port 10022. The following log details the UUID of the sftp service and confirms the connection:

In the event that the SFTP connection fails, the log will state a number of details. The event ID is 30073. The event will communicate the host key algorithm that is in use to identify the server side, the cipher used to receive data, the HMAC for both sent and received data, the key exchange algorithm, the cipher used for sent data, and the name of the location associated with this event. Below is an example of the event that is emitted as part of this new SFTP connection:

Provided that the SFTP connection succeeds, supported actions are logged either as a success, like below:


Or error details are caught with an explanation message as to why:

SFTPPlus SFTP Exchange - Detailed Verbose OpenSSH Logs

Initial configuration notes

Following on from that, you can use the logging built into the client-side or server-side software that you are utilizing. SFTPPlus offers logging functionality for both the client side and the server side. Network administrators using other software for the client or server may wish to use additional logging functionality, such as sftp -vvv.


Example with sftp -vvv output
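The snippets below can be reproduced by running the OpenSSH sftp client with maximum verbosity; the port matches the client-side example earlier, while the user and host names are placeholders:

  # Connect to an SFTP service on port 10022 with full OpenSSH debug output
  sftp -vvv -P 10022 demo@sftp.example.com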

These lines show that SSH protocol 2.0 is being used, along with the OpenSSH version:

This line indicates which protocol version is in use on the server side, and which software version:

This indicates which algorithms are preferred. You may opt to select only the strongest algorithms supported in your system first. In this case the ordering is logical, as it moves from a more secure algorithm down to a less secure algorithm:

These are the key exchange algorithms that are available:

These are the host key algorithms:

These are the ciphers used from client to server (ctos) and from server to client (stoc):


These are the compression algorithms used from client to server (ctos) and from server to client (stoc):

This is the key exchange initialization proposal from the host server:

These are the key exchange algorithms used from server to client and client to server:

This is the SSH version 2 key exchange Diffie-Hellman Group Exchange request. This specifies the size of the SSH prime moduli being calculated by the SFTP server, as indicated in the SFTPPlus configuration file. When you first initialize SFTPPlus version 3, a moduli file (with the columns Time, Type, Tests, Tries, Size, Generator, Modulus) is generated and saved as ssh-service.moduli. This file contains primes ranging in size from 1023 to 8191 bits. An example of the contents of the .moduli file is below:

In the example below, SSH moduli primes from 2048 to 8192 bits are used. Specifically, moduli in the range from 4092 to 8192 bits are requested for the SSH key exchange Diffie-Hellman group exchange request, as indicated on the debug1 line below (SSH2_MSG_KEX_DH_GEX_REQUEST(2048<8192<8192)). Once the request is sent, the server uses the moduli file, the same file that was initialized as part of the SFTPPlus installation steps, to compute the shared secret. The server provides its host key back to the client along with the algorithm used, as indicated by the final line: Server host key: ssh-rsa SHA256:hdSfa7gb2O984malHerkwerj3m20dHb6Yuwl0&hbxFj.

See the rest of the output below:

The client then checks to see if the host key is located within the known_hosts file:

A few more steps occur to verify this server host name and port:

This is the server rekey interval:

The following are SSH keys found:

The following show the authentication methods that can continue, the preferred authentication order, and the remaining preferred methods:

The server will go through the authentication exchange until the final preferred method is reached - the password method. Upon success, the client enters an interactive session with the server.

There will also be additional verbose logs after entering an interactive session, such as a brief snippet below:

Evaluating SFTPPlus MFT

The features listed in this article are just a selected few out of many integration and configuration options that are available today. Feel free to talk to the Support team about your requirements with file transfer software.

SFTPPlus MFT Server supports FTP, Explicit FTPS, Implicit FTPS, SFTP, SCP, AS2, HTTP, and HTTPS.


SFTPPlus MFT is available as an on-premise solution supported on Windows, Linux, and macOS.


It is also available in the cloud as Docker containers, AWS or Azure instances, and on many other cloud providers.


Request a trial using the form below.