Internal HTTP Request Parsing in Apache Tomcat

Once network-level input/output (I/O) has completed, Apache Tomcat starts interpreting the raw request data. This flow begins in the process method of AbstractProtocol (more precisely, of its ConnectionHandler inner class). That method runs on a worker thread taken from the connector's thread pool. At this stage an Http11Processor is either created or reused. Its job is to turn the incoming byte stream into HTTP data structures the rest of the server can work with.

Protocol Processor Initialization

The entry point for connection processing manages the processor's life cycle. If the socket wrapper has no processor attached, Tomcat first tries to take one from the cache of recycled processors. Only if the cache is empty does it create a new instance.

public SocketState process(SocketWrapperBase<S> socketWrapper, SocketEvent eventStatus) {
    // ... initial setup omitted ...
    // Reuse the processor already attached to this connection, if any
    Processor activeProcessor = (Processor) socketWrapper.takeCurrentProcessor();

    try {
        if (activeProcessor == null) {
            // Try to reuse a recycled processor from the cache
            activeProcessor = processorCache.poll();
        }
        if (activeProcessor == null) {
            // Cache miss: build a new processor (an Http11Processor for HTTP/1.1)
            activeProcessor = getProtocolHandler().buildProcessor();
            registerProcessor(activeProcessor);
        }

        SocketState currentState;
        do {
            // Parse and service the request; repeat while a protocol upgrade is in progress
            currentState = activeProcessor.process(socketWrapper, eventStatus);
        } while (currentState == SocketState.UPGRADING);

        return currentState;
    } catch (Exception e) {
        // Processing error (logging and processor release omitted): close the connection
        return SocketState.CLOSED;
    }
}

protected Processor buildProcessor() {
    return new Http11Processor(this, webAdapter);
}
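The cache-or-create logic can be illustrated outside Tomcat with a small object pool. This is a minimal sketch under stated assumptions. ProcessorPool, acquire, release and createdCount are hypothetical names. A plain ArrayDeque guarded by synchronized methods stands in for Tomcat's own recycled-processor cache. The pool is bounded, so that a burst of connections does not leave an unlimited number of idle processors behind.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical bounded pool: reuse a recycled object if available, otherwise build one. */
final class ProcessorPool<P> {
    private final Deque<P> recycled = new ArrayDeque<>();
    private final Supplier<P> factory;
    private final int maxSize;
    private int created = 0;

    ProcessorPool(Supplier<P> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    /** Like processorCache.poll() followed by buildProcessor() on a cache miss. */
    synchronized P acquire() {
        P p = recycled.pollFirst();
        if (p == null) {
            p = factory.get();
            created++;
        }
        return p;
    }

    /** Return an object once its request is finished; drop it if the pool is full. */
    synchronized void release(P p) {
        if (recycled.size() < maxSize) {
            recycled.push(p); // push = addFirst, so the deque behaves as a LIFO stack
        }
    }

    synchronized int createdCount() {
        return created;
    }
}
```

Because push and pollFirst both work on the head of the deque, the most recently released processor is handed out first.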

Parsing the Request Line

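The request line has the form METHOD URI HTTP-VERSION, for example GET /index.html?lang=es HTTP/1.1. The parser proceeds in four steps:

1. It skips any leading blank lines.
2. It extracts the HTTP method.
3. It delimits the URI, including the query string if there is one.
4. It identifies the protocol version.

All of this happens directly on the NIO ByteBuffer. Whenever the buffer runs out, more bytes are read from the socket through a fill mechanism (readMoreData in the code below). The request line may arrive split across several network reads. For that reason the current phase is kept in parsePhase, which lets parsing resume exactly where it stopped.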

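boolean extractRequestLine(boolean isKeepAlive, int connTimeout, int keepAliveTimeout) throws IOException {
    // The request line has already been parsed for this request
    if (!isParsingLine) return true;

    // Phase 1: discard leading empty lines (CR/LF)
    if (parsePhase < 2) {
        do {
            if (buffer.position() >= buffer.limit()) {
                // Between keep-alive requests, wait using the keep-alive timeout
                if (isKeepAlive) socketWrapper.setReadTimeout(keepAliveTimeout);
                if (!readMoreData(false)) {
                    parsePhase = 1;
                    return false; // no data yet: resume in phase 1 on the next call
                }
                socketWrapper.setReadTimeout(connTimeout);
            }
            // Record when the first byte of the request arrived
            if (req.getStartTimeNanos() < 0) req.setStartTimeNanos(System.nanoTime());
            currentByte = buffer.get();
        } while (currentByte == Constants.CR || currentByte == Constants.LF);

        // Step back onto the first byte of the method
        buffer.position(buffer.position() - 1);
        lineStartPos = buffer.position();
        parsePhase = 2;
    }

    // Phase 2: extract the HTTP method (a token terminated by SP or HT)
    if (parsePhase == 2) {
        boolean foundSpace = false;
        while (!foundSpace) {
            if (buffer.position() >= buffer.limit() && !readMoreData(false)) return false;
            int currentPos = buffer.position();
            currentByte = buffer.get();

            if (currentByte == Constants.SP || currentByte == Constants.HT) {
                foundSpace = true;
                req.method().setBytes(buffer.array(), lineStartPos, currentPos - lineStartPos);
            } else if (!HttpParser.isToken(currentByte)) {
                throw new IllegalArgumentException("Invalid HTTP method detected");
            }
        }
        parsePhase = 3;
    }

    // Phase 3: skip the whitespace between the method and the URI
    if (parsePhase == 3) {
        boolean skipping = true;
        while (skipping) {
            if (buffer.position() >= buffer.limit() && !readMoreData(false)) return false;
            currentByte = buffer.get();
            if (currentByte != Constants.SP && currentByte != Constants.HT) {
                skipping = false;
                buffer.position(buffer.position() - 1); // step back onto the first URI byte
            }
        }
        lineStartPos = buffer.position(); // the URI starts here
        parsePhase = 4;
    }

    // Phase 4: extract the URI, splitting off the query string at the first '?'
    if (parsePhase == 4) {
        boolean foundSpace = false;
        int uriEnd = 0;
        while (!foundSpace) {
            if (buffer.position() >= buffer.limit() && !readMoreData(false)) return false;
            int currentPos = buffer.position();
            currentByte = buffer.get();

            if (currentByte == Constants.SP || currentByte == Constants.HT) {
                foundSpace = true;
                uriEnd = currentPos;
            } else if (currentByte == Constants.QUESTION && queryPos == -1) {
                // queryPos is a field (initially -1), so it survives a return for more data
                queryPos = currentPos;
            }
        }

        if (queryPos >= 0) {
            req.queryString().setBytes(buffer.array(), queryPos + 1, uriEnd - queryPos - 1);
            req.requestURI().setBytes(buffer.array(), lineStartPos, queryPos - lineStartPos);
        } else {
            req.requestURI().setBytes(buffer.array(), lineStartPos, uriEnd - lineStartPos);
        }
        parsePhase = 5;
    }

    // Phase 5: skip the whitespace between the URI and the protocol
    if (parsePhase == 5) {
        boolean skipping = true;
        while (skipping) {
            if (buffer.position() >= buffer.limit() && !readMoreData(false)) return false;
            currentByte = buffer.get();
            if (currentByte != Constants.SP && currentByte != Constants.HT) {
                skipping = false;
                buffer.position(buffer.position() - 1); // step back onto the first protocol byte
            }
        }
        lineStartPos = buffer.position(); // the protocol starts here
        parsePhase = 6;
    }

    // Phase 6: extract the protocol version up to the end of the line (CRLF or bare LF)
    if (parsePhase == 6) {
        boolean isEndOfLine = false;
        while (!isEndOfLine) {
            if (buffer.position() >= buffer.limit() && !readMoreData(false)) return false;
            int currentPos = buffer.position();
            currentByte = buffer.get();

            if (currentByte == Constants.LF) {
                isEndOfLine = true;
                int protocolEnd = currentPos;
                // Exclude the CR of a CRLF terminator from the protocol
                if (protocolEnd > lineStartPos && buffer.get(protocolEnd - 1) == Constants.CR) {
                    protocolEnd--;
                }
                if (protocolEnd - lineStartPos > 0) {
                    req.protocol().setBytes(buffer.array(), lineStartPos, protocolEnd - lineStartPos);
                }
            }
        }
        parsePhase = 7;
    }

    // Phase 7: the request line is complete; reset the state for the next request
    if (parsePhase == 7) {
        isParsingLine = false;
        parsePhase = 0;
        queryPos = -1;
        return true;
    }
    throw new IllegalStateException("Unexpected parse phase: " + parsePhase);
}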
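The phase-based design is what lets Tomcat stop when the buffer runs dry and resume on the next read. The following self-contained sketch reproduces the same phases over a growing byte array, so it can be run without Tomcat. RequestLineParser and feed are hypothetical names. Unlike Tomcat, it copies each chunk into a new array, decodes tokens as ISO-8859-1 strings and does not validate the characters of the method.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Illustrative, resumable request-line parser (hypothetical class, not Tomcat code).
 * Offsets and the current phase live in fields, so parsing resumes where it stopped.
 */
final class RequestLineParser {
    // Phases: 1 skip blank lines, 2 method, 3 spaces, 4 URI, 5 spaces, 6 protocol, 7 done
    private int phase = 1;
    private byte[] buf = new byte[0];
    private int pos = 0;        // next byte to examine
    private int start = 0;      // start of the token currently being read
    private int queryPos = -1;  // index of the first '?' in the URI, or -1

    String method, uri, query, protocol;

    /** Appends a chunk and continues parsing; returns true once the line is complete. */
    boolean feed(byte[] chunk) {
        int oldLength = buf.length;
        buf = Arrays.copyOf(buf, oldLength + chunk.length);
        System.arraycopy(chunk, 0, buf, oldLength, chunk.length);
        while (phase < 7) {
            if (pos >= buf.length) {
                return false; // need more data; the state is preserved in the fields
            }
            byte b = buf[pos];
            boolean blank = b == ' ' || b == '\t';
            switch (phase) {
                case 1: // skip CR/LF before the request line
                    if (b == '\r' || b == '\n') { pos++; } else { start = pos; phase = 2; }
                    break;
                case 2: // the method ends at the first SP/HT
                    if (blank) { method = text(start, pos); phase = 3; }
                    pos++;
                    break;
                case 3: // skip whitespace before the URI
                case 5: // skip whitespace before the protocol
                    if (blank) { pos++; } else { start = pos; phase++; }
                    break;
                case 4: // the URI ends at SP/HT; the first '?' starts the query string
                    if (blank) {
                        uri = text(start, queryPos >= 0 ? queryPos : pos);
                        query = queryPos >= 0 ? text(queryPos + 1, pos) : null;
                        phase = 5;
                    } else if (b == '?' && queryPos < 0) {
                        queryPos = pos;
                    }
                    pos++;
                    break;
                case 6: // the protocol runs to LF; a preceding CR is dropped
                    if (b == '\n') {
                        int end = (pos > start && buf[pos - 1] == '\r') ? pos - 1 : pos;
                        protocol = text(start, end);
                        phase = 7;
                    }
                    pos++;
                    break;
                default:
                    throw new IllegalStateException("Unexpected phase " + phase);
            }
        }
        return true;
    }

    private String text(int from, int to) {
        return new String(buf, from, to - from, StandardCharsets.ISO_8859_1);
    }
}
```

Feeding "\r\nGET /app/ind" returns false. A second chunk "ex.jsp?x=1 HTTP/1.1\r\n" then completes the line with uri "/app/index.jsp" and query "x=1".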

Parsing the Headers

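HTTP headers follow the structure Name: Value. The algorithm works through each header line as follows:

- It reads byte by byte until it finds the colon (:).
- It registers the name in the request's header collection (a MimeHeaderField inside MimeHeaders).
- It reads the value up to the end of the line (CRLF, or a bare LF).

Along the way, header names are lower-cased in place and leading and trailing whitespace is trimmed from the value. Continuation lines that start with SP or HT (the obsolete multi-line folding) are appended to the previous value. An empty line marks the end of the header section.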

private HeaderParseStatus processHeaders() throws IOException {
    // Start of a line: detect the empty line (CRLF or bare LF) that ends the headers
    while (headerState == HeaderParsePosition.HEADER_START) {
        if (buffer.position() >= buffer.limit() && !readMoreData(false)) {
            return HeaderParseStatus.NEED_MORE_DATA;
        }

        prevByte = currentByte;
        currentByte = buffer.get();

        if (currentByte == Constants.CR && prevByte != Constants.CR) {
            // Possible start of CRLF: examine the next byte
        } else if (currentByte == Constants.LF) {
            return HeaderParseStatus.DONE; // empty line: end of the header section
        } else {
            // Start of a header: rewind the bytes just read (two if a CR preceded)
            buffer.position(buffer.position() - (prevByte == Constants.CR ? 2 : 1));
            headerInfo.startIdx = buffer.position();
            headerState = HeaderParsePosition.HEADER_NAME;
        }
    }

    // Phase 1: read the header name
    while (headerState == HeaderParsePosition.HEADER_NAME) {
        if (buffer.position() >= buffer.limit() && !readMoreData(false)) return HeaderParseStatus.NEED_MORE_DATA;

        int currentPos = buffer.position();
        currentByte = buffer.get();

        if (currentByte == Constants.COLON) {
            headerState = HeaderParsePosition.HEADER_VALUE_START;
            // Register the name; the returned object receives the value later
            headerInfo.valueObj = headers.addValue(buffer.array(), headerInfo.startIdx, currentPos - headerInfo.startIdx);
            // The value is compacted in place, starting right after the colon
            headerInfo.startIdx = buffer.position();
            headerInfo.writeIdx = buffer.position();
            headerInfo.lastCharIdx = buffer.position();
            break;
        } else if (!HttpParser.isToken(currentByte)) {
            // Invalid character in the name: discard the whole line
            return skipCurrentLine(false);
        }

        // Lower-case A-Z in place (subtracting LC_OFFSET, i.e. 'A' - 'a', adds 32)
        if (currentByte >= Constants.A && currentByte <= Constants.Z) {
            buffer.put(currentPos, (byte) (currentByte - Constants.LC_OFFSET));
        }
    }

    // Phase 2: read the header value
    while (headerState == HeaderParsePosition.HEADER_VALUE_START ||
           headerState == HeaderParsePosition.HEADER_VALUE ||
           headerState == HeaderParsePosition.HEADER_MULTI_LINE) {

        if (headerState == HeaderParsePosition.HEADER_VALUE_START) {
            // Skip leading whitespace
            while (true) {
                if (buffer.position() >= buffer.limit() && !readMoreData(false)) return HeaderParseStatus.NEED_MORE_DATA;
                currentByte = buffer.get();
                if (currentByte != Constants.SP && currentByte != Constants.HT) {
                    headerState = HeaderParsePosition.HEADER_VALUE;
                    buffer.position(buffer.position() - 1);
                    // Reset so that a CR re-read below is not treated as following another CR
                    currentByte = 0;
                    break;
                }
            }
        }

        if (headerState == HeaderParsePosition.HEADER_VALUE) {
            boolean lineEnded = false;
            while (!lineEnded) {
                if (buffer.position() >= buffer.limit() && !readMoreData(false)) return HeaderParseStatus.NEED_MORE_DATA;
                prevByte = currentByte;
                currentByte = buffer.get();

                if (currentByte == Constants.CR && prevByte != Constants.CR) {
                    // Possible start of CRLF: examine the next byte
                } else if (currentByte == Constants.LF) {
                    lineEnded = true; // CRLF or a bare LF ends the line
                } else if (prevByte == Constants.CR) {
                    // A CR not followed by LF is invalid: discard the line and this header
                    return skipCurrentLine(true);
                } else {
                    // Copy every byte (including inner SP/HT) to its compacted position;
                    // only non-whitespace bytes advance lastCharIdx, which trims trailing whitespace
                    buffer.put(headerInfo.writeIdx++, currentByte);
                    if (currentByte != Constants.SP && currentByte != Constants.HT) {
                        headerInfo.lastCharIdx = headerInfo.writeIdx;
                    }
                }
            }
            // Drop trailing whitespace
            headerInfo.writeIdx = headerInfo.lastCharIdx;
            headerState = HeaderParsePosition.HEADER_MULTI_LINE;
        }

        if (headerState == HeaderParsePosition.HEADER_MULTI_LINE) {
            if (buffer.position() >= buffer.limit() && !readMoreData(false)) return HeaderParseStatus.NEED_MORE_DATA;
            // Peek (without consuming) at the first byte of the next line
            byte nextByte = buffer.get(buffer.position());
            if (nextByte != Constants.SP && nextByte != Constants.HT) {
                // Not a continuation line: this header is complete
                headerState = HeaderParsePosition.HEADER_START;
                break;
            } else {
                // Folded continuation: keep one whitespace byte as separator and keep reading
                buffer.put(headerInfo.writeIdx++, nextByte);
                headerState = HeaderParsePosition.HEADER_VALUE_START;
            }
        }
    }

    // Point the value at the compacted bytes and reset the per-header state
    headerInfo.valueObj.setBytes(buffer.array(), headerInfo.startIdx, headerInfo.lastCharIdx - headerInfo.startIdx);
    headerInfo.reset();
    return HeaderParseStatus.HAVE_MORE_HEADERS;
}
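For experimentation, the same rules can be applied to a complete header block held in a String. This is a simplified sketch and not Tomcat's algorithm:

- HeaderBlockParser is a hypothetical class.
- It splits lines with a regular expression instead of compacting bytes in place.
- It silently skips lines whose name is not a valid RFC 7230 token, which is roughly what skipCurrentLine does above.
- It joins a folded continuation line to the previous value with a single space.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: parses a complete header block held in memory. */
final class HeaderBlockParser {
    private static final String TOKEN_SYMBOLS = "!#$%&'*+-.^_`|~";

    static List<Map.Entry<String, String>> parse(String block) {
        List<Map.Entry<String, String>> headers = new ArrayList<>();
        String name = null;                       // header currently being built
        StringBuilder value = new StringBuilder();
        for (String line : block.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break;                            // empty line: end of the header section
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && name != null) {
                value.append(' ').append(trimOws(line)); // folded continuation line
                continue;
            }
            if (name != null) {
                headers.add(new AbstractMap.SimpleImmutableEntry<>(name, value.toString()));
                name = null;
            }
            int colon = line.indexOf(':');
            if (colon <= 0 || !isToken(line.substring(0, colon))) {
                continue;                         // malformed line: skip it
            }
            name = line.substring(0, colon).toLowerCase(Locale.ROOT);
            value.setLength(0);
            value.append(trimOws(line.substring(colon + 1)));
        }
        if (name != null) {
            headers.add(new AbstractMap.SimpleImmutableEntry<>(name, value.toString()));
        }
        return headers;
    }

    /** Removes leading and trailing SP/HT (optional whitespace). */
    private static String trimOws(String s) {
        int b = 0, e = s.length();
        while (b < e && (s.charAt(b) == ' ' || s.charAt(b) == '\t')) b++;
        while (e > b && (s.charAt(e - 1) == ' ' || s.charAt(e - 1) == '\t')) e--;
        return s.substring(b, e);
    }

    /** True if every character is an RFC 7230 tchar. */
    private static boolean isToken(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!alnum && TOKEN_SYMBOLS.indexOf(c) < 0) {
                return false;
            }
        }
        return !s.isEmpty();
    }
}
```

Anything after the first empty line is treated as the body and is not parsed.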

Reading the Request Body

To read the request payload as text, the application obtains a BufferedReader from the request object. Internally, Tomcat wraps its input buffer (inputBuffer) in a CoyoteReader, which can read text character by character or line by line.

BufferedReader reqReader = httpRequest.getReader();
String textLine;
while ((textLine = reqReader.readLine()) != null) {
    processPayloadLine(textLine);
}

The getReader implementation enforces the Servlet rule that the reader and the raw byte stream (InputStream) cannot both be used for the same request. When the request does not declare a character encoding, it also applies the default encoding configured for the web application's context.

public BufferedReader getReader() throws IOException {
    // The reader and the input stream are mutually exclusive
    if (isUsingStream) {
        throw new IllegalStateException("Cannot obtain the reader after the input stream has been used");
    }

    // No encoding declared by the request: fall back to the context default, if any
    if (internalRequest.getCharacterEncoding() == null) {
        Context appContext = getContext();
        if (appContext != null && appContext.getRequestCharacterEncoding() != null) {
            setCharacterEncoding(appContext.getRequestCharacterEncoding());
        }
    }

    isUsingReader = true;
    // Make sure the byte-to-char converter matches the current encoding
    payloadBuffer.validateConverter();

    // The same reader instance is returned on every call
    if (cachedReader == null) {
        cachedReader = new CoyoteReader(payloadBuffer);
    }
    return cachedReader;
}
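The Servlet API documents that getReader() and getInputStream() are mutually exclusive: each one throws IllegalStateException if the other has already been called. Both that rule and the context-default fallback can be modelled with a small stand-alone class. BodyAccess and its constructor parameters are hypothetical:

- contextDefaultEncoding plays the role of Context.getRequestCharacterEncoding().
- A ByteArrayInputStream stands in for Tomcat's input buffer.
- ISO-8859-1 is assumed when no encoding is configured at all.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical model of the getReader()/getInputStream() rules (not Tomcat's Request class). */
final class BodyAccess {
    private final byte[] body;
    private final String contextDefaultEncoding; // stands in for Context.getRequestCharacterEncoding()
    private String characterEncoding;            // encoding declared by the request, may be null
    private boolean usingStream;
    private boolean usingReader;
    private InputStream cachedStream;
    private BufferedReader cachedReader;

    BodyAccess(byte[] body, String characterEncoding, String contextDefaultEncoding) {
        this.body = body;
        this.characterEncoding = characterEncoding;
        this.contextDefaultEncoding = contextDefaultEncoding;
    }

    InputStream getInputStream() {
        if (usingReader) {
            throw new IllegalStateException("getReader() has already been called");
        }
        usingStream = true;
        if (cachedStream == null) {
            cachedStream = new ByteArrayInputStream(body);
        }
        return cachedStream;
    }

    BufferedReader getReader() {
        if (usingStream) {
            throw new IllegalStateException("getInputStream() has already been called");
        }
        // Apply the context default only when the request declared no encoding
        if (characterEncoding == null && contextDefaultEncoding != null) {
            characterEncoding = contextDefaultEncoding;
        }
        usingReader = true;
        if (cachedReader == null) {
            // Assumption for this sketch: ISO-8859-1 when nothing is configured
            Charset charset = characterEncoding != null
                    ? Charset.forName(characterEncoding) : StandardCharsets.ISO_8859_1;
            cachedReader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(body), charset));
        }
        return cachedReader;
    }

    String getCharacterEncoding() {
        return characterEncoding;
    }
}
```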

The underlying readLine method reads in chunks of up to MAX_LINE_SIZE characters and looks for a line terminator (CR, LF or CRLF). When a line is longer than the temporary buffer, it concatenates the fragments in a StringBuilder.

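public String readLine() throws IOException {
    if (charBuffer == null) {
        charBuffer = new char[MAX_LINE_SIZE];
    }

    String finalLine = null;
    int currentIndex = 0;
    int terminatorIndex = -1; // end of the line content within charBuffer
    int skipCount = -1;       // characters to consume, including the terminator
    StringBuilder chunkAggregator = null;

    while (terminatorIndex < 0) {
        // Mark the start of this chunk so that over-read characters can be given back
        mark(MAX_LINE_SIZE);
        while (currentIndex < MAX_LINE_SIZE && terminatorIndex < 0) {
            int readCount = read(charBuffer, currentIndex, MAX_LINE_SIZE - currentIndex);
            if (readCount < 0) {
                // End of stream: null if nothing was read, otherwise return the partial line
                if (currentIndex == 0 && chunkAggregator == null) {
                    return null;
                }
                terminatorIndex = currentIndex;
                skipCount = currentIndex;
                break;
            }
            for (int i = currentIndex; i < currentIndex + readCount && terminatorIndex < 0; i++) {
                if (charBuffer[i] == LINE_SEPARATOR[0]) {          // '\r'
                    terminatorIndex = i;
                    skipCount = i + 1;
                    // Treat CRLF as a single terminator
                    char next = (i == currentIndex + readCount - 1) ? (char) read() : charBuffer[i + 1];
                    if (next == LINE_SEPARATOR[1]) {
                        skipCount++;
                    }
                } else if (charBuffer[i] == LINE_SEPARATOR[1]) {   // '\n'
                    terminatorIndex = i;
                    skipCount = i + 1;
                }
            }
            currentIndex += readCount;
        }

        if (terminatorIndex < 0) {
            // Line longer than the buffer: keep the full chunk and keep reading
            if (chunkAggregator == null) {
                chunkAggregator = new StringBuilder();
            }
            chunkAggregator.append(charBuffer);
            currentIndex = 0;
        } else {
            // Rewind to the mark and consume exactly the line plus its terminator
            reset();
            skip(skipCount);
        }
    }

    if (chunkAggregator == null) {
        finalLine = new String(charBuffer, 0, terminatorIndex);
    } else {
        chunkAggregator.append(charBuffer, 0, terminatorIndex);
        finalLine = chunkAggregator.toString();
    }

    return finalLine;
}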
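From the application's point of view, the internal chunking does not matter. getReader().readLine() is expected to follow the java.io.BufferedReader.readLine() contract:

- CR, LF and CRLF each end a line.
- The terminator is not included in the returned string.
- null signals the end of the body.

The sketch below checks that contract with the JDK's own BufferedReader over a StringReader. LineContractDemo is a hypothetical helper, and the string stands in for a request body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collects every line of a body using BufferedReader.readLine(). */
final class LineContractDemo {
    static List<String> readAllLines(String body) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(body))) {
            String line;
            while ((line = reader.readLine()) != null) { // null marks the end of the body
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

For "alpha\r\nbeta\ngamma\rdelta\r\n\r\nlast" this yields alpha, beta, gamma, delta, an empty string and last. The CRLF is consumed as a single terminator, which is exactly what the CR/LF look-ahead in readLine has to reproduce.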

Tags: apache-tomcat http-parsing java-nio coyote http11processor

Published 6-21 20:33

Friki Work
