Abstract: "Recent studies have developed jailbreaking attacks, which construct jailbreaking prompts to ``fool'' LLMs into responding to harmful questions. Early-stage jailbreaking attacks require access to model internals or significant human effort. Mo...