FlowPipe: LLM-conditioned Generative Flow Networks for automated data preparation pipeline construction
FlowPipe is a new framework that frames ML data preparation pipeline synthesis as conditional probabilistic flow generation over a directed acyclic graph, using Conditional Generative Flow Networks (C-GFlowNets) with a Trajectory Balance objective. LLM-derived semantic priors are injected into the policy via Feature-wise Linear Modulation (FiLM), and a failure-aware flow objective steers search away from invalid states. Evaluated on 74 real-world datasets across two benchmark suites, FlowPipe improves accuracy by 11.96% on average over SOTA baselines and achieves 12.5x faster training convergence. The work addresses long-standing limitations in automated data pipeline construction including weak credit assignment and inefficient exploration.
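The two mechanisms named in this summary, the Trajectory Balance objective and FiLM conditioning, can be illustrated with a minimal sketch. It is not FlowPipe's implementation: the function names, the plain-list representation, and the idea that `log_z` would be predicted from the dataset conditioning are assumptions made for illustration, and the failure-aware term is not modeled.

```python
import math

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Squared Trajectory Balance residual for one complete trajectory.

    log_z        : log partition estimate (assumed here to come from the
                   conditioning input in a conditional GFlowNet)
    log_pf_steps : log forward-policy probabilities along the trajectory
    log_pb_steps : log backward-policy probabilities along the trajectory
    log_reward   : log reward of the terminal object (e.g. a finished pipeline)
    """
    residual = log_z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each feature.

    gamma and beta are hypothetical outputs of a small network applied to
    an LLM-derived embedding; here they are passed in directly.
    """
    return [g * h + b for h, g, b in zip(features, gamma, beta)]

# A two-step trajectory whose forward probability equals the reward
# gives a zero residual when log_z = 0 and the backward policy is deterministic.
loss = trajectory_balance_loss(
    log_z=0.0,
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[0.0, 0.0],
    log_reward=math.log(0.25),
)
modulated = film([1.0, 2.0], gamma=[2.0, 0.5], beta=[0.0, 1.0])
```

The loss is zero exactly when forward flow and reward-weighted backward flow agree along the trajectory, which is what gives the objective trajectory-level credit assignment.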
Related events (8)
FMLM+ introduces Posterior Refinement for fast non-autoregressive language generation
Researchers introduce FMLM+, a framework combining Flow Map Language Models with masking-style noise schedules to enable joint sequence generation with per-token global consistency scoring. The key contribution is Posterior Refinement, an inference-time self-correction strategy that matches discrete baseline performance with 32x fewer neural function evaluations (NFEs). The approach improves the speed-quality tradeoff over both Masked Diffusion Models and standard FMLMs across multiple benchmarks, addressing long-standing factorization error problems in non-autoregressive generation.
NF-CoT: Latent reasoning with normalizing flows preserves autoregressive LLM advantages
Researchers propose NF-CoT, a latent reasoning framework that replaces discrete chain-of-thought token streams with continuous intermediate states modeled by normalizing flows embedded inside an LLM backbone. The approach uses a TARFlow-style normalizing flow head alongside the standard language model head, enabling exact likelihoods, KV-cache-compatible left-to-right decoding, and policy-gradient optimization in latent space. On code-generation benchmarks, NF-CoT improves pass rates over both explicit CoT and prior latent-reasoning baselines while reducing intermediate reasoning cost. The work addresses a key limitation of existing latent reasoning methods, which typically sacrifice probabilistic tractability or autoregressive compatibility.
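The claim that a normalizing flow head gives exact likelihoods together with left-to-right decoding can be illustrated with a toy autoregressive affine flow. This is a sketch under stated assumptions, not NF-CoT's architecture: `conditioner` is a hypothetical stand-in for the backbone that would predict the shift and log-scale from earlier latent states, and the latents are scalars rather than vectors.

```python
import math

def conditioner(prefix):
    """Hypothetical stand-in for a learned network over the prefix.

    Returns (shift, log_scale) for the next latent value.
    """
    mu = 0.5 * prefix[-1] if prefix else 0.0
    log_sigma = 0.1
    return mu, log_sigma

def log_likelihood(xs):
    """Map latents xs to base noise zs and return the exact log-density.

    Uses the change-of-variables formula with a standard normal base:
    log p(x) = sum_t [ log N(z_t; 0, 1) - log_sigma_t ].
    """
    zs, total = [], 0.0
    for t, x in enumerate(xs):
        mu, log_sigma = conditioner(xs[:t])
        z = (x - mu) * math.exp(-log_sigma)
        total += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - log_sigma
        zs.append(z)
    return zs, total

def decode(zs):
    """Invert the flow left to right, one latent at a time."""
    xs = []
    for z in zs:
        mu, log_sigma = conditioner(xs)
        xs.append(mu + math.exp(log_sigma) * z)
    return xs

latents = [0.3, -1.2, 0.7]
noise, logp = log_likelihood(latents)
recovered = decode(noise)
```

Because each step depends only on earlier latents, decoding proceeds strictly left to right, the property that makes this family of flows compatible with KV-cache-style generation.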
EdgeFlow: Edge-Map Augmented VLM-Based Flowchart Processing for Industrial Requirements Engineering
EdgeFlow augments Vision Language Models with deterministically extracted Canny edge maps as structural priors to improve flowchart-to-Mermaid conversion in industrial requirements engineering, requiring no annotated training data or fine-tuning. Evaluated on IndusReqFlow, a real-world industrial dataset, it achieves +17.39 pp node-level F1 and +16.94 pp edge-level F1 over off-the-shelf VLMs. Cross-dataset evaluation on a synthetic benchmark shows no significant gains, highlighting the gap between synthetic and industrial benchmarks for VLM-based RE tools.
Flowise: open-source visual builder for AI agents gains traction on GitHub
Flowise is an open-source TypeScript project for building AI agents and LLM workflows through a visual drag-and-drop interface. The repository has accumulated over 53,000 GitHub stars with 107 new stars on the day of observation. It represents a no-code/low-code approach to agent and chain construction in the LangChain ecosystem.
FlowEdit: Lifelong pronunciation adaptation for flow-matching TTS via associative memory
FlowEdit is a new framework enabling lifelong pronunciation correction in frozen flow-matching text-to-speech systems without retraining model weights. Corrections are stored as token-level perturbations in text embedding space within a Modern Hopfield Network, retrieved at inference via soft attention with fuzzy morphological matching. On a curated benchmark of 312 multilingual proper nouns across 18 language families, the method reduces target-word Phoneme Error Rate by 92.7% relative to the zero-shot baseline, with each correction completing in ~15 seconds on a single GPU.
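Retrieval from a Modern Hopfield Network via soft attention amounts to a softmax-weighted average of stored patterns. The sketch below illustrates that step only: the function name, the inverse temperature `beta`, and the toy keys and perturbations are assumptions, and the fuzzy morphological matching described above is not modeled.

```python
import math

def hopfield_retrieve(query, keys, values, beta=8.0):
    """Return the softmax(beta * <query, key>)-weighted average of values."""
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical memory: two stored token keys, each with a correction
# (a perturbation to add to that token's text embedding).
keys = [[1.0, 0.0], [0.0, 1.0]]
perturbations = [[0.5, -0.5], [0.0, 0.0]]

token_embedding = [1.0, 0.0]
delta = hopfield_retrieve(token_embedding, keys, perturbations, beta=10.0)
corrected = [e + d for e, d in zip(token_embedding, delta)]
```

With a large `beta` the weights concentrate on the best-matching key, so a stored correction is applied almost unchanged to the matching token while unrelated tokens receive close to no perturbation.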
PipelineRL: ServiceNow's Pipeline-Based Reinforcement Learning Framework for LLMs
ServiceNow introduces PipelineRL, a reinforcement learning training framework for large language models published via the Hugging Face blog. The post describes a pipeline-based approach to RL training, likely addressing throughput and efficiency challenges in RLHF or similar post-training workflows. The source is tier-2 with minimal body content, so the technical depth is unclear, but the topic is relevant to alignment and training infrastructure.
Large Language Gibbs: MCMC-based structured probabilistic inference using LLM conditionals
Researchers propose Large Language Gibbs, a structured inference scheme that uses an LLM's conditional token distributions as transition operators in a Gibbs sampling (MCMC) loop, iteratively resampling individual variables rather than generating outputs in a single autoregressive pass. The approach targets order-dependent biases in standard generation and aims to produce a stationary distribution reflecting a coherent compromise across all local conditionals. It is evaluated on synthetic distributions, consistent reasoning tasks, and Bayesian structure learning, showing MCMC-based inference is a practical alternative to one-pass generation for structured probabilistic tasks.
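The resampling loop described above can be sketched with a toy Gibbs sampler over two binary variables. This is an illustration, not the authors' method: `conditional_prob_one` is a hypothetical lookup standing in for an LLM's conditional distribution over one variable given the others, and the probabilities are invented for the example.

```python
import random
from collections import Counter

def conditional_prob_one(other_value):
    """Hypothetical stand-in for an LLM conditional p(x_i = 1 | x_other)."""
    return 0.9 if other_value == 1 else 0.1

def gibbs_sample(num_sweeps, seed=0):
    """Resample each variable in turn from its conditional; count visited states."""
    rng = random.Random(seed)
    state = [0, 0]
    counts = Counter()
    for _ in range(num_sweeps):
        for i in (0, 1):
            other = state[1 - i]
            state[i] = 1 if rng.random() < conditional_prob_one(other) else 0
        counts[tuple(state)] += 1
    return counts

num_sweeps = 5000
counts = gibbs_sample(num_sweeps)
agree_fraction = (counts[(0, 0)] + counts[(1, 1)]) / num_sweeps
```

For these conditionals the stationary distribution puts probability 0.45 on each of (0, 0) and (1, 1) and 0.05 on each mixed state, so `agree_fraction` should settle near 0.9. No single left-to-right pass is privileged, which is the sense in which the chain removes order dependence.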
Langflow: Visual AI Agent and Workflow Builder Trending on GitHub
Langflow is an open-source Python framework for building and deploying AI-powered agents and workflows, with 148,425 total GitHub stars and 155 new stars on the day of observation. It provides a visual interface for composing LLM-based pipelines and agent workflows. The continued traction signals ongoing community interest in low-code/visual tooling for AI agent construction.
