4 · arXiv cs.LG (Machine Learning) · 10h ago

FlowPipe: LLM-conditioned Generative Flow Networks for automated data preparation pipeline construction

FlowPipe is a new framework that frames ML data preparation pipeline synthesis as conditional probabilistic flow generation over a directed acyclic graph, using Conditional Generative Flow Networks (C-GFlowNets) with a Trajectory Balance objective. LLM-derived semantic priors are injected into the policy via Feature-wise Linear Modulation (FiLM), and a failure-aware flow objective steers search away from invalid states. Evaluated on 74 real-world datasets across two benchmark suites, FlowPipe improves accuracy by 11.96% on average over state-of-the-art baselines and converges 12.5x faster in training. The work addresses long-standing limitations in automated data pipeline construction, including weak credit assignment and inefficient exploration.

Agent and Tool Ecosystem · Feature-wise Linear Modulation · Trajectory Balance · Conditional Generative Flow Networks · FlowPipe
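
The two mechanisms named in the FlowPipe summary above, FiLM conditioning and the Trajectory Balance objective, can be sketched in a few lines of plain Python. This is a minimal illustration of their standard forms, not FlowPipe's implementation: the function names, the list-based vectors, and the toy numbers are all assumptions, and the paper's failure-aware term is not modeled.

```python
import math


def film(features, gamma, beta):
    """FiLM: scale and shift each feature with conditioning-derived values.

    In a FiLM layer, gamma and beta would be produced from the conditioning
    signal (hypothetically, an LLM-derived embedding); this sketch simply
    takes them as given.
    """
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Standard Trajectory Balance loss for one complete trajectory:

    (log Z + sum log P_F - log R - sum log P_B) ** 2
    """
    forward = log_z + sum(log_pf_steps)
    backward = log_reward + sum(log_pb_steps)
    return (forward - backward) ** 2


# Toy usage with made-up numbers (assumptions, not values from the paper).
modulated = film([1.0, 2.0], gamma=[2.0, 0.5], beta=[0.0, 1.0])
loss = trajectory_balance_loss(
    log_z=math.log(2.0),
    log_pf_steps=[math.log(0.5)],
    log_pb_steps=[0.0],
    log_reward=0.0,
)
```

A loss of zero means the trajectory's forward flow matches its reward-weighted backward flow; training drives this residual toward zero across sampled trajectories.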

Related guides (1)

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

5 · arXiv · cs.CL · 16h ago

FMLM+ introduces Posterior Refinement for fast non-autoregressive language generation

Researchers introduce FMLM+, a framework combining Flow Map Language Models (FMLMs) with masking-style noise schedules to enable joint sequence generation with per-token global consistency scoring. The key contribution is Posterior Refinement, an inference-time self-correction strategy that matches discrete baseline performance with 32x fewer neural function evaluations (NFEs). The approach improves the speed-quality tradeoff over both Masked Diffusion Models and standard FMLMs across multiple benchmarks, addressing long-standing factorization error problems in non-autoregressive generation.

Frontier Model Releases · Inference Economics · Posterior Refinement · Flow Map Language Models · FMLM+ · +2 more

6 · arXiv · cs.CL · 19d ago

NF-CoT: Latent reasoning with normalizing flows preserves autoregressive LLM advantages

Researchers propose NF-CoT, a latent reasoning framework that replaces discrete chain-of-thought token streams with continuous intermediate states modeled by normalizing flows embedded inside an LLM backbone. The approach uses a TARFlow-style normalizing flow head alongside the standard language model head, enabling exact likelihoods, KV-cache-compatible left-to-right decoding, and policy-gradient optimization in latent space. On code-generation benchmarks, NF-CoT improves pass rates over both explicit CoT and prior latent-reasoning baselines while reducing intermediate reasoning cost. The work addresses a key limitation of existing latent reasoning methods, which typically sacrifice probabilistic tractability or autoregressive compatibility.

Inference Economics · Alignment and RLHF · TARFlow · NF-CoT · Latent Reasoning with Normalizing Flows

4 · arXiv · cs.AI · 28d ago

EdgeFlow: Edge-Map Augmented VLM-Based Flowchart Processing for Industrial Requirements Engineering

EdgeFlow augments Vision Language Models with deterministically extracted Canny edge maps as structural priors to improve flowchart-to-Mermaid conversion in industrial requirements engineering, requiring no annotated training data or fine-tuning. Evaluated on IndusReqFlow, a real-world industrial dataset, it achieves +17.39 pp node-level F1 and +16.94 pp edge-level F1 over off-the-shelf VLMs. Cross-dataset evaluation on a synthetic benchmark shows no significant gains, highlighting the gap between synthetic and industrial benchmarks for VLM-based RE tools.

Evaluation and Benchmarking · Enterprise Deployment Patterns · Mermaid · Canny edge detection · Vision-Language Models · +3 more

4 · GitHub Trending · 3d ago

Flowise: open-source visual builder for AI agents gains traction on GitHub

Flowise is an open-source TypeScript project for building AI agents and LLM workflows through a visual drag-and-drop interface. The repository has accumulated over 53,000 GitHub stars with 107 new stars on the day of observation. It represents a no-code/low-code approach to agent and chain construction in the LangChain ecosystem.

Agent and Tool Ecosystem · FlowiseAI · Flowise

5 · arXiv · cs.AI · 5d ago

FlowEdit: Lifelong pronunciation adaptation for flow-matching TTS via associative memory

FlowEdit is a new framework enabling lifelong pronunciation correction in frozen flow-matching text-to-speech systems without retraining model weights. Corrections are stored as token-level perturbations in text embedding space within a Modern Hopfield Network, retrieved at inference via soft attention with fuzzy morphological matching. On a curated benchmark of 312 multilingual proper nouns across 18 language families, the method reduces target-word Phoneme Error Rate by 92.7% relative to the zero-shot baseline, with each correction completing in ~15 seconds on a single GPU.

Inference Economics · Enterprise Deployment Patterns · Modern Hopfield Network · FlowEdit · FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS
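
The retrieval step described in the FlowEdit summary, reading a stored correction back out via soft attention, resembles the standard Modern Hopfield update, in which a query is scored against stored keys and the result is a softmax-weighted mix of stored values. The sketch below shows that generic mechanism only. The `retrieve` function, the two-dimensional toy vectors, and the `beta` sharpness parameter are assumptions for illustration, and FlowEdit's fuzzy morphological matching is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query, keys, values, beta=1.0):
    """Return the attention-weighted mix of stored values for a query.

    Higher beta makes retrieval sharper, approaching a lookup of the
    single best-matching stored pattern.
    """
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Toy memory: two stored keys, each paired with a hypothetical
# embedding-space perturbation (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[0.5, -0.5], [0.1, 0.2]]
correction = retrieve([1.0, 0.0], keys, values, beta=50.0)
```

With a large `beta`, the query that matches the first key retrieves essentially the first stored perturbation; with a small `beta`, the result blends both entries.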

4 · Hugging Face Blog · 1mo ago

PipelineRL: ServiceNow's Pipeline-Based Reinforcement Learning Framework for LLMs

ServiceNow introduces PipelineRL, a reinforcement learning training framework for large language models published via the Hugging Face blog. The post describes a pipeline-based approach to RL training, likely addressing throughput and efficiency challenges in RLHF or similar post-training workflows. As a tier-2 source with minimal body content, the technical depth is unclear but the topic is relevant to alignment and training infrastructure.

Training Infrastructure · Agent and Tool Ecosystem · ServiceNow AI · PipelineRL · Hugging Face · +1 more

5 · arXiv · cs.CL · 6d ago

Large Language Gibbs: MCMC-based structured probabilistic inference using LLM conditionals

Researchers propose Large Language Gibbs, a structured inference scheme that uses an LLM's conditional token distributions as transition operators in a Gibbs sampling (MCMC) loop, iteratively resampling individual variables rather than generating outputs in a single autoregressive pass. The approach targets order-dependent biases in standard generation and aims to produce a stationary distribution reflecting a coherent compromise across all local conditionals. It is evaluated on synthetic distributions, consistent reasoning tasks, and Bayesian structure learning, showing MCMC-based inference is a practical alternative to one-pass generation for structured probabilistic tasks.

Evaluation and Benchmarking · Inference Economics · Gibbs Sampling · Large Language Gibbs
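
The Gibbs loop described in the Large Language Gibbs summary, resampling one variable at a time from its conditional given the rest, can be illustrated with a toy two-variable sampler. In this sketch a fixed lookup table stands in for the LLM's conditional distributions; that table, the function names, and the binary variables are assumptions for illustration and do not come from the paper.

```python
import random

# Hypothetical stand-in for LLM conditionals: P(variable = 1 | other variable).
# These come from the joint P(0,0)=P(1,1)=0.4, P(0,1)=P(1,0)=0.1.
COND_P_ONE = {0: 0.2, 1: 0.8}


def sample_conditional(other_value, rng):
    """Draw one binary variable from its conditional given the other."""
    return 1 if rng.random() < COND_P_ONE[other_value] else 0


def gibbs(num_sweeps, seed=0):
    """Run Gibbs sweeps, resampling x given y and then y given x."""
    rng = random.Random(seed)
    x, y = 0, 0
    samples = []
    for _ in range(num_sweeps):
        x = sample_conditional(y, rng)
        y = sample_conditional(x, rng)
        samples.append((x, y))
    return samples


def agreement_rate(samples):
    """Fraction of samples in which the two variables take the same value."""
    return sum(1 for x, y in samples if x == y) / len(samples)


samples = gibbs(20000)
rate = agreement_rate(samples)
```

Because the two conditionals here are consistent with a single joint distribution, the chain's stationary distribution is that joint, and the agreement rate should settle near 0.8. When conditionals are not mutually consistent, as the summary suggests can happen with LLM-derived ones, the chain still has a stationary distribution, which is the "coherent compromise" the summary refers to.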

3 · GitHub Trending · 1mo ago

Langflow: Visual AI Agent and Workflow Builder Trending on GitHub

Langflow is an open-source Python framework for building and deploying AI-powered agents and workflows. It has 148,425 GitHub stars, with 155 new stars on the day of observation. It provides a visual interface for composing LLM-based pipelines and agent workflows. The continued traction signals ongoing community interest in low-code/visual tooling for AI agent construction.

Agent and Tool Ecosystem · Langflow