[USRP-users] RFNoC and multiple block outputs

[USRP-users] RFNoC and multiple block outputs

Martin Braun via USRP-users
Guys, I'm running into some strangeness here and as I've blown a couple of days on debugging I figure I'll cry out for help now. Sorry it's kind of a long one. Help would be much appreciated.

I've written a block which just does complex-to-magnitude conversion and normalization. There are two outputs. One is the magnitude output of a CORDIC which does the complex-to-magnitude conversion. The phase output of that CORDIC goes through a second CORDIC running as a phase modulator, converting that phase into a normalized output, which becomes the second output of the block. There's a nonblocking squelch in there too, but that's just simple combinational logic and doesn't affect the streams. Both CORDICs are just Xilinx IP blocks.

The block runs fine on the testbench. The block also runs fine when I test it live:
[image: pasted1]
Expected outputs, great. The problem first occurred when testing it by running its outputs back to radio blocks:
In this configuration, the flowgraph runs for ten or twenty seconds, after which the first radio stops. Confusing. Illuminatingly, I was able to reproduce similar behavior when modifying the first flowgraph to throttle both outputs:
[image: pasted3]
This runs for ten seconds or so after which "timeout on channel 0" is printed. So, the "magnitude" output of the block stalls, probably when backpressure is applied. I'm having trouble figuring out why this is the case, largely because I can't catch the block failing in simulation.
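
One way to chase a stall that only shows up under backpressure is to have the testbench stall each output independently. What follows is only a minimal sketch and not part of the original block: the module name random_backpressure and its SEED and READY_PERCENT parameters are hypothetical, and it assumes a plain Verilog-2001 simulator.

```verilog
// Hypothetical testbench helper (not from the RFNoC library):
// drives a sink-side tready pseudo-randomly so that a downstream
// consumer appears to stall at unpredictable times.
module random_backpressure #(
  parameter integer SEED          = 1,   // per-instance seed (assumed name)
  parameter integer READY_PERCENT = 50   // rough share of cycles with tready high
)(
  input      clk,
  input      reset,
  output reg tready
);
  integer seed;

  initial begin
    seed   = SEED;
    tready = 1'b0;
  end

  always @(posedge clk) begin
    if (reset)
      tready <= 1'b0;
    else
      // {$random(seed)} is the usual idiom for an unsigned random value;
      // the modulo maps it onto 0..99 before comparing with the threshold.
      tready <= (({$random(seed)} % 100) < READY_PERCENT);
  end
endmodule
```

Assuming the testbench can drive str_src_tready[0] and str_src_tready[1] directly, one instance per output with different SEED values makes the two outputs stall at different times, which is roughly the condition the throttled flowgraph creates.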

I'm handling tuser roughly the same as the addsub block does. I'm not using axi_wrapper or cvita_modify_hdr. There's a single input and two outputs, so the frame handling goes like this:

  // Strip the CHDR header from the incoming stream; the header comes out on o_tuser.
  chdr_deframer chdr_deframer (
    .clk(ce_clk), .reset(ce_rst), .clear(1'b0),
    .i_tdata(str_sink_tdata), .i_tlast(str_sink_tlast), .i_tvalid(str_sink_tvalid), .i_tready(str_sink_tready),
    .o_tdata(m_axis_data_tdata), .o_tuser(m_axis_data_tuser), .o_tlast(m_axis_data_tlast), .o_tvalid(m_axis_data_tvalid), .o_tready(m_axis_data_tready));

  // Capture one header per input packet (on tlast) and fan it out to both outputs.
  // Each copy is popped when the matching output sees tlast with tready high.
  split_stream_fifo #(.WIDTH(128), .ACTIVE_MASK(4'b0011)) tuser_splitter (
    .clk(ce_clk), .reset(ce_rst), .clear(1'b0),
    .i_tdata(m_axis_data_tuser), .i_tlast(1'b0), .i_tvalid(m_axis_data_tvalid & m_axis_data_tlast), .i_tready(),
    .o0_tdata(out_tuser_pre[0]), .o0_tlast(), .o0_tvalid(), .o0_tready(s_axis_data_tlast[0] & s_axis_data_tready[0]),
    .o1_tdata(out_tuser_pre[1]), .o1_tlast(), .o1_tvalid(), .o1_tready(s_axis_data_tlast[1] & s_axis_data_tready[1]),
    .o2_tready(1'b1), .o3_tready(1'b1));

  // Rebuild each output header: keep bits [127:96] and [63:0] of the input header,
  // and replace the SID field [95:64] with this block's source and next destination.
  assign s_axis_data_tuser[0] = { out_tuser_pre[0][127:96], src_sid[0], next_dst_sid[0], out_tuser_pre[0][63:0] };
  assign s_axis_data_tuser[1] = { out_tuser_pre[1][127:96], src_sid[1], next_dst_sid[1], out_tuser_pre[1][63:0] };

  // One framer per output re-packetizes the processed samples with the rebuilt header.
  chdr_framer #(.SIZE(10)) chdr_framer_0 (
    .clk(ce_clk), .reset(ce_rst), .clear(clear_tx_seqnum[0]),
    .i_tdata(s_axis_data_tdata[0]), .i_tuser(s_axis_data_tuser[0]), .i_tlast(s_axis_data_tlast[0]), .i_tvalid(s_axis_data_tvalid[0]), .i_tready(s_axis_data_tready[0]),
    .o_tdata(str_src_tdata[0]), .o_tlast(str_src_tlast[0]), .o_tvalid(str_src_tvalid[0]), .o_tready(str_src_tready[0]));

  chdr_framer #(.SIZE(10)) chdr_framer_1 (
    .clk(ce_clk), .reset(ce_rst), .clear(clear_tx_seqnum[1]),
    .i_tdata(s_axis_data_tdata[1]), .i_tuser(s_axis_data_tuser[1]), .i_tlast(s_axis_data_tlast[1]), .i_tvalid(s_axis_data_tvalid[1]), .i_tready(s_axis_data_tready[1]),
    .o_tdata(str_src_tdata[1]), .o_tlast(str_src_tlast[1]), .o_tvalid(str_src_tvalid[1]), .o_tready(str_src_tready[1]));

I think that should do it. The signal processing part of the block looks like this:

in0 -> complex2mag -> split_complex q-> mult -> round&clip -> out0
                                    i-> phasemod -> out1

So I'm wondering if my problem is in the non-matched signal paths. They will have different latencies, but they both produce data at a 1:1 rate. What am I missing here? I'd appreciate someone else's feedback as to where I should start looking.

Thanks!
Nick


_______________________________________________
USRP-users mailing list
[hidden email]
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
Re: [USRP-users] RFNoC and multiple block outputs

Martin Braun via USRP-users
The solution was under my nose, as usual. split_complex has no buffering and can only be used on matched paths. Replacing it with a split_stream_fifo that does the same job fixes the stall.

Thanks to all who waded through the above.

--n
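
For anyone hitting the same thing, the fix described above might look roughly like the wrapper below. This is a sketch under assumptions, not the code from the actual block: the module name split_complex_fifo and its port names are hypothetical, the upper half of the input word is assumed to be the "i" component, and FIFO_SIZE is assumed to be a log2-depth parameter of split_stream_fifo (worth checking against the library source). Only WIDTH, ACTIVE_MASK and the port names of split_stream_fifo are taken from the tuser code in the original post.

```verilog
// Hypothetical buffered replacement for an unbuffered I/Q split.
// Each output gets its own FIFO inside split_stream_fifo, so the two
// downstream paths no longer have to accept data on the same cycle.
module split_complex_fifo #(
  parameter WIDTH     = 16,  // bits per component (assumed)
  parameter FIFO_SIZE = 5    // assumed log2 FIFO depth; size it to cover the latency mismatch
)(
  input                clk,
  input                reset,
  input                clear,
  input  [2*WIDTH-1:0] i_tdata,
  input                i_tlast,
  input                i_tvalid,
  output               i_tready,
  output [WIDTH-1:0]   oi_tdata,
  output               oi_tlast,
  output               oi_tvalid,
  input                oi_tready,
  output [WIDTH-1:0]   oq_tdata,
  output               oq_tlast,
  output               oq_tvalid,
  input                oq_tready
);
  // Both FIFO outputs carry the full input word; each branch keeps one half.
  wire [2*WIDTH-1:0] o0_tdata, o1_tdata;

  split_stream_fifo #(
    .WIDTH(2*WIDTH), .FIFO_SIZE(FIFO_SIZE), .ACTIVE_MASK(4'b0011)
  ) splitter (
    .clk(clk), .reset(reset), .clear(clear),
    .i_tdata(i_tdata), .i_tlast(i_tlast), .i_tvalid(i_tvalid), .i_tready(i_tready),
    .o0_tdata(o0_tdata), .o0_tlast(oi_tlast), .o0_tvalid(oi_tvalid), .o0_tready(oi_tready),
    .o1_tdata(o1_tdata), .o1_tlast(oq_tlast), .o1_tvalid(oq_tvalid), .o1_tready(oq_tready),
    .o2_tready(1'b1), .o3_tready(1'b1));

  assign oi_tdata = o0_tdata[2*WIDTH-1:WIDTH];  // upper half (assumed "i")
  assign oq_tdata = o1_tdata[WIDTH-1:0];        // lower half (assumed "q")
endmodule
```

The idea is that each branch can now fall behind the other by up to the FIFO depth before the input is stalled, so unmatched latencies and independent backpressure on the two outputs should no longer break the split.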

On Tue, Oct 4, 2016 at 3:15 PM Nick Foster <[hidden email]> wrote:
[snip]