Recent changes to the PJSIP registrar resulted in tests failing due to
missing AOR_CONTACT_ADDED test events. The reason for this was that the
user_agent string had junk values in it, resulting in being unable to
generate the event.
I'm going to be honest here, I have no idea why this was happening. Here
are the steps needed for the user_agent variable to get messed up:
* REGISTER is received
* First contact in the REGISTER results in a contact being removed
* Second contact in the REGISTER results in a contact being added
* The contact, AOR, expiration, and user agent all have to be passed as
format parameters to the creation of a string. Any subset of those
parameters would not be enough to cause the problem.
Looking into what was happening, the thing that struck me as odd was
that the user_agent variable was meant to be set to the value of the
User-Agent SIP header in the incoming REGISTER. However, when removing a
contact, the user_agent variable would be set (via ast_strdupa inside a
loop) to the stored contact's user_agent. This means that the
user_agent's value would be incorrect when attempting to process further
contacts in the incoming REGISTER.
The fix here is to use a different variable for the stored user agent
when removing a contact. Correcting the behavior to be correct also
means the memory usage is less weird, and the issue no longer occurs.
ASTERISK-25929 #close
Reported by Joshua Colp
Change-Id: I7cd24c86a38dec69ebcc94150614bc25f46b8c08
At shutdown it is possible for modules to be unloaded that wouldn't
normally be unloaded. This allows the environment to be cleaned up.
The res_pjsip_transport_management module did not have the unload
logic in it to clean itself up causing the res_pjsip module to not
get unloaded. As a result the res_pjsip monitor thread kept going
processing traffic and timers when it shouldn't.
Change-Id: Ic8cadee131e3b2c436a81d3ae8bb5775999ae00a
We have to setup the channel roles after the bridge class push is called
because the bridge class push callback may have set roles on the incoming
channel. Since we have already partially pushed the channel into the
bridge and reversing what we have already done could be problematic, the
only thing we can do is press on to complete pushing the channel into the
bridge.
* Ignore any channel role setup errors after pushing the channel into a
bridge. The channel may behave incorrectly in the bridge but we can no
longer abort the push at this time.
Change-Id: I08a97082b729052ee65cdca6bb730cf1289ede00
The scheduler thread that kills idle TCP connections was not registering
with PJProject properly and causing assertions if PJProject was built in
debug mode.
This change registers the thread with PJProject the first time that the
scheduler callback executes.
AST-2016-005
Change-Id: I5f7a37e2c80726a99afe9dc2a4a69bdedf661283
There are several places that do scheduled tasks or periodic housecleaning,
each with its own implementation:
* res_pjsip_keepalive has a thread that sends keepalives.
* pjsip_distributor has a thread that cleans up expired unidentified requests.
* res_pjsip_registrar_expire has a thread that cleans up expired contacts.
* res_pjsip_pubsub uses ast_sched directly and then calls ast_sip_push_task.
* res_pjsip_sdp_rtp also uses ast_sched to send keepalives.
There are also places where we should be doing scheduled work but aren't.
A good example are the places we have sorcery observers to start registration
or qualify. These don't work when changes are made to a backend database
without a pjsip reload. We need to check periodically.
As a first step to solving these issues, a new ast_sip_sched facility has
been created.
ast_sip_sched wraps ast_sched but only uses ast_sched as a scheduled queue.
When a task is ready to run, ast_sip_task_pusk is called for it. This ensures
that the task is executed in a PJLIB registered thread and doesn't hold up the
ast_sched thread so it can immediately continue processing the queue. The
serializer used by ast_sip_sched is one of your choosing or a random one from
the res_pjsip pool if you don't choose one.
Another feature is the ability to automatically clean up the task_data when the
task expires (if ever). If it's an ao2 object, it will be dereferenced, if
it's a malloc'd object it will be freed. This is selectable when the task is
scheduled. Even if you choose to not auto dereference an ao2 task data object,
the scheduler itself maintains a reference to it while the task is under it's
control. This prevents the data from disappearing out from under the task.
There are two scheduling models.
AST_SIP_SCHED_TASK_PERIODIC specifies that the invocations of the task occur at
the specific interval. That is, every "interval" milliseconds, regardless of
how long the task takes. If the task takes longer than the interval, it will
be scheduled at the next available multiple of interval. For exmaple: If the
task has an interval of 60 secs and the task takes 70 secs (it better not),
the next invocation will happen at 120 seconds.
AST_SIP_SCHED_TASK_DELAY specifies that the next invocation of the task should
start "interval" milliseconds after the current invocation has finished.
Also, the same ast_sched facility for fixed or variable intervals exists. The
task's return code in conjunction with the AST_SIP_SCHED_TASK_FIXED or
AST_SIP_SCHED_TASK_VARIABLE flags controls the next invocation start time.
One res_pjsip.h housekeeping change was made. The pjsip header files were
added to the top. There have been a few cases lately where I've needed
res_pjsip.h just for ast_sip calls and had compiles fail spectacularly because
I didn't add the pjsip header files to my source even though I never referenced
any pjsip calls.
Finally, a few new convenience APIs were added to astobj2 to make things a
little easier in the scheduler. ao2_ref_and_lock() calls ao2_ref() and
ao2_lock() in one go. ao2_unlock_and_unref() does the reverse. A few macros
were also copied from res_phoneprov because I got tired of having to duplicate
the same hash, sort and compare functions over and over again. The
AO2_STRING_FIELD_(HASH|SORT|CMP)_FN macros will insert functions suitable for
aor_container_alloc into your source.
This facility can be used immediately for the situations where we already have
a thread that wakes up periodically or do some scheduled work. For the
registration and qualify issues, additional sorcery and schema changes would
need to be made so that we can easily detect changed objects on a periodic
basis without having to pull the entire database back to check. I'm thinking
of a last-updated timestamp on the rows but more on this later.
Change-Id: I7af6ad2b2d896ea68e478aa1ae201d6dd016ba1c
"Idle" here means that someone connects to us and does not send a SIP
request. PJProject will not automatically time out such connections, so
it's up to Asterisk to do it instead.
When we receive an incoming TCP connection, we will start a timer
(equivalent to transaction timer D) waiting to receive an incoming
request. If we do not receive a request in that timeframe, then we will
shut down the TCP connection.
ASTERISK-25796 #close
Reported by George Joseph
AST-2016-005
Change-Id: I7b0d303e5d140d0ccaf2f7af562071e3d1130ac6
Due to some ignored return values, Asterisk could crash if processing an
incoming REGISTER whose contact URI was above a certain length.
ASTERISK-25707 #close
Reported by George Joseph
Patches:
0001-res_pjsip-Validate-that-URIs-don-t-exceed-pjproject-.patch
AST-2016-004
Change-Id: I0ed3898fe7ab10121b76c8c79046692de3a1be55
Fix off nominal crash where we could not setup the channel to process
frames for the softmix bridge technology because of allocation failure.
Change-Id: Ic307a8386e46bf551e48fcd1eb97276714d56372
From the patch submitted to Teluu on 4/12/2016
<<<<<<<<<
The wholesale stripping of '[]' from header parameters causes issues if
something (like a port) occurs after the final ']'.
'[2001🅰️:b]' will correctly parse to '2001🅰️:b'
'[2001🅰️:b]:8080' will correctly parse to '2001🅰️:b' but the scanner is left
with ':8080' and parsing stops with a syntax error.
I can't even find a case where stripping the '[]' is a good thing anyway. Even
if you continued to parse and resulted in a string that looks like this...
'2001🅰️🅱️8080', it's not valid.
This came up in Asterisk because Kamailio sends us a Contact with an alias
URI parameter that has an IPv6 address in it like this:
Contact: <sip:1171@127.0.0.1:5080;alias=[2001:1:2::3]~43691~6>
which should be legal but causes a syntax error because of the characters
after the final ']'. Even if it didn't, the '[]' should still not be stripped.
I've run the Asterisk Test Suite for PJSIP (252 tests) many of which are IPv6
enabled. No issues were caused by removing the code that strips the '[]'.
>>>>>>>>>>>
ASTERISK-25123 #close
Reported-by: Anthony Messina
Change-Id: I5cb33f4ebf07ee1f2b26d07caae715e2ec65595a
The test_voicemail_notify_endl test checks the end-of-line
characters of an email message to confirm that they are consistent.
The test wrongfully assumed that reading from the email message
into a buffer will always result in more than 1 character being
read. This is incorrect. If only 1 character was read the test
would go outside of the buffer and access other memory causing
a crash.
The test now checks to ensure that 2 or more characters are read
in ensuring the test stays within the buffer.
ASTERISK-25874 #close
Change-Id: Ic2c89cea6e90f2c0bc2d8138306ebbffd4f8b710
If try to move message to Cust1 (number 5)
the function 'save_to_folder' tries to create Greeting folder instead of Cust1.
This patch fixed it by setting GREETINGS_FOLDER = -1
ASTERISK-24927 #close
Change-Id: I03d1a761894bcc2d130ec9b003bbcddc28e25c51
* Added Useragent and RegExpire headers to AMI Event
ContactStatusDetail with associated documentation.
ASTERISK-25903 #close
Change-Id: If3d121e943e588d016ba51d4eb9c6a421a562239
Failed registration using PJSIP/Realtime if one of the codec name
in allow/disallow option is wrong or contains space.
This patch strip codec name.
ASTERISK-25914
Change-Id: Ifdf02de94e5ddbce305640f6f0666084a3b9283d
Contact expiration can occur in several places: res_pjsip_registrar,
res_pjsip_registrar_expire, and automatically when anyone calls
ast_sip_location_retrieve_aor_contact. At the same time, res_pjsip_registrar
may also be attempting to renew or add a contact. Since none of this was locked
it was possible for one thread to be renewing a contact and another thread to
expire it immediately because it was working off of stale data. This was the
casue of intermittent registration/inbound/nominal/multiple_contacts test
failures.
Now, the new named lock functionality is used to lock the aor during contact
expire and add operations and res_pjsip_registrar_expire now checks the
expiration with the lock held before deleting the contact.
ASTERISK-25885 #close
Reported-by: Josh Colp
Change-Id: I83d413c46a47796f3ab052ca3b349f21cca47059
There's a bug in pjproject's sip_parser where the ":" wasn't correctly
interpreted. This is causing IPv6 addresses in the "received" parameter of the
Via header to cause a syntax check failure.
This patch was submitted to Teluu on 4/10/2016.
ASTERISK-25910 #close
Reported-by: Anthony Messina
Change-Id: Ic7e4c4aa14ded61860401ec349f5177568c4d922
Locking some objects like sorcery objects can be tricky because the underlying
ao2 object may not be the same for all callers. For instance, two threads that
call ast_sorcery_retrieve_by_id on the same aor name might actually get 2
different ao2 objects if the underlying wizard had to rehydrate the aor from a
database. Locking one ao2 object doesn't have any effect on the other even if
those objects had locks in the first place.
Named locks allow access control by keyspace and key strings. Now an "aor"
named "1000" can be locked and any other thread attempting to lock "aor" "1000"
will wait regardless of whether the underlying ao2 object is the same or not.
Mutex and rwlocks are supported.
This capability will initially be used to lock an aor when multiple threads may
be attempting to prune expired contacts from it.
Change-Id: If258c0b7f92b02d07243ce70e535821a1ea7fb45
The first available transport of the appropriate type is used now.
This patch adds new config option 'transport' for outbound-publish.
If transport is set then outbound PUBLISH requests will use this transport.
ASTERISK-25901 #close
Change-Id: Ib389130489b70e36795b0003fa5fd386e2680151
BLF pickup isn't working on Cisco SPA and Snom phones
if the direction="recipient" attribute is missing in 'dialog' tag.
This patch adds direction="recipient" if extension state is
Ringing.
ASTERISK-24601 #close
Change-Id: I5b2c097ca29fd59e92ba237ca5d397cb1b0bcd8c
Sometimes uw-imap function 'mail_fetchbody' returns huge len
which then pass to uw-imap function 'rfc822_base64'.
uw-imap tries to allocate huge memory and abort() on fail.
This patch check the len.
If the len more than max size (128 Mbytes) log error.
This patch also set variables len, newlen to avoid uninizialezed len.
This patch also check pointer returned by rfc822_base64.
ASTERISK-25899 #close
Change-Id: I4a0e7d655f11abef6a5224e2169df6d5c1f1caca
Because SQLite doesn't support full ALTER capabilities, alembic scripts
require batch operations. However, that capability wasn't available until
0.7.0 which some distributions haven't reached yet. Therefore, the batch
operations introduced in commit 86d6e44cc (review 2319) have been reverted
and SQLite is unsupported again, for now anyway.
Tested the full upgrade and downgrade on MySQL/Mariadb and Postgresql.
ASTERISK-25890 #close
Reported-by: Harley Peters
Change-Id: I82eba5456736320256f6775f5b0b40133f4d1c80
When shutting down, the PJSIP sorcery is destroyed. The registrar
expiration module queries the PJSIP sorcery to determine what
to expire. As there was no synchronization between termination
of the expiration thread and the unloading of the module it was
possible for the thread to try to access the PJSIP sorcery after
it had been destroyed.
This change ensures that the thread is shut down before allowing
the module to be considered unloaded.
Change-Id: I69fd239edbaaf160c2d37ae00d3ac06e5596fe8b
The problem is ast_frdup() does not copy whole frame.subclass for voice,
video and image frames, only the format is copied. For video frames, the
subclass structure contains the .frame_ending flag used to put the RTP
marker where it needs to be.
ASTERISK-25894 #close
Change-Id: I812ca90e84ed5d4f473b997d0dd0d3c5a915fe33
Some SIP devices indicate hold/unhold using deferred SDP reinvites. In
other words, they provide no SDP in the reinvite.
A typical transaction that starts hold might look something like this:
* Device sends reinvite with no SDP
* Asterisk sends 200 OK with SDP indicating sendrecv on streams.
* Device sends ACK with SDP indicating sendonly on streams.
At this point, PJMedia's SDP negotiator saves Asterisk's local state as
being recvonly.
Now, when the device attempts to unhold, it again uses a deferred SDP
reinvite, so we end up doing the following:
* Device sends reinvite with no SDP
* Asterisk sends 200 OK with SDP indicating recvonly on streams
* Device sends ACK with SDP indicating sendonly on streams
The problem here is that Asterisk offered recvonly, and by RFC 3264's
rules, if an offer is recvonly, the answer has to be sendonly. The
result is that the device is not taken off hold.
What is supposed to happen is that Asterisk should indicate sendrecv in
the 200 OK that it sends. This way, the device has the freedom to
indicate sendrecv if it wants the stream taken off hold, or it can
continue to respond with sendonly if the purpose of the reinvite was
something else (like a session timer refresher).
The fix here is to alter the SDP negotiator's state when we receive a
reinvite with no SDP. If the negotiator's state is currently in the
recvonly or inactive state, then we alter our local state to be
sendrecv. This way, we allow the device to indicate the stream state as
desired.
ASTERISK-25854 #close
Reported by Robert McGilvray
Change-Id: I7615737276165eef3a593038413d936247dcc6ed